Another idea: could we replace our "Retest X" phrases with "Retest X (Reason)" phrases? With this change a PR author would have to look at the failed test logs. They could catch new flakiness introduced by their PR, file a JIRA issue for flakiness that was not noted before, or ping an existing JIRA issue and raise its severity. The downside is that this requires more work from PR authors.
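Purely as an illustration of the proposed phrase format (the pattern, job name, and JIRA ID below are made up; this is not an existing Beam or Jenkins trigger configuration), the check could require a parenthesized reason before retriggering:

import re

# Hypothetical phrase format: "Retest <job> (<reason>)".
RETEST_WITH_REASON = re.compile(
    r'^retest\s+(?P<job>.+?)\s+\((?P<reason>.+)\)$', re.IGNORECASE)


def parse_retest_comment(comment):
  """Returns (job, reason) if the comment matches 'Retest X (Reason)', else None."""
  match = RETEST_WITH_REASON.match(comment.strip())
  if match is None:
    return None  # a bare "Retest X" without a reason would be rejected
  return match.group('job'), match.group('reason')


print(parse_retest_comment('Retest Java PreCommit (BEAM-10471: CassandraIOTest flake)'))
print(parse_retest_comment('Retest Java PreCommit'))  # None: reason is required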
On Fri, Jul 17, 2020 at 6:46 AM Tyson Hamilton <tyso...@google.com> wrote:

> Adding retries can be beneficial in two ways: unblocking a PR, and collecting metrics about the flakes.

Makes sense. I think we will still need a plan to remove retries, similar to re-enabling disabled tests.

> If we also had a flaky test leaderboard that showed which tests are the most flaky, then we could take action on them. Encouraging someone from the community to fix the flaky test is another issue.
>
> The test status matrix on the GitHub landing page could show flake level to communicate to users which modules are losing a trustable test signal. Maybe this shows up as a flake % or a code coverage % that decreases due to disabled flaky tests.

+1 to a dashboard that shows a "leaderboard" of flaky tests.

> I didn't look for plugins, just dreaming up some options.

On Thu, Jul 16, 2020, 5:58 PM Luke Cwik <lc...@google.com> wrote:

What do other Apache projects do to address this issue?

On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay <al...@google.com> wrote:

I agree with the comments in this thread.
- If we are not re-enabling disabled tests, or do not have a plan to re-enable them, disabling tests only provides temporary relief until eventually users find the issues instead of the disabled tests.
- I feel similarly about retries. It is reasonable to add retries for reasons we understand, but adding retries just to avoid flakes is similar to disabling tests: both might hide real issues.

I think we are missing a way of checking that we are making progress on P1 issues. For example, P0 issues block releases, and this obviously results in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not have a similar process for flaky tests. I do not know what a good policy would be. One suggestion is to ping (email/Slack) assignees of issues. I recently missed a flaky-test issue that was assigned to me; a ping like that would have reminded me. And if an assignee cannot help or does not have the time, we can try to find a new assignee.

Ahmet

On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev <valen...@google.com> wrote:

I think the original discussion [1] on introducing tenacity might answer that question.

[1] https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E

On Thu, Jul 16, 2020 at 10:48 AM Rui Wang <ruw...@google.com> wrote:

Is there any observation that enabling tenacity improved the development experience on the Python SDK, e.g. less wait time to get a PR passing and merged? Or might it be a matter of picking the right number of retries to match the "flakiness" of a test?

-Rui

On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <valen...@google.com> wrote:

We used tenacity [1] to retry some unit tests for which we understood the nature of the flakiness.

[1] https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
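For readers who have not used it, here is a minimal sketch of what retrying a known-flaky unit test with tenacity can look like. The test body and the retry limits are illustrative assumptions, not a copy of the Beam test behind the link above.

import unittest

from tenacity import retry, stop_after_attempt, wait_fixed


class FlakyExampleTest(unittest.TestCase):
  """Illustration only; the body stands in for a flake whose cause is understood."""

  attempts = 0

  # Retry up to 3 attempts with a short pause in between; reraise=True surfaces
  # the original assertion error if every attempt fails. These limits are
  # illustrative choices, not the ones used in the linked Beam test.
  @retry(reraise=True, stop=stop_after_attempt(3), wait=wait_fixed(0.5))
  def test_understood_flake(self):
    FlakyExampleTest.attempts += 1
    # Simulates a timing-sensitive check that fails on the first attempt only.
    self.assertGreaterEqual(FlakyExampleTest.attempts, 2)


if __name__ == '__main__':
  unittest.main()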
On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles <k...@apache.org> wrote:

Didn't we use something like that flaky retry plugin for Python tests at some point? Adding retries may be preferable to disabling the test. We need a process to remove the retries ASAP, though. As Luke says, that is not so easy to make happen. Having a way to make P1 bugs more visible in an ongoing way may help.

Kenn

On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik <lc...@google.com> wrote:

I don't think I have seen tests that were previously disabled become re-enabled.

It seems as though we have about ~60 disabled tests in Java and ~15 in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing features, so unrelated to being a flake.

On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov <g...@spotify.com> wrote:

There is something called test-retry-gradle-plugin [1]. It retries tests if they fail and has different modes for handling flaky tests. Did we ever try or consider using it?

[1]: https://github.com/gradle/test-retry-gradle-plugin

On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov <g...@spotify.com> wrote:

I agree with what Ahmet is saying. I can share my perspective: recently I had to retrigger a build 6 times due to flaky tests, and each retrigger took one hour of waiting time.

I've seen examples of automatic tracking of flaky tests, where a test is considered flaky if it both fails and succeeds for the same git SHA. Not sure if there is anything we can enable to get this automatically.

/Gleb
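As a rough sketch of that same-SHA idea: if CI could export (git SHA, test name, outcome) records, which is an assumption about data our Jenkins setup does not currently publish, flagging flaky tests becomes a small aggregation.

from collections import defaultdict
from typing import Iterable, Set, Tuple

# Each record is (git_sha, test_name, passed). The record format is an assumed
# CI export, not something Beam's Jenkins setup produces today.
Record = Tuple[str, str, bool]


def find_flaky_tests(records: Iterable[Record]) -> Set[str]:
  """A test is flaky if the same git SHA produced both a pass and a failure."""
  outcomes = defaultdict(set)  # (sha, test) -> set of observed pass/fail outcomes
  for sha, test, passed in records:
    outcomes[(sha, test)].add(passed)
  return {test for (_, test), seen in outcomes.items() if len(seen) == 2}


print(find_flaky_tests([
    ('abc123', 'ParDoLifecycleTest', True),
    ('abc123', 'ParDoLifecycleTest', False),  # same SHA, both outcomes -> flaky
    ('abc123', 'JdbcDriverTest', False),      # failed consistently -> not flagged
]))

A nightly job publishing that set would also give us the flake "leaderboard" Tyson described above.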
On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay <al...@google.com> wrote:

I think it will be reasonable to disable/sickbay any flaky test that is actively blocking people. The collective cost of flaky tests for such a large group of contributors is very significant.

Most of these issues are unassigned. IMO, it makes sense to assign these issues to the most relevant person (who added the test/who generally maintains those components). Those people can either fix and re-enable the tests, or remove them if they no longer provide valuable signals.

Ahmet

On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles <k...@apache.org> wrote:

The situation is much worse than that IMO. My experience of the last few days is that a large portion of time went to *just connecting failing runs with the corresponding Jira tickets or filing new ones*.

Summarized on PRs:

- https://github.com/apache/beam/pull/12272#issuecomment-659050891
- https://github.com/apache/beam/pull/12273#issuecomment-659070317
- https://github.com/apache/beam/pull/12225#issuecomment-656973073
- https://github.com/apache/beam/pull/12225#issuecomment-657743373
- https://github.com/apache/beam/pull/12224#issuecomment-657744481
- https://github.com/apache/beam/pull/12216#issuecomment-657735289
- https://github.com/apache/beam/pull/12216#issuecomment-657780781
- https://github.com/apache/beam/pull/12216#issuecomment-657799415

The tickets:

- https://issues.apache.org/jira/browse/BEAM-10460 SparkPortableExecutionTest
- https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest > testEstimatedSizeBytes
- https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
- https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
- https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest > @BeforeClass (classmethod)
- https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
- https://issues.apache.org/jira/browse/BEAM-10506 SplunkEventWriterTest
- https://issues.apache.org/jira/browse/BEAM-10472 direct runner ParDoLifecycleTest
- https://issues.apache.org/jira/browse/BEAM-9187 DefaultJobBundleFactoryTest

Here are our P1 test flake bugs:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

It seems quite a few of them are actively hindering people right now.

Kenn

On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud <apill...@google.com> wrote:

We have two test suites that are responsible for a large percentage of our flaky tests and both have bugs open for about a year without being fixed. These suites are ParDoLifecycleTest (BEAM-8101 <https://issues.apache.org/jira/browse/BEAM-8101>) in Java and BigQueryWriteIntegrationTests in Python (py3 BEAM-9484 <https://issues.apache.org/jira/browse/BEAM-9484>, py2 BEAM-9232 <https://issues.apache.org/jira/browse/BEAM-9232>, old duplicate BEAM-8197 <https://issues.apache.org/jira/browse/BEAM-8197>).

Are there any volunteers to look into these issues? What can we do to mitigate the flakiness until someone has time to investigate?

Andrew
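To make the earlier suggestion of pinging assignees concrete, here is a rough sketch that runs the same JQL as Kenn's link above through JIRA's REST search API and groups the open flake issues by assignee. It is an illustration, not tooling that exists in the Beam repo, and it assumes anonymous read access to the ASF JIRA REST API.

import requests

# Same JQL as the flake query linked above, as a plain string.
JQL = ('project = BEAM AND status in (Open, "In Progress") '
       'AND resolution = Unresolved AND labels = flake '
       'ORDER BY priority DESC, updated DESC')

resp = requests.get(
    'https://issues.apache.org/jira/rest/api/2/search',
    params={'jql': JQL, 'fields': 'summary,assignee,priority', 'maxResults': 200})
resp.raise_for_status()

# Group issues by assignee so each person (or "unassigned") can be pinged.
by_assignee = {}
for issue in resp.json()['issues']:
  fields = issue['fields']
  assignee = (fields['assignee'] or {}).get('displayName', 'unassigned')
  by_assignee.setdefault(assignee, []).append(
      '%s [%s] %s' % (issue['key'], fields['priority']['name'], fields['summary']))

for assignee, issues in sorted(by_assignee.items()):
  print('%s (%d open flaky-test issues):' % (assignee, len(issues)))
  for line in issues:
    print('  ' + line)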