Didn't we use something like that flaky retry plugin for Python tests at some point? Adding retries may be preferable to disabling the test. We need a process to remove the retries ASAP, though. As Luke says, that is not so easy to make happen. Having a way to keep P1 bugs visible on an ongoing basis may help.
Kenn

On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik <lc...@google.com> wrote:

> I don't think I have seen tests that were previously disabled become
> re-enabled.
>
> It seems as though we have about ~60 disabled tests in Java and ~15 in
> Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
> features so unrelated to being a flake.
>
> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov <g...@spotify.com> wrote:
>
>> There is something called test-retry-gradle-plugin [1]. It retries tests
>> if they fail, and have different modes to handle flaky tests. Did we ever
>> try or consider using it?
>>
>> [1]: https://github.com/gradle/test-retry-gradle-plugin
>>
>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov <g...@spotify.com> wrote:
>>
>>> I agree with what Ahmet is saying. I can share my perspective, recently
>>> I had to retrigger build 6 times due to flaky tests, and each retrigger
>>> took one hour of waiting time.
>>>
>>> I've seen examples of automatic tracking of flaky tests, where a test is
>>> considered flaky if both fails and succeeds for the same git SHA. Not sure
>>> if there is anything we can enable to get this automatically.
>>>
>>> /Gleb
>>>
>>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> I think it will be reasonable to disable/sickbay any flaky test that is
>>>> actively blocking people. Collective cost of flaky tests for such a large
>>>> group of contributors is very significant.
>>>>
>>>> Most of these issues are unassigned. IMO, it makes sense to assign
>>>> these issues to the most relevant person (who added the test/who generally
>>>> maintains those components). Those people can either fix and re-enable the
>>>> tests, or remove them if they no longer provide valuable signals.
>>>>
>>>> Ahmet
>>>>
>>>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles <k...@apache.org>
>>>> wrote:
>>>>
>>>>> The situation is much worse than that IMO. My experience of the last
>>>>> few days is that a large portion of time went to *just connecting failing
>>>>> runs with the corresponding Jira tickets or filing new ones*.
>>>>>
>>>>> Summarized on PRs:
>>>>>
>>>>> - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>>>> - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>>>> - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>>>> - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>>>> - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>>>>
>>>>> The tickets:
>>>>>
>>>>> - https://issues.apache.org/jira/browse/BEAM-10460
>>>>> SparkPortableExecutionTest
>>>>> - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest >
>>>>> testEstimatedSizeBytes
>>>>> - https://issues.apache.org/jira/browse/BEAM-10504
>>>>> ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
>>>>> - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>>>> - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
>>>>> > @BeforeClass (classmethod)
>>>>> - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>>>> - https://issues.apache.org/jira/browse/BEAM-10506
>>>>> SplunkEventWriterTest
>>>>> - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
>>>>> ParDoLifecycleTest
>>>>> - https://issues.apache.org/jira/browse/BEAM-9187
>>>>> DefaultJobBundleFactoryTest
>>>>>
>>>>> Here are our P1 test flake bugs:
>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>
>>>>> It seems quite a few of them are actively hindering people right now.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud <apill...@google.com>
>>>>> wrote:
>>>>>
>>>>>> We have two test suites that are responsible for a large percentage
>>>>>> of our flaky tests and both have bugs open for about a year without
>>>>>> being fixed. These suites are ParDoLifecycleTest (BEAM-8101
>>>>>> <https://issues.apache.org/jira/browse/BEAM-8101>) in Java
>>>>>> and BigQueryWriteIntegrationTests in python (py3 BEAM-9484
>>>>>> <https://issues.apache.org/jira/browse/BEAM-9484>, py2 BEAM-9232
>>>>>> <https://issues.apache.org/jira/browse/BEAM-9232>, old duplicate
>>>>>> BEAM-8197 <https://issues.apache.org/jira/browse/BEAM-8197>).
>>>>>>
>>>>>> Are there any volunteers to look into these issues? What can we do to
>>>>>> mitigate the flakiness until someone has time to investigate?
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>
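
For reference, the rule Gleb describes in the thread (a test counts as flaky if it both fails and succeeds for the same git SHA) could be sketched roughly as below. This is only an illustration in plain Python, not something Beam's CI is known to provide: the `RunRecord` type, its field names, and the `find_flaky_tests` helper are all hypothetical, and it assumes per-run results have already been exported from the CI system somehow.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class RunRecord(NamedTuple):
    """One recorded test outcome. All field names are hypothetical."""
    sha: str        # git commit the build ran against
    test_name: str  # name of the test
    passed: bool    # True if the test succeeded in this run


def find_flaky_tests(runs: Iterable[RunRecord]) -> Set[str]:
    """Return the tests that both passed and failed on at least one SHA."""
    # Collect the distinct outcomes seen for each (sha, test) pair.
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.sha, run.test_name)].add(run.passed)
    # A pair with both True and False recorded is flaky by this definition.
    return {
        name
        for (_sha, name), seen in outcomes.items()
        if seen == {True, False}
    }


if __name__ == "__main__":
    # Made-up sample data; the SHAs are placeholders, not real commits.
    sample = [
        RunRecord("abc123", "ParDoLifecycleTest", False),
        RunRecord("abc123", "ParDoLifecycleTest", True),
        RunRecord("abc123", "JdbcDriverTest", True),
        RunRecord("def456", "JdbcDriverTest", False),
    ]
    print(sorted(find_flaky_tests(sample)))
```

In the sample, only `ParDoLifecycleTest` is reported. `JdbcDriverTest` passes on one SHA and fails on another, which under this rule looks like a real regression rather than a flake.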