Adding https://testautonation.com/analyse-test-results-deflake-flaky-tests/ to the list; it seems to be a more powerful test history tool.
On Fri, Jul 24, 2020 at 1:51 PM Kenneth Knowles <[email protected]> wrote:

> Had some off-list chats to brainstorm and I wanted to bring ideas back to the dev@ list for consideration. A lot can be combined. I would really like to have a section in the release notes. I like the idea of banishing flakes from pre-commit (since you can't easily tell if it was a real failure caused by the PR) and auto-retrying in post-commit (so we can gather data on exactly what is flaking without a lot of manual investigation).

> *Include ignored or quarantined tests in the release notes*
> Pro:
> - Users are aware of what is not being tested and so may be silently broken
> - It forces discussion of ignored tests to be part of our community processes
> Con:
> - It may look bad if the list is large (this is actually also a Pro, because if it looks bad, it is bad)

> *Run flaky tests only in postcommit*
> Pro:
> - isolates the bad signal so pre-commit is not affected
> - saves pointless re-runs in pre-commit
> - keeps a signal in post-commit that we can watch, instead of losing it completely when we disable a test
> - maybe keeps the flaky tests in a job related to what they are testing
> Con:
> - we have to really watch post-commit or flakes can turn into failures

> *Separate flaky tests into a quarantine job*
> Pro:
> - gain signal for healthy tests, as with disabling or running in post-commit
> - also saves pointless re-runs
> Con:
> - may collect bad tests that we never look at, so it is the same as disabling the test
> - lots of unrelated tests grouped into one signal instead of a signal focused on the health of a particular component

> *Add a Gradle or Jenkins plugin to retry flaky tests*
> https://blog.gradle.org/gradle-flaky-test-retry-plugin
> https://plugins.jenkins.io/flaky-test-handler/
> Pro:
> - easier than Jiras with humans pasting links; works with moving flakes to post-commit
> - get a somewhat automated view of flakiness, whether in pre-commit or post-commit
> - don't get stopped by flakiness
> Con:
> - maybe too easy to ignore flakes; we should add all flakes (not just disabled or quarantined ones) to the release notes
> - sometimes flakes are actual bugs (like concurrency), so treating them as OK is not desirable
> - without Jiras, no automated release notes
> - Jenkins: retry will only work at the job level because it needs Maven to retry only the failed tests (I think)
> - Jenkins: some of our jobs may have duplicate test names (but that might already be fixed)

> *Consider Gradle Enterprise*
> Pro:
> - get Gradle scan granularity of flake data (and other stuff)
> - also gives module-level health, which we do not have today
> Con:
> - cost and administrative burden unknown
> - we probably have to do some small work to make our jobs compatible with their history tracking

> *Require a link to a Jira to rerun a test*
> Instead of saying "Run Java PreCommit" you have to link to the bug relating to the failure.
> Pro:
> - forces investigation
> - helps others find out about issues
> Con:
> - adds a lot of manual work, or requires automation (which will probably be ad hoc and fragile)

> Kenn
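[To make the first option concrete, here is a rough sketch of how the ignored/quarantined-test list for the release notes could be generated. This is a hypothetical grep-level script, not an existing Beam tool, and it would also pick up tests skipped for unsupported features (e.g. the ZetaSQL/SQL gaps Luke mentions further down the thread) rather than only flakes.]

```
#!/usr/bin/env python3
"""Hypothetical sketch: inventory ignored/skipped tests for the release notes."""
import pathlib
import re

# Assumed markers: Java's @Ignore and common Python skip decorators.
PATTERNS = {
    ".java": re.compile(r"@Ignore"),
    ".py": re.compile(r"@(unittest\.skip|pytest\.mark\.skip)"),
}


def find_quarantined(root):
    for path in pathlib.Path(root).rglob("*"):
        pattern = PATTERNS.get(path.suffix)
        if pattern is None or not path.is_file() or "test" not in path.name.lower():
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if pattern.search(line):
                yield "%s:%d: %s" % (path, lineno, line.strip())


if __name__ == "__main__":
    # Scanning sdks/ is an assumption about where most of the tests live.
    for entry in find_quarantined("sdks"):
        print(entry)
```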
> On Mon, Jul 20, 2020 at 11:59 AM Brian Hulette <[email protected]> wrote:

>> > I think we are missing a way of checking that we are making progress on P1 issues. For example, P0 issues block releases and this obviously results in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not have a similar process for flaky tests. I do not know what would be a good policy. One suggestion is to ping (email/Slack) assignees of issues. I recently missed a flaky issue that was assigned to me. A ping like that would have reminded me. And if an assignee cannot help/does not have the time, we can try to find a new assignee.

>> Yeah, I think this is something we should address. With the new Jira automation, assignees should at least get an email notification after 30 days because of a Jira comment like [1], but that's too long to let a test continue to flake. Could Beam Jira Bot ping every N days for P1s that aren't making progress?

>> That wouldn't help us with P1s that have no assignee, or that are assigned to overloaded people. It seems we'd need some kind of dashboard or report to capture those.

>> [1] https://issues.apache.org/jira/browse/BEAM-8101?focusedCommentId=17121918&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17121918

>> On Fri, Jul 17, 2020 at 1:09 PM Ahmet Altay <[email protected]> wrote:

>>> Another idea: could we replace our "Retest X" phrases with "Retest X (Reason)" phrases? With this change a PR author will have to look at the failed test logs. They could catch new flakiness introduced by their PR, file a JIRA for flakiness that was not noted before, or ping an existing JIRA issue/raise its severity. On the downside this will require PR authors to do more.

>>> On Fri, Jul 17, 2020 at 6:46 AM Tyson Hamilton <[email protected]> wrote:

>>>> Adding retries can be beneficial in two ways: unblocking a PR, and collecting metrics about the flakes.

>>> Makes sense. I think we will still need a plan to remove retries, similar to re-enabling disabled tests.

>>>> If we also had a flaky test leaderboard that showed which tests are the most flaky, then we could take action on them. Encouraging someone from the community to fix the flaky test is another issue.

>>>> The test status matrix on the GitHub landing page could show flake levels to communicate to users which modules are losing a trustworthy test signal. Maybe this shows up as a flake % or a code coverage % that decreases due to disabled flaky tests.

>>> +1 to a dashboard that will show a "leaderboard" of flaky tests.

>>>> I didn't look for plugins, just dreaming up some options.

>>>> On Thu, Jul 16, 2020, 5:58 PM Luke Cwik <[email protected]> wrote:

>>>>> What do other Apache projects do to address this issue?

>>>>> On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay <[email protected]> wrote:

>>>>>> I agree with the comments in this thread.
>>>>>> - If we are not re-enabling tests, or do not have a plan to re-enable them, disabling tests only provides temporary relief until eventually users, rather than the disabled tests, find the issues.
>>>>>> - I feel similarly about retries. It is reasonable to add retries for reasons we understand. Adding retries to avoid flakes is similar to disabling tests. They might hide real issues.

>>>>>> I think we are missing a way of checking that we are making progress on P1 issues. For example, P0 issues block releases and this obviously results in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not have a similar process for flaky tests. I do not know what would be a good policy. One suggestion is to ping (email/Slack) assignees of issues. I recently missed a flaky issue that was assigned to me. A ping like that would have reminded me. And if an assignee cannot help/does not have the time, we can try to find a new assignee.

>>>>>> Ahmet
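[A sketch of the "ping assignees every N days" idea above: it assumes the `jira` PyPI package, reuses the flake-label JQL linked further down the thread, and treats the 7-day threshold and the print-instead-of-ping as placeholders rather than existing Beam Jira Bot behavior.]

```
"""Hypothetical stale-flake ping; not an existing Beam Jira Bot rule."""
from jira import JIRA  # the `jira` PyPI package

# Mirrors the P1 flake query linked later in the thread, plus an
# illustrative "no update in 7 days" clause.
STALE_FLAKES_JQL = (
    'project = BEAM AND status in (Open, "In Progress") '
    'AND resolution = Unresolved AND labels = flake '
    'AND updated <= -7d ORDER BY priority DESC'
)


def list_stale_flakes(server="https://issues.apache.org/jira"):
    client = JIRA(server=server)  # anonymous read access is assumed
    for issue in client.search_issues(STALE_FLAKES_JQL, maxResults=200):
        assignee = issue.fields.assignee
        owner = assignee.displayName if assignee else "UNASSIGNED"
        # A real bot would comment on the issue or ping Slack/email here.
        print("%s (%s): %s" % (issue.key, owner, issue.fields.summary))


if __name__ == "__main__":
    list_stale_flakes()
```

[Listing the UNASSIGNED entries separately would also give a rough version of the dashboard/report Brian mentions for P1s with no owner.]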
>>>>>> On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev <[email protected]> wrote:

>>>>>>> I think the original discussion[1] on introducing tenacity might answer that question.

>>>>>>> [1] https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E

>>>>>>> On Thu, Jul 16, 2020 at 10:48 AM Rui Wang <[email protected]> wrote:

>>>>>>>> Is there an observation that enabling tenacity improves the development experience on the Python SDK? E.g. less wait time to get a PR passing and merged? Or is it a matter of choosing the right number of retries to align with the "flakiness" of a test?

>>>>>>>> -Rui

>>>>>>>> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <[email protected]> wrote:

>>>>>>>>> We used tenacity[1] to retry some unit tests for which we understood the nature of the flakiness.

>>>>>>>>> [1] https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156

>>>>>>>>> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles <[email protected]> wrote:

>>>>>>>>>> Didn't we use something like that flaky retry plugin for Python tests at some point? Adding retries may be preferable to disabling the test. We need a process to remove the retries ASAP though. As Luke says, that is not so easy to make happen. Having a way to make P1 bugs more visible in an ongoing way may help.

>>>>>>>>>> Kenn

>>>>>>>>>> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik <[email protected]> wrote:

>>>>>>>>>>> I don't think I have seen tests that were previously disabled become re-enabled.

>>>>>>>>>>> It seems as though we have ~60 disabled tests in Java and ~15 in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing features, so they are unrelated to flakiness.

>>>>>>>>>>> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov <[email protected]> wrote:

>>>>>>>>>>>> There is something called test-retry-gradle-plugin [1]. It retries tests if they fail, and has different modes for handling flaky tests. Did we ever try or consider using it?

>>>>>>>>>>>> [1]: https://github.com/gradle/test-retry-gradle-plugin

>>>>>>>>>>>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov <[email protected]> wrote:

>>>>>>>>>>>>> I agree with what Ahmet is saying. I can share my perspective: recently I had to retrigger a build 6 times due to flaky tests, and each retrigger took one hour of waiting time.

>>>>>>>>>>>>> I've seen examples of automatic tracking of flaky tests, where a test is considered flaky if it both fails and succeeds for the same git SHA. Not sure if there is anything we can enable to get this automatically.

>>>>>>>>>>>>> /Gleb
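[A minimal sketch of the detection rule Gleb describes: a test is flagged as flaky when the same git SHA has both a passing and a failing run on record. The tuple format is hypothetical; in practice the records would come from Jenkins or Gradle test history.]

```
"""Sketch of "flaky = both passed and failed at the same git SHA"."""
from collections import defaultdict


def find_flaky(records):
    """records: iterable of (test_name, git_sha, passed) tuples."""
    outcomes = defaultdict(set)
    for test, sha, passed in records:
        outcomes[(test, sha)].add(bool(passed))
    # Flaky if both True and False were observed for the same (test, sha).
    return sorted({test for (test, _), seen in outcomes.items() if len(seen) == 2})


if __name__ == "__main__":
    history = [
        ("ParDoLifecycleTest.testTeardown", "abc123", False),
        ("ParDoLifecycleTest.testTeardown", "abc123", True),  # same SHA: flaky
        ("WordCountTest.testSimple", "abc123", True),
    ]
    print(find_flaky(history))  # ['ParDoLifecycleTest.testTeardown']
```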
>>>>>>>>>>>>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay <[email protected]> wrote:

>>>>>>>>>>>>>> I think it will be reasonable to disable/sickbay any flaky test that is actively blocking people. Collective cost of flaky tests for such a large group of contributors is very significant.

>>>>>>>>>>>>>> Most of these issues are unassigned. IMO, it makes sense to assign these issues to the most relevant person (who added the test/who generally maintains those components). Those people can either fix and re-enable the tests, or remove them if they no longer provide valuable signals.

>>>>>>>>>>>>>> Ahmet

>>>>>>>>>>>>>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles <[email protected]> wrote:

>>>>>>>>>>>>>>> The situation is much worse than that IMO. My experience of the last few days is that a large portion of time went to *just connecting failing runs with the corresponding Jira tickets or filing new ones*.

>>>>>>>>>>>>>>> Summarized on PRs:

>>>>>>>>>>>>>>> - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>>>>>>>>>>>>>> - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>>>>>>>>>>>>>> - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>>>>>>>>>>>>>> - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>>>>>>>>>>>>>> - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>>>>>>>>>>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>>>>>>>>>>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>>>>>>>>>>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657799415

>>>>>>>>>>>>>>> The tickets:

>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10460 SparkPortableExecutionTest
>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest > testEstimatedSizeBytes
>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest > @BeforeClass (classmethod)
>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10506 SplunkEventWriterTest
>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10472 direct runner ParDoLifecycleTest
>>>>>>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-9187 DefaultJobBundleFactoryTest

>>>>>>>>>>>>>>> Here are our P1 test flake bugs:
>>>>>>>>>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

>>>>>>>>>>>>>>> It seems quite a few of them are actively hindering people right now.

>>>>>>>>>>>>>>> Kenn

>>>>>>>>>>>>>>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud <[email protected]> wrote:

>>>>>>>>>>>>>>>> We have two test suites that are responsible for a large percentage of our flaky tests, and both have bugs open for about a year without being fixed. These suites are ParDoLifecycleTest (BEAM-8101 <https://issues.apache.org/jira/browse/BEAM-8101>) in Java and BigQueryWriteIntegrationTests in Python (py3 BEAM-9484 <https://issues.apache.org/jira/browse/BEAM-9484>, py2 BEAM-9232 <https://issues.apache.org/jira/browse/BEAM-9232>, old duplicate BEAM-8197 <https://issues.apache.org/jira/browse/BEAM-8197>).

>>>>>>>>>>>>>>>> Are there any volunteers to look into these issues? What can we do to mitigate the flakiness until someone has time to investigate?

>>>>>>>>>>>>>>>> Andrew
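[As a possible stopgap for the Python suite until someone can investigate, the known-flaky tests could be wrapped in the kind of tenacity retry Valentyn mentions above. This is only a sketch: the test class and helper are placeholders, not the real BigQueryWriteIntegrationTests, and the retry count is illustrative.]

```
"""Placeholder illustration; not the real BigQueryWriteIntegrationTests."""
import unittest

from tenacity import retry, stop_after_attempt


def run_write_then_read_pipeline():
    """Hypothetical stand-in for the flaky BigQuery write/read round trip."""
    return True


class BigQueryWriteStopgapTest(unittest.TestCase):

    # Re-run up to 3 times; reraise=True keeps a genuine failure visible.
    @retry(reraise=True, stop=stop_after_attempt(3))
    def test_big_query_write(self):
        self.assertTrue(run_write_then_read_pipeline())


if __name__ == "__main__":
    unittest.main()
```

[The reraise=True part matters: without it a persistent failure would surface as tenacity's RetryError rather than the original assertion.]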