Adding retries can be beneficial in two ways: unblocking a PR, and collecting metrics about the flakes.
If we also had a flaky-test leaderboard showing which tests are the most flaky, we could take action on them. Encouraging someone from the community to fix a flaky test is a separate issue. The test status matrix on the GitHub landing page could show flake level, to communicate to users which modules are losing a trustworthy test signal. Maybe this shows up as a flake % or as a code coverage % that decreases due to disabled flaky tests. I didn't look for plugins, just dreaming up some options.

On Thu, Jul 16, 2020, 5:58 PM Luke Cwik <lc...@google.com> wrote:

What do other Apache projects do to address this issue?

On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay <al...@google.com> wrote:

I agree with the comments in this thread.
- If we are not re-enabling tests again, or we do not have a plan to re-enable them, disabling tests only provides us temporary relief until eventually users find issues instead of disabled tests.
- I feel similarly about retries. It is reasonable to add retries for reasons we understand. Adding retries to avoid flakes is similar to disabling tests. They might hide real issues.

I think we are missing a way of checking that we are making progress on P1 issues. For example, P0 issues block releases, and this obviously results in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not have a similar process for flaky tests. I do not know what would be a good policy. One suggestion is to ping (email/Slack) assignees of issues. I recently missed a flaky issue that was assigned to me. A ping like that would have reminded me. And if an assignee cannot help/does not have the time, we can try to find a new assignee.

Ahmet

On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev <valen...@google.com> wrote:

I think the original discussion[1] on introducing tenacity might answer that question.
[1] https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E

On Thu, Jul 16, 2020 at 10:48 AM Rui Wang <ruw...@google.com> wrote:

Is there an observation that enabling tenacity improves the development experience on the Python SDK? E.g. less wait time to get a PR passing and merged? Or it might be a matter of the right number of retries to align with the "flakiness" of a test?

-Rui

On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <valen...@google.com> wrote:

We used tenacity[1] to retry some unit tests for which we understood the nature of the flakiness.

[1] https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156

On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles <k...@apache.org> wrote:

Didn't we use something like that flaky-retry plugin for Python tests at some point? Adding retries may be preferable to disabling the test. We need a process to remove the retries ASAP, though. As Luke says, that is not so easy to make happen. Having a way to make P1 bugs more visible in an ongoing way may help.

Kenn

On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik <lc...@google.com> wrote:

I don't think I have seen tests that were previously disabled become re-enabled.

It seems as though we have about ~60 disabled tests in Java and ~15 in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing features, so unrelated to being a flake.

On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov <g...@spotify.com> wrote:

There is something called test-retry-gradle-plugin [1].
It retries tests if they fail, and it has different modes to handle flaky tests. Did we ever try or consider using it?

[1]: https://github.com/gradle/test-retry-gradle-plugin

On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov <g...@spotify.com> wrote:

I agree with what Ahmet is saying. I can share my perspective: recently I had to retrigger a build 6 times due to flaky tests, and each retrigger took one hour of waiting time.

I've seen examples of automatic tracking of flaky tests, where a test is considered flaky if it both fails and succeeds for the same git SHA. Not sure if there is anything we can enable to get this automatically.

/Gleb

On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay <al...@google.com> wrote:

I think it would be reasonable to disable/sickbay any flaky test that is actively blocking people. The collective cost of flaky tests for such a large group of contributors is very significant.

Most of these issues are unassigned. IMO, it makes sense to assign these issues to the most relevant person (whoever added the test / whoever generally maintains those components). Those people can either fix and re-enable the tests, or remove them if they no longer provide valuable signals.

Ahmet

On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles <k...@apache.org> wrote:

The situation is much worse than that, IMO. My experience of the last few days is that a large portion of time went to *just connecting failing runs with the corresponding Jira tickets or filing new ones*.
Summarized on PRs:

- https://github.com/apache/beam/pull/12272#issuecomment-659050891
- https://github.com/apache/beam/pull/12273#issuecomment-659070317
- https://github.com/apache/beam/pull/12225#issuecomment-656973073
- https://github.com/apache/beam/pull/12225#issuecomment-657743373
- https://github.com/apache/beam/pull/12224#issuecomment-657744481
- https://github.com/apache/beam/pull/12216#issuecomment-657735289
- https://github.com/apache/beam/pull/12216#issuecomment-657780781
- https://github.com/apache/beam/pull/12216#issuecomment-657799415

The tickets:

- https://issues.apache.org/jira/browse/BEAM-10460 SparkPortableExecutionTest
- https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest > testEstimatedSizeBytes
- https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
- https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
- https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest > @BeforeClass (classmethod)
- https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
- https://issues.apache.org/jira/browse/BEAM-10506 SplunkEventWriterTest
- https://issues.apache.org/jira/browse/BEAM-10472 direct runner ParDoLifecycleTest
- https://issues.apache.org/jira/browse/BEAM-9187 DefaultJobBundleFactoryTest

Here are our P1 test flake bugs:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

It seems quite a few of them are actively hindering people right now.

Kenn

On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud <apill...@google.com> wrote:

We have two test suites that are responsible for a large percentage of our flaky tests, and both have had bugs open for about a year without being fixed. These suites are ParDoLifecycleTest (BEAM-8101 <https://issues.apache.org/jira/browse/BEAM-8101>) in Java and BigQueryWriteIntegrationTests in Python (py3 BEAM-9484 <https://issues.apache.org/jira/browse/BEAM-9484>, py2 BEAM-9232 <https://issues.apache.org/jira/browse/BEAM-9232>, old duplicate BEAM-8197 <https://issues.apache.org/jira/browse/BEAM-8197>).

Are there any volunteers to look into these issues? What can we do to mitigate the flakiness until someone has time to investigate?

Andrew
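The automatic flake detection Gleb describes upthread (a test counts as flaky if it both fails and passes for the same git SHA) is simple to sketch. This is a minimal illustration, not an existing Beam or CI tool; the `(sha, test, passed)` record format is an assumption about what a CI result log would provide:

```python
from collections import defaultdict

def find_flaky_tests(results):
    """results: iterable of (git_sha, test_name, passed) tuples,
    one per test execution across CI runs.

    A test is flagged as flaky if, for at least one commit, it has
    both a passing and a failing run: the code under test was
    identical, so the divergence must come from the test itself or
    its environment.
    """
    outcomes = defaultdict(set)  # (sha, test) -> set of observed outcomes
    for sha, test, passed in results:
        outcomes[(sha, test)].add(passed)
    # Flaky = both True and False observed for the same (sha, test).
    return sorted({test for (_, test), seen in outcomes.items()
                   if len(seen) == 2})
```

For example, a test that fails and then passes on retrigger of the same commit is flagged, while a test that fails on one commit and passes on another is not, since the code change could explain the difference. Fed by CI result logs, the same aggregation could back the flake-% leaderboard idea from earlier in the thread.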