What do other Apache projects do to address this issue?

On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay <al...@google.com> wrote:
> I agree with the comments in this thread.
> - If we are not re-enabling tests again, or we do not have a plan to re-enable them, disabling tests only provides us temporary relief until eventually users find the issues instead of the disabled tests.
> - I feel similarly about retries. It is reasonable to add retries for reasons we understand. Adding retries to avoid flakes is similar to disabling tests. They might hide real issues.
>
> I think we are missing a way to check that we are making progress on P1 issues. For example, P0 issues block releases and this obviously results in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not have a similar process for flaky tests. I do not know what would be a good policy. One suggestion is to ping (email/slack) assignees of issues. I recently missed a flaky issue that was assigned to me. A ping like that would have reminded me. And if an assignee cannot help/does not have the time, we can try to find a new assignee.
>
> Ahmet
>
> On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev <valen...@google.com> wrote:
>
>> I think the original discussion[1] on introducing tenacity might answer that question.
>>
>> [1] https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E
>>
>> On Thu, Jul 16, 2020 at 10:48 AM Rui Wang <ruw...@google.com> wrote:
>>
>>> Is there an observation that enabling tenacity improves the development experience on Python SDK? E.g. less wait time to get a PR passing and merged? Or it might be a matter of choosing the right number of retries to align with the "flakiness" of a test?
>>>
>>> -Rui
>>>
>>> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <valen...@google.com> wrote:
>>>
>>>> We used tenacity[1] to retry some unit tests for which we understood the nature of flakiness.
>>>>
>>>> [1] https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
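[For reference, the tenacity-based retry pattern referenced above looks roughly like the following. This is a minimal, self-contained sketch, not the actual code at the link: the test, the simulated flaky call, the exception type, and the retry settings are all illustrative.]

import random
import unittest

from tenacity import retry, retry_if_exception_type, stop_after_attempt


def _flaky_operation():
  # Stand-in for an operation with an understood, transient failure mode.
  if random.random() < 0.3:
    raise ConnectionError('transient failure')
  return 42


class FlakyTest(unittest.TestCase):

  # Retry up to 3 attempts, but only for the exception we believe is
  # transient; any other failure surfaces immediately, and the last
  # error is re-raised if every attempt fails.
  @retry(
      reraise=True,
      stop=stop_after_attempt(3),
      retry=retry_if_exception_type(ConnectionError))
  def test_flaky_operation(self):
    self.assertEqual(_flaky_operation(), 42)


if __name__ == '__main__':
  unittest.main()

[The key point of the pattern is that the retry is scoped to a single, understood failure mode rather than blanket-retrying every assertion error.]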
>>>>
>>>> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles <k...@apache.org> wrote:
>>>>
>>>>> Didn't we use something like that flaky retry plugin for Python tests at some point? Adding retries may be preferable to disabling the test. We need a process to remove the retries ASAP though. As Luke says that is not so easy to make happen. Having a way to make P1 bugs more visible in an ongoing way may help.
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik <lc...@google.com> wrote:
>>>>>
>>>>>> I don't think I have seen tests that were previously disabled become re-enabled.
>>>>>>
>>>>>> It seems as though we have about ~60 disabled tests in Java and ~15 in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing features, so unrelated to being a flake.
>>>>>>
>>>>>> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov <g...@spotify.com> wrote:
>>>>>>
>>>>>>> There is something called test-retry-gradle-plugin [1]. It retries tests if they fail, and has different modes to handle flaky tests. Did we ever try or consider using it?
>>>>>>>
>>>>>>> [1]: https://github.com/gradle/test-retry-gradle-plugin
>>>>>>>
>>>>>>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov <g...@spotify.com> wrote:
>>>>>>>
>>>>>>>> I agree with what Ahmet is saying. I can share my perspective: recently I had to retrigger a build 6 times due to flaky tests, and each retrigger took one hour of waiting time.
>>>>>>>>
>>>>>>>> I've seen examples of automatic tracking of flaky tests, where a test is considered flaky if it both fails and succeeds for the same git SHA. Not sure if there is anything we can enable to get this automatically.
>>>>>>>>
>>>>>>>> /Gleb
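[For reference, the per-commit heuristic described above can be sketched in a few lines. The input format is hypothetical, (test name, git SHA, outcome) tuples; a real version would pull this history from Jenkins or whatever system stores the CI results.]

from collections import defaultdict

# Hypothetical export of CI results as (test name, git SHA, outcome) tuples.
results = [
    ('ParDoLifecycleTest', 'abc123', 'PASSED'),
    ('ParDoLifecycleTest', 'abc123', 'FAILED'),
    ('JdbcDriverTest', 'abc123', 'PASSED'),
    ('JdbcDriverTest', 'def456', 'PASSED'),
]

# Collect every outcome observed for each (test, SHA) pair.
outcomes = defaultdict(set)
for test, sha, outcome in results:
  outcomes[(test, sha)].add(outcome)

# A test is flagged as flaky if any single SHA produced both outcomes.
flaky = sorted({test for (test, _), seen in outcomes.items()
                if {'PASSED', 'FAILED'} <= seen})
print(flaky)  # ['ParDoLifecycleTest']

[A test that both passed and failed at the same SHA changed outcome without a code change, which is exactly the signal that it is flaky rather than broken.]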
>>>>>>>>
>>>>>>>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>
>>>>>>>>> I think it will be reasonable to disable/sickbay any flaky test that is actively blocking people. The collective cost of flaky tests for such a large group of contributors is very significant.
>>>>>>>>>
>>>>>>>>> Most of these issues are unassigned. IMO, it makes sense to assign these issues to the most relevant person (who added the test/who generally maintains those components). Those people can either fix and re-enable the tests, or remove them if they no longer provide valuable signals.
>>>>>>>>>
>>>>>>>>> Ahmet
>>>>>>>>>
>>>>>>>>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles <k...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> The situation is much worse than that IMO. My experience of the last few days is that a large portion of time went to *just connecting failing runs with the corresponding Jira tickets or filing new ones*.
>>>>>>>>>>
>>>>>>>>>> Summarized on PRs:
>>>>>>>>>>
>>>>>>>>>> - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>>>>>>>>> - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>>>>>>>>> - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>>>>>>>>> - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>>>>>>>>> - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>>>>>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>>>>>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>>>>>>>>> - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>>>>>>>>>
>>>>>>>>>> The tickets:
>>>>>>>>>>
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10460 SparkPortableExecutionTest
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest > testEstimatedSizeBytes
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest > @BeforeClass (classmethod)
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10506 SplunkEventWriterTest
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-10472 direct runner ParDoLifecycleTest
>>>>>>>>>> - https://issues.apache.org/jira/browse/BEAM-9187 DefaultJobBundleFactoryTest
>>>>>>>>>>
>>>>>>>>>> Here are our P1 test flake bugs:
>>>>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>>>>>>
>>>>>>>>>> It seems quite a few of them are actively hindering people right now.
>>>>>>>>>>
>>>>>>>>>> Kenn
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud <apill...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> We have two test suites that are responsible for a large percentage of our flaky tests, and both have bugs open for about a year without being fixed. These suites are ParDoLifecycleTest (BEAM-8101 <https://issues.apache.org/jira/browse/BEAM-8101>) in Java and BigQueryWriteIntegrationTests in Python (py3 BEAM-9484 <https://issues.apache.org/jira/browse/BEAM-9484>, py2 BEAM-9232 <https://issues.apache.org/jira/browse/BEAM-9232>, old duplicate BEAM-8197 <https://issues.apache.org/jira/browse/BEAM-8197>).
>>>>>>>>>>>
>>>>>>>>>>> Are there any volunteers to look into these issues? What can we do to mitigate the flakiness until someone has time to investigate?
>>>>>>>>>>>
>>>>>>>>>>> Andrew