Re: Chronically flaky tests

Valentyn Tymofieiev Thu, 16 Jul 2020 11:52:15 -0700

I think the original discussion[1] on introducing tenacity might answer
that question.


[1]
https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E
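
For readers unfamiliar with the approach, the retry idea discussed in this
thread can be sketched in plain Python. This is a stand-in only: it is not
tenacity's API and not Beam's actual test code. The decorator name
`retry_on_failure` and its `max_attempts` parameter are assumptions made for
illustration.

```python
import functools


def retry_on_failure(max_attempts=3):
    """Rerun the wrapped test until it passes or attempts run out.

    Hypothetical helper: a plain-Python stand-in for a retry library.
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    # Only the final failed attempt is reported.
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator
```

A retry like this hides intermittent failures rather than fixing them, which
is why it only seems appropriate where the nature of the flakiness is
understood.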

On Thu, Jul 16, 2020 at 10:48 AM Rui Wang <[email protected]> wrote:

> Has anyone observed that enabling tenacity improves the development
> experience on the Python SDK, e.g. less waiting time to get a PR passing
> and merged? Or might it be a matter of choosing the right number of retries
> to match the "flakiness" of a test?
>
>
> -Rui
>
> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> We used tenacity[1] to retry some unit tests for which we understood the
>> nature of flakiness.
>>
>> [1]
>> https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
>>
>> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles <[email protected]> wrote:
>>
>>> Didn't we use something like that flaky retry plugin for Python tests at
>>> some point? Adding retries may be preferable to disabling the test. We need
>>> a process to remove the retries ASAP though. As Luke says, that is not so
>>> easy to make happen. Having a way to make P1 bugs more visible in an
>>> ongoing way may help.
>>>
>>> Kenn
>>>
>>> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik <[email protected]> wrote:
>>>
>>>> I don't think I have seen tests that were previously disabled become
>>>> re-enabled.
>>>>
>>>> It seems as though we have about 60 disabled tests in Java and about 15
>>>> in Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
>>>> features, so they are unrelated to flakiness.
>>>>
>>>> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov <[email protected]> wrote:
>>>>
>>>>> There is something called test-retry-gradle-plugin [1]. It retries
>>>>> tests if they fail, and has different modes to handle flaky tests. Did we
>>>>> ever try or consider using it?
>>>>>
>>>>> [1]: https://github.com/gradle/test-retry-gradle-plugin
>>>>>
>>>>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I agree with what Ahmet is saying. I can share my perspective:
>>>>>> recently I had to retrigger a build 6 times due to flaky tests, and each
>>>>>> retrigger took one hour of waiting time.
>>>>>>
>>>>>> I've seen examples of automatic tracking of flaky tests, where a test
>>>>>> is considered flaky if it both fails and succeeds for the same git SHA.
>>>>>> Not sure if there is anything we can enable to get this automatically.
>>>>>>
>>>>>> /Gleb
>>>>>>
>>>>>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay <[email protected]> wrote:
>>>>>>
>>>>>>> I think it would be reasonable to disable/sickbay any flaky test that
>>>>>>> is actively blocking people. The collective cost of flaky tests for
>>>>>>> such a large group of contributors is very significant.
>>>>>>>
>>>>>>> Most of these issues are unassigned. IMO, it makes sense to assign
>>>>>>> these issues to the most relevant person (who added the test/who
>>>>>>> generally maintains those components). Those people can either fix and
>>>>>>> re-enable the tests, or remove them if they no longer provide valuable
>>>>>>> signals.
>>>>>>>
>>>>>>> Ahmet
>>>>>>>
>>>>>>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The situation is much worse than that IMO. My experience of the
>>>>>>>> last few days is that a large portion of time went to *just connecting
>>>>>>>> failing runs with the corresponding Jira tickets or filing new ones*.
>>>>>>>>
>>>>>>>> Summarized on PRs:
>>>>>>>>
>>>>>>>>  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>>>>>>>  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>>>>>>>  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>>>>>>>  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>>>>>>>  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>>>>>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>>>>>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>>>>>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>>>>>>>
>>>>>>>> The tickets:
>>>>>>>>
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10460 SparkPortableExecutionTest
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest > testEstimatedSizeBytes
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10504 ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest > @BeforeClass (classmethod)
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10506 SplunkEventWriterTest
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner ParDoLifecycleTest
>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-9187 DefaultJobBundleFactoryTest
>>>>>>>>
>>>>>>>> Here are our P1 test flake bugs:
>>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>>>>
>>>>>>>> It seems quite a few of them are actively hindering people right
>>>>>>>> now.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> We have two test suites that are responsible for a large
>>>>>>>>> percentage of our flaky tests, and both have had bugs open for about
>>>>>>>>> a year without being fixed. These suites are ParDoLifecycleTest (
>>>>>>>>> BEAM-8101 <https://issues.apache.org/jira/browse/BEAM-8101>) in
>>>>>>>>> Java and BigQueryWriteIntegrationTests in Python (py3 BEAM-9484
>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-9484>, py2 BEAM-9232
>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-9232>, old duplicate
>>>>>>>>> BEAM-8197 <https://issues.apache.org/jira/browse/BEAM-8197>).
>>>>>>>>>
>>>>>>>>> Are there any volunteers to look into these issues? What can we do
>>>>>>>>> to mitigate the flakiness until someone has time to investigate?
>>>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>
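
The heuristic Gleb mentions in this thread (a test is flaky if it both fails
and succeeds for the same git SHA) could be sketched roughly as follows. The
`RunRecord` shape and the `find_flaky` name are hypothetical; nothing here
reflects an existing Beam or CI tool, and how run records would be collected
is left open.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set, Tuple


class RunRecord(NamedTuple):
    """One recorded outcome of a test (hypothetical record shape)."""
    sha: str      # git commit the build ran against
    test: str     # fully qualified test name
    passed: bool  # outcome of this particular run


def find_flaky(runs: Iterable[RunRecord]) -> Set[Tuple[str, str]]:
    """Return (sha, test) pairs seen both passing and failing."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.sha, run.test)].add(run.passed)
    # Same commit, same test, both outcomes observed: flaky by this rule.
    return {key for key, seen in outcomes.items() if seen == {True, False}}
```

A test that fails on one commit and passes on another is not flagged, since
the code change itself may explain the difference.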
