Another idea: could we replace our "Retest X" phrases with "Retest X
(Reason)" phrases? With this change a PR author would have to look at the
failed test logs. They could catch new flakiness introduced by their PR,
file a JIRA for flakiness that was not noted before, or ping an existing
JIRA issue/raise its severity. On the downside, this would require PR
authors to do more.

On Fri, Jul 17, 2020 at 6:46 AM Tyson Hamilton <tyso...@google.com> wrote:

> Adding retries can be beneficial in two ways: unblocking a PR, and
> collecting metrics about the flakes.
>

Makes sense. I think we will still need a plan to remove retries, similar
to re-enabling disabled tests.


>
> If we also had a flaky test leaderboard that showed which tests are the
> most flaky, then we could take action on them. Encouraging someone from the
> community to fix the flaky test is another issue.
>
> The test status matrix on the GitHub landing page could show a flake
> level to communicate to users which modules are losing a trustworthy test
> signal. Maybe this shows up as a flake % or a code coverage % that
> decreases due to disabled flaky tests.
>

+1 to a dashboard that will show a "leaderboard" of flaky tests.


>
> I didn't look for plugins, just dreaming up some options.
>
>
>
>
> On Thu, Jul 16, 2020, 5:58 PM Luke Cwik <lc...@google.com> wrote:
>
>> What do other Apache projects do to address this issue?
>>
>> On Thu, Jul 16, 2020 at 5:51 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> I agree with the comments in this thread.
>>> - If we are not re-enabling disabled tests, or do not have a plan to
>>> re-enable them, disabling tests only provides temporary relief until
>>> users eventually find the issues instead of the disabled tests finding
>>> them.
>>> - I feel similarly about retries. It is reasonable to add retries for
>>> reasons we understand. Adding retries just to avoid flakes is similar to
>>> disabling tests: they might hide real issues.
>>>
>>> I think we are missing a way to check that we are making progress on
>>> P1 issues. For example, P0 issues block releases and this obviously results
>>> in fixing/triaging/addressing P0 issues at least every 6 weeks. We do not
>>> have a similar process for flaky tests. I do not know what would be a good
>>> policy. One suggestion is to ping (email/slack) assignees of issues. I
>>> recently missed a flaky issue that was assigned to me. A ping like that
>>> would have reminded me. And if an assignee cannot help/does not have the
>>> time, we can try to find a new assignee.
>>>
>>> Ahmet
>>>
>>>
>>> On Thu, Jul 16, 2020 at 11:52 AM Valentyn Tymofieiev <
>>> valen...@google.com> wrote:
>>>
>>>> I think the original discussion[1] on introducing tenacity might answer
>>>> that question.
>>>>
>>>> [1]
>>>> https://lists.apache.org/thread.html/16060fd7f4d408857a5e4a2598cc96ebac0f744b65bf4699001350af%40%3Cdev.beam.apache.org%3E
>>>>
>>>> On Thu, Jul 16, 2020 at 10:48 AM Rui Wang <ruw...@google.com> wrote:
>>>>
>>>>> Is there an observation that enabling tenacity improves the
>>>>> development experience on the Python SDK? E.g. less wait time to get a PR
>>>>> passed and merged? Or is it a matter of picking the right number of
>>>>> retries to align with the "flakiness" of a test?
>>>>>
>>>>>
>>>>> -Rui
>>>>>
>>>>> On Thu, Jul 16, 2020 at 10:38 AM Valentyn Tymofieiev <
>>>>> valen...@google.com> wrote:
>>>>>
>>>>>> We used tenacity[1] to retry some unit tests for which we understood
>>>>>> the nature of flakiness.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/beam/blob/3b9aae2bcaeb48ab43a77368ae496edc73634c91/sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner_test.py#L1156
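>>>>>>
>>>>>> The pattern is roughly the following (a minimal sketch with made-up
>>>>>> names, not the exact code at [1]):
>>>>>>
>>>>>>     import unittest
>>>>>>
>>>>>>     from tenacity import retry, stop_after_attempt
>>>>>>
>>>>>>     class SomeFlakyTest(unittest.TestCase):
>>>>>>
>>>>>>       # Re-run up to 3 times; reraise=True surfaces the last failure
>>>>>>       # instead of tenacity's RetryError. Test name is illustrative.
>>>>>>       @retry(reraise=True, stop=stop_after_attempt(3))
>>>>>>       def test_with_understood_flakiness(self):
>>>>>>         ...  # body that occasionally flakes for a known reason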
>>>>>>
>>>>>> On Thu, Jul 16, 2020 at 10:25 AM Kenneth Knowles <k...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Didn't we use something like that flaky retry plugin for Python
>>>>>>> tests at some point? Adding retries may be preferable to disabling the
>>>>>>> test. We need a process to remove the retries ASAP, though. As Luke says,
>>>>>>> that is not so easy to make happen. Having an ongoing way to make P1 bugs
>>>>>>> more visible may help.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Thu, Jul 16, 2020 at 8:57 AM Luke Cwik <lc...@google.com> wrote:
>>>>>>>
>>>>>>>> I don't think I have seen tests that were previously disabled
>>>>>>>> become re-enabled.
>>>>>>>>
>>>>>>>> It seems as though we have ~60 disabled tests in Java and ~15 in
>>>>>>>> Python. Half of the Java ones seem to be in ZetaSQL/SQL due to missing
>>>>>>>> features, so they are unrelated to flakiness.
>>>>>>>>
>>>>>>>> On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov <g...@spotify.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> There is something called test-retry-gradle-plugin [1]. It retries
>>>>>>>>> tests if they fail, and has different modes for handling flaky tests.
>>>>>>>>> Did we ever try or consider using it?
>>>>>>>>>
>>>>>>>>> [1]: https://github.com/gradle/test-retry-gradle-plugin
>>>>>>>>>
>>>>>>>>> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov <g...@spotify.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I agree with what Ahmet is saying. I can share my perspective:
>>>>>>>>>> recently I had to retrigger a build 6 times due to flaky tests, and
>>>>>>>>>> each retrigger took one hour of waiting time.
>>>>>>>>>>
>>>>>>>>>> I've seen examples of automatic tracking of flaky tests, where a
>>>>>>>>>> test is considered flaky if it both fails and succeeds for the same
>>>>>>>>>> git SHA. Not sure if there is anything we can enable to get this
>>>>>>>>>> automatically.
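>>>>>>>>>>
>>>>>>>>>> As a rough sketch of the check (hypothetical input format, not tied
>>>>>>>>>> to any particular CI setup):
>>>>>>>>>>
>>>>>>>>>>     from collections import defaultdict
>>>>>>>>>>
>>>>>>>>>>     def find_flaky_tests(results):
>>>>>>>>>>       """results: iterable of (git_sha, test_name, passed) tuples."""
>>>>>>>>>>       outcomes = defaultdict(set)
>>>>>>>>>>       for sha, test, passed in results:
>>>>>>>>>>         outcomes[(sha, test)].add(passed)
>>>>>>>>>>       # Flaky: the same test both passed and failed at the same SHA.
>>>>>>>>>>       return sorted({test for (_, test), seen in outcomes.items()
>>>>>>>>>>                      if len(seen) > 1})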
>>>>>>>>>>
>>>>>>>>>> /Gleb
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think it would be reasonable to disable/sickbay any flaky test
>>>>>>>>>>> that is actively blocking people. The collective cost of flaky tests
>>>>>>>>>>> for such a large group of contributors is very significant.
>>>>>>>>>>>
>>>>>>>>>>> Most of these issues are unassigned. IMO, it makes sense to
>>>>>>>>>>> assign these issues to the most relevant person (who added the test /
>>>>>>>>>>> who generally maintains those components). Those people can either fix
>>>>>>>>>>> and re-enable the tests, or remove them if they no longer provide
>>>>>>>>>>> valuable signals.
>>>>>>>>>>>
>>>>>>>>>>> Ahmet
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles <k...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The situation is much worse than that IMO. My experience of the
>>>>>>>>>>>> last few days is that a large portion of time went to *just connecting
>>>>>>>>>>>> failing runs with the corresponding Jira tickets or filing new ones*.
>>>>>>>>>>>>
>>>>>>>>>>>> Summarized on PRs:
>>>>>>>>>>>>
>>>>>>>>>>>>  -
>>>>>>>>>>>> https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>>>>>>>>>>>  -
>>>>>>>>>>>> https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>>>>>>>>>>>  -
>>>>>>>>>>>> https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>>>>>>>>>>>  -
>>>>>>>>>>>> https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>>>>>>>>>>>  -
>>>>>>>>>>>> https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>>>>>>>>>>>  -
>>>>>>>>>>>> https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>>>>>>>>>>>  -
>>>>>>>>>>>> https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>>>>>>>>>>>  -
>>>>>>>>>>>> https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>>>>>>>>>>>
>>>>>>>>>>>> The tickets:
>>>>>>>>>>>>
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10460
>>>>>>>>>>>> SparkPortableExecutionTest
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10471
>>>>>>>>>>>> CassandraIOTest > testEstimatedSizeBytes
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10504
>>>>>>>>>>>> ElasticSearchIOTest > testWriteFullAddressing and 
>>>>>>>>>>>> testWriteWithIndexFn
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10470
>>>>>>>>>>>> JdbcDriverTest
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-8025
>>>>>>>>>>>> CassandraIOTest > @BeforeClass (classmethod)
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-8454
>>>>>>>>>>>> FnHarnessTest
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10506
>>>>>>>>>>>> SplunkEventWriterTest
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-10472 direct
>>>>>>>>>>>> runner ParDoLifecycleTest
>>>>>>>>>>>>  - https://issues.apache.org/jira/browse/BEAM-9187
>>>>>>>>>>>> DefaultJobBundleFactoryTest
>>>>>>>>>>>>
>>>>>>>>>>>> Here are our P1 test flake bugs:
>>>>>>>>>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>>>>>>>>>
>>>>>>>>>>>> It seems quite a few of them are actively hindering people
>>>>>>>>>>>> right now.
>>>>>>>>>>>>
>>>>>>>>>>>> Kenn
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud <
>>>>>>>>>>>> apill...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> We have two test suites that are responsible for a large
>>>>>>>>>>>>> percentage of our flaky tests, and both have had bugs open for about a
>>>>>>>>>>>>> year without being fixed. These suites are ParDoLifecycleTest (
>>>>>>>>>>>>> without being fixed. These suites are ParDoLifecycleTest (
>>>>>>>>>>>>> BEAM-8101 <https://issues.apache.org/jira/browse/BEAM-8101>)
>>>>>>>>>>>>> in Java and BigQueryWriteIntegrationTests in Python (py3
>>>>>>>>>>>>> BEAM-9484 <https://issues.apache.org/jira/browse/BEAM-9484>,
>>>>>>>>>>>>> py2 BEAM-9232
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-9232>, old
>>>>>>>>>>>>> duplicate BEAM-8197
>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-8197>).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are there any volunteers to look into these issues? What can
>>>>>>>>>>>>> we do to mitigate the flakiness until someone has time to 
>>>>>>>>>>>>> investigate?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andrew
>>>>>>>>>>>>>
>>>>>>>>>>>>
