I don't think I have seen tests that were previously disabled become
re-enabled.

It seems we have about 60 disabled tests in Java and about 15 in
Python. Half of the Java ones seem to be in ZetaSQL/SQL and are disabled
because of missing features, so they are unrelated to flakiness.

On Thu, Jul 16, 2020 at 8:49 AM Gleb Kanterov <g...@spotify.com> wrote:

> There is something called test-retry-gradle-plugin [1]. It retries tests
> if they fail, and has different modes to handle flaky tests. Did we ever
> try or consider using it?
>
> [1]: https://github.com/gradle/test-retry-gradle-plugin
>
> On Thu, Jul 16, 2020 at 1:15 PM Gleb Kanterov <g...@spotify.com> wrote:
>
>> I agree with what Ahmet is saying. I can share my perspective: recently I
>> had to retrigger a build 6 times due to flaky tests, and each retrigger
>> took one hour of waiting time.
>>
>> I've seen examples of automatic tracking of flaky tests, where a test is
>> considered flaky if it both fails and succeeds for the same git SHA. Not
>> sure if there is anything we can enable to get this automatically.
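
To make the rule above concrete, here is a minimal sketch of it: a test
counts as flaky if it has both passed and failed at the same git SHA. The
record type, the function name, and the sample data are all hypothetical;
this is not taken from any existing Beam or Jenkins tooling, and it assumes
the run results have already been exported somewhere as (sha, test, passed)
records.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class RunRecord(NamedTuple):
    """One test outcome from one CI run (hypothetical record shape)."""
    sha: str      # git SHA the run was built from
    test: str     # fully qualified test name
    passed: bool  # True if the test passed in that run


def find_flaky_tests(runs: Iterable[RunRecord]) -> Set[str]:
    """Return tests that both passed and failed for at least one SHA."""
    outcomes = defaultdict(set)
    for run in runs:
        # Collect the distinct outcomes seen for each (SHA, test) pair.
        outcomes[(run.sha, run.test)].add(run.passed)
    # A pair with both True and False recorded is flaky by this rule.
    return {test for (_, test), seen in outcomes.items() if len(seen) == 2}


# Made-up sample data, for illustration only.
sample_runs = [
    RunRecord("abc123", "ExampleTestA", True),
    RunRecord("abc123", "ExampleTestA", False),  # same SHA, other outcome
    RunRecord("abc123", "ExampleTestB", False),
    RunRecord("def456", "ExampleTestB", True),   # different SHA: not flaky
]
flaky = find_flaky_tests(sample_runs)
```

Note that a test which fails at one SHA and passes at another is not
reported, since the code change itself may explain the difference.
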
>>
>> /Gleb
>>
>> On Thu, Jul 16, 2020 at 2:33 AM Ahmet Altay <al...@google.com> wrote:
>>
>>> I think it would be reasonable to disable/sickbay any flaky test that is
>>> actively blocking people. The collective cost of flaky tests for such a
>>> large group of contributors is very significant.
>>>
>>> Most of these issues are unassigned. IMO, it makes sense to assign these
>>> issues to the most relevant person (who added the test/who generally
>>> maintains those components). Those people can either fix and re-enable the
>>> tests, or remove them if they no longer provide valuable signals.
>>>
>>> Ahmet
>>>
>>> On Wed, Jul 15, 2020 at 4:55 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> The situation is much worse than that IMO. My experience of the last
>>>> few days is that a large portion of time went to *just connecting failing
>>>> runs with the corresponding Jira tickets or filing new ones*.
>>>>
>>>> Summarized on PRs:
>>>>
>>>>  - https://github.com/apache/beam/pull/12272#issuecomment-659050891
>>>>  - https://github.com/apache/beam/pull/12273#issuecomment-659070317
>>>>  - https://github.com/apache/beam/pull/12225#issuecomment-656973073
>>>>  - https://github.com/apache/beam/pull/12225#issuecomment-657743373
>>>>  - https://github.com/apache/beam/pull/12224#issuecomment-657744481
>>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657735289
>>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657780781
>>>>  - https://github.com/apache/beam/pull/12216#issuecomment-657799415
>>>>
>>>> The tickets:
>>>>
>>>>  - https://issues.apache.org/jira/browse/BEAM-10460
>>>> SparkPortableExecutionTest
>>>>  - https://issues.apache.org/jira/browse/BEAM-10471 CassandraIOTest >
>>>> testEstimatedSizeBytes
>>>>  - https://issues.apache.org/jira/browse/BEAM-10504
>>>> ElasticSearchIOTest > testWriteFullAddressing and testWriteWithIndexFn
>>>>  - https://issues.apache.org/jira/browse/BEAM-10470 JdbcDriverTest
>>>>  - https://issues.apache.org/jira/browse/BEAM-8025 CassandraIOTest
>>>> > @BeforeClass (classmethod)
>>>>  - https://issues.apache.org/jira/browse/BEAM-8454 FnHarnessTest
>>>>  - https://issues.apache.org/jira/browse/BEAM-10506
>>>> SplunkEventWriterTest
>>>>  - https://issues.apache.org/jira/browse/BEAM-10472 direct runner
>>>> ParDoLifecycleTest
>>>>  - https://issues.apache.org/jira/browse/BEAM-9187
>>>> DefaultJobBundleFactoryTest
>>>>
>>>> Here are our P1 test flake bugs:
>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22)%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20%3D%20flake%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>>>>
>>>> It seems quite a few of them are actively hindering people right now.
>>>>
>>>> Kenn
>>>>
>>>> On Wed, Jul 15, 2020 at 4:23 PM Andrew Pilloud <apill...@google.com>
>>>> wrote:
>>>>
>>>>> We have two test suites that are responsible for a large percentage of
>>>>> our flaky tests, and both have had bugs open for about a year without being
>>>>> fixed. These suites are ParDoLifecycleTest (BEAM-8101
>>>>> <https://issues.apache.org/jira/browse/BEAM-8101>) in Java
>>>>> and BigQueryWriteIntegrationTests in Python (py3 BEAM-9484
>>>>> <https://issues.apache.org/jira/browse/BEAM-9484>, py2 BEAM-9232
>>>>> <https://issues.apache.org/jira/browse/BEAM-9232>, old duplicate
>>>>> BEAM-8197 <https://issues.apache.org/jira/browse/BEAM-8197>).
>>>>>
>>>>> Are there any volunteers to look into these issues? What can we do to
>>>>> mitigate the flakiness until someone has time to investigate?
>>>>>
>>>>> Andrew
>>>>>
>>>>
