> If changes to core are causing Dataflow precommits to fail but not local precommits, that suggests we lack test coverage?
What is the difference between "Dataflow precommit" and "local precommit" (besides that the latter can be run without GCP)? If the "local precommit" should catch _all_ regressions, what reason would there be to have any other precommits? My intuition would be that precommit checks (those being run as part of CI on pull requests) should ideally be runnable by virtually anyone locally. Any checks that require a specific environment should be run optionally (e.g., like the validates-runner suites).
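
For illustration, a minimal sketch (not existing Beam code) of how a cloud-dependent test could skip itself when no GCP credentials are available, so that the rest of a precommit stays runnable for anyone; the class and method names here are hypothetical:

```java
import static org.junit.Assume.assumeTrue;

import com.google.auth.oauth2.GoogleCredentials;
import java.io.IOException;
import org.junit.Before;
import org.junit.Test;

public class RequiresGcpExampleTest {

  private static boolean hasGcpCredentials() {
    try {
      // Throws IOException when no Application Default Credentials are found.
      GoogleCredentials.getApplicationDefault();
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  @Before
  public void skipWithoutCredentials() {
    // A failed assumption makes JUnit report the test as skipped, not failed.
    assumeTrue("No GCP credentials found; skipping cloud test", hasGcpCredentials());
  }

  @Test
  public void testAgainstDataflow() {
    // ... test body that needs a real GCP project ...
  }
}
```

With something like this in place, a contributor without GCP credentials would see the cloud tests reported as skipped rather than failed.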

On 8/16/21 7:11 PM, Andrew Pilloud wrote:
I can confirm the tests are passing now. Thank you.

If changes to core are causing Dataflow precommits to fail but not local precommits, that suggests we lack test coverage? I'm not suggesting we remove the Dataflow tests entirely, just that we consider removing them from the precommits where there is overlapping test coverage.

I would be +1 in favor of a flag as it would allow us to easily disable Dataflow tests in precommits should we have another outage.

On Mon, Aug 16, 2021 at 9:52 AM Jan Lukavský <je...@seznam.cz> wrote:

    The issue is with pull requests. IIRC, I didn't encounter this
    problem myself, but I can imagine that a change in core could make
    a Dataflow precommit fail. And that would be complicated to fix
    without GCP credentials.

    So, to answer the question: I think that no, it would not help,
    unless the flag were used in CI as well.

    On 8/16/21 6:47 PM, Luke Cwik wrote:
    Jan, it would be possible to add a flag that says to skip any IT
    tests that require a cloud service of any kind. Would that work
    for you?
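
    For what it's worth, a minimal sketch of what such a flag could
    look like with JUnit 4 assumptions; the -DskipCloudITs system
    property is hypothetical, not an existing Beam option:

```java
import static org.junit.Assume.assumeFalse;

import org.junit.Before;

/** Hypothetical base class for IT tests that talk to a cloud service. */
public abstract class CloudIntegrationTest {

  @Before
  public void maybeSkipCloudTest() {
    // With -DskipCloudITs=true on the JVM command line, JUnit marks
    // every test in the subclass as skipped instead of running it.
    assumeFalse(
        "Cloud ITs disabled via -DskipCloudITs=true",
        Boolean.getBoolean("skipCloudITs"));
  }
}
```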

    It turns out that the fix was rolled out and finished about 45
    minutes ago, so my prior e-mail was already out of date when I sent
    it. If you had a test that failed on your PR, please feel free to
    restart the test using the GitHub trigger phrase associated with it.

    I reran one of the suites that were perma-red
https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/4059
    and it passed.


    On Mon, Aug 16, 2021 at 9:29 AM Jan Lukavský <je...@seznam.cz> wrote:

        Not directly related to the 'flakiness' discussion of this
        thread, but I think it would be good if pre-commit checks
        could be run locally without GCP credentials.

        On 8/16/21 6:24 PM, Luke Cwik wrote:
        The fix was inadvertently run in dry-run mode, so it didn't make
        any changes. Since the fix was taking a couple of hours or so
        and it was getting late on Friday, people didn't want to start
        it again till today (after the weekend).

        I don't think removing the few tests that run an unbounded
        pipeline on Dataflow is a good long-term idea. Sure, we can
        disable them and re-enable them when there is an issue that is
        blocking folks.
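
        For reference, the usual JUnit 4 way to disable a test
        temporarily is a one-line change that is easy to revert; the
        test below is illustrative, not an actual Beam test:

```java
import org.junit.Ignore;
import org.junit.Test;

public class DataflowStreamingExampleIT {

  // Pointing @Ignore at the tracking issue makes re-enabling the test
  // a one-line revert once the issue is resolved.
  @Ignore("BEAM-12676: Dataflow streaming broken in apache-beam-testing")
  @Test
  public void testUnboundedPipelineOnDataflow() {
    // ... unbounded pipeline test body ...
  }
}
```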

        On Mon, Aug 16, 2021 at 9:19 AM Andrew Pilloud
        <apill...@google.com> wrote:

            The estimated two hours to a fix have long passed, and we
            are now at 18 days since the last successful run. What is
            the latest estimate?

            It sounds like these tests are primarily testing
            Dataflow, not Beam. They seem like good candidates to
            remove from the precommit (or limit to Dataflow runner
            changes) even after they are fixed.

            On Fri, Aug 13, 2021 at 6:48 PM Luke Cwik
            <lc...@google.com> wrote:

                The failure is due to data associated with the
                apache-beam-testing project, which is impacting all
                the Dataflow streaming tests.

                Yes, disabling the tests should have happened weeks
                ago if:
                1) the fix seemed like it was going to take a long
                time (which was unknown at the time), and
                2) we had confidence in test coverage minus Dataflow
                streaming test coverage (which I believe we did).



                On Fri, Aug 13, 2021 at 6:27 PM Andrew Pilloud
                <apill...@google.com> wrote:

                    Or if a rollback won't fix this, can we disable
                    the broken tests?

                    On Fri, Aug 13, 2021 at 6:25 PM Andrew Pilloud
                    <apill...@google.com> wrote:

                        So you can roll back in two hours. Beam has
                        been broken for two weeks. Why isn't a
                        rollback appropriate?

                        On Fri, Aug 13, 2021 at 6:06 PM Luke Cwik
                        <lc...@google.com> wrote:

                            The test failures that I have seen
                            have been because of BEAM-12676 [1],
                            which is due to a bug impacting Dataflow
                            streaming pipelines for the
                            apache-beam-testing project. The fix is
                            rolling out now, from my understanding,
                            and should take another two hours or so.
                            Rolling back master doesn't seem like
                            what we should be doing at the moment.

                            1:
https://issues.apache.org/jira/projects/BEAM/issues/BEAM-12676

                            On Fri, Aug 13, 2021 at 5:51 PM Andrew
                            Pilloud <apill...@google.com> wrote:

                                Both Java and Python precommits are
                                reporting the last successful run as
                                being in July (for both the Cron and
                                Commit jobs), so it looks like
                                changes are being submitted without
                                successful test runs. We probably
                                shouldn't be doing that?
https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/
https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/
https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Cron/
https://ci-beam.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/

                                Is there a plan to get this fixed?
                                Should we roll master back to July?

                                On Tue, Aug 3, 2021 at 12:24 PM
                                Tyson Hamilton <tyso...@google.com> wrote:

                                    I only realized after sending
                                    that I had used the IP address
                                    for the link; that was by
                                    accident. Here is the proper
                                    domain link:
http://metrics.beam.apache.org/d/D81lW0pmk/post-commit-test-reliability?orgId=1

                                    On Tue, Aug 3, 2021 at 3:22 PM
                                    Tyson Hamilton <tyso...@google.com> wrote:

                                        The way I've investigated
                                        precommit flake stability is
                                        by looking at the
                                        'Post-commit Test
                                        Reliability' [1] dashboard
                                        (hah!). There is a cron job
                                        that runs precommits, and
                                        those results are,
                                        confusingly, tracked in the
                                        post-commit dashboard. This
                                        week, Java is about 50%
                                        green for the pre-commit
                                        cron job, which is not great.

                                        The plugin we installed for
                                        tracking the flakiest tests
                                        in a job doesn't cope well
                                        with the number of tests
                                        present in the precommit
                                        cron job. This could be an
                                        area of improvement to help
                                        add granularity and
                                        visibility into the flakiest
                                        tests over some period of time.


                                        [1]:
http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability?orgId=1
                                        (look for "PreCommit_Java_Cron")

                                        On Tue, Aug 3, 2021 at 2:24
                                        PM Andrew Pilloud
                                        <apill...@google.com> wrote:

                                            Our metrics show that
                                            Java is nearly free from
                                            flakes, that Go has
                                            significant flakes, and
                                            that Python is
                                            effectively broken. It
                                            appears the metrics may
                                            be missing coverage on
                                            the Java side. The
                                            dashboard is here:
http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1


                                            I agree that this is
                                            important to address. I
                                            haven't submitted any
                                            code recently, but I
                                            spent a significant
                                            amount of time on the
                                            2.31.0 release
                                            investigating flakes in
                                            the release validation
                                            tests.

                                            Andrew

                                            On Tue, Aug 3, 2021 at
                                            10:43 AM Reuven Lax
                                            <re...@google.com> wrote:

                                                I've noticed recently
                                                that our precommit
                                                tests are getting
                                                flakier and flakier.
                                                Recently I had to run
                                                Java PreCommit five
                                                times before I was
                                                able to get a clean
                                                run. This is
                                                frustrating for us as
                                                developers, but it is
                                                also extremely
                                                wasteful of our
                                                compute resources.

                                                I started making a
                                                list of the flaky
                                                tests I've seen. Here
                                                are some of the ones
                                                I've dealt with in
                                                just the past few
                                                days; this is not
                                                nearly an exhaustive
                                                list - I've seen many
                                                others before I
                                                started recording
                                                them. Of the below,
                                                failures in
                                                ElasticsearchIOTest
                                                are by far the most
                                                common!

                                                We need to try to
                                                make these tests not
                                                flaky. Barring that,
                                                I think the extremely
                                                flaky tests need to
                                                be excluded from our
                                                presubmit until they
                                                can be fixed (see the
                                                sketch after the list
                                                below). Rerunning the
                                                precommit over and
                                                over again until it
                                                is green is not a
                                                good testing strategy.

                                                 * org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: false]
<https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3901/testReport/junit/org.apache.beam.runners.flink/ReadSourcePortableTest/testExecution_streaming__false_/>

                                                 * org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety
<https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18485/testReport/junit/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>

                                                 * org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInFinishBundleStateful
<https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.transforms/ParDoLifecycleTest/testTeardownCalledAfterExceptionInFinishBundleStateful/>

                                                 * org.apache.beam.sdk.io.elasticsearch.ElasticsearchIOTest.testSplit
<https://ci-beam.apache.org/job/beam_PreCommit_Java_Phrase/3903/testReport/junit/org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest/testSplit/>

                                                 * org.apache.beam.sdk.io.gcp.datastore.RampupThrottlingFnTest.testRampupThrottler
<https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18501/testReport/junit/org.apache.beam.sdk.io.gcp.datastore/RampupThrottlingFnTest/testRampupThrottler/>
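
As a rough illustration of that exclusion idea, a sketch using JUnit 4 categories; the FlakyTest marker interface is hypothetical, not existing Beam code, and the precommit build task would be configured to exclude the category while a separate, non-blocking suite keeps running it:

```java
import org.junit.Test;
import org.junit.experimental.categories.Category;

/** Hypothetical marker for tests excluded from precommit until de-flaked. */
interface FlakyTest {}

public class ElasticsearchIOExampleTest {

  // The precommit test task would filter out @Category(FlakyTest.class)
  // tests, so known-flaky tests stop blocking unrelated PRs while still
  // being tracked and run elsewhere until they are fixed.
  @Category(FlakyTest.class)
  @Test
  public void testSplit() {
    // ... flaky test body ...
  }
}
```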
