Why can a test that fails 100% of the time not be detected by a pre-commit check?

Ted Yu <yuzhih...@gmail.com> wrote on Sat, Jan 13, 2018 at 07:44:
> As we get closer and closer to the beta release, it is important to have
> as few flaky tests as possible.
>
> bq. we can actually update the script to send a mail to dev@
>
> A post to the JIRA which caused the 100% failing test would be better.
> The committer would notice the post and take the corresponding action.
>
> Cheers
>
> On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma <a...@cloudera.com>
> wrote:
>
> > > Is Nightly now using a list of flakes?
> > Dashboard job was flaky yesterday, so didn't start using it. Looks like
> > it's working fine now. Let me exclude flakies from the nightly job.
> >
> > > Just took a look at the dashboard. Does this capture only failed runs
> > > or all runs?
> > Sorry, the question isn't clear. Runs of what?
> > Here's an attempt to answer it in the best way I can understand it - it
> > looks at the last X (X=6 now) runs of nightly branch-2 to collect
> > failing, hanging, and timed-out tests.
> >
> > > I see that the following tests have failed 100% of the time for the
> > > last 30 runs [1]. If this captures all runs, this isn't truly flaky,
> > > but rather a legitimate failure, right?
> > > Maybe this tool is used to see all test failures, but if not, I feel
> > > like we could/should remove a test from the flaky tests/excludes if
> > > it fails consistently so we can fix the root cause
> >
> > This has come up a lot of times before. Yes, you're right: 100% failure
> > = legitimate failure.
> > <rant>
> > We as a community suck at tracking nightly runs for failing tests and
> > fixing them; otherwise we wouldn't have ~40 bad tests, right!
> > In fact, we suck at fixing tests even when they're presented in a nice,
> > clean list (this dashboard). We just don't prioritize tests in our
> > work.
> > The general attitude is: tests are failing... meh... what's new, they
> > have been failing for years. Instead of: oh, one test failed, find the
> > cause and revert it!
> > So the real thing to change here is the attitude of the community
> > towards tests.
> > I am +1 for anything that'll promote/support that change.
> > </rant>
> > I think we can actually update the script to send a mail to dev@ when
> > it encounters these 100% failing tests. Wanna try? :)
> >
> > -- Appy
> >
> > On Fri, Jan 12, 2018 at 11:29 AM, Zach York
> > <zyork.contribut...@gmail.com> wrote:
> >
> > > Just took a look at the dashboard. Does this capture only failed runs
> > > or all runs?
> > >
> > > I see that the following tests have failed 100% of the time for the
> > > last 30 runs [1]. If this captures all runs, this isn't truly flaky,
> > > but rather a legitimate failure, right?
> > > Maybe this tool is used to see all test failures, but if not, I feel
> > > like we could/should remove a test from the flaky tests/excludes if
> > > it fails consistently so we can fix the root cause.
> > >
> > > [1]
> > > master.balancer.TestRegionsOnMasterOptions
> > > client.TestMultiParallel
> > > regionserver.TestRegionServerReadRequestMetrics
> > >
> > > Thanks,
> > > Zach
> > >
> > > On Fri, Jan 12, 2018 at 8:19 AM, Stack <st...@duboce.net> wrote:
> > >
> > > > Dashboard doesn't capture timed-out tests, right Appy?
> > > > Thanks,
> > > > S
> > > >
> > > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma
> > > > <a...@cloudera.com> wrote:
> > > >
> > > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > > >
> > > > > @stack: when you branch out branch-2.0, let me know, i'll update
> > > > > the jobs to point to that branch so that it's helpful in the
> > > > > release. Once the release is done, i'll move them back to
> > > > > "branch-2".
> > > > >
> > > > > -- Appy
> > > >
> > > > --
> > > > -- Appy
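For reference, the bookkeeping the thread describes (look at the last X nightly runs, compute per-test failure rates, and split intermittent flakies from 100% failures that deserve a dev@ mail or JIRA comment) could be sketched roughly as below. This is a minimal, hypothetical sketch, not the actual Find-Flaky-Tests dashboard script; the run data, function names, and the pass/fail dict format are invented for illustration:

```python
from collections import defaultdict

def summarize(runs):
    """Given a list of runs, each a dict of test name -> True (passed) or
    False (failed), return per-test failure rates over the runs in which
    each test appeared."""
    failures = defaultdict(int)
    appearances = defaultdict(int)
    for run in runs:
        for test, passed in run.items():
            appearances[test] += 1
            if not passed:
                failures[test] += 1
    return {t: failures[t] / appearances[t] for t in appearances}

def classify(rates):
    """Split tests into flaky (intermittent failures) and consistently
    failing (failed in every observed run, i.e. rate == 1.0)."""
    flaky = sorted(t for t, r in rates.items() if 0 < r < 1)
    always_failing = sorted(t for t, r in rates.items() if r == 1)
    return flaky, always_failing

# Hypothetical results from three nightly runs:
runs = [
    {"TestMultiParallel": False, "TestRegionSplit": True,  "TestScanner": False},
    {"TestMultiParallel": False, "TestRegionSplit": False, "TestScanner": True},
    {"TestMultiParallel": False, "TestRegionSplit": True,  "TestScanner": True},
]
flaky, always_failing = classify(summarize(runs))
print(flaky)           # intermittent failures: candidates for the flaky excludes list
print(always_failing)  # 100% failures: candidates for a dev@ mail or JIRA post
```

The point of the split mirrors the discussion above: only the intermittent group is "flaky" in the usual sense, while the 100% group indicates a legitimate breakage that should be escalated rather than excluded.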