As we get closer and closer to the beta release, it is important to have as
few flaky tests as possible.

bq. we can actually update the script to send a mail to dev@

A comment on the JIRA issue that caused the 100% failing test would be better.
The committer would notice the comment and take corresponding action.
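To make the distinction concrete, here is a minimal sketch (not the actual
dashboard script; `classify_tests` and its input shape are hypothetical) of how
a script could separate tests that fail in 100% of recent runs, which deserve a
JIRA comment or a mail to dev@, from merely flaky ones:

```python
from collections import defaultdict

def classify_tests(runs):
    """runs: list of dicts mapping test name -> True (passed) / False (failed),
    one dict per nightly run. Returns (always_failing, flaky) sets of names."""
    failures = defaultdict(int)
    seen = defaultdict(int)
    for run in runs:
        for test, passed in run.items():
            seen[test] += 1
            if not passed:
                failures[test] += 1
    # Failed in every run it appeared in: a legitimate failure, not a flake.
    always_failing = {t for t in seen if failures[t] == seen[t]}
    # Failed sometimes but not always: genuinely flaky.
    flaky = {t for t in seen if 0 < failures[t] < seen[t]}
    return always_failing, flaky
```

Tests in the first set could then be dropped from the flaky excludes and fixed
at the root cause; only the second set belongs on the flaky list.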

Cheers

On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma <a...@cloudera.com> wrote:

> >   Is Nightly now using a list of flakes?
> The dashboard job was itself flaky yesterday, so I didn't start using it.
> Looks like it's working fine now. Let me exclude flakies from the nightly
> job.
>
> > Just took a look at the dashboard. Does this capture only failed runs or
> all
> runs?
> Sorry, the question isn't clear to me. Runs of what?
> Here's an attempt to answer it as best I understand: the dashboard looks at
> the last X (X = 6 now) runs of the nightly branch-2 job to collect failing,
> hanging, and timed-out tests.
>
> > I see that the following tests have failed 100% of the time for the last
> 30
> > runs [1]. If this captures all runs, this isn't truly flaky, but rather a
> > legitimate failure, right?
> > Maybe this tool is used to see all test failures, but if not, I feel like
> > we could/should remove a test from the flaky tests/excludes if it fails
> > consistently so we can fix the root cause
>
> This has come up many times before. Yes, you're right: 100% failure =
> legitimate failure.
> <rant>
> We as a community suck at tracking nightly runs for failing tests and
> fixing them; otherwise we wouldn't have ~40 bad tests, right!
> In fact, we suck at fixing tests even when they're presented in a nice clean
> list (this dashboard). We just don't prioritize tests in our work.
> The general attitude is: tests are failing... meh... what's new, they have
> been failing for years. Instead of: oh, one test failed, find the cause and
> revert it!
> So the real thing to change here is the attitude of the community towards
> tests. I am +1 for anything that'll promote/support that change.
> </rant>
> I think we can actually update the script to send a mail to dev@ when it
> encounters these 100% failing tests. Wanna try? :)
>
> -- Appy
>
>
>
>
> On Fri, Jan 12, 2018 at 11:29 AM, Zach York <zyork.contribut...@gmail.com>
> wrote:
>
> > Just took a look at the dashboard. Does this capture only failed runs or
> > all runs?
> >
> > I see that the following tests have failed 100% of the time for the last
> 30
> > runs [1]. If this captures all runs, this isn't truly flaky, but rather a
> > legitimate failure, right?
> > Maybe this tool is used to see all test failures, but if not, I feel like
> > we could/should remove a test from the flaky tests/excludes if it fails
> > consistently so we can fix the root cause.
> >
> > [1]
> > master.balancer.TestRegionsOnMasterOptions
> > client.TestMultiParallel
> > regionserver.TestRegionServerReadRequestMetrics
> >
> > Thanks,
> > Zach
> >
> > On Fri, Jan 12, 2018 at 8:19 AM, Stack <st...@duboce.net> wrote:
> >
> > > Dashboard doesn't capture timed out tests, right Appy?
> > > Thanks,
> > > S
> > >
> > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma <a...@cloudera.com>
> > > wrote:
> > >
> > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > > > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > >
> > > > @stack: when you branch out branch-2.0, let me know, i'll update the
> > jobs
> > > > to point to that branch so that it's helpful in release. Once release
> > is
> > > > done, i'll move them back to "branch-2".
> > > >
> > > >
> > > > -- Appy
> > > >
> > >
> >
>
>
>
> --
>
> -- Appy
>
