Why can a test that fails 100% of the time not be detected by a pre-commit check?

Ted Yu <yuzhih...@gmail.com> wrote on Sat, Jan 13, 2018 at 07:44:
> As we get closer and closer to the beta release, it is important to have
> as few flaky tests as possible.
>
> bq. we can actually update the script to send a mail to dev@
>
> A post to the JIRA which caused the 100% failing test would be better.
> The committer would notice the post and take the corresponding action.
>
> Cheers
>
> On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma <a...@cloudera.com>
> wrote:
>
> > > Is Nightly now using a list of flakes?
> > Dashboard job was flaky yesterday, so didn't start using it. Looks like
> > it's working fine now. Let me exclude flakies from the nightly job.
> >
> > > Just took a look at the dashboard. Does this capture only failed runs
> > > or all runs?
> > Sorry, the question isn't clear. Runs of what?
> > Here's an attempt to answer it in the best way I can understand it - it
> > looks at the last X (X=6 now) runs of nightly branch-2 to collect
> > failing, hanging, and timed-out tests.
> >
> > > I see that the following tests have failed 100% of the time for the
> > > last 30 runs [1]. If this captures all runs, this isn't truly flaky,
> > > but rather a legitimate failure, right?
> > > Maybe this tool is used to see all test failures, but if not, I feel
> > > like we could/should remove a test from the flaky tests/excludes if
> > > it fails consistently so we can fix the root cause
> >
> > This has come up a lot of times before. Yes, you're right: 100% failure
> > = legitimate failure.
> > <rant>
> > We as a community suck at tracking nightly runs for failing tests and
> > fixing them; otherwise we wouldn't have ~40 bad tests, right!
> > In fact, we suck at fixing tests even when they're presented in a nice,
> > clean list (this dashboard). We just don't prioritize tests in our
> > work.
> > The general attitude is: tests are failing... meh... what's new, they
> > have been failing for years. Instead of: oh, one test failed, find the
> > cause and revert it!
> > So the real thing to change here is the attitude of the community
> > towards tests.
> > I am +1 for anything that'll promote/support that change.
> > </rant>
> > I think we can actually update the script to send a mail to dev@ when
> > it encounters these 100% failing tests. Wanna try? :)
> >
> > -- Appy
> >
> > On Fri, Jan 12, 2018 at 11:29 AM, Zach York
> > <zyork.contribut...@gmail.com> wrote:
> >
> > > Just took a look at the dashboard. Does this capture only failed runs
> > > or all runs?
> > >
> > > I see that the following tests have failed 100% of the time for the
> > > last 30 runs [1]. If this captures all runs, this isn't truly flaky,
> > > but rather a legitimate failure, right?
> > > Maybe this tool is used to see all test failures, but if not, I feel
> > > like we could/should remove a test from the flaky tests/excludes if
> > > it fails consistently so we can fix the root cause.
> > >
> > > [1]
> > > master.balancer.TestRegionsOnMasterOptions
> > > client.TestMultiParallel
> > > regionserver.TestRegionServerReadRequestMetrics
> > >
> > > Thanks,
> > > Zach
> > >
> > > On Fri, Jan 12, 2018 at 8:19 AM, Stack <st...@duboce.net> wrote:
> > >
> > > > Dashboard doesn't capture timed-out tests, right Appy?
> > > > Thanks,
> > > > S
> > > >
> > > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma
> > > > <a...@cloudera.com> wrote:
> > > >
> > > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > > >
> > > > > @stack: when you branch out branch-2.0, let me know, i'll update
> > > > > the jobs to point to that branch so that it's helpful in the
> > > > > release. Once the release is done, i'll move them back to
> > > > > "branch-2".
> > > > >
> > > > > -- Appy
> > > >
> > > > --
> > > > -- Appy
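For reference, the bookkeeping the thread describes (look at the last X nightly runs, compute per-test failure rates, and split intermittent flakies from 100% failures that deserve a dev@ mail or JIRA comment) could be sketched roughly as below. This is a minimal, hypothetical sketch, not the actual Find-Flaky-Tests dashboard script; the run data, function names, and the pass/fail dict format are invented for illustration:

```python
from collections import defaultdict

def summarize(runs):
    """Given a list of runs, each a dict of test name -> True (passed) or
    False (failed), return per-test failure rates over the runs in which
    each test appeared."""
    failures = defaultdict(int)
    appearances = defaultdict(int)
    for run in runs:
        for test, passed in run.items():
            appearances[test] += 1
            if not passed:
                failures[test] += 1
    return {t: failures[t] / appearances[t] for t in appearances}

def classify(rates):
    """Split tests into flaky (intermittent failures) and consistently
    failing (failed in every observed run, i.e. rate == 1.0)."""
    flaky = sorted(t for t, r in rates.items() if 0 < r < 1)
    always_failing = sorted(t for t, r in rates.items() if r == 1)
    return flaky, always_failing

# Hypothetical results from three nightly runs:
runs = [
    {"TestMultiParallel": False, "TestRegionSplit": True,  "TestScanner": False},
    {"TestMultiParallel": False, "TestRegionSplit": False, "TestScanner": True},
    {"TestMultiParallel": False, "TestRegionSplit": True,  "TestScanner": True},
]
flaky, always_failing = classify(summarize(runs))
print(flaky)           # intermittent failures: candidates for the flaky excludes list
print(always_failing)  # 100% failures: candidates for a dev@ mail or JIRA post
```

The point of the split mirrors the discussion above: only the intermittent group is "flaky" in the usual sense, while the 100% group indicates a legitimate breakage that should be escalated rather than excluded.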