As we get closer to the beta release, it is important to have as few flaky tests as possible.
bq. we can actually update the script to send a mail to dev@

A post to the JIRA which caused the 100% failing test would be better. The
committer would notice the post and take corresponding action.

Cheers

On Fri, Jan 12, 2018 at 3:35 PM, Apekshit Sharma <a...@cloudera.com> wrote:

> > Is Nightly now using a list of flakes?
> The dashboard job was flaky yesterday, so I didn't start using it. It
> looks like it's working fine now. Let me exclude the flakies from the
> nightly job.
>
> > Just took a look at the dashboard. Does this capture only failed runs or
> > all runs?
> Sorry, the question isn't clear. Runs of what? Here's an attempt to answer
> it as best I understand it: it looks at the last X (X=6 now) runs of the
> nightly branch-2 job to collect failing, hanging, and timed-out tests.
>
> > I see that the following tests have failed 100% of the time for the last
> > 30 runs [1]. If this captures all runs, this isn't truly flaky, but
> > rather a legitimate failure, right?
> > Maybe this tool is used to see all test failures, but if not, I feel
> > like we could/should remove a test from the flaky tests/excludes if it
> > fails consistently so we can fix the root cause.
> This has come up a lot of times before. Yes, you're right: 100% failure =
> legitimate failure.
> <rant>
> We as a community suck at tracking nightly runs for failing tests and
> fixing them; otherwise we wouldn't have ~40 bad tests, right!
> In fact, we suck at fixing tests even when they're presented in a nice,
> clean list (this dashboard). We just don't prioritize tests in our work.
> The general attitude is: tests are failing... meh... what's new, they have
> been failing for years. Instead of: oh, one test failed, find the cause
> and revert it!
> So the real thing to change here is the community's attitude towards
> tests. I am +1 for anything that'll promote/support that change.
> </rant>
> I think we can actually update the script to send a mail to dev@ when it
> encounters these 100% failing tests. Wanna try? :)
>
> -- Appy
>
> On Fri, Jan 12, 2018 at 11:29 AM, Zach York <zyork.contribut...@gmail.com>
> wrote:
>
> > Just took a look at the dashboard. Does this capture only failed runs or
> > all runs?
> >
> > I see that the following tests have failed 100% of the time for the last
> > 30 runs [1]. If this captures all runs, this isn't truly flaky, but
> > rather a legitimate failure, right?
> > Maybe this tool is used to see all test failures, but if not, I feel
> > like we could/should remove a test from the flaky tests/excludes if it
> > fails consistently so we can fix the root cause.
> >
> > [1]
> > master.balancer.TestRegionsOnMasterOptions
> > client.TestMultiParallel
> > regionserver.TestRegionServerReadRequestMetrics
> >
> > Thanks,
> > Zach
> >
> > On Fri, Jan 12, 2018 at 8:19 AM, Stack <st...@duboce.net> wrote:
> >
> > > The dashboard doesn't capture timed-out tests, right Appy?
> > > Thanks,
> > > S
> > >
> > > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma <a...@cloudera.com>
> > > wrote:
> > >
> > > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > >
> > > > @stack: when you branch out branch-2.0, let me know and I'll update
> > > > the jobs to point to that branch so that it's helpful in the
> > > > release. Once the release is done, I'll move them back to
> > > > "branch-2".
> > > >
> > > > -- Appy
>
> --
> -- Appy
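The distinction the thread turns on, a test that fails in some of the last X runs is flaky, while one that fails in all of them is a legitimate failure worth a dev@ mail or JIRA post, can be sketched as below. This is a hypothetical illustration only, not the actual Find-Flaky-Tests script; the function name, result format, and test names are invented for the example.

```python
# Hypothetical sketch (NOT the real HBase Find-Flaky-Tests script):
# given per-run test results from the last X nightly runs, split tests
# into "flaky" (failed in some runs) and "always failing" (failed in
# every run they appeared in), which the thread argues are legitimate
# failures deserving a dev@ mail or a post to the offending JIRA.
from collections import defaultdict


def classify_tests(runs):
    """runs: list of dicts mapping test name -> "PASS"/"FAIL"/"TIMEOUT".

    Returns (flaky, always_failing) as sets of test names. A "TIMEOUT"
    counts as a failure, mirroring how the dashboard collects failing,
    hanging, and timed-out tests.
    """
    seen = defaultdict(int)    # runs in which the test appeared
    failed = defaultdict(int)  # runs in which the test did not pass
    for run in runs:
        for test, status in run.items():
            seen[test] += 1
            if status != "PASS":
                failed[test] += 1
    always_failing = {t for t in seen if failed[t] == seen[t]}
    flaky = {t for t in seen if 0 < failed[t] < seen[t]}
    return flaky, always_failing


if __name__ == "__main__":
    # Invented results for three nightly runs.
    runs = [
        {"TestMultiParallel": "FAIL", "TestRegionSplit": "PASS"},
        {"TestMultiParallel": "FAIL", "TestRegionSplit": "FAIL"},
        {"TestMultiParallel": "TIMEOUT", "TestRegionSplit": "PASS"},
    ]
    flaky, always = classify_tests(runs)
    print(sorted(flaky))   # ['TestRegionSplit']
    print(sorted(always))  # ['TestMultiParallel']
```

A real version of this would scrape the Jenkins job's build results rather than take an in-memory list, and would only act on the "always failing" set (e.g. mail dev@), leaving the flaky set for the excludes list.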