On a semi-related note, I noticed recently that the negative tests seem to OOM during setup from time to time. Can we give the tests a bit more memory, and/or enable a heap dump on OOM, saved to the test logs directory, so we can investigate?
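For the dump-on-OOM idea, the standard HotSpot flags could be passed to the test JVM through Surefire's `argLine`. A minimal sketch only; the test name, heap size, and dump path below are illustrative placeholders, not the project's actual settings:

```shell
# Hypothetical invocation: bumps the test JVM heap and asks HotSpot to write
# an .hprof dump to the given directory whenever an OutOfMemoryError is thrown.
# TestNegativeCliDriver, 2g, and the dump path are illustrative values only.
mvn test -Dtest=TestNegativeCliDriver \
  -DargLine="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=target/tmp/log"
```

The resulting `.hprof` file could then be archived alongside the other test logs and opened in a heap analyzer such as Eclipse MAT.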
On 18/3/5, 11:07, "Vineet Garg" <vg...@hortonworks.com> wrote:

> +1 for a nightly build. We could generate reports to identify both frequent
> and sporadic test failures, plus other interesting bits like average build
> time, Yetus failures, etc. It'll also help narrow the range of culprit
> commits down to one day.
> If you decide to go ahead with this, I would like to help.
>
> Vineet
>
>> On Mar 5, 2018, at 8:50 AM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
>>
>> Wow, that HBase UI looks super useful. +1 to having something like that.
>>
>> If not, +1 to having a proper nightly build; it would help devs identify
>> which commits break which tests. I find that git-bisect can take a long
>> time to run and can be difficult to use (e.g. finding a known good commit
>> isn't always easy).
>>
>> On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary <pv...@cloudera.com> wrote:
>>
>>> Without a nightly build, and with this many flaky tests, it is very hard
>>> to identify the breaking commits. We can use something like bisect and
>>> multiple test runs.
>>>
>>> There is a more elegant way to do this with nightly test runs:
>>> https://issues.apache.org/jira/browse/HBASE-15917
>>> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html
>>>
>>> This also helps identify the flaky tests, and creates a continuously
>>> updated list of them.
>>>
>>>> On Feb 23, 2018, at 6:55 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
>>>>
>>>> +1
>>>>
>>>> Does anyone have suggestions on how to efficiently identify which commit
>>>> is breaking a test? Is it just git-bisect, or is there an easier way?
>>>> Hive QA isn't always that helpful; it will say a test has been failing
>>>> for the past "x" builds, but that doesn't help much since Hive QA isn't
>>>> a nightly build.
>>>>
>>>> On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <vih...@cloudera.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>> Commenting on the JIRA and giving a 24-hour heads-up (excluding
>>>>> weekends) would be good.
>>>>>
>>>>> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates <alanfga...@gmail.com> wrote:
>>>>>
>>>>>> +1.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair <thejas.n...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1
>>>>>>> I agree, this makes sense. The number of failures keeps increasing.
>>>>>>> A 24-hour heads-up in either case before a revert would be good.
>>>>>>>
>>>>>>> On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary <pv...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> I agree with Zoltan. The continuously breaking tests make it very
>>>>>>>> hard to spot real issues.
>>>>>>>> Any thoughts on doing it automatically?
>>>>>>>>
>>>>>>>>> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich <k...@rxd.hu> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> In the last couple of weeks the number of broken tests has started
>>>>>>>>> to go up... and even though I run bisect etc. from time to time,
>>>>>>>>> sometimes people don't react to my comments/tickets.
>>>>>>>>>
>>>>>>>>> Because keeping this many failing tests around makes it easier for
>>>>>>>>> a new one to slip in, I think reverting the patch that introduced
>>>>>>>>> the test failures would also help in some cases.
>>>>>>>>>
>>>>>>>>> To prevent further test breaks, I think it would help a lot to
>>>>>>>>> revert the patch if either of the following conditions is met:
>>>>>>>>>
>>>>>>>>> C1) The notification/comment pointing out that the patch did break
>>>>>>>>> a test has gone unanswered for at least 24 hours.
>>>>>>>>>
>>>>>>>>> C2) The patch has been in for 7 days but the test failure is still
>>>>>>>>> not addressed (note that in this case there might be an ongoing
>>>>>>>>> conversation about fixing it, but enabling other people to work in
>>>>>>>>> a cleaner environment is more important than a single patch; and if
>>>>>>>>> it can't be fixed in 7 days, well, it might not get fixed in a
>>>>>>>>> month).
>>>>>>>>>
>>>>>>>>> I would also like to note that I've seen a few tickets picked up by
>>>>>>>>> people who were not involved in creating the original change; and
>>>>>>>>> although the intention was good, they might miss the context of the
>>>>>>>>> original patch and "fix" the tests in the wrong way: accept a q.out
>>>>>>>>> which is inappropriate, or ignore the test...
>>>>>>>>>
>>>>>>>>> Would it be OK to implement this from now on? Because it makes my
>>>>>>>>> efforts practically useless if people are not reacting.
>>>>>>>>>
>>>>>>>>> Note: just to be on the same page, this is only about a single test
>>>>>>>>> that fails on its own; I feel that flaky tests are an entirely
>>>>>>>>> different topic.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Zoltan
>>>>
>>>> --
>>>> Sahil Takiar
>>>> Software Engineer
>>>> takiar.sa...@gmail.com | (510) 673-0309