The TestNegativeCliDriver tests are still failing with java.lang.OutOfMemoryError: GC overhead limit exceeded. Can we increase the amount of memory for tests?
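The memory bump asked for here (and the dump-on-OOM Sergey suggests below) usually maps onto standard HotSpot flags. A hedged sketch, assuming a Maven Surefire build: the `-XX` flags are standard JVM options, but a pom that sets its own `<argLine>` will override the property used here, and the exact wiring in this project's build may differ.

```shell
# Sketch only: raise the test JVM heap and write a heap dump on OOM.
# -DargLine is Surefire's standard hook for extra JVM args; the test name
# and dump path below are illustrative.
mvn test -Dtest=TestNegativeCliDriver \
  -DargLine="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=target/tmp"
```

The resulting `.hprof` file can then be archived with the test logs and opened in any heap analyzer to see what filled the heap during setup.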
Vineet

On Mar 5, 2018, at 11:35 AM, Sergey Shelukhin <ser...@hortonworks.com> wrote:

On a semi-related note, I noticed recently that negative tests seem to OOM in setup from time to time. Can we increase the amount of memory for the tests a little bit, and/or maybe add the dump-on-OOM flag to them, saved to the test logs directory, so we could investigate?

On 18/3/5, 11:07, "Vineet Garg" <vg...@hortonworks.com> wrote:

+1 for a nightly build. We could generate reports to identify both frequent and sporadic test failures, plus other interesting bits like average build time, Yetus failures, etc. It would also help narrow the range of culprit commits down to one day. If you decide to go ahead with this, I would like to help.

Vineet

On Mar 5, 2018, at 8:50 AM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

Wow, that HBase UI looks super useful. +1 to having something like that. If not, +1 to having a proper nightly build; it would help devs identify which commits break which tests. I find that git-bisect can take a long time to run and can be difficult to use (e.g. finding a known-good commit isn't always easy).

On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary <pv...@cloudera.com> wrote:

Without a nightly build, and with this many flaky tests, it is very hard to identify the breaking commits. We can use something like bisect and multiple test runs, but there is a more elegant way to do this with nightly test runs:

https://issues.apache.org/jira/browse/HBASE-15917
https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html

This also helps to identify the flaky tests, and creates a continuously updated list of them.
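The reporting idea above (separating frequent from sporadic failures) needs nothing fancy once each nightly run archives a list of its failing tests. A toy sketch with fabricated file names and data, just to show the shape of the aggregation:

```shell
# Toy sketch: rank tests by how many nightly runs they failed in.
# Assumes each nightly run leaves one file listing failed tests, one per line;
# the runs/ layout and test names here are made up for illustration.
runs=$(mktemp -d)
printf 'TestA\nTestB\n' > "$runs/run1.txt"   # night 1: two failures
printf 'TestA\n'        > "$runs/run2.txt"   # night 2: one failure
printf 'TestA\nTestC\n' > "$runs/run3.txt"   # night 3: two failures
# Count occurrences per test across all runs, most frequent first.
cat "$runs"/*.txt | sort | uniq -c | sort -rn
```

A test failing in nearly every run (TestA here) is a candidate for a broken commit; one failing in a small fraction of runs looks flaky, which is roughly the split the HBase dashboard linked above automates.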
On Feb 23, 2018, at 6:55 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:

+1. Does anyone have suggestions on how to efficiently identify which commit is breaking a test? Is it just git-bisect, or is there an easier way? Hive QA isn't always that helpful: it will say a test has been failing for the past "x" builds, but that doesn't help much since Hive QA isn't a nightly build.

On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <vih...@cloudera.com> wrote:

+1. Commenting on the JIRA and giving a 24-hour heads-up (excluding weekends) would be good.

On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates <alanfga...@gmail.com> wrote:

+1.

Alan

On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair <thejas.n...@gmail.com> wrote:

+1, I agree, this makes sense. The number of failures keeps increasing. In either case, a 24-hour heads-up before the revert would be good.

On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary <pv...@cloudera.com> wrote:

I agree with Zoltan. The continuously breaking tests make it very hard to spot real issues. Any thoughts on doing it automatically?

On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich <k...@rxd.hu> wrote:

Hello,

In the last couple of weeks the number of broken tests has started to go up... and even though I run bisect etc. from time to time, sometimes people don't react to my comments/tickets. Keeping this many failing tests around makes it easier for a new one to slip in, so I think reverting the patch that introduced the failures would also help in some cases. To prevent further test breaks, I think it would help a lot to revert a patch if either of the following conditions is met:

C1) the notification/comment pointing out that the patch broke a test has gone unanswered for at least 24 hours.
C2) the patch has been in for 7 days but the test failure is still not addressed (note that in this case there might be a conversation about fixing it... but enabling other people to work in a cleaner environment is more important than a single patch, and if it can't be fixed in 7 days, well, it might not get fixed in a month).

I would also like to note that I've seen a few of these tickets picked up by people who were not involved in the original change; although the intention was good, they may miss the context of the original patch and "fix" the tests in the wrong way: accepting an inappropriate q.out, or ignoring the test...

Would it be OK to implement this from now on? It makes my efforts practically useless if people are not reacting...

Note: just so we are on the same page, this is only about a single test that fails on its own; I feel that flaky tests are an entirely different topic.

Cheers,
Zoltan

--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309
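Sahil's question earlier in the thread about efficiently finding the breaking commit can, once a known-good commit is identified, be automated with `git bisect run`. A self-contained toy demo (the throwaway repository and its "test" are fabricated purely for illustration; in practice the test command would be the actual failing test invocation):

```shell
# Toy demo of `git bisect run`: build a throwaway repo where "commit 4"
# introduces a failure, then let bisect find it without manual stepping.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
for i in 1 2 3 4 5 6; do
  # commits 4..6 carry the "broken" state our fake test checks for
  if [ "$i" -ge 4 ]; then echo "broken $i" > state.txt; else echo "ok $i" > state.txt; fi
  git add state.txt
  git commit -qm "commit $i"
done
# bad = HEAD (commit 6), good = HEAD~5 (commit 1)
git bisect start HEAD HEAD~5 > /dev/null
# the "test": exits non-zero on the broken state, so bisect marks that commit bad
git bisect run sh -c 'grep -q "^ok" state.txt' > /dev/null
# the bisect log ends with a line naming the first bad commit
culprit=$(git bisect log | sed -n 's/^# first bad commit: \[[0-9a-f]*\] //p')
echo "first bad commit: $culprit"
git bisect reset -q > /dev/null 2>&1 || true
```

On a linear history like this one, bisect lands on "commit 4" in a handful of steps; the manual pain Sahil describes is mostly in picking the initial good commit, which a nightly build would pin down to a one-day window.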