On a semi-related note, I noticed recently that the negative tests seem to OOM during setup from time to time. Can we give the tests a bit more memory, and/or enable a heap dump on OOM, saved to the test logs directory, so we can investigate?
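For the dump-on-OOM idea, the standard HotSpot flags could be passed to the test JVM through Surefire's `argLine`. A minimal sketch only; the test name, heap size, and dump path below are illustrative placeholders, not the project's actual settings:

```shell
# Hypothetical invocation: bumps the test JVM heap and asks HotSpot to write
# an .hprof dump to the given directory whenever an OutOfMemoryError is thrown.
# TestNegativeCliDriver, 2g, and the dump path are illustrative values only.
mvn test -Dtest=TestNegativeCliDriver \
  -DargLine="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=target/tmp/log"
```

The resulting `.hprof` file could then be archived alongside the other test logs and opened in a heap analyzer such as Eclipse MAT.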
On 18/3/5, 11:07, "Vineet Garg" <vg...@hortonworks.com> wrote:

> +1 for a nightly build. We could generate reports to identify both frequent
> and sporadic test failures, plus other interesting bits like average build
> time, Yetus failures, etc. It'll also help narrow the range of culprit
> commits down to one day.
> If you decide to go ahead with this, I would like to help.
>
> Vineet
>
>> On Mar 5, 2018, at 8:50 AM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
>>
>> Wow, that HBase UI looks super useful. +1 to having something like that.
>>
>> If not, +1 to having a proper nightly build; it would help devs identify
>> which commits break which tests. I find that git-bisect can take a long
>> time to run and can be difficult to use (e.g. finding a known good commit
>> isn't always easy).
>>
>> On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary <pv...@cloudera.com> wrote:
>>
>>> Without a nightly build, and with this many flaky tests, it is very hard
>>> to identify the breaking commits. We can use something like bisect and
>>> multiple test runs.
>>>
>>> There is a more elegant way to do this with nightly test runs:
>>> https://issues.apache.org/jira/browse/HBASE-15917
>>> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html
>>>
>>> This also helps identify the flaky tests, and creates a continuously
>>> updated list of them.
>>>
>>>> On Feb 23, 2018, at 6:55 PM, Sahil Takiar <takiar.sa...@gmail.com> wrote:
>>>>
>>>> +1
>>>>
>>>> Does anyone have suggestions on how to efficiently identify which commit
>>>> is breaking a test? Is it just git-bisect, or is there an easier way?
>>>> Hive QA isn't always that helpful; it will say a test has been failing
>>>> for the past "x" builds, but that doesn't help much since Hive QA isn't
>>>> a nightly build.
>>>>
>>>> On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <vih...@cloudera.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>> Commenting on the JIRA and giving a 24-hour heads-up (excluding
>>>>> weekends) would be good.
>>>>>
>>>>> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates <alanfga...@gmail.com> wrote:
>>>>>
>>>>>> +1.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair <thejas.n...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1
>>>>>>> I agree, this makes sense. The number of failures keeps increasing.
>>>>>>> A 24-hour heads-up in either case before a revert would be good.
>>>>>>>
>>>>>>> On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary <pv...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> I agree with Zoltan. The continuously breaking tests make it very
>>>>>>>> hard to spot real issues.
>>>>>>>> Any thoughts on doing it automatically?
>>>>>>>>
>>>>>>>>> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich <k...@rxd.hu> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> In the last couple of weeks the number of broken tests has started
>>>>>>>>> to go up... and even though I run bisect etc. from time to time,
>>>>>>>>> sometimes people don't react to my comments/tickets.
>>>>>>>>>
>>>>>>>>> Because keeping this many failing tests around makes it easier for
>>>>>>>>> a new one to slip in, I think reverting the patch that introduced
>>>>>>>>> the test failures would also help in some cases.
>>>>>>>>>
>>>>>>>>> To prevent further test breaks, I think it would help a lot to
>>>>>>>>> revert the patch if either of the following conditions is met:
>>>>>>>>>
>>>>>>>>> C1) The notification/comment pointing out that the patch did break
>>>>>>>>> a test has gone unanswered for at least 24 hours.
>>>>>>>>>
>>>>>>>>> C2) The patch has been in for 7 days but the test failure is still
>>>>>>>>> not addressed (note that in this case there might be an ongoing
>>>>>>>>> conversation about fixing it, but enabling other people to work in
>>>>>>>>> a cleaner environment is more important than a single patch; and if
>>>>>>>>> it can't be fixed in 7 days, well, it might not get fixed in a
>>>>>>>>> month).
>>>>>>>>>
>>>>>>>>> I would also like to note that I've seen a few tickets picked up by
>>>>>>>>> people who were not involved in creating the original change; and
>>>>>>>>> although the intention was good, they might miss the context of the
>>>>>>>>> original patch and "fix" the tests in the wrong way: accept a q.out
>>>>>>>>> which is inappropriate, or ignore the test...
>>>>>>>>>
>>>>>>>>> Would it be OK to implement this from now on? Because it makes my
>>>>>>>>> efforts practically useless if people are not reacting.
>>>>>>>>>
>>>>>>>>> Note: just to be on the same page, this is only about a single test
>>>>>>>>> that fails on its own; I feel that flaky tests are an entirely
>>>>>>>>> different topic.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Zoltan
>>>>
>>>> --
>>>> Sahil Takiar
>>>> Software Engineer
>>>> takiar.sa...@gmail.com | (510) 673-0309