Re: reverting test-breaking changes

Vineet Garg Mon, 05 Mar 2018 11:15:07 -0800

+1 for a nightly build. We could generate reports to identify both frequent and 
sporadic test failures, plus other interesting bits like average build time, 
Yetus failures, etc. It’ll also help narrow the range of culprit commit(s) down 
to one day.
If you guys decide to go ahead with this, I would like to help.
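
As a rough illustration of the kind of report a nightly build could feed, here is a minimal Python sketch. It assumes each nightly run is available as a mapping from test name to pass/fail; the function names, the 0.7 threshold, and the test names are hypothetical and are not part of Hive QA or Yetus.

```python
from collections import defaultdict


def classify_failures(nightly_runs, frequent_threshold=0.7):
    """Split failing tests into 'frequent' and 'sporadic' buckets.

    nightly_runs: list of dicts, oldest first, each mapping
    test name -> True (passed) or False (failed).
    frequent_threshold is an assumed cut-off on the failure rate.
    """
    failures = defaultdict(int)
    appearances = defaultdict(int)
    for run in nightly_runs:
        for test, passed in run.items():
            appearances[test] += 1
            if not passed:
                failures[test] += 1

    report = {"frequent": [], "sporadic": []}
    for test in sorted(failures):
        rate = failures[test] / appearances[test]
        bucket = "frequent" if rate >= frequent_threshold else "sporadic"
        report[bucket].append((test, rate))
    return report


def first_failing_night(nightly_runs, test):
    """Return the index of the night from which `test` has failed in every
    run up to the latest one, or None if it passed in the latest run."""
    first = None
    for i in reversed(range(len(nightly_runs))):
        if nightly_runs[i].get(test) is False:
            first = i
        else:
            break
    return first


# Hypothetical results for four nights, oldest first.
runs = [
    {"TestA": True, "TestB": True, "TestC": True},
    {"TestA": False, "TestB": False, "TestC": True},
    {"TestA": False, "TestB": True, "TestC": True},
    {"TestA": False, "TestB": True, "TestC": True},
]
report = classify_failures(runs)
culprit_night = first_failing_night(runs, "TestA")
```

With one run per night, the index returned by `first_failing_night` bounds the culprit commit(s) to the commits that landed between that night and the previous one.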


Vineet

> On Mar 5, 2018, at 8:50 AM, Sahil Takiar <[email protected]> wrote:
> 
> Wow that HBase UI looks super useful. +1 to having something like that.
> 
> If not, +1 to having a proper nightly build, it would help devs identify
> which commits break which tests. I find using git-bisect can take a long
> time to run, and can be difficult to use (e.g. finding a known good commit
> isn't always easy).
> 
> On Mon, Mar 5, 2018 at 9:03 AM, Peter Vary <[email protected]> wrote:
> 
>> Without a nightly build and with this many flaky tests it is very hard to
>> identify the breaking commits. We can use something like bisect and multiple
>> test runs.
>> 
>> There is a more elegant way to do this with nightly test runs:
>> https://issues.apache.org/jira/browse/HBASE-15917
>> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html
>> 
>> This also helps to identify the flaky tests, and creates a continuously
>> updated list of them.
>> 
>>> On Feb 23, 2018, at 6:55 PM, Sahil Takiar <[email protected]> wrote:
>>> 
>>> +1
>>> 
>>> Does anyone have suggestions about how to efficiently identify which commit
>>> is breaking a test? Is it just git-bisect or is there an easier way? Hive
>>> QA isn't always that helpful, it will say a test is failing for the past
>>> "x" builds, but that doesn't help much since Hive QA isn't a nightly build.
>>> 
>>> On Thu, Feb 22, 2018 at 10:31 AM, Vihang Karajgaonkar <[email protected]> wrote:
>>> 
>>>> +1
>>>> Commenting on JIRA and giving a 24hr heads-up (excluding weekends) would be
>>>> good.
>>>> 
>>>> On Thu, Feb 22, 2018 at 10:19 AM, Alan Gates <[email protected]> wrote:
>>>> 
>>>>> +1.
>>>>> 
>>>>> Alan.
>>>>> 
>>>>> On Thu, Feb 22, 2018 at 8:25 AM, Thejas Nair <[email protected]> wrote:
>>>>> 
>>>>>> +1
>>>>>> I agree, this makes sense. The number of failures keeps increasing.
>>>>>> A 24 hour heads up in either case before revert would be good.
>>>>>> 
>>>>>> 
>>>>>> On Thu, Feb 22, 2018 at 2:45 AM, Peter Vary <[email protected]> wrote:
>>>>>> 
>>>>>>> I agree with Zoltan. The continuously breaking tests make it very hard to
>>>>>>> spot real issues.
>>>>>>> Any thoughts on doing it automatically?
>>>>>>> 
>>>>>>>> On Feb 22, 2018, at 10:47 AM, Zoltan Haindrich <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> In the last couple of weeks the number of broken tests has started to go
>>>>>>>> up, and even though I run bisect etc. from time to time, sometimes people
>>>>>>>> don’t react to my comments/tickets/etc.
>>>>>>>> 
>>>>>>>> Because keeping this many failing tests makes it easier for a new one to
>>>>>>>> slip in, I think reverting the patch that introduced the test failures would
>>>>>>>> also help in some cases.
>>>>>>>> 
>>>>>>>> I think it would help a lot in preventing further test breaks to revert the
>>>>>>>> patch if any of the following conditions is met:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> C1) if the notification/comment about the fact that the patch has indeed
>>>>>>>> broken a test has gone unanswered for at least 24 hours.
>>>>>>>> 
>>>>>>>> C2) if the patch has been in for 7 days but the test failure is still not
>>>>>>>> addressed (note that in this case there might be a conversation about
>>>>>>>> fixing it, but enabling other people to work in a cleaner
>>>>>>>> environment is more important than a single patch; and if it can't be
>>>>>>>> fixed in 7 days, well, it might not get fixed in a month).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I would also like to note that I've seen a few tickets which have been
>>>>>>>> picked up by people who were not involved in creating the original change;
>>>>>>>> although the intention was good, they might miss the context of the
>>>>>>>> original patch and may "fix" the tests in the wrong way: accept a q.out
>>>>>>>> which is inappropriate or ignore the test...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Would it be OK to implement this from now on? It makes my
>>>>>>>> efforts practically useless if people are not reacting…
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Note: just to be on the same page, this is only about running a single
>>>>>>>> test which fails on its own. I feel that flaky tests are an entirely
>>>>>>>> different topic.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> cheers,
>>>>>>>> 
>>>>>>>> Zoltan
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Sahil Takiar
>>> Software Engineer
>>> [email protected] | (510) 673-0309
>> 
>> 
> 
> 
> -- 
> Sahil Takiar
> Software Engineer
> [email protected] | (510) 673-0309
