Re: [Gluster-devel] How to cope with spurious regression failures

2016-03-09 Thread Raghavendra Talur
On Thu, Mar 3, 2016 at 7:12 AM, Raghavendra Talur  wrote:

>
>
> On Wed, Feb 10, 2016 at 8:29 PM, Emmanuel Dreyfus  wrote:
>
>> On Wed, Feb 10, 2016 at 07:30:24PM +0530, Raghavendra Talur wrote:
>> > Any comments before I merge the patch
>> http://review.gluster.org/#/c/13393/ ?
>>
>> The proposal has the merit of addressing the multi-OS case, but fails
>> to address future OS additions. If it does not matter that the naming
>> changes the day we add another OS, it is fine with me. Otherwise, I
>> advise using an 8-bit hex value, which will be fine for a long time:
>>
>> K01-test.t kills for Linux
>> K02-test.t kills for NetBSD
>> K03-test.t kills for both Linux and NetBSD
>> K04-test.t kills for new OS 1 we would add later.
>> K08-test.t kills for new OS 2 we would add later.
>> K10-test.t kills for new OS 3 we would add later.
>> K19-test.t kills for Linux, new OS 2 and new OS 3 (0x01 + 0x08 + 0x10 =
>> 0x19)...
>>
>> Of course, if we add more than 8 OSes to regression that is not enough,
>> but you can start with a 16-bit value if you prefer :-)
>>
>>
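
For illustration, the bitmask decoding could look like this in run-tests.sh
(a sketch only: the prefix parsing, bit assignments, and OS detection below
are my assumptions, not code from the patch):

    # Decode a KXY- prefix as a hex OS bitmask: bit 0 = Linux, bit 1 = NetBSD.
    should_skip() {
        t=$(basename "$1")
        case "$t" in
            K[0-9a-fA-F][0-9a-fA-F]-*) mask=$((16#${t:1:2})) ;;
            *) return 1 ;;                  # no K prefix: never skip
        esac
        case "$(uname -s)" in
            Linux)  bit=1 ;;
            NetBSD) bit=2 ;;
            *)      return 1 ;;             # unknown OS: run everything
        esac
        [ $((mask & bit)) -ne 0 ]           # nonzero => known bad here, skip
    }

    # Example: should_skip tests/K19-test.t is true on Linux (0x19 & 0x01 != 0).
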
> OK, I have updated the patch and replied to Manu's and Jeff's questions
> on the patch itself.
> The tests passed on the first run, except for NetBSD, which failed with
> the "another test running on slave" error.
> I consider passing on the first run a great improvement over our current
> state.
>
> Please review the patch: http://review.gluster.org/#/c/13393/
>

Next patch set uploaded; hopefully this is the final one.
Prasanna commented that he did not like encoding information in the test
name. Parsing would also become difficult if we ever wanted other data
encoded in the file name.

This new patch set puts all the data in the test file itself, as a comment.
At the most basic level it is a CSV of key,value pairs.
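
For illustration, such a metadata comment and its parsing might look like
this (the key names below are hypothetical; the actual keys used by the
patch may differ):

    # Inside tests/bugs/foo.t -- hypothetical per-OS metadata keys:
    #TEST_STATUS_LINUX=BAD_TEST,BUG=1234567
    #TEST_STATUS_NETBSD=OK

    # In run-tests.sh: pick the key matching the host OS, split the CSV.
    t=tests/bugs/foo.t
    os=$(uname -s | tr '[:lower:]' '[:upper:]')    # LINUX, NETBSD, ...
    line=$(grep "^#TEST_STATUS_${os}=" "$t" | head -n1 | cut -d= -f2-)
    verdict=${line%%,*}                            # first CSV field
    [ "$verdict" = "BAD_TEST" ] && echo "Skipping known-bad test $t"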

Please have a look, http://review.gluster.org/#/c/13393/


>
> --
>> Emmanuel Dreyfus
>> m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to cope with spurious regression failures

2016-03-02 Thread Emmanuel Dreyfus
Raghavendra Talur  wrote:

> Yes, because I updated from patch set 2 to 3, and the tests for patch set
> 2 were still running on the same slave.

It seems my test for concurrent runs misfires when the previous run was
aborted. I need to improve that.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to cope with spurious regression failures

2016-03-02 Thread Raghavendra Talur
On Mar 3, 2016 7:42 AM, "Emmanuel Dreyfus"  wrote:
>
> Raghavendra Talur  wrote:
>
> > The tests passed on the first run, except for NetBSD, which failed
> > with the "another test running on slave" error.
>
> Was the previous test on the slave canceled?

Yes, because I updated from patch set 2 to 3, and the tests for patch set 2
were still running on the same slave.

Job for patch set 2
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14816/

Job for patch set 3
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14817/

>
> --
> Emmanuel Dreyfus
> http://hcpnet.free.fr/pubz
> m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to cope with spurious regression failures

2016-03-02 Thread Emmanuel Dreyfus
Raghavendra Talur  wrote:

> The tests passed on the first run, except for NetBSD, which failed with
> the "another test running on slave" error.

Was the previous test on the slave canceled?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to cope with spurious regression failures

2016-03-02 Thread Raghavendra Talur
On Wed, Feb 10, 2016 at 8:29 PM, Emmanuel Dreyfus  wrote:

> On Wed, Feb 10, 2016 at 07:30:24PM +0530, Raghavendra Talur wrote:
> > Any comments before I merge the patch
> http://review.gluster.org/#/c/13393/ ?
>
> The proposal has the merit of addressing the multi-OS case, but fails
> to address future OS additions. If it does not matter that the naming
> changes the day we add another OS, it is fine with me. Otherwise, I
> advise using an 8-bit hex value, which will be fine for a long time:
>
> K01-test.t kills for Linux
> K02-test.t kills for NetBSD
> K03-test.t kills for both Linux and NetBSD
> K04-test.t kills for new OS 1 we would add later.
> K08-test.t kills for new OS 2 we would add later.
> K10-test.t kills for new OS 3 we would add later.
> K19-test.t kills for Linux, new OS 2 and new OS 3 (0x01 + 0x08 + 0x10 =
> 0x19)...
>
> Of course, if we add more than 8 OSes to regression that is not enough,
> but you can start with a 16-bit value if you prefer :-)
>
>
OK, I have updated the patch and replied to Manu's and Jeff's questions on
the patch itself.
The tests passed on the first run, except for NetBSD, which failed with the
"another test running on slave" error.
I consider passing on the first run a great improvement over our current
state.

Please review the patch: http://review.gluster.org/#/c/13393/

--
> Emmanuel Dreyfus
> m...@netbsd.org
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to cope with spurious regression failures

2016-02-10 Thread Raghavendra Talur
Any comments before I merge the patch http://review.gluster.org/#/c/13393/ ?

On Mon, Feb 8, 2016 at 3:15 PM, Raghavendra Talur  wrote:

>
>
> On Tue, Jan 19, 2016 at 8:33 PM, Emmanuel Dreyfus  wrote:
>
>> On Tue, Jan 19, 2016 at 07:08:03PM +0530, Raghavendra Talur wrote:
>> > a. Allowing tests to be re-run until they pass leads to complacency in
>> > how tests are written.
>> > b. A test is bad if it is not deterministic, and running a bad test has
>> > *no* value. We are wasting time even if the test runs for only a few
>> > seconds.
>>
>> I agree with your vision for the long term, but my proposal addresses
>> the short-term situation. We could, however, use the retry approach to
>> feed your blacklist approach:
>>
>> We could imagine a system where the retry feature would cast votes on
>> individual tests: each time a test fails once and succeeds on retry, we
>> cast a +1 unreliable vote for it.
>>
>> After a few days, we will have a wall of shame for unreliable tests,
>> which could either be fixed or go to the blacklist.
>>
>> I do not know what software to use to collect and display the results,
>> though. Should we have a gerrit change for each test?
>>
>
> This should be the process for adding tests to the bad tests list.
> However, I have run out of time on this one.
> If someone would like to implement it, go ahead; I do not see myself
> attempting it any time soon.
>
>
>>
>> --
>> Emmanuel Dreyfus
>> m...@netbsd.org
>
>
>
> Thanks for the inputs.
>
> I have refactored run-tests.sh to support a retry option.
> If run-tests.sh is started with the -r flag, failed tests are run once
> more and are not counted as failures if they pass on the retry. Note:
> adding the -r flag to the Jenkins config is not done yet.
>
> I have also implemented a better version of the blacklist which meets
> Manu's requirement that bad tests be tracked per OS.
> Here is the patch: http://review.gluster.org/#/c/13393/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to cope with spurious regression failures

2016-02-08 Thread Raghavendra Talur
On Tue, Jan 19, 2016 at 8:33 PM, Emmanuel Dreyfus  wrote:

> On Tue, Jan 19, 2016 at 07:08:03PM +0530, Raghavendra Talur wrote:
> > a. Allowing tests to be re-run until they pass leads to complacency in
> > how tests are written.
> > b. A test is bad if it is not deterministic, and running a bad test has
> > *no* value. We are wasting time even if the test runs for only a few
> > seconds.
>
> I agree with your vision for the long term, but my proposal addresses
> the short-term situation. We could, however, use the retry approach to
> feed your blacklist approach:
>
> We could imagine a system where the retry feature would cast votes on
> individual tests: each time a test fails once and succeeds on retry, we
> cast a +1 unreliable vote for it.
>
> After a few days, we will have a wall of shame for unreliable tests,
> which could either be fixed or go to the blacklist.
>
> I do not know what software to use to collect and display the results,
> though. Should we have a gerrit change for each test?
>

This should be the process for adding tests to the bad tests list. However,
I have run out of time on this one.
If someone would like to implement it, go ahead; I do not see myself
attempting it any time soon.


>
> --
> Emmanuel Dreyfus
> m...@netbsd.org



Thanks for the inputs.

I have refactored run-tests.sh to support a retry option.
If run-tests.sh is started with the -r flag, failed tests are run once more
and are not counted as failures if they pass on the retry. Note: adding the
-r flag to the Jenkins config is not done yet.
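
A minimal sketch of those retry semantics (helper and variable names are
mine, not the actual run-tests.sh code; I assume the .t tests are run via
prove):

    # $retry is "yes" when run-tests.sh was invoked with -r (an assumption
    # about the flag parsing; the real variable name may differ).
    run_with_retry() {
        t=$1
        prove -vf "$t" && return 0          # passed on the first attempt
        if [ "$retry" = "yes" ]; then
            echo "Test $t failed, retrying once"
            prove -vf "$t" && return 0      # passed on retry: not a failure
        fi
        return 1                            # counted as failed
    }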

I have also implemented a better version of the blacklist which meets
Manu's requirement that bad tests be tracked per OS.
Here is the patch: http://review.gluster.org/#/c/13393/
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to cope with spurious regression failures

2016-01-19 Thread Atin Mukherjee


On 01/19/2016 10:45 AM, Emmanuel Dreyfus wrote:
> Hi
> 
> Spurious regression failures make developers frustrated. One submits a
> change and gets completely unrelated failures. The only way out is to
> retrigger regression until it passes, a boring and time-wasting task.
> Sometimes after 4 or 5 failed runs, the submitter realizes there is a
> real issue and looks at it, which is a waste of time and resources.
> 
> The fact that we run regression on multiple platforms makes the
> situation worse. If you have a 10% chance of hitting a spurious failure
> on Linux and a 20% chance on NetBSD (numbers picked at random), the
> combined failure rate is 1 - 0.9 x 0.8 = 28%, i.e. roughly one failure
> for every four submissions. The inputs are made up, but you get the idea.
> 
> Two solutions are proposed:
> 
> 1) do not run unreliable tests, as proposed by Raghavendra Talur:
> http://review.gluster.org/13173
> 
> I have nothing against the idea, but I voted down the change because it
> fails to address the need for different test blacklists on different
> platforms: we do not have the same unreliable tests on Linux and NetBSD.
> 
> 2) add a regression option to retry a failed test once, and to validate
> the regression if the second attempt passes, as I proposed:
> http://review.gluster.org/13245
> 
> The idea is basically to automate what every submitter has been doing:
> retry without a thought when regression fails. This approach also gives
> us a better view of which tests failed because of the change and which
> failed because they were unreliable.
> 
> The retry feature is optional and triggered by using the -r flag to
> run-tests.sh. I intend to use it on NetBSD regression to reduce the
> number of failures that annoy people. It could be used on Linux
> regression too, though I do not plan to touch that on my own.
+1 to option 2
> 
> Please tell us which approach you prefer.
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to cope with spurious regression failures

2016-01-19 Thread Raghavendra Talur
On Tue, Jan 19, 2016 at 5:21 PM, Atin Mukherjee  wrote:

>
>
> On 01/19/2016 10:45 AM, Emmanuel Dreyfus wrote:
> > Hi
> >
> > Spurious regression failures make developers frustrated. One submits a
> > change and gets completely unrelated failures. The only way out is to
> > retrigger regression until it passes, a boring and time-wasting task.
> > Sometimes after 4 or 5 failed runs, the submitter realizes there is a
> > real issue and looks at it, which is a waste of time and resources.
> >
> > The fact that we run regression on multiple platforms makes the
> > situation worse. If you have a 10% chance of hitting a spurious failure
> > on Linux and a 20% chance on NetBSD (numbers picked at random), the
> > combined failure rate is 1 - 0.9 x 0.8 = 28%, i.e. roughly one failure
> > for every four submissions. The inputs are made up, but you get the
> > idea.
> >
> > Two solutions are proposed:
> >
> > 1) do not run unreliable tests, as proposed by Raghavendra Talur:
> > http://review.gluster.org/13173
> >
> > I have nothing against the idea, but I voted down the change because it
> > fails to address the need for different test blacklists on different
> > platforms: we do not have the same unreliable tests on Linux and NetBSD.
>

Why I prefer this solution:
a. Allowing tests to be re-run until they pass leads to complacency in how
tests are written.
b. A test is bad if it is not deterministic, and running a bad test has *no*
value. We are wasting time even if the test runs for only a few seconds.
c. I propose another method to overcome the technical difficulty of having
blacklists for different platforms. We could have "[K[a-z]*-]*" as a prefix
for test names, where [a-z]* could be L or N, signifying that the test is
bad on Linux or NetBSD respectively. The run-tests.sh script can be made
intelligent enough to determine the host OS and skip such tests.
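
For example, that skip logic could be sketched as follows (the prefix
letters and helper below are my reading of the proposal, not code from any
patch):

    # KL-foo.t is bad on Linux, KN-foo.t on NetBSD, KLN-foo.t on both.
    case "$(uname -s)" in
        Linux)  badletter=L ;;
        NetBSD) badletter=N ;;
        *)      badletter=  ;;
    esac

    is_blacklisted() {
        [ -n "$badletter" ] || return 1     # unknown OS: skip nothing
        name=$(basename "$1")
        case "$name" in
            K*-*) case "${name%%-*}" in     # the K... prefix, e.g. KLN
                      *"$badletter"*) return 0 ;;
                  esac ;;
        esac
        return 1
    }

    # Example: is_blacklisted tests/KLN-foo.t is true on both Linux and NetBSD.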



> >
> > 2) add a regression option to retry a failed test once, and to validate
> > the regression if the second attempt passes, as I proposed:
> >
> > The idea is basically to automate what every submitter has been doing:
> > retry without a thought when regression fails. This approach also gives
> > us a better view of which tests failed because of the change and which
> > failed because they were unreliable.
> >
> > The retry feature is optional and triggered by using the -r flag to
> > run-tests.sh. I intend to use it on NetBSD regression to reduce the
> > number of failures that annoy people. It could be used on Linux
> > regression too, though I do not plan to touch that on my own.
> +1 to option 2
> >
> > Please tell us which approach you prefer.
> >
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] How to cope with spurious regression failures

2016-01-19 Thread Atin Mukherjee


On 01/19/2016 07:08 PM, Raghavendra Talur wrote:
> 
> 
> On Tue, Jan 19, 2016 at 5:21 PM, Atin Mukherjee  wrote:
> 
> 
> 
> On 01/19/2016 10:45 AM, Emmanuel Dreyfus wrote:
> > Hi
> >
> > Spurious regression failures make developers frustrated. One submits a
> > change and gets completely unrelated failures. The only way out is to
> > retrigger regression until it passes, a boring and time-wasting task.
> > Sometimes after 4 or 5 failed runs, the submitter realizes there is a
> > real issue and looks at it, which is a waste of time and resources.
> >
> > The fact that we run regression on multiple platforms makes the
> > situation worse. If you have a 10% chance of hitting a spurious failure
> > on Linux and a 20% chance on NetBSD (numbers picked at random), the
> > combined failure rate is 1 - 0.9 x 0.8 = 28%, i.e. roughly one failure
> > for every four submissions. The inputs are made up, but you get the
> > idea.
> >
> > Two solutions are proposed:
> >
> > 1) do not run unreliable tests, as proposed by Raghavendra Talur:
> > http://review.gluster.org/13173
> >
> > I have nothing against the idea, but I voted down the change because it
> > fails to address the need for different test blacklists on different
> > platforms: we do not have the same unreliable tests on Linux and NetBSD.
> 
> 
> Why I prefer this solution:
> a. Allowing tests to be re-run until they pass leads to complacency in
> how tests are written.
> b. A test is bad if it is not deterministic, and running a bad test has
> *no* value. We are wasting time even if the test runs for only a few
> seconds.
IMHO, most of our tests are non-deterministic, which is why my vote is for
option 2 over option 1: it reduces the number of retriggers needed.
> c. I propose another method to overcome the technical difficulty of
> having blacklists for different platforms. We could have "[K[a-z]*-]*"
> as a prefix for test names, where [a-z]* could be L or N, signifying that
> the test is bad on Linux or NetBSD respectively. The run-tests.sh script
> can be made intelligent enough to determine the host OS and skip such
> tests.
> 
>  
> 
> >
> > 2) add a regression option to retry a failed test once, and to validate
> > the regression if the second attempt passes, as I proposed:
> > http://review.gluster.org/13245
> >
> > The idea is basically to automate what every submitter has been doing:
> > retry without a thought when regression fails. This approach also gives
> > us a better view of which tests failed because of the change and which
> > failed because they were unreliable.
> >
> > The retry feature is optional and triggered by using the -r flag to
> > run-tests.sh. I intend to use it on NetBSD regression to reduce the
> > number of failures that annoy people. It could be used on Linux
> > regression too, though I do not plan to touch that on my own.
> +1 to option 2
> >
> > Please tell us which approach you prefer.
> >
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] How to cope with spurious regression failures

2016-01-19 Thread Emmanuel Dreyfus
On Tue, Jan 19, 2016 at 07:08:03PM +0530, Raghavendra Talur wrote:
> a. Allowing tests to be re-run until they pass leads to complacency in
> how tests are written.
> b. A test is bad if it is not deterministic, and running a bad test has
> *no* value. We are wasting time even if the test runs for only a few
> seconds.

I agree with your vision for the long term, but my proposal addresses the
short-term situation. We could, however, use the retry approach to feed
your blacklist approach:

We could imagine a system where the retry feature would cast votes on
individual tests: each time a test fails once and succeeds on retry, we
cast a +1 unreliable vote for it.

After a few days, we will have a wall of shame for unreliable tests, 
which could either be fixed or go to the blacklist.

I do not know what software to use to collect and display the results, 
though. Should we have a gerrit change for each test?
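
A crude sketch of how such vote collection might work (the log path and
the reporting pipeline here are pure speculation on my part):

    # After a fail-then-pass cycle, record one unreliable vote for the test.
    vote_unreliable() {
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $1" >> /var/log/unreliable-tests.log
    }

    # Wall of shame: tests sorted by number of unreliable votes.
    awk '{count[$2]++} END {for (t in count) print count[t], t}' \
        /var/log/unreliable-tests.log | sort -rn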

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel