Re: [Gluster-devel] [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-18 Thread Kaushal M
On Thu, Jun 18, 2015 at 1:50 AM, Niels de Vos  wrote:
> On Mon, Jun 15, 2015 at 04:19:14PM +0530, Kaushal M wrote:
>> Hi all,
> ...
>> I propose that we move back to the old model of starting regression
>> runs only once the maintainers are ready to merge. But instead of the
>> maintainers manually tiggering the runs, we could automate it.
>
> I think auto triggering regression tests is good. We should ask the
> developers to run regression tests before posting complex changes. If
> the parallelisation of regression tests is done, the wait time should
> reduce too.
>
> As a maintainer that spends quite some time reviewing patches, I prefer
> to see a +1 verified before I start to review something. With that, I at
> least have some confidence that there are no obvious mistakes I need to
> point out. If developers have to wait on me before regression testing
> gets started, I feel more like a block on road than helping them. There
> really are *many* patches that get a FAILED result where there is a
> problem in the code. Developers should get a response about that as soon
> as possible, and waiting for a maintainer to start the regression tests
> does not help.

The really obvious mistakes should get caught by the smoke tests. With
the proposed workflow, we will still run smoke whenever a change is
submitted. The result of this will still be communicated back to
Gerrit, it could be in the form of a Verified+1 flag, or could be
under another name.

If we feel that the current smoke tests aren't satisfactory enough to
build confidence, then we must improve them.

>
> I also had to ask maintainers for triggering regression tests for my
> first patches, it is not a nice experience. Anything we can do to
> improve the experience for (new) developers should be done, delaying
> (auotmated) feedback isnt a step in the right direction.
>
>> We can model our new workflow on those of OpenStack[1] and
>> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
>> the features necessary to enable selective triggering based on Gerrit
>> flags. Both OpenStack and Wikimedia use a project gating tool called
>> Zuul[3], which provides a much better integration with Jenkins and
>> Gerrit and more features on top.
>
> More intelligent triggering would be helpful. Unfortunately we have a
> stack of xlators and it is difficult to say if there are unintended
> side-effects in different, untouched pieces of the code.
>

I only started looking at Zuul, because I believed that the existing
Gerrit-trigger plugin for Jenkins didn't support triggering based on
flags. But your reply in another mail thread says otherwise. If it is
possible to achieve the workflow without using Zuul, I'm all for it.

When talking about intelligent triggering, I was referring to the
possibility of ordering several different types of jenkins jobs, and
not selection of a subset of regression tests.

>
>> I propose the following work flow,
>>
>> - Developer pushes change to Gerrit.
>>   - Zuul is notified by Gerrit of new change
>> - Zuul runs pre-review checks on Jenkins. This will be the current smoke 
>> tests.
>>   - Zuul reports back status of the checks to Gerrit.
>> - If checks fail, developer will need to resend the change after
>> the required fixes. The process starts once more.
>> - If the checks pass, the change is now ready for review
>> - The change is now reviewed by other developers and maintainers.
>> Non-maintainers will be able to give only a +1 review.
>>   - On a negative review, the developer will need to rework the change
>> and resend it. The process starts once more.
>> - The maintainer give a +2 review once he/she is satisfied. The
>> maintainers work is done here.
>>   - Zuul is notified of the +2 review
>> - Zuul runs the regression runs and reports back the status.
>>   - If the regression runs fail, the process starts over again.
>>   - If the runs pass, the change is ready for acceptance.
>> - Zuul will pick the change into the repository.
>>   - If the pick fails, Zuul will report back the failure, and the
>> process starts once again.
>
> It would be nice if Zuul, in its last step, can pick the change on top
> of the latest HEAD, run the build/smoke test again, and only push the
> change when all is OK. We have seen patch/merge races where a
> function/define was changed, and an other patch used that
> function/define. These caused much issues when the branch failed to
> compile. Being able to prevent that would be very good.
>
>
>> Following this flow should,
>> 1. Reduce regression wait time
>
> "wait time" for what or who? The merging of the patch would still only
> happen after all tests are done. If something fails the last test, more
> people (reviewers and maintainer) need to spend additional time.
>

The wait time for a change to have regression run on it once ready.
Now, we have changes which have been given +1/+2 reviews, but keep
waiting for ages for their regressions to finish, because of all th

Re: [Gluster-devel] [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-18 Thread Niels de Vos
On Mon, Jun 15, 2015 at 02:20:57PM +, Emmanuel Dreyfus wrote:
> On Mon, Jun 15, 2015 at 10:09:33AM -0400, Jeff Darcy wrote:
> > As long as there's some visible marking on the summary pages to
> > distinguish patches that have passed smoke vs. those that haven't, I
> > think we're good.
> 
> gerrit manual says you can add more comulns like revview and verified.

Yes, and I think that makes much sense to do so! I've moved that request
to a different thread so that it is less likely to get missed:

http://thread.gmane.org/gmane.comp.file-systems.gluster.infra/306

Thanks,
Niels
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-17 Thread Niels de Vos
On Mon, Jun 15, 2015 at 04:19:14PM +0530, Kaushal M wrote:
> Hi all,
...
> I propose that we move back to the old model of starting regression
> runs only once the maintainers are ready to merge. But instead of the
> maintainers manually tiggering the runs, we could automate it.

I think auto triggering regression tests is good. We should ask the
developers to run regression tests before posting complex changes. If
the parallelisation of regression tests is done, the wait time should
reduce too.

As a maintainer that spends quite some time reviewing patches, I prefer
to see a +1 verified before I start to review something. With that, I at
least have some confidence that there are no obvious mistakes I need to
point out. If developers have to wait on me before regression testing
gets started, I feel more like a block on road than helping them. There
really are *many* patches that get a FAILED result where there is a
problem in the code. Developers should get a response about that as soon
as possible, and waiting for a maintainer to start the regression tests
does not help.

I also had to ask maintainers for triggering regression tests for my
first patches, it is not a nice experience. Anything we can do to
improve the experience for (new) developers should be done, delaying
(auotmated) feedback isnt a step in the right direction.

> We can model our new workflow on those of OpenStack[1] and
> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
> the features necessary to enable selective triggering based on Gerrit
> flags. Both OpenStack and Wikimedia use a project gating tool called
> Zuul[3], which provides a much better integration with Jenkins and
> Gerrit and more features on top.

More intelligent triggering would be helpful. Unfortunately we have a
stack of xlators and it is difficult to say if there are unintended
side-effects in different, untouched pieces of the code.


> I propose the following work flow,
> 
> - Developer pushes change to Gerrit.
>   - Zuul is notified by Gerrit of new change
> - Zuul runs pre-review checks on Jenkins. This will be the current smoke 
> tests.
>   - Zuul reports back status of the checks to Gerrit.
> - If checks fail, developer will need to resend the change after
> the required fixes. The process starts once more.
> - If the checks pass, the change is now ready for review
> - The change is now reviewed by other developers and maintainers.
> Non-maintainers will be able to give only a +1 review.
>   - On a negative review, the developer will need to rework the change
> and resend it. The process starts once more.
> - The maintainer give a +2 review once he/she is satisfied. The
> maintainers work is done here.
>   - Zuul is notified of the +2 review
> - Zuul runs the regression runs and reports back the status.
>   - If the regression runs fail, the process starts over again.
>   - If the runs pass, the change is ready for acceptance.
> - Zuul will pick the change into the repository.
>   - If the pick fails, Zuul will report back the failure, and the
> process starts once again.

It would be nice if Zuul, in its last step, can pick the change on top
of the latest HEAD, run the build/smoke test again, and only push the
change when all is OK. We have seen patch/merge races where a
function/define was changed, and an other patch used that
function/define. These caused much issues when the branch failed to
compile. Being able to prevent that would be very good.


> Following this flow should,
> 1. Reduce regression wait time

"wait time" for what or who? The merging of the patch would still only
happen after all tests are done. If something fails the last test, more
people (reviewers and maintainer) need to spend additional time.

> 2. Improve change acceptance time
> 3. Reduce unnecessary  wastage of infra resources

We could, and should optimize that in our parallel testing and educating
develpers to only re-run regressions when needed. Splitting up the
regression tests also makes it possible to only re-run a small part of
the tests.

> 4. Improve infra stability.

Not sure if adding an other component and (complex?) configuration adds
to "Improve infra stability". It would be nice to have a very minimal
set of tools, and many people understanding them. With the current
Gerrit and Jenkins configuration we have, we seem to be already very
limited on people that can investigate issues.


> It also brings in drawbacks that we need to maintain one other piece
> of infra (Zuul). This would be an additional maintenance overhead on
> top of Gerrit, Jenkins and the current slaves. But I feel the
> reduction in the upkeep efforts of the slaves would be enough to
> offset this.
> 
> tl;dr
> Current auto-triggering of regression runs is stupid and a waste of
> time and resources. Bring in a project gating system, Zuul, which can
> do a much more intelligent jobs triggering, and use it to
> automatically trigger regression only for changes with Reviewe

Re: [Gluster-devel] [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-17 Thread Michael Scherer
Le lundi 15 juin 2015 à 16:19 +0530, Kaushal M a écrit :
> Hi all,
> 
> The recent rush of reviews being sent due to the release of 3.7 was a
> cause of frustration for many of us because of the regression tests
> (gerrit troubles themselves are another thing).
> 
> W.R.T regression 3 main sources of frustration were,
> 1. Spurious test failures
> 2. Long wait times
> 3. Regression slave troubles
> 
> We've already tackled the spurious failure issue and are quite stable
> now. The trouble with the slave vms is related to the gerrit issues,
> and is mainly due to the network issues we are having between the
> data-centers hosting the slaves and gerrit/jenkins. People have been
> looking into this, but we haven't had much success. This leaves the
> issue of the long wait times.
> 
> The long wait times are because of the long queues of pending jobs,
> some of which take days to get scheduled. Two things cause the long
> queues,
> 1. Automatic regression job triggering for all submissions to gerrit
> 2. Long run time for regression (~2h)
> 
> The long queues coupled with the spurious failure and network
> problems, meant that jobs would fail for no reason after a long wait,
> and would have to be added to the back of the queue to be re-run. This
> meant that developers would have to wait days for their changes to get
> merged, and was one of the causes for the delay in the release of 3.7.
> 
> The solution reduce wait times for regression runs. To reduce wait
> times we should,
> 1. Trigger runs only when required
> 2. Reduce regression run time.
> 
> Raghavendra Talur (rtalur/RaSTar) will soon send out a mail with his
> findings on the regression run times, and we can continue discussion
> on it on that thread.
> 
> Earlier, the regression runs used to be manually triggered by the
> maintainers once they had determined that a change was ready for
> submission. But as there were only two maintainers before (Vijay and
> Avati) auto triggering was brought in to reduce their load. Auto
> triggering worked fine when we had a lower volume of changes being
> submitted to gerrit. But now, with the large volumes we see during the
> release freeze dates, auto triggering just adds to problems.
> 
> I propose that we move back to the old model of starting regression
> runs only once the maintainers are ready to merge. But instead of the
> maintainers manually tiggering the runs, we could automate it.
> 
> We can model our new workflow on those of OpenStack[1] and
> Wikimedia[2]. The existing Gerrit plugin for Jenkins doesn't provide
> the features necessary to enable selective triggering based on Gerrit
> flags. Both OpenStack and Wikimedia use a project gating tool called
> Zuul[3], which provides a much better integration with Jenkins and
> Gerrit and more features on top.
> 
> I propose the following work flow,
> 
> - Developer pushes change to Gerrit.
>   - Zuul is notified by Gerrit of new change
> - Zuul runs pre-review checks on Jenkins. This will be the current smoke 
> tests.
>   - Zuul reports back status of the checks to Gerrit.
> - If checks fail, developer will need to resend the change after
> the required fixes. The process starts once more.
> - If the checks pass, the change is now ready for review
> - The change is now reviewed by other developers and maintainers.
> Non-maintainers will be able to give only a +1 review.
>   - On a negative review, the developer will need to rework the change
> and resend it. The process starts once more.
> - The maintainer give a +2 review once he/she is satisfied. The
> maintainers work is done here.
>   - Zuul is notified of the +2 review
> - Zuul runs the regression runs and reports back the status.
>   - If the regression runs fail, the process starts over again.
>   - If the runs pass, the change is ready for acceptance.
> - Zuul will pick the change into the repository.
>   - If the pick fails, Zuul will report back the failure, and the
> process starts once again.
> 
> Following this flow should,
> 1. Reduce regression wait time
> 2. Improve change acceptance time
> 3. Reduce unnecessary  wastage of infra resources
> 4. Improve infra stability.
> 
> It also brings in drawbacks that we need to maintain one other piece
> of infra (Zuul). This would be an additional maintenance overhead on
> top of Gerrit, Jenkins and the current slaves. But I feel the
> reduction in the upkeep efforts of the slaves would be enough to
> offset this.
> 
> tl;dr
> Current auto-triggering of regression runs is stupid and a waste of
> time and resources. Bring in a project gating system, Zuul, which can
> do a much more intelligent jobs triggering, and use it to
> automatically trigger regression only for changes with Reviewed+2 and
> automatically merge ones that pass.
> 
> What does the community think of this?

Zuul is being packaged for Fedora/EPEL, so it would greatly help to have
it packaged rather that a non sustainable self installation like we had
in the past.
-- 
Micha

Re: [Gluster-devel] [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-15 Thread Atin Mukherjee
On Jun 15, 2015 10:08 PM, "Saravanakumar Arumugam" 
wrote:
>
> Hi,
>
> > - Developer pushes change to Gerrit.
> >   - Zuul is notified by Gerrit of new change
> > - Zuul runs pre-review checks on Jenkins. This will be the current
smoke tests.
> >   - Zuul reports back status of the checks to Gerrit.
> > - If checks fail, developer will need to resend the change after
> > the required fixes. The process starts once more.
> > - If the checks pass, the change is now ready for review
> > - The change is now reviewed by other developers and maintainers.
> > Non-maintainers will be able to give only a +1 review.
> >   - On a negative review, the developer will need to rework the change
> > and resend it. The process starts once more.
> > - The maintainer give a +2 review once he/she is satisfied. The
> > maintainers work is done here.
> >   - Zuul is notified of the +2 review
> > - Zuul runs the regression runs and reports back the status.
> >   - If the regression runs fail, the process starts over again.
> >   - If the runs pass, the change is ready for acceptance.
> > - Zuul will pick the change into the repository.
> >   - If the pick fails, Zuul will report back the failure, and the
> > process starts once again.
>
> +2 for the idea.
>
> Can we have a small change in this flow ?
> ==
> What is proposed now: ( as per my understanding)
>
> Reviewer1 gives +1
> Reviewer2 gives +1
>
> Maintainer gives +2 (for merge)
>
> Now, regression triggered => Regression failed.
>
> So, code is again changed by Developer.
>
> Now, review needs to be done by Reviewer1/Reviewer2/Maintainer.
> ==
> A small change in the proposal:
>
> Reviewer1 gives +1
>
> A single +1 is enough to get Regression Triggered.
>   Lets say immediately Regression triggered and Failed.
>
> So, developer Re-submit his/her changes.
>
> Goes through Reviewer1, Reviewer2, Maintainer.
I still feel triggering regression on +2 is better as this patch is then a
serious candidate for merge and having that criteria will have the
regression queue pretty light weight. Even if a patch goes for iterations I
don't see any reason of delay if regression is not triggered on +1.

Cheers,
Atin
>
> ==
>
> How this helps?
> It does not goes through the process from the beginning(especially when
there is a Regression failure).
> <<  - If the regression runs fail, the process starts over again.
>
> Thanks,
> Saravanakumar
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-15 Thread Gaurav Garg
Hi,

some more cleaner way:

>>Can we have a small change in this flow ? 
>>==
>>What is proposed now: ( as per my understanding) 

>>Reviewer1 gives +1
>>Reviewer2 gives +1

>>Maintainer gives +2 (for merge)

>>Now, regression triggered => Regression failed. 

The idea is good but I think it will make more dependent on maintainer to run a 
regression test. Maintainer will give +2 only after reviewing of full patch. If 
maintainer will busy in some other work which is higher priority then reviewing 
this patch (and giving +2 if patch are good) might be delay and it might delay 
regression test to run. 

It is good if any Reviewer give +1 (after reviewing patch) for triggering 
regression test.

other idea  

http://www.gluster.org/pipermail/gluster-devel/2014-May/040822.html

we can have docker based regression test run to improve our regression test 
time.

Thanx,

Regards,
Gaurav Garg



>>So, code is again changed by Developer.

- Original Message -
From: "Saravanakumar Arumugam" 
To: "Kaushal M" , "Atin Mukherjee" 
Cc: "gluster-infra" , "Gluster Devel" 

Sent: Monday, June 15, 2015 10:08:46 PM
Subject: Re: [Gluster-devel] [Gluster-infra] Reduce regression runs wait time - 
New gerrit/review work flow

Hi,

> - Developer pushes change to Gerrit.
>   - Zuul is notified by Gerrit of new change
> - Zuul runs pre-review checks on Jenkins. This will be the current smoke 
> tests.
>   - Zuul reports back status of the checks to Gerrit.
> - If checks fail, developer will need to resend the change after
> the required fixes. The process starts once more.
> - If the checks pass, the change is now ready for review
> - The change is now reviewed by other developers and maintainers.
> Non-maintainers will be able to give only a +1 review.
>   - On a negative review, the developer will need to rework the change
> and resend it. The process starts once more.
> - The maintainer give a +2 review once he/she is satisfied. The
> maintainers work is done here.
>   - Zuul is notified of the +2 review
> - Zuul runs the regression runs and reports back the status.
>   - If the regression runs fail, the process starts over again.
>   - If the runs pass, the change is ready for acceptance.
> - Zuul will pick the change into the repository.
>   - If the pick fails, Zuul will report back the failure, and the
> process starts once again.

+2 for the idea.

Can we have a small change in this flow ? 
==
What is proposed now: ( as per my understanding) 

Reviewer1 gives +1
Reviewer2 gives +1

Maintainer gives +2 (for merge)

Now, regression triggered => Regression failed. 

So, code is again changed by Developer.

Now, review needs to be done by Reviewer1/Reviewer2/Maintainer.
==
A small change in the proposal:

Reviewer1 gives +1

A single +1 is enough to get Regression Triggered.
  Lets say immediately Regression triggered and Failed.

So, developer Re-submit his/her changes.

Goes through Reviewer1, Reviewer2, Maintainer.

==

How this helps? 
It does not goes through the process from the beginning(especially when there 
is a Regression failure).
<<  - If the regression runs fail, the process starts over again. 

Thanks,
Saravanakumar

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-infra] Reduce regression runs wait time - New gerrit/review work flow

2015-06-15 Thread Saravanakumar Arumugam
Hi,

> - Developer pushes change to Gerrit.
>   - Zuul is notified by Gerrit of new change
> - Zuul runs pre-review checks on Jenkins. This will be the current smoke 
> tests.
>   - Zuul reports back status of the checks to Gerrit.
> - If checks fail, developer will need to resend the change after
> the required fixes. The process starts once more.
> - If the checks pass, the change is now ready for review
> - The change is now reviewed by other developers and maintainers.
> Non-maintainers will be able to give only a +1 review.
>   - On a negative review, the developer will need to rework the change
> and resend it. The process starts once more.
> - The maintainer give a +2 review once he/she is satisfied. The
> maintainers work is done here.
>   - Zuul is notified of the +2 review
> - Zuul runs the regression runs and reports back the status.
>   - If the regression runs fail, the process starts over again.
>   - If the runs pass, the change is ready for acceptance.
> - Zuul will pick the change into the repository.
>   - If the pick fails, Zuul will report back the failure, and the
> process starts once again.

+2 for the idea.

Can we have a small change in this flow ? 
==
What is proposed now: ( as per my understanding) 

Reviewer1 gives +1
Reviewer2 gives +1

Maintainer gives +2 (for merge)

Now, regression triggered => Regression failed. 

So, code is again changed by Developer.

Now, review needs to be done by Reviewer1/Reviewer2/Maintainer.
==
A small change in the proposal:

Reviewer1 gives +1

A single +1 is enough to get Regression Triggered.
  Lets say immediately Regression triggered and Failed.

So, developer Re-submit his/her changes.

Goes through Reviewer1, Reviewer2, Maintainer.

==

How this helps? 
It does not goes through the process from the beginning(especially when there 
is a Regression failure).
<<  - If the regression runs fail, the process starts over again. 

Thanks,
Saravanakumar

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel