[openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-08-13 Thread James Polley
In recent history, we've been looking each week at stats from
http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a
gauge on how our review pipeline is tracking.

The main stats we've been tracking have been the "since the last revision
without -1 or -2". I've included some history at [1], but the summary is
that our 3rd quartile has slipped from 13 days to 16 days over the last 4
weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down
from 4 a month ago) and median is unchanged around 7 days.

There was lots of discussion in our last meeting about what could be
causing this[2]. However, the thing we wanted to bring to the list for the
discussion is:

Are we tracking the right metric? Should we be looking to something else to
tell us how well our pipeline is performing?

The meeting logs have quite a few suggestions about ways we could tweak the
existing metrics, but if we're measuring the wrong thing that's not going
to help.

I think that what we are looking for is a metric that lets us know whether
the majority of patches are getting feedback quickly. Maybe there's some
other metric that would give us a good indication?




[1] Current "Stats since the last revision without -1 or -2" :

Average wait time: 10 days, 17 hours, 6 minutes
1st quartile wait time: 1 days, 1 hours, 36 minutes
Median wait time: 7 days, 5 hours, 33 minutes
3rd quartile wait time: 16 days, 8 hours, 16 minutes

At last week's meeting we had: 3rd quartile wait time: 15 days, 13 hours,
47 minutes
A week before that: 3rd quartile wait time: 13 days, 9 hours, 11 minutes
The week before that was the mid-cycle, but the week before that:

19:53:38  Stats since the last revision without -1 or -2 :
19:53:38  Average wait time: 10 days, 17 hours, 49 minutes
19:53:38  1st quartile wait time: 4 days, 7 hours, 57 minutes
19:53:38  Median wait time: 7 days, 10 hours, 52 minutes
19:53:40  3rd quartile wait time: 13 days, 13 hours, 25 minutes
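
For reference, these figures are just quartiles over the per-review wait
durations. A minimal illustrative sketch (not reviewstats itself, and with
made-up wait times) of how numbers like these can be derived:

    # Illustrative only: average/quartile wait times from a list of
    # per-review wait durations.
    from datetime import timedelta
    from statistics import quantiles

    waits = [timedelta(days=1, hours=2), timedelta(days=3),
             timedelta(days=7, hours=5), timedelta(days=16, hours=8),
             timedelta(days=25)]
    secs = sorted(w.total_seconds() for w in waits)
    avg = timedelta(seconds=sum(secs) / len(secs))
    q1, median, q3 = (timedelta(seconds=q) for q in quantiles(secs, n=4))
    print("Average:", avg)
    print("1st quartile:", q1, "Median:", median, "3rd quartile:", q3)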

[2] Some of the things suggested as potential causes of the long 3rd quartile
times:

* We have a small number of really old reviews that have only positive scores
but aren't being landed
* Some reviews get a -1 but then sit for a long time waiting for the author
to reply
* We have some really old reviews that suddenly get revived after a long
period being in WIP or abandoned, which reviewstats seems to miscount
* Reviewstats counts weekends, we don't (so a change that gets pushed at
5pm US Friday and gets reviewed at 9am Aus Monday would be seen by us as
having no wait time, but by reviewstats as ~36 hours)
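
To make the last point concrete, a rough sketch (an assumption about how
weekend hours might be excluded - not how reviewstats actually works) of the
two ways of counting:

    # Illustrative only: raw wall-clock wait vs. a wait time that skips
    # Saturdays and Sundays entirely (UTC, hour granularity).
    from datetime import datetime, timedelta

    def wait_times(pushed, reviewed):
        raw = reviewed - pushed
        business = timedelta()
        t = pushed
        while t < reviewed:
            step = min(timedelta(hours=1), reviewed - t)
            if t.weekday() < 5:      # Monday=0 .. Friday=4
                business += step
            t += step
        return raw, business

    pushed = datetime(2014, 8, 8, 22, 0)     # a Friday evening, UTC
    reviewed = datetime(2014, 8, 11, 0, 0)   # the following Monday, UTC
    raw, business = wait_times(pushed, reviewed)
    print("counting weekends:", raw, "excluding weekends:", business)
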
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-08-13 Thread Ben Nemec
One thing I am very interested in finally following up on, especially in
light of the snazzy new Gerrit separation for CI jobs, is to make the
check-tripleo job leave an actual vote rather than just a comment.  This
would clean up the (usually) many reviews sitting with a failing CI run,
for the purposes of stats anyway.  It would also make it easier to find
reviews that need a recheck or are legitimately breaking CI.

On 08/13/2014 06:03 PM, James Polley wrote:
> In recent history, we've been looking each week at stats from
> http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a
> gauge on how our review pipeline is tracking.
> 
> The main stats we've been tracking have been the "since the last revision
> without -1 or -2". I've included some history at [1], but the summary is
> that our 3rd quartile has slipped from 13 days to 16 days over the last 4
> weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down
> from 4 a month ago) and median is unchanged around 7 days.
> 
> There was lots of discussion in our last meeting about what could be
> causing this[2]. However, the thing we wanted to bring to the list for the
> discussion is:
> 
> Are we tracking the right metric? Should we be looking to something else to
> tell us how well our pipeline is performing?
> 
> The meeting logs have quite a few suggestions about ways we could tweak the
> existing metrics, but if we're measuring the wrong thing that's not going
> to help.
> 
> I think that what we are looking for is a metric that lets us know whether
> the majority of patches are getting feedback quickly. Maybe there's some
> other metric that would give us a good indication?
> 
> 
> 
> 
> [1] Current "Stats since the last revision without -1 or -2" :
> 
> Average wait time: 10 days, 17 hours, 6 minutes
> 1st quartile wait time: 1 days, 1 hours, 36 minutes
> Median wait time: 7 days, 5 hours, 33 minutes
> 3rd quartile wait time: 16 days, 8 hours, 16 minutes
> 
> At last week's meeting we had: 3rd quartile wait time: 15 days, 13 hours,
> 47 minutes
> A week before that: 3rd quartile wait time: 13 days, 9 hours, 11 minutes
> The week before that was the mid-cycle, but the week before that:
> 
> 19:53:38  Stats since the last revision without -1 or -2 :
> 19:53:38  Average wait time: 10 days, 17 hours, 49 minutes
> 19:53:38  1st quartile wait time: 4 days, 7 hours, 57 minutes
> 19:53:38  Median wait time: 7 days, 10 hours, 52 minutes
> 19:53:40  3rd quartile wait time: 13 days, 13 hours, 25 minutes
> 
> [2] Some of the things suggested as potential causes of the long 3rd median
> times:
> 
> * We have small number of really old reviews that have only positive scores
> but aren't being landed
> * Some reviews get a -1 but then sit for a long time waiting for the author
> to reply

These aren't reflected in the stats above though.  Anything with a
negative vote gets kicked out of that category, and it's the one I think
we care about.

> * We have some really old reviews that suddenly get revived after a long
> period being in WIP or abandoned, which reviewstats seems to miscount
> * Reviewstats counts weekends, we don't (so a change that gets pushed at
> 5pm US Friday and gets reviewed at 9am Aus Monday would be seen by us as
> having no wait time, but by reviewstats as ~36 hours)

I would also add to this list:

* Old reviews with failed CI runs that no one has bothered to
recheck/fix.  I find quite a few of these when I'm going through the
outstanding review list.  These would be addressed by the voting change
I mentioned above because they would have a negative vote then.

> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-08-15 Thread Jeremy Stanley
On 2014-08-13 19:51:52 -0500 (-0500), Ben Nemec wrote:
[...]
> make the check-tripleo job leave an actual vote rather than just a
> comment.
[...]

That, as previously discussed, will require some design work in
Zuul. Gerrit uses a single field per account for verify votes, which
means that if you want to have multiple concurrent votes for
different pipelines/job collections then we either need to use more
than one account for those or add additional columns for each.

There have already been discussions around how to implement this,
but in the TripleO case it might make more sense to revisit why we
have those additional pipelines and instead focus on resolving the
underlying issues which led to their use as a stop-gap.
-- 
Jeremy Stanley

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-02 Thread Robert Collins
On 16 August 2014 02:43, Jeremy Stanley  wrote:
> On 2014-08-13 19:51:52 -0500 (-0500), Ben Nemec wrote:
> [...]
>> make the check-tripleo job leave an actual vote rather than just a
>> comment.
> [...]
>
> That, as previously discussed, will require some design work in
> Zuul. Gerrit uses a single field per account for verify votes, which
> means that if you want to have multiple concurrent votes for
> different pipelines/job collections then we either need to use more
> than one account for those or add additional columns for each.
>
> There have already been discussions around how to implement this,
> but in the TripleO case it might make more sense to revisit why we
> have those additional pipelines and instead focus on resolving the
> underlying issues which led to their use as a stop-gap.

I thought there was now a thing where zuul can use a different account
per pipeline?

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-02 Thread Jeremy Stanley
On 2014-09-03 11:51:13 +1200 (+1200), Robert Collins wrote:
> I thought there was now a thing where zuul can use a different account
> per pipeline?

That was the most likely solution we discussed at the summit, but I
don't believe we've implemented it yet (or if we have then it isn't
yet being used for any existing pipelines).
-- 
Jeremy Stanley

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-02 Thread Robert Collins
On 14 August 2014 11:03, James Polley  wrote:
> In recent history, we've been looking each week at stats from
> http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a
> gauge on how our review pipeline is tracking.
>
> The main stats we've been tracking have been the "since the last revision
> without -1 or -2". I've included some history at [1], but the summary is
> that our 3rd quartile has slipped from 13 days to 16 days over the last 4
> weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down
> from 4 a month ago) and median is unchanged around 7 days.
>
> There was lots of discussion in our last meeting about what could be causing
> this[2]. However, the thing we wanted to bring to the list for the
> discussion is:
>
> Are we tracking the right metric? Should we be looking to something else to
> tell us how well our pipeline is performing?
>
> The meeting logs have quite a few suggestions about ways we could tweak the
> existing metrics, but if we're measuring the wrong thing that's not going to
> help.
>
> I think that what we are looking for is a metric that lets us know whether
> the majority of patches are getting feedback quickly. Maybe there's some
> other metric that would give us a good indication?

If we review all patches quickly and land none, that's bad too :).

For the reviewers specifically I think we need metric(s) that:
 - don't go bad when submitters go AWOL, don't respond, etc.
   - including when they come back - our stats shouldn't jump hugely
because an old review was resurrected
 - when good, mean that submitters are getting feedback
 - flag inventory - things we'd be happy to have landed that haven't
   - including things with a -1 from non-core reviewers (*)

(*) I often see -1's on things core wouldn't -1 due to the learning
curve involved in becoming core

So, as Ben says, I think we need to address the its-not-a-vote issue
as a priority; that has tripped us up in lots of ways.

I think we need to discount -workflow patches where that was set by
the submitter, which AFAICT we don't do today.

Looking at current stats:
Longest waiting reviews (based on oldest rev without -1 or -2):

54 days, 2 hours, 41 minutes https://review.openstack.org/106167
(Keystone/LDAP integration)
That patch had a -1 on Aug 16 1:23 AM, but it was quickly turned to +2.

So this patch had a -1, then after discussion it became a +2. And it's
evolved multiple times.

What should we be saying here? Clearly it's had little review input
over its life, so I think it's sadly accurate.

I wonder if a big chunk of our sliding quartile is just us not
reviewing the oldest reviews.

-Rob


-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-02 Thread Jesus M. Gonzalez-Barahona
On Wed, 2014-09-03 at 12:58 +1200, Robert Collins wrote:
> On 14 August 2014 11:03, James Polley  wrote:
> > In recent history, we've been looking each week at stats from
> > http://russellbryant.net/openstack-stats/tripleo-openreviews.html to get a
> > gauge on how our review pipeline is tracking.
> >
> > The main stats we've been tracking have been the "since the last revision
> > without -1 or -2". I've included some history at [1], but the summary is
> > that our 3rd quartile has slipped from 13 days to 16 days over the last 4
> > weeks or so. Our 1st quartile is fairly steady lately, around 1 day (down
> > from 4 a month ago) and median is unchanged around 7 days.
> >
> > There was lots of discussion in our last meeting about what could be causing
> > this[2]. However, the thing we wanted to bring to the list for the
> > discussion is:
> >
> > Are we tracking the right metric? Should we be looking to something else to
> > tell us how well our pipeline is performing?
> >
> > The meeting logs have quite a few suggestions about ways we could tweak the
> > existing metrics, but if we're measuring the wrong thing that's not going to
> > help.
> >
> > I think that what we are looking for is a metric that lets us know whether
> > the majority of patches are getting feedback quickly. Maybe there's some
> > other metric that would give us a good indication?
> 
> If we review all patches quickly and land none, thats bad too :).
> 
> For the reviewers specifically i think we need a metric(s) that:
>  - doesn't go bad when submitters go awol, don't respond etc
>- including when they come back - our stats shouldn't jump hugely
> because an old review was resurrected
>  - when good means submitters will be getting feedback
>  - flag inventory- things we'd be happy to have landed that haven't
>- including things with a -1 from non-core reviewers (*)
> 
> (*) I often see -1's on things core wouldn't -1 due to the learning
> curve involved in becoming core
> 
> So, as Ben says, I think we need to address the its-not-a-vote issue
> as a priority, that has tripped us up in lots of ways
> 
> I think we need to discount -workflow patches where that was set by
> the submitter, which AFAICT we don't do today.
> 
> Looking at current stats:
> Longest waiting reviews (based on oldest rev without -1 or -2):
> 
> 54 days, 2 hours, 41 minutes https://review.openstack.org/106167
> (Keystone/LDAP integration)
> That patch had a -1 on Aug 16 1:23 AM: but was quickyl turned to +2.
> 
> So this patch had a -1 then after discussion it became a +2. And its
> evolved multiple times.
> 
> What should we be saying here? Clearly its had little review input
> over its life, so I think its sadly accurate.
> 
> I wonder if a big chunk of our sliding quartile is just use not
> reviewing the oldest reviews.

I've been researching the review process in OpenStack and other projects
for a while, and my impression is that at least three timing metrics are
relevant:

(1) Total time from submitting a patch to final closing of the review
process (landing that, or a subsequent patch, or finally abandoning).
This gives an idea of how the whole process is working.

(2) Time from submitting a patch to that patch being approved (+2 in
OpenStack, I guess) or declined (and a new patch is requested). This
gives an idea of how quickly reviewers are providing definite feedback to
patch submitters, and is a metric for each patch cycle.

(3) Time from a patch being reviewed, with a new patch being requested,
to a new patch being submitted. This gives an idea of the "reaction
time" of patch submitter.

Usually, you want to keep (1) low, while (2) and (3) give you an idea of
what is happening if (1) gets high.

There is another relevant metric in some cases, which is

(4) The number of patch cycles per review cycle (that is, how many
patches are needed per patch landing in master). In some cases, that may
help to explain how (2) and (3) contribute to (1).

And a fifth metric gives you a "throughput" metric:

(5) BMI (backlog management index): the number of new review processes
divided by the number of closed review processes for a certain period. It
gives an idea of whether the backlog is going up (>1) or down (<1), and is
usually very interesting when seen over time.
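
A rough sketch of how (1), (2), (4) and (5) could be computed from review
records (the field names here are made up for illustration; (3) would need
per-patchset timestamps and is omitted for brevity):

    # Illustrative only: compute the metrics described above from a list of
    # simplified review records.
    from datetime import datetime
    from statistics import mean

    reviews = [
        {"submitted": datetime(2014, 8, 1), "approved": datetime(2014, 8, 5),
         "closed": datetime(2014, 8, 6), "patchsets": 3},
        {"submitted": datetime(2014, 8, 10), "approved": None,
         "closed": None, "patchsets": 5},
    ]
    closed = [r for r in reviews if r["closed"]]
    total_times = [r["closed"] - r["submitted"] for r in closed]        # (1)
    approval_times = [r["approved"] - r["submitted"]                    # (2)
                      for r in reviews if r["approved"]]
    cycles_per_landing = mean(r["patchsets"] for r in closed)           # (4)
    start, end = datetime(2014, 8, 1), datetime(2014, 9, 1)
    opened = sum(start <= r["submitted"] < end for r in reviews)
    done = sum(start <= r["closed"] < end for r in closed)
    bmi = opened / done if done else float("inf")                       # (5)
    print(total_times, approval_times, cycles_per_landing, bmi)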

(1) alone is not enough to assess how well the review process is working,
because it could be low while (5) shows an increasing backlog, simply
because new review requests are coming in too quickly (e.g., in periods when
developers are submitting a lot of patch proposals after a freeze). (1)
could also be high while (5) shows a decreasing backlog, because for
example reviewers or submitters are overworked or slow to schedule reviews,
but the project still copes with the backlog. Depending on the relationship
between (1) and (5), maybe you need more reviewers, or reviewers giving
their reviews more priority over other work, or something else.

Note for example that in a project with low BMI (<1) for a long period,
but with a high tot

Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-03 Thread Joshua Hesketh

On 9/3/14 10:43 AM, Jeremy Stanley wrote:

On 2014-09-03 11:51:13 +1200 (+1200), Robert Collins wrote:

I thought there was now a thing where zuul can use a different account
per pipeline?

That was the most likely solution we discussed at the summit, but I
don't believe we've implemented it yet (or if we have then it isn't
yet being used for any existing pipelines).
It's currently in review[0], although I think from discussions with 
other zuul devs we're likely to refactor large parts of how we connect 
to systems and hence this may be delayed. If it's something that's 
needed soon(erish) we could probably do this step before the refactoring.


Cheers,
Josh

[0] https://review.openstack.org/#/c/97391/

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-03 Thread Robert Collins
We would benefit a great deal from having this sooner.

On 3 September 2014 20:11, Joshua Hesketh  wrote:
> On 9/3/14 10:43 AM, Jeremy Stanley wrote:
>>
>> On 2014-09-03 11:51:13 +1200 (+1200), Robert Collins wrote:
>>>
>>> I thought there was now a thing where zuul can use a different account
>>> per pipeline?
>>
>> That was the most likely solution we discussed at the summit, but I
>> don't believe we've implemented it yet (or if we have then it isn't
>> yet being used for any existing pipelines).
>
> It's currently in review[0], although I think from discussions with other
> zuul devs we're likely to refactor large parts of how we connect to systems
> and hence this may be delayed. If it's something that's needed soon(erish)
> we could probably do this step before the refactoring.
>
> Cheers,
> Josh
>
> [0] https://review.openstack.org/#/c/97391/
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-03 Thread Joshua Hesketh

I've moved it back up the review chain for you :-)

Rackspace Australia

On 9/3/14 6:34 PM, Robert Collins wrote:

We would benefit a great deal from having this sooner.

On 3 September 2014 20:11, Joshua Hesketh  wrote:

On 9/3/14 10:43 AM, Jeremy Stanley wrote:

On 2014-09-03 11:51:13 +1200 (+1200), Robert Collins wrote:

I thought there was now a thing where zuul can use a different account
per pipeline?

That was the most likely solution we discussed at the summit, but I
don't believe we've implemented it yet (or if we have then it isn't
yet being used for any existing pipelines).

It's currently in review[0], although I think from discussions with other
zuul devs we're likely to refactor large parts of how we connect to systems
and hence this may be delayed. If it's something that's needed soon(erish)
we could probably do this step before the refactoring.

Cheers,
Josh

[0] https://review.openstack.org/#/c/97391/


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev






___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Derek Higgins
On 14/08/14 00:03, James Polley wrote:
> In recent history, we've been looking each week at stats
> from http://russellbryant.net/openstack-stats/tripleo-openreviews.html
> to get a gauge on how our review pipeline is tracking.
> 
> The main stats we've been tracking have been the "since the last
> revision without -1 or -2". I've included some history at [1], but the
> summary is that our 3rd quartile has slipped from 13 days to 16 days
> over the last 4 weeks or so. Our 1st quartile is fairly steady lately,
> around 1 day (down from 4 a month ago) and median is unchanged around 7
> days.
> 
> There was lots of discussion in our last meeting about what could be
> causing this[2]. However, the thing we wanted to bring to the list for
> the discussion is:
> 
> Are we tracking the right metric? Should we be looking to something else
> to tell us how well our pipeline is performing?
> 
> The meeting logs have quite a few suggestions about ways we could tweak
> the existing metrics, but if we're measuring the wrong thing that's not
> going to help.
> 
> I think that what we are looking for is a metric that lets us know
> whether the majority of patches are getting feedback quickly. Maybe
> there's some other metric that would give us a good indication?

Bring back auto abandon...

Gerrit at one stage not so long ago used to auto-abandon patches that
had negative feedback and were over a week without activity. I believe
this was removed when a Gerrit upgrade gave core reviewers the ability
to abandon other people's patches.

This was the single best part of the entire process for keeping things moving:
o submitters were forced to keep patches current
o reviewers were not looking at stale or already-known-broken patches
o if something wasn't important it got abandoned and was never heard of
again; if it was important it would be reopened
o patch submitters were forced to engage with the reviewers quickly on
negative feedback instead of leaving a patch sitting there indefinitely

Here are the numbers I think we should be looking at:
http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
  Queue growth in the last 30 days: 72 (2.4/day)

http://russellbryant.net/openstack-stats/tripleo-reviewers-90.txt
  Queue growth in the last 90 days: 132 (1.5/day)

Obviously this isn't sustainable.

Re-enabling auto-abandon will ensure the majority of the patches we are
looking at are current and of good quality, not lost in a sea of -1's.

How would people feel about turning it back on? Can it be done on a per
project basis?

To make the whole process a little friendlier we could increase the time
frame from 1 week to 2.

Derek.

> 
> 
> 
> 
> [1] Current "Stats since the last revision without -1 or -2" :
> 
> Average wait time: 10 days, 17 hours, 6 minutes
> 1st quartile wait time: 1 days, 1 hours, 36 minutes
> Median wait time: 7 days, 5 hours, 33 minutes
> 3rd quartile wait time: 16 days, 8 hours, 16 minutes
> 
> At last week's meeting we had: 3rd quartile wait time: 15 days, 13
> hours, 47 minutes
> A week before that: 3rd quartile wait time: 13 days, 9 hours, 11 minutes
> The week before that was the mid-cycle, but the week before that:
> 
> 19:53:38  Stats since the last revision without -1 or -2 :
> 19:53:38  Average wait time: 10 days, 17 hours, 49 minutes
> 19:53:38  1st quartile wait time: 4 days, 7 hours, 57 minutes
> 19:53:38  Median wait time: 7 days, 10 hours, 52 minutes
> 19:53:40  3rd quartile wait time: 13 days, 13 hours, 25 minutes
> 
> [2] Some of the things suggested as potential causes of the long 3rd
> median times:
> 
> * We have small number of really old reviews that have only positive
> scores but aren't being landed
> * Some reviews get a -1 but then sit for a long time waiting for the
> author to reply
> * We have some really old reviews that suddenly get revived after a long
> period being in WIP or abandoned, which reviewstats seems to miscount
> * Reviewstats counts weekends, we don't (so a change that gets pushed at
> 5pm US Friday and gets reviewed at 9am Aus Monday would be seen by us as
> having no wait time, but by reviewstats as ~36 hours)
> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Jeremy Stanley
On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
[...]
> How would people feel about turning [auto-abandon] back on?

A lot of reviewers (myself among them) feel auto-abandon was a
cold and emotionless way to provide feedback on a change. Especially
on high-change-volume projects where core reviewers may at times get
sucked into triaging other problems for long enough that the
auto-abandoner kills lots of legitimate changes (possibly from
new contributors who will get even more disgusted by this than the
silence itself and walk away indefinitely with the impression that
we really aren't a welcoming development community at all).

> Can it be done on a per project basis?

It can, by running your own... but again it seems far better for
core reviewers to decide if a change has potential or needs to be
abandoned--that way there's an accountable human making that
deliberate choice rather than the review team hiding behind an
automated process so that no one is to blame for hurt feelings
besides the infra operators who are enforcing this draconian measure
for you.

> To make the whole process a little friendlier we could increase
> the time frame from 1 week to 2.

How about just automatically abandon any new change as soon
as it's published, and if the contributor really feels it's
important they'll unabandon it.
-- 
Jeremy Stanley

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Derek Higgins
On 04/09/14 14:54, Jeremy Stanley wrote:
> On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
> [...]
>> How would people feel about turning [auto-abandon] back on?
> 
> A lot of reviewers (myself among them) feel auto-abandon was a
> cold and emotionless way to provide feedback on a change. Especially
> on high-change-volume projects where core reviewers may at times get
> sucked into triaging other problems for long enough that the
> auto-abandoner kills lots of legitimate changes (possibly from
> new contributors who will get even more disgusted by this than the
> silence itself and walk away indefinitely with the impression that
> we really aren't a welcoming development community at all).

Ok, I see how this may be unwelcoming to a new contributor, a feeling
that could be justified in some cases. Any established contributor
should (I know I did when it was enforced) see it as part of the process.

Perhaps we exempt new users?

On the other hand I'm not talking about abandoning a change because
there was silence for a fixed period of time, I'm talking about
abandoning it because it got negative feedback and it wasn't addressed
either through discussion or a new patch.

I have no problem if we push the inactivity period out to a month or
more, I just think there needs to be a cutoff at some stage.


> 
>> Can it be done on a per project basis?
> 
> It can, by running your own... but again it seems far better for
> core reviewers to decide if a change has potential or needs to be
> abandoned--that way there's an accountable human making that
> deliberate choice rather than the review team hiding behind an
> automated process so that no one is to blame for hurt feelings
> besides the infra operators who are enforcing this draconian measure
> for you.

There are plenty of examples of places where we have automated processes
in the community (some of which may hurt feelings) in order to take load
off specific individuals or the community in general. In fact automating
processes in places where people don't scale or are bottlenecks seems to
be a common theme.

We automate CI and give people negative feedback
We expire bugs in some projects that are Incomplete and are 60 days inactive

I really don't see this as the review team hiding behind an automated
process. A patch got negative feedback and we're automating the process
to prompt the submitter to deal with it. It might be friendlier if it
was a two-step process:
  1. (after a few days of inactivity) Add a comment saying you got
negative feedback, with suggestions on how to proceed and information
that the review will be auto-abandoned if nothing is done in X number of
days.
  2. Auto-abandon the patch, with as much information as possible on how to
reopen it if needed.
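
A sketch of what such a two-step job might query, using standard Gerrit
search operators (the thresholds and the project name are placeholders, and
the actual warn/abandon actions are left out):

    # Illustrative only: find candidates for the warn and abandon steps via
    # Gerrit change queries.
    import json
    import requests

    GERRIT = "https://review.openstack.org"

    def query(q):
        # Gerrit prefixes JSON responses with ")]}'", which must be stripped.
        resp = requests.get(GERRIT + "/changes/", params={"q": q, "n": 100})
        return json.loads(resp.text.split("\n", 1)[1])

    base = "project:openstack/tripleo-incubator status:open label:Code-Review<=-1"
    warn = query(base + " age:1w")      # step 1: comment with a warning
    abandon = query(base + " age:4w")   # step 2: abandon (or WIP) candidates

    for change in warn:
        print("warn:", change["_number"], change["subject"])
    for change in abandon:
        print("abandon:", change["_number"], change["subject"])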

> 
>> To make the whole process a little friendlier we could increase
>> the time frame from 1 week to 2.
> 
> How about just automatically abandon any new change as soon
> as it's published, and if the contributor really feels it's
> important they'll unabandon it.
> 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Ben Nemec
On 09/04/2014 08:54 AM, Jeremy Stanley wrote:
> On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
> [...]
>> How would people feel about turning [auto-abandon] back on?
> 
> A lot of reviewers (myself among them) feel auto-abandon was a
> cold and emotionless way to provide feedback on a change. Especially
> on high-change-volume projects where core reviewers may at times get
> sucked into triaging other problems for long enough that the
> auto-abandoner kills lots of legitimate changes (possibly from
> new contributors who will get even more disgusted by this than the
> silence itself and walk away indefinitely with the impression that
> we really aren't a welcoming development community at all).
> 
>> Can it be done on a per project basis?
> 
> It can, by running your own... but again it seems far better for
> core reviewers to decide if a change has potential or needs to be
> abandoned--that way there's an accountable human making that
> deliberate choice rather than the review team hiding behind an
> automated process so that no one is to blame for hurt feelings
> besides the infra operators who are enforcing this draconian measure
> for you.

The thing is that it's also pushing more work onto already overloaded
core review teams.  Maybe submitters don't like auto-abandon, but I bet
they like having a core reviewer spending time cleaning up dead reviews
instead of reviewing their change even less.

TBH, if someone's offended by the bot then I can't imagine how incensed
they must be when a human does the same thing.  The bot clearly isn't
making it personal, and even if the human isn't either it's much easier
to have misunderstandings (see also every over-reaction to a -1 ever).

I suppose it makes it easier for cores to ignore reviews, but from the
other discussions I've read that hasn't gone away just because
auto-abandon did, so I'm not convinced that's a solution anyway.

/2 cents

> 
>> To make the whole process a little friendlier we could increase
>> the time frame from 1 week to 2.
> 
> How about just automatically abandon any new change as soon
> as it's published, and if the contributor really feels it's
> important they'll unabandon it.
> 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-04 Thread Jay Dobies

It can, by running your own... but again it seems far better for
core reviewers to decide if a change has potential or needs to be
abandoned--that way there's an accountable human making that
deliberate choice rather than the review team hiding behind an
automated process so that no one is to blame for hurt feelings
besides the infra operators who are enforcing this draconian measure
for you.


The thing is that it's also pushing more work onto already overloaded
core review teams.  Maybe submitters don't like auto-abandon, but I bet
they like having a core reviewer spending time cleaning up dead reviews
instead of reviewing their change even less.

TBH, if someone's offended by the bot then I can't imagine how incensed
they must be when a human does the same thing.  The bot clearly isn't
making it personal, and even if the human isn't either it's much easier
to have misunderstandings (see also every over-reaction to a -1 ever).

I suppose it makes it easier for cores to ignore reviews, but from the
other discussions I've read that hasn't gone away just because
auto-abandon did, so I'm not convinced that's a solution anyway.


+1, I don't think it'll come as much of a shock if a -1 review gets 
closed due to time without progress.



/2 cents




To make the whole process a little friendlier we could increase
the time frame from 1 week to 2.


How about just automatically abandon any new change as soon
as it's published, and if the contributor really feels it's
important they'll unabandon it.




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-05 Thread Sullivan, Jon Paul
> -Original Message-
> From: Jay Dobies [mailto:jason.dob...@redhat.com]
> Sent: 04 September 2014 18:24
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [TripleO] Review metrics - what do we want
> to measure?
> 
> >> It can, by running your own... but again it seems far better for core
> >> reviewers to decide if a change has potential or needs to be
> >> abandoned--that way there's an accountable human making that
> >> deliberate choice rather than the review team hiding behind an
> >> automated process so that no one is to blame for hurt feelings
> >> besides the infra operators who are enforcing this draconian measure
> >> for you.
> >
> > The thing is that it's also pushing more work onto already overloaded
> > core review teams.  Maybe submitters don't like auto-abandon, but I
> > bet they like having a core reviewer spending time cleaning up dead
> > reviews instead of reviewing their change even less.
> >
> > TBH, if someone's offended by the bot then I can't imagine how
> > incensed they must be when a human does the same thing.  The bot
> > clearly isn't making it personal, and even if the human isn't either
> > it's much easier to have misunderstandings (see also every over-
> reaction to a -1 ever).
> >
> > I suppose it makes it easier for cores to ignore reviews, but from the
> > other discussions I've read that hasn't gone away just because
> > auto-abandon did, so I'm not convinced that's a solution anyway.
> 
> +1, I don't think it'll come as much of a shock if a -1 review gets
> closed due to time without progress.

+1 to auto-abandon.

Taking an excerpt from Derek's mail:

> A patch got negative feedback and we're automating the process
> to prompt the submitter to deal with it. It may be more friendly if it
> was a 2 step process
>   1. (after a few days if inactivity) Add a comment saying you got
> negative feedback with suggestions of how to proceed and information
> that the review will be autoabandoned if nothing is done in X number of
> days.
>   2. Auto abandon patch, with as much information as possible on how to
> reopen if needed.

This sounds like the best solution.  A 1 week warning and a 2 week abandoning.

It is about as welcoming as this sort of thing could be for a new committer, 
whilst also avoiding an ever-growing backlog of changes that will never see 
attention because the submitter has moved on to other things.

There is also the benefit of prompting submitters into action when things get
auto-abandoned, rather than leaving them sitting at priority #2 or #3 forever.

I was never particularly upset over the auto-abandon; just click a single
button in the UI and get on with it.

> 
> > /2 cents
> >
> >>
> >>> To make the whole process a little friendlier we could increase
> >>> the time frame from 1 week to 2.
> >>
> >> How about just automatically abandon any new change as soon
> >> as it's published, and if the contributor really feels it's
> >> important they'll unabandon it.
> >>
> >
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Thanks, 
Jon-Paul Sullivan ☺ Cloud Services - @hpcloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-10 Thread James Polley
There was some discussion of this topic during today's meeting.

Full notes are at
http://eavesdrop.openstack.org/meetings/tripleo/2014/tripleo.2014-09-09-19.04.log.html
starting around 19:24

To summarise, we had one action item, and one comment from dprince which
attracted broad agreement, and which I think is worth repeating here:

19:34:26  #action bnemec to look at adding a report of items that
have a -1 from a core but no response for 14 days, as a first step towards
possibly auto-WIPing these patches

19:41:54  I sort of feel like all the focus on stats in our
meetings is going to encourage people to game them (i.e. fly by reviews)
19:42:13  when what we really need is ownership to push through
the important things...
19:42:45 * dprince thinks stats are becoming a TripleO waste of time
19:44:12  stats not important, being active and responsive enough
to not push new contributors away does matter
19:44:51  jp_at_hp: I agree; the stats are a tool, not the thing
itself.

On Fri, Sep 5, 2014 at 9:47 PM, Sullivan, Jon Paul 
wrote:

> > -Original Message-
> > From: Jay Dobies [mailto:jason.dob...@redhat.com]
> > Sent: 04 September 2014 18:24
> > To: openstack-dev@lists.openstack.org
> > Subject: Re: [openstack-dev] [TripleO] Review metrics - what do we want
> > to measure?
> >
> > >> It can, by running your own... but again it seems far better for core
> > >> reviewers to decide if a change has potential or needs to be
> > >> abandoned--that way there's an accountable human making that
> > >> deliberate choice rather than the review team hiding behind an
> > >> automated process so that no one is to blame for hurt feelings
> > >> besides the infra operators who are enforcing this draconian measure
> > >> for you.
> > >
> > > The thing is that it's also pushing more work onto already overloaded
> > > core review teams.  Maybe submitters don't like auto-abandon, but I
> > > bet they like having a core reviewer spending time cleaning up dead
> > > reviews instead of reviewing their change even less.
> > >
> > > TBH, if someone's offended by the bot then I can't imagine how
> > > incensed they must be when a human does the same thing.  The bot
> > > clearly isn't making it personal, and even if the human isn't either
> > > it's much easier to have misunderstandings (see also every over-
> > reaction to a -1 ever).
> > >
> > > I suppose it makes it easier for cores to ignore reviews, but from the
> > > other discussions I've read that hasn't gone away just because
> > > auto-abandon did, so I'm not convinced that's a solution anyway.
> >
> > +1, I don't think it'll come as much of a shock if a -1 review gets
> > closed due to time without progress.
>
> +1 to auto-abandon.
>
> Taking an excerpt from Dereks mail:
>
> > A patch got negative feedback and we're automating the process
> > to prompt the submitter to deal with it. It may be more friendly if it
> > was a 2 step process
> >   1. (after a few days if inactivity) Add a comment saying you got
> > negative feedback with suggestions of how to proceed and information
> > that the review will be autoabandoned if nothing is done in X number of
> > days.
> >   2. Auto abandon patch, with as much information as possible on how to
> > reopen if needed.
>
> This sounds like the best solution.  A 1 week warning and a 2 week
> abandoning.
>
> It is about as welcoming as this sort of thing could be for a new
> committer, whilst also avoiding an ever-growing backlog of changes that
> will never see attention because the submitter has moved on to other things.
>
> There is also the benefit of prompting submitters into action when things
> get auto-abandoned, rather than them having sitting at priority #2 or #3
> forever.
>
> I was never particularly upset over the auto-abandon, just click a single
> button in the ui and get on with it.
>
> >
> > > /2 cents
> > >
> > >>
> > >>> To make the whole process a little friendlier we could increase
> > >>> the time frame from 1 week to 2.
> > >>
> > >> How about just automatically abandon any new change as soon
> > >> as it's published, and if the contributor really feels it's
> > >> important they'll unabandon it.
> > >>
> > >
> > >
> > > ___
> > > OpenStack-dev mailing list
> > > OpenStack-dev@lists.openstack.org
> > >

Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-10 Thread Steven Hardy
On Thu, Sep 04, 2014 at 01:54:20PM +, Jeremy Stanley wrote:
> On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
> [...]
> > How would people feel about turning [auto-abandon] back on?
> 
> A lot of reviewers (myself among them) feel auto-abandon was a
> cold and emotionless way to provide feedback on a change. Especially
> on high-change-volume projects where core reviewers may at times get
> sucked into triaging other problems for long enough that the
> auto-abandoner kills lots of legitimate changes (possibly from
> new contributors who will get even more disgusted by this than the
> silence itself and walk away indefinitely with the impression that
> we really aren't a welcoming development community at all).

I can understand this argument, and perhaps an auto-abandon with a long
period like say a month for the submitter to address comments and reset the
clock would be a reasonable compromise?

The problem we have now, is there is a constantly expanding queue of zombie
reviews, where the submitter got some negative feedback and, for whatever
reason, has chosen not to address it.

For example, in my "Incoming reviews" queue on the Gerrit dashboard, I've
got reviews I (or someone) -1'd over four months ago, with zero feedback
from the submitter. What value is there in these reviews cluttering up the
dashboard for every reviewer?

To make matters worse Jenkins comes along periodically and rechecks or
figures out the patch merge failed and bumps the zombie back up to the top
of the queue - obviously I don't realise that it's not a human comment I
need to read until I've expended effort clicking on the review and reading
it :(

From a reviewer's perspective, it's impossible, and it means the review
dashboard is basically unusable without complex queries to weed out the
active reviews from the zombies.

> > Can it be done on a per project basis?
> 
> It can, by running your own... but again it seems far better for
> core reviewers to decide if a change has potential or needs to be
> abandoned--that way there's an accountable human making that
> deliberate choice rather than the review team hiding behind an
> automated process so that no one is to blame for hurt feelings
> besides the infra operators who are enforcing this draconian measure
> for you.

With all the threads about core-reviewer overload, I think it's
unreasonable to expect us to scrub the review queue making daily judgements
on whether a patch should be abandoned, and I'd argue that a human
abandoning another human's patch has a much more demotivating impact on
contributors than a clearly documented automated process requiring that you
address negative review comments within a set period or your review will
expire.

If you think about mailing list patch workflow, it's similar - you post
your patch, get email review feedback, then post new reviews with fixes.
If you fail to post new reviews with fixes, your review falls out the
bottom of people's inboxes and you don't get your patch merged.

> 
> > To make the whole process a little friendlier we could increase
> > the time frame from 1 week to 2.
> 
> How about just automatically abandon any new change as soon
> as it's published, and if the contributor really feels it's
> important they'll unabandon it.

I think that's a pretty unfair comment - all reviewers and most
contributors are working really hard; all we're asking for is that
contributors work with reviewers to get their patch into shape within a
reasonable time. :(

As someone who's drinking from the firehose every day with a seemingly
insurmountable review queue, I'd rather we worked towards processes which
help us manage the workload in a sustainable way, rather than turning the
review queue into the zombie-soup-of-despair it currently is.

All we need is two things:

1. A way to automatically escalate reviews which have no feedback at all
from core reviewers within a reasonable time (say a week or two)

2. A way to automatically remove reviews from the queue which have core
negative feedback with no resolution within a reasonable time (say a month
or several weeks, so it's not perceived as contributor-hostile).
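
A rough sketch of those two checks, assuming the open changes have already
been fetched (e.g. via a Gerrit query) and that we know who the core
reviewers are; the data shapes and names here are simplified assumptions:

    # Illustrative only: split open reviews into "escalate" (no core feedback
    # yet) and "expire" (core -1 left unanswered) buckets.
    from datetime import datetime, timedelta

    CORES = {"alice", "bob"}          # placeholder core reviewer usernames
    NOW = datetime(2014, 9, 10)

    changes = [
        {"id": 1, "created": datetime(2014, 8, 20),
         "last_patchset": datetime(2014, 8, 20), "votes": []},
        {"id": 2, "created": datetime(2014, 7, 1),
         "last_patchset": datetime(2014, 7, 1),
         "votes": [("alice", -1, datetime(2014, 7, 2))]},
    ]

    escalate, expire = [], []
    for c in changes:
        core_votes = [v for v in c["votes"] if v[0] in CORES]
        if not core_votes and NOW - c["created"] > timedelta(days=14):
            escalate.append(c["id"])       # (1) no core feedback in two weeks
        elif any(score < 0 and NOW - when > timedelta(days=30)
                 and c["last_patchset"] < when
                 for _, score, when in core_votes):
            expire.append(c["id"])         # (2) core -1 unanswered for a month
    print("escalate:", escalate, "expire:", expire)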

Note (2) still allows "core reviewers to decide if a change has potential";
it just becomes opt-in, e.g. we have to address the issues which prevent us
giving it positive feedback, either by directly working with the
contributor, or by delegating the work to an active contributor if the
original patch author has decided not to continue work on the patch.

Ultimately, it's not just about reviews - who's going to maintain all these
features if the author is not committed enough to post even one patch a month
while getting it through the review process? Oh wait, that'll be another
job for the core team, won't it?

Steve

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] Review metrics - what do we want to measure?

2014-09-10 Thread Ben Nemec
On 09/10/2014 03:57 AM, Steven Hardy wrote:
> On Thu, Sep 04, 2014 at 01:54:20PM +, Jeremy Stanley wrote:
>> On 2014-09-04 11:01:55 +0100 (+0100), Derek Higgins wrote:
>> [...]
>>> How would people feel about turning [auto-abandon] back on?
>>
>> A lot of reviewers (myself among them) feel auto-abandon was a
>> cold and emotionless way to provide feedback on a change. Especially
>> on high-change-volume projects where core reviewers may at times get
>> sucked into triaging other problems for long enough that the
>> auto-abandoner kills lots of legitimate changes (possibly from
>> new contributors who will get even more disgusted by this than the
>> silence itself and walk away indefinitely with the impression that
>> we really aren't a welcoming development community at all).
> 
> I can understand this argument, and perhaps an auto-abandon with a long
> period like say a month for the submitter to address comments and reset the
> clock would be a reasonable compromise?
> 
> The problem we have now, is there is a constantly expanding queue of zombie
> reviews, where the submitter got some negative feedback and, for whatever
> reason, has chosen not to address it.
> 
> For example, in my "Incoming reviews" queue on the gerrit dashboard, I've
> got reviews I (or someone) -1'd over four months ago, with zero feedback
> from the submitter, what value is there to these reviews cluttering up the
> dashboard for every reviewer?
> 
> To make matters worse Jenkins comes along periodically and rechecks or
> figures out the patch merge failed and bumps the zombie back up to the top
> of the queue - obviously I don't realise that it's not a human comment I
> need to read until I've expended effort clicking on the review and reading
> it :(
> 
> From a reviewer perspective, it's impossible, and means the review
> dashboard is basically unusable without complex queries to weed out the
> active reviews from the zombies.
> 
>>> Can it be done on a per project basis?
>>
>> It can, by running your own... but again it seems far better for
>> core reviewers to decide if a change has potential or needs to be
>> abandoned--that way there's an accountable human making that
>> deliberate choice rather than the review team hiding behind an
>> automated process so that no one is to blame for hurt feelings
>> besides the infra operators who are enforcing this draconian measure
>> for you.
> 
> With all the threads about core-reviewer overload, I think it's
> unreasonable to expect us to scrub the review queue making daily judgements
> on whether a patch should be abandoned, and I'd argue that a human
> abandoning another human's patch has much more demotivating impact on
> contributors than a clearly documented automated process that you must
> address negative review comments within a set period or your review will
> expire.
> 
> If you think about mailing list patch workflow, it's similar - you post
> your patch, get email review feedback, then post new reviews with fixes.
> If you fail to post new reviews with fixes, your review falls out the
> bottom of people's inboxes and you don't get your patch merged.
> 
>>
>>> To make the whole process a little friendlier we could increase
>>> the time frame from 1 week to 2.
>>
>> How about just automatically abandon any new change as soon
>> as it's published, and if the contributor really feels it's
>> important they'll unabandon it.
> 
> I think that's a pretty unfair comment - all reviewers and most
> contributors are working really hard, all we're asking for is that
> contributors work with reviewers to get their patch into shape withing a
> reasonable time. :(
> 
> As someone who's drinking from the firehose every day with a seemingly
> insurmountable review queue, I'd rather we worked towards processes which
> help us manage the workload in a sustainable way, rather than turning the
> review queue into the zombie-soup-of-dispair it currently is.
> 
> All we need is two things:
> 
> 1. A way to automatically escalate reviews which have no feedback at all
> from core reviewers within a reasonable time (say a week or two)
> 
> 2. A way to automatically remove reviews from the queue which have core
> negative feedback with no resolution within a reasonable time (say a month
> or several weeks, so it's not percieved contributor-hostile).

A suggestion Robert made during that discussion was to have auto-WIP
instead of auto-abandon.  That should be less hostile to contributors
and still makes it easy to filter out the reviews that aren't ready to
merge anyway.  For me personally, and for the sake of tracking the
stats, that would be sufficient to address the problem.
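
For reference, "WIP" here means a Workflow -1 vote on the change (the
-workflow Robert mentioned earlier), so an auto-WIP job could simply post
that vote through the standard Gerrit REST review endpoint. A minimal
sketch - the change number and credentials are placeholders, and a real job
would run as a bot account:

    # Illustrative only: mark a change work-in-progress by setting Workflow -1.
    import requests

    GERRIT = "https://review.openstack.org"
    CHANGE = "12345"  # placeholder change number

    payload = {
        "message": "Auto-WIP: a core reviewer left a -1 more than 14 days ago "
                   "with no response. Please address the comments and reset "
                   "the Workflow vote when ready.",
        "labels": {"Workflow": -1},
    }
    requests.post(
        GERRIT + "/a/changes/{}/revisions/current/review".format(CHANGE),
        json=payload,
        # Placeholder credentials; the required auth scheme (digest vs. basic)
        # depends on the Gerrit version.
        auth=("bot-user", "http-password"),
    )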

> 
> Note (2) still allows "core reviewers to decide if a change has potential",
> it just becomes opt-in, e.g we have to address the issues which prevent us
> giving it positive feedback, either by directly working with the
> contributor, or delegating the work to an active contributor if the
> original patch author has