Hi Sean, and Happy New Year :-)

> -----Original Message-----
> From: Sean Dague [mailto:[email protected]]
> Sent: 30 December 2013 22:05
> To: Day, Phil; OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] minimum review period for functional
> changes that break backwards compatibility
> 
> On 12/29/2013 07:58 PM, Day, Phil wrote:
> > Hi Sean,
> >
> > I'm not convinced the comparison to my clean shut down change is valid
> here.  For sure that proved that beyond a certain point (in that case months)
> there is no additional value in extending the review period, and no amount
> of review will catch all problems,  but that's not the same as saying that 
> there
> is no value in a minimum period.
> 
> The reason I brought up graceful shutdown was that I actually think that
> review went the opposite direction, and got worse over time.
> 
> Borrowing from another part of Robert's thread -
> https://review.openstack.org/#/q/status:open+-Verified-1+-CodeReview%2B2+-CodeReview%2B1+-CodeReview-1+-CodeReview-2+(project:openstack/nova+OR+project:openstack/python-novaclient)+branch:master,n,z
> 
> Minimum review time only works if all patches eventually see review, which
> they don't.

I agree that the controlled shutdown didn't improve over time, although I don't 
think it got worse. It's ironic in a way that the controlled shutdown issue is 
one which adversely affected the gate tests but wouldn't, I think, have affected 
any real workloads (the problem is that if you stop an instance immediately after 
it's been created, before the GuestOS is running, it doesn't see the shutdown 
signal and so waits for the full 2 minute period), whereas the ext4 change 
improves the gate tests but breaks some production systems.
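
To illustrate the failure mode, here's a minimal sketch (not the real Nova 
code - the class and method names are invented for the example): an instance 
whose GuestOS hasn't booted yet never acts on the shutdown request, so the 
wait loop only exits when the 2 minute timeout expires:

    import time

    SHUTDOWN_TIMEOUT = 120   # the "full 2 minute period" mentioned above
    RETRY_INTERVAL = 1       # seconds between power-state checks (assumed)

    class Instance:
        """Hypothetical stand-in for a compute instance."""
        def __init__(self, guest_os_running):
            self.guest_os_running = guest_os_running
            self.state = 'RUNNING'

        def request_acpi_shutdown(self):
            # A guest that hasn't booted yet never sees the ACPI signal.
            if self.guest_os_running:
                self.state = 'SHUTDOWN'

        def power_state(self):
            return self.state

    def clean_shutdown(instance):
        instance.request_acpi_shutdown()
        deadline = time.time() + SHUTDOWN_TIMEOUT
        while time.time() < deadline:
            if instance.power_state() == 'SHUTDOWN':
                return True    # guest honoured the shutdown signal
            time.sleep(RETRY_INTERVAL)
        return False           # timed out; caller falls back to a hard stop

    # clean_shutdown(Instance(guest_os_running=False)) spins for the full
    # two minutes; with guest_os_running=True it returns almost immediately.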

However, whilst I agree that too long is bad, I don't think that's inconsistent 
with too short also being bad - it seems to me that there is a sweet spot 
somewhere between the two extremes, and that it probably depends on the type of 
change being proposed.

> 
> > In this particular case if the review has been open for say three weeks then
> imo the issue would have been caught, as I spotted it as soon as I saw the
> merge.  As it wasn't an urgent bug fix I don't see a major gain from not
> waiting even if there wasn't a problem.
> 
> So... then why didn't you spot it on the submit? And would you have found
> the review on your own if it hadn't been for the merge commit email?
> 

If I'd been working during the four days that the patch was open for review, I'm 
confident that I would have spotted it - I make a point of looking out for any 
changes which might break our production system.  
It wasn't the merge commit e-mail that made me notice it, BTW; I made a point 
of looking at the recently merged changes in Gerrit to see if there was 
anything significant that I'd missed.


But this thread wasn't created to get a review cadence that takes my holiday 
plans into account; it was more to open a discussion about whether there are 
different review strategies we should adopt to take into account the different 
types of changes that we now see. There was a time when Nova was a lot 
simpler: there were very few production deployments, there were far fewer 
active reviewers, and probably only the cores knew all aspects of the system, 
so two +2s and some +1s were enough to ensure a thorough review. I'd suggest 
that things are different now and it would be worthwhile to identify a few 
characteristics and see if there isn't scope for a few different types of merge 
criteria. For example (and I'd really like to hear other ideas and suggestions 
here) perhaps changes could be characterised / prioritised by one or more of the 
following (there's an illustrative sketch of this after the list):

- High priority bug fixes: obvious why these need to be reviewed and merged 
quickly - two +2s are sufficient

- Approved BPs targeted for a specific release: by being approved the design 
has had a level of scrutiny, and it's important to consumers of Nova that the 
release roadmap is as reliable as possible, so these do need to get attention. 
However, a lot of design detail often only emerges in the code, so these 
should be open to more scrutiny. Maybe there should be an additional soft 
limit of at least 4 or 5 +1s

- Normal bug fixes: need to keep these ticking through, but we should pause to 
make sure there is a representative range of reviews - for example, a soft limit 
of at least 3 +1s

- Changes in default behaviour: always likely to affect existing systems in 
some way. Maybe we should have an additional type of review vote that comes 
from people who are recognised as representing large production deployments?
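
To make that concrete, here's a purely illustrative sketch of how such 
per-category criteria might be expressed - the category names, the thresholds 
and the "operator +1" vote are all assumptions taken from the list above, not 
an existing Gerrit feature:

    # Policy-as-data for the categories suggested above. The thresholds are
    # the soft limits proposed in the list; 'operator_plus1' stands for the
    # hypothetical vote from recognised large-deployment operators.
    MERGE_CRITERIA = {
        'high_priority_bugfix':     {'plus2': 2, 'plus1': 0, 'operator_plus1': 0},
        'approved_blueprint':       {'plus2': 2, 'plus1': 4, 'operator_plus1': 0},
        'normal_bugfix':            {'plus2': 2, 'plus1': 3, 'operator_plus1': 0},
        'default_behaviour_change': {'plus2': 2, 'plus1': 2, 'operator_plus1': 1},
    }

    def ready_to_merge(category, votes):
        """votes: e.g. {'plus2': 2, 'plus1': 5, 'operator_plus1': 0}"""
        required = MERGE_CRITERIA[category]
        return all(votes.get(kind, 0) >= n for kind, n in required.items())

Under a scheme like that, a high priority bug fix with two +2s merges straight 
away, while a change in default behaviour with the same votes waits until 
someone representing a large production deployment has also +1'd it.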


> It was urgent from a tripleo perspective, enough so that they were carrying
> an out of tree patch for it until it merged. Remember, we're trying to get
> tripleo, eventually, gating. And 45 mins deploy times was a big fix to move
> that ball forward. That's why I prioritized that.
> 
> So while it was a low priority change for your environment, don't assume that
> it wasn't really needed as part of CI.
> 
> > I'm all for continually improving the gate tests, but in this case they 
> > would
> need to be testing against a system that had been running before the
> change, to test specifically that new vms got the new fs, so there would have
> needed to be a matching test added to grenade as part of the same commit.
> >
> > Not quite sure where the number of open changes comes in either, just
> > because there are a lot of proposed changes doesn't to me mean we need
> > to rush the non urgent ones, it means we maybe need better
> > prioritisation.  There are plenty of long lived bug fixes sitting with
> > many +1s
> 
> Again, you keep assuming this was non urgent, and your assumption that no
> one should have been looking at it follows from that. However this was
> effectively communicated by Robert as a high priority issue for tripleo CD
> testing, enough so that they were going to run a nova fork until it merged, so
> it was treated as such.
> 
> So please step back from the assumption that this was a non urgent change,
> because the tone in IRC was anything but. So it was treated as something
> that needed to move fast. The fact that it was a super small change, saw no
> negative reviews, no negative email comments, means I'm not sure we
> would have done any better. The point was, with no negative feedback it
> didn't really occur to anyone that it was a backwards compatibility problem. I
> was *actually* looking for someone to say something about why it would be
> a problem, and there was nothing. So I assumed we had exhausted the set
> of issues.

I agree, and have already acknowledged, that you did wait to see if there were 
any negative comments, but it does seem that you then made the assumption that 
the one person who responded with a +1 before you gave it a +2 meant that others 
were either not interested, or had looked and decided to neither +1 nor -1 the 
change. I think (and hindsight is always wonderful) that waiting for 
feedback should mean waiting for a number of reviews, or, as in the case with 
Neutron-type changes, waiting for reviews from particular groups of reviewers.

> 
> So you are really making 2 assumptions here that aren't valid:
>   * it was known to be a backwards compatibility problem - because it wasn't,
> and no one provided feedback over the course of 4 days to indicate that it
> was (there were some alternative implementation ideas, but no one said
> "hold on, this could break X")

That's not really an assumption - I'm saying it would have been spotted had 
there been more reviews, because I know I would have spotted it, as I think 
would some of my colleagues. We've been looking at a number of problems with 
backing files recently, including the security fix we posted a week or so ago, 
and have been working on how to configure some of our images to use ext4 - so 
it was an area that was fresh in our minds.

>   * it wasn't urgent - and like I said the tone really was towards this
> being an important need for tripleo
>

I'm assuming (or maybe asserting would be a better term) that it wasn't urgent 
because:

- Nothing in the commit message suggests it's urgent - it describes it more as a 
major performance enhancement

- It's not linked to a high priority bug or an approved blueprint

- The capability to use ext4 already existed via configuration options. If 
this was needed for the TripleO gate tests (and again, there's no mention of 
this in the commit itself) then why couldn't the tests have been configured to 
use ext4 rather than changing the default? (One way of doing that is sketched 
below.)
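
For instance - assuming I've remembered the option name correctly (I believe 
it's default_ephemeral_format) - the test environment could have opted in to 
ext4 in nova.conf without any code change:

    # nova.conf (assumed option name) -- format new ephemeral disks as ext4
    # for this deployment only, leaving the code default alone for everyone
    # else.
    [DEFAULT]
    default_ephemeral_format = ext4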

So no, sorry - I'm not yet willing to back away from my assumption that this 
wasn't an urgently needed fix, especially because of the last point.

It may have been the tone in IRC, but should I really be expected to look there 
to understand the background to a submitted change?


> Which is why I don't think saying 3 week minimum review time is helpful,
> because if any of the reviewers had thought it was backwards incompatible,
> they would have -1ed it. Humans only get so much of that right.
>

I think I agree that a time limit is the wrong way to go - as this case shows, 
the number of active reviewers varies a lot throughout the year. Something 
based instead on the number and type of reviewers (and maybe we need a 
distinction now that goes beyond just core and non-core) would probably be more 
workable.


> So we need to handle this defensively and beef up the automated systems.
> Does that require more people adding in defensive tests to those scenarios?
> Definitely. The fact that right now basically Dean and I are the only ones
> that understand our upgrade testing (with a few others bootstrapping) is way
> too few people touching that problem.
>
> If seamless upgrade is the most important thing to people, we need more
> people in the trenches on the tooling that helps us verify that.
> 
>       -Sean
As I said before, I'm not arguing against improving the automated tests, but of 
course we also have to be cognisant that extending the time taken to run tests 
is in itself a problem. And however much effort we can divert into adding 
tests, it will take time before that gets us improved coverage of this kind of 
guest / upgrade issue.

In the meantime, as you and Rob have pointed out, the review process clearly 
isn't working as well as we'd all like it to, with some changes getting no 
reviews and others sitting for weeks with many +1s (for example, here's a 
bug fix with seven +1s that has been stuck since 6th Dec: 
https://review.openstack.org/#/c/56258/ ) - none of which seems to be really 
correlated with the relative priority of the proposed changes. So I'm 
suggesting that we look at what can be done to improve the review process now 
while the automated testing is improved.

Cheers,
Phil
