On 12/29/2013 07:58 PM, Day, Phil wrote:
Hi Sean,

I'm not convinced the comparison to my clean shut down change is valid here.  
For sure that proved that beyond a certain point (in that case months) there is 
no additional value in extending the review period, and no amount of review 
will catch all problems,  but that's not the same as saying that there is no 
value in a minimum period.

The reason I brought up graceful shutdown was that I actually think that review went the opposite direction, and got worse over time.

Borrowing from another part of Robert's thread - https://review.openstack.org/#/q/status:open+-Verified-1+-CodeReview%2B2+-CodeReview%2B1+-CodeReview-1+-CodeReview-2+(project:openstack/nova+OR+project:openstack/python-novaclient)+branch:master,n,z

Minimum review time only works if all patches eventually see review, which they don't.

In this particular case if the review has been open for say three weeks then 
imo the issue would have been caught, as I spotted it as soon as I saw the 
merge.  As it wasn't and urgent bug fix I don't see a major gain from not 
waiting even if there wasn't a problem.

So... then why didn't you spot it on the submit? And would you have found the review on your own if it hadn't been for the merge commit email?

It was urgent from a tripleo perspective, enough so that they were carrying an out of tree patch for it until it merged. Remember, we're trying to get tripleo, eventually, gating. And 45 mins deploy times was a big fix to move that ball forward. That's why I prioritized that.

So while it was a low priority change for your environment, don't assume that it wasn't really needed as part of CI.

I'm all for continually improving the gate tests, but in this case they would 
need to be testing against a system that had been running before the change, to 
test specifically that new vms got the new fs, so there would have needed to be 
a matching test added to grenade as part of the same commit.

Not quite sure where the number of open changes comes in either, just because 
there are a lot of proposed changes doesn't to me mean we need to rush the non 
urgent ones, it mwans we maybe need better prioritisation.  There are plenty of 
long lived buf fixes siting with many +1s

Again, you keep assuming this was non urgent, and your assumptions that no one should have been looking at it were because of that. However this was effectively communicated by Robert a high priority issue on trippleo CD testing, enough so that they were going to run a nova fork until it merged, so it was treated as such.

So please step back from the assumption that this was a non urgent change, because the tone in IRC was anything but. So it was treated as something that needed to move fast. The fact that it was a super small change, saw no negative reviews, no negative email comments, means I'm not sure we would have done any better. The point was, with no negative feedback it didn't really occur to anyone that it was a backwards compatibility problem. I was *actually* looking for someone to say something about why it would be a problem, and there was nothing. So I assumed we had exhausted the set of issues.

So you are really making 2 assumptions here that aren't valid:
* it was known to be a backwards compatibility problem - because it wasn't, and no one provided feedback over the course of 4 days to indicate that it was (there were some alternative implementation ideas, but no one said "hold on, this could break X") * it wasn't urgent - and like I said the tone really was towards this being an important need for trippleo

Which is why I don't think saying 3 week minimum review time is helpful, because if any of the reviewers had thought it was backwards incompatible, they would have -1ed it. Humans only get so much of that right.

So we need to handle this defensively and beef up the automated systems. Does that require more people adding in defensive tests to those scenarios? definitely. The fact that right now basically Dean and I are the only ones that understand our upgrade testing (with a few others bootstrapping) is way too few people touching that problem.

If seemless upgrade is the most important thing to people, we need more people in the trenches on the tooling that helps us verify that.

        -Sean

--
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to