On April 22, 2011, John Arbash Meinel wrote:
> On 04/21/2011 10:16 PM, Francis J. Lacoste wrote:
> > (Posted on the blog at
> > http://blog.launchpad.net/general/5-9-23-51-and-other-numbers)
> >
> > We are now two months away from our next Thunderdome. How are we doing in
> > regards with the objectives set for that milestone? You may recall from my
> > last post the objectives:
> >
> > * have no timeouts with a cut-off at 9s;
> > * have an empty critical bugs queue;
> > * getting a slot free on our ‘Next’ queue.
> >
> > We practically achieved the first objective! Today, we lowered the hard
> > timeout to 9s and this didn’t increase our number of daily timeouts. We
> > don’t have zero timeouts yet. We still have a fair bunch of timeout bugs
> > to fix. But we get on average 650 requests timing out in a day. That’s
> > less than 0.0001% of our traffic.
>
> 650 / 8M = 0.008% (0.00008 as a fraction). But still, very well done.

Doh, well spotted! I've made the correction on the blog.

> > These remaining timeout bugs are part of our second objective. On that
> > front, we are in a more difficult position. We have 259 critical bugs to
> > close. That went up since last time! What went wrong? Well, we had less
> > people working on critical bugs for once. That’s been fixed this week
> > when the Orange squad rotated back on maintenance. We again have two
> > full squads working on critical bugs. Second, we modified our OOPS
> > reporting to show all timeouts happening, not only the ones occurring
> > the most often. That resulted in about 30 new timeouts filed. (See the
> > high red bar at the start of the graph). Fortunately for us, the rate
> > of new critical bugs is declining. We are at about 23 on average in the
> > last two weeks. That’s still high and some of those are related to JS
> > regressions escaping to production because our Windmill test
> > infrastructure is disabled. This means that 51 is now the magic number.
> > We need to close 51 of these critical bugs per week to reach 0 by the
> > Thunderdome. That was the number we closed in our best week, just before
> > the number of people working on criticals was reduced. So we’ll also
> > need to reduce the number of new critical bugs found each week to
> > succeed here.
>
> I realize 259 is a lot, enough that it is hard to get a handle on. Have
> you gone through them at all to see if there are bulk-fixes, things that
> are already fixed, etc. I'm certainly guessing there is a fair bulk that
> are going to be similar in effort, and a really long tail of ones that
> are hard to handle (like problematic timeout pages, etc.) It would be
> interesting to get a feeling for where the knee of the curve is, though.

Robert has recently retriaged those, so he would be in a better position to
comment on the redundancy we have in the queue.
The only one I know of is that a good number of the remaining timeouts seem
to happen in Python land and could be related to the GIL problem. So
hopefully they should be fixed by our move to a single-threaded deployment.

Cheers

-- 
Francis J. Lacoste
[email protected]
_______________________________________________
Mailing list: https://launchpad.net/~launchpad-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~launchpad-dev
More help   : https://help.launchpad.net/ListHelp

