Re: Some data on mozilla-inbound

2013-04-26 Thread Wesley Johnston
Maybe. I started to avoid it if possible around then, but almost 4 hours for 
results still is basically unusable.

- Wes

- Original Message -
From: Phil Ringnalda philringna...@gmail.com
To: dev-platform@lists.mozilla.org
Sent: Friday, April 26, 2013 8:01:25 AM
Subject: Re: Some data on mozilla-inbound

On 4/25/13 4:47 PM, Wesley Johnston wrote:
 Requesting one set of tests on one platform is a 6-10 hour turnaround for me.

That's surprising. https://tbpl.mozilla.org/?tree=Try&rev=9d1daf69061d
was a midday -b do -p all -u all with a 3 hour 40 minute end-to-end.

Or did you mean, as a great many people do while discussing try these
days, "back in February when I stopped using try because it was so awful
then, requesting one set of tests..."?



Re: Some data on mozilla-inbound

2013-04-26 Thread Phil Ringnalda
On 4/26/13 8:25 AM, Wesley Johnston wrote:
 Maybe. I started to avoid it if possible around then, but almost 4 hours for 
 results still is basically unusable.

Tell me about it - that's actually the same as the end-to-end on
inbound/central. Unfortunately, engineering is totally indifferent to
things like having doubled the cycle time for Win debug browser-chrome
since last November.



Re: Some data on mozilla-inbound

2013-04-26 Thread Gavin Sharp
Bug 864085

On Fri, Apr 26, 2013 at 2:06 PM, Kartikaya Gupta kgu...@mozilla.com wrote:
 On 13-04-26 11:37 , Phil Ringnalda wrote:

  Unfortunately, engineering is totally indifferent to
 things like having doubled the cycle time for Win debug browser-chrome
 since last November.


 Is there a bug filed for this? I just cranked some of the build.json files
 through some scripts and got the average time (in seconds) for all the jobs
 run on the mozilla-central_xp-debug_test-mochitest-browser-chrome builders,
 and there is in fact a significant increase since November. This makes me
 think that we need a resource usage regression alarm of some sort too.

 builds-2012-11-01.js: 4063
 builds-2012-11-15.js: 4785
 builds-2012-12-01.js: 5311
 builds-2012-12-15.js: 5563
 builds-2013-01-01.js: 6326
 builds-2013-01-15.js: 5706
 builds-2013-02-01.js: 5823
 builds-2013-02-15.js: 6103
 builds-2013-03-01.js: 5642
 builds-2013-03-15.js: 5187
 builds-2013-04-01.js: 5643
 builds-2013-04-15.js: 6207

 kats
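A rough sketch of the kind of script kats describes - averaging job durations 
for one builder across a set of build.json dumps - might look like the 
following (the "builds"/"starttime"/"endtime"/"buildername" field names are 
assumptions about the dump layout, not taken from the actual files):

    import json
    import sys

    # Averages job durations for one builder across build.json dumps.
    # Assumed layout: each file holds a "builds" list whose entries carry
    # "starttime", "endtime" and a "properties" dict with "buildername".
    TARGET = "mozilla-central_xp-debug_test-mochitest-browser-chrome"

    for path in sys.argv[1:]:
        with open(path) as f:
            dump = json.load(f)
        durations = []
        for build in dump.get("builds", []):
            if build.get("properties", {}).get("buildername") != TARGET:
                continue
            start, end = build.get("starttime"), build.get("endtime")
            if start is not None and end is not None:
                durations.append(end - start)
        if durations:
            print("%s: %d" % (path, sum(durations) // len(durations)))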



Re: Some data on mozilla-inbound

2013-04-26 Thread Gregory Szorc
On 4/26/2013 2:06 PM, Kartikaya Gupta wrote:
 On 13-04-26 11:37 , Phil Ringnalda wrote:
  Unfortunately, engineering is totally indifferent to
 things like having doubled the cycle time for Win debug browser-chrome
 since last November.


 Is there a bug filed for this? I just cranked some of the build.json
 files through some scripts and got the average time (in seconds) for
 all the jobs run on the
 mozilla-central_xp-debug_test-mochitest-browser-chrome builders, and
 there is in fact a significant increase since November. This makes me
 think that we need a resource usage regression alarm of some sort too.

 builds-2012-11-01.js: 4063
 builds-2012-11-15.js: 4785
 builds-2012-12-01.js: 5311
 builds-2012-12-15.js: 5563
 builds-2013-01-01.js: 6326
 builds-2013-01-15.js: 5706
 builds-2013-02-01.js: 5823
 builds-2013-02-15.js: 6103
 builds-2013-03-01.js: 5642
 builds-2013-03-15.js: 5187
 builds-2013-04-01.js: 5643
 builds-2013-04-15.js: 6207

Well, wall time will [likely] increase as we write new tests. I'm
guessing (OK, really hoping) the number of mochitest files has increased
in rough proportion to the wall time? Also, aren't we executing some
tests on virtual machines now? On any virtual machine (and especially on
EC2), you don't know what else is happening on the physical machine, so
CPU and I/O steal are expected to cause variations and slowness in
execution time.

Speaking of resource usage, I've filed bug 859573 to have system
resource counters reported as part of jobs. That way, we can have a
high-level handle on whether our CPU efficiency is increasing/decreasing
over time. I'd argue that we should strive for 100% CPU saturation on
every slave (for most jobs) otherwise those CPU cycles are lost forever
and we've wasted capacity. But, that's arguably a conversation for
another thread.
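As a concrete illustration of the kind of counter bug 859573 asks for, a 
slave-side sampler could be as simple as the sketch below (it assumes the 
third-party psutil module is available on the slave; the real implementation 
may look nothing like this):

    import time
    import psutil  # assumed to be installed on the slave

    # Polls system-wide CPU utilization once a second while a job runs and
    # reports the average at the end, as a crude saturation metric.
    def average_cpu_during(duration_seconds, interval=1.0):
        samples = []
        deadline = time.time() + duration_seconds
        while time.time() < deadline:
            samples.append(psutil.cpu_percent(interval=interval))
        return sum(samples) / len(samples) if samples else 0.0

    if __name__ == "__main__":
        print("average CPU over 60s: %.1f%%" % average_cpu_during(60))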

While I don't have numbers off hand, one of the things I noticed was the
wall time of the various test chunks isn't as balanced as it should be.
In particular, bc tests seem to be a long pole. Perhaps we should split
them into bc-1 and bc-2? Along that vein, perhaps we could combine some
of the regular mochitest jobs, as they don't seem to take too long to
execute. Who makes these kinds of decisions?

On the subject of mochitests, I think we should really pound home the
message that mochitests should be avoided if possible. If you can move
more business logic into JSMs and test with xpcshell tests and only
write mochitests for the code that exists in the browser, that's a net
win (xpcshell tests are lighter weight and easier to run in parallel).
This would likely involve a huge shift in the way FX Team (and others)
write code and tests, so I don't expect it will be an easy sell. But,
it's a discussion we should have because the impact on test execution
times could be drastic.


Re: Some data on mozilla-inbound

2013-04-26 Thread Chris AtLee

On 14:29, Fri, 26 Apr, Gregory Szorc wrote:

On 4/26/2013 2:06 PM, Kartikaya Gupta wrote:

On 13-04-26 11:37 , Phil Ringnalda wrote:

 Unfortunately, engineering is totally indifferent to
things like having doubled the cycle time for Win debug browser-chrome
since last November.



Is there a bug filed for this? I just cranked some of the build.json
files through some scripts and got the average time (in seconds) for
all the jobs run on the
mozilla-central_xp-debug_test-mochitest-browser-chrome builders, and
there is in fact a significant increase since November. This makes me
think that we need a resource usage regression alarm of some sort too.

builds-2012-11-01.js: 4063
builds-2012-11-15.js: 4785
builds-2012-12-01.js: 5311
builds-2012-12-15.js: 5563
builds-2013-01-01.js: 6326
builds-2013-01-15.js: 5706
builds-2013-02-01.js: 5823
builds-2013-02-15.js: 6103
builds-2013-03-01.js: 5642
builds-2013-03-15.js: 5187
builds-2013-04-01.js: 5643
builds-2013-04-15.js: 6207


Well, wall time will [likely] increase as we write new tests. I'm
guessing (OK, really hoping) the number of mochitest files has increased
in rough proportion to the wall time? Also, aren't we executing some
tests on virtual machines now? On any virtual machine (and especially on
EC2), you don't know what else is happening on the physical machine, so
CPU and I/O steal are expected to cause variations and slowness in
execution time.


Those tests are still on exactly the same hardware. philor points out in 
https://bugzilla.mozilla.org/show_bug.cgi?id=864085#c0 that the 
time increase is disproportionate for win7. It would be interesting to 
look at all the other suites too.


Perhaps a regular report of how much our wall-clock times for builds and 
the different test suites have changed week-over-week would be useful?


That aside, how do we cope with an ever-increasing runtime requirement 
of tests? Keep adding more chunks?




Re: Some data on mozilla-inbound

2013-04-25 Thread Phil Ringnalda
On 4/24/13 9:50 PM, Ehsan Akhgari wrote:
 No.  But that's not what I was talking about.  Whether something lands
 directly on try is a judgement call, and some people may be better at it
 than others.  As someone who has stopped using try server as a rule
 (because of the excessive wait times there, which I find unacceptable
 for day-to-day work), I always ask myself what are the chances that this
 thing that I want to push could bounce, and I test on try only when I
can convince myself that the chances are slim.  All I was suggesting was
 give people a way to assess whether they're good at making these calls,
 and improve it if they're not.

I'm curious about what you think the wait times are, and what wait times
you would find acceptable.


Re: Some data on mozilla-inbound

2013-04-25 Thread Ehsan Akhgari

On 2013-04-25 2:42 AM, Phil Ringnalda wrote:

On 4/24/13 9:50 PM, Ehsan Akhgari wrote:

No.  But that's not what I was talking about.  Whether something lands
directly on try is a judgement call, and some people may be better at it
than others.  As someone who has stopped using try server as a rule
(because of the excessive wait times there, which I find unacceptable
for day-to-day work), I always ask myself what are the chances that this
thing that I want to push could bounce, and I test on try only when I
can convince myself that the chances are slim.  All I was suggesting was
give people a way to assess whether they're good at making these calls,
and improve it if they're not.


I'm curious about what you think the wait times are, and what wait times
you would find acceptable.


Ideally the end to end times would be the amount of time it takes to 
build + the amount of time it takes to run the slowest test suite 
requested (which would only be achievable with enough capacity.)  What 
has caused me to stop using the try server is that it is totally 
unreliable for getting results back on the *same day*, and whether or 
not you can do that depends on how everybody else is using it.


These days, I only run try server builds on things that I absolutely 
cannot test manually (e.g. when I do something which might break Windows 
on a weekend where I don't have easy access to a Windows box) or when I 
can deal with putting things off to the next day(s), as sometimes you 
need to do multiple rounds of try pushes and that makes what would 
otherwise be a few hour project for me into a week long project, which 
is devastating to my productivity.


Cheers,
Ehsan


Re: Some data on mozilla-inbound

2013-04-25 Thread Milan Sreckovic

With extremely limited experience of using try, I know that I would have at 
times set a flag "stop as soon as you hit a first red on a platform".  So, I 
really like Chris' idea below, as a manual workaround and a more powerful 
solution for that.  Easier said than done, I imagine...

Milan

On 2013-04-25, at 12:10 PM, Chris Lord cl...@mozilla.com wrote:

 Something that strikes me as very obvious that can be done to reduce load on 
 try, is to allow for jobs to be requested and cancelled in a more granular 
 fashion. Right now, I have to think before I push "What's the most I could 
 possibly need?" And if I don't request enough, I have to push an entire new 
 job!
 
 I know that I'd request a lot less from try, and request fewer jobs, if I 
 could, after I've pushed, trigger/cancel builds per platform, and 
 request/cancel particular tests.
 
 --Chris



Re: Some data on mozilla-inbound

2013-04-25 Thread Neil

Justin Lebar wrote:


Note that we don't have enough capacity to turn around current try requests 
within a reasonable amount of time.

Is this because people are requesting too much because try chooser 
simply isn't sufficiently descriptive for what people want?


--
Warning: May contain traces of nuts.


Re: Some data on mozilla-inbound

2013-04-24 Thread Ehsan Akhgari

On 2013-04-23 12:05 PM, Justin Lebar wrote:

The ratio of things landed on inbound which turn out to be busted is really
worrying


On the one hand, we're told not to push to try too much, because that
wastes resources.

On the other hand, we're told not to burn m-i, because that wastes resources.


True!


Should we be surprised when people don't get this right 100% of the time?


No.  But that's not what I was talking about.  Whether something lands 
directly on try is a judgement call, and some people may be better at it 
than others.  As someone who has stopped using try server as a rule 
(because of the excessive wait times there, which I find unacceptable 
for day-to-day work), I always ask myself what are the chances that this 
thing that I want to push could bounce, and I test on try only when I 
can convince myself that the chances are slim.  All I was suggesting was 
give people a way to assess whether they're good at making these calls, 
and improve it if they're not.



Instead of considering how to get people to strike a better
balance between wasting infra resources and burning inbound, I think
we need to consider what we can do to increase the acceptable margin
of error.


These are not either/or choices.


Note that we don't have enough capacity to turn around current try
requests within a reasonable amount of time.  Pushing to inbound is
the only way to get quick feedback on whether your patch works, these
days.  As I've said before, I'd love to see releng report on try
turnaround times, so we can hold someone accountable.  The data is
there; we just need to process it.

If we can't increase the amount of infra capacity we have, perhaps we
could use it more effectively.  We've discussed lots of ways we might
accomplish this on this newsgroup, and I've seen very few of them
tried.  Perhaps an important part of the problem is that we're not
able to innovate quickly enough on this front.


We've been asking for more infra capacity for as long as I can remember, 
and so far we've always had a shortage on that front (part of which is 
due to the continuous increase in development pace, which is a good 
thing), so I agree that the way to win this battle is to stop waiting 
for that magical day when we have enough capacity and start using it 
more efficiently.  What I suggested could be a part of that.



People are always going to make mistakes, and the purpose of processes
is to minimize the harm caused by those mistakes, not to embarrass or
cajole people into behaving better in the future.  As Jono would say,
it's not the user's fault.


Nobody's blaming the user.  We should just empower them to make better 
choices.


Cheers,
Ehsan


Re: Some data on mozilla-inbound

2013-04-24 Thread Ehsan Akhgari

On 2013-04-23 1:17 PM, Ed Morley wrote:

On 23/04/2013 17:28, Kartikaya Gupta wrote:

On 13-04-23 00:39 , Ehsan Akhgari wrote:

How hard would it be to
gather a list of the total number of patches being backed out plus the
amount of time that we spent building/testing those, hopefully in a
style similar to
http://people.mozilla.org/~catlee/highscores/highscores.html?


Not trivial, but not too difficult either. Do we have any evidence to
show that the try highscores page has made an impact in reducing
unnecessary try usage? Also I agree with Justin that if we do this it
will be very much a case of sending mixed messages. The try highscores
list says to people "don't land on try" and the backout highscores list
would say to people "always test on try".


It's worth noting that when I've contacted developers in the top 10 of
the tryserver usage leaderboard my message is not "do not use try", but
instead suggestions like:
* "please do not use -p all -u all when you only made an Android-specific
change"
* "you already did a |-p all -u all| run - on which mochitest-1 failed
on all platforms, so please don't test every testsuite on every platform
for the half dozen iterations you ran on Try thereafter" (as much as
this sounds like an extreme example, there have been cases like this)


Yes, this! ^

The messaging around this should not be to tell people "always test on 
try".  It should be to help them figure out how to make better judgement 
calls on this.  This is a skill that people develop and are not born 
with, and without data it's hard as an individual to judge how good I am 
at that.


Ehsan



Re: Some data on mozilla-inbound

2013-04-24 Thread Ehsan Akhgari

On 2013-04-24 9:14 AM, Ben Hearsum wrote:

On 04/23/13 10:21 PM, Kartikaya Gupta wrote:

On 13-04-23 19:21 , Nicholas Nethercote wrote:

- The 'inbound was closed for 15.3068% of the total time due to
bustage' number is an underestimate, in one sense.  When inbound is
closed at 10am California time, it's a lot more inconvenient to
developers than when it's busted at midnight California time.  More
than 3x, according to
http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png.


See my note 3 under the Inbound uptime section. I used exactly that
graph to weight the inbound downtime and there wasn't a significant
difference.


- Getting agreement on a significant process change is really
difficult.  Is it possible to set up a repo where a few people can
volunteer to try Kats' approach for a couple of weeks?  That would
provide invaluable experience and data.


Yeah, there are plans afoot to try this, pending sheriff approval.


If you know what you want the repo to be called I'd advise filing a
RelEng bug about it now and we can get it done without being in the
critical path later on. You can also just ask for one of the twigs to be
customized
(https://wiki.mozilla.org/ReleaseEngineering/DisposableProjectBranches).


We're planning to use the try server, I believe.

Cheers,
Ehsan



Re: Some data on mozilla-inbound

2013-04-24 Thread David Ascher
 The messaging around this should not be to tell people "always test on
 try".  It should be to help them figure out how to make better judgement
 calls on this.  This is a skill that people develop and are not born with,
 and without data it's hard as an individual to judge how good I am at that.


One idea might be to give developers feedback on the consequences of a
particular push, e.g. the AWS cost, a proxy for time during which
developers couldn't push or some other measurable metric.  Right now each
push probably feels as expensive as every other.


Re: Some data on mozilla-inbound

2013-04-24 Thread Ehsan Akhgari

On 2013-04-25 1:02 AM, David Ascher wrote:


The messaging around this should not be to tell people "always test
on try".  It should be to help them figure out how to make better
judgement calls on this.  This is a skill that people develop and
are not born with, and without data it's hard as an individual to
judge how good I am at that.


One idea might be to give developers feedback on the consequences of a
particular push, e.g. the AWS cost, a proxy for time during which
developers couldn't push or some other measurable metric.  Right now
each push probably feels as expensive as every other.


The AWS cost would be the wrong measure, since it doesn't account for 
the amount of time that 100 other people spent grinding their teeth 
because they could not push.  :-)  But yeah, I agree with the general 
idea of a cost measure, I just can't think of what a good one would be 
(well, one better than the wall-clock time...)


Ehsan


Re: Some data on mozilla-inbound

2013-04-24 Thread Justin Lebar
 One idea might be to give developers feedback on the consequences of a
 particular push, e.g. the AWS cost, a proxy for time during which
 developers couldn't push or some other measurable metric.  Right now
 each push probably feels as expensive as every other.

For tryserver, I proposed bug 848589 to do just this.  I think it's
worth trying, but someone needs to implement it.
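A minimal sketch of what that feedback could look like, assuming the per-job 
machine times for a push are already available from somewhere (the report 
format and names here are invented for illustration):

    # Builds a plain-text cost report for one try push.  'jobs' is assumed
    # to be a list of (buildername, seconds) pairs gathered elsewhere.
    def push_cost_report(pusher, rev, jobs):
        total = sum(seconds for _, seconds in jobs)
        lines = ["Your try push %s used %.1f machine-hours:"
                 % (rev, total / 3600.0)]
        lines += ["  %-60s %6d s" % (name, seconds) for name, seconds in jobs]
        return "To: %s\n\n%s" % (pusher, "\n".join(lines))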

 Nobody's blaming the user.  We should just empower them to make better 
 choices.

Okay.

I guess what's frustrating to me is that we have this problem and
essentially our only option to solve it is to change users' behavior.
I totally believe that some people could use resources much more
efficiently, but it's frustrating if changing user behavior is our
only tool.

We keep talking about this every few weeks, as though there's some
hidden solution that will emerge only after ten newsgroup threads.  In
actuality, we very likely will need to do a bunch of different things,
each having a small impact.  And in particular, I don't think we'll
solve this problem without significant work from release engineering.
If that work isn't forthcoming, I don't think we're going to make a
significant dent in this.

On Thu, Apr 25, 2013 at 1:20 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
 On 2013-04-25 1:02 AM, David Ascher wrote:


  The messaging around this should not be to tell people "always test
  on try".  It should be to help them figure out how to make better
  judgement calls on this.  This is a skill that people develop and
  are not born with, and without data it's hard as an individual to
  judge how good I am at that.


 One idea might be to give developers feedback on the consequences of a
 particular push, e.g. the AWS cost, a proxy for time during which
 developers couldn't push or some other measurable metric.  Right now
 each push probably feels as expensive as every other.


 The AWS cost would be the wrong measure, since it doesn't account for the
 amount of time that 100 other people spent grinding their teeth because they
 could not push.  :-)  But yeah, I agree with the general idea of a cost
 measure, I just can't think of what a good one would be (well, one better
 than the wall-clock time...)

 Ehsan



Re: Some data on mozilla-inbound

2013-04-23 Thread Axel Hecht

On 4/22/13 9:54 PM, Kartikaya Gupta wrote:

TL;DR:
* Inbound is closed 25% of the time
* Turning off coalescing could increase resource usage by up to 60% (but
probably less than this).
* We spend 24% of our machine resources on changes that are later backed
out, or changes that are doing the backout
* The vast majority of changesets that are backed out from inbound are
detectable on a try push


Do we know how many of these have been pushed to try, and 
passed/compiled what they'd fail later?


I expect some cost of regressions to come from merging/rebasing, and 
it'd be interesting to know how much of that you can see in the data 
window you looked at.


"Has been pushed to try" is obviously tricky to find out, in particular 
on rebases, and for patches possibly modified during the rebase.


Axel



Because of the large effect from coalescing, any changes to the current
process must not require running the full set of tests on every push.
(In my proposal this is easily accomplished with trychooser syntax, but
other proposals include rotating through T-runs on pushes, etc.).

--- Long version below ---

Following up from the infra load meeting we had last week, I spent some
time this weekend crunching various pieces of data on mozilla-inbound to
get a sense of how much coalescing actually helps us, how much backouts
hurt us, and generally to get some data on the impact of my previous
proposal for using a multi-headed tree. I didn't get all the data that I
wanted but as I probably won't get back to this for a bit, I thought I'd
share what I found so far and see if anybody has other specific pieces
of data they would like to see gathered.

-- Inbound uptime --

I looked at a ~9 day period from April 7th to April 16th. During this time:
* inbound was closed for 24.9587% of the total time
* inbound was closed for 15.3068% of the total time due to bustage.
* inbound was closed for 11.2059% of the total time due to infra.

Notes:
1) bustage and infra were determined by grep -i on the data from
treestatus.mozilla.org.
2) There is some overlap so bustage + infra != total.
3) I also weighted the downtime using checkins-per-hour histogram from
joduinn's blog at [1], but this didn't have a significant impact: the
total, bustage, and infra downtime percentages moved to 25.5392%,
15.7285%, and 11.3748% respectively.
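For reference, the weighting in note 3 amounts to something like the sketch 
below: each closed interval is scored by the typical number of pushes in the 
hours it covers rather than by raw seconds (the 24-entry histogram is a 
placeholder for the real numbers from the chart in [1]):

    from datetime import timedelta

    # Placeholder pushes-per-hour histogram (index = hour of day); the real
    # values would come from the chart referenced in [1].
    PUSHES_PER_HOUR = [1.0] * 24

    def weighted_closed_fraction(closures, range_start, range_end):
        """closures: list of (start, end) datetime pairs inside the range."""
        def weight(start, end):
            total, t = 0.0, start
            while t < end:
                next_hour = (t + timedelta(hours=1)).replace(
                    minute=0, second=0, microsecond=0)
                step_end = min(end, next_hour)
                total += PUSHES_PER_HOUR[t.hour] * (step_end - t).total_seconds()
                t = step_end
            return total
        closed = sum(weight(s, e) for s, e in closures)
        return closed / weight(range_start, range_end)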

-- Backout changes --

Next I did an analysis of the changes that landed on inbound during that
time period. The exact pushlog that I looked at (corresponding to the
same April 7 - April 16 time period) is at [2]. I removed all of the
merge changesets from this range, since I wanted to look at inbound in
as much isolation as possible.

In this range:
* there were a total of 916 changesets
* there were a total of 553 pushes
* 74 of the 916 changesets (8.07%) were backout changesets
* 116 of the 916 changesets (12.66%) were backed out
* removing all backouts and changes backed out removed 114 pushes (20.6%)

Of the 116 changesets that were backed out:
* 37 belonged to single-changeset pushes
* 65 belonged to multi-changeset pushes where the entire push was
backed out
* 14 belonged to multi-changeset pushes where the changesets were
selectively backed out

Of the 74 backout changesets:
* 4 were for commit message problems
* 25 were for build failures
* 36 were for test failures
* 5 were for leaks/talos regressions
* 1 was for premature landing
* 3 were for unknown reasons

Notes:
1) There were actually 79 backouts, but I ignored 5 of them because they
backed out changes that happened prior to the start of my range.
2) Additional changes at the end of my range may have been backed out,
but the backouts were not in my range so I didn't include them in my
analysis.
3) The 14 csets that were selectively backed out is interesting to me
because it implies that somebody did some work to identify which changes
in the push were bad, and this naturally means that there is room to
save on doing that work.

-- Merge conflicts --

I also wanted to determine how many of these changes conflicted with
each other, and how far away the conflicting changes were. I got a
partial result here but I need to do more analysis before I have numbers
worth posting.

-- Build farm resources --

Finally, I used a combination of gps' mozilla-build-analyzer tool [3]
and some custom tools to determine how much machine time was spent on
building all of these pushes and changes.

I looked at all the build.json files [4] from the 6th of April to the
17th of April and pulled out all the jobs that corresponded to the
push changesets in my range above. For this set of 553 changesets,
there were 500 (exactly!) distinct builders. 111 of these had -pgo
or _pgo in the name, and I excluded them. I created a 553x389 matrix
with the remaining builders and filled in how much time was spent on
each changeset for each builder (in case of multiple jobs, I added the
times).

Then I assumed that any empty field in the 553x389 matrix was a result
of coalescing. 
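The bookkeeping described above - a changeset x builder matrix with empty 
cells treated as coalesced jobs - can be sketched roughly as follows ('jobs' 
is assumed to already be (changeset, buildername, seconds) tuples filtered to 
the range, with the *-pgo builders dropped):

    from collections import defaultdict

    def coalescing_estimate(jobs):
        # matrix: changeset -> {builder: seconds actually spent}
        matrix = defaultdict(dict)
        builders = set()
        for cset, builder, seconds in jobs:
            builders.add(builder)
            matrix[cset][builder] = matrix[cset].get(builder, 0) + seconds

        actual = sum(sum(row.values()) for row in matrix.values())

        # Treat each empty cell as a coalesced job and price it at the
        # builder's average run time, giving an upper bound on the extra
        # machine time needed if coalescing were turned off.
        avg = {}
        for b in builders:
            times = [row[b] for row in matrix.values() if b in row]
            avg[b] = sum(times) / len(times)
        coalesced = sum(avg[b] for row in matrix.values()
                        for b in builders if b not in row)
        return actual, actual + coalesced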

Re: Some data on mozilla-inbound

2013-04-23 Thread Neil

Kartikaya Gupta wrote:

The vast majority of changesets that are backed out from inbound are 
detectable on a try push


Hopefully a push never burns all platforms because the developer tried 
it locally first, but stranger things have happened! But what I'm most 
interested in is whether patches are more likely to be backed out for 
build or test failures. Perhaps if we could optimise our use of Try then 
that would reduce the load on inbound. For example:


   * At first, the push is built on one fast and readily available
 platform (linux64 is often mentioned)
   * If this builds, then all platforms build
   * Only once all platforms have built are tests run

This would avoid running tests for pushes that are known not to build on 
all platforms.
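In scheduler terms that staging might look like the sketch below; the 
schedule_build/schedule_tests helpers are purely hypothetical hooks that block 
until their jobs finish and report success:

    # Hypothetical staged scheduling: one cheap build first, the remaining
    # builds only if it succeeds, and tests only once every build is green.
    def staged_push(push, schedule_build, schedule_tests):
        if not schedule_build(push, platform="linux64"):
            return False              # fast fail: nothing else is queued
        others = ["linux32", "macosx64", "win32", "android"]
        if not all(schedule_build(push, platform=p) for p in others):
            return False              # at least one platform failed to build
        schedule_tests(push)          # only now spend time on test suites
        return True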


--
Warning: May contain traces of nuts.


Re: Some data on mozilla-inbound

2013-04-23 Thread Jonathan Kew

On 23/4/13 09:58, Neil wrote:

Kartikaya Gupta wrote:


The vast majority of changesets that are backed out from inbound are
detectable on a try push


Hopefully a push never burns all platforms because the developer tried
it locally first, but stranger things have happened! But what I'm most
interested in is whether patches are more likely to be backed out for
build or test failures. Perhaps if we could optimise our use of Try then
that would reduce the load on inbound. For example:

* At first, the push is built on one fast and readily available
  platform (linux64 is often mentioned)
* If this builds, then all platforms build
* Only once all platforms have built are tests run

This would avoid running tests for pushes that are known not to build on
all platforms.



OTOH, it would significantly extend the time a developer has to wait 
before tryserver test results begin to appear. Which I think people 
would find discouraging.


JK



Re: Some data on mozilla-inbound

2013-04-23 Thread Ed Morley

On 23 April 2013 09:58:41, Neil wrote:

Hopefully a push never burns all platforms because the developer tried
it locally first, but stranger things have happened!


This actually happens quite often. On occasion it's due to warnings as 
errors (switched off by default on local machines due to toolchain 
differences), but more often than not the developer didn't even try 
compiling locally :-/


Given that local machine time scales linearly with the rate at which we 
hire devs (unlike our automation capacity), I think we need to work out 
why (some) people aren't doing things like compiling locally and 
running their team's directory of tests before pushing. I would hazard 
a guess that if we improved incremental build times & created mach 
commands to simplify the edit-compile-test loop, then we could cut out 
many of these obvious inbound bustage cases.



Re: Some data on mozilla-inbound

2013-04-23 Thread Gervase Markham
On 23/04/13 10:17, Ed Morley wrote:
 Given that local machine time scales linearly with the rate at which we
 hire devs (unlike our automation capacity), I think we need to work out
 why (some) people aren't doing things like compiling locally and running
 their team's directory of tests before pushing. I would hazard a guess
that if we improved incremental build times & created mach commands to
 simplify the edit-compile-test loop, then we could cut out many of these
 obvious inbound bustage cases.

That would be the carrot. The stick would be finding some way of finding
out whether a changeset was pushed to try before it was pushed to m-i.
If a developer failed to push to try and then broke m-i, we could (in a
pre-commit hook) refuse to let them commit to m-i in future unless
they'd already pushed to try. For a week, on first offence, a month on
subsequent offences :-)

This, of course, is predicated on being able to detect in real time
whether a changeset being pushed to m-i has previously been pushed to try.
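The naive form of that check - asking the try pushlog whether the exact 
changeset id has ever been pushed there - might look like the sketch below; 
note it only catches changesets landed unmodified, which is precisely the 
rebase problem mentioned above (the query parameter behaviour is an 
assumption):

    import json
    from urllib.request import urlopen

    TRY_PUSHLOG = "https://hg.mozilla.org/try/json-pushes?changeset=%s"

    def was_pushed_to_try(changeset):
        # Returns True if the try pushlog knows about this exact changeset id.
        # A rebased or amended patch gets a new id and will not be found.
        try:
            with urlopen(TRY_PUSHLOG % changeset) as resp:
                return bool(json.load(resp))
        except Exception:
            return False  # unknown changeset, network error, etc.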

Gerv



Re: Some data on mozilla-inbound

2013-04-23 Thread Chris AtLee

On 16:34, Tue, 23 Apr, Gervase Markham wrote:

On 23/04/13 10:17, Ed Morley wrote:

Given that local machine time scales linearly with the rate at which we
hire devs (unlike our automation capacity), I think we need to work out
why (some) people aren't doing things like compiling locally and running
their team's directory of tests before pushing. I would hazard a guess
that if we improved incremental build times & created mach commands to
simplify the edit-compile-test loop, then we could cut out many of these
obvious inbound bustage cases.


That would be the carrot. The stick would be finding some way of finding
out whether a changeset was pushed to try before it was pushed to m-i.
If a developer failed to push to try and then broke m-i, we could (in a
pre-commit hook) refuse to let them commit to m-i in future unless
they'd already pushed to try. For a week, on first offence, a month on
subsequent offences :-)

This, of course, is predicated on being able to detect in real time
whether a changeset being pushed to m-i has previously been pushed to try.


We've considered enforcing this using some cryptographic token. After 
you push to try and get good results, the system gives you a token you 
need to include in your commit to m-i.


Alternatively, you could indicate the try revision you pushed, and we 
could look up the results and refuse the commit based on your 
build/test results on try, or if your commit to m-i is too different 
from the push to try.
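One way to realize such a token is a keyed hash over the try revision that 
only automation can produce and a commit hook can re-verify; a minimal sketch 
(the secret and message format are invented for illustration):

    import hashlib
    import hmac

    SECRET = b"releng-only-secret"  # invented; would live on automation only

    def issue_token(try_rev):
        # Issued by automation after the try run for try_rev comes back green.
        return hmac.new(SECRET, try_rev.encode(), hashlib.sha256).hexdigest()

    def verify_token(try_rev, token):
        # Run by the m-i commit hook against the token embedded in the commit.
        return hmac.compare_digest(issue_token(try_rev), token)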


Cheers,
Chris




Re: Some data on mozilla-inbound

2013-04-23 Thread Kartikaya Gupta

On 13-04-23 11:41 , Chris AtLee wrote:

We've considered enforcing this using some cryptographic token. After
you push to try and get good results, the system gives you a token you
need to include in your commit to m-i.


... or you could just merge the cset directly from try to m-i or m-c. 
(i.e. my original proposal).


Cheers,
kats



Re: Some data on mozilla-inbound

2013-04-23 Thread Kartikaya Gupta

On 13-04-23 00:39 , Ehsan Akhgari wrote:

How hard would it be to
gather a list of the total number of patches being backed out plus the
amount of time that we spent building/testing those, hopefully in a
style similar to
http://people.mozilla.org/~catlee/highscores/highscores.html?


Not trivial, but not too difficult either. Do we have any evidence to 
show that the try highscores page has made an impact in reducing 
unnecessary try usage? Also I agree with Justin that if we do this it 
will be very much a case of sending mixed messages. The try highscores 
list says to people "don't land on try" and the backout highscores list 
would say to people "always test on try".


Cheers,
kats



Re: Some data on mozilla-inbound

2013-04-23 Thread Kartikaya Gupta

On 13-04-23 03:57 , Axel Hecht wrote:

Do we know how many of these have been pushed to try, and
passed/compiled what they'd fail later?


I haven't looked at this. It would be useful to know but short of 
pulling patches and using some similarity heuristic or manually 
examining patches I can't think of a way to get this data.



I expect some cost of regressions to come from merging/rebasing, and
it'd be interesting to know how much of that you can see in the data
window you looked at.


This is something I did try to determine, by looking at the number of 
conflicts between patches in my data window. My algorithm was basically 
this:

1) Sync a tree to the last cset in the range
2) Iterate through each push backwards, skipping merges, backouts, and 
changes that are later backed out

3) For each of these pushes, try to qpush a backout of it.
4) If the attempted qpush fails, that means there is another change that 
landed since that one that there is a merge conflict with.


The problem here is that the farther back you go the more likely it is 
that you will run into conflicting changes, because an increasing 
portion of the data window is checked for conflicts when really you 
probably only want to test some small number of changes (~30?). Using 
this approach I got 129 conflicts, and as expected, the rate at which I 
encountered conflicts went up as I went farther back. I didn't get 
around to trying the sliding window approach which I believe will give a 
more representative (and much lower) count. My code for doing this is in 
the bottom half of [1] if you (or anybody else) wants to give that a shot.
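A rough equivalent of steps 2-4, without mq, is to dry-run the reverse of each 
changeset's diff against the tip and count the failures; this sketch assumes 
hg and GNU patch on PATH and the working directory synced to the last cset in 
the range:

    import subprocess

    def backout_conflicts(changesets):
        # For each changeset, test-apply the reverse of its diff against the
        # current working copy; a failed dry run means a later change touched
        # the same lines, i.e. a backout would conflict.
        conflicts = []
        for rev in changesets:
            diff = subprocess.run(["hg", "diff", "--reverse", "-c", rev],
                                  capture_output=True).stdout
            result = subprocess.run(["patch", "-p1", "--dry-run", "--force"],
                                    input=diff, capture_output=True)
            if result.returncode != 0:
                conflicts.append(rev)
        return conflicts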


kats

[1] 
https://github.com/staktrace/mozilla-tree-analyzer/blob/master/inbound-csets.sh 
- WARNING don't *run* anything in this repo because it may do 
destructive things. Ask me if you're not sure.



Re: Some data on mozilla-inbound

2013-04-23 Thread Gavin Sharp
On Tue, Apr 23, 2013 at 8:41 AM, Chris AtLee cat...@mozilla.com wrote:
 We've considered enforcing this using some cryptographic token. After you
 push to try and get good results, the system gives you a token you need to
 include in your commit to m-i.

Sounds like the goal of this kind of solution would be to eliminate
the "developer made a bad judgement call" case, but it's not at all
clear to me that that problem is worse than the "developer overuses try
for trivial changes" or "developer needs to wait for try results
before pushing trivial fix" problem.

It's also not at all clear to me that a 13% backout rate on inbound is
a problem, because there are a lot of factors at play. Those backouts
represent wasted resources (build machine time, sheriff time,
sometimes tree-closure time), but if the alternative is wasting
developer time (needing to wait for try results unnecessarily) and
tryserver build machine time, the tradeoff becomes less clear.
Obviously different perspectives here also impact your view of those
tradeoffs.

Gavin


Re: Some data on mozilla-inbound

2013-04-23 Thread Gavin Sharp
On Tue, Apr 23, 2013 at 9:28 AM, Kartikaya Gupta kgu...@mozilla.com wrote:
 Not trivial, but not too difficult either. Do we have any evidence to show
 that the try highscores page has made an impact in reducing unnecessary try
 usage?

It's been used by people like Ed Morley to reach out to individual
developers and notify them of their impact. I'm sure that's had a
positive effect, though it seems rather difficult to measure.

Gavin


Re: Some data on mozilla-inbound

2013-04-23 Thread David Keeler
On 04/23/13 02:17, Ed Morley wrote:
 On 23 April 2013 09:58:41, Neil wrote:
 Hopefully a push never burns all platforms because the developer tried
 it locally first, but stranger things have happened!
 
 This actually happens quite often. On occasion it's due to warnings as
 errors (switched off by default on local machines due to toolchain
 differences)

I would like to know a bit more about this. Is our list of supported
toolchains so diverse that building with one version versus another will
report so many false positives as to be useless?
I enabled warnings-as-errors on my local machine after pushing something
to inbound that failed to build because of this, and I've had no
problems since then. Enabling this by default seems like an easy way to
remove instances of this problem.

 but more often than not the developer didn't even try
 compiling locally :-/

So there are instances where developers didn't use the try servers and
also didn't compile locally at all before pushing to inbound? I don't
think we as a community should be okay with that kind of irresponsible
behavior.


Re: Some data on mozilla-inbound

2013-04-23 Thread Ed Morley

On 23/04/2013 17:28, Kartikaya Gupta wrote:

On 13-04-23 00:39 , Ehsan Akhgari wrote:

How hard would it be to
gather a list of the total number of patches being backed out plus the
amount of time that we spent building/testing those, hopefully in a
style similar to
http://people.mozilla.org/~catlee/highscores/highscores.html?


Not trivial, but not too difficult either. Do we have any evidence to
show that the try highscores page has made an impact in reducing
unnecessary try usage? Also I agree with Justin that if we do this it
will be very much a case of sending mixed messages. The try highscores
list says to people "don't land on try" and the backout highscores list
would say to people "always test on try".


It's worth noting that when I've contacted developers in the top 10 of 
the tryserver usage leaderboard my message is not "do not use try", but 
instead suggestions like:
* "please do not use -p all -u all when you only made an Android-specific 
change"
* "you already did a |-p all -u all| run - on which mochitest-1 failed 
on all platforms, so please don't test every testsuite on every platform 
for the half dozen iterations you ran on Try thereafter" (as much as 
this sounds like an extreme example, there have been cases like this)

...
...


Re: Some data on mozilla-inbound

2013-04-23 Thread Boris Zbarsky

On 4/23/13 1:17 PM, David Keeler wrote:

I would like to know a bit more about this. Is our list of supported
toolchains so diverse that building with one version versus another will
report so many false positives as to be useless?


Yes.  For example a typical clang+ccache build of the tree with fatal 
warnings will fail unless you jump through deoptimize-ccache hoops, 
because things like if (FOO(x)) will warn if FOO(x) expands to (x == 5).


For another example msvc until recently didn't actually have warnings as 
errors enabled at all in many directories, so it didn't matter what you 
did with your local setup in msvc.



I enabled warnings-as-errors on my local machine after pushing something
to inbound that failed to build because of this, and I've had no
problems since then.


It _really_ depends on the exact compiler and toolchain you're using.


So there are instances where developers didn't use the try servers and
also didn't compile locally at all before pushing to inbound? I don't
think we as a community should be okay with that kind of irresponsible
behavior.


Agreed.

-Boris



Re: Some data on mozilla-inbound

2013-04-23 Thread Axel Hecht

On 4/23/13 6:35 PM, Kartikaya Gupta wrote:

On 13-04-23 03:57 , Axel Hecht wrote:

Do we know how many of these have been pushed to try, and
passed/compiled what they'd fail later?


I haven't looked at this. It would be useful to know but short of
pulling patches and using some similarity heuristic or manually
examining patches I can't think of a way to get this data.


I expect some cost of regressions to come from merging/rebasing, and
it'd be interesting to know how much of that you can see in the data
window you looked at.


This is something I did try to determine, by looking at the number of
conflicts between patches in my data window. My algorithm was basically
this:
1) Sync a tree to the last cset in the range
2) Iterate through each push backwards, skipping merges, backouts, and
changes that are later backed out
3) For each of these pushes, try to qpush a backout of it.
4) If the attempted qpush fails, that means there is another change that
landed since that one that there is a merge conflict with.

The problem here is that the farther back you go the more likely it is
that you will run into conflicting changes, because an increasing
portion of the data window is checked for conflicts when really you
probably only want to test some small number of changes (~30?). Using
this approach I got 129 conflicts, and as expected, the rate at which I
encountered conflicts went up as I went farther back. I didn't get
around to trying the sliding window approach which I believe will give a
more representative (and much lower) count. My code for doing this is in
the bottom half of [1] if you (or anybody else) wants to give that a shot.


I expect that only a part of our programmatic merge conflicts are 
actually version control merge conflicts. There are a lot of cases like 
modifications to supposedly internal properties in toolkit starting to 
get a new usecase in browser, a define changing or disappearing, etc.


All those invalidate the testing of the patch that has been done to some 
extent, and don't involve modifications to the same lines of code, which 
is all that version control catches.


Axel


Re: Some data on mozilla-inbound

2013-04-23 Thread Ehsan Akhgari

On 2013-04-23 12:50 AM, Justin Lebar wrote:

The ratio of things landed on inbound which turn out to be busted is really
worrying



* 116 of the 916 changesets (12.66%) were backed out


If 13% is really worrying, what do you think our goal should be?


Less than that?  It's really hard to come up with hard numbers as goals 
here.


Ehsan



Re: Some data on mozilla-inbound

2013-04-23 Thread Nicholas Nethercote
On Mon, Apr 22, 2013 at 12:54 PM, Kartikaya Gupta kgu...@mozilla.com wrote:
 TL;DR:
 * Inbound is closed 25% of the time
 * Turning off coalescing could increase resource usage by up to 60% (but
 probably less than this).
 * We spend 24% of our machine resources on changes that are later backed
 out, or changes that are doing the backout
 * The vast majority of changesets that are backed out from inbound are
 detectable on a try push

Thanks for collecting real data!

A collage of thoughts follow.

- The 'inbound was closed for 15.3068% of the total time due to
bustage' number is an underestimate, in one sense.  When inbound is
closed at 10am California time, it's a lot more inconvenient to
developers than when it's busted at midnight California time.  More
than 3x, according to
http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png.

- Having our main landing repo closed multiple times per day, for a
significant fraction of the time feels clownshoes-ish to me.  For this
reason, my gut feeling is that we'll end up doing something like what
Kats is suggesting.  My gut feeling is also that it won't end up
changing the infrastructure load that much.

- Any landing system that makes life harder for sheriffs is a problem.
I'm not at all certain that Kats' proposal would do that, but that's
my main worry about it.

- A process whereby developers choose which tests to run on the
official landing branch (be it inbound, or something else) feels like
a bad idea.  It's far too easy to get wrong.

- Getting agreement on a significant process change is really
difficult.  Is it possible to set up a repo where a few people can
volunteer to try Kats' approach for a couple of weeks?  That would
provide invaluable experience and data.

Nick


Re: Some data on mozilla-inbound

2013-04-23 Thread Robert O'Callahan
On Wed, Apr 24, 2013 at 11:21 AM, Nicholas Nethercote 
n.netherc...@gmail.com wrote:

 - The 'inbound was closed for 15.3068% of the total time due to
 bustage' number is an underestimate, in one sense.  When inbound is
 closed at 10am California time, it's a lot more inconvenient to
 developers than when it's busted at midnight California time.  More
 than 3x, according to
 http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png.


Although I've been known to bust inbound, I also tend to check in around
2-3am PDT.

I think it's important to remember that the optimal bustage rate for
inbound is some value greater than zero and varies depending on the time of
day. If inbound is never busted then we're wasting try resources testing
patches that have a 0.99 probability of landing safely. OTOH, whenever the
bustage rate is high enough it's difficult to get things landed, or the
sheriffs' ability to detect regressions is impacted, it's too high. That
currently seems to be the case so it seems like a good idea to use a
highscore list or something like it to exert pressure to use try more until
the situation is resolved.

Rob
-- 
“If you love those who love you, what credit is that to you? Even sinners
love those who love them. And if you do good to those who are good to you,
what credit is that to you? Even sinners do that.”


Re: Some data on mozilla-inbound

2013-04-23 Thread Kartikaya Gupta

On 13-04-23 19:21 , Nicholas Nethercote wrote:

- The 'inbound was closed for 15.3068% of the total time due to
bustage' number is an underestimate, in one sense.  When inbound is
closed at 10am California time, it's a lot more inconvenient to
developers than when it's busted at midnight California time.  More
than 3x, according to
http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png.


See my note 3 under the Inbound uptime section. I used exactly that 
graph to weight the inbound downtime and there wasn't a significant 
difference.



- Getting agreement on a significant process change is really
difficult.  Is it possible to set up a repo where a few people can
volunteer to try Kats' approach for a couple of weeks?  That would
provide invaluable experience and data.


Yeah, there are plans afoot to try this, pending sheriff approval.

Cheers,
kats


Re: Some data on mozilla-inbound

2013-04-22 Thread Ehsan Akhgari
This was a fantastic read, it almost made me shed happy tears!  Thanks a 
lot kats for doing this.


The ratio of things landed on inbound which turn out to be busted is really 
worrying, and it might be an indicator that (some?) developers have poor 
judgement on how safe their patches are.  How hard would it be to 
gather a list of the total number of patches being backed out plus the 
amount of time that we spent building/testing those, hopefully in a 
style similar to 
http://people.mozilla.org/~catlee/highscores/highscores.html?  If we 
had such a list, perhaps we could reach out to the high offenders there 
and let them know about the problem, and see if that changes these stats 
a couple of weeks from now?


Thanks!
Ehsan

On 2013-04-22 3:54 PM, Kartikaya Gupta wrote:

TL;DR:
* Inbound is closed 25% of the time
* Turning off coalescing could increase resource usage by up to 60% (but
probably less than this).
* We spend 24% of our machine resources on changes that are later backed
out, or changes that are doing the backout
* The vast majority of changesets that are backed out from inbound are
detectable on a try push

Because of the large effect from coalescing, any changes to the current
process must not require running the full set of tests on every push.
(In my proposal this is easily accomplished with trychooser syntax, but
other proposals include rotating through T-runs on pushes, etc.).

--- Long version below ---

Following up from the infra load meeting we had last week, I spent some
time this weekend crunching various pieces of data on mozilla-inbound to
get a sense of how much coalescing actually helps us, how much backouts
hurt us, and generally to get some data on the impact of my previous
proposal for using a multi-headed tree. I didn't get all the data that I
wanted but as I probably won't get back to this for a bit, I thought I'd
share what I found so far and see if anybody has other specific pieces
of data they would like to see gathered.

-- Inbound uptime --

I looked at a ~9 day period from April 7th to April 16th. During this time:
* inbound was closed for 24.9587% of the total time
* inbound was closed for 15.3068% of the total time due to bustage.
* inbound was closed for 11.2059% of the total time due to infra.

Notes:
1) bustage and infra were determined by grep -i on the data from
treestatus.mozilla.org.
2) There is some overlap so bustage + infra != total.
3) I also weighted the downtime using checkins-per-hour histogram from
joduinn's blog at [1], but this didn't have a significant impact: the
total, bustage, and infra downtime percentages moved to 25.5392%,
15.7285%, and 11.3748% respectively.

-- Backout changes --

Next I did an analysis of the changes that landed on inbound during that
time period. The exact pushlog that I looked at (corresponding to the
same April 7 - April 16 time period) is at [2]. I removed all of the
merge changesets from this range, since I wanted to look at inbound in
as much isolation as possible.

In this range:
* there were a total of 916 changesets
* there were a total of 553 pushes
* 74 of the 916 changesets (8.07%) were backout changesets
* 116 of the 916 changesets (12.66%) were backed out
* removing all backouts and changes backed out removed 114 pushes (20.6%)

Of the 116 changesets that were backed out:
* 37 belonged to single-changeset pushes
* 65 belonged to multi-changeset pushes where the entire push was
backed out
* 14 belonged to multi-changeset pushes where the changesets were
selectively backed out

Of the 74 backout changesets:
* 4 were for commit message problems
* 25 were for build failures
* 36 were for test failures
* 5 were for leaks/talos regressions
* 1 was for premature landing
* 3 were for unknown reasons

Notes:
1) There were actually 79 backouts, but I ignored 5 of them because they
backed out changes that happened prior to the start of my range.
2) Additional changes at the end of my range may have been backed out,
but the backouts were not in my range so I didn't include them in my
analysis.
3) The 14 csets that were selectively backed out is interesting to me
because it implies that somebody did some work to identify which changes
in the push were bad, and this naturally means that there is room to
save on doing that work.

-- Merge conflicts --

I also wanted to determine how many of these changes conflicted with
each other, and how far away the conflicting changes were. I got a
partial result here but I need to do more analysis before I have numbers
worth posting.

-- Build farm resources --

Finally, I used a combination of gps' mozilla-build-analyzer tool [3]
and some custom tools to determine how much machine time was spent on
building all of these pushes and changes.

I looked at all the build.json files [4] from the 6th of April to the
17th of April and pulled out all the jobs that corresponded to the
push changesets in my range above. For this set of 553 changesets,
there were 500 (exactly!) distinct 

Re: Some data on mozilla-inbound

2013-04-22 Thread Justin Lebar
 The ratio of things landed on inbound which turn out to be busted is really
 worrying

 * 116 of the 916 changesets (12.66%) were backed out

If 13% is really worrying, what do you think our goal should be?

On Tue, Apr 23, 2013 at 12:39 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
 This was a fantastic read, it almost made me shed happy tears!  Thanks a lot
 kats for doing this.

  The ratio of things landed on inbound which turn out to be busted is really
  worrying, and it might be an indicator that (some?) developers have poor
  judgement on how safe their patches are.  How hard would it be to gather a
 list of the total number of patches being backed out plus the amount of time
 that we spent building/testing those, hopefully in a style similar to
 http://people.mozilla.org/~catlee/highscores/highscores.html?  If we had
 such a list, perhaps we could reach out to the high offenders there and let
 them know about the problem, and see if that changes these stats a couple of
 weeks from now?

 Thanks!
 Ehsan


 On 2013-04-22 3:54 PM, Kartikaya Gupta wrote:

 TL;DR:
 * Inbound is closed 25% of the time
 * Turning off coalescing could increase resource usage by up to 60% (but
 probably less than this).
 * We spend 24% of our machine resources on changes that are later backed
 out, or changes that are doing the backout
 * The vast majority of changesets that are backed out from inbound are
 detectable on a try push

 Because of the large effect from coalescing, any changes to the current
 process must not require running the full set of tests on every push.
 (In my proposal this is easily accomplished with trychooser syntax, but
 other proposals include rotating through T-runs on pushes, etc.).

  --- Long version below ---

 Following up from the infra load meeting we had last week, I spent some
 time this weekend crunching various pieces of data on mozilla-inbound to
 get a sense of how much coalescing actually helps us, how much backouts
 hurt us, and generally to get some data on the impact of my previous
 proposal for using a multi-headed tree. I didn't get all the data that I
 wanted but as I probably won't get back to this for a bit, I thought I'd
 share what I found so far and see if anybody has other specific pieces
 of data they would like to see gathered.

 -- Inbound uptime --

 I looked at a ~9 day period from April 7th to April 16th. During this
 time:
 * inbound was closed for 24.9587% of the total time
 * inbound was closed for 15.3068% of the total time due to bustage.
 * inbound was closed for 11.2059% of the total time due to infra.

 Notes:
 1) bustage and infra were determined by grep -i on the data from
 treestatus.mozilla.org.
 2) There is some overlap so bustage + infra != total.
 3) I also weighted the downtime using checkins-per-hour histogram from
 joduinn's blog at [1], but this didn't have a significant impact: the
 total, bustage, and infra downtime percentages moved to 25.5392%,
 15.7285%, and 11.3748% respectively.

 -- Backout changes --

 Next I did an analysis of the changes that landed on inbound during that
 time period. The exact pushlog that I looked at (corresponding to the
 same April 7 - April 16 time period) is at [2]. I removed all of the
 merge changesets from this range, since I wanted to look at inbound in
 as much isolation as possible.

 In this range:
 * there were a total of 916 changesets
 * there were a total of 553 pushes
 * 74 of the 916 changesets (8.07%) were backout changesets
 * 116 of the 916 changesets (12.66%) were backed out
 * removing all backouts and changes backed out removed 114 pushes (20.6%)

 Of the 116 changesets that were backed out:
 * 37 belonged to single-changeset pushes
  * 65 belonged to multi-changeset pushes where the entire push was
 backed out
 * 14 belonged to multi-changeset pushes where the changesets were
 selectively backed out

 Of the 74 backout changesets:
 * 4 were for commit message problems
 * 25 were for build failures
 * 36 were for test failures
 * 5 were for leaks/talos regressions
 * 1 was for premature landing
 * 3 were for unknown reasons

 Notes:
 1) There were actually 79 backouts, but I ignored 5 of them because they
  backed out changes that happened prior to the start of my range.
 2) Additional changes at the end of my range may have been backed out,
 but the backouts were not in my range so I didn't include them in my
 analysis.
 3) The 14 csets that were selectively backed out is interesting to me
 because it implies that somebody did some work to identify which changes
 in the push were bad, and this naturally means that there is room to
 save on doing that work.

 -- Merge conflicts --

 I also wanted to determine how many of these changes conflicted with
 each other, and how far away the conflicting changes were. I got a
 partial result here but I need to do more analysis before I have numbers
 worth posting.

 -- Build farm resources --

 Finally, I used a combination of gps' mozilla-build-analyzer