Re: Some data on mozilla-inbound
Maybe. I started to avoid it if possible around then, but almost 4 hours for results is still basically unusable.

- Wes

----- Original Message -----
From: Phil Ringnalda philringna...@gmail.com
To: dev-platform@lists.mozilla.org
Sent: Friday, April 26, 2013 8:01:25 AM
Subject: Re: Some data on mozilla-inbound

On 4/25/13 4:47 PM, Wesley Johnston wrote:
> Requesting one set of tests on one platform is a 6-10 hour turnaround for me.

That's surprising. https://tbpl.mozilla.org/?tree=Try&rev=9d1daf69061d was a midday -b do -p all -u all with a 3 hour 40 minute end-to-end. Or did you mean, as a great many people do while discussing try these days, "back in February when I stopped using try because it was so awful then, requesting one set of tests..."?
Re: Some data on mozilla-inbound
On 4/26/13 8:25 AM, Wesley Johnston wrote:
> Maybe. I started to avoid it if possible around then, but almost 4 hours for results is still basically unusable.

Tell me about it - that's actually the same as the end-to-end on inbound/central. Unfortunately, engineering is totally indifferent to things like having doubled the cycle time for Win debug browser-chrome since last November.
Re: Some data on mozilla-inbound
Bug 864085

On Fri, Apr 26, 2013 at 2:06 PM, Kartikaya Gupta kgu...@mozilla.com wrote:
> On 13-04-26 11:37, Phil Ringnalda wrote:
>> Unfortunately, engineering is totally indifferent to things like having doubled the cycle time for Win debug browser-chrome since last November.
>
> Is there a bug filed for this? I just cranked some of the build.json files through some scripts and got the average time (in seconds) for all the jobs run on the mozilla-central_xp-debug_test-mochitest-browser-chrome builders, and there is in fact a significant increase since November. This makes me think that we need a resource usage regression alarm of some sort too.
>
> builds-2012-11-01.js: 4063
> builds-2012-11-15.js: 4785
> builds-2012-12-01.js: 5311
> builds-2012-12-15.js: 5563
> builds-2013-01-01.js: 6326
> builds-2013-01-15.js: 5706
> builds-2013-02-01.js: 5823
> builds-2013-02-15.js: 6103
> builds-2013-03-01.js: 5642
> builds-2013-03-15.js: 5187
> builds-2013-04-01.js: 5643
> builds-2013-04-15.js: 6207
>
> kats
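A minimal sketch of the kind of script kats describes, assuming each builds-*.js snapshot is JSON with a top-level "builds" list whose entries carry starttime, endtime, and buildername fields (those field names are an assumption about the build.json layout, not a verified schema):

  import json
  import sys

  # Average job duration (in seconds) per snapshot file for one builder.
  BUILDER = "mozilla-central_xp-debug_test-mochitest-browser-chrome"

  for path in sys.argv[1:]:
      with open(path) as f:
          builds = json.load(f)["builds"]
      durations = [b["endtime"] - b["starttime"]
                   for b in builds
                   if b.get("buildername") == BUILDER
                   and b.get("starttime") and b.get("endtime")]
      if durations:
          print("%s: %d" % (path, sum(durations) // len(durations)))

Run against a series of snapshot files (builds-2012-11-01.js, builds-2012-11-15.js, ...) this would reproduce a table like the one above.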
Re: Some data on mozilla-inbound
On 4/26/2013 2:06 PM, Kartikaya Gupta wrote:
> Is there a bug filed for this? I just cranked some of the build.json files through some scripts and got the average time (in seconds) for all the jobs run on the mozilla-central_xp-debug_test-mochitest-browser-chrome builders, and there is in fact a significant increase since November. This makes me think that we need a resource usage regression alarm of some sort too.
>
> builds-2012-11-01.js: 4063
> builds-2012-11-15.js: 4785
> builds-2012-12-01.js: 5311
> builds-2012-12-15.js: 5563
> builds-2013-01-01.js: 6326
> builds-2013-01-15.js: 5706
> builds-2013-02-01.js: 5823
> builds-2013-02-15.js: 6103
> builds-2013-03-01.js: 5642
> builds-2013-03-15.js: 5187
> builds-2013-04-01.js: 5643
> builds-2013-04-15.js: 6207

Well, wall time will [likely] increase as we write new tests. I'm guessing (OK, really hoping) the number of mochitest files has increased in rough proportion to the wall time?

Also, aren't we executing some tests on virtual machines now? On any virtual machine (and especially on EC2), you don't know what else is happening on the physical machine, so CPU and I/O steal are expected to cause variations and slowness in execution time.

Speaking of resource usage, I've filed bug 859573 to have system resource counters reported as part of jobs. That way, we can have a high-level handle on whether our CPU efficiency is increasing/decreasing over time. I'd argue that we should strive for 100% CPU saturation on every slave (for most jobs), otherwise those CPU cycles are lost forever and we've wasted capacity. But that's arguably a conversation for another thread.

While I don't have numbers off hand, one of the things I noticed was that the wall time of the various test chunks isn't as balanced as it should be. In particular, bc tests seem to be a long pole. Perhaps we should split them into bc-1 and bc-2? Along that vein, perhaps we could combine some of the regular mochitest jobs, as they don't seem to take too long to execute. Who makes these kinds of decisions?

On the subject of mochitests, I think we should really pound home the message that mochitests should be avoided if possible. If you can move more business logic into JSMs and test with xpcshell tests, and only write mochitests for the code that exists in the browser, that's a net win (xpcshell tests are lighter weight and easier to run in parallel). This would likely involve a huge shift in the way FX Team (and others) write code and tests, so I don't expect it will be an easy sell. But it's a discussion we should have, because the impact on test execution times could be drastic.
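If the browser-chrome suite were split into bc-1 and bc-2 as suggested above, one way to balance the halves is greedy longest-processing-time partitioning over historical per-file runtimes. A sketch under that assumption; the file names and timings below are invented for illustration:

  import heapq

  def split_into_chunks(runtimes, n_chunks=2):
      # Greedy LPT partition: hand each test file (slowest first)
      # to whichever chunk is currently lightest.
      heap = [(0, i, []) for i in range(n_chunks)]
      heapq.heapify(heap)
      for name, secs in sorted(runtimes.items(), key=lambda kv: -kv[1]):
          total, i, files = heapq.heappop(heap)
          files.append(name)
          heapq.heappush(heap, (total + secs, i, files))
      return sorted(heap, key=lambda chunk: chunk[1])

  # Hypothetical per-file runtimes, in seconds.
  runtimes = {"browser_a.js": 340, "browser_b.js": 120,
              "browser_c.js": 95, "browser_d.js": 300}
  for total, i, files in split_into_chunks(runtimes):
      print("bc-%d: %ds %s" % (i + 1, total, files))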
Re: Some data on mozilla-inbound
On 14:29, Fri, 26 Apr, Gregory Szorc wrote:
> Also, aren't we executing some tests on virtual machines now? On any virtual machine (and especially on EC2), you don't know what else is happening on the physical machine, so CPU and I/O steal are expected to cause variations and slowness in execution time.

Those tests are still on exactly the same hardware. philor points out in https://bugzilla.mozilla.org/show_bug.cgi?id=864085#c0 that the time increase is disproportionate for win7. It would be interesting to look at all the other suites too. Perhaps a regular report of how much our wall-clock times for builds and the different test suites have changed week-over-week would be useful?

That aside, how do we cope with an ever-increasing runtime requirement of tests? Keep adding more chunks?
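The week-over-week report could start as a comparison of per-builder averages between two snapshots, reusing the assumed builds-*.js layout from the sketch earlier in the thread and flagging anything that moved by 10% or more:

  import json
  from collections import defaultdict

  def builder_averages(path):
      # Average duration (seconds) per builder in one snapshot file;
      # the field names are assumptions about the build.json layout.
      sums = defaultdict(lambda: [0, 0])
      with open(path) as f:
          for b in json.load(f)["builds"]:
              if b.get("starttime") and b.get("endtime"):
                  entry = sums[b.get("buildername", "unknown")]
                  entry[0] += b["endtime"] - b["starttime"]
                  entry[1] += 1
      return {k: v[0] / v[1] for k, v in sums.items()}

  def week_over_week(last_week, this_week, threshold=0.10):
      old, new = builder_averages(last_week), builder_averages(this_week)
      for builder in sorted(set(old) & set(new)):
          delta = (new[builder] - old[builder]) / old[builder]
          if abs(delta) >= threshold:  # only report moves of 10% or more
              print("%+6.1f%%  %s" % (delta * 100, builder))

  week_over_week("builds-2013-04-01.js", "builds-2013-04-15.js")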
Re: Some data on mozilla-inbound
On 4/24/13 9:50 PM, Ehsan Akhgari wrote:
> No. But that's not what I was talking about. Whether something lands directly or goes through try first is a judgement call, and some people may be better at it than others. As someone who has stopped using try server as a rule (because of the excessive wait times there, which I find unacceptable for day-to-day work), I always ask myself what are the chances that this thing that I want to push could bounce, and I test on try only when I can convince myself that the chances are slim. All I was suggesting was give people a way to assess whether they're good at making these calls, and improve it if they're not.

I'm curious about what you think the wait times are, and what wait times you would find acceptable.
Re: Some data on mozilla-inbound
On 2013-04-25 2:42 AM, Phil Ringnalda wrote:
> On 4/24/13 9:50 PM, Ehsan Akhgari wrote:
>> No. But that's not what I was talking about. Whether something lands directly or goes through try first is a judgement call, and some people may be better at it than others. As someone who has stopped using try server as a rule (because of the excessive wait times there, which I find unacceptable for day-to-day work), I always ask myself what are the chances that this thing that I want to push could bounce, and I test on try only when I can convince myself that the chances are slim. All I was suggesting was give people a way to assess whether they're good at making these calls, and improve it if they're not.
>
> I'm curious about what you think the wait times are, and what wait times you would find acceptable.

Ideally the end-to-end times would be the amount of time it takes to build plus the amount of time it takes to run the slowest test suite requested (which would only be achievable with enough capacity). What has caused me to stop using the try server is that it is totally unreliable for getting results back on the *same day*, and whether or not you can do that depends on how everybody else is using it. These days, I only run try server builds on things that I absolutely cannot test manually (e.g. when I do something which might break Windows on a weekend where I don't have easy access to a Windows box) or when I can deal with putting things off to the next day(s), as sometimes you need to do multiple rounds of try pushes, and that makes what would otherwise be a few-hour project for me into a week-long project, which is devastating to my productivity.

Cheers,
Ehsan
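Ehsan's ideal - build time plus the slowest requested suite - is easy to compare against what a developer actually waited, given per-job timings for a push. A sketch with invented job records:

  # Hypothetical (kind, start, end) records for one try push,
  # in minutes measured from the moment of the push.
  jobs = [("build", 0, 95),
          ("mochitest-1", 140, 175),
          ("mochitest-bc", 150, 250),
          ("xpcshell", 145, 190)]

  build_end = max(end for kind, _, end in jobs if kind == "build")
  slowest_suite = max(end - start for kind, start, end in jobs
                      if kind != "build")
  ideal = build_end + slowest_suite        # achievable with enough capacity
  actual = max(end for _, _, end in jobs)  # what the developer waited
  print("ideal %d min, actual %d min, queue overhead %d min"
        % (ideal, actual, actual - ideal))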
Re: Some data on mozilla-inbound
With extremely limited experience of using try, I know that I would have at times set a flag to "stop as soon as you hit the first red on a platform". So I really like Chris' idea below, as a manual workaround and a more powerful solution for that. Easier said than done, I imagine...

Milan

On 2013-04-25, at 12:10 PM, Chris Lord cl...@mozilla.com wrote:
> Something that strikes me as very obvious that can be done to reduce load on try is to allow for jobs to be requested and cancelled in a more granular fashion. Right now, I have to think before I push "What's the most I could possibly need?" And if I don't request enough, I have to push an entire new job! I know that I'd request a lot less from try, and request fewer jobs, if I could, after I've pushed, trigger/cancel builds per platform, and request/cancel particular tests.
>
> --Chris
Re: Some data on mozilla-inbound
Justin Lebar wrote:
> Note that we don't have enough capacity to turn around current try requests within a reasonable amount of time.

Is this because people are requesting too much, perhaps because try chooser simply isn't sufficiently descriptive for what people want?

-- 
Warning: May contain traces of nuts.
Re: Some data on mozilla-inbound
On 2013-04-23 12:05 PM, Justin Lebar wrote:
>> The ratio of things landed on inbound which turn out to be busted is really worrying
>
> On the one hand, we're told not to push to try too much, because that wastes resources. On the other hand, we're told not to burn m-i, because that wastes resources.

True!

> Should we be surprised when people don't get this right 100% of the time?

No. But that's not what I was talking about. Whether something lands directly or goes through try first is a judgement call, and some people may be better at it than others. As someone who has stopped using try server as a rule (because of the excessive wait times there, which I find unacceptable for day-to-day work), I always ask myself what are the chances that this thing that I want to push could bounce, and I test on try only when I can convince myself that the chances are slim. All I was suggesting was give people a way to assess whether they're good at making these calls, and improve it if they're not.

> Instead of considering how to get people to strike a better balance between wasting infra resources and burning inbound, I think we need to consider what we can do to increase the acceptable margin of error.

These are not either/or choices.

> Note that we don't have enough capacity to turn around current try requests within a reasonable amount of time. Pushing to inbound is the only way to get quick feedback on whether your patch works, these days. As I've said before, I'd love to see releng report on try turnaround times, so we can hold someone accountable. The data is there; we just need to process it. If we can't increase the amount of infra capacity we have, perhaps we could use it more effectively. We've discussed lots of ways we might accomplish this on this newsgroup, and I've seen very few of them tried. Perhaps an important part of the problem is that we're not able to innovate quickly enough on this front.

We've been asking for more infra capacity for as long as I can remember, and so far we've always had a shortage on that front (part of which is due to the continuous increase in development pace, which is a good thing), so I agree that the way to win this battle is to stop waiting for that magical day when we have enough capacity and start using it more efficiently. What I suggested could be a part of that.

> People are always going to make mistakes, and the purpose of processes is to minimize the harm caused by those mistakes, not to embarrass or cajole people into behaving better in the future. As Jono would say, it's not the user's fault.

Nobody's blaming the user. We should just empower them to make better choices.

Cheers,
Ehsan
Re: Some data on mozilla-inbound
On 2013-04-23 1:17 PM, Ed Morley wrote:
> On 23/04/2013 17:28, Kartikaya Gupta wrote:
>> On 13-04-23 00:39, Ehsan Akhgari wrote:
>>> How hard would it be to gather a list of the total number of patches being backed out plus the amount of time that we spent building/testing those, hopefully in a style similar to http://people.mozilla.org/~catlee/highscores/highscores.html?
>>
>> Not trivial, but not too difficult either. Do we have any evidence to show that the try highscores page has made an impact in reducing unnecessary try usage? Also I agree with Justin that if we do this it will be very much a case of sending mixed messages. The try highscores list says to people "don't land on try" and the backout highscores list would say to people "always test on try".
>
> It's worth noting that when I've contacted developers in the top 10 of the tryserver usage leaderboard my message is not "do not use try", but instead suggestions like:
> * please do not use -p all -u all when you only made an android-specific change
> * you already did a |-p all -u all| run - on which mochitest-1 failed on all platforms - so please don't test every testsuite on every platform for the half dozen iterations you ran on Try thereafter (as much as this sounds like an extreme example, there have been cases like this)

Yes, this! ^

The messaging around this should not be to tell people "always test on try". It should be to help them figure out how to make better judgement calls on this. This is a skill that people develop and are not born with, and without data it's hard as an individual to judge how good I am at that.

Ehsan
Re: Some data on mozilla-inbound
On 2013-04-24 9:14 AM, Ben Hearsum wrote:
> On 04/23/13 10:21 PM, Kartikaya Gupta wrote:
>> On 13-04-23 19:21, Nicholas Nethercote wrote:
>>> - The 'inbound was closed for 15.3068% of the total time due to bustage' number is an underestimate, in one sense. When inbound is closed at 10am California time, it's a lot more inconvenient to developers than when it's busted at midnight California time. More than 3x, according to http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png.
>>
>> See my note 3 under the "Inbound uptime" section. I used exactly that graph to weight the inbound downtime and there wasn't a significant difference.
>>
>>> - Getting agreement on a significant process change is really difficult. Is it possible to set up a repo where a few people can volunteer to try Kats' approach for a couple of weeks? That would provide invaluable experience and data.
>>
>> Yeah, there are plans afoot to try this, pending sheriff approval.
>
> If you know what you want the repo to be called I'd advise filing a RelEng bug about it now and we can get it done without being in the critical path later on. You can also just ask for one of the twigs to be customized (https://wiki.mozilla.org/ReleaseEngineering/DisposableProjectBranches).

We're planning to use the try server, I believe.

Cheers,
Ehsan
Re: Some data on mozilla-inbound
> The messaging around this should not be to tell people "always test on try". It should be to help them figure out how to make better judgement calls on this. This is a skill that people develop and are not born with, and without data it's hard as an individual to judge how good I am at that.

One idea might be to give developers feedback on the consequences of a particular push, e.g. the AWS cost, a proxy for the time during which developers couldn't push, or some other measurable metric. Right now each push probably feels as expensive as every other.
Re: Some data on mozilla-inbound
On 2013-04-25 1:02 AM, David Ascher wrote:
> One idea might be to give developers feedback on the consequences of a particular push, e.g. the AWS cost, a proxy for the time during which developers couldn't push, or some other measurable metric. Right now each push probably feels as expensive as every other.

The AWS cost would be the wrong measure, since it doesn't account for the amount of time that 100 other people spent grinding their teeth because they could not push. :-) But yeah, I agree with the general idea of a cost measure, I just can't think of what a good one would be (well, one better than the wall-clock time...)

Ehsan
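One crude composite along these lines: charge a push for the machine time it consumed plus, if it closed the tree, the landings it likely blocked, weighted by the push rate at that time of day. Every constant here is invented to show the shape of the calculation, not a measured cost:

  BLOCKED_PUSH = 3600.0  # one blocked landing 'costs' an hour of machine time

  def push_cost(machine_seconds, closure_minutes, pushes_per_hour):
      # machine_seconds: total job time the push consumed
      # closure_minutes: tree-closure time attributable to it (0 if green)
      # pushes_per_hour: push rate at that hour (from joduinn's histogram)
      blocked = closure_minutes / 60.0 * pushes_per_hour
      return machine_seconds + blocked * BLOCKED_PUSH

  # A green push vs. one that closed the tree for 45 minutes at peak:
  print(push_cost(20000, 0, 8))   # 20000.0
  print(push_cost(20000, 45, 8))  # 41600.0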
Re: Some data on mozilla-inbound
> One idea might be to give developers feedback on the consequences of a particular push, e.g. the AWS cost, a proxy for the time during which developers couldn't push, or some other measurable metric. Right now each push probably feels as expensive as every other.

For tryserver, I proposed bug 848589 to do just this. I think it's worth trying, but someone needs to implement it.

> Nobody's blaming the user. We should just empower them to make better choices.

Okay. I guess what's frustrating to me is that we have this problem and essentially our only option to solve it is to change users' behavior. I totally believe that some people could use resources much more efficiently, but it's frustrating if changing user behavior is our only tool.

We keep talking about this every few weeks, as though there's some hidden solution that will emerge only after ten newsgroup threads. In actuality, we very likely will need to do a bunch of different things, each having a small impact. And in particular, I don't think we'll solve this problem without significant work from release engineering. If that work isn't forthcoming, I don't think we're going to make a significant dent in this.

On Thu, Apr 25, 2013 at 1:20 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
> On 2013-04-25 1:02 AM, David Ascher wrote:
>> [...]
>
> The AWS cost would be the wrong measure, since it doesn't account for the amount of time that 100 other people spent grinding their teeth because they could not push. :-) But yeah, I agree with the general idea of a cost measure, I just can't think of what a good one would be (well, one better than the wall-clock time...)
>
> Ehsan
Re: Some data on mozilla-inbound
On 4/22/13 9:54 PM, Kartikaya Gupta wrote:
> TL;DR:
> * Inbound is closed 25% of the time
> * Turning off coalescing could increase resource usage by up to 60% (but probably less than this).
> * We spend 24% of our machine resources on changes that are later backed out, or changes that are doing the backout
> * The vast majority of changesets that are backed out from inbound are detectable on a try push

Do we know how many of these have been pushed to try, and passed/compiled there what they'd fail later? I expect some cost of regressions to come from merging/rebasing, and it'd be interesting to know how much of that you can see in the data window you looked at. "Has been pushed to try" is obviously tricky to find out, in particular on rebases, and possibly modified patches during the rebase.

Axel

> Because of the large effect from coalescing, any changes to the current process must not require running the full set of tests on every push. (In my proposal this is easily accomplished with trychooser syntax, but other proposals include rotating through T-runs on pushes, etc.)
>
> --- Long version below ---
>
> Following up from the infra load meeting we had last week, I spent some time this weekend crunching various pieces of data on mozilla-inbound to get a sense of how much coalescing actually helps us, how much backouts hurt us, and generally to get some data on the impact of my previous proposal for using a multi-headed tree. I didn't get all the data that I wanted, but as I probably won't get back to this for a bit, I thought I'd share what I found so far and see if anybody has other specific pieces of data they would like to see gathered.
>
> -- Inbound uptime --
>
> I looked at a ~9 day period from April 7th to April 16th. During this time:
> * inbound was closed for 24.9587% of the total time
> * inbound was closed for 15.3068% of the total time due to bustage
> * inbound was closed for 11.2059% of the total time due to infra
>
> Notes:
> 1) "bustage" and "infra" were determined by grep -i on the data from treestatus.mozilla.org.
> 2) There is some overlap so bustage + infra != total.
> 3) I also weighted the downtime using the checkins-per-hour histogram from joduinn's blog at [1], but this didn't have a significant impact: the total, bustage, and infra downtime percentages moved to 25.5392%, 15.7285%, and 11.3748% respectively.
>
> -- Backout changes --
>
> Next I did an analysis of the changes that landed on inbound during that time period. The exact pushlog that I looked at (corresponding to the same April 7 - April 16 time period) is at [2]. I removed all of the merge changesets from this range, since I wanted to look at inbound in as much isolation as possible.
>
> In this range:
> * there were a total of 916 changesets
> * there were a total of 553 pushes
> * 74 of the 916 changesets (8.07%) were backout changesets
> * 116 of the 916 changesets (12.66%) were backed out
> * removing all backouts and changes backed out removed 114 pushes (20.6%)
>
> Of the 116 changesets that were backed out:
> * 37 belonged to single-changeset pushes
> * 65 belonged to multi-changeset pushes where the entire push was backed out
> * 14 belonged to multi-changeset pushes where the changesets were selectively backed out
>
> Of the 74 backout changesets:
> * 4 were for commit message problems
> * 25 were for build failures
> * 36 were for test failures
> * 5 were for leaks/talos regressions
> * 1 was for premature landing
> * 3 were for unknown reasons
>
> Notes:
> 1) There were actually 79 backouts, but I ignored 5 of them because they backed out changes that happened prior to the start of my range.
> 2) Additional changes at the end of my range may have been backed out, but the backouts were not in my range so I didn't include them in my analysis.
> 3) The 14 csets that were selectively backed out is interesting to me because it implies that somebody did some work to identify which changes in the push were bad, and this naturally means that there is room to save on doing that work.
>
> -- Merge conflicts --
>
> I also wanted to determine how many of these changes conflicted with each other, and how far away the conflicting changes were. I got a partial result here but I need to do more analysis before I have numbers worth posting.
>
> -- Build farm resources --
>
> Finally, I used a combination of gps' mozilla-build-analyzer tool [3] and some custom tools to determine how much machine time was spent on building all of these pushes and changes. I looked at all the build.json files [4] from the 6th of April to the 17th of April and pulled out all the jobs that corresponded to the push changesets in my range above. For this set of 553 changesets, there were 500 (exactly!) distinct builders. 111 of these had -pgo or _pgo in the name, and I excluded them. I created a 553x389 matrix with the remaining builders and filled in how much time was spent on each changeset for each builder (in the case of multiple jobs, I added the times). Then I assumed that any empty field in the 553x389 matrix was a result of coalescing.
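The weighting kats describes in note 3 amounts to integrating closure intervals against the checkins-per-hour histogram. A sketch of that calculation; the histogram values are placeholders, not the real numbers from joduinn's post:

  # Placeholder pushes-per-hour histogram (index = hour of day, PT).
  PUSHES_PER_HOUR = [1, 1, 1, 1, 1, 2, 3, 5, 7, 8, 8, 8,
                     7, 7, 8, 8, 7, 6, 5, 4, 3, 2, 2, 1]

  def weighted_downtime(closures, total_hours):
      # closures: list of (start_hour, end_hour) offsets within the window.
      # Each closed hour counts in proportion to how many pushes it sees.
      weight = lambda h: PUSHES_PER_HOUR[h % 24]
      closed = sum(weight(h) for start, end in closures
                   for h in range(start, end))
      total = sum(weight(h) for h in range(total_hours))
      return closed / total

  # A 3-hour closure mid-morning weighs several times a 3am one:
  print(weighted_downtime([(10, 13)], 24))  # ~0.22
  print(weighted_downtime([(3, 6)], 24))    # ~0.04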
Re: Some data on mozilla-inbound
Kartikaya Gupta wrote:
> The vast majority of changesets that are backed out from inbound are detectable on a try push

Hopefully a push never burns all platforms because the developer tried it locally first, but stranger things have happened! But what I'm most interested in is whether patches are more likely to be backed out for build or test failures. Perhaps if we could optimise our use of Try then that would reduce the load on inbound. For example:

* At first, the push is built on one fast and readily available platform (linux64 is often mentioned)
* If this builds, then all platforms build
* Only once all platforms have built are tests run

This would avoid running tests for pushes that are known not to build on all platforms.

-- 
Warning: May contain traces of nuts.
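Neil's proposal is essentially a three-stage dependency graph per push. A toy sketch of the gating logic; the platform names and the canary choice are illustrative, not the real scheduler:

  PLATFORMS = ["linux64", "linux32", "osx", "win32", "win64"]
  CANARY = "linux64"  # fast and readily available, per the suggestion above

  def next_jobs(results):
      # results maps job name -> "pass"/"fail"; returns what to schedule next.
      canary = "build-" + CANARY
      if canary not in results:
          return [canary]                       # stage 1: canary build
      if results[canary] == "fail":
          return []                             # stop early: doesn't build
      builds = ["build-" + p for p in PLATFORMS]
      pending = [b for b in builds if b not in results]
      if pending:
          return pending                        # stage 2: remaining builds
      if any(results[b] == "fail" for b in builds):
          return []                             # busted somewhere: skip tests
      return ["test-" + p for p in PLATFORMS]   # stage 3: tests everywhere

  print(next_jobs({}))                          # ['build-linux64']
  print(next_jobs({"build-linux64": "pass"}))   # the other four builds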
Re: Some data on mozilla-inbound
On 23/4/13 09:58, Neil wrote:
> Perhaps if we could optimise our use of Try then that would reduce the load on inbound. For example:
> * At first, the push is built on one fast and readily available platform (linux64 is often mentioned)
> * If this builds, then all platforms build
> * Only once all platforms have built are tests run
> This would avoid running tests for pushes that are known not to build on all platforms.

OTOH, it would significantly extend the time a developer has to wait before tryserver test results begin to appear, which I think people would find discouraging.

JK
Re: Some data on mozilla-inbound
On 23 April 2013 09:58:41, Neil wrote:
> Hopefully a push never burns all platforms because the developer tried it locally first, but stranger things have happened!

This actually happens quite often. On occasion it's due to warnings-as-errors (switched off by default on local machines due to toolchain differences), but more often than not the developer didn't even try compiling locally :-/

Given that local machine time scales linearly with the rate at which we hire devs (unlike our automation capacity), I think we need to work out why (some) people aren't doing things like compiling locally and running their team's directory of tests before pushing. I would hazard a guess that if we improved incremental build times and created mach commands to simplify the edit-compile-test loop, then we could cut out many of these obvious inbound bustage cases.
Re: Some data on mozilla-inbound
On 23/04/13 10:17, Ed Morley wrote:
> Given that local machine time scales linearly with the rate at which we hire devs (unlike our automation capacity), I think we need to work out why (some) people aren't doing things like compiling locally and running their team's directory of tests before pushing. I would hazard a guess that if we improved incremental build times and created mach commands to simplify the edit-compile-test loop, then we could cut out many of these obvious inbound bustage cases.

That would be the carrot. The stick would be finding some way of finding out whether a changeset was pushed to try before it was pushed to m-i. If a developer failed to push to try and then broke m-i, we could (in a pre-commit hook) refuse to let them commit to m-i in future unless they'd already pushed to try. For a week, on first offence, a month on subsequent offences :-)

This, of course, is predicated on being able to detect in real time whether a changeset being pushed to m-i has previously been pushed to try.

Gerv
Re: Some data on mozilla-inbound
On 16:34, Tue, 23 Apr, Gervase Markham wrote:
> That would be the carrot. The stick would be finding some way of finding out whether a changeset was pushed to try before it was pushed to m-i. If a developer failed to push to try and then broke m-i, we could (in a pre-commit hook) refuse to let them commit to m-i in future unless they'd already pushed to try. For a week, on first offence, a month on subsequent offences :-)
>
> This, of course, is predicated on being able to detect in real time whether a changeset being pushed to m-i has previously been pushed to try.

We've considered enforcing this using some cryptographic token. After you push to try and get good results, the system gives you a token you need to include in your commit to m-i. Alternatively, you could indicate the try revision you pushed, and we could look up the results and refuse the commit based on your build/test results on try, or if your commit to m-i is too different from the push to try.

Cheers,
Chris
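The token could be as simple as an HMAC over the try revision, minted by automation once the try run is green and checked by an m-i commit hook. A sketch of the idea; the key handling, the commit-message convention, and the "too different" check are all hand-waved assumptions:

  import hashlib
  import hmac

  SECRET = b"known only to the try-results service"  # illustrative only

  def mint_token(try_rev):
      # Issued by automation after a green try run on try_rev.
      return hmac.new(SECRET, try_rev.encode(), hashlib.sha256).hexdigest()

  def hook_accepts(commit_msg):
      # m-i pre-commit hook: require a 'try-token: <rev>:<mac>' line.
      # Verifying that the m-i commit is close enough to the try rev is
      # the hard part and is not attempted here.
      for line in commit_msg.splitlines():
          if line.startswith("try-token:"):
              rev, _, mac = line.split(":", 1)[1].strip().partition(":")
              return hmac.compare_digest(mac, mint_token(rev))
      return False

  msg = "Bug 12345 - Fix the thing\ntry-token: 9d1daf69061d:%s" % \
        mint_token("9d1daf69061d")
  print(hook_accepts(msg))  # True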
Re: Some data on mozilla-inbound
On 13-04-23 11:41, Chris AtLee wrote:
> We've considered enforcing this using some cryptographic token. After you push to try and get good results, the system gives you a token you need to include in your commit to m-i.

... or you could just merge the cset directly from try to m-i or m-c (i.e. my original proposal).

Cheers,
kats
Re: Some data on mozilla-inbound
On 13-04-23 00:39, Ehsan Akhgari wrote:
> How hard would it be to gather a list of the total number of patches being backed out plus the amount of time that we spent building/testing those, hopefully in a style similar to http://people.mozilla.org/~catlee/highscores/highscores.html?

Not trivial, but not too difficult either. Do we have any evidence to show that the try highscores page has made an impact in reducing unnecessary try usage?

Also, I agree with Justin that if we do this it will be very much a case of sending mixed messages. The try highscores list says to people "don't land on try" and the backout highscores list would say to people "always test on try".

Cheers,
kats
Re: Some data on mozilla-inbound
On 13-04-23 03:57, Axel Hecht wrote:
> Do we know how many of these have been pushed to try, and passed/compiled there what they'd fail later?

I haven't looked at this. It would be useful to know, but short of pulling patches and using some similarity heuristic or manually examining patches, I can't think of a way to get this data.

> I expect some cost of regressions to come from merging/rebasing, and it'd be interesting to know how much of that you can see in the data window you looked at.

This is something I did try to determine, by looking at the number of conflicts between patches in my data window. My algorithm was basically this:

1) Sync a tree to the last cset in the range
2) Iterate through each push backwards, skipping merges, backouts, and changes that are later backed out
3) For each of these pushes, try to qpush a backout of it
4) If the attempted qpush fails, that means there is another change that landed since that one that it has a merge conflict with

The problem here is that the farther back you go, the more likely it is that you will run into conflicting changes, because an increasing portion of the data window is checked for conflicts, when really you probably only want to test some small number of changes (~30?). Using this approach I got 129 conflicts, and as expected, the rate at which I encountered conflicts went up as I went farther back. I didn't get around to trying the sliding-window approach, which I believe will give a more representative (and much lower) count. My code for doing this is in the bottom half of [1] if you (or anybody else) wants to give that a shot.

kats

[1] https://github.com/staktrace/mozilla-tree-analyzer/blob/master/inbound-csets.sh - WARNING: don't *run* anything in this repo because it may do destructive things. Ask me if you're not sure.
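The sliding-window variant would test each changeset for conflicts against only the ~30 pushes that landed after it, rather than against everything up to the tip. A sketch of that loop shelling out to hg; it assumes a modern hg (backout --no-commit) whereas kats' script used mq, so treat the exact commands as an approximation:

  import subprocess

  def run(*cmd):
      return subprocess.run(cmd, capture_output=True).returncode

  def conflicts_in_window(csets, window=30):
      # csets is ordered oldest to newest. For each cset, update to the
      # revision `window` pushes later, then attempt a tentative backout;
      # a failed merge means it conflicts with something in that window.
      conflicted = []
      for i, rev in enumerate(csets):
          top = csets[min(i + window, len(csets) - 1)]
          run("hg", "update", "-C", "-r", top)
          # :fail makes any real merge conflict abort instead of prompting.
          if run("hg", "backout", "--no-commit", "--tool", ":fail", "-r", rev):
              conflicted.append(rev)
          run("hg", "update", "-C", ".")  # throw away the tentative backout
      return conflicted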
Re: Some data on mozilla-inbound
On Tue, Apr 23, 2013 at 8:41 AM, Chris AtLee cat...@mozilla.com wrote:
> We've considered enforcing this using some cryptographic token. After you push to try and get good results, the system gives you a token you need to include in your commit to m-i.

Sounds like the goal of this kind of solution would be to eliminate the "developer made a bad judgement call" case, but it's not at all clear to me that that problem is worse than the "developer overuses try for trivial changes" or "developer needs to wait for try results before pushing a trivial fix" problems.

It's also not at all clear to me that a 13% backout rate on inbound is a problem, because there are a lot of factors at play. Those backouts represent wasted resources (build machine time, sheriff time, sometimes tree-closure time), but if the alternative is wasting developer time (needing to wait for try results unnecessarily) and tryserver build machine time, the tradeoff becomes less clear. Obviously different perspectives here also impact your view of those tradeoffs.

Gavin
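Gavin's tradeoff can be framed as expected cost: pushing directly is rational whenever the backout probability times the bustage cost stays below the cost of a try round trip. Every number below is invented to show the shape of the comparison, not measured:

  # Illustrative costs in combined developer+infra "hours".
  TRY_ROUND_TRIP = 4.0  # machine time plus a developer waiting for results
  BACKOUT_COST = 2.0    # sheriff time and backout build cycles
  CLOSURE_COST = 6.0    # other developers blocked while the tree is red

  def direct_push_expected_cost(p_bust, p_closure_given_bust=0.5):
      return p_bust * (BACKOUT_COST + p_closure_given_bust * CLOSURE_COST)

  # At the observed ~13% backout rate, a direct push costs ~0.65 "hours"
  # in expectation - well under the 4-"hour" try round trip, which is
  # why the 13% figure alone doesn't settle the question.
  print(direct_push_expected_cost(0.13))                   # 0.65
  print(direct_push_expected_cost(0.13) < TRY_ROUND_TRIP)  # True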
Re: Some data on mozilla-inbound
On Tue, Apr 23, 2013 at 9:28 AM, Kartikaya Gupta kgu...@mozilla.com wrote:
> Not trivial, but not too difficult either. Do we have any evidence to show that the try highscores page has made an impact in reducing unnecessary try usage?

It's been used by people like Ed Morley to reach out to individual developers and notify them of their impact. I'm sure that's had a positive effect, though it seems rather difficult to measure.

Gavin
Re: Some data on mozilla-inbound
On 04/23/13 02:17, Ed Morley wrote:
> On 23 April 2013 09:58:41, Neil wrote:
>> Hopefully a push never burns all platforms because the developer tried it locally first, but stranger things have happened!
>
> This actually happens quite often. On occasion it's due to warnings-as-errors (switched off by default on local machines due to toolchain differences)

I would like to know a bit more about this. Is our list of supported toolchains so diverse that building with one version versus another will report so many false positives as to be useless? I enabled warnings-as-errors on my local machine after pushing something to inbound that failed to build because of this, and I've had no problems since then. Enabling this by default seems like an easy way to remove instances of this problem.

> but more often than not the developer didn't even try compiling locally :-/

So there are instances where developers didn't use the try servers and also didn't compile locally at all before pushing to inbound? I don't think we as a community should be okay with that kind of irresponsible behavior.
Re: Some data on mozilla-inbound
On 23/04/2013 17:28, Kartikaya Gupta wrote:
> On 13-04-23 00:39, Ehsan Akhgari wrote:
>> How hard would it be to gather a list of the total number of patches being backed out plus the amount of time that we spent building/testing those, hopefully in a style similar to http://people.mozilla.org/~catlee/highscores/highscores.html?
>
> Not trivial, but not too difficult either. Do we have any evidence to show that the try highscores page has made an impact in reducing unnecessary try usage? Also I agree with Justin that if we do this it will be very much a case of sending mixed messages. The try highscores list says to people "don't land on try" and the backout highscores list would say to people "always test on try".

It's worth noting that when I've contacted developers in the top 10 of the tryserver usage leaderboard my message is not "do not use try", but instead suggestions like:

* please do not use -p all -u all when you only made an android-specific change
* you already did a |-p all -u all| run - on which mochitest-1 failed on all platforms - so please don't test every testsuite on every platform for the half dozen iterations you ran on Try thereafter (as much as this sounds like an extreme example, there have been cases like this)
Re: Some data on mozilla-inbound
On 4/23/13 1:17 PM, David Keeler wrote:
> I would like to know a bit more about this. Is our list of supported toolchains so diverse that building with one version versus another will report so many false positives as to be useless?

Yes. For example, a typical clang+ccache build of the tree with fatal warnings will fail unless you jump through deoptimize-ccache hoops, because things like if (FOO(x)) will warn if FOO(x) expands to (x == 5). For another example, msvc until recently didn't actually have warnings-as-errors enabled at all in many directories, so it didn't matter what you did with your local setup in msvc.

> I enabled warnings-as-errors on my local machine after pushing something to inbound that failed to build because of this, and I've had no problems since then.

It _really_ depends on the exact compiler and toolchain you're using.

> So there are instances where developers didn't use the try servers and also didn't compile locally at all before pushing to inbound? I don't think we as a community should be okay with that kind of irresponsible behavior.

Agreed.

-Boris
Re: Some data on mozilla-inbound
On 4/23/13 6:35 PM, Kartikaya Gupta wrote:
> On 13-04-23 03:57, Axel Hecht wrote:
>> I expect some cost of regressions to come from merging/rebasing, and it'd be interesting to know how much of that you can see in the data window you looked at.
>
> This is something I did try to determine, by looking at the number of conflicts between patches in my data window. [...] I didn't get around to trying the sliding-window approach, which I believe will give a more representative (and much lower) count. My code for doing this is in the bottom half of [1] if you (or anybody else) wants to give that a shot.

I expect that only a part of our programmatic merge conflicts are actually version control merge conflicts. There are a lot of cases like modifications to supposedly-internal properties in toolkit starting to get a new use case in browser, a define changing or disappearing, etc. All of those invalidate the testing of the patch that has been done to some extent, and don't involve modifications to the same lines of code, which is all that version control catches.

Axel
Re: Some data on mozilla-inbound
On 2013-04-23 12:50 AM, Justin Lebar wrote:
>> The ratio of things landed on inbound which turn out to be busted is really worrying
>
>> * 116 of the 916 changesets (12.66%) were backed out
>
> If 13% is really worrying, what do you think our goal should be?

Less than that? It's really hard to come up with hard numbers as goals here.

Ehsan
Re: Some data on mozilla-inbound
On Mon, Apr 22, 2013 at 12:54 PM, Kartikaya Gupta kgu...@mozilla.com wrote:
> TL;DR:
> * Inbound is closed 25% of the time
> * Turning off coalescing could increase resource usage by up to 60% (but probably less than this).
> * We spend 24% of our machine resources on changes that are later backed out, or changes that are doing the backout
> * The vast majority of changesets that are backed out from inbound are detectable on a try push

Thanks for collecting real data! A collage of thoughts follows.

- The 'inbound was closed for 15.3068% of the total time due to bustage' number is an underestimate, in one sense. When inbound is closed at 10am California time, it's a lot more inconvenient to developers than when it's busted at midnight California time. More than 3x, according to http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png.

- Having our main landing repo closed multiple times per day, for a significant fraction of the time, feels clownshoes-ish to me. For this reason, my gut feeling is that we'll end up doing something like what Kats is suggesting. My gut feeling is also that it won't end up changing the infrastructure load that much.

- Any landing system that makes life harder for sheriffs is a problem. I'm not at all certain that Kats' proposal would do that, but that's my main worry about it.

- A process whereby developers choose which tests to run on the official landing branch (be it inbound, or something else) feels like a bad idea. It's far too easy to get wrong.

- Getting agreement on a significant process change is really difficult. Is it possible to set up a repo where a few people can volunteer to try Kats' approach for a couple of weeks? That would provide invaluable experience and data.

Nick
Re: Some data on mozilla-inbound
On Wed, Apr 24, 2013 at 11:21 AM, Nicholas Nethercote n.netherc...@gmail.com wrote:
> - The 'inbound was closed for 15.3068% of the total time due to bustage' number is an underestimate, in one sense. When inbound is closed at 10am California time, it's a lot more inconvenient to developers than when it's busted at midnight California time. More than 3x, according to http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png.

Although I've been known to bust inbound, I also tend to check in around 2-3am PDT. I think it's important to remember that the optimal bustage rate for inbound is some value greater than zero, and varies depending on the time of day. If inbound is never busted then we're wasting try resources testing patches that have a 0.99 probability of landing safely. OTOH, whenever the bustage rate is high enough that it's difficult to get things landed, or the sheriffs' ability to detect regressions is impacted, it's too high. That currently seems to be the case, so it seems like a good idea to use a highscore list or something like it to exert pressure to use try more until the situation is resolved.

Rob

-- 
"If you love those who love you, what credit is that to you? Even sinners love those who love them. And if you do good to those who are good to you, what credit is that to you? Even sinners do that."
Re: Some data on mozilla-inbound
On 13-04-23 19:21, Nicholas Nethercote wrote:
> - The 'inbound was closed for 15.3068% of the total time due to bustage' number is an underestimate, in one sense. When inbound is closed at 10am California time, it's a lot more inconvenient to developers than when it's busted at midnight California time. More than 3x, according to http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png.

See my note 3 under the "Inbound uptime" section. I used exactly that graph to weight the inbound downtime and there wasn't a significant difference.

> - Getting agreement on a significant process change is really difficult. Is it possible to set up a repo where a few people can volunteer to try Kats' approach for a couple of weeks? That would provide invaluable experience and data.

Yeah, there are plans afoot to try this, pending sheriff approval.

Cheers,
kats
Re: Some data on mozilla-inbound
This was a fantastic read, it almost made me shed happy tears! Thanks a lot kats for doing this.

The ratio of things landed on inbound which turn out to be busted is really worrying, and it might be an indicator that (some?) developers have poor judgement on how safe their patches are. How hard would it be to gather a list of the total number of patches being backed out plus the amount of time that we spent building/testing those, hopefully in a style similar to http://people.mozilla.org/~catlee/highscores/highscores.html? If we had such a list, perhaps we could reach out to the high offenders there and let them know about the problem, and see if that changes these stats a couple of weeks from now?

Thanks!
Ehsan

On 2013-04-22 3:54 PM, Kartikaya Gupta wrote:
> TL;DR:
> * Inbound is closed 25% of the time
> * Turning off coalescing could increase resource usage by up to 60% (but probably less than this).
> * We spend 24% of our machine resources on changes that are later backed out, or changes that are doing the backout
> * The vast majority of changesets that are backed out from inbound are detectable on a try push
>
> [...]
Re: Some data on mozilla-inbound
>> The ratio of things landed on inbound which turn out to be busted is really worrying
>
>> * 116 of the 916 changesets (12.66%) were backed out

If 13% is really worrying, what do you think our goal should be?

On Tue, Apr 23, 2013 at 12:39 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
> This was a fantastic read, it almost made me shed happy tears! Thanks a lot kats for doing this.
>
> The ratio of things landed on inbound which turn out to be busted is really worrying, and it might be an indicator that (some?) developers have poor judgement on how safe their patches are. How hard would it be to gather a list of the total number of patches being backed out plus the amount of time that we spent building/testing those, hopefully in a style similar to http://people.mozilla.org/~catlee/highscores/highscores.html? If we had such a list, perhaps we could reach out to the high offenders there and let them know about the problem, and see if that changes these stats a couple of weeks from now?
>
> Thanks!
> Ehsan
>
> On 2013-04-22 3:54 PM, Kartikaya Gupta wrote:
>> TL;DR:
>> * Inbound is closed 25% of the time
>> * Turning off coalescing could increase resource usage by up to 60% (but probably less than this).
>> * We spend 24% of our machine resources on changes that are later backed out, or changes that are doing the backout
>> * The vast majority of changesets that are backed out from inbound are detectable on a try push
>>
>> [...]