Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 17:26, Tue, 23 Sep, Kyle Huey wrote:
> On Tue, Aug 26, 2014 at 8:23 AM, Chris AtLee wrote:
>> Just a short note to say that this experiment is now live on
>> mozilla-inbound.
>
> What was the outcome?

Thanks for the reminder. The outcome of this experiment was inconclusive.

On the one hand, we know we didn't make anything worse. The skipping behaved as expected, and wasn't a burden on sheriffs. We didn't make wait times any worse.

On the other hand, it appears as though we improved wait times for the target platforms, but the signal there isn't clear due to other variables changing (e.g. overall load wasn't directly comparable between the two time windows).

We've left the skipping behaviour enabled for the moment, and are considering some tweaks to the amount of skipping that happens, and which branches/platforms it's enabled for.

Cheers,
Chris
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On Tue, Aug 26, 2014 at 8:23 AM, Chris AtLee wrote:
> Just a short note to say that this experiment is now live on
> mozilla-inbound.

What was the outcome?

- Kyle
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
Just a short note to say that this experiment is now live on mozilla-inbound.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On Thu, Aug 21, 2014 at 3:21 PM, Jonathan Griffin wrote:
> In summary, the sheriffs won't be backing out extra commits because of
> the coalescing, and it remains the sheriffs' job to backfill tests when
> they determine they need to do so in order to bisect a failure. We
> aren't placing any extra burden on developers with this experiment, and
> part of the reason for this experiment is to determine how much of an
> extra burden this is for the sheriffs.

As long as sheriffs are in support of this (which it sounds like is the case), then this sounds awesome to me.

/ Jonas
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On Thu, Aug 21, 2014 at 03:03:30PM -0700, Jonas Sicking wrote:
> What will be the policy if a test fails and it's unclear which push
> caused the regression?

You may have missed the main point that it's not "What will", but "What is". It *is* already the case that tests are skipped.

Mike
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
It will be handled just like coalesced jobs today: sheriffs will backfill the missing data, and back out only the offender.

An illustration might help. Today we might have something like this, for a given job:

              linux64-debug   win7-debug   osx8-debug
  commit 1    pass            pass         pass
  commit 2    pass            pass         pass
  commit 3    pass            fail         pass
  commit 4    pass            fail         pass

In this case (assuming the two failures are the same), it's easy for sheriffs to see that commit 3 is the culprit and the one that needs to be backed out.

During the experiment, we might see something like this:

              linux64-debug   win7-debug   osx8-debug
  commit 1    pass            pass         pass
  commit 2    pass            not run      not run
  commit 3    pass            fail         pass
  commit 4    pass            not run      not run

Here, it isn't obvious whether the problem is caused by commit 2 or commit 3. (This situation already occurs today because of "random" coalescing.) In this case, the sheriffs will backfill missing test data, so we might see:

              linux64-debug   win7-debug   osx8-debug
  commit 1    pass            pass         pass
  commit 2    pass            pass         not run
  commit 3    pass            fail         pass
  commit 4    pass            fail         not run

...and then they have enough data to determine that commit 3 (and not commit 2) is to blame, and can take the appropriate action.

In summary, the sheriffs won't be backing out extra commits because of the coalescing, and it remains the sheriffs' job to backfill tests when they determine they need to do so in order to bisect a failure. We aren't placing any extra burden on developers with this experiment, and part of the reason for this experiment is to determine how much of an extra burden this is for the sheriffs.

Jonathan

On 8/21/2014 3:03 PM, Jonas Sicking wrote:
> What will be the policy if a test fails and it's unclear which push
> caused the regression? Is it the job of the sheriffs, or of the people
> who pushed, to figure out which push was the culprit and make sure that
> that push gets backed out? I.e. if 4 pushes land between two testruns,
> and we see a regression, will the 4 pushes be backed out? Or will
> sheriffs run the missing tests and only back out the offending push?
>
> / Jonas
>
> On Thu, Aug 21, 2014 at 10:50 AM, Jonathan Griffin wrote:
>> Thanks Ed. To paraphrase, no test coverage is being lost here, we're
>> just being a little more deliberate with job coalescing. All tests will
>> be run on all platforms (including debug tests) on a commit before a
>> merge to m-c.
>>
>> Jonathan
>>
>> On 8/21/2014 9:35 AM, Ed Morley wrote:
>>> I think much of the pushback in this thread is due to a
>>> misunderstanding of some combination of:
>>> * our current buildbot scheduling
>>> * the proposal
>>> * how trees are sheriffed and merged
>>>
>>> To clarify:
>>>
>>> 1) We already have coalescing [*] of jobs on all trees apart from try.
>>>
>>> 2) This coalescing means that all jobs are still run at some point,
>>> but just may not run on every push.
>>>
>>> 3) When failures are detected, coalescing means that regression ranges
>>> are larger and so sometimes result in longer tree integration repo
>>> closures, whilst the sheriffs force trigger jobs on the revisions that
>>> did not originally run them.
>>>
>>> 4) When merging into mozilla-central, sheriffs ensure that all jobs
>>> are green - including those that got coalesced and those that are only
>>> scheduled periodically (eg non-unified & PGO builds are only run every
>>> 3 hours). (This is a fairly manual process currently, but better
>>> tooling should be possible with treeherder.)
>>>
>>> 5) This proposal does not mean debug-only issues are somehow not worth
>>> acting on or that they'll end up shipped/on mozilla-central, thanks to #4.
>>>
>>> 6) This proposal is purely trying to make existing coalescing (#1/#2)
>>> more intelligent, to ensure that we expend the finite amount of
>>> machine time we have at present on the most appropriate jobs at each
>>> point, in order to reduce the impact of #3.
>>>
>>> Fwiw I'm on the fence as to whether the algorithm suggested in this
>>> proposal is the most effective way to aid with #3 - however it's worth
>>> trying to find out.
>>>
>>> Best wishes,
>>>
>>> Ed
>>>
>>> [*] Collapsing of pending jobs of the same type, when the queue size
>>> is greater than 1.
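[Editor's note: the backfill-and-bisect procedure described above can be sketched in a few lines of Python. This is an illustration only, not Mozilla's actual sheriffing tooling; the function and variable names are invented.]

```python
# Sketch of backfill-and-bisect: "None" marks a coalesced (not run) job.
# The backfill callback stands in for a sheriff retriggering a skipped job.

def find_culprit(commits, results, backfill):
    """commits: push order, oldest first.
    results: commit -> 'pass' | 'fail' | None (not run).
    backfill(commit) runs the skipped job and returns 'pass' or 'fail'.
    Returns the first commit whose job fails, i.e. the one to back out."""
    for commit in commits:
        outcome = results[commit]
        if outcome is None:
            # Fill in the blank so the regression range shrinks to one push.
            outcome = results[commit] = backfill(commit)
        if outcome == "fail":
            return commit
    return None  # no failure found

# Example mirroring the second table above: commit 2 and 4 were coalesced
# away on win7-debug, and commit 3 is the actual regressor.
actual = {"commit 1": "pass", "commit 2": "pass",
          "commit 3": "fail", "commit 4": "fail"}
observed = {"commit 1": "pass", "commit 2": None,
            "commit 3": "fail", "commit 4": None}
culprit = find_culprit(["commit 1", "commit 2", "commit 3", "commit 4"],
                       observed, lambda c: actual[c])
print(culprit)  # commit 3
```

Note that only commit 2 needs backfilling here: once it passes, commit 3 is unambiguously to blame and commit 4's skipped job never has to run, which is exactly the saving the experiment is after.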
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
What will be the policy if a test fails and it's unclear which push caused the regression? Is it the job of the sheriffs, or of the people who pushed, to figure out which push was the culprit and make sure that that push gets backed out? I.e. if 4 pushes land between two testruns, and we see a regression, will the 4 pushes be backed out? Or will sheriffs run the missing tests and only back out the offending push?

/ Jonas

On Thu, Aug 21, 2014 at 10:50 AM, Jonathan Griffin wrote:
> Thanks Ed. To paraphrase, no test coverage is being lost here, we're just
> being a little more deliberate with job coalescing. All tests will be run
> on all platforms (including debug tests) on a commit before a merge to m-c.
>
> Jonathan
>
> On 8/21/2014 9:35 AM, Ed Morley wrote:
>>
>> I think much of the pushback in this thread is due to a misunderstanding
>> of some combination of:
>> * our current buildbot scheduling
>> * the proposal
>> * how trees are sheriffed and merged
>>
>> To clarify:
>>
>> 1) We already have coalescing [*] of jobs on all trees apart from try.
>>
>> 2) This coalescing means that all jobs are still run at some point, but
>> just may not run on every push.
>>
>> 3) When failures are detected, coalescing means that regression ranges are
>> larger and so sometimes result in longer tree integration repo closures,
>> whilst the sheriffs force trigger jobs on the revisions that did not
>> originally run them.
>>
>> 4) When merging into mozilla-central, sheriffs ensure that all jobs are
>> green - including those that got coalesced and those that are only scheduled
>> periodically (eg non-unified & PGO builds are only run every 3 hours). (This
>> is a fairly manual process currently, but better tooling should be possible
>> with treeherder.)
>>
>> 5) This proposal does not mean debug-only issues are somehow not worth
>> acting on or that they'll end up shipped/on mozilla-central, thanks to #4.
>>
>> 6) This proposal is purely trying to make existing coalescing (#1/#2) more
>> intelligent, to ensure that we expend the finite amount of machine time we
>> have at present on the most appropriate jobs at each point, in order to
>> reduce the impact of #3.
>>
>> Fwiw I'm on the fence as to whether the algorithm suggested in this
>> proposal is the most effective way to aid with #3 - however it's worth
>> trying to find out.
>>
>> Best wishes,
>>
>> Ed
>>
>> [*] Collapsing of pending jobs of the same type, when the queue size is
>> greater than 1.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
Hey Martin,

This is a good idea, and we've been thinking about approaches like this. Basically, the idea is to run tests that "(nearly) always pass" less often. There are currently some tests that fit into this category, like the DOM level 0/1/2 tests in mochitest-plain, and those are time-consuming to run. Your idea takes this a step further, by identifying tests that sometimes fail, correlating those with code changes, and ensuring those get run.

Both of these require some tooling to implement, so we're experimenting initially with approaches that we can get nearly for "free", like running some tests only every other commit, and letting sheriffs trigger the missing tests in case a failure occurs. The ultimate solution may blend a bit of both approaches, and will have to balance implementation cost with the gain we get from the related reduction in slave load.

Jonathan

On 8/21/2014 10:07 AM, Martin Thomson wrote:
> On 20/08/14 17:37, Jonas Sicking wrote:
>> It would however be really cool if we were able to pull data on which
>> tests tend to fail in a way that affects all platforms, and which ones
>> tend to fail on one platform only.
>
> Here's a potential project that might help. For all of the trees
> (probably try especially), look at the checkins and, for each directory
> affected, build up a probability of failure for each of the tests. You
> would have to find which commits were on m-c at the time of the run to
> set the baseline for the checkin; and intermittent failures would add a
> certain noise floor.
>
> The basic idea though is that the information would be very simple to
> use: for each directory touched in a commit, find all the tests that
> cross a certain failure threshold across the assembled dataset, and
> ensure that those test groups are run. And this would need to include
> prerequisites, like builds for the given runs. You would, of course,
> include builds as tests.
>
> Setting the threshold might take some tuning, because failure rates will
> vary across different test groups. I keep hearing bad things about
> certain ones, for instance, and build failures are far less common than
> test failures on the whole, naturally.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
Thanks Ed. To paraphrase, no test coverage is being lost here, we're just being a little more deliberate with job coalescing. All tests will be run on all platforms (including debug tests) on a commit before a merge to m-c.

Jonathan

On 8/21/2014 9:35 AM, Ed Morley wrote:
> I think much of the pushback in this thread is due to a misunderstanding
> of some combination of:
> * our current buildbot scheduling
> * the proposal
> * how trees are sheriffed and merged
>
> To clarify:
>
> 1) We already have coalescing [*] of jobs on all trees apart from try.
>
> 2) This coalescing means that all jobs are still run at some point, but
> just may not run on every push.
>
> 3) When failures are detected, coalescing means that regression ranges
> are larger and so sometimes result in longer tree integration repo
> closures, whilst the sheriffs force trigger jobs on the revisions that
> did not originally run them.
>
> 4) When merging into mozilla-central, sheriffs ensure that all jobs are
> green - including those that got coalesced and those that are only
> scheduled periodically (eg non-unified & PGO builds are only run every 3
> hours). (This is a fairly manual process currently, but better tooling
> should be possible with treeherder.)
>
> 5) This proposal does not mean debug-only issues are somehow not worth
> acting on or that they'll end up shipped/on mozilla-central, thanks to #4.
>
> 6) This proposal is purely trying to make existing coalescing (#1/#2)
> more intelligent, to ensure that we expend the finite amount of machine
> time we have at present on the most appropriate jobs at each point, in
> order to reduce the impact of #3.
>
> Fwiw I'm on the fence as to whether the algorithm suggested in this
> proposal is the most effective way to aid with #3 - however it's worth
> trying to find out.
>
> Best wishes,
>
> Ed
>
> [*] Collapsing of pending jobs of the same type, when the queue size is
> greater than 1.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 8/21/14 9:35 AM, Ed Morley wrote:
> 4) When merging into mozilla-central, sheriffs ensure that all jobs are
> green - including those that got coalesced and those that are only
> scheduled periodically (eg non-unified & PGO builds are only run every 3
> hours). (This is a fairly manual process currently, but better tooling
> should be possible with treeherder.)

To ensure that all code landing in mozilla-central has passed debug tests, sheriffs could merge only from the mozilla-inbound changesets that ran the debug tests.

chris
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 20/08/14 17:37, Jonas Sicking wrote:
> It would however be really cool if we were able to pull data on which
> tests tend to fail in a way that affects all platforms, and which ones
> tend to fail on one platform only.

Here's a potential project that might help. For all of the trees (probably try especially), look at the checkins and, for each directory affected, build up a probability of failure for each of the tests. You would have to find which commits were on m-c at the time of the run to set the baseline for the checkin; and intermittent failures would add a certain noise floor.

The basic idea though is that the information would be very simple to use: for each directory touched in a commit, find all the tests that cross a certain failure threshold across the assembled dataset, and ensure that those test groups are run. And this would need to include prerequisites, like builds for the given runs. You would, of course, include builds as tests.

Setting the threshold might take some tuning, because failure rates will vary across different test groups. I keep hearing bad things about certain ones, for instance, and build failures are far less common than test failures on the whole, naturally.
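[Editor's note: the per-directory failure-probability idea above can be sketched roughly as follows. This is a hypothetical illustration; the class, method names, directory paths, test-group names, and the 5% threshold are all invented, and a real implementation would need the m-c baseline and intermittent-noise handling Martin mentions.]

```python
# Sketch: learn a failure rate per (directory, test group) pair from
# historical runs, then pick which test groups to schedule for a commit
# based on the directories it touches.
from collections import defaultdict

class FailureModel:
    def __init__(self):
        # (directory, test_group) -> [failure count, run count]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, dirs_touched, test_group, failed):
        """Record one historical run of test_group for a commit that
        touched dirs_touched."""
        for d in dirs_touched:
            s = self.stats[(d, test_group)]
            s[0] += 1 if failed else 0
            s[1] += 1

    def groups_to_run(self, dirs_touched, all_groups, threshold=0.05):
        """Return the test groups whose historical failure rate, for any
        touched directory, crosses the threshold."""
        chosen = set()
        for g in all_groups:
            for d in dirs_touched:
                fails, runs = self.stats[(d, g)]
                if runs and fails / runs >= threshold:
                    chosen.add(g)
        return chosen

model = FailureModel()
model.record(["gfx/layers"], "reftest", failed=True)
model.record(["gfx/layers"], "reftest", failed=False)
model.record(["dom/base"], "reftest", failed=False)
print(model.groups_to_run(["gfx/layers"], ["reftest", "mochitest"]))
# → {'reftest'}
```

As the thread notes, builds would simply be recorded as another "test group" here, so that a directory correlated with build bustage schedules the build too.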
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
I think much of the pushback in this thread is due to a misunderstanding of some combination of:
* our current buildbot scheduling
* the proposal
* how trees are sheriffed and merged

To clarify:

1) We already have coalescing [*] of jobs on all trees apart from try.

2) This coalescing means that all jobs are still run at some point, but just may not run on every push.

3) When failures are detected, coalescing means that regression ranges are larger and so sometimes result in longer tree integration repo closures, whilst the sheriffs force trigger jobs on the revisions that did not originally run them.

4) When merging into mozilla-central, sheriffs ensure that all jobs are green - including those that got coalesced and those that are only scheduled periodically (eg non-unified & PGO builds are only run every 3 hours). (This is a fairly manual process currently, but better tooling should be possible with treeherder.)

5) This proposal does not mean debug-only issues are somehow not worth acting on or that they'll end up shipped/on mozilla-central, thanks to #4.

6) This proposal is purely trying to make existing coalescing (#1/#2) more intelligent, to ensure that we expend the finite amount of machine time we have at present on the most appropriate jobs at each point, in order to reduce the impact of #3.

Fwiw I'm on the fence as to whether the algorithm suggested in this proposal is the most effective way to aid with #3 - however it's worth trying to find out.

Best wishes,

Ed

[*] Collapsing of pending jobs of the same type, when the queue size is greater than 1.
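[Editor's note: the coalescing footnote [*] can be illustrated with a small sketch. This is not buildbot's actual implementation; the function name and the job-queue representation are invented for the example.]

```python
# Sketch of coalescing: when more than one job of the same type is
# pending, collapse the queue so only the most recent push keeps its job.
# The pushes that lose their job can be backfilled later if a failure
# needs bisecting.

def coalesce(pending):
    """pending: list of (job_type, push_id) requests, oldest first.
    Returns one request per job type, keeping only the newest push."""
    newest = {}
    for job_type, push_id in pending:
        newest[job_type] = push_id  # later requests overwrite earlier ones
    return [(jt, pid) for jt, pid in newest.items()]

queue = [("win7-debug", 1), ("win7-debug", 2), ("linux64-debug", 2),
         ("win7-debug", 3)]
print(coalesce(queue))
# → [('win7-debug', 3), ('linux64-debug', 2)]
```

Pushes 1 and 2 lose their win7-debug job here, which is exactly the "may not run on every push" behaviour described in point #2.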
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
- Milan

On Aug 21, 2014, at 10:12, Chris AtLee wrote:
> On 17:37, Wed, 20 Aug, Jonas Sicking wrote:
>> On Wed, Aug 20, 2014 at 4:24 PM, Jeff Gilbert wrote:
>>> I have been asked in the past if we really need to run WebGL tests on
>>> Android, if they have coverage on Desktop platforms. And then again
>>> later, why B2G if we have Android.
>>>
>>> There seems to be enough belief in test-once-run-everywhere that I
>>> feel the need to *firmly* establish that this is not acceptable, at
>>> least for the code I work with. I'm happy I'm not alone in this.
>>
>> I'm a firm believer that we ultimately need to run basically all
>> combinations of tests and platforms before allowing code to reach
>> mozilla-central. There's lots of platform specific code paths, and
>> it's hard to track which tests trigger them, and which don't.
>
> I think we can agree on this. However, not running all tests on all
> platforms per push on mozilla-inbound (or other branch) doesn't mean
> that they won't be run on mozilla-central, or even on mozilla-inbound
> prior to merging.
>
> I'm a firm believer that running all tests for all platforms for all
> pushes is a waste of our infrastructure and human resources.
>
> I think the gap we need to figure out how to fill is between getting
> per-push efficiency and full test coverage prior to merging.

The cost of not catching a problem with a test and letting the code land is huge. I only know this for the graphics team, but to Ehsan’s and Jonas’ point, I’m sure it’s not specific to graphics. Now, one is preventative cost (tests), one is treatment cost (fixing issues that snuck through), so it’s sometimes difficult to compare, and we are not alone in first going after the preventative costs, but it’s a big mistake to do so. Now, if we need to save some electricity or cash, I understand that as well, and it eventually translates to the cost to the company the same as people’s time.

If we can do something by skipping every n-th debug run, sure, let’s try it. We have to make sure that a failure on a debug test run triggers us going back and re-running the skipped ones, so that we don’t have any gaps in the tests where something may have gone wrong.

>> It would however be really cool if we were able to pull data on which
>> tests tend to fail in a way that affects all platforms, and which ones
>> tend to fail on one platform only. If we combine this with the ability
>> of having tbpl (or treeherder) "fill in the blanks" whenever a test
>> fails, it seems like we could run many of our tests only on one
>> platform for most checkins to mozilla-inbound.
>
> There are dozens of really interesting approaches we could take here.
> Skipping every nth debug test run is one of the simplest, and I hope we
> can learn a lot from the experiment.
>
> Cheers,
> Chris
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 17:37, Wed, 20 Aug, Jonas Sicking wrote:
> On Wed, Aug 20, 2014 at 4:24 PM, Jeff Gilbert wrote:
>> I have been asked in the past if we really need to run WebGL tests on
>> Android, if they have coverage on Desktop platforms. And then again
>> later, why B2G if we have Android.
>>
>> There seems to be enough belief in test-once-run-everywhere that I feel
>> the need to *firmly* establish that this is not acceptable, at least
>> for the code I work with. I'm happy I'm not alone in this.
>
> I'm a firm believer that we ultimately need to run basically all
> combinations of tests and platforms before allowing code to reach
> mozilla-central. There's lots of platform specific code paths, and
> it's hard to track which tests trigger them, and which don't.

I think we can agree on this. However, not running all tests on all platforms per push on mozilla-inbound (or other branch) doesn't mean that they won't be run on mozilla-central, or even on mozilla-inbound prior to merging.

I'm a firm believer that running all tests for all platforms for all pushes is a waste of our infrastructure and human resources.

I think the gap we need to figure out how to fill is between getting per-push efficiency and full test coverage prior to merging.

> It would however be really cool if we were able to pull data on which
> tests tend to fail in a way that affects all platforms, and which ones
> tend to fail on one platform only. If we combine this with the ability
> of having tbpl (or treeherder) "fill in the blanks" whenever a test
> fails, it seems like we could run many of our tests only on one
> platform for most checkins to mozilla-inbound.

There are dozens of really interesting approaches we could take here. Skipping every nth debug test run is one of the simplest, and I hope we can learn a lot from the experiment.

Cheers,
Chris
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On Wed, Aug 20, 2014 at 4:24 PM, Jeff Gilbert wrote:
> I have been asked in the past if we really need to run WebGL tests on
> Android, if they have coverage on Desktop platforms.
> And then again later, why B2G if we have Android.
>
> There seems to be enough belief in test-once-run-everywhere that I feel
> the need to *firmly* establish that this is not acceptable, at least for
> the code I work with.
> I'm happy I'm not alone in this.

I'm a firm believer that we ultimately need to run basically all combinations of tests and platforms before allowing code to reach mozilla-central. There's lots of platform specific code paths, and it's hard to track which tests trigger them, and which don't.

It would however be really cool if we were able to pull data on which tests tend to fail in a way that affects all platforms, and which ones tend to fail on one platform only. If we combine this with the ability of having tbpl (or treeherder) "fill in the blanks" whenever a test fails, it seems like we could run many of our tests only on one platform for most checkins to mozilla-inbound.

/ Jonas
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
> From: "Ehsan Akhgari"
> To: "Jeff Gilbert"
> Cc: "Chris AtLee", "Jonathan Griffin", dev-platform@lists.mozilla.org
> Sent: Wednesday, August 20, 2014 4:00:15 PM
> Subject: Re: Experiment with running debug tests less often on
> mozilla-inbound the week of August 25
>
> On 2014-08-20, 6:29 PM, Jeff Gilbert wrote:
>> If running debug tests on a single platform is generally sufficient
>> for non-graphics bugs,
>
> It is not. That is the point I was trying to make. :-)
>
>> it might be useful to have the Graphics branch run debug tests on all
>> platforms, for use with graphics checkins (while running a decreased
>> number of debug tests on the main branches). It's still possible for
>> non-graphics code to expose platform-specific bugs, but it's less
>> likely, so maybe larger regression windows are acceptable for
>> platform-specific bugs in non-graphics code.
>
> I don't really understand how graphics is special here. We do have
> platform specific code outside of graphics as well, so we don't need to
> solve this problem for gfx specifically.

Maybe Graphics isn't that special, but this stuff hits really close to home for us.

I have been asked in the past if we really need to run WebGL tests on Android, if they have coverage on Desktop platforms. And then again later, why B2G if we have Android.

There seems to be enough belief in test-once-run-everywhere that I feel the need to *firmly* establish that this is not acceptable, at least for the code I work with. I'm happy I'm not alone in this.

-Jeff
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 2014-08-20, 6:29 PM, Jeff Gilbert wrote:
> If running debug tests on a single platform is generally sufficient for
> non-graphics bugs,

It is not. That is the point I was trying to make. :-)

> it might be useful to have the Graphics branch run debug tests on all
> platforms, for use with graphics checkins (while running a decreased
> number of debug tests on the main branches). It's still possible for
> non-graphics code to expose platform-specific bugs, but it's less
> likely, so maybe larger regression windows are acceptable for
> platform-specific bugs in non-graphics code.

I don't really understand how graphics is special here. We do have platform specific code outside of graphics as well, so we don't need to solve this problem for gfx specifically.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
If running debug tests on a single platform is generally sufficient for non-graphics bugs, it might be useful to have the Graphics branch run debug tests on all platforms, for use with graphics checkins (while running a decreased number of debug tests on the main branches). It's still possible for non-graphics code to expose platform-specific bugs, but it's less likely, so maybe larger regression windows are acceptable for platform-specific bugs in non-graphics code.

-Jeff

- Original Message -
From: "Ehsan Akhgari"
To: "Jeff Gilbert", "Chris AtLee"
Cc: "Jonathan Griffin", dev-platform@lists.mozilla.org
Sent: Wednesday, August 20, 2014 3:16:31 PM
Subject: Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25

On 2014-08-20, 5:46 PM, Jeff Gilbert wrote:
> Graphics in particular is plagued by non-cross-platform code. Debug
> coverage on Linux gives us no practical coverage for our windows, mac,
> android, or b2g code. Maybe this is better solved with reviving the
> Graphics branch, however.

Having more branches doesn't necessarily help with consuming less infra resources, unless the builds will be run with a lower frequency or something.

Cheers,
Ehsan
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 2014-08-20, 5:46 PM, Jeff Gilbert wrote:
> Graphics in particular is plagued by non-cross-platform code. Debug
> coverage on Linux gives us no practical coverage for our windows, mac,
> android, or b2g code. Maybe this is better solved with reviving the
> Graphics branch, however.

Having more branches doesn't necessarily help with consuming less infra resources, unless the builds will be run with a lower frequency or something.

Cheers,
Ehsan
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
Graphics in particular is plagued by non-cross-platform code. Debug coverage on Linux gives us no practical coverage for our windows, mac, android, or b2g code. Maybe this is better solved with reviving the Graphics branch, however. -Jeff - Original Message - From: "Chris AtLee" To: "Ehsan Akhgari" Cc: "Jonathan Griffin" , "Jeff Gilbert" , dev-platform@lists.mozilla.org Sent: Wednesday, August 20, 2014 9:02:14 AM Subject: Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25 On 18:25, Tue, 19 Aug, Ehsan Akhgari wrote: >On 2014-08-19, 5:49 PM, Jonathan Griffin wrote: >>On 8/19/2014 2:41 PM, Ehsan Akhgari wrote: >>>On 2014-08-19, 3:57 PM, Jeff Gilbert wrote: >>>>I would actually say that debug tests are more important for >>>>continuous integration than opt tests. At least in code I deal with, >>>>we have a ton of asserts to guarantee behavior, and we really want >>>>test coverage with these via CI. If a test passes on debug, it should >>>>almost certainly pass on opt, just faster. The opposite is not true. >>>> >>>>"They take a long time and then break" is part of what I believe >>>>caused us to not bother with debug testing on much of Android and >>>>B2G, which we still haven't completely fixed. It should be >>>>unacceptable to ship without CI on debug tests, but here we are >>>>anyways. (This is finally nearly fixed, though there is still some >>>>work to do) >>>> >>>>I'm not saying running debug tests less often is on the same scale of >>>>bad, but I would like to express my concerns about heading in that >>>>direction. >>> >>>I second this. I'm curious to know why you picked debug tests for >>>this experiment. Would it not make more sense to run opt tests on >>>desktop platforms on every other run? >>> >>Just based on the fact that they take longer and thus running them less >>frequently would have a larger impact. 
If there's a broad consensus >>that debug runs are more valuable, we could switch to running opt tests >>less frequently instead. > >Yep, the debug tests indeed take more time, mostly because they run >more checks. :-) The checks in opt builds are not exactly a subset >of the ones in debug builds, but they are close. Based on that, I >think running opt tests on every other push is a more conservative >one, and I support it more. That being said, for this one week >limited trial, given that the sheriffs will help backfill the skipped >tests, I don't care very strongly about this, as long as it doesn't >set the precedence that we can ignore debug tests! I'd like to highlight that we're still planning on running debug linux64 tests for every build. This is based on the assumption that debug-specific failures are generally cross-platform failures as well. Does this help alleviate some concern? Or is that assumption just plain wrong? Cheers, Chris
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On Wed, Aug 20, 2014 at 03:58:55PM +0100, Ed Morley wrote: > On 19/08/2014 21:55, Benoit Girard wrote: > >I completely agree with Jeff Gilbert on this one. > > > >I think we should try to coalesce -better-. I just checked the current > >state of mozilla-inbound and it doesn't feel any of the current patch > >really need their own set of tests because they're are not time > >sensitive or sufficiently complex. Right now developers are asked to > >create bugs for their own change with their own patch. This leads to a > >lot of little patches being landed by individual developers which > >seems to reflect the current state of mozilla-inbound. > > > >Perhaps we should instead promote checkin-needed (or a similar simple) > >to coalesce simple changes together. Opting into this means that your > >patch may take significantly longer to get merged if it's landed with > >another bad patch and should only be used when that's acceptable. > >Right now developers with commit access are not encouraged to make use > >of checkin-needed AFAIK. If we started recommending against individual > >landings for simple changes, and improved the process, we could > >probably significantly cut the number of tests jobs by cutting the > >number of pushes. > > I agree we should try to coalesce better - however doing this via a manual > "let's get someone to push a bunch of checkin-needed patches in one go" is > suboptimal: > 1) By tweaking coalescing in buildbot & pushing patches individually, we > could get the same build+test job per commit ratio as doing checkin-neededs, > but with the bonus of being able to backfill jobs where needed. This isn't > possible when say 10-20 checkin-neededs are landed in one push, since our > tooling can only trigger (and more importantly display the results of) jobs > on a per push level. It would have been useful on several occasions to be able to trigger builds at changeset level instead of push level, independently of checkin-needed. 
Mike
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 2014-08-20, 12:02 PM, Chris AtLee wrote: On 18:25, Tue, 19 Aug, Ehsan Akhgari wrote: On 2014-08-19, 5:49 PM, Jonathan Griffin wrote: On 8/19/2014 2:41 PM, Ehsan Akhgari wrote: On 2014-08-19, 3:57 PM, Jeff Gilbert wrote: I would actually say that debug tests are more important for continuous integration than opt tests. At least in code I deal with, we have a ton of asserts to guarantee behavior, and we really want test coverage with these via CI. If a test passes on debug, it should almost certainly pass on opt, just faster. The opposite is not true. "They take a long time and then break" is part of what I believe caused us to not bother with debug testing on much of Android and B2G, which we still haven't completely fixed. It should be unacceptable to ship without CI on debug tests, but here we are anyways. (This is finally nearly fixed, though there is still some work to do) I'm not saying running debug tests less often is on the same scale of bad, but I would like to express my concerns about heading in that direction. I second this. I'm curious to know why you picked debug tests for this experiment. Would it not make more sense to run opt tests on desktop platforms on every other run? Just based on the fact that they take longer and thus running them less frequently would have a larger impact. If there's a broad consensus that debug runs are more valuable, we could switch to running opt tests less frequently instead. Yep, the debug tests indeed take more time, mostly because they run more checks. :-) The checks in opt builds are not exactly a subset of the ones in debug builds, but they are close. Based on that, I think running opt tests on every other push is a more conservative one, and I support it more. That being said, for this one week limited trial, given that the sheriffs will help backfill the skipped tests, I don't care very strongly about this, as long as it doesn't set the precedence that we can ignore debug tests! 
I'd like to highlight that we're still planning on running debug linux64 tests for every build. This is based on the assumption that debug-specific failures are generally cross-platform failures as well. Does this help alleviate some concern? Or is that assumption just plain wrong? Well, yes, most of our code is cross-platform, but there are debug-only checks in our platform-specific code as well, so if we're talking about something more permanent than that week-long experiment, then running debug tests on Linux64 doesn't alleviate all concerns. But it's fine for this short experiment. Cheers, Ehsan
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 18:25, Tue, 19 Aug, Ehsan Akhgari wrote: On 2014-08-19, 5:49 PM, Jonathan Griffin wrote: On 8/19/2014 2:41 PM, Ehsan Akhgari wrote: On 2014-08-19, 3:57 PM, Jeff Gilbert wrote: I would actually say that debug tests are more important for continuous integration than opt tests. At least in code I deal with, we have a ton of asserts to guarantee behavior, and we really want test coverage with these via CI. If a test passes on debug, it should almost certainly pass on opt, just faster. The opposite is not true. "They take a long time and then break" is part of what I believe caused us to not bother with debug testing on much of Android and B2G, which we still haven't completely fixed. It should be unacceptable to ship without CI on debug tests, but here we are anyways. (This is finally nearly fixed, though there is still some work to do) I'm not saying running debug tests less often is on the same scale of bad, but I would like to express my concerns about heading in that direction. I second this. I'm curious to know why you picked debug tests for this experiment. Would it not make more sense to run opt tests on desktop platforms on every other run? Just based on the fact that they take longer and thus running them less frequently would have a larger impact. If there's a broad consensus that debug runs are more valuable, we could switch to running opt tests less frequently instead. Yep, the debug tests indeed take more time, mostly because they run more checks. :-) The checks in opt builds are not exactly a subset of the ones in debug builds, but they are close. Based on that, I think running opt tests on every other push is a more conservative one, and I support it more. That being said, for this one week limited trial, given that the sheriffs will help backfill the skipped tests, I don't care very strongly about this, as long as it doesn't set the precedence that we can ignore debug tests! 
I'd like to highlight that we're still planning on running debug linux64 tests for every build. This is based on the assumption that debug-specific failures are generally cross-platform failures as well. Does this help alleviate some concern? Or is that assumption just plain wrong? Cheers, Chris
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 14-08-20 11:48 AM, Ehsan Akhgari wrote: > On 2014-08-20, 10:58 AM, Ed Morley wrote: >> On 19/08/2014 21:55, Benoit Girard wrote: >>> I completely agree with Jeff Gilbert on this one. >>> >>> I think we should try to coalesce -better-. I just checked the current >>> state of mozilla-inbound and it doesn't feel any of the current patch >>> really need their own set of tests because they're are not time >>> sensitive or sufficiently complex. Right now developers are asked to >>> create bugs for their own change with their own patch. This leads to a >>> lot of little patches being landed by individual developers which >>> seems to reflect the current state of mozilla-inbound. >>> >>> Perhaps we should instead promote checkin-needed (or a similar simple) >>> to coalesce simple changes together. Opting into this means that your >>> patch may take significantly longer to get merged if it's landed with >>> another bad patch and should only be used when that's acceptable. >>> Right now developers with commit access are not encouraged to make use >>> of checkin-needed AFAIK. If we started recommending against individual >>> landings for simple changes, and improved the process, we could >>> probably significantly cut the number of tests jobs by cutting the >>> number of pushes. >> >> I agree we should try to coalesce better - however doing this via a >> manual "let's get someone to push a bunch of checkin-needed patches in >> one go" is suboptimal: >> 1) By tweaking coalescing in buildbot & pushing patches individually, we >> could get the same build+test job per commit ratio as doing >> checkin-neededs, but with the bonus of being able to backfill jobs where >> needed. This isn't possible when say 10-20 checkin-neededs are landed in >> one push, since our tooling can only trigger (and more importantly >> display the results of) jobs on a per push level. 
>> 2) Tooling can help make these decisions much more effectively and >> quickly than someone picking through bugs - ie we should expand the >> current "only schedule job X if directory Y changed" buildbotcustom >> logic further. >> 3) Adding a human in the workflow increases r+-to-committed cycle times, >> uses up scarce sheriff time, and also means the person who wrote the >> patch is not the one landing it, and so someone unfamiliar with the code >> often ends up being the one to resolve conflicts. We should be using >> tooling, not human cycles to lands patches in a repo (ie the >> long-promised autoland). > > Another approach to this would be to identify more patterns that allow > us to skip some jobs. We already do this for very simple things > (changes to a file in browser/ won't trigger b2g and Android builds, for > example), but I'm sure we could do more. For example, changes to files > under widget/ may only affect one platform, depending on which > directories you touch. Another idea that I have had is adding some > smarts to make it possible to parse the test manifest files, and > recognize things such as skip-if, to figure out what platforms a test > only change for example might not affect, and skip the builds and tests > on those platforms. > > One thing to note is that there is going to be a *long* tail of these > types of heuristics that we could come up with, so it would be nice to > try to only address the ones that provide the most benefits. For that, > someone needs to look at the recent N commits on a given branch and > figure out what jobs we _could_ have safely skipped for each one. If someone wants to have a look at doing this more intelligently, the relevant code is reasonably isolated in https://github.com/mozilla/build-buildbotcustom/blob/master/misc.py#L127. 
The object received there is a Buildbot Change object, which contains most (if not all) of the metadata about a revision: https://mxr.mozilla.org/build-central/source/buildbot/master/buildbot/changes/changes.py#11 I believe this is called once for every revision in a push.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 2014-08-20, 10:58 AM, Ed Morley wrote: On 19/08/2014 21:55, Benoit Girard wrote: I completely agree with Jeff Gilbert on this one. I think we should try to coalesce -better-. I just checked the current state of mozilla-inbound and it doesn't feel any of the current patch really need their own set of tests because they're are not time sensitive or sufficiently complex. Right now developers are asked to create bugs for their own change with their own patch. This leads to a lot of little patches being landed by individual developers which seems to reflect the current state of mozilla-inbound. Perhaps we should instead promote checkin-needed (or a similar simple) to coalesce simple changes together. Opting into this means that your patch may take significantly longer to get merged if it's landed with another bad patch and should only be used when that's acceptable. Right now developers with commit access are not encouraged to make use of checkin-needed AFAIK. If we started recommending against individual landings for simple changes, and improved the process, we could probably significantly cut the number of tests jobs by cutting the number of pushes. I agree we should try to coalesce better - however doing this via a manual "let's get someone to push a bunch of checkin-needed patches in one go" is suboptimal: 1) By tweaking coalescing in buildbot & pushing patches individually, we could get the same build+test job per commit ratio as doing checkin-neededs, but with the bonus of being able to backfill jobs where needed. This isn't possible when say 10-20 checkin-neededs are landed in one push, since our tooling can only trigger (and more importantly display the results of) jobs on a per push level. 2) Tooling can help make these decisions much more effectively and quickly than someone picking through bugs - ie we should expand the current "only schedule job X if directory Y changed" buildbotcustom logic further. 
3) Adding a human in the workflow increases r+-to-committed cycle times, uses up scarce sheriff time, and also means the person who wrote the patch is not the one landing it, and so someone unfamiliar with the code often ends up being the one to resolve conflicts. We should be using tooling, not human cycles to land patches in a repo (ie the long-promised autoland). Another approach to this would be to identify more patterns that allow us to skip some jobs. We already do this for very simple things (changes to a file in browser/ won't trigger b2g and Android builds, for example), but I'm sure we could do more. For example, changes to files under widget/ may only affect one platform, depending on which directories you touch. Another idea that I have had is adding some smarts to make it possible to parse the test manifest files, and recognize things such as skip-if, to figure out what platforms a test-only change, for example, might not affect, and skip the builds and tests on those platforms. One thing to note is that there is going to be a *long* tail of these types of heuristics that we could come up with, so it would be nice to try to only address the ones that provide the most benefits. For that, someone needs to look at the recent N commits on a given branch and figure out what jobs we _could_ have safely skipped for each one. Cheers, Ehsan
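As a rough illustration of the directory-based skipping Ehsan describes, here is a hypothetical sketch. The directory-to-platform mapping, set of platforms, and function names below are made up for illustration; they are not the real buildbotcustom configuration.

```python
# Hypothetical sketch of "only schedule job X if directory Y changed" logic.
# A push may only skip a platform if *every* changed file agrees it is safe.

SKIP_RULES = {
    # changed path prefix -> platforms whose jobs that change cannot affect
    "browser/": {"android", "b2g"},
    "mobile/android/": {"windows", "mac", "linux", "b2g"},
    "widget/cocoa/": {"windows", "linux", "android", "b2g"},
}

ALL_PLATFORMS = {"windows", "mac", "linux", "android", "b2g"}

def platforms_to_run(changed_files):
    """Return the platforms that still need builds/tests for a push."""
    skippable = set(ALL_PLATFORMS)
    for f in changed_files:
        for prefix, skips in SKIP_RULES.items():
            if f.startswith(prefix):
                break
        else:
            # File matched no rule: conservatively assume it affects everything.
            return set(ALL_PLATFORMS)
        skippable &= skips  # keep only platforms every file agrees to skip
    return ALL_PLATFORMS - skippable

platforms_to_run(["browser/base/content/browser.js"])
# -> {"windows", "mac", "linux"} (android and b2g jobs skipped)
```

A desktop-frontend-only push skips the mobile jobs, while any file outside the rule table falls through to the conservative "run everything" default, which is what makes this kind of heuristic safe to extend incrementally.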
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 19/08/2014 21:55, Benoit Girard wrote: I completely agree with Jeff Gilbert on this one. I think we should try to coalesce -better-. I just checked the current state of mozilla-inbound and it doesn't feel any of the current patch really need their own set of tests because they're are not time sensitive or sufficiently complex. Right now developers are asked to create bugs for their own change with their own patch. This leads to a lot of little patches being landed by individual developers which seems to reflect the current state of mozilla-inbound. Perhaps we should instead promote checkin-needed (or a similar simple) to coalesce simple changes together. Opting into this means that your patch may take significantly longer to get merged if it's landed with another bad patch and should only be used when that's acceptable. Right now developers with commit access are not encouraged to make use of checkin-needed AFAIK. If we started recommending against individual landings for simple changes, and improved the process, we could probably significantly cut the number of tests jobs by cutting the number of pushes. I agree we should try to coalesce better - however doing this via a manual "let's get someone to push a bunch of checkin-needed patches in one go" is suboptimal: 1) By tweaking coalescing in buildbot & pushing patches individually, we could get the same build+test job per commit ratio as doing checkin-neededs, but with the bonus of being able to backfill jobs where needed. This isn't possible when say 10-20 checkin-neededs are landed in one push, since our tooling can only trigger (and more importantly display the results of) jobs on a per push level. 2) Tooling can help make these decisions much more effectively and quickly than someone picking through bugs - ie we should expand the current "only schedule job X if directory Y changed" buildbotcustom logic further. 
3) Adding a human in the workflow increases r+-to-committed cycle times, uses up scarce sheriff time, and also means the person who wrote the patch is not the one landing it, and so someone unfamiliar with the code often ends up being the one to resolve conflicts. We should be using tooling, not human cycles to land patches in a repo (ie the long-promised autoland). Best wishes, Ed
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 8/20/2014 3:07 AM, Mike Hommey wrote: Optimized builds have been the default for a while, if not ever[1]. Bug 54828 made optimized builds the default in 2004 right before we released Firefox 1.0. It only took four years to make that decision ;-) --BDS
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On Tue, Aug 19, 2014 at 11:26:42PM -0700, Jeff Gilbert wrote: > I was just going to ask about this. I glanced through the mozconfigs > in the tree for at least Linux debug, but it looks like it only has > --enable-debug, not even -O1. Maybe it's buried somewhere in there, > but I didn't find it with a quick look. > > I took a look at the build log for WinXP debug, and --enable-opt is > only present on the configure line for nspr, whereas --enable-debug is > in a number of other places. Optimized builds have been the default for a while, if not ever[1]. So unless you add an explicit --disable-optimize, you still get an optimized build, whether you use --enable-debug or not. As a matter of fact, we *did* have --disable-optimize in the debug build mozconfigs, but that was removed 3 years ago, in bug 669953. Mike 1. At least, it was the case in the oldest tree we have in Mercurial.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
I was just going to ask about this. I glanced through the mozconfigs in the tree for at least Linux debug, but it looks like it only has --enable-debug, not even -O1. Maybe it's buried somewhere in there, but I didn't find it with a quick look. I took a look at the build log for WinXP debug, and --enable-opt is only present on the configure line for nspr, whereas --enable-debug is in a number of other places. Can we get confirmation for whether debug builds are (partially?) optimized? If not, we should do that. (Unless I'm missing a reason not to, especially if we only care about pass/fail, and not crash stacks/debuggability) -Jeff - Original Message - From: "Kyle Huey" To: "Joshua Cranmer 🐧" Cc: "dev-platform" Sent: Tuesday, August 19, 2014 3:56:27 PM Subject: Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25 I'm pretty sure the debug builds on our CI infrastructure are built with optimization. - Kyle On Tue, Aug 19, 2014 at 3:42 PM, Joshua Cranmer 🐧 wrote: > On 8/19/2014 5:25 PM, Ehsan Akhgari wrote: >> >> Yep, the debug tests indeed take more time, mostly because they run more >> checks. > > > Actually, the bigger cause in the slowdown is probably that debug tests > don't have any optimizations, not more checks. An atomic increment on a > debug build invokes something like a hundred instructions (including several > call instructions) whereas the equivalent operation on an opt build is just > one. > > -- > Joshua Cranmer > Thunderbird and DXR developer > Source code archæologist
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
I'm pretty sure the debug builds on our CI infrastructure are built with optimization. - Kyle On Tue, Aug 19, 2014 at 3:42 PM, Joshua Cranmer 🐧 wrote: > On 8/19/2014 5:25 PM, Ehsan Akhgari wrote: >> >> Yep, the debug tests indeed take more time, mostly because they run more >> checks. > > > Actually, the bigger cause in the slowdown is probably that debug tests > don't have any optimizations, not more checks. An atomic increment on a > debug build invokes something like a hundred instructions (including several > call instructions) whereas the equivalent operation on an opt build is just > one. > > -- > Joshua Cranmer > Thunderbird and DXR developer > Source code archæologist
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 8/19/2014 5:25 PM, Ehsan Akhgari wrote: Yep, the debug tests indeed take more time, mostly because they run more checks. Actually, the bigger cause in the slowdown is probably that debug tests don't have any optimizations, not more checks. An atomic increment on a debug build invokes something like a hundred instructions (including several call instructions) whereas the equivalent operation on an opt build is just one. -- Joshua Cranmer Thunderbird and DXR developer Source code archæologist
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
No, fx-team is not affected by this experiment; we intend to target mozilla-inbound only for this 1-week trial. The reason is that the number of commits on m-i seems larger than fx-team, and therefore the impacts should be more visible. Jonathan On 8/19/2014 3:19 PM, Matthew N. wrote: On 8/19/14 12:22 PM, Jonathan Griffin wrote: To assess the impact of doing this, we will be performing an experiment the week of August 25, in which we will run debug tests on mozilla-inbound on most desktop platforms every other run, instead of every run as we do now. Debug tests on linux64 will continue to run every time. Non-desktop platforms and trees other than mozilla-inbound will not be affected. To clarify, is fx-team affected by this change? I ask because you mention "desktop" and that is where the desktop front-end team does landings. I suspect fx-team landings are less likely to hit debug-only issues than mozilla-inbound as fx-team has much fewer C++ changes and anecdotally JS-only changes seem to trigger debug-only failures less often. This approach is based on the premise that the number of debug-only platform-specific failures on desktop is low enough to be manageable, and that the extra burden this imposes on the sheriffs will be small enough compared to the improvement in test slave metrics to justify the cost. FWIW, I think fx-team is more desktop-specific (although Android front-end stuff also lands there and I'm not familiar with that). MattN
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On Tue, Aug 19, 2014 at 02:49:48PM -0700, Jonathan Griffin wrote: > On 8/19/2014 2:41 PM, Ehsan Akhgari wrote: > >On 2014-08-19, 3:57 PM, Jeff Gilbert wrote: > >>I would actually say that debug tests are more important for continuous > >>integration than opt tests. At least in code I deal with, we have a ton > >>of asserts to guarantee behavior, and we really want test coverage with > >>these via CI. If a test passes on debug, it should almost certainly pass > >>on opt, just faster. The opposite is not true. > >> > >>"They take a long time and then break" is part of what I believe caused > >>us to not bother with debug testing on much of Android and B2G, which we > >>still haven't completely fixed. It should be unacceptable to ship > >>without CI on debug tests, but here we are anyways. (This is finally > >>nearly fixed, though there is still some work to do) > >> > >>I'm not saying running debug tests less often is on the same scale of > >>bad, but I would like to express my concerns about heading in that > >>direction. > > > >I second this. I'm curious to know why you picked debug tests for this > >experiment. Would it not make more sense to run opt tests on desktop > >platforms on every other run? > > > Just based on the fact that they take longer and thus running them less > frequently would have a larger impact. If there's a broad consensus that > debug runs are more valuable, we could switch to running opt tests less > frequently instead. It seems to me our goal here is basically to pick so that the expected time to detect bustage is minimized without increasing the maximum time it can take to detect bustage. That is take p(d) to be the probability only debug tests will fail, p(o) the probability only opt tests will fail, and p(b) the probability both will fail. Then take t(d) and t(o) the time for a debug and opt test to run respectively. Now you want to decide which to run first debug or opt. 
If you choose debug first, you'd expect to detect bustage in (p(d) + p(b)) * t(d) + p(o) * (t(o) + t(d)), which simplifies to t(d) + p(o) * t(o) (since p(d) + p(b) + p(o) = 1). On the other hand, if you choose to test opt first you get t(o) + p(d) * t(d). I suspect we all agree t(d) > t(o), and it seems likely p(d) > p(o), but it should be clear that which is the better choice depends on the exact values of those numbers (and this is not a good model of reality in many ways). Trev
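To make the comparison concrete, here is a small sketch plugging hypothetical numbers into the two expectations. All probabilities and timings below are made up for illustration; they are not measurements from the actual infrastructure.

```python
# Expected time to detect bustage under the simple model above:
#   debug first: t(d) + p(o) * t(o)
#   opt first:   t(o) + p(d) * t(d)

def expected_debug_first(p_o, t_d, t_o):
    # p_o: probability only opt tests fail; t_d, t_o: debug/opt run times
    return t_d + p_o * t_o

def expected_opt_first(p_d, t_d, t_o):
    # p_d: probability only debug tests fail
    return t_o + p_d * t_d

# Hypothetical values: debug runs take 90 min, opt 40 min;
# 30% of bustage is debug-only, 10% opt-only (the rest fails on both).
p_d, p_o, t_d, t_o = 0.3, 0.1, 90, 40

print(expected_debug_first(p_o, t_d, t_o))  # 94.0
print(expected_opt_first(p_d, t_d, t_o))    # 67.0
```

With these particular (invented) numbers, opt-first detects bustage sooner on average even though debug-only failures are assumed more common, which illustrates Trev's point that the better choice hinges on the actual values.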
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
I know this is tangential but the small changes are the least tested changes in my experience. The try push requirement for checkin-needed has had a wonderful impact on the number of times the tree is closed[1]. The tree is less likely to be closed these days. David [1] http://futurama.theautomatedtester.co.uk/ On 19/08/2014 22:04, Ralph Giles wrote: On 2014-08-19 1:55 PM, Benoit Girard wrote: Perhaps we should instead promote checkin-needed (or a similar simple) to coalesce simple changes together. I would prefer to use 'checkin-needed' for more things, but am blocked by the try-needed requirement. We need some way to bless small changes for inbound without a try push. Look up the author's commit access maybe? -r
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 2014-08-19, 5:49 PM, Jonathan Griffin wrote: On 8/19/2014 2:41 PM, Ehsan Akhgari wrote: On 2014-08-19, 3:57 PM, Jeff Gilbert wrote: I would actually say that debug tests are more important for continuous integration than opt tests. At least in code I deal with, we have a ton of asserts to guarantee behavior, and we really want test coverage with these via CI. If a test passes on debug, it should almost certainly pass on opt, just faster. The opposite is not true. "They take a long time and then break" is part of what I believe caused us to not bother with debug testing on much of Android and B2G, which we still haven't completely fixed. It should be unacceptable to ship without CI on debug tests, but here we are anyways. (This is finally nearly fixed, though there is still some work to do) I'm not saying running debug tests less often is on the same scale of bad, but I would like to express my concerns about heading in that direction. I second this. I'm curious to know why you picked debug tests for this experiment. Would it not make more sense to run opt tests on desktop platforms on every other run? Just based on the fact that they take longer and thus running them less frequently would have a larger impact. If there's a broad consensus that debug runs are more valuable, we could switch to running opt tests less frequently instead. Yep, the debug tests indeed take more time, mostly because they run more checks. :-) The checks in opt builds are not exactly a subset of the ones in debug builds, but they are close. Based on that, I think running opt tests on every other push is the more conservative option, and I support it more. That being said, for this one-week limited trial, given that the sheriffs will help backfill the skipped tests, I don't care very strongly about this, as long as it doesn't set the precedent that we can ignore debug tests!
Cheers, Ehsan
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 8/19/14 12:22 PM, Jonathan Griffin wrote: To assess the impact of doing this, we will be performing an experiment the week of August 25, in which we will run debug tests on mozilla-inbound on most desktop platforms every other run, instead of every run as we do now. Debug tests on linux64 will continue to run every time. Non-desktop platforms and trees other than mozilla-inbound will not be affected. To clarify, is fx-team affected by this change? I ask because you mention "desktop" and that is where the desktop front-end team does landings. I suspect fx-team landings are less likely to hit debug-only issues than mozilla-inbound as fx-team has much fewer C++ changes and anecdotally JS-only changes seem to trigger debug-only failures less often. This approach is based on the premise that the number of debug-only platform-specific failures on desktop is low enough to be manageable, and that the extra burden this imposes on the sheriffs will be small enough compared to the improvement in test slave metrics to justify the cost. FWIW, I think fx-team is more desktop-specific (although Android front-end stuff also lands there and I'm not familiar with that). MattN
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 8/19/2014 2:41 PM, Ehsan Akhgari wrote: On 2014-08-19, 3:57 PM, Jeff Gilbert wrote: I would actually say that debug tests are more important for continuous integration than opt tests. At least in code I deal with, we have a ton of asserts to guarantee behavior, and we really want test coverage with these via CI. If a test passes on debug, it should almost certainly pass on opt, just faster. The opposite is not true. "They take a long time and then break" is part of what I believe caused us to not bother with debug testing on much of Android and B2G, which we still haven't completely fixed. It should be unacceptable to ship without CI on debug tests, but here we are anyways. (This is finally nearly fixed, though there is still some work to do) I'm not saying running debug tests less often is on the same scale of bad, but I would like to express my concerns about heading in that direction. I second this. I'm curious to know why you picked debug tests for this experiment. Would it not make more sense to run opt tests on desktop platforms on every other run? Just based on the fact that they take longer and thus running them less frequently would have a larger impact. If there's a broad consensus that debug runs are more valuable, we could switch to running opt tests less frequently instead. Jonathan
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
I also agree about coalescing better. We are looking at ways to do that in conjunction with https://wiki.mozilla.org/Auto-tools/Projects/Autoland, which we'll have a prototype of by the end of the quarter. In this model, commits that are going through autoland could be coalesced when landing on inbound, which would reduce slave load on all platforms. Until that's deployed and in widespread use, we have other options to decrease slave load, and this experiment is the simplest. It won't result in reduced test coverage, since sheriffs will backfill in the case of a regression. Essentially, we're not running tests that would have passed anyway. Depending on feedback we receive after this experiment, we may opt to change our approach in the future: i.e., run tests every Nth opt build instead of debug build, or try to identify sets of "never failing" tests and just run those less frequently, or always include at least one flavor of Windows, OSX and Linux on every commit, etc. Regards, Jonathan On 8/19/2014 1:55 PM, Benoit Girard wrote: I completely agree with Jeff Gilbert on this one. I think we should try to coalesce -better-. I just checked the current state of mozilla-inbound, and it doesn't feel like any of the current patches really need their own set of tests, because they're not time-sensitive or sufficiently complex. Right now developers are asked to create bugs for their own change with their own patch. This leads to a lot of little patches being landed by individual developers, which seems to reflect the current state of mozilla-inbound. Perhaps we should instead promote checkin-needed (or a similar simple mechanism) to coalesce simple changes together. Opting into this means that your patch may take significantly longer to get merged if it's landed with another bad patch, and should only be used when that's acceptable. Right now developers with commit access are not encouraged to make use of checkin-needed AFAIK.
If we started recommending against individual landings for simple changes, and improved the process, we could probably significantly cut the number of test jobs by cutting the number of pushes. On Tue, Aug 19, 2014 at 3:57 PM, Jeff Gilbert wrote: I would actually say that debug tests are more important for continuous integration than opt tests. At least in code I deal with, we have a ton of asserts to guarantee behavior, and we really want test coverage with these via CI. If a test passes on debug, it should almost certainly pass on opt, just faster. The opposite is not true. "They take a long time and then break" is part of what I believe caused us to not bother with debug testing on much of Android and B2G, which we still haven't completely fixed. It should be unacceptable to ship without CI on debug tests, but here we are anyways. (This is finally nearly fixed, though there is still some work to do) I'm not saying running debug tests less often is on the same scale of bad, but I would like to express my concerns about heading in that direction. -Jeff - Original Message - From: "Jonathan Griffin" To: dev-platform@lists.mozilla.org Sent: Tuesday, August 19, 2014 12:22:21 PM Subject: Experiment with running debug tests less often on mozilla-inbound the week of August 25 Our pools of test slaves are often at or over capacity, and this has the effect of increasing job coalescing and test wait times. This, in turn, can lead to longer tree closures caused by test bustage, and can cause try runs to be very slow to complete. One of the easiest ways to mitigate this is to run tests less often. To assess the impact of doing this, we will be performing an experiment the week of August 25, in which we will run debug tests on mozilla-inbound on most desktop platforms every other run, instead of every run as we do now. Debug tests on linux64 will continue to run every time. Non-desktop platforms and trees other than mozilla-inbound will not be affected.
This approach is based on the premise that the number of debug-only platform-specific failures on desktop is low enough to be manageable, and that the extra burden this imposes on the sheriffs will be small enough compared to the improvement in test slave metrics to justify the cost. While this experiment is in progress, we will be monitoring job coalescing and test wait times, as well as impacts on sheriffs and developers. If the experiment causes sheriffs to be unable to perform their job effectively, it can be terminated prematurely. We intend to use the data we collect during the experiment to inform decisions about additional tooling we need to make this or a similar plan permanent at some point in the future, as well as validating the premise on which this experiment is based. After the conclusion of this experiment, a follow-up post will be made which will discuss our findings.
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 2014-08-19, 3:57 PM, Jeff Gilbert wrote: I would actually say that debug tests are more important for continuous integration than opt tests. At least in code I deal with, we have a ton of asserts to guarantee behavior, and we really want test coverage with these via CI. If a test passes on debug, it should almost certainly pass on opt, just faster. The opposite is not true. "They take a long time and then break" is part of what I believe caused us to not bother with debug testing on much of Android and B2G, which we still haven't completely fixed. It should be unacceptable to ship without CI on debug tests, but here we are anyways. (This is finally nearly fixed, though there is still some work to do) I'm not saying running debug tests less often is on the same scale of bad, but I would like to express my concerns about heading in that direction. I second this. I'm curious to know why you picked debug tests for this experiment. Would it not make more sense to run opt tests on desktop platforms on every other run? Cheers, Ehsan
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
On 2014-08-19 1:55 PM, Benoit Girard wrote: > Perhaps we should instead promote checkin-needed (or a similar simple mechanism) > to coalesce simple changes together. I would prefer to use 'checkin-needed' for more things, but am blocked by the try-needed requirement. We need some way to bless small changes for inbound without a try push. Look up the author's commit access maybe? -r
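rnewman's suggestion could be sketched roughly as follows. This is purely illustrative: the function, the hard-coded access table, and the addresses are invented (real commit levels live in Mozilla's account system, not in code like this). The idea is simply to gate the try-run requirement on the patch author's commit access level.

```python
# Illustrative access table; level 3 is core/"inbound" access in
# Mozilla's commit access scheme. Entries here are made up.
COMMIT_ACCESS_LEVEL = {
    "corehacker@example.org": 3,
    "newvolunteer@example.org": 1,
}

def try_push_required(author: str) -> bool:
    """Require a try push unless the author holds level-3 commit access.

    Unknown authors default to level 0 and therefore always need try.
    """
    return COMMIT_ACCESS_LEVEL.get(author, 0) < 3
```

Whether skipping try for trusted authors is actually safe is exactly the policy question being debated; the sketch only shows where such a check would slot in.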
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
I completely agree with Jeff Gilbert on this one. I think we should try to coalesce -better-. I just checked the current state of mozilla-inbound, and it doesn't feel like any of the current patches really need their own set of tests, because they're not time-sensitive or sufficiently complex. Right now developers are asked to create bugs for their own change with their own patch. This leads to a lot of little patches being landed by individual developers, which seems to reflect the current state of mozilla-inbound. Perhaps we should instead promote checkin-needed (or a similar simple mechanism) to coalesce simple changes together. Opting into this means that your patch may take significantly longer to get merged if it's landed with another bad patch, and should only be used when that's acceptable. Right now developers with commit access are not encouraged to make use of checkin-needed AFAIK. If we started recommending against individual landings for simple changes, and improved the process, we could probably significantly cut the number of test jobs by cutting the number of pushes. On Tue, Aug 19, 2014 at 3:57 PM, Jeff Gilbert wrote: > I would actually say that debug tests are more important for continuous > integration than opt tests. At least in code I deal with, we have a ton of > asserts to guarantee behavior, and we really want test coverage with these > via CI. If a test passes on debug, it should almost certainly pass on opt, > just faster. The opposite is not true. > > "They take a long time and then break" is part of what I believe caused us to > not bother with debug testing on much of Android and B2G, which we still > haven't completely fixed. It should be unacceptable to ship without CI on > debug tests, but here we are anyways. (This is finally nearly fixed, though > there is still some work to do) > > I'm not saying running debug tests less often is on the same scale of bad, > but I would like to express my concerns about heading in that direction. 
> > -Jeff > > - Original Message - > From: "Jonathan Griffin" > To: dev-platform@lists.mozilla.org > Sent: Tuesday, August 19, 2014 12:22:21 PM > Subject: Experiment with running debug tests less often on mozilla-inbound > the week of August 25 > > Our pools of test slaves are often at or over capacity, and this has the > effect of increasing job coalescing and test wait times. This, in turn, > can lead to longer tree closures caused by test bustage, and can cause > try runs to be very slow to complete. > > One of the easiest ways to mitigate this is to run tests less often. > > To assess the impact of doing this, we will be performing an experiment > the week of August 25, in which we will run debug tests on > mozilla-inbound on most desktop platforms every other run, instead of > every run as we do now. Debug tests on linux64 will continue to run > every time. Non-desktop platforms and trees other than mozilla-inbound > will not be affected. > > This approach is based on the premise that the number of debug-only > platform-specific failures on desktop is low enough to be manageable, > and that the extra burden this imposes on the sheriffs will be small > enough compared to the improvement in test slave metrics to justify the > cost. > > While this experiment is in progress, we will be monitoring job > coalescing and test wait times, as well as impacts on sheriffs and > developers. If the experiment causes sheriffs to be unable to perform > their job effectively, it can be terminated prematurely. > > We intend to use the data we collect during the experiment to inform > decisions about additional tooling we need to make this or a similar > plan permanent at some point in the future, as well as validating the > premise on which this experiment is based. > > After the conclusion of this experiment, a follow-up post will be made > which will discuss our findings. If you have any concerns, feel free to > reach out to me. 
> Jonathan
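Benoit's coalescing idea can be sketched as follows (the function and batch size are illustrative, not actual infrastructure code): group small pending pushes so that each batch shares a single test run, trading fewer test jobs for the bisection work needed when a batch turns out to contain a bad patch.

```python
from typing import List

def coalesce(pending_pushes: List[str], batch_size: int = 3) -> List[List[str]]:
    """Group pending pushes into batches; each batch gets one test run.

    If a batched run fails, the pushes in that batch must be retested
    individually (backfilled) to find the culprit -- which is why
    opting in only makes sense for changes that aren't time-sensitive.
    """
    return [pending_pushes[i:i + batch_size]
            for i in range(0, len(pending_pushes), batch_size)]
```

With a batch size of 3, four pending pushes would trigger two test runs instead of four.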
Re: Experiment with running debug tests less often on mozilla-inbound the week of August 25
I would actually say that debug tests are more important for continuous integration than opt tests. At least in code I deal with, we have a ton of asserts to guarantee behavior, and we really want test coverage with these via CI. If a test passes on debug, it should almost certainly pass on opt, just faster. The opposite is not true. "They take a long time and then break" is part of what I believe caused us to not bother with debug testing on much of Android and B2G, which we still haven't completely fixed. It should be unacceptable to ship without CI on debug tests, but here we are anyways. (This is finally nearly fixed, though there is still some work to do) I'm not saying running debug tests less often is on the same scale of bad, but I would like to express my concerns about heading in that direction. -Jeff - Original Message - From: "Jonathan Griffin" To: dev-platform@lists.mozilla.org Sent: Tuesday, August 19, 2014 12:22:21 PM Subject: Experiment with running debug tests less often on mozilla-inbound the week of August 25 Our pools of test slaves are often at or over capacity, and this has the effect of increasing job coalescing and test wait times. This, in turn, can lead to longer tree closures caused by test bustage, and can cause try runs to be very slow to complete. One of the easiest ways to mitigate this is to run tests less often. To assess the impact of doing this, we will be performing an experiment the week of August 25, in which we will run debug tests on mozilla-inbound on most desktop platforms every other run, instead of every run as we do now. Debug tests on linux64 will continue to run every time. Non-desktop platforms and trees other than mozilla-inbound will not be affected. 
This approach is based on the premise that the number of debug-only platform-specific failures on desktop is low enough to be manageable, and that the extra burden this imposes on the sheriffs will be small enough compared to the improvement in test slave metrics to justify the cost. While this experiment is in progress, we will be monitoring job coalescing and test wait times, as well as impacts on sheriffs and developers. If the experiment causes sheriffs to be unable to perform their job effectively, it can be terminated prematurely. We intend to use the data we collect during the experiment to inform decisions about additional tooling we need to make this or a similar plan permanent at some point in the future, as well as validating the premise on which this experiment is based. After the conclusion of this experiment, a follow-up post will be made which will discuss our findings. If you have any concerns, feel free to reach out to me. Jonathan
Experiment with running debug tests less often on mozilla-inbound the week of August 25
Our pools of test slaves are often at or over capacity, and this has the effect of increasing job coalescing and test wait times. This, in turn, can lead to longer tree closures caused by test bustage, and can cause try runs to be very slow to complete. One of the easiest ways to mitigate this is to run tests less often. To assess the impact of doing this, we will be performing an experiment the week of August 25, in which we will run debug tests on mozilla-inbound on most desktop platforms every other run, instead of every run as we do now. Debug tests on linux64 will continue to run every time. Non-desktop platforms and trees other than mozilla-inbound will not be affected. This approach is based on the premise that the number of debug-only platform-specific failures on desktop is low enough to be manageable, and that the extra burden this imposes on the sheriffs will be small enough compared to the improvement in test slave metrics to justify the cost. While this experiment is in progress, we will be monitoring job coalescing and test wait times, as well as impacts on sheriffs and developers. If the experiment causes sheriffs to be unable to perform their job effectively, it can be terminated prematurely. We intend to use the data we collect during the experiment to inform decisions about additional tooling we need to make this or a similar plan permanent at some point in the future, as well as validating the premise on which this experiment is based. After the conclusion of this experiment, a follow-up post will be made which will discuss our findings. If you have any concerns, feel free to reach out to me. Jonathan
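The "every other run" scheme described above could be expressed roughly like this (function and platform names are illustrative, not the actual buildbot scheduler logic): linux64 debug tests and non-desktop platforms run on every push, while debug tests on the other desktop platforms run only on alternating pushes.

```python
# Desktop platforms eligible for skipping; linux64 debug always runs.
# The platform names are illustrative of the announcement, not an
# exact list from the scheduler configuration.
SKIP_ELIGIBLE = {"win32", "win64", "macosx64"}

def should_run_debug_tests(platform: str, push_count: int) -> bool:
    """Decide whether a push's debug tests run.

    linux64 and anything outside the eligible desktop set (e.g.
    Android, B2G) always run; eligible platforms run every other push.
    """
    if platform == "linux64":
        return True
    if platform in SKIP_ELIGIBLE:
        return push_count % 2 == 0
    return True  # non-desktop platforms are unaffected
```

A skipped run is recoverable: if a later run fails, sheriffs backfill the skipped jobs to locate the regressing push, which is why this skipping doesn't reduce effective coverage.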