Re: Policy for disabling tests which run on TBPL
On 07/04/14 04:33, Andrew Halberstadt wrote:
> On 06/04/14 08:59 AM, Aryeh Gregor wrote:
>> On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
>>> Note that is only accurate to a certain point. There are other things
>>> which we can do to guesswork our way out of the situation for Autoland,
>>> but of course they're resource/time intensive (basically running orange
>>> tests over and over again, etc.)
>>
>> Is there any reason in principle that we couldn't have the test runner
>> automatically rerun tests with known intermittent failures a few times,
>> and let the test pass if it passes a few times in a row after the first
>> fail? This would be a much nicer option than disabling the test entirely,
>> and would still mean the test is mostly effective, particularly if only
>> specific failure messages are allowed to be auto-retried.
>
> Many of our test runners have that ability. But doing this implies that
> intermittents are always the fault of the test. We'd be missing whole
> classes of regressions (notably race conditions).

In practice how effective are we at identifying bugs that lead to instability? Is it more common that we end up disabling the test, or marking it as known intermittent and learning to live with the instability, both of which options reduce the test coverage, or is it more common that we realise that there is a code reason for the intermittent and get it fixed?

If it is the latter, then making the instability as obvious as possible makes sense, and the current setup where we run each test once can be regarded as a compromise between the ideal setup, where we run each test multiple times and flag it as a fail if it ever fails, and the needs of performance. If the former is true, it makes a lot more sense to do reruns of the tests that fail in order to keep them active at all, and to store information about the fact that reruns occurred so that we can see when a test started giving unexpected results.

This does rely on having some mechanism to make people care about genuine intermittents that they caused, but maybe the right way to do that is to have some batch tool that takes all the tests that have become intermittent, does reruns until it has identified the commits that introduced the intermittency, and then files P1 bugs on the developer(s) it identifies to fix their bugs.
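As a minimal sketch of the retry policy being discussed (hypothetical harness code, not an existing Mozilla API), the wrapper below reruns a known-intermittent test after a first failure and only reports green if the test then passes several times in a row, while recording that reruns happened so the result can still be surfaced later:

#include <cstdio>
#include <functional>

// A test returns true on pass, false on failure.
using TestFn = std::function<bool()>;

// Hypothetical policy: only tests annotated as known intermittents are
// eligible for retries.  After an initial failure the test must pass
// `passesRequired` times in a row to be reported green; the number of
// reruns is recorded so the harness can still publish that it happened.
bool RunKnownIntermittent(const TestFn& test, int passesRequired, int* rerunsUsed)
{
    *rerunsUsed = 0;
    if (test()) {
        return true;                 // passed on the first attempt
    }
    for (int i = 0; i < passesRequired; ++i) {
        ++*rerunsUsed;
        if (!test()) {
            return false;            // failed again: report the orange
        }
    }
    return true;                     // treated as an intermittent pass
}

int main()
{
    int reruns = 0;
    int calls = 0;
    // Toy test that fails on its first invocation and passes afterwards.
    bool ok = RunKnownIntermittent([&] { return ++calls > 1; }, 3, &reruns);
    std::printf("passed=%d after %d reruns\n", ok, reruns);
    return 0;
}

The interesting design question raised in the thread is the last step: whether the recorded rerun count is published somewhere people actually look, or silently discarded.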
Formalising the current 'mozilla-central essential pushes only' recommendation
(Follow-ups to dev.tree-management please)

Hi all :-)

The vast majority of mozilla-central landings are now via curated merges from integration/team repositories. This dramatically increases the chance that the tip of mozilla-central is in a known-good state, meaning that:

* Integration/project repos pulling from mozilla-central are less likely to receive breakage from elsewhere.
* A bad landing + tree closure on one integration/team repo doesn't impact the ability of other repositories to merge into mozilla-central.
* Developers have the choice of a safer repository to use as their qbase for local development / try pushes.
* Nightly builds are less prone to being broken and requiring a respin [1].

Non-critical mozilla-central landings are already discouraged and as such are rare. However, the sheriffs [2] would like to formalise this, by adjusting the mozilla-central tree rules [3] to state that direct pushes must be for one of the following reasons:

1) Merging from an integration/team/project repository (there is no restriction on who may make these merges).
2) Automated blocklist / HSTS preload list updates [4].
3) For the resolution (ie: backout or follow-up fix) of critical regressions (eg: top-crashers or other major functional regression) that will result in a Nightly respin or must make the imminent scheduled Nightly at all costs.
4) Anything else for which common sense (or asking in #developers) says is an appropriate reason for a direct landing on mozilla-central.

Clearly #4 is very fuzzy - but I'm hopeful self-policing has a good chance of success and so would like to try that first. Note #4 would not include the landing of new features directly onto mozilla-central if they have missed the last integration repository merge on the day of uplift - the correct course of action would be to request aurora-approval and uplift instead, due to the amount of merge-day breakage this has caused in the past. If there are any other cases that should be explicitly mentioned above instead of relying on #4, please let me know.

A proportion of the current mozilla-central non-critical commits are made by people inadvertently pushing to the wrong repository. To prevent these, once the tree rules are adjusted on the wiki, the sheriffs envisage the next step will be switching mozilla-central to a non-open tree state (name TBD) using the existing tree closure hook. Backouts, merges and automated bot updates will not need any additional annotation - others will simply use a (yet to be chosen) commit message string to signify awareness of and adherence to the new tree policy.

We welcome your feedback :-)

Best wishes,

Ed

[1] We've recently disabled the "last good revision" functionality on mozilla-central, since it was both not functioning as expected and also frequently caused delays in Nightly generation when non-tier 1 jobs failed or infra issues caused a single build to fail. The net change is positive; however, it increases the need to ensure mozilla-central is in a known-good state.
[2] https://wiki.mozilla.org/Sheriffing
[3] https://wiki.mozilla.org/Tree_Rules
[4] eg: https://hg.mozilla.org/mozilla-central/rev/c401296f71ae and https://hg.mozilla.org/mozilla-central/rev/beea7a7f3fc3 - though I think we may wish to move these to an integration repository in the future, since they have occasionally caused breakage in the past. However, that will require discussion and automation changes, so I believe we should maintain the status quo initially.
Re: Enable -Wswitch-enum? [was Re: MOZ_ASSUME_UNREACHABLE is being misused]
(2014/04/07 14:27), Karl Tomlinson wrote:
> It is allowed in N3242. I think the relevant sections are 5.2.9 Static cast

Thank you for the pointer. I found a floating copy of n3242.pdf at the following URL:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf

I think 7.2 paragraph 10 is also relevant here.

--- quote ---
An expression of arithmetic or enumeration type can be converted to an enumeration type explicitly. The value is unchanged if it is in the range of enumeration values of the enumeration type; otherwise the resulting enumeration value is unspecified.
--- end quote

I take it so:

typedef enum { a = 1, b, c = 10 } T;
T x;
x = 3;  /* OK: it is within [1..10], although 3 is not in the list explicitly */

but

x = 32; /* `unspecified' because it is outside [1..2^16] */

(I read the specification to mean that the range of enumeration values is [0 or 1 .. maximum_necessary_for_the_declared_maximum_value], where maximum_necessary_for_the_declared_maximum_value is 2^M or (2^M - 1) etc., depending on how negative values are represented: 2's complement or 1's complement.)

The description is very complex, but it seems that the enumeration range is calculated to produce the narrowest bit-field that can contain all the explicitly declared values (min .. max). This is the range of values.

Anyway, in my example above, a compiler can do anything if x = 32 is executed (?). Hmm. I will re-read 7.2.7.

TIA
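A worked example of that range calculation, shown as compilable C++ purely for illustration (this follows the reading of the quoted wording given above; it is not an authoritative interpretation of the standard): for this declaration the narrowest bit-field that can hold the declared maximum of 10 is 4 bits wide, so the range of enumeration values works out to [0..15].

#include <cstdio>

enum T { a = 1, b, c = 10 };

int main()
{
    // Declared enumerators are 1, 2 and 10.  The narrowest bit-field that
    // can hold the declared maximum (10) is 4 bits, so under the wording
    // quoted above the range of enumeration values for T is [0 .. 15].
    T inRange  = static_cast<T>(3);   // 3 is inside [0..15]: the value is preserved
    T outRange = static_cast<T>(32);  // 32 is outside the range: the resulting
                                      // value is unspecified per that wording,
                                      // though not undefined behaviour for the
                                      // program as a whole
    std::printf("%d %d\n", static_cast<int>(inRange), static_cast<int>(outRange));
    return 0;
}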
Re: Enable -Wswitch-enum? [was Re: MOZ_ASSUME_UNREACHABLE is being misused]
chiaki ISHIKAWA writes: I think 7.2 10 is also relevant here. --- quote --- An expression of arithmetic or enumeration type can be converted to an enumeration type explicitly. The value is unchanged if it is in the range of enumeration values of the enumeration type; otherwise the resulting enumeration value is unspecified. --- end quote I take so : typedef enum { a = 1, b, c = 10 } T; T x; Anyway, in my example above, a compiler can do anything if x = 32 is executed (?). Note here the enumeration value is unspecified, which I assume merely means the compiler can choose anything for the value of x. That might be considerably safer than undefined behavior of the program. ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policy for disabling tests which run on TBPL
On Mon, Apr 7, 2014 at 6:33 AM, Andrew Halberstadt ahalberst...@mozilla.com wrote: Many of our test runners have that ability. But doing this implies that intermittents are always the fault of the test. We'd be missing whole classes of regressions (notably race conditions). We already are, because we already will star (i.e., ignore) any failure that looks like a known intermittent failure. I'm only saying we should automate that as much as possible. ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policy for disabling tests which run on TBPL
On 07/04/14 05:10 AM, James Graham wrote:
> On 07/04/14 04:33, Andrew Halberstadt wrote:
>> On 06/04/14 08:59 AM, Aryeh Gregor wrote:
>>> Is there any reason in principle that we couldn't have the test runner
>>> automatically rerun tests with known intermittent failures a few times,
>>> and let the test pass if it passes a few times in a row after the first
>>> fail? This would be a much nicer option than disabling the test
>>> entirely, and would still mean the test is mostly effective,
>>> particularly if only specific failure messages are allowed to be
>>> auto-retried.
>>
>> Many of our test runners have that ability. But doing this implies that
>> intermittents are always the fault of the test. We'd be missing whole
>> classes of regressions (notably race conditions).
>
> In practice how effective are we at identifying bugs that lead to
> instability? Is it more common that we end up disabling the test, or
> marking it as known intermittent and learning to live with the
> instability, both of which options reduce the test coverage, or is it
> more common that we realise that there is a code reason for the
> intermittent and get it fixed?

I would guess the former is true in most cases. But at least there we have a *chance* at tracking down and fixing the failure, even if it takes awhile before it becomes annoying enough to prioritize. If we made it so intermittents never annoyed anyone, there would be even less motivation to fix them. Yes, in theory we would still have a list of top failing intermittents. In practice that list will be ignored.

Case in point, desktop xpcshell does this right now. Open a log and ctrl-f for "Retrying tests". Most runs have a few failures that got retried. No one knows about these and no one looks at them. Publishing results somewhere easy to discover would definitely help, but I'm not convinced it will help much.

Doing this would also cause us to miss non-intermittent regressions, e.g. where the ordering of tests tickles the platform the wrong way. On the retry, the test would get run in a completely different order and might show up green 100% of the time.

Either way, the problem is partly culture, partly due to not good enough tooling. I see where this proposal is coming from, but I think there are ways of tackling the problem head on. This seems kind of like a last resort.

Andrew
Re: Removing 'jit-tests' from make check
Hi Terrence,

Thanks! I've filed Bug 992887 to track the expanded mach command.

Cheers,
Dan

- Original Message -
From: Terrence Cole tc...@mozilla.com
To: dev-platform@lists.mozilla.org
Sent: Saturday, April 5, 2014 4:30:53 PM
Subject: Re: Removing 'jit-tests' from make check

Dan,

Congratulations on landing the jit-tests split! I'm glad to hear we're getting a make check replacement too. We discussed it a bit in IRC and the rough decision, at least between jorendorff and myself, was to expand the scope of the replacement to run /all/ of SpiderMonkey's test suites. Ideally we'd like to be able to tell new contributors "Run this and if it passes you're probably good to go." instead of "Why didn't you run this other test suite? What do you mean it's not in the wiki? Oh, you looked at that old page. Well then." This should be as simple as adding |./tests/jstests.py shell --tbpl| to mach check-spidermonkey. It sounds like you've got it under control, but please ping me if you want input.

Cheers,
Terrence

On 04/04/2014 05:44 AM, Daniel Minor wrote:

Hi Nicolas,

This change only affects running the jit-test test suite as part of make check. This doesn't affect building or running the JS shell. The mach command that has been added replicates how this particular test suite was previously run in make check. It could be expanded, of course.

Thanks,
Dan

- Original Message -
From: Nicolas B. Pierron nicolas.b.pier...@mozilla.com
To: dev-platform@lists.mozilla.org
Sent: Friday, April 4, 2014 8:31:32 AM
Subject: Re: Removing 'jit-tests' from make check

On 04/04/2014 03:39 AM, Daniel Minor wrote:

Just a heads up that very soon we'll be removing jit-tests from the make check target[1]. The tests have been split out into a separate test job on TBPL[2] (labelled Jit), have been running on Cedar for several months, and have been recently turned on for other trees. We've added a mach command, mach jittest, that runs the tests with the same arguments that make check currently does.

mach jittest? Is there any documentation which explains how to only work with the JS Shell by using mach commands? Does this change imply that every JS developer will have to compile the full browser just to work on the Shell? The only documentation I know of [1] explains how to run a configure && make.

[1] https://developer.mozilla.org/en-US/docs/SpiderMonkey/Build_Documentation
Re: Policy for disabling tests which run on TBPL
On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt ahalberst...@mozilla.com wrote: I would guess the former is true in most cases. But at least there we have a *chance* at tracking down and fixing the failure, even if it takes awhile before it becomes annoying enough to prioritize. If we made it so intermittents never annoyed anyone, there would be even less motivation to fix them. Yes in theory we would still have a list of top failing intermittents. In practice that list will be ignored. Is this better or worse than the status quo? Just because a bug happens to have made its way into our test suite doesn't mean it should be high priority. If the bug isn't causing known problems for users, it makes sense to ignore it in favor of working on bugs that are known to affect users. Why not let the relevant developers make that prioritization decision, and ignore the bug forever if they don't think it's as important as other things they're working on? ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Enable -Wswitch-enum? [was Re: MOZ_ASSUME_UNREACHABLE is being misused]
On 2014-04-07 6:00 AM, Karl Tomlinson wrote:
> chiaki ISHIKAWA writes:
>> I think 7.2 paragraph 10 is also relevant here.
>> --- quote ---
>> An expression of arithmetic or enumeration type can be converted to an
>> enumeration type explicitly. The value is unchanged if it is in the range
>> of enumeration values of the enumeration type; otherwise the resulting
>> enumeration value is unspecified.
>> --- end quote
>> I take it so:
>> typedef enum { a = 1, b, c = 10 } T;
>> T x;
>> Anyway, in my example above, a compiler can do anything if x = 32 is
>> executed (?).
>
> Note here the enumeration value is unspecified, which I assume merely
> means the compiler can choose anything for the value of x. That might be
> considerably safer than undefined behavior of the program.

Right. The intention here is that the compiler is allowed to pick some underlying integer type, which must be able to represent all declared values of the enumeration, and then silently accept in-range values and truncate out-of-range values *for that type* -- but it's *not* allowed to apply "assume the programmer never does that" optimizations as it would for undefined behavior.

I don't know what the C++ committee's attitude was, but the C committee specifically wanted to allow use of `enum` to declare bitmasks:

enum X {
  X_FOO   = 0x0001,
  X_BAR   = 0x0002,
  /* ... exhaustive list of bit flags here ... */
  X_BLURF = 0x8000
};

For that use case, any _combination_ of X_* flags must be representable by the underlying integer type; note that many such combinations are outside the *declared* range of values.

zw
Re: Policy for disabling tests which run on TBPL
On 4/7/2014 9:02 AM, Aryeh Gregor wrote: On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt ahalberst...@mozilla.com wrote: I would guess the former is true in most cases. But at least there we have a *chance* at tracking down and fixing the failure, even if it takes awhile before it becomes annoying enough to prioritize. If we made it so intermittents never annoyed anyone, there would be even less motivation to fix them. Yes in theory we would still have a list of top failing intermittents. In practice that list will be ignored. Is this better or worse than the status quo? Just because a bug happens to have made its way into our test suite doesn't mean it should be high priority. If the bug isn't causing known problems for users, it makes sense to ignore it in favor of working on bugs that are known to affect users. Why not let the relevant developers make that prioritization decision, and ignore the bug forever if they don't think it's as important as other things they're working on? If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch regressions that cause it to fail permanently, but we would not be able to catch a regression that causes it to fail intermittently. It's difficult to say whether bugs we find via tests are more or less important than bugs we find via users. It's entirely possible that lots of the bugs that cause intermittent test failures cause intermittent weird behavior for our users, we simply don't have any visibility into that. -Ted ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policy for disabling tests which run on TBPL
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote: If a bug is causing a test to fail intermittently, then that test loses value. It still has some value in that it can catch regressions that cause it to fail permanently, but we would not be able to catch a regression that causes it to fail intermittently. To some degree, yes, marking a test as expected intermittent causes it to lose value. If the developers who work on the relevant component think the lost value is important enough to track down the cause of the intermittent failure, they can do so. That should be their decision, not something forced on them by infrastructure issues (everyone else will suffer if you don't find the cause for this failure in your test). Making known intermittent failures not turn the tree orange doesn't stop anyone from fixing intermittent failures, it just removes pressure from them if they decide they don't want to. If most developers think they have more important bugs to fix, then I don't see a problem with that. ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: Policy for disabling tests which run on TBPL
On 2014-04-07, 11:12 AM, Ted Mielczarek wrote: It's difficult to say whether bugs we find via tests are more or less important than bugs we find via users. It's entirely possible that lots of the bugs that cause intermittent test failures cause intermittent weird behavior for our users, we simply don't have any visibility into that. Is there a way - or could there be a way - for us to push builds that generate intermittent-failure tests out to the larger Mozilla community? And, more generally: I'd love it if we could economically tie in community engagement to our automated test process somehow. - mhoye ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Linux testing on single-core VMs nowadays
I wanted to post about this because I don't think it's common knowledge (I only just came to the realization today) and it has potential impact on the effectiveness of our unit tests.

Currently we run our Linux unit tests exclusively on Amazon EC2 m1.medium[1] instances which have only one CPU core. Previously we used to run Linux tests on in-house multicore hardware. This means that we're testing different threading behavior now. In more concrete terms, a threading bug[2] was found recently by AddressSanitizer but it only manifested on the build machines (conveniently we still run some limited xpcshell testing as part of `make check` as well as during packaging) and not in our extensive unit tests running on the test machines. This seems unfortunate.

I'm not sure what the real impact of this is. Threading bugs can certainly manifest on single-core machines, but the scheduling behavior is different so they're likely to be different bugs. Is this an issue we should address?

-Ted

1. http://aws.amazon.com/ec2/instance-types/#Instance_Types
2. https://bugzilla.mozilla.org/show_bug.cgi?id=990230
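To illustrate why the core count changes which bugs a test run can surface, here is a toy example (not one of the bugs referenced in this thread): an unsynchronized counter race that loses updates on almost every run on multi-core hardware, but fires far less often when both threads are time-sliced on a single core.

#include <cstdio>
#include <thread>

static long counter = 0;   // shared state, deliberately left unsynchronized

static void bump()
{
    for (int i = 0; i < 1000000; ++i) {
        ++counter;         // read-modify-write with no lock: a data race
    }
}

int main()
{
    std::thread t1(bump);
    std::thread t2(bump);
    t1.join();
    t2.join();
    // The expected answer is 2000000.  On a multi-core machine the two
    // threads genuinely overlap and lost updates make the printed value
    // fall short on most runs; on a single core the interleaving is much
    // coarser and the race is observed far less often.
    std::printf("counter = %ld\n", counter);
    return 0;
}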
Re: Formalising the current 'mozilla-central essential pushes only' recommendation
Ed Morley wrote:
> (Follow-ups to dev.tree-management please)
> [snip]
> We welcome your feedback :-)

This sounds good. Ideally there would be no manual merges, backouts and everything would be automated so only a bot could access the repo.

Taras
Re: Linux testing on single-core VMs nowadays
Hey Ted,

- Original Message -
From: Ted Mielczarek t...@mielczarek.org
To: Mozilla Platform Development dev-platform@lists.mozilla.org
Sent: Monday, April 7, 2014 11:11:22 AM
Subject: Linux testing on single-core VMs nowadays

[snip]

Personally, I think that the more ways we can test for threading issues the better. It seems to me that we should do some amount of testing on single core and multi-core. Then I suppose the question becomes how many cores? 2? 4? 8? Maybe we can cycle through some different numbers of cores so that we get coverage without duplicating everything?

Threading issues probably don't happen all that often, but when they do happen they can be more difficult to track down. So being able to get some coverage on machines with different numbers of cores seems useful (especially if the number of cores is readily available and logged along with the TBPL failures).

Dave Hylands
Re: Formalising the current 'mozilla-central essential pushes only' recommendation
> (Follow-ups to dev.tree-management please)
>
> A proportion of the current mozilla-central non-critical commits are made
> by people inadvertently pushing to the wrong repository. To prevent these,
> once the tree rules are adjusted on the wiki the sheriffs envisage the
> next step will be switching mozilla-central to a non-open tree state
> (name TBD) using the existing tree closure hook. Backouts, merges and
> automated bot updates will not need any additional annotation - others
> will simply use a (yet to be chosen) commit message string to signify
> awareness of and adherence to the new tree policy.

This is why all my hgrc files have no default push target; all have things like:

inbound = ssh://hg.mozilla.org/integration/mozilla-inbound/

or

beta = ssh://hg.mozilla.org/releases/mozilla-beta/

or

m-c = ssh://hg.mozilla.org/mozilla-central/

such that if I forget what directory I'm working with I can't push to the wrong repo. Of course, the default when people clone isn't set that way. Requiring "CENTRAL" or whatever would have the effect of blocking everyone's accidental pushes.

I usually land on central for 3 things: I want a fix to get into the next nightly and I'm unsure or expect no more merges; inbound has been closed for ages; or it's uplift weekend - even landing Saturday has no guarantee of being in the merge (though *almost* always it is). I sleep better knowing I don't have to worry about when the merge will be done, if it will be done (and I always hang out and star). But I understand the contrary opinion to that last case. And once in a blue moon I'll land an m-c update after an incorrect uplift (uplift of a patch but not the backout) or other fubar.

--
Randell Jesup, Mozilla Corp
remove news for personal email
B2G emulator issues
The B2G emulator design is causing all sorts of problems. We just fixed the #2 orange, which was caused by the Audio channel StartPlaying() taking up to 20 seconds to run (and we fixed it by effectively removing some timeouts). However, we just wasted half a week trying to land AEC and MediaStreamGraph improvements. We still haven't landed due to yet another B2G emulator orange, but the solution we used for the M10 problem doesn't fix the fundamental problems with the B2G emulator.

Details: We ran into huge problems getting AEC/MediaStreamGraph changes (bug 818822 and things dependent on it) into the tree due to problems with B2G emulator debug M10 (permaorange timeouts). This test adds a fairly small amount of processing to input audio data (resampling to 44100Hz). A test that runs perfectly in emulator opt builds and runs fine locally in M10 debug (10-12 seconds reported for the test in the logs, with or without the change) goes from taking 30-40 seconds on tbpl to 350-450(!) seconds (and then times out). Fix that one, and others fail even worse.

I contacted Gregor Wagner asking for help and also jgriffin in #b2g. We found one problem (the emulator going to 'sleep' during mochitests, bug 992436); I have a patch up to enable wakelock globally for mochitests. However, that just pushed the error a little deeper.

The fundamental problem is that the B2G emulator can't deal safely with any sort of realtime or semi-realtime data unless run on a fast machine. The architecture for the emulator setup means the effective CPU power is dependent on the machine running the test, and that varies a lot (and tbpl machines are WAY slower than my 2.5 year old desktop). Combine that with Debug being much slower, and it's a recipe for disaster for any sort of time-dependent tests.

I worked around it for now by turning down the timers that push fake realtime data into the system - this will cause audio underruns in MediaStreamGraph, and doesn't solve the problem of MediaStreamGraph potentially overloading itself for other reasons, or breaking assumptions about being able to keep up with data streams. (MSG wants to run every 10ms or so.)

This problem also likely plays hell with the Web Audio tests, and will play hell with WebRTC echo cancellation and the media reception code, which will start trying to insert loss-concealment data and break timer-based packet loss recovery, bandwidth estimators, etc.

As to what to do? That's a good question, as turning off the emulator tests isn't a realistic option.

One option (very, very painful, and even slower) would be a proper device simulator which simulates both the CPU and the system hardware (of *some* B2G phone). This would produce the most realistic result with an emulator.

Another option (likely not simple) would be to find a way to slow down time for the emulator, such as intercepting system calls and increasing any time constants (multiplying timer values, timeout values to socket calls, etc.). This may not be simple. For devices (audio, etc.), frequencies may need modifying or other adjustments made.

We could require that the emulator needs X Bogomips to run, or to run a specific test suite. We could segment out tests that require higher performance and run them on faster VMs/etc. We could turn off certain tests on tbpl and run them on separate dedicated test machines (a bit similar to PGO). There are downsides to this of course. Lastly, we could put in a bank of HW running B2G to run the tests, like the Android test boards/phones.

So, what do we do? Because if we do nothing, it will only get worse.

--
Randell Jesup, Mozilla Corp
remove news for personal email
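For the "slow down time" option above, a minimal sketch of the general idea, stated as an assumption rather than an existing mechanism: interpose a time-related call and report dilated elapsed time, so timeouts appear to expire more slowly. This only covers user-space calls in an ordinary Linux process via LD_PRELOAD; doing the equivalent for the emulator guest would need hooks in the emulator or guest libc, and the dilation factor here is invented for illustration.

// time_dilate.cpp -- illustrative sketch only.
// Build: g++ -shared -fPIC time_dilate.cpp -o time_dilate.so -ldl
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <sys/time.h>

namespace {
const long long kDilation = 4;  // pretend elapsed time passes 4x more slowly
}

extern "C" int gettimeofday(struct timeval* tv, struct timezone* tz)
{
    using real_fn = int (*)(struct timeval*, struct timezone*);
    static real_fn realGtod =
        reinterpret_cast<real_fn>(dlsym(RTLD_NEXT, "gettimeofday"));
    static struct timeval origin = {0, 0};

    int rc = realGtod(tv, tz);
    if (rc != 0 || !tv) {
        return rc;
    }
    if (origin.tv_sec == 0 && origin.tv_usec == 0) {
        origin = *tv;  // first call establishes "time zero"
    }

    // Report elapsed time divided by kDilation so code using wall-clock
    // deltas (timers, timeouts) sees time advancing more slowly.
    long long elapsedUs = (tv->tv_sec - origin.tv_sec) * 1000000LL +
                          (tv->tv_usec - origin.tv_usec);
    long long dilatedUs = static_cast<long long>(origin.tv_sec) * 1000000LL +
                          origin.tv_usec + elapsedUs / kDilation;
    tv->tv_sec = static_cast<time_t>(dilatedUs / 1000000);
    tv->tv_usec = static_cast<suseconds_t>(dilatedUs % 1000000);
    return rc;
}

Run as, for example, LD_PRELOAD=./time_dilate.so ./some_test. clock_gettime(), poll()/select() timeouts and the rest would all need the same treatment, which is part of why the option is flagged above as "likely not simple".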
Re: B2G emulator issues
How easy is it to identify CPU-sensitive tests? I think the most practical solution (at least in the near term) is to find that set of tests, and run only that set on a faster VM, or on real hardware (like our ix slaves).

Jonathan

On 4/7/2014 3:16 PM, Randell Jesup wrote:
> The B2G emulator design is causing all sorts of problems.
> [snip]
> So, what do we do? Because if we do nothing, it will only get worse.
Re: B2G emulator issues
On 4/7/2014 3:16 PM, Randell Jesup wrote: The B2G emulator design is causing all sorts of problems. We just fixed That sounds very similar to some of the failures seen on the Android 2.3 emulator. Many media-related mochitests intermittently time out on the Android 2.3 emulator when run on aws. These are reported in bug 981889, bug 981886, bug 981881, and bug 981898, but have not been investigated. ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: B2G emulator issues
> How easy is it to identify CPU-sensitive tests?

Easy for some (most but not all media tests). Almost all getUserMedia/PeerConnection tests. ICE/STUN/TURN tests. Not that easy for some. And some may be only indirectly sensitive - timeouts in delay-the-rendering code, TCP/DNS/SPDY timers, etc, etc. Anything that touches a timer even indirectly *could* be. So, large sections *could* be. I suppose we could include code checking for MainThread starvation as a partial check, though that won't catch everything.

> I think the most practical solution (at least in the near term) is to find
> that set of tests, and run only that set on a faster VM, or on real
> hardware (like our ix slaves).

That was an option I mentioned. It's not fun and will be a continual "is this orange CPU-sensitive?" as they pop up, but it certainly can be done. And it may be simpler than better solutions.

--
Randell Jesup, Mozilla Corp
remove news for personal email
Re: Policy for disabling tests which run on TBPL
On 2014-04-07, 11:49 AM, Aryeh Gregor wrote:
> On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote:
>> If a bug is causing a test to fail intermittently, then that test loses
>> value. It still has some value in that it can catch regressions that
>> cause it to fail permanently, but we would not be able to catch a
>> regression that causes it to fail intermittently.
>
> To some degree, yes, marking a test as expected intermittent causes it to
> lose value. If the developers who work on the relevant component think the
> lost value is important enough to track down the cause of the intermittent
> failure, they can do so. That should be their decision, not something
> forced on them by infrastructure issues (everyone else will suffer if you
> don't find the cause for this failure in your test). Making known
> intermittent failures not turn the tree orange doesn't stop anyone from
> fixing intermittent failures, it just removes pressure from them if they
> decide they don't want to. If most developers think they have more
> important bugs to fix, then I don't see a problem with that.

What you're saying above is true *if* someone investigates the intermittent test failure and determines that the bug is not important. But in my experience, that's not what happens at all. I think many people treat intermittent test failures as a category of unimportant problems, and therefore some bugs are never investigated. The fact of the matter is that most of these bugs are bugs in our tests, which of course will not impact our users directly, but I have occasionally come across bugs in our code which are exposed as intermittent failures. The real issue is that the work of identifying the root of the problem is often the majority of the work needed to fix the intermittent test failure, so unless someone is willing to investigate the bug we cannot say whether or not it impacts our users.

The thing that really makes me care about these intermittent failures a lot is that ultimately they make us choose between disabling a whole bunch of tests and being unable to manage our tree. As more and more tests get disabled, we lose more and more test coverage, and that can have a much more severe impact on the health of our products than any individual intermittent test failure.

Cheers,
Ehsan
Re: B2G emulator issues
When you say "debug", you mean the emulator is running a FirefoxOS debug build, not that the emulator itself is built debug --- right? Given that, is it a correct summary to say that the problem is that the emulator is just too slow?

Applying time dilation might make tests green, but we'd be left with the problem of the tests still taking a long time to run. Maybe we should identify a subset of the tests that are more likely to suffer B2G-specific breakage and only run those?

Rob
Re: B2G emulator issues
On 2014-04-07, 8:03 PM, Robert O'Callahan wrote: When you say debug, you mean the emulator is running a FirefoxOS debug build, not that the emulator itself is built debug --- right? Given that, is it a correct summary to say that the problem is that the emulator is just too slow? Applying time dilation might make tests green but we'd be left with the problem of the tests still taking a long time to run. Maybe we should identify a subset of the tests that are more likely to suffer B2G-specific breaking and only run those? Do we disable all compiler optimizations for those debug builds? Can we turn them on, let's say, build with --enable-optimize and --enable-debug which gives us a -O2 optimized debug build? ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
Re: B2G emulator issues
Why don’t we just switch to the x86 emulator? The x86 emulator runs way faster than the ARM emulator.

Best Regards,
Shih-Chiang Chien
Mozilla Taiwan

On Apr 8, 2014, at 8:49 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:
> Do we disable all compiler optimizations for those debug builds? Can we
> turn them on, let's say, build with --enable-optimize and --enable-debug,
> which gives us a -O2 optimized debug build?