Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread James Graham

On 07/04/14 04:33, Andrew Halberstadt wrote:

On 06/04/14 08:59 AM, Aryeh Gregor wrote:

On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari
ehsan.akhg...@gmail.com wrote:

Note that this is only accurate to a certain point.  There are other
things which we can do to guesswork our way out of the situation for
Autoland, but of course they're resource/time intensive (basically
running orange tests over and over again, etc.)


Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass if it passes a few times in a row after
the first fail?  This would be a much nicer option than disabling the
test entirely, and would still mean the test is mostly effective,
particularly if only specific failure messages are allowed to be
auto-retried.


Many of our test runners have that ability. But doing this implies that
intermittents are always the fault of the test. We'd be missing whole
classes of regressions (notably race conditions).


In practice how effective are we at identifying bugs that lead to 
instability? Is it more common that we end up disabling the test, or 
marking it as known intermittent and learning to live with the 
instability, both of which options reduce the test coverage, or is it 
more common that we realise that there is a code reason for the 
intermittent, and get it fixed?


If it is the latter then making the instability as obvious as possible 
makes sense, and the current setup where we run each test once can be 
regarded as a compromise between the ideal setup where we run each test 
multiple times and flag it as a fail if it ever fails, and the needs of 
performance.


If the former is true, it makes a lot more sense to do reruns of the 
tests that fail in order to keep them active at all, and store 
information about the fact that reruns occurred so that we can see when 
a test started giving unexpected results. This does rely on having some 
mechanism to make people care about genuine intermittents that they 
caused, but maybe the right way to do that is to have some batch tool 
that takes all the tests that have become intermittent, and does reruns 
until it has identified the commits that introduced the intermittency, 
and then files P1 bugs on the developer(s) it identifies to fix their bugs.
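For concreteness, here is roughly the shape such a batch tool could take. This
is only a sketch of the rerun-and-bisect idea, with hypothetical update() and
run_test() helpers standing in for whatever the harness actually provides:

  # Hypothetical helpers: update(rev) checks out a revision, run_test(test)
  # returns True on a passing run.  Neither is a real API.
  def is_intermittent(test, rev, update, run_test, runs=20):
      # Rerun the test several times at one revision; any failure counts.
      update(rev)
      return any(not run_test(test) for _ in range(runs))

  def find_culprit(test, revs, update, run_test):
      # Bisect an ordered revision list (revs[0] assumed good, revs[-1]
      # assumed intermittent) down to the first failing revision.
      lo, hi = 0, len(revs) - 1
      while lo < hi:
          mid = (lo + hi) // 2
          if is_intermittent(test, revs[mid], update, run_test):
              hi = mid              # failure already present here
          else:
              lo = mid + 1          # still good; culprit is later
      return revs[lo]

Because each probe is only a sample of `runs` executions, the answer is
probabilistic; the rerun count has to be high enough that an intermittently
failing revision is unlikely to pass all of its reruns.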



Formalising the current 'mozilla-central essential pushes only' recommendation

2014-04-07 Thread Ed Morley

(Follow-ups to dev.tree-management please)

Hi all :-)

The vast majority of mozilla-central landings are now via curated merges 
from integration/team repositories. This dramatically increases the 
chance that the tip of mozilla-central is in a known-good state, meaning 
that:


* Integration/project repos pulling from mozilla-central are less likely 
to receive breakage from elsewhere.
* A bad landing + tree closure on one integration/team repo doesn't 
impact the ability of other repositories to merge into mozilla-central.
* Developers have the choice of a safer repository to use as their qbase 
for local development / try pushes.

* Nightly builds are less prone to being broken & requiring a respin [1].

Non-critical mozilla-central landings are already discouraged and as 
such are rare. However, the sheriffs [2] would like to formalise this, 
by adjusting the mozilla-central tree rules [3] to state that direct 
pushes must be for one of the following reasons:


1) Merging from an integration/team/project repository (there is no 
restriction on who may make these merges).

2) Automated blocklist / HSTS preload list updates [4].
3) For the resolution (ie: backout or follow-up fix) of critical 
regressions (eg: top-crashers or other major functional regression) that 
will result in a Nightly respin or must make the imminent scheduled 
Nightly at all costs.
4) Anything else for which common sense (or asking in #developers) says 
is an appropriate reason for a direct landing on mozilla-central.


Clearly #4 is very fuzzy - but I'm hopeful self-policing has a good 
chance of success and so would like to try that first. Note #4 would not 
include the landing of new features directly onto mozilla-central if 
they have missed the last integration repository merge on the day of 
uplift - the correct course of action would be to request 
aurora-approval and uplift instead, due to the amount of merge-day 
breakage this has caused in the past.


If there are any other cases that should be explicitly mentioned above 
instead of relying on #4, please let me know.


A proportion of the current mozilla-central non-critical commits are 
made by people inadvertently pushing to the wrong repository. To prevent 
these, once the tree rules are adjusted on the wiki the sheriffs 
envisage the next step will be switching mozilla-central to a non-open 
tree state (name TBD) using the existing tree closure hook. Backouts, 
merges & automated bot updates will not need any additional annotation - 
others will simply use a (yet to be chosen) commit message string to 
signify awareness & adherence to the new tree policy.
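For what it's worth, the check the closure hook would need is small. A rough
sketch only: the magic string and the exempt prefixes below are placeholders
(the real string is yet to be chosen), and the Mercurial hook plumbing is from
memory rather than the hook we would actually deploy:

  MAGIC = "TREE-POLICY-STRING-TBD"   # placeholder for the yet-to-be-chosen string
  EXEMPT = ("Merge ", "Backed out", "Back out", "No bug, Automated")

  def check_mc_push(ui, repo, node, **kwargs):
      # pretxnchangegroup-style hook: reject the push unless every new
      # changeset is a merge/backout/bot update or carries the agreed string.
      for rev in range(repo[node].rev(), len(repo)):
          desc = repo[rev].description()
          if desc.startswith(EXEMPT) or MAGIC in desc:
              continue
          ui.warn("%s does not look like an allowed direct push to "
                  "mozilla-central\n" % repo[rev])
          return True   # a truthy return aborts the transaction
      return False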


We welcome your feedback :-)

Best wishes,

Ed

[1] We've recently disabled the 'last good revision' functionality on 
mozilla-central, since it was both not functioning as expected and also 
frequently caused delays in Nightly generation when non-tier 1 jobs failed 
or infra issues caused a single build to fail. The net change is positive; 
however, it increases the need to ensure mozilla-central is in a known good 
state.

[2] https://wiki.mozilla.org/Sheriffing
[3] https://wiki.mozilla.org/Tree_Rules
[4] eg: https://hg.mozilla.org/mozilla-central/rev/c401296f71ae and 
https://hg.mozilla.org/mozilla-central/rev/beea7a7f3fc3 - though I think 
we may wish to move these to an integration repository in the future, 
since they have occasionally caused breakage in the past. However that 
will require discussion and automation changes, so I believe we should 
maintain the status quo initially.



Re: Enable -Wswitch-enum? [was Re: MOZ_ASSUME_UNREACHABLE is being misused]

2014-04-07 Thread ISHIKAWA,chiaki

(2014/04/07 14:27), Karl Tomlinson wrote:

 It is allowed in N3242.  I think the relevant sections are


5.2.9 Static cast


Thank you for the pointer.

I found a floating copy of n3242.pdf at the following url.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf

I think 7.2 10 is also relevant here.

--- quote ---
An expression of arithmetic or enumeration type can be converted to an
enumeration type explicitly. The value is unchanged if it is in the range
of enumeration values of the enumeration type; otherwise the resulting
enumeration value is unspecified.
--- end quote ---

I take it as follows:

typedef enum { a = 1, b, c = 10 } T;
T x;

x = 3;  /* OK: within the range of enumeration values, although 3 is not 
one of the declared enumerators */


but

x = 32; /* 'unspecified', because it is outside the range of enumeration 
values */

(I read the specification to mean
 that the range of enumeration values is
[0 or 1 .. maximum_necessary_for_the_declared_maximum_value],
where maximum_necessary_for_the_declared_maximum_value is
2^M or (2^M - 1) etc., depending on how negative values are
represented: 2's complement or 1's complement.)
The description is very complex, but
it seems that the enumeration range is
calculated to produce the narrowest bit-field that can contain
all the explicitly declared values (min .. max).
This is the range of values.
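As a quick worked example of that reading (assuming an unsigned underlying
representation, i.e. no negative enumerators), the range for
{ a = 1, b, c = 10 } comes out as [0, 15], which is why 3 is in range and
32 is not.  A few lines of Python just to check the arithmetic:

  def enumeration_range(enumerators):
      # Narrowest bit-field that can hold the largest declared value,
      # all-non-negative case: [0, 2**M - 1].
      emax = max(enumerators)
      m = emax.bit_length()
      return (0, 2 ** m - 1)

  print(enumeration_range([1, 2, 10]))   # (0, 15): 3 is inside, 32 is outside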

Anyway, in my example above, a compiler can do anything
if x = 32 is executed (?).
Hmm. I will re-read 7.2.7.

TIA






Re: Enable -Wswitch-enum? [was Re: MOZ_ASSUME_UNREACHABLE is being misused]

2014-04-07 Thread Karl Tomlinson
chiaki ISHIKAWA writes:

 I think 7.2 10 is also relevant here.

 --- quote ---
 An expression of arithmetic or enumeration type can be converted
 to an enumeration type explicitly. The
 value is unchanged if it is in the range of enumeration values of
 the enumeration type; otherwise the resulting
 enumeration value is unspecified.
 --- end quote

 I take it as follows:

 typedef enum { a = 1, b, c = 10 } T;
 T x;

 Anyway, in my example above, a compiler can do anything
 if x = 32 is executed (?).

Note here the enumeration value is "unspecified", which I assume
merely means the compiler can choose anything for the value of x.

That might be considerably safer than undefined behavior of the
program.


Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 6:33 AM, Andrew Halberstadt
ahalberst...@mozilla.com wrote:
 Many of our test runners have that ability. But doing this implies that
 intermittents are always the fault of the test. We'd be missing whole
 classes of regressions (notably race conditions).

We already are, because we already will star (i.e., ignore) any
failure that looks like a known intermittent failure.  I'm only saying
we should automate that as much as possible.
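The automation I have in mind is basically pattern-matching failure lines
against the summaries of known intermittent-failure bugs before treating them
as real failures. A minimal sketch, with a made-up bug list and log format:

  import re

  # Hypothetical (bug number, failure-summary regex) pairs, e.g. pulled from
  # whatever list the sheriffs already use for starring suggestions.
  KNOWN_INTERMITTENTS = [
      (812345, re.compile(r"test_foo\.html \| Test timed out")),
      (823456, re.compile(r"leaked 1 window\(s\) until shutdown")),
  ]

  def classify_failure(line):
      # Return the bug number of a matching known intermittent, else None.
      for bug, pattern in KNOWN_INTERMITTENTS:
          if pattern.search(line):
              return bug
      return None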


Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Andrew Halberstadt

On 07/04/14 05:10 AM, James Graham wrote:

On 07/04/14 04:33, Andrew Halberstadt wrote:

On 06/04/14 08:59 AM, Aryeh Gregor wrote:

Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass if it passes a few times in a row after
the first fail?  This would be a much nicer option than disabling the
test entirely, and would still mean the test is mostly effective,
particularly if only specific failure messages are allowed to be
auto-retried.


Many of our test runners have that ability. But doing this implies that
intermittents are always the fault of the test. We'd be missing whole
classes of regressions (notably race conditions).


In practice how effective are we at identifying bugs that lead to
instability? Is it more common that we end up disabling the test, or
marking it as known intermittent and learning to live with the
instability, both of which options reduce the test coverage, or is it
more common that we realise that there is a code reason for the
intermittent, and get it fixed?


I would guess the former is true in most cases. But at least there we 
have a *chance* at tracking down and fixing the failure, even if it 
takes a while before it becomes annoying enough to prioritize. If we made 
it so intermittents never annoyed anyone, there would be even less 
motivation to fix them. Yes, in theory we would still have a list of top 
failing intermittents. In practice that list will be ignored.


Case in point: desktop xpcshell does this right now. Open a log and 
ctrl-F for "Retrying tests". Most runs have a few failures that got 
retried. No one knows about these and no one looks at them. Publishing 
results somewhere easy to discover would definitely help, but I'm not 
convinced it will help much.
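If we did want to publish that, even something this small would surface the
numbers per log; the "Retrying tests" marker here is an assumption about the
current log format, so adjust it to whatever the harness actually prints:

  import sys

  def count_retries(log_path, marker="Retrying tests"):
      # Count how many times the harness retried failing tests in one log.
      with open(log_path) as log:
          return sum(marker in line for line in log)

  if __name__ == "__main__":
      for path in sys.argv[1:]:
          print(path, count_retries(path))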


Doing this would also cause us to miss non-intermittent regressions, e.g. 
where the ordering of tests tickles the platform the wrong way. On the 
retry, the test would get run in a completely different order and might 
show up green 100% of the time.


Either way, the problem is partly culture and partly tooling that isn't 
good enough. I see where this proposal is coming from, but I think there are 
ways of tackling the problem head on. This seems kind of like a last resort.


Andrew


Re: Removing 'jit-tests' from make check

2014-04-07 Thread Daniel Minor
Hi Terrence,

Thanks! I've filed Bug 992887 to track the expanded mach command.

Cheers,

Dan

- Original Message -
From: Terrence Cole tc...@mozilla.com
To: dev-platform@lists.mozilla.org
Sent: Saturday, April 5, 2014 4:30:53 PM
Subject: Re: Removing 'jit-tests' from make check

Dan,

Congratulations on landing the jit-tests split! I'm glad to hear we're
getting a make check replacement too. We discussed it a bit in IRC and
the rough decision, at least between jorendorff and myself, was to
expand the scope of the replacement to run /all/ of SpiderMonkey's test
suites.

Ideally we'd like to be able to tell new contributors "Run this and if
it passes you're probably good to go." instead of "Why didn't you run
this other test suite? What do you mean it's not in the wiki? Oh, you
looked at that old page. Well then."

This should be as simple as adding |./tests/jstests.py shell --tbpl|
to mach check-spidermonkey. It sounds like you've got it under control,
but please ping me if you want input.
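As a rough sketch of what "run everything" amounts to (the suite paths and
flags below are my recollection of the in-tree layout, so treat them as
assumptions rather than gospel):

  import subprocess
  import sys

  # Hypothetical wrapper: run both SpiderMonkey suites against a built shell.
  SUITES = [
      ["python", "js/src/jit-test/jit_test.py", "--tbpl"],
      ["python", "js/src/tests/jstests.py", "--tbpl"],
  ]

  def check_spidermonkey(shell_path):
      # Run every suite even if an earlier one fails; True if all passed.
      results = [subprocess.call(cmd + [shell_path]) for cmd in SUITES]
      return all(status == 0 for status in results)

  if __name__ == "__main__":
      sys.exit(0 if check_spidermonkey(sys.argv[1]) else 1)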

Cheers,
Terrence


On 04/04/2014 05:44 AM, Daniel Minor wrote:
 Hi Nicolas,
 
 This change only affects running the jit-test test suite as part of make 
 check. This doesn't affect building or running the JS shell.
 
 The mach command that has been added replicates how this particular test 
 suite was previously run in make check. It could be expanded, of course.
 
 Thanks,
 
 Dan
 
 
 
 - Original Message -
 From: Nicolas B. Pierron nicolas.b.pier...@mozilla.com
 To: dev-platform@lists.mozilla.org
 Sent: Friday, April 4, 2014 8:31:32 AM
 Subject: Re: Removing 'jit-tests' from make check
 
 On 04/04/2014 03:39 AM, Daniel Minor wrote:
 Just a heads up that very soon we'll be removing jit-tests from the make 
 check target[1]. The tests have been split out into a separate test job on 
 TBPL[2] (labelled "Jit"), have been running on Cedar for several months, and 
 have been recently turned on for other trees. We've added a mach command -- 
 "mach jittest" -- that runs the tests with the same arguments that make check 
 currently does.
 
 mach jittest ?
 
 Is there any documentation which explains how to work only with the JS Shell 
 by using mach commands?  Does this change imply that every JS developer 
 will have to compile the full browser just to work on the Shell?
 
 The only documentation I know of [1] explains how to run a configure && make.
 
 [1] https://developer.mozilla.org/en-US/docs/SpiderMonkey/Build_Documentation
 


Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt
ahalberst...@mozilla.com wrote:
 I would guess the former is true in most cases. But at least there we have a
 *chance* at tracking down and fixing the failure, even if it takes awhile
 before it becomes annoying enough to prioritize. If we made it so
 intermittents never annoyed anyone, there would be even less motivation to
 fix them. Yes in theory we would still have a list of top failing
 intermittents. In practice that list will be ignored.

Is this better or worse than the status quo?  Just because a bug
happens to have made its way into our test suite doesn't mean it
should be high priority.  If the bug isn't causing known problems for
users, it makes sense to ignore it in favor of working on bugs that
are known to affect users.  Why not let the relevant developers make
that prioritization decision, and ignore the bug forever if they don't
think it's as important as other things they're working on?


Re: Enable -Wswitch-enum? [was Re: MOZ_ASSUME_UNREACHABLE is being misused]

2014-04-07 Thread Zack Weinberg

On 2014-04-07 6:00 AM, Karl Tomlinson wrote:

chiaki ISHIKAWA writes:


I think 7.2 10 is also relevant here.

--- quote ---
An expression of arithmetic or enumeration type can be converted
to an enumeration type explicitly. The
value is unchanged if it is in the range of enumeration values of
the enumeration type; otherwise the resulting
enumeration value is unspecified.
--- end quote

I take it as follows:

typedef enum { a = 1, b, c = 10 } T;
T x;



Anyway, in my example above, a compiler can do anything
if x = 32 is executed (?).


Note here the enumeration value is "unspecified", which I assume
merely means the compiler can choose anything for the value of x.

That might be considerably safer than undefined behavior of the
program.


Right.  The intention here is that the compiler is allowed to pick some 
underlying integer type, which must be able to represent all declared 
values of the enumeration, and then silently accept in-range values and 
truncate out-of-range values *for that type* -- but it's *not* allowed 
to apply "assume the programmer never does that" optimizations as it 
would for undefined behavior.


I don't know what the C++ committee's attitude was, but the C committee 
specifically wanted to allow use of `enum` to declare bitmasks:


enum X {
   X_FOO   = 0x0001,
   X_BAR   = 0x0002,
   /* ... exhaustive list of bit flags here ... */
   X_BLURF = 0x8000
};

For that use case, any _combination_ of X_* flags must be representable 
by the underlying integer type; note that many such combinations are 
outside the *declared* range of values.
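A quick numeric illustration of that point (plain arithmetic, nothing
compiler-specific): combining the declared flags gives values that are not
themselves declared enumerators but still fit the 16-bit underlying range:

  X_FOO, X_BAR, X_BLURF = 0x0001, 0x0002, 0x8000   # declared enumerator values
  combo = X_FOO | X_BAR | X_BLURF                  # 0x8003

  declared = {X_FOO, X_BAR, X_BLURF}
  print(hex(combo), combo in declared, combo <= 0xFFFF)   # 0x8003 False True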


zw


Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Ted Mielczarek
On 4/7/2014 9:02 AM, Aryeh Gregor wrote:
 On Mon, Apr 7, 2014 at 3:20 PM, Andrew Halberstadt
 ahalberst...@mozilla.com wrote:
 I would guess the former is true in most cases. But at least there we have a
 *chance* at tracking down and fixing the failure, even if it takes awhile
 before it becomes annoying enough to prioritize. If we made it so
 intermittents never annoyed anyone, there would be even less motivation to
 fix them. Yes in theory we would still have a list of top failing
 intermittents. In practice that list will be ignored.
 Is this better or worse than the status quo?  Just because a bug
 happens to have made its way into our test suite doesn't mean it
 should be high priority.  If the bug isn't causing known problems for
 users, it makes sense to ignore it in favor of working on bugs that
 are known to affect users.  Why not let the relevant developers make
 that prioritization decision, and ignore the bug forever if they don't
 think it's as important as other things they're working on?


If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression that causes it to fail intermittently.

It's difficult to say whether bugs we find via tests are more or less
important than bugs we find via users. It's entirely possible that lots
of the bugs that cause intermittent test failures cause intermittent
weird behavior for our users, we simply don't have any visibility into
that.

-Ted




Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Aryeh Gregor
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote:
 If a bug is causing a test to fail intermittently, then that test loses
 value. It still has some value in that it can catch regressions that
 cause it to fail permanently, but we would not be able to catch a
 regression that causes it to fail intermittently.

To some degree, yes, marking a test as expected intermittent causes it
to lose value.  If the developers who work on the relevant component
think the lost value is important enough to track down the cause of
the intermittent failure, they can do so.  That should be their
decision, not something forced on them by infrastructure issues
("everyone else will suffer if you don't find the cause for this
failure in your test").  Making known intermittent failures not turn
the tree orange doesn't stop anyone from fixing intermittent failures;
it just removes pressure from them if they decide they don't want to.
If most developers think they have more important bugs to fix, then I
don't see a problem with that.


Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Mike Hoye

On 2014-04-07, 11:12 AM, Ted Mielczarek wrote:
It's difficult to say whether bugs we find via tests are more or less 
important than bugs we find via users. It's entirely possible that 
lots of the bugs that cause intermittent test failures cause 
intermittent weird behavior for our users, we simply don't have any 
visibility into that.


Is there a way - or could there be a way - for us to push builds that 
generate intermittent-failure tests out to the larger Mozilla community?


And, more generally: I'd love it if we could economically tie in 
community engagement to our automated test process somehow.


- mhoye


Linux testing on single-core VMs nowadays

2014-04-07 Thread Ted Mielczarek
I wanted to post about this because I don't think it's common knowledge
(I only just came to the realization today) and it has potential impact
on the effectiveness of our unit tests.

Currently we run our Linux unit tests exclusively on Amazon EC2
m1.medium[1] instances which have only one CPU core. Previously we used
to run Linux tests on in-house multicore hardware. This means that we're
testing different threading behavior now. In more concrete terms, a
threading bug[2] was found recently by AddressSanitizer but it only
manifested on the build machines (conveniently we still run some limited
xpcshell testing as part of `make check` as well as during packaging)
and not in our extensive unit tests running on the test machines. This
seems unfortunate.

I'm not sure what the real impact of this is. Threading bugs can
certainly manifest on single-core machines, but the scheduling behavior
is different so they're likely to be different bugs. Is this an issue we
should address?

-Ted

1. http://aws.amazon.com/ec2/instance-types/#Instance_Types
2. https://bugzilla.mozilla.org/show_bug.cgi?id=990230



Re: Formalising the current 'mozilla-central essential pushes only' recommendation

2014-04-07 Thread Taras Glek



Ed Morley wrote:

(Follow-ups to dev.tree-management please)

Hi all :-)

The vast majority of mozilla-central landings are now via curated merges
from integration/team repositories. This dramatically increases the
chance that the tip of mozilla-central is in a known-good state, meaning
that:

* Integration/project repos pulling from mozilla-central are less likely
to receive breakage from elsewhere.
* A bad landing + tree closure on one integration/team repo doesn't
impact the ability of other repositories to merge into mozilla-central.
* Developers have the choice of a safer repository to use as their qbase
for local development / try pushes.
* Nightly builds are less prone to being broken & requiring a respin [1].

Non-critical mozilla-central landings are already discouraged and as
such are rare. However, the sheriffs [2] would like to formalise this,
by adjusting the mozilla-central tree rules [3] to state that direct
pushes must be for one of the following reasons:

1) Merging from an integration/team/project repository (there is no
restriction on who may make these merges).
2) Automated blocklist / HSTS preload list updates [4].
3) For the resolution (ie: backout or follow-up fix) of critical
regressions (eg: top-crashers or other major functional regression) that
will result in a Nightly respin or must make the imminent scheduled
Nightly at all costs.
4) Anything else for which common sense (or asking in #developers) says
is an appropriate reason for a direct landing on mozilla-central.

Clearly #4 is very fuzzy - but I'm hopeful self-policing has a good
chance of success and so would like to try that first. Note #4 would not
include the landing of new features directly onto mozilla-central if
they have missed the last integration repository merge on the day of
uplift - the correct course of action would be to request
aurora-approval and uplift instead, due to the amount of merge-day
breakage this has caused in the past.

If there are any other cases that should be explicitly mentioned above
instead of relying on #4, please let me know.

A proportion of the current mozilla-central non-critical commits are
made by people inadvertently pushing to the wrong repository. To prevent
these, once the tree rules are adjusted on the wiki the sheriffs
envisage the next step will be switching mozilla-central to a non-open
tree state (name TBD) using the existing tree closure hook. Backouts,
merges & automated bot updates will not need any additional annotation -
others will simply use a (yet to be chosen) commit message string to
signify awareness & adherence to the new tree policy.

We welcome your feedback :-)


This sounds good. Ideally there would be no manual merges or backouts, and 
everything would be automated so that only a bot could access the repo.


Taras



Best wishes,

Ed

[1] We've recently disabled the 'last good revision' functionality on
mozilla-central, since it was both not functioning as expected and also
frequently caused delays in Nightly generation when non-tier 1 jobs failed
or infra issues caused a single build to fail. The net change is
positive; however, it increases the need to ensure mozilla-central is in
a known good state.
[2] https://wiki.mozilla.org/Sheriffing
[3] https://wiki.mozilla.org/Tree_Rules
[4] eg: https://hg.mozilla.org/mozilla-central/rev/c401296f71ae and
https://hg.mozilla.org/mozilla-central/rev/beea7a7f3fc3 - though I think
we may wish to move these to an integration repository in the future,
since they have occasionally caused breakage in the past. However that
will require discussion and automation changes, so I believe we should
maintain the status quo initially.



Re: Linux testing on single-core VMs nowadays

2014-04-07 Thread Dave Hylands
Hey Ted,

- Original Message -
 From: Ted Mielczarek t...@mielczarek.org
 To: Mozilla Platform Development dev-platform@lists.mozilla.org
 Sent: Monday, April 7, 2014 11:11:22 AM
 Subject: Linux testing on single-core VMs nowadays
 
 I wanted to post about this because I don't think it's common knowledge
 (I only just came to the realization today) and it has potential impact
 on the effectiveness of our unit tests.
 
 Currently we run our Linux unit tests exclusively on Amazon EC2
 m1.medium[1] instances which have only one CPU core. Previously we used
 to run Linux tests on in-house multicore hardware. This means that we're
 testing different threading behavior now. In more concrete terms, a
 threading bug[2] was found recently by AddressSanitizer but it only
 manifested on the build machines (conveniently we still run some limited
 xpcshell testing as part of `make check` as well as during packaging)
 and not in our extensive unit tests running on the test machines. This
 seems unfortunate.
 
 I'm not sure what the real impact of this is. Threading bugs can
 certainly manifest on single-core machines, but the scheduling behavior
 is different so they're likely to be different bugs. Is this an issue we
 should address?

Personally, I think that the more ways we can test for threading issues the 
better.
It seems to me that we should do some amount of testing on single core and 
multi-core.

Then I suppose the question becomes how many cores? 2? 4? 8?

Maybe we can cycle through some different number of cores so that we get 
coverage without duplicating everything?

Threading issues probably don't happen all that often, but when they do happen 
they can be more difficult to track down. So being able to get some coverage on 
machines with different numbers of cores seems useful (especially if the number 
of cores is readily available and logged along with the TBPL failures).

Dave Hylands


Re: Formalising the current 'mozilla-central essential pushes only' recommendation

2014-04-07 Thread Randell Jesup
(Follow-ups to dev.tree-management please)
A proportion of the current mozilla-central non-critical commits are made
by people inadvertently pushing to the wrong repository. To prevent these,
once the tree rules are adjusted on the wiki the sheriffs envisage the next
step will be switching mozilla-central to a non-open tree state (name TBD)
using the existing tree closure hook. Backouts, merges & automated bot
updates will not need any additional annotation - 
others will simply use a (yet to be chosen) commit message string to
signify awareness & adherence to the new tree policy.

This is why all my hgrc files have no default push target; all have
things like:

inbound = ssh://hg.mozilla.org/integration/mozilla-inbound/
or 
beta = ssh://hg.mozilla.org/releases/mozilla-beta/
or
m-c = ssh://hg.mozilla.org/mozilla-central/

such that if I forget what directory I'm working with I can't push to
the wrong repo.

Of course, the default when people clone isn't set that way.  Requiring
"CENTRAL" or whatever string is chosen would have the effect of blocking
everyone's accidental pushes.


I usually land on central for 3 things: I want a fix to get into the
next Nightly and I'm unsure of, or expect no more, merges; inbound has been
closed for ages; or it's uplift weekend - even landing Saturday has no
guarantee of being in the merge (though *almost* always it is).  I sleep
better knowing I don't have to worry about when the merge will be done, or
if it will be done (and I always hang out and star).  But I understand
the contrary opinion to that last case.  And once in a blue moon I'll
land an m-c update after an incorrect uplift (uplift of a patch but not the
backout) or other fubar.

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


B2G emulator issues

2014-04-07 Thread Randell Jesup
The B2G emulator design is causing all sorts of problems.  We just fixed
the #2 orange which was caused by the Audio channel StartPlaying()
taking up to 20 seconds to run (and we fixed it by effectively
removing some timeouts).  However, we just wasted half a week trying to
land AEC & MediaStreamGraph improvements.  We still haven't landed due
to yet another B2G emulator orange, but the solution we used for the M10
problem doesn't fix the fundamental problems with B2G emulator.

Details:

We ran into huge problems getting AEC/MediaStreamGraph changes (bug
818822 and things dependent on it) into the tree due to problems with
B2g-emulator debug M10 (permaorange timeouts).  This test adds a fairly
small amount of processing to input audio data (resampling to 44100Hz).

A test that runs perfectly in emulator opt builds and runs fine locally
in M10 debug (10-12 seconds reported for the test in the logs, with or
without the change), goes from taking 30-40 seconds on tbpl to
350-450(!) seconds (and then times out).  Fix that one, and others fail
even worse.

I contacted Gregor Wagner asking for help and also jgriffin in #b2g.  We
found one problem (emulator going to 'sleep' during mochitests, bug
992436); I have a patch up to enable wakelock globally for mochitests.
However, that just pushed the error a little deeper.

The fundamental problem is that b2g-emulator can't deal safely with any
sort of realtime or semi-realtime data unless run on a fast machine.
The architecture for the emulator setup means the effective CPU power is
dependent on the machine running the test, and that varies a lot (and
tbpl machines are WAY slower than my 2.5 year old desktop).  Combine
that with Debug being much slower, and it's recipe for disaster for any
sort of time-dependent tests.

I worked around it for now, by turning down the timers that push fake
realtime data into the system - this will cause audio underruns in
MediaStreamGraph, and doesn't solve the problem of MediaStreamGraph
potentially overloading itself for other reasons, or breaking
assumptions about being able to keep up with data streams.  (MSG wants
to run every 10ms or so.)

This problem also likely plays hell with the Web Audio tests, and will
play hell with WebRTC echo cancellation and the media reception code,
which will start trying to insert loss-concealment data and break
timer-based packet loss recovery, bandwidth estimators, etc.


As to what to do?  That's a good question, as turning off the emulator
tests isn't a realistic option.

One option (very, very painful, and even slower) would be a proper
device simulator which simulates both the CPU and the system hardware
(of *some* B2G phone).  This would produce the most realistic result
with an emulator.

Another option (likely not simple) would be to find a way to slow down
time for the emulator, such as intercepting system calls and increasing
any time constants (multiplying timer values, timeout values to socket
calls, etc, etc).  This may not be simple.  For devices (audio, etc),
frequencies may need modifying or other adjustments made.

We could require that the emulator needs X Bogomips to run, or to run a
specific test suite.

We could segment out tests that require higher performance and run them
on faster VMs/etc.

We could turn off certain tests on tbpl and run them on separate
dedicated test machines (a bit similar to PGO).  There are downsides to
this of course.

Lastly, we could put in a bank of HW running B2G to run the tests like
the Android test boards/phones.


So, what do we do?  Because if we do nothing, it will only get worse.

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


Re: B2G emulator issues

2014-04-07 Thread Jonathan Griffin

How easy is it to identify CPU-sensitive tests?

I think the most practical solution (at least in the near term) is to 
find that set of tests, and run only that set on a faster VM, or on real 
hardware (like our ix slaves).


Jonathan


On 4/7/2014 3:16 PM, Randell Jesup wrote:

The B2G emulator design is causing all sorts of problems.  We just fixed
the #2 orange which was caused by the Audio channel StartPlaying()
taking up to 20 seconds to run (and we fixed it by effectively
removing some timeouts).  However, we just wasted half a week trying to
land AEC & MediaStreamGraph improvements.  We still haven't landed due
to yet another B2G emulator orange, but the solution we used for the M10
problem doesn't fix the fundamental problems with B2G emulator.

Details:

We ran into huge problems getting AEC/MediaStreamGraph changes (bug
818822 and things dependent on it) into the tree due to problems with
B2g-emulator debug M10 (permaorange timeouts).  This test adds a fairly
small amount of processing to input audio data (resampling to 44100Hz).

A test that runs perfectly in emulator opt builds and runs fine locally
in M10 debug (10-12 seconds reported for the test in the logs, with or
without the change), goes from taking 30-40 seconds on tbpl to
350-450(!) seconds (and then times out).  Fix that one, and others fail
even worse.

I contacted Gregor Wagner asking for help and also jgriffin in #b2g.  We
found one problem (emulator going to 'sleep' during mochitests, bug
992436); I have a patch up to enable wakelock globally for mochitests.
However, that just pushed the error a little deeper.

The fundamental problem is that b2g-emulator can't deal safely with any
sort of realtime or semi-realtime data unless run on a fast machine.
The architecture for the emulator setup means the effective CPU power is
dependent on the machine running the test, and that varies a lot (and
tbpl machines are WAY slower than my 2.5 year old desktop).  Combine
that with Debug being much slower, and it's recipe for disaster for any
sort of time-dependent tests.

I worked around it for now, by turning down the timers that push fake
realtime data into the system - this will cause audio underruns in
MediaStreamGraph, and doesn't solve the problem of MediaStreamGraph
potentially overloading itself for other reasons, or breaking
assumptions about being able to keep up with data streams.  (MSG wants
to run every 10ms or so.)

This problem also likely plays hell with the Web Audio tests, and will
play hell with WebRTC echo cancellation and the media reception code,
which will start trying to insert loss-concealment data and break
timer-based packet loss recovery, bandwidth estimators, etc.


As to what to do?  That's a good question, as turning off the emulator
tests isn't a realistic option.

One option (very, very painful, and even slower) would be a proper
device simulator which simulates both the CPU and the system hardware
(of *some* B2G phone).  This would produce the most realistic result
with an emulator.

Another option (likely not simple) would be to find a way to slow down
time for the emulator, such as intercepting system calls and increasing
any time constants (multiplying timer values, timeout values to socket
calls, etc, etc).  This may not be simple.  For devices (audio, etc),
frequencies may need modifying or other adjustments made.

We could require that the emulator needs X Bogomips to run, or to run a
specific test suite.

We could segment out tests that require higher performance and run them
on faster VMs/etc.

We could turn off certain tests on tbpl and run them on separate
dedicated test machines (a bit similar to PGO).  There are downsides to
this of course.

Lastly, we could put in a bank of HW running B2G to run the tests like
the Android test boards/phones.


So, what do we do?  Because if we do nothing, it will only get worse.





Re: B2G emulator issues

2014-04-07 Thread Geoffrey Brown
On 4/7/2014 3:16 PM, Randell Jesup wrote:
 The B2G emulator design is causing all sorts of problems.  We just fixed

That sounds very similar to some of the failures seen on the Android 2.3 
emulator. Many media-related mochitests intermittently time out on the Android 
2.3 emulator when run on aws. These are reported in bug 981889, bug 981886, bug 
981881, and bug 981898, but have not been investigated.


Re: B2G emulator issues

2014-04-07 Thread Randell Jesup
How easy is it to identify CPU-sensitive tests?

Easy for some (most but not all media tests).  Almost all
getUserMedia/PeerConnection tests.  ICE/STUN/TURN tests.

Not that easy for some.  And some may be only indirectly sensitive -
timeouts in delay-the-rendering code, TCP/DNS/SPDY timers, etc, etc.
Anything that touches a timer even indirectly *could* be.  So, large
sections *could* be.  I suppose we could include code checking for
MainThread starvation as a partial check though that won't catch
everything.

I think the most practical solution (at least in the near term) is to find
that set of tests, and run only that set on a faster VM, or on real
hardware (like our ix slaves).

That was an option I mentioned.  It's not fun and will be a continual
"is this orange CPU-sensitive?" as they pop up, but it certainly can be
done.  And it may be simpler than better solutions.

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread Ehsan Akhgari

On 2014-04-07, 11:49 AM, Aryeh Gregor wrote:

On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek t...@mielczarek.org wrote:

If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression that causes it to fail intermittently.


To some degree, yes, marking a test as expected intermittent causes it
to lose value.  If the developers who work on the relevant component
think the lost value is important enough to track down the cause of
the intermittent failure, they can do so.  That should be their
decision, not something forced on them by infrastructure issues
(everyone else will suffer if you don't find the cause for this
failure in your test).  Making known intermittent failures not turn
the tree orange doesn't stop anyone from fixing intermittent failures,
it just removes pressure from them if they decide they don't want to.
If most developers think they have more important bugs to fix, then I
don't see a problem with that.


What you're saying above is true *if* someone investigates the 
intermittent test failure and determines that the bug is not important. 
 But in my experience, that's not what happens at all.  I think many 
people treat intermittent test failures as a category of unimportant 
problems, and therefore some bugs are never investigated.  The fact of 
the matter is that most of these bugs are bugs in our tests, which of 
course will not impact our users directly, but I have occasionally come 
across bugs in our code which are exposed as intermittent failures. 
 The real issue is that the work of identifying where the root of the 
problem lies is often the majority of the work needed to fix the 
intermittent test failure, so unless someone is willing to investigate 
the bug we cannot say whether or not it impacts our users.


The thing that really makes me care about these intermittent failures is 
that ultimately they force us to choose between disabling a whole bunch of 
tests and being unable to manage our tree.  As more and 
more tests get disabled, we lose more and more test coverage, and that 
can have a much more severe impact on the health of our products than 
every individual intermittent test failure.


Cheers,
Ehsan



Re: B2G emulator issues

2014-04-07 Thread Robert O'Callahan
When you say "debug", you mean the emulator is running a FirefoxOS debug
build, not that the emulator itself is built debug --- right?

Given that, is it a correct summary to say that the problem is that the
emulator is just too slow?

Applying time dilation might make tests green but we'd be left with the
problem of the tests still taking a long time to run.

Maybe we should identify a subset of the tests that are more likely to
suffer B2G-specific breaking and only run those?

Rob
-- 
Jtehsauts  tshaei dS,o n Wohfy  Mdaon  yhoaus  eanuttehrotraiitny  eovni
le atrhtohu gthot sf oirng iyvoeu rs ihnesa.rt sS?o  Whhei csha iids  teoa
stiheer :p atroa lsyazye,d  'mYaonu,r  sGients  uapr,e  tfaokreg iyvoeunr,
'm aotr  atnod  sgaoy ,h o'mGee.t  uTph eann dt hwea lmka'n?  gBoutt  uIp
waanndt  wyeonut  thoo mken.o w


Re: B2G emulator issues

2014-04-07 Thread Ehsan Akhgari

On 2014-04-07, 8:03 PM, Robert O'Callahan wrote:

When you say debug, you mean the emulator is running a FirefoxOS debug
build, not that the emulator itself is built debug --- right?

Given that, is it a correct summary to say that the problem is that the
emulator is just too slow?

Applying time dilation might make tests green but we'd be left with the
problem of the tests still taking a long time to run.

Maybe we should identify a subset of the tests that are more likely to
suffer B2G-specific breaking and only run those?


Do we disable all compiler optimizations for those debug builds?  Can we 
turn them on - say, build with --enable-optimize and --enable-debug, 
which gives us a -O2 optimized debug build?




Re: B2G emulator issues

2014-04-07 Thread Shih-Chiang Chien
Why don't we just switch to the x86 emulator? The x86 emulator runs way faster 
than the ARM emulator.

Best Regards,
Shih-Chiang Chien
Mozilla Taiwan

On Apr 8, 2014, at 8:49 AM, Ehsan Akhgari ehsan.akhg...@gmail.com wrote:

 On 2014-04-07, 8:03 PM, Robert O'Callahan wrote:
 When you say debug, you mean the emulator is running a FirefoxOS debug
 build, not that the emulator itself is built debug --- right?
 
 Given that, is it a correct summary to say that the problem is that the
 emulator is just too slow?
 
 Applying time dilation might make tests green but we'd be left with the
 problem of the tests still taking a long time to run.
 
 Maybe we should identify a subset of the tests that are more likely to
 suffer B2G-specific breaking and only run those?
 
 Do we disable all compiler optimizations for those debug builds?  Can we turn 
 them on, let's say, build with --enable-optimize and --enable-debug which 
 gives us a -O2 optimized debug build?
 


