Re: B2G emulator issues

2014-08-28 Thread Randell Jesup
I wrote in April:
The B2G emulator design is causing all sorts of problems.  We just fixed
the #2 orange, which was caused by the Audio channel StartPlaying()
taking up to 20 seconds to run (and we fixed it by effectively
removing some timeouts).  However, we just wasted half a week trying to
land AEC & MediaStreamGraph improvements.  We still haven't landed due
to yet another B2G emulator orange, but the solution we used for the M10
problem doesn't fix the fundamental problems with the B2G emulator.

You can read the earlier thread (starting 7-apr) about this issue.  We
wallpapered over the issues (including turning down 'fake' audio
generation to 1/10th realtime and letting it underflow).

The problems with the b2g emulator have just gotten worse as we add more
tests and make changes that improve the system but give the emulators
fits.

Right now, we're looking at being blocked from landing important
improvements (that make things *not* fail due to perf timeouts in
real-user-scenarios) because the b2g-emulator chokes on anything even
smelling of realtime data.  It can stall for tens of seconds (see
above), or even minutes.  Even running a single test can cause other,
unrelated tests to perma-orange.

The stuff we've had to do (like turning down audio generation) to block
oranges in the current setup makes the tests very non-real-world, and so
greatly diminishes their utility anyway.

There was work being done to move media and other semi-realtime tests to
faster hardware; that is happening, but it's not ready yet. (For reference,
in April, tests showed that a b2g emulator mochitest that took 10
seconds on my Xeon took 350-450 seconds on tbpl.)

The fundamental problem is that b2g-emulator can't deal safely with any
sort of realtime or semi-realtime data unless run on a fast machine.
The architecture for the emulator setup means the effective CPU power is
dependent on the machine running the test, and that varies a lot (and
tbpl machines are WAY slower than my 2.5 year old desktop).  Combine
that with Debug being much slower, and it's a recipe for disaster for any
sort of time-dependent tests.
...
So, what do we do?  Because if we do nothing, it will only get worse.

So we've done nothing (nothing that's landed, at least), and it has gotten
worse, and we're at the breaking point where the b2g emulator (especially
debug) for media tests (especially webrtc) is providing negative value and
blocking critically important improvements.

We've just landed bug 1059867 to disable most webrtc tests on the emulator
until we can get them running on hardware that has the power to run them
(or other fixes make them viable again (bug 1059878)). We may need to
consider similar measures for other media tests (webaudio, etc). In the
meantime, we're going to try to run local emulator pull/build/mochitest
cronjobs on faster desktop machines (perhaps mine) on a daily or perhaps
continuous basis.  (Poor man's tbpl - maybe I'll un-mothball tinderbox
for some nostalgic flames...)

Also note that webrtc tests do run on the b2g desktop tbpl runs, so we
have some coverage.

I hope we can find a better solution than "run it on my dev machine"
sometime soon (very soon!), but right now that's better than playing
whack-a-random-timeout or just increasing run times to infinity.


P.S. There are some interesting threads of stuff that could help a lot,
like the comment JW Wang made in April about SpecialPowers.exactGC
taking 3-10s per instance on b2g debug, and tons of them being run (one
test took 102s to finish, and had 90 GCs which mostly took ~10s each).
Bug 1012516

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


Re: B2G emulator issues

2014-08-28 Thread Jonathan Griffin
Some more details on how we're approaching this problem from the 
infrastructure side:


RelEng recently gave us the ability to run select jobs on faster VMs
than the default; see
https://bugzilla.mozilla.org/show_bug.cgi?id=1031083.  We have B2G
emulator media mochitests scheduled on cedar using these faster VMs.
After fixing a minor problem with these, we'll be able to see if these
faster VMs solve the problem.  Local experiments suggest they do, but
it will take a number of runs in buildbot to be sure.


If that doesn't fix the problem, we have the option of trying still
faster VMs (at greater cost), or trying to run the tests on real
hardware.  The disadvantage of running the tests on real hardware is
that such hardware doesn't scale very readily and is already stretched
pretty thin; also, the emulator doesn't currently run on our Linux
hardware slaves and will require some amount of work to fix.


This work is being tracked in 
https://bugzilla.mozilla.org/show_bug.cgi?id=994920.


Jonathan




Re: B2G emulator issues

2014-08-28 Thread Nils Ohlmeier

Thanks Jonathan for the update.

I would like to point out that, at least for the WebRTC tests which
test the connection between two WebRTC clients, we theoretically also
have the option to split the tests into two halves (which we do for
steeplechase tests already anyhow) and then start two emulators on the
same host machine.  This could potentially speed up some of the test
execution, as the emulator seems to utilize only one host CPU.
This approach would have the advantage of scaling well, as we could keep
using EC2.  But it does not help with the media issues Randell was
describing.
But if we get to this point, it might make sense to evaluate which test
platform works best for which tests.


Best
  Nils


Re: B2G emulator issues

2014-05-23 Thread JW Wang
On Tuesday, April 8, 2014 11:45:15 PM UTC+8, Mike Habicher wrote:
 In my experience running tests locally, a single mochitest run on the 
 ARM emulator (hardware: Thinkpad X220, 16GB RAM, SSD) where everything 
 was built with 'B2G_DEBUG=0 B2G_NOOPT=0' will run in 2 to 3 minutes. The 
 same test, run with 'B2G_DEBUG=1 B2G_NOOPT=0' will take 7 to 10 minutes.

 --m.

It could be the same problem as Bug 1012516. test_media_selection.html can take 
up to 1025454ms on B2G ICS Emulator Debug. MediaManager will GC after finishing
each token, and this test has 90 tokens.  That's 10s * 90 = 900s spent in GC.


Re: B2G emulator issues

2014-04-15 Thread Vicamo Yang
I ran crashtest/reftest/marionette/xpcshell/mochitest on
emulator-x86-kk, filed related bugs, and made them block bug 753928.
Basically:


1) need to carry --emulator x86 automatically (bug 996443)
2) to add x86 emulator for xpcshell tests (bug 996473)
3) PROCESS-CRASH at the end of reftest/crashtest (bug 996449)

With some temporary solutions to the above, all the test variants run on
emulator-x86-kk and are about six times faster than on the ARM emulator.


Best regards,
Vicamo



Re: B2G emulator issues

2014-04-15 Thread Randell Jesup
I ran crashtest/reftest/marionette/xpcshell/mochitest on emulator-x86-kk,
filed related bugs, and made them block bug 753928.  Basically:

1) need to carry --emulator x86 automatically (bug 996443)
2) to add x86 emulator for xpcshell tests (bug 996473)
3) PROCESS-CRASH at the end of reftest/crashtest (bug 996449)

With some temporary solutions to above, all the test variants run on
emulator-x86-kk and are about six times faster than ARM emulators.

6x is good, if everything works and the tools are all in place - though
it means you're not running the real code used on devices, which could
be a problem.

On 4/9/14, 2:55 AM, Jonathan Griffin wrote:

 On 4/8/2014 1:05 AM, Thomas Zimmermann wrote:
 There are tests that instruct the emulator to trigger certain HW events.
 We can't run them on actual phones.

 To me, the idea of switching to a x86-based emulator seems to be the
 most promising solution. What would be necessary?

I don't think the *fundamental* problem is that the emulator is slow; I
think it's that the emulator doesn't simulate the environment very well,
and because of that, being slow (and running slow debug code) makes
things break.

Before worrying about an x86 emulator (or going *too* far down the "run it
on faster hardware" road), we should verify that faster hardware will
produce fewer spurious oranges.  Manually standing up a few testers and
letting them run the mochitest load (even by hand) would give us enough
data to see what moving the tests will do.

I do think that, with the current emulator, running it on faster hardware
*will* help wallpaper over the fundamental problems.  I base this on my
experience with the M10 media tests that began this thread - they ran
fine on a 2.5 year old xeon (~10s, no timeouts) and took hundreds of
seconds (and timed out) on the AWS testers.  So moving the media tests
will likely be a large win.

But this (or x86) doesn't address the fundamental problem, which is that
the emulator clearly isn't emulating the underlying environment well, in
particular timers (see some of the discussion in this thread). If we can
address the fundamental problem (even crudely), the need for high-perf
testers may decline or even go away.

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


Re: B2G emulator issues

2014-04-15 Thread Vicamo Yang

On 4/15/14, 9:42 PM, Randell Jesup wrote:

I ran crashtest/reftest/marionette/xpcshell/mochitest on emulator-x86-kk,
filed related bugs, and made them block bug 753928.  Basically:

1) need to carry --emulator x86 automatically (bug 996443)
2) to add x86 emulator for xpcshell tests (bug 996473)


patch available, in review.


3) PROCESS-CRASH at the end of reftest/crashtest (bug 996449)


Actually we have more trouble than this, but I think that can be
improved with time.  At the top of the list is the lack of
gdb/gdbserver and maybe other debugging tools for x86 emulators.
Rebuilding the AOSP toolchain doesn't seem to be a trivial task. :(



With some temporary solutions to the above, all the test variants run on
emulator-x86-kk and are about six times faster than on the ARM emulator.


6x is good, if everything works and the tools are all in place - though
it means you're not running the real code used on devices, which could
be a problem.


However, the emulator is also not the real code used on devices. ;)




Re: B2G emulator issues

2014-04-09 Thread Thomas Zimmermann
Hi

 That is what the emulator is already doing. If we start emulating HW
 down to individual CPU cycles, it'll only get slower. :(
 
 I think this is wrong in some way.  Otherwise I wouldn't see this:
 1) running on TBPL (AWS) the internal timings reported show the specific
test going from 30 seconds to 450 seconds with the patch.
 2) on my local system, the test self-reports ~10 seconds, with or
without the patch.
 
 The only way I can see that happening is if the simulator in some way
 exposes the underlying platform performance (in specific timers).

Right. What I mean is that we're currently emulating an ARM chipset, but
without timing. If we start doing cycle-correct emulation, it won't get
faster.

 Another option (likely not simple) would be to find a way to slow down
 time for the emulator, such as intercepting system calls and increasing
 any time constants (multiplying timer values, timeout values to socket
 calls, etc, etc).  This may not be simple.  For devices (audio, etc),
 frequencies may need modifying or other adjustments made.

 If we do that, writing and debugging tests will take even longer.
 
It shouldn't, if the system self-adapted (per below).  That should
 give a much more predictable (and closer-to-similar to a real device)
 result.  BTW, I presume we're simulating a single-core ARM, so again not
 entirely representative anymore.

Oh, I now get the point of this idea. We could probably implement this
by modifying the emulated timer(s?) within the emulator;
hw/goldfish_timer.c might be the place. Although I wouldn't do this if
we have other options. Don't know how this would affect frequencies
(audio, etc.).

Best regards
Thomas

 


Re: B2G emulator issues

2014-04-09 Thread Randell Jesup
 That is what the emulator is already doing. If we start emulating HW
 down to individual CPU cycles, it'll only get slower. :(
 
 I think this is wrong in some way.  Otherwise I wouldn't see this:
 1) running on TBPL (AWS) the internal timings reported show the specific
test going from 30 seconds to 450 seconds with the patch.
 2) on my local system, the test self-reports ~10 seconds, with or
without the patch.
 
 The only way I can see that happening is if the simulator in some way
 exposes the underlying platform performance (in specific timers).

Right. What I mean is that we're currently emulating an ARM chipset, but
without timing. If we start doing cycle-correct emulation, it won't get
faster.

I still think there's some confusion here.  How are timers connected?  If
I set a timer for 20ms, is that measured in emulated ARM instructions
(ignoring whether they're cycle-accurate numbers or not; I'd be fine with
assuming 1 instruction-per-cycle)?  Or is it 20ms on the host machine,
regardless of how fast or slow the ARM emulation is running?  (I think
strongly it's the latter.)

 Another option (likely not simple) would be to find a way to slow down
 time for the emulator, such as intercepting system calls and increasing
 any time constants (multiplying timer values, timeout values to socket
 calls, etc, etc).  This may not be simple.  For devices (audio, etc),
 frequencies may need modifying or other adjustments made.

 If we do that, writing and debugging tests will take even longer.
 
It shouldn't, if the system self-adapted (per below).  That should
 give a much more predictable (and closer-to-similar to a real device)
 result.  BTW, I presume we're simulating a single-core ARM, so again not
 entirely representative anymore.

Oh, I now get the point of this idea. We could probably implement this
by modifying the emulated timer(s?) within the emulator;
hw/goldfish_timer.c might be the place. Although I wouldn't do this if
we have other options. Don't know how this would affect frequencies
(audio, etc.).

If we do this, it wouldn't really affect those - but from the host side
the data coming out would be slow (or even fast, on a fast machine).  I
wouldn't use this for interactive media use (that gets even more fun);
adjusting timing this way would be an option we could turn on or off.
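
To make the scaling concrete, something like this (purely illustrative - it
is not the actual hw/goldfish_timer.c code, and the names and the factor are
made up):

// Guest-visible time advances at 1/kDilation of host time, so a timer the
// guest arms for T ns only fires after kDilation*T ns of host time.
#include <cstdint>

static const int64_t kDilation = 4;  // made-up factor

int64_t GuestNowNs(int64_t aHostNowNs, int64_t aHostEpochNs)
{
  return (aHostNowNs - aHostEpochNs) / kDilation;
}

int64_t HostDeadlineNs(int64_t aGuestDeadlineNs, int64_t aHostEpochNs)
{
  return aHostEpochNs + aGuestDeadlineNs * kDilation;
}

The catch (per above) is anything with a fixed real-world rate - audio
devices, remote peers - which would see the guest producing and consuming
data kDilation times too slowly.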

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


Re: B2G emulator issues

2014-04-08 Thread Thomas Zimmermann
Hi,

Thanks for bringing up this issue.

 
 One option (very, very painful, and even slower) would be a proper
 device simulator which simulates both the CPU and the system hardware
 (of *some* B2G phone).  This would produce the most realistic result
 with an emulator.

That is what the emulator is already doing. If we start emulating HW
down to individual CPU cycles, it'll only get slower. :(

 Another option (likely not simple) would be to find a way to slow down
 time for the emulator, such as intercepting system calls and increasing
 any time constants (multiplying timer values, timeout values to socket
 calls, etc, etc).  This may not be simple.  For devices (audio, etc),
 frequencies may need modifying or other adjustments made.

If we do that, writing and debugging tests will take even longer.

 We could require that the emulator needs X Bogomips to run, or to run a
 specific test suite.
 
 We could segment out tests that require higher performance and run them
 on faster VMs/etc.

Do we already know which tests are slow and why? Maybe there are ways to
optimize the emulator. For example, if we execute lots of driver code
within the guest, maybe we can move some of that into the emulator's
binary, where it runs on the native machine.

 
 We could turn off certain tests on tbpl and run them on separate
 dedicated test machines (a bit similar to PGO).  There are downsides to
 this of course.
 
 Lastly, we could put in a bank of HW running B2G to run the tests like
 the Android test boards/phones.

There are tests that instruct the emulator to trigger certain HW events.
We can't run them on actual phones.

To me, the idea of switching to an x86-based emulator seems to be the
most promising solution. What would be necessary?

Best regards
Thomas


 
 
 So, what do we do?  Because if we do nothing, it will only get worse.
 



Re: B2G emulator issues

2014-04-08 Thread Mike Habicher

On 14-04-07 08:49 PM, Ehsan Akhgari wrote:

On 2014-04-07, 8:03 PM, Robert O'Callahan wrote:

When you say debug, you mean the emulator is running a FirefoxOS debug
build, not that the emulator itself is built debug --- right?

Given that, is it a correct summary to say that the problem is that the
emulator is just too slow?

Applying time dilation might make tests green but we'd be left with the
problem of the tests still taking a long time to run.

Maybe we should identify a subset of the tests that are more likely to
suffer B2G-specific breaking and only run those?


Do we disable all compiler optimizations for those debug builds? Can 
we turn them on, let's say, build with --enable-optimize and 
--enable-debug which gives us a -O2 optimized debug build?


In my experience running tests locally, a single mochitest run on the 
ARM emulator (hardware: Thinkpad X220, 16GB RAM, SSD) where everything 
was built with 'B2G_DEBUG=0 B2G_NOOPT=0' will run in 2 to 3 minutes. The 
same test, run with 'B2G_DEBUG=1 B2G_NOOPT=0' will take 7 to 10 minutes.


--m.



Re: B2G emulator issues

2014-04-08 Thread Randell Jesup
Hi,

Thanks for bringing up this issue.

 
 One option (very, very painful, and even slower) would be a proper
 device simulator which simulates both the CPU and the system hardware
 (of *some* B2G phone).  This would produce the most realistic result
 with an emulator.

That is what the emulator is already doing. If we start emulating HW
down to individual CPU cycles, it'll only get slower. :(

I think this is wrong in some way.  Otherwise I wouldn't see this:
1) running on TBPL (AWS) the internal timings reported show the specific
   test going from 30 seconds to 450 seconds with the patch.
2) on my local system, the test self-reports ~10 seconds, with or
   without the patch.

The only way I can see that happening is if the simulator in some way
exposes the underlying platform performance (in specific timers).

Note: the timer in question is nsITimer::TYPE_REPEATING_PRECISE with
10ms timing.  And changing it to 100ms makes the tests reliably green.
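
For anyone not staring at that code, this is roughly what the setup looks
like.  Illustrative sketch only (the names are invented, this is not the
actual MediaStreamGraph/fake-device code) - the point is that the interval
constant is the knob we turned:

// Sketch of a repeating "push fake realtime data" timer; not the real code.
#include "nsCOMPtr.h"
#include "nsComponentManagerUtils.h"
#include "nsITimer.h"

static void PushFakeRealtimeData(nsITimer* aTimer, void* aClosure)
{
  // ...generate and push one interval's worth of fake input here...
}

nsresult StartFakeDataTimer(nsCOMPtr<nsITimer>& aTimer)
{
  nsresult rv;
  aTimer = do_CreateInstance(NS_TIMER_CONTRACTID, &rv);
  if (NS_FAILED(rv)) {
    return rv;
  }
  // 10ms here is what overloads the emulator; 100ms is the wallpaper.
  return aTimer->InitWithFuncCallback(PushFakeRealtimeData, nullptr,
                                      10, nsITimer::TYPE_REPEATING_PRECISE);
}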

 Another option (likely not simple) would be to find a way to slow down
 time for the emulator, such as intercepting system calls and increasing
 any time constants (multiplying timer values, timeout values to socket
 calls, etc, etc).  This may not be simple.  For devices (audio, etc),
 frequencies may need modifying or other adjustments made.

If we do that, writing and debugging tests will take even longer.

It shouldn't, if the system self-adapted (per below).  That should
give a much more predictable (and closer-to-similar to a real device)
result.  BTW, I presume we're simulating a single-core ARM, so again not
entirely representative anymore.

 We could require that the emulator needs X Bogomips to run, or to run a
 specific test suite.
 
 We could segment out tests that require higher performance and run them
 on faster VMs/etc.

Do we already know which tests are slow and why? Maybe there are ways to
optimize the emulator. For example, if we execute lots of driver code
within the guest, maybe we can move some of that into the emulator's
binary, where it runs on the native machine.

Dunno.  But it's REALLY slow.  Native linux on tbpl for a specific test: 1s.
Local emulator (fast 2-year-old desktop): 10s.  tbpl before the patch: 30-40s;
after: 350-450s, and we're lucky it finishes at all.

So compared to AWS linux native it's ~30-40x slower without the patch,
300+ x slower with.  (Again speaks to realtime stuff leaving no CPU for
test running on tbpl.)  Others can speak to overall speed.

 We could turn off certain tests on tbpl and run them on separate
 dedicated test machines (a bit similar to PGO).  There are downsides to
 this of course.
 
 Lastly, we could put in a bank of HW running B2G to run the tests like
 the Android test boards/phones.

There are tests that instruct the emulator to trigger certain HW events.
We can't run them on actual phones.

Sure.  Most don't do that, I presume (very few do).

To me, the idea of switching to a x86-based emulator seems to be the
most promising solution. What would be necessary?

Dunno.

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


Re: B2G emulator issues

2014-04-08 Thread Jonathan Griffin


On 4/8/2014 1:05 AM, Thomas Zimmermann wrote:

There are tests that instruct the emulator to trigger certain HW events.
We can't run them on actual phones.

To me, the idea of switching to a x86-based emulator seems to be the
most promising solution. What would be necessary?

Best regards
Thomas


We'd need these things:

1 - a consensus we want to move to x86-based emulators, which presumes 
that architecture-specific problems aren't likely or important enough to 
warrant continued use of arm-based emulators


2 - RelEng would need to stand up x86-based KitKat emulator builds

3 - The A*Team would need to get all of the tests running against these 
builds


4 - The A*Team and developers would have to work on fixing the 
inevitable test failures that occur when standing up any new platform


I'll bring this topic up at the next B2G Engineering Meeting.

Jonathan



Re: B2G emulator issues

2014-04-08 Thread Karl Tomlinson
Randell Jesup writes:

 1) running on TBPL (AWS) the internal timings reported show the specific
test going from 30 seconds to 450 seconds with the patch.
 2) on my local system, the test self-reports ~10 seconds, with or
without the patch.

 Note: the timer in question is nsITimer::TYPE_REPEATING_PRECISE with
 10ms timing.  And changing it to 100ms makes the tests reliably green.

Do you know how many simultaneous hardware threads are emulated?

Is it possible that the thread using TYPE_REPEATING_PRECISE has a
high priority, and so it would occupy the single hardware thread
when there is no spare time available for anything else?

The time taken for the test run might then depend on the "anything
else" that happens to be running.
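
Back-of-envelope, with made-up numbers, just to illustrate the worry:

// Assumes a ~35x emulation slowdown (the 30-40x figure quoted earlier in
// the thread) and an assumed ~0.3ms native cost per callback - not measured.
#include <cstdio>

int main()
{
  const double periodMs = 10.0;      // TYPE_REPEATING_PRECISE interval
  const double nativeCostMs = 0.3;   // assumed, not measured
  const double slowdown = 35.0;      // ~30-40x per earlier messages
  const double emulatedCostMs = nativeCostMs * slowdown;
  std::printf("callback alone uses %.0f%% of each period\n",
              100.0 * emulatedCostMs / periodMs);
  return 0;
}

If that's anywhere near right, a high-priority 10ms timer thread leaves a
single emulated core with essentially nothing for the rest of the test.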


B2G emulator issues

2014-04-07 Thread Randell Jesup
The B2G emulator design is causing all sorts of problems.  We just fixed
the #2 orange, which was caused by the Audio channel StartPlaying()
taking up to 20 seconds to run (and we fixed it by effectively
removing some timeouts).  However, we just wasted half a week trying to
land AEC & MediaStreamGraph improvements.  We still haven't landed due
to yet another B2G emulator orange, but the solution we used for the M10
problem doesn't fix the fundamental problems with the B2G emulator.

Details:

We ran into huge problems getting AEC/MediaStreamGraph changes (bug
818822 and things dependent on it) into the tree due to problems with
B2g-emulator debug M10 (permaorange timeouts).  This test adds a fairly
small amount of processing to input audio data (resampling to 44100Hz).

A test that runs perfectly in emulator opt builds and runs fine locally
in M10 debug (10-12 seconds reported for the test in the logs, with or
without the change) goes from taking 30-40 seconds on tbpl to
350-450(!) seconds (and then times out).  Fix that one, and others fail
even worse.

I contacted Gregor Wagner asking for help and also jgriffin in #b2g.  We
found one problem (emulator going to 'sleep' during mochitests, bug
992436); I have a patch up to enable wakelock globally for mochitests.
However, that just pushed the error a little deeper.

The fundamental problem is that b2g-emulator can't deal safely with any
sort of realtime or semi-realtime data unless run on a fast machine.
The architecture for the emulator setup means the effective CPU power is
dependent on the machine running the test, and that varies a lot (and
tbpl machines are WAY slower than my 2.5 year old desktop).  Combine
that with Debug being much slower, and it's a recipe for disaster for any
sort of time-dependent tests.

I worked around it for now, by turning down the timers that push fake
realtime data into the system - this will cause audio underruns in
MediaStreamGraph, and doesn't solve the problem of MediaStreamGraph
potentially overloading itself for other reasons, or breaking
assumptions about being able to keep up with data streams.  (MSG wants
to run every 10ms or so.)
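
A rough sketch of the arithmetic (round numbers only, and the chunking is a
guess - this is not the actual patch):

// The fake audio source pushes one 10ms chunk per tick; firing the timer at
// ~1/10 realtime while keeping the chunk size starves MSG of input.
#include <cstdint>
#include <cstdio>

int main()
{
  const uint32_t sampleRate = 44100;                 // Hz (see resampling above)
  const uint32_t framesPerChunk = sampleRate / 100;  // 10ms of audio = 441 frames
  const uint32_t slowedTickMs = 100;                 // was ~10ms (realtime)
  const uint32_t suppliedPerSec = framesPerChunk * (1000 / slowedTickMs);
  std::printf("supplying %u frames/s vs %u consumed -> underruns\n",
              suppliedPerSec, sampleRate);
  return 0;
}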

This problem also likely plays hell with the Web Audio tests, and will
play hell with WebRTC echo cancellation and the media reception code,
which will start trying to insert loss-concealment data and break
timer-based packet loss recovery, bandwidth estimators, etc.


As to what to do?  That's a good question, as turning off the emulator
tests isn't a realistic option.

One option (very, very painful, and even slower) would be a proper
device simulator which simulates both the CPU and the system hardware
(of *some* B2G phone).  This would produce the most realistic result
with an emulator.

Another option (likely not simple) would be to find a way to slow down
time for the emulator, such as intercepting system calls and increasing
any time constants (multiplying timer values, timeout values to socket
calls, etc, etc).  This may not be simple.  For devices (audio, etc),
frequencies may need modifying or other adjustments made.

We could require that the emulator needs X Bogomips to run, or to run a
specific test suite.

We could segment out tests that require higher performance and run them
on faster VMs/etc.

We could turn off certain tests on tbpl and run them on separate
dedicated test machines (a bit similar to PGO).  There are downsides to
this of course.

Lastly, we could put in a bank of HW running B2G to run the tests like
the Android test boards/phones.


So, what do we do?  Because if we do nothing, it will only get worse.

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


Re: B2G emulator issues

2014-04-07 Thread Jonathan Griffin

How easy is it to identify CPU-sensitive tests?

I think the most practical solution (at least in the near term) is to 
find that set of tests, and run only that set on a faster VM, or on real 
hardware (like our ix slaves).


Jonathan




Re: B2G emulator issues

2014-04-07 Thread Geoffrey Brown
On 4/7/2014 3:16 PM, Randell Jesup wrote:
 The B2G emulator design is causing all sorts of problems.  We just fixed

That sounds very similar to some of the failures seen on the Android 2.3 
emulator. Many media-related mochitests intermittently time out on the Android 
2.3 emulator when run on AWS. These are reported in bug 981889, bug 981886, bug 
981881, and bug 981898, but have not been investigated.


Re: B2G emulator issues

2014-04-07 Thread Randell Jesup
How easy is it to identify CPU-sensitive tests?

Easy for some (most but not all media tests).  Almost all
getUserMedia/PeerConnection tests.  ICE/STUN/TURN tests.

Not that easy for some.  And some may be only indirectly sensitive -
timeouts in delay-the-rendering code, TCP/DNS/SPDY timers, etc, etc.
Anything that touches a timer even indirectly *could* be.  So, large
sections *could* be.  I suppose we could include code checking for
MainThread starvation as a partial check though that won't catch
everything.
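
(As a sketch of the kind of check I mean - plain C++ rather than Gecko's
event-loop APIs, so purely illustrative: the main thread bumps a heartbeat
each time around its loop, and a watchdog thread flags it when it goes
stale.)

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

using Clock = std::chrono::steady_clock;

std::atomic<Clock::rep> gHeartbeat{Clock::now().time_since_epoch().count()};

void Beat()  // called from the "main thread" loop
{
  gHeartbeat.store(Clock::now().time_since_epoch().count());
}

void Watchdog(std::chrono::milliseconds aLimit)
{
  for (;;) {
    std::this_thread::sleep_for(aLimit / 2);
    auto last = Clock::time_point(Clock::duration(gHeartbeat.load()));
    auto stale = std::chrono::duration_cast<std::chrono::milliseconds>(
        Clock::now() - last);
    if (stale > aLimit) {
      std::printf("main thread starved for %lld ms\n",
                  static_cast<long long>(stale.count()));
    }
  }
}

In a harness the starvation message would become a logged warning (or a
TEST-UNEXPECTED-* line) rather than a printf, but the idea is the same.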

I think the most practical solution (at least in the near term) is to find
that set of tests, and run only that set on a faster VM, or on real
hardware (like our ix slaves).

That was an option I mentioned.  It's not fun and will be a continual
"is this orange CPU-sensitive?" exercise as they pop up, but it certainly
can be done.  And it may be simpler than better solutions.

-- 
Randell Jesup, Mozilla Corp
remove news for personal email


Re: B2G emulator issues

2014-04-07 Thread Robert O'Callahan
When you say debug, you mean the emulator is running a FirefoxOS debug
build, not that the emulator itself is built debug --- right?

Given that, is it a correct summary to say that the problem is that the
emulator is just too slow?

Applying time dilation might make tests green but we'd be left with the
problem of the tests still taking a long time to run.

Maybe we should identify a subset of the tests that are more likely to
suffer B2G-specific breaking and only run those?

Rob
-- 
Jtehsauts  tshaei dS,o n Wohfy  Mdaon  yhoaus  eanuttehrotraiitny  eovni
le atrhtohu gthot sf oirng iyvoeu rs ihnesa.rt sS?o  Whhei csha iids  teoa
stiheer :p atroa lsyazye,d  'mYaonu,r  sGients  uapr,e  tfaokreg iyvoeunr,
'm aotr  atnod  sgaoy ,h o'mGee.t  uTph eann dt hwea lmka'n?  gBoutt  uIp
waanndt  wyeonut  thoo mken.o w


Re: B2G emulator issues

2014-04-07 Thread Ehsan Akhgari

On 2014-04-07, 8:03 PM, Robert O'Callahan wrote:

When you say debug, you mean the emulator is running a FirefoxOS debug
build, not that the emulator itself is built debug --- right?

Given that, is it a correct summary to say that the problem is that the
emulator is just too slow?

Applying time dilation might make tests green but we'd be left with the
problem of the tests still taking a long time to run.

Maybe we should identify a subset of the tests that are more likely to
suffer B2G-specific breaking and only run those?


Do we disable all compiler optimizations for those debug builds?  Can we
turn them on - say, build with --enable-optimize and --enable-debug, which
gives us a -O2 optimized debug build?




Re: B2G emulator issues

2014-04-07 Thread Shih-Chiang Chien
Why don't we just switch to the x86 emulator?  The x86 emulator runs way
faster than the ARM emulator.

Best Regards,
Shih-Chiang Chien
Mozilla Taiwan
