Re: Using rr chaos mode to find intermittent bugs

ISHIKAWA,chiaki Wed, 10 Feb 2016 13:44:48 -0800

On 2016/02/11 5:47, Robert O'Callahan wrote:
> On Thu, Feb 11, 2016 at 9:32 AM, Ted Mielczarek <t...@mielczarek.org> wrote:
>
>> BenWa tried doing some work on this but kept getting hung up
>> on hitting test failures unrelated to the ones we see in production,
>> possibly due to environment issues.
>
> Yes. In this vein, it's possible that in some cases rr chaos mode might
> trigger bugs that don't normally happen, that one way or another block you
> from finding the bug you care about.
>
> However, bugs found by rr chaos mode should all be "real bugs". I'd
> certainly love to hear about any cases where that's not true.
>
> Rob

This scheduling change, which makes hard-to-reproduce bugs occur more
often, sounds interesting.


I have found that running C-C TB (sorry, it is not the browser here)
under valgrind/memcheck, which slows down the operation dramatically,
has helped me to find a few issues.
Off the top of my head:

- Incremental GC gets re-entered before it finishes the previous invocation.
   This was not handled properly until I noticed the issue, but it is
   now handled OK.

- There are some issues in threading.

   For one, at startup, some threads incorrectly assume that the on-screen
   window is already there, but due to the slowdown, it has not been
   created yet. I see some disturbing warning messages printed on the
   invoking tty window. I have not filed a bug yet since this is
   relatively new. I don't think I saw such messages early last year.

   For the other, at shutdown, C-C TB has a problem with incorrect ordering
   of thread shutdown: some threads seem to request services during shutdown
   from service providers, but the threads that provide the services have
   already shut down. So proper shutdown does not happen. There may even be
   a cyclic dependency. Who knows?

   With the slowdown due to valgrind/memcheck, the issue
   gets more pronounced. Right now, though, there is
   a timer that monitors the shutdown process, and the prolonged timeouts of
   some operations, caused by the missing threads and the
   valgrind/memcheck slowdown, automatically trigger the assertion for a
   permanent hang at shutdown, so it is difficult to figure out what is
   going on. But one can hope that the check for a permanent hang gets
   removed temporarily to investigate the issue further.

   Crashes in C-C TB are something I have experienced several times in the
   last couple of years in real life.
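
To make the shutdown-ordering problem concrete, here is a minimal sketch in Python. It is NOT C-C TB code: the Provider and Client classes, the "flush" request, and the shutdown() helper are all hypothetical names made up for illustration. It only shows the general pattern: a client thread that asks a provider for one last service during its own shutdown works if clients are stopped first, and fails if the provider is already gone.

```python
import queue
import threading


class Provider(threading.Thread):
    """Hypothetical service provider: answers requests until told to stop."""

    def __init__(self):
        super().__init__()
        self.requests = queue.Queue()

    def run(self):
        while True:
            item = self.requests.get()
            if item is None:  # sentinel: time to shut down
                return
            reply_box, payload = item
            reply_box.put(payload.upper())

    def call(self, payload, timeout=1.0):
        # Refuse the request if the provider thread has already exited.
        if not self.is_alive():
            raise RuntimeError("provider already shut down")
        reply_box = queue.Queue(maxsize=1)
        self.requests.put((reply_box, payload))
        return reply_box.get(timeout=timeout)

    def stop(self):
        self.requests.put(None)
        self.join()


class Client(threading.Thread):
    """Hypothetical client that needs one last service call at shutdown."""

    def __init__(self, provider):
        super().__init__()
        self.provider = provider
        self.stop_requested = threading.Event()
        self.result = None

    def run(self):
        self.stop_requested.wait()
        try:
            self.result = self.provider.call("flush")
        except RuntimeError as exc:
            self.result = f"error: {exc}"

    def stop(self):
        self.stop_requested.set()
        self.join()


def shutdown(clients_first):
    """Run one start/stop cycle and report what the client saw at shutdown."""
    provider = Provider()
    provider.start()
    client = Client(provider)
    client.start()
    if clients_first:
        client.stop()    # client can still reach the provider
        provider.stop()
    else:
        provider.stop()  # provider is gone before the client asks
        client.stop()
    return client.result


if __name__ == "__main__":
    print(shutdown(clients_first=True))
    print(shutdown(clients_first=False))
```

In this toy version the failure is reported immediately. In a real program the client would more likely block waiting for a reply that never comes, which is the kind of situation a shutdown-hang timer would then flag.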

Another thing this rr framework or a similar approach will be useful for
is C-C TB xpcshell testing (and I think it is useful for FF xpcshell
testing as well).

There seem to be a few intermittent test failures in xpcshell tests.
This rr approach may make the tests fail more often.

*HOWEVER*, I am going to file a bug in Bugzilla about the
OVEREAGER ASYNC approach of the current xpcshell test script, which
introduces spurious errors, at least under Windows: a previous test that
still has some files open has not completely shut down before the next
test, which seems to use THOSE files, gets started. Under Windows, opening
such a file may result in a file-locked error. (Under Linux/OS X, I think
it is OK to open such files unless the first program explicitly calls
|flock| or something.)
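
The POSIX side of that claim can be checked with a small sketch like the one below. It is only an illustration, not part of the xpcshell harness: probe_sharing() is a hypothetical helper name, and the fcntl module it uses exists only on Unix-like systems, so it will not run on Windows. As far as I know, flock locks are advisory, so even while the first handle holds an exclusive lock a plain second open still succeeds, and only a second lock attempt is refused.

```python
import fcntl
import os
import tempfile


def probe_sharing():
    """Return (second_open_ok, second_lock_ok) for one temporary file.

    The file is opened twice in the same process. The first handle takes
    an exclusive flock; the second handle then tries a non-blocking one.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as first:
            fcntl.flock(first, fcntl.LOCK_EX)
            # Opening the file again is not prevented by the lock.
            with open(path, "r") as second:
                second_open_ok = True
                try:
                    fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    second_lock_ok = True
                except OSError:
                    # The lock is already held through the first handle.
                    second_lock_ok = False
        return second_open_ok, second_lock_ok
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(probe_sharing())
```

So on Linux/OS X a test that merely opens a leftover file from the previous test should not fail for that reason alone, while under Windows the outcome depends on how the first program opened the file.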

So whether ALL the intermittent failures in C-C TB xpcshell tests are
something that can be investigated better with the rr approach is anyone's
guess, but I think it does have the potential to trigger more dormant bugs,
just as valgrind/memcheck uncovered a few timing issues.

But one other post suggested that it is not applicable right now outside
Gecko, meaning C-C TB xpcshell testing cannot directly benefit from rr?

(The approach, of course, can be emulated, I suppose.)

TIA


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
