On 2016/02/11 5:47, Robert O'Callahan wrote:
On Thu, Feb 11, 2016 at 9:32 AM, Ted Mielczarek <t...@mielczarek.org> wrote:

BenWa tried doing some work on this but kept getting hung up
on hitting test failures unrelated to the ones we see in production,
possibly due to environment issues.

Yes. In this vein, it's possible that in some cases rr chaos mode might
trigger bugs that don't normally happen, that one way or another block you
from finding the bug you care about.

However, bugs found by rr chaos mode should all be "real bugs". I'd
certainly love to hear about any cases where that's not true.

Rob

A scheduling change that makes hard-to-reproduce bugs occur more often sounds interesting.

I have found that running C-C TB (sorry, it is not the browser here)
under valgrind/memcheck, which slows down operation dramatically,
has helped me find a few issues.
Off the top of my head:
 - Incremental GC gets re-entered before the previous invocation finishes.
   This was not handled properly until I noticed the issue, but it is
   now handled OK.
 - There are some issues in threading.
   For one, at startup, some threads incorrectly assume that the on-screen
   window is already there, but due to the slowdown, it has not been
   created yet.
   I see some disturbing warning messages printed on the invoking tty
   window. I have not filed a bug yet since this is relatively new. I don't
   think I saw such messages early last year.

   For the other, at shutdown, C-C TB has a problem with the ordering of
   thread shutdown: some threads seem to request services during shutdown
   from service providers, but the threads that provide those services have
   already shut down. So proper shutdown does not happen. There may even be
   a cyclic dependency. Who knows?
   With the slowdown due to valgrind/memcheck, the issue
   gets more pronounced. Right now, though, there is
   a timer that monitors the shutdown process. The prolonged timeout of
   some operations, due to the missing threads and the slowdown caused by
   valgrind/memcheck, automatically triggers the permanent-hang assertion
   at shutdown, so it is difficult to figure out what is going on. But one
   can hope that the permanent-hang check gets removed temporarily to
   investigate the issue further.
   Crashes in C-C TB are something I have experienced several times in
   real life over the last couple of years.
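
To make the shutdown-ordering problem concrete, here is a minimal sketch in
Python (not C-C TB code). It assumes we could write down which thread requests
services from which other thread; the thread names "ui", "network" and
"storage" and the helper shutdown_order() are hypothetical. It computes an
order in which every client stops before the providers it depends on, and it
reports a cyclic dependency if one exists.

```python
from graphlib import TopologicalSorter, CycleError


def shutdown_order(uses):
    """Return a shutdown order for the threads named in `uses`.

    `uses` maps each thread name to the set of threads whose services
    it requests. Every client appears before the providers it uses.
    """
    # TopologicalSorter expects node -> predecessors (nodes that must
    # come earlier). A provider may only stop after all of its clients,
    # so the clients are the provider's predecessors.
    must_stop_first = {}
    for client, providers in uses.items():
        must_stop_first.setdefault(client, set())
        for provider in providers:
            must_stop_first.setdefault(provider, set()).add(client)
    return list(TopologicalSorter(must_stop_first).static_order())


# Hypothetical thread names, for illustration only.
uses = {
    "ui": {"storage", "network"},
    "network": {"storage"},
    "storage": set(),
}
order = shutdown_order(uses)
print(order)

# A cyclic dependency has no valid shutdown order at all.
try:
    shutdown_order({"a": {"b"}, "b": {"a"}})
except CycleError as exc:
    print("cyclic dependency:", exc.args[1])
```

If the real dependencies turn out to be cyclic, no ordering can fix the
problem and the cycle itself has to be broken, which is one reason it would
be worth finding out whether one exists.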


Another thing this rr framework, or a similar approach, would be useful for is C-C TB xpcshell testing (and I think it would be useful for FF xpcshell testing as well).

There seem to be a few intermittent failures in xpcshell tests.
The rr approach may make those tests fail more often.

*HOWEVER*, I am going to file a bug in Bugzilla about the
OVEREAGER ASYNC approach of the current xpcshell test script, which introduces
spurious errors, at least under Windows: a previous test that still has some
files open has not completely shut down before the next test, which seems to
use THOSE files, gets started. Under Windows, opening such a file may result
in a file-locked error. (Under Linux/OS X, I think it is OK to open such
files unless the first program explicitly calls |flock| or something.)
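
The POSIX side of this can be checked with a minimal sketch, assuming Linux or
OS X and Python's fcntl module (which does not exist on Windows, so this does
not reproduce the Windows failure). The file name and the helper
second_open_results() are hypothetical. Note that flock is advisory: even
when the first handle holds an exclusive lock, a second open still succeeds,
and only a second attempt to take the lock is refused.

```python
import fcntl
import os
import tempfile


def second_open_results(path):
    """Open `path` a second time while a first handle is still open.

    Returns (plain_open_ok, second_lock_ok):
      plain_open_ok  - the second open() worked with no lock held
      second_lock_ok - a second exclusive flock worked while the first
                       handle held one
    """
    first = open(path, "w")  # stands in for the test that has not finished
    try:
        # No lock held: a second open of the same file simply succeeds.
        with open(path, "r"):
            plain_open_ok = True

        # The first handle now takes an advisory exclusive lock.
        fcntl.flock(first, fcntl.LOCK_EX)

        # Opening still succeeds; only taking the lock again is refused.
        with open(path, "r") as second:
            try:
                fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
                second_lock_ok = True
            except BlockingIOError:
                second_lock_ok = False
    finally:
        first.close()  # closing the handle also releases its lock
    return plain_open_ok, second_lock_ok


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared.dat")  # hypothetical file name
    result = second_open_results(path)
    print(result)
```

So on Linux/OS X an overlapping test would only notice the earlier one if both
cooperate through such a lock, whereas on Windows the open itself may fail.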

So whether ALL the intermittent failures in C-C TB xpcshell tests can be investigated better with the rr approach is anyone's guess, but
I think it does have the potential to trigger more dormant bugs, just as
valgrind/memcheck uncovered a few timing issues.

But another post suggested that rr is not applicable outside Gecko right now. Does that mean C-C TB xpcshell testing cannot directly benefit from rr?
(The approach, of course, can be emulated, I suppose.)

TIA


_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
