On 2016/02/11 5:47, Robert O'Callahan wrote:
> On Thu, Feb 11, 2016 at 9:32 AM, Ted Mielczarek <t...@mielczarek.org> wrote:
>> BenWa tried doing some work on this but kept getting hung up
>> on hitting test failures unrelated to the ones we see in production,
>> possibly due to environment issues.
>
> Yes. In this vein, it's possible that in some cases rr chaos mode might
> trigger bugs that don't normally happen, that one way or another block you
> from finding the bug you care about.
> However, bugs found by rr chaos mode should all be "real bugs". I'd
> certainly love to hear about any cases where that's not true.
> Rob
A scheduling change that makes hard-to-reproduce bugs occur more
often sounds interesting.
I have found that running C-C TB (sorry, it is not the browser here)
under valgrind/memcheck, which slows execution down dramatically,
has helped me find a few issues.
Off the top of my head:
- Incremental GC gets re-entered before the previous invocation
finishes.
This was not handled properly until I noticed the issue, but it is
now handled OK.
- There are some threading issues.
For one, at startup, some threads incorrectly assume that the on-screen
window already exists, but because of the slowdown it has not been
created yet.
I see some disturbing warning messages printed on the invoking tty
window.
I have not filed a bug yet since this is relatively new. I don't
think I saw such messages early last year.
For the other, at shutdown, C-C TB has a problem with the ordering of
thread shutdown: some threads seem to request services during shutdown,
but the threads that provide those services have already shut down.
So proper shutdown does not happen. There may even be a cyclic
dependency. Who knows?
With the slowdown due to valgrind/memcheck, the issue gets more
pronounced. Right now, though, there is a timer that monitors the
shutdown process. The prolonged timeouts of some operations, caused by
the missing threads and by the valgrind/memcheck slowdown, automatically
trigger the assertion for a permanent hang at shutdown, so it is
difficult to figure out what is going on. But one can hope that the
check for a permanent hang gets removed temporarily to investigate the
issue further.
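
To make the shutdown-ordering problem concrete, here is a minimal Python sketch. It is not Mozilla code: the function name `shutdown_order` and the thread names in the usage note are made up for illustration. It assumes each thread can declare which other threads it may still ask for services while shutting down. From that map it derives an order in which clients stop before their providers, and it reports a cyclic dependency if one exists. It needs Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter, CycleError


def shutdown_order(uses):
    """Return an order in which threads can be shut down safely.

    `uses` maps a thread name to the set of threads whose services it
    may still request while shutting down (hypothetical input format).
    A client must be stopped before any provider it depends on.
    """
    # TopologicalSorter expects node -> predecessors (nodes that must
    # come first). A provider's predecessors are its clients, so the
    # "uses" map has to be inverted.
    clients_of = {name: set() for name in uses}
    for client, providers in uses.items():
        for provider in providers:
            clients_of.setdefault(provider, set()).add(client)
    try:
        return list(TopologicalSorter(clients_of).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise RuntimeError(
            f"cyclic shutdown dependency: {err.args[1]}"
        ) from err
```

For example, `shutdown_order({"ui": {"imap"}, "imap": {"socket"}, "socket": set()})` stops `ui` first and `socket` last, while a map such as `{"a": {"b"}, "b": {"a"}}` raises an error instead of hanging at shutdown.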
I have experienced C-C TB crashes several times in real life over the
last couple of years.
Another area where this rr framework or a similar approach would be
useful is C-C TB xpcshell testing (and I think it would be useful for
FF xpcshell testing as well).
There seem to be a few intermittent test failures in xpcshell tests.
The rr approach may make those tests fail more often.
*HOWEVER*, I am going to file a bug in Bugzilla about the
OVEREAGER ASYNC approach of the current xpcshell test script, which
introduces spurious errors at least under Windows: a previous test that
still has some files open has not completely shut down before the next
test, which seems to use THOSE files, gets started. Under Windows,
opening such a file may result in a file-locked error. (Under Linux/OS X,
I think it is OK to open such files unless the first program explicitly
calls |flock| or something.)
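
A sketch of the sequencing that would avoid this overlap, in Python. This is not the actual xpcshell harness; `run_tests_serially` and its parameters are hypothetical names. The idea is only that the runner waits for each test process to exit before it launches the next one, so file handles held by the previous test are released first.

```python
import subprocess


def run_tests_serially(commands, timeout=60):
    """Run each test command to completion before starting the next.

    `commands` is a list of argument lists, one per test (hypothetical
    input format). Returns the list of exit codes in the same order.
    """
    exit_codes = []
    for cmd in commands:
        # subprocess.run() blocks until the child process has exited,
        # so the files it opened are closed before the next test starts.
        # A test that exceeds `timeout` seconds is killed and
        # subprocess.TimeoutExpired is raised.
        completed = subprocess.run(cmd, timeout=timeout)
        exit_codes.append(completed.returncode)
    return exit_codes
```

This only covers the direct child process. If a test spawns helper processes that outlive it and keep files open, the runner would need to wait for those as well.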
So whether ALL the intermittent failures in C-C TB xpcshell tests
can be investigated better with the rr approach is anyone's guess, but
I think it does have the potential to trigger more dormant bugs, just as
valgrind/memcheck uncovered a few timing issues.
But another post suggested that rr is not applicable right now outside
Gecko. Does that mean C-C TB xpcshell testing cannot directly benefit
from rr?
(The approach, of course, can be emulated, I suppose.)
TIA
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform