On x86, the jits use ebp as a general purpose register (16.6% more gprs!)
so corruption is expected if you try to walk the stack using the classic
ebp/saved-ebp-linked-list traversal that nsStackWalk.cpp:894 is trying to
do.  I vaguely recall that, some time ago, we preserved ebp #ifdef
MOZ_PROFILING so ebp would hold its original value all throughout jit code
and thus a frame pointer walk would skip over all JIT activations
(reporting a single unknown-symbol JIT frame).  However, I can't find this
now and hg annotate shows that this hasn't been the case for a long time.
We currently preserve ebp/rbp if *dynamic* profiling is enabled, but that's
the devtools profiler, so not relevant here.

Given that, in my experience, these non-fatal assertions happen all the
time, I'm surprised we don't see this crash more often and that makes me
think I'm missing a piece of the story here.  Perhaps devs do hit these
crashes and then fix the root cause of the assertion instead (yea for fatal
assertions!)?

To fix your problem, we could consider adding back the #ifdef
MOZ_PROFILING-preserve-ebp, but, iiuc, this would change regalloc behavior
on all nightly/m-c builds and so it could hide bugs that only show up on
release channels.  It would also make our performance measurements on
nightly/m-c less representative of release.  So this probably isn't our
best option (and the reason why we removed the #ifdef in the first place).

Instead, perhaps the fix is to change nsStackWalk.cpp to sanity check that
'bp' is in the thread's stack range (returning NS_ERROR_UNEXPECTED if
outside the range)?  It looks like we already have __libc_stack_end and the
initial bp to use as the range.  Perhaps you could try this and see if it
fixes the crash?

(I think the value 0xffffff88 just happens to be the value we clobber ebp
with in this particular case, but it wouldn't be the problematic value in
general.)

Cheers,
Luke


On Fri, Jan 9, 2015 at 1:49 AM, Cameron McCormack <[email protected]> wrote:

> Hi JS hackers,
>
> I have a patch queue (unrelated to JS) that causes an (expected) assertion
> to trigger in a mochitest, and when nsStackWalker is invoked to print out a
> stack trace, it ends up crashing due to finding a bad value on the stack.
> The second last valid frame printed is js::jit::InvokeFunction, and the
> last valid frame printed is what I assume to be in JIT code.  The crashing
> frame pointer is 0xffffff88 (which I'm told is a magic number the JS engine
> uses).
>
> I don't know if my patches are causing the problem or not (I'm hoping
> not), so I have a couple of questions.
>
> 1. In a debug build, should JIT code always be stack walkable?
>
> 2. Is there an easy way one of you would be able to tell whether this is a
> bug in the JITted code, or I can rule it out so I can look harder at my
> patches or existing code?  What information can I give to help?
>
> The crash is easily reproducible; you can see it on try here
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=b02537901def
> and I have an rr recording of it locally.
>
> If I explicitly check for 0xffffff88 and abort the stack walking when I
> find it, then the mochitest completes without a problem; so what the stack
> walker is interpreting as a frame pointer must not actually be used as a
> frame pointer in the end.
>
> Grateful for any hints on how to proceed.
>
> Thanks,
>
> Cameron
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> [email protected]
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
>
_______________________________________________
dev-tech-js-engine-internals mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals

Reply via email to