On x86, the jits use ebp as a general purpose register (16.6% more gprs!) so corruption is expected if you try to walk the stack using the classic ebp/saved-ebp-linked-list traversal that nsStackWalk.cpp:894 is trying to do. I vaguely recall that, some time ago, we preserved ebp #ifdef MOZ_PROFILING so ebp would hold its original value all throughout jit code and thus a frame pointer walk would skip over all JIT activations (reporting a single unknown-symbol JIT frame). However, I can't find this now and hg annotate shows that this hasn't been the case for a long time. We currently preserve ebp/rbp if *dynamic* profiling is enabled, but that's the devtools profiler, so not relevant here.
Given that, in my experience, these non-fatal assertions happen all the time, I'm surprised we don't see this crash more often and that makes me think I'm missing a piece of the story here. Perhaps devs do hit these crashes and then fix the root cause of the assertion instead (yea for fatal assertions!)? To fix your problem, we could consider adding back the #ifdef MOZ_PROFILING-preserve-ebp, but, iiuc, this would change regalloc behavior on all nightly/m-c builds and so it could hide bugs that only show up on release channels. It would also make our performance measurements on nightly/m-c less representative of release. So this probably isn't our best option (and the reason why we removed the #ifdef in the first place). Instead, perhaps the fix is to change nsStackWalk.cpp to sanity check that 'bp' is in the thread's stack range (returning NS_ERROR_UNEXPECTED if outside the range)? It looks like we already have __libc_stack_end and the initial bp to use as the range. Perhaps you could try this and see if it fixes the crash? (I think the value 0xffffff88 just happens to be the value we clobber ebp with in this particular case, but it wouldn't be the problematic value in general.) Cheers, Luke On Fri, Jan 9, 2015 at 1:49 AM, Cameron McCormack <[email protected]> wrote: > Hi JS hackers, > > I have a patch queue (unrelated to JS) that causes an (expected) assertion > to trigger in a mochitest, and when nsStackWalker is invoked to print out a > stack trace, it ends up crashing due to finding a bad value on the stack. > The second last valid frame printed is js::jit::InvokeFunction, and the > last valid frame printed is what I assume to be in JIT code. The crashing > frame pointer is 0xffffff88 (which I'm told is a magic number the JS engine > uses). > > I don't know if my patches are causing the problem or not (I'm hoping > not), so I have a couple of questions. > > 1. In a debug build, should JIT code always be stack walkable? > > 2. Is there an easy way one of you would be able to tell whether this is a > bug in the JITted code, or I can rule it out so I can look harder at my > patches or existing code? What information can I give to help? > > The crash is easily reproducible; you can see it on try here > https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=b02537901def > and I have an rr recording of it locally. > > If I explicitly check for 0xffffff88 and abort the stack walking when I > find it, then the mochitest completes without a problem; so what the stack > walker is interpreting as a frame pointer must not actually be used as a > frame pointer in the end. > > Grateful for any hints on how to proceed. > > Thanks, > > Cameron > _______________________________________________ > dev-tech-js-engine-internals mailing list > [email protected] > https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals > _______________________________________________ dev-tech-js-engine-internals mailing list [email protected] https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals

