Hi! To support our heap analysis developer tools, and for many other
hacks, we'd like to have a mode in which we record the JavaScript call
stack under which each object is allocated. Our lovely object metadata
API should work nicely for this. However, I'm expecting that doing a
stack walk every time we allocate an object will have significant
performance impact; after we've done our first cut simply using good old
ScriptFrameIter, we'll want to look at optimizations.

In keeping with established software development tradition regarding
performance, I would like to discuss a possible optimization before I
have obtained any actual data about how much time the obvious solution
takes. I'd love to hear that this seems infeasible or not valuable,
because that would save us work. :)

First optimization: AbstractCodePtr

When we're running bytecode, looking up the corresponding source
location entails parsing source notes until we have reached the desired
bytecode location, by which time the source notes have told us the
current line and column number. So this lookup is linear in the length
of the source notes.
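To make the cost concrete, here is a rough sketch of why such a lookup is linear. The Note layout and the lookup function are invented for illustration; they are a simplified stand-in and not SpiderMonkey's actual source note encoding.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a source note. Each note says how far
// the bytecode offset advances and how the line and column change there.
struct Note {
    uint32_t offsetDelta;  // bytecode bytes since the previous note
    uint32_t lineDelta;    // lines advanced at this note
    uint32_t column;       // column in effect from this note onward
};

struct SourcePos {
    uint32_t line;
    uint32_t column;
};

// Walk the notes from the start of the script until we pass `target`.
// Because the notes are delta-encoded, there is no way to jump into the
// middle: the cost is proportional to the number of notes before `target`.
SourcePos lookup(const std::vector<Note>& notes, uint32_t startLine,
                 uint32_t target) {
    SourcePos pos{startLine, 0};
    uint32_t offset = 0;
    for (const Note& n : notes) {
        offset += n.offsetDelta;
        if (offset > target)
            break;  // this note applies to code after the target
        pos.line += n.lineDelta;
        pos.column = n.column;
    }
    return pos;
}
```

Every lookup restarts from the first note, so resolving each frame of each captured stack eagerly repeats this walk many times over.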
Similarly, when we're running IonMonkey code, finding the corresponding
source location entails looking up the OsiIndex for the given return
address, and then (I gather) consulting the snapshot for more details.
And that work yields a JSScript and bytecode offset, which must be
looked up in the source notes, as above.

But for profiling - and, I suspect, many other uses - this work is
usually wasted: we often capture stacks that we don't print. So we
should put off all these lookups as long as possible.

It seemed to me that we could minimize the actual lookups by
representing code positions using a type that was quick to construct,
and put off doing the lookups until asked. This AbstractCodePtr class
could store an <IonScript, displacement> pair, or a <JSScript, bytecode
offset> pair, or an actual <URL, line, column> - and mutate itself from
lazier to more reified forms on demand.
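A minimal sketch of what I have in mind follows. Script and JitCode are hypothetical stand-ins for JSScript and IonScript, and their lookup methods are placeholders for the real source note and OsiIndex/snapshot work; only the shape of AbstractCodePtr matters here.

```cpp
#include <cstdint>
#include <string>

struct SourceLocation {
    std::string url;
    uint32_t line;
    uint32_t column;
};

// Hypothetical stand-in for JSScript.
struct Script {
    std::string url;
    uint32_t firstLine;
    // Placeholder for the source note walk: pretend 4 bytecode bytes per line.
    SourceLocation locate(uint32_t pcOffset) const {
        return {url, firstLine + pcOffset / 4, 0};
    }
};

// Hypothetical stand-in for IonScript.
struct JitCode {
    const Script* script;
    // Placeholder for the OsiIndex and snapshot lookup.
    uint32_t pcOffsetFor(uint32_t displacement) const { return displacement / 2; }
};

class AbstractCodePtr {
    enum class Kind { Jit, Bytecode, Source };
    Kind kind_;
    const JitCode* jit_;
    const Script* script_;
    uint32_t offset_;     // displacement (Jit) or bytecode offset (Bytecode)
    SourceLocation loc_;  // meaningful only once kind_ == Kind::Source
    unsigned lookups_;    // how many expensive lookups we have performed

    AbstractCodePtr(Kind k, const JitCode* j, const Script* s, uint32_t off)
        : kind_(k), jit_(j), script_(s), offset_(off), loc_(), lookups_(0) {}

public:
    // Construction just copies two words; no lookup happens here.
    static AbstractCodePtr fromJit(const JitCode* code, uint32_t displacement) {
        return AbstractCodePtr(Kind::Jit, code, nullptr, displacement);
    }
    static AbstractCodePtr fromBytecode(const Script* script, uint32_t pcOffset) {
        return AbstractCodePtr(Kind::Bytecode, nullptr, script, pcOffset);
    }

    bool isReified() const { return kind_ == Kind::Source; }
    unsigned lookups() const { return lookups_; }

    // Mutate from lazier to more reified forms, then remember the result.
    const SourceLocation& location() {
        if (kind_ == Kind::Jit) {
            ++lookups_;
            script_ = jit_->script;
            offset_ = jit_->pcOffsetFor(offset_);
            jit_ = nullptr;
            kind_ = Kind::Bytecode;
        }
        if (kind_ == Kind::Bytecode) {
            ++lookups_;
            loc_ = script_->locate(offset_);
            script_ = nullptr;
            kind_ = Kind::Source;
        }
        return loc_;
    }
};
```

A stack captured this way costs a couple of stores per frame, and a frame that is never printed never pays for either lookup.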
If each compartment stored a HashMap from scripts to sets of
AbstractCodePtrs in that script, then the destructors for IonScripts and
JSScripts could delazify those AbstractCodePtrs just in time, so that holding an
AbstractCodePtr needn't force the underlying IonScript or JSScript to be
held alive as well. An AbstractCodePtr would simply delazify itself as
needed to allow its referent to die first.
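Roughly, the bookkeeping could look like the sketch below. Registry stands in for the per-compartment table, and CodePtr and Script are cut-down hypothetical types rather than the real classes; the resolved string is a placeholder for a proper <URL, line, column> triple.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

struct Script;

// Cut-down stand-in for AbstractCodePtr: lazy while `script` is non-null.
struct CodePtr {
    const Script* script = nullptr;
    unsigned pcOffset = 0;
    std::string resolved;  // "url:line", filled in when delazified
    bool isLazy() const { return script != nullptr; }
};

// Stand-in for the per-compartment map from scripts to the lazy pointers
// that still refer to them.
class Registry {
    std::unordered_map<const Script*, std::unordered_set<CodePtr*>> live_;

public:
    void track(CodePtr* p) { live_[p->script].insert(p); }

    // A real version must call this when a lazy CodePtr is destroyed first.
    void untrack(CodePtr* p) {
        auto it = live_.find(p->script);
        if (it != live_.end())
            it->second.erase(p);
    }

    void scriptDying(const Script& s);
};

struct Script {
    std::string url;
    unsigned firstLine;
    Registry* registry;

    Script(std::string u, unsigned line, Registry* r)
        : url(std::move(u)), firstLine(line), registry(r) {}

    // Placeholder for the real lookup: pretend 4 bytecode bytes per line.
    std::string describe(unsigned pcOffset) const {
        return url + ":" + std::to_string(firstLine + pcOffset / 4);
    }

    // Last chance to resolve anything that still points at us.
    ~Script() { registry->scriptDying(*this); }
};

void Registry::scriptDying(const Script& s) {
    auto it = live_.find(&s);
    if (it == live_.end())
        return;
    for (CodePtr* p : it->second) {
        p->resolved = s.describe(p->pcOffset);
        p->script = nullptr;  // no longer refers to the dying script
    }
    live_.erase(it);
}
```

The pointers never keep the script alive; the cost is one table entry per lazy pointer, plus a burst of lookups when a script with outstanding pointers dies.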
Second optimization: recognizing stack prefixes we've already unwound

When unwinding the stack for profiling, the frames at the older end of
the stack are going to get walked over and over. It would be helpful if
we could have a bit available on stack frames that is initially clear,
but which we can set to indicate that we have cached the rest of the
stack somewhere. js::StackFrame already has a flags field whose upper
bits are zeroed. In IonFrames, a bit in the descriptor would work for
this, if one is available; pushing descriptors with an extra zero bit
next to the constructing bit should have no runtime cost.
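Here is a small sketch of the idea, under some loud assumptions: Frame is a toy stand-in for a live stack frame, HAS_CACHED_STACK is a hypothetical flag bit, and the side table keyed by frame address is just one possible place to keep "the rest of the stack". Captured stacks share their older frames, so a cached tail can be reused as-is.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Immutable captured frame; older frames are shared between captures.
struct SavedFrame {
    std::string function;
    std::shared_ptr<const SavedFrame> older;
};
using SavedStack = std::shared_ptr<const SavedFrame>;

// Toy stand-in for a live stack frame.
struct Frame {
    enum : uint32_t { HAS_CACHED_STACK = 1u << 31 };  // hypothetical flag bit
    std::string function;
    Frame* older;
    uint32_t flags;

    explicit Frame(std::string fn, Frame* o = nullptr)
        : function(std::move(fn)), older(o), flags(0) {}
};

class StackCapturer {
    std::unordered_map<const Frame*, SavedStack> cache_;

public:
    unsigned framesWalked = 0;  // frames we actually had to unwind

    SavedStack capture(Frame* youngest) {
        std::vector<Frame*> fresh;
        SavedStack tail;
        for (Frame* f = youngest; f; f = f->older) {
            if (f->flags & Frame::HAS_CACHED_STACK) {
                tail = cache_.at(f);  // everything from here down is known
                break;
            }
            ++framesWalked;
            fresh.push_back(f);
        }
        // Rebuild from the oldest fresh frame up, marking and caching each.
        for (auto it = fresh.rbegin(); it != fresh.rend(); ++it) {
            tail = std::make_shared<SavedFrame>(SavedFrame{(*it)->function, tail});
            (*it)->flags |= Frame::HAS_CACHED_STACK;
            cache_[*it] = tail;
        }
        return tail;
    }

    // Must run when a marked frame is popped, or the entry goes stale.
    void framePopped(Frame* f) {
        cache_.erase(f);
        f->flags &= ~static_cast<uint32_t>(Frame::HAS_CACHED_STACK);
    }
};
```

With this shape, a second capture from a sibling call only unwinds the frames above the first marked one. The part this sketch glosses over is invalidation: something has to notice when a marked frame is popped, which is where the real cost may hide.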
_______________________________________________
dev-tech-js-engine-internals mailing list
https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals