On Wed, May 13, 2020 at 7:54 PM Jason Orendorff <jorendo...@mozilla.com> wrote:
> Jan, I'd like to know what you think about this.

Approach 2 is what we used to do before the big generator rewrite that
resulted in Baseline JIT support. Problems I see with bringing it back:

- It ties the generator's heap representation to JIT internals resulting
  in weird edge cases. For example: what happens if we execute in the
  Baseline JIT, yield from the generator, discard the BaselineScript on
  GC, then resume? We'd need to fix up the heap-allocated frame somehow
  or Baseline-compile immediately. Debugger and profiler make these
  things worse.
- There are places
  <https://searchfox.org/mozilla-central/rev/9f074fab9bf905fad62e7cc32faf121195f4ba46/js/src/jit/BaselineCodeGen.cpp#611-615>
  where we calculate the BaselineFrame's size based on FP/SP distance.
  This especially matters for blinterp because it doesn't know anything
  about the script statically.
- blinterp /must/ access expression stack slots via SP because the
  offset from FP isn't statically known (depends on number of locals and
  expression stack slots).
- It makes Ion/Warp support hard because its frame layout is very
  different (regalloc spills to 'arbitrary' stack slots).

For approach 1 we could potentially share get/set opcodes for
locals/formals because it's a single array (or maybe not due to
ArgumentsObject weirdness).

Here's a hybrid approach 3:

- GeneratorObject owns a list-of-Values (for formals and locals) as in
  approach 1, but InterpreterFrame and BaselineFrame store a raw pointer
  to that array.
- We'd need different get/set opcodes for generator locals, but they'd
  only be a single load (fp->generatorArray) slower than normal locals
  [0].
- Maybe we could implement this by ensuring the "list of Values" == "the
  GeneratorObject's dynamic slots". That way we get things like nursery
  allocation of that array for free.
- For expression stack slots: the list-of-Values could reserve space for
  them (based on script->nslots) and then we emit store ops before the
  yield and load + push ops after.
  This way yield/resume with expression stack slots would be faster than
  what we do now (ArrayObject allocation..) and if there are no
  expression stack slots we don't have any runtime overhead. This could
  potentially be optimized in the frontend so that if you have a yield
  right after another yield, you only have to emit stores for the part
  of the expression stack that actually changed.

[0] For each approach: we probably want to avoid pre-barriers and
post-barriers when storing generator locals while executing. The old
code had some complexity around pre-barriers, it's probably worth trying
to understand what it did.

Jan

>
> Currently, in generators, all bindings—arguments, locals, etc.—are
> marked as "aliased". It's slow. Two possible fixes:
>
> 1.  LOCALS ON GENERATOR
>
>     Add bytecode instructions for getting and setting locals in a
>     generator. Just as non-aliased locals in normal functions are
>     optimized into stack slots, in generators optimize them into slots
>     on the GeneratorObject, where these special instructions can reach
>     them.
>
>     The expression stack remains as-is: on yield, copy it from the stack
>     to the generator; on resume, copy it back.
>
> 2.  FRAME ON GENERATOR
>
>     When calling a generator, create a normal interpreter/baseline stack
>     frame, but on the heap instead of the stack, a part of the
>     GeneratorObject. Copy the arguments there. That is actually going to
>     be our frame. It is not physically located on the stack.
>
>     In Interpreter.cpp, make `REGS.fp` point to the current stack frame
>     even when it's allocated in a generator, and likewise `REGS.sp`.
>     Same for the corresponding registers in Baseline/blinterp. All uses
>     of `BaselineFrameReg` we looked at would still work if it pointed to
>     a heap-allocated frame.
>
>     The CPU's `rsp` would still point to the C stack as usual.
>
>     Now just remove the special case in the frontend that marks all
>     bindings in generators as aliased. Emit normal GetLocal, SetLocal,
>     GetArg, etc. instructions. "Everything" "just" works.
>
> Approach 1 is more flexible and calls might be faster. In approach 2,
> `yield` and resume are faster, and we avoid having new opcodes. However,
> the complexity settles at boundaries between execution modes, which
> would now have to cope with frames possibly being noncontiguous and
> stored on the heap.
>
> What do you think?
>
> -j
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-engine-internals@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals