On Wed, May 13, 2020 at 7:54 PM Jason Orendorff <jorendo...@mozilla.com>
wrote:

> Jan, I'd like to know what you think about this.
>

Approach 2 is what we used to do before the big generator rewrite that
resulted in Baseline JIT support. Problems I see with bringing it back:

   - It ties the generator's heap representation to JIT internals, resulting
   in weird edge cases. For example: what happens if we execute in the
   Baseline JIT, yield from the generator, discard the BaselineScript on GC,
   then resume? We'd need to fix up the heap-allocated frame somehow or
   Baseline-compile immediately. Debugger and profiler make these things worse.
   - There are places
   <https://searchfox.org/mozilla-central/rev/9f074fab9bf905fad62e7cc32faf121195f4ba46/js/src/jit/BaselineCodeGen.cpp#611-615>
   where we calculate the BaselineFrame's size based on FP/SP distance. This
   especially matters for blinterp because it doesn't know anything about the
   script statically.
   - blinterp /must/ access expression stack slots via SP because the
   offset from FP isn't statically known (depends on number of locals and
   expression stack slots).
   - It makes Ion/Warp support hard because its frame layout is very
   different (regalloc spills to 'arbitrary' stack slots).
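
To make the FP/SP points above concrete, here is a toy model of the arithmetic. It is only a sketch: the header size, the slot size, and all function names are made up for illustration and do not match the real BaselineFrame layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout constants, chosen only for this illustration.
constexpr std::size_t kToyValueSize = 8;         // assumed size of one Value
constexpr std::size_t kToyFrameHeaderSize = 64;  // assumed fixed frame header

// With a contiguous, downward-growing stack, the frame size can be
// recovered at runtime from the distance between FP and SP.
inline std::size_t frameSizeFromRegs(std::uintptr_t fp, std::uintptr_t sp) {
    return static_cast<std::size_t>(fp - sp);
}

// Distance below FP of expression stack slot `depth` (0 = bottom slot).
// It depends on the script's number of locals, which a shared
// interpreter loop does not know statically.
inline std::size_t stackSlotOffsetFromFP(std::size_t nlocals, std::size_t depth) {
    return kToyFrameHeaderSize + (nlocals + depth + 1) * kToyValueSize;
}

// The top of the expression stack is always directly at SP, no matter
// how many locals the script has.
constexpr std::size_t kTopOfStackOffsetFromSP = 0;
```

In this model the same slot sits at a different FP offset for every script, while the SP-relative offset is constant. Both computations assume FP and SP point into the same contiguous region, which is exactly what a heap-allocated frame would break.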

For approach 1 we could potentially share get/set opcodes for
locals/formals because it's a single array (or maybe not due to
ArgumentsObject weirdness).

Here's a hybrid approach 3:

   - GeneratorObject owns a list-of-Values (for formals and locals) as in
   approach 1, but InterpreterFrame and BaselineFrame store a raw pointer to
   that array.
   - We'd need different get/set opcodes for generator locals, but they'd
   only be a single load (fp->generatorArray) slower than normal locals [0].
   - Maybe we could implement this by ensuring the "list of Values" == "the
   GeneratorObject's dynamic slots". That way we get things like nursery
   allocation of that array for free.
   - For expression stack slots: the list-of-Values could reserve space for
   them (based on script->nslots) and then we emit store ops before the yield
   and load + push ops after. This way yield/resume with expression stack
   slots would be faster than what we do now (ArrayObject allocation...) and if
   there are no expression stack slots we don't have any runtime overhead.
   This could potentially be optimized in the frontend so that if you have a
   yield right after another yield, you only have to emit stores for the part
   of the expression stack that actually changed.
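
A rough sketch of approach 3 in standalone C++, to show the moving parts. Everything here is a toy: ToyGenerator, ToyFrame and the helper functions are hypothetical names, Value is a plain integer, and there is no GC, no barriers and no real SpiderMonkey API involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a JS Value.
using Value = std::int64_t;

// Hypothetical generator object: one heap array holds formals and locals,
// followed by space reserved for saved expression stack slots.
struct ToyGenerator {
    std::size_t numFormalsAndLocals;
    std::vector<Value> slots;

    ToyGenerator(std::size_t nFormalsAndLocals, std::size_t maxStackDepth)
        : numFormalsAndLocals(nFormalsAndLocals),
          slots(nFormalsAndLocals + maxStackDepth, 0) {}
};

// Hypothetical frame: stores a raw pointer to the generator's array.
struct ToyFrame {
    Value* generatorArray = nullptr;
    std::vector<Value> exprStack;
};

// Called on the initial call and on every resume.
inline ToyFrame attachFrame(ToyGenerator& gen) {
    ToyFrame frame;
    frame.generatorArray = gen.slots.data();
    return frame;
}

// Generator-local get/set: one extra load (fp->generatorArray)
// compared with a normal stack local.
inline Value getGeneratorLocal(const ToyFrame* fp, std::size_t index) {
    return fp->generatorArray[index];
}
inline void setGeneratorLocal(ToyFrame* fp, std::size_t index, Value v) {
    fp->generatorArray[index] = v;
}

// Before a yield: store live expression stack slots into the reserved
// tail of the array. Returns how many slots were stored.
inline std::size_t saveExprStack(ToyFrame* fp, const ToyGenerator& gen) {
    const std::size_t depth = fp->exprStack.size();
    assert(gen.numFormalsAndLocals + depth <= gen.slots.size());
    for (std::size_t i = 0; i < depth; i++) {
        fp->generatorArray[gen.numFormalsAndLocals + i] = fp->exprStack[i];
    }
    fp->exprStack.clear();
    return depth;
}

// After a resume: load the saved slots and push them back.
inline void restoreExprStack(ToyFrame* fp, const ToyGenerator& gen,
                             std::size_t depth) {
    for (std::size_t i = 0; i < depth; i++) {
        fp->exprStack.push_back(fp->generatorArray[gen.numFormalsAndLocals + i]);
    }
}
```

Locals never need copying on yield or resume because they already live in the generator's array; only the expression stack is stored and reloaded, and when its depth is zero both loops do nothing. In a real implementation the depth at each yield would be known to the bytecode emitter rather than computed at runtime.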

[0] For each approach: we probably want to avoid pre-barriers and
post-barriers when storing generator locals while executing. The old code
had some complexity around pre-barriers; it's probably worth trying to
understand what it did.

Jan


>
> Currently, in generators, all bindings—arguments, locals, etc.—are
> marked as "aliased". It's slow. Two possible fixes:
>
> 1.  LOCALS ON GENERATOR
>
>     Add bytecode instructions for getting and setting locals in a
>     generator. Just as non-aliased locals in normal functions are
>     optimized into stack slots, in generators optimize them into slots
>     on the GeneratorObject, where these special instructions can reach
>     them.
>
>     The expression stack remains as-is: on yield, copy it from the stack
>     to the generator; on resume, copy it back.
>
> 2.  FRAME ON GENERATOR
>
>     When calling a generator, create a normal interpreter/baseline stack
>     frame, but on the heap instead of the stack, a part of the
>     GeneratorObject. Copy the arguments there. That is actually going to
>     be our frame. It is not physically located on the stack.
>
>     In Interpreter.cpp, make `REGS.fp` point to the current stack frame
>     even when it's allocated in a generator, and likewise `REGS.sp`.
>     Same for the corresponding registers in Baseline/blinterp. All uses
>     of `BaselineFrameReg` we looked at would still work if it pointed to
>     a heap-allocated frame.
>
>     The CPU's `rsp` would still point to the C stack as usual.
>
>     Now just remove the special case in the frontend that marks all
>     bindings in generators as aliased. Emit normal GetLocal, SetLocal,
>     GetArg, etc. instructions. "Everything" "just" works.
>
> Approach 1 is more flexible and calls might be faster. In approach 2,
> `yield` and resume are faster, and we avoid having new opcodes. However,
> the complexity settles at boundaries between execution modes, which
> would now have to cope with frames possibly being noncontiguous and
> stored on the heap.
>
> What do you think?
>
> -j
> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-engine-internals@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
>