On 02/11/2015 06:37, Ryan Newton wrote:
Thanks Simon for linking that issue!  Does the patch linked there
<https://ghc.haskell.org/trac/ghc/attachment/ticket/8905/0001-EXPERIMENTAL-save-the-continuation-after-the-tag-che.patch>
 already
cover what you suggested in your last mail?  I think no, it's a more
limited change, but I'm having trouble understanding exactly what.

I've also got one really basic question -- was it decided long ago that
all these stack limit checks are cheaper than having a guard page at the
end of each stack and faulting to grow the stack?  (I couldn't find a
place that this rationale was described on wiki.)

Stacks would be immovable, and you need to arrange that you have enough address space to grow the stack if necessary, just like OS threads. Thus it's not really feasible on 32-bit platforms, and we would have to keep the existing stack-chunk mechanism for those platforms. Right now stacks start at 1k and are really cheap; they'd be somewhat more expensive under this scheme: at least 4K and a couple of mmap() calls.

Stack chunks have another benefit: we can track whether each chunk is dirty separately, and only traverse dirty chunks in the GC. I don't think you'd be able to do this with contiguous stacks and no stack checks.

Cheers
Simon



Best,
   -Ryan


On Sun, Nov 1, 2015 at 10:05 AM, Simon Marlow <marlo...@gmail.com
<mailto:marlo...@gmail.com>> wrote:

    Yes, I think we can probably do a better job of compiling case
    expressions.  The current compilation strategy optimises for code
    size, but at the expense of performance in the fast path.  We can
    tweak this tradeoff, perhaps under the control of a flag.

    Ideally the sequence should start by assuming that the closure is
    already evaluated, e.g.

      loop:
        tag = R2 & 7;
        if (tag == 1) then // code for []
        else if (tag == 2) then // code for (:)
        else evaluate; jump back to loop

    The nice thing is that now that we don't need proc points, "loop" is
    just a label that we can directly jump to.  Unfortunately this only
    works when using the NCG, not with LLVM, because LLVM requires proc
    points, so we might need to fall back to a different strategy for LLVM.

    Similar topics came up here:
    https://ghc.haskell.org/trac/ghc/ticket/8905 and I think there was
    another ticket but I can't find it now.

    Cheers
    Simon

    On 23/10/2015 19:00, Ryan Newton wrote:

              1. Small tweaks: The CMM code above seems to be /betting/
        than the
                 thunk is unevaluated, because it does the stack check
        and stack
                 write /before/ the predicate test that checks if the
        thunk is
                 evaluated (if(R1 & 7!= 0) gotoc3aO; elsegotoc3aP;).  With a
                 bang-pattern function, couldn't it make the opposite
        bet?  That
                 is, branch on whether the thunk is evaluated first, and
        then the
                 wasted computation is only a single correctly predicted
        branch
                 (and a read of a tag that we need to read anyway).

        Oh, a small further addition would be needed for this tweak.  In the
        generated code above "Sp = Sp + 8;" happens /late/, but I think
        it could
        happen right after the call to the thunk.  In general, does it seem
        feasible to separate the slowpath from fastpath as in the following
        tweak of the example CMM?


        *  // Skip to the chase if it's already evaluated:*
        *  start:*
        *      if (R2 & 7 != 0) goto fastpath; else goto slowpath;*
        *
        *
        *  slowpath:   // Formerly c3aY*
        *      if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;*
        *  c3aZ:*
        *      // nop*
        *      R1 = PicBaseReg + foo_closure;*
        *      call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;*
        *  c3b0:*
        *      I64[Sp - 8] = PicBaseReg + block_c3aO_info;*
        *      R1 = R2;*
        *      Sp = Sp - 8;*

        *      call (I64[R1])(R1) returns to fastpath, args: 8, res: 8,
        upd: 8;*
        *      // Sp bump moved to here so it's separate from "fastpath"*
        *      Sp = Sp + 8;*
        *
        *
        *  fastpath: // Formerly c3aO*
        *      if (R1 & 7 >= 2) goto c3aW; else goto c3aX;*
        *  c3aW:*
        *      R1 = P64[R1 + 6] & (-8);*
        *      call (I64[R1])(R1) args: 8, res: 0, upd: 8;*
        *  c3aX:*
        *      R1 = PicBaseReg + lvl_r39S_closure;*
        *      call (I64[R1])(R1) args: 8, res: 0, upd: 8;*





        _______________________________________________
        ghc-devs mailing list
        ghc-devs@haskell.org <mailto:ghc-devs@haskell.org>
        http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Reply via email to