Optimizing the placement of heap checks (trac #1498)

David Peixotto Mon, 15 Mar 2010 11:35:50 -0700

I have some questions after starting to look at implementing an optimization in 
the GHC back end related to Trac ticket #1498[1,2]. The goal of the 
optimization is to eliminate unnecessary heap checks in recursive functions.


For example, if we have a recursive function such as

f :: Int -> Int -> Maybe Int
f = \x y -> case x of
             0 ->   (Just y)
             _ -> f (x-1) y

we see that it only does an allocation in the base case of the recursion. GHC 
HEAD generates great looking core for this function (-ddump-stg):

Main.$wf =
    \r [ww_sSX w_sT1]
        case ww_sSX of ds_sSZ {
          __DEFAULT ->
              case -# [ds_sSZ 1] of sat_sT8 {
                __DEFAULT -> Main.$wf sat_sT8 w_sT1;
              };
          0 -> Data.Maybe.Just [w_sT1];
        };

With the corresponding Cmm code (-ddump-cmm):

Main_zdwf_entry()
<...info table elided...>
cT6:
    Hp = Hp + 16;
    if (Hp > HpLim) goto cT9;
    _sSS::I64 = R2;
    if (_sSS::I64 != 0) goto cTc;
    I64[Hp - 8] = base_DataziMaybe_Just_con_info;
    I64[Hp + 0] = R3;
    R1 = Hp - 6;
    jump (I64[Sp + 0]) ();
cT9:
    HpAlloc = 16;
    R1 = Main_zdwf_closure;
    jump stg_gc_fun ();
cTc:
    _sT1::I64 = _sSS::I64 - 1;
    R2 = _sT1::I64;
    Hp = Hp - 16;
    jump Main_zdwf_info ();
}

Now we have a nice looking loop, but the straightforward code generation done 
in the native back end causes a heap check on every iteration giving these 
extra overheads on each iteration:

  1) incrementing the heap pointer
  2) loading the heap limit
  3) testing the heap pointer against the limit
  4) decrementing the heap pointer

It was suggested in an earlier discussion[2] that we could get rid of this 
overhead by pushing the heap check down into the branch that actually does the 
allocation. It is noted that this is not a safe transformation in general 
because the heap check can only be done at a "safe point" for the GC.

Interestingly, the LLVM backend gets good code by moving the heap check in the 
opposite direction. It pulls the heap check up out of the loop that it 
generates for the tail-recursive call. It ends up with the three instruction 
loop that you would get if you were to write the loop by hand (decrement, test, 
branch) and does the allocation in the fall through case of the loop. 

The LLVM generated code shows the advantage of having powerful low-level 
transformations in GHC. If the LLVM is invoked with -O instead of -O2, it still 
transforms the recursive call to a loop, but does not hoist the heap check out 
of the loop. The assembly codes from the two back ends are pasted at the end of 
the message for the interested.

Looking into this issue brings me to ask a few questions. Any feedback is 
welcomed.

1) Does it still make sense to pursue these optimizations in the native code 
generator?

Given the strong optimizations already in LLVM is it worth investing time in 
the GHC native backend? One real advantage I see with the native back end (and 
an area I would like to explore) is that it is easier to pass information from 
the high-level optimizer to the low-level optimizer when it all resides in the 
same compiler. This may allow certain optimizations to fire that would be very 
difficult to prove safe in the LLVM back end.

2) What is the definition of a safe point for the GC? 

My understanding is that currently all heap checks are done on entry to a 
closure, but this could be relaxed as long as we can provide a safe 
continuation point where we can resume execution after GC. One idea that could 
help in the example above would be to outline the allocation points into 
separate functions at the cmm level so that f no longer allocates any data and 
the check is only performed in the new outlined function in the base case. It 
seems to me that there could be some interesting optimization opportunities for 
code motion of heap checks.

3) What is the status of the new code generator? 

I've seen a number of wiki pages describing the new code generation 
pipeline[3,4], but it is a little difficult to tell the exact status of the 
implementation. Are people still actively working on the new code generator or 
has most of the work been completed?


Thanks!

-Dave

[1] http://hackage.haskell.org/trac/ghc/ticket/1498
[2] http://www.haskell.org/pipermail/cvs-ghc/2007-July/036496.html
[3] http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen
[4] 
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline

#
# LLVM ASSEMBLY (x86_64)
#
Main_zdwf_entry:                        # @Main_zdwf_entry
# BB#0:                                 # %cVo
        subq    $8, %rsp
        leaq    16(%r12), %rax
        cmpq    144(%r13), %rax
        ja      .LBB1_3
        jmp     .LBB1_1
        .align  16
.LBB1_4:                                # %cVG
                                        #   in Loop: Header=BB1_1 Depth=1
        decq    %r14
.LBB1_1:                                # %nVA
                                        # =>This Inner Loop Header: Depth=1
        testq   %r14, %r14
        jne     .LBB1_4
# BB#2:                                 # %nVH
        movq    $base_DataziMaybe_Just_con_info, 8(%r12)
        movq    %rsi, 16(%r12)
        movq    %r12, %rbx
        movq    %rax, %r12
        movq    (%rbp), %rcx
        addq    $10, %rbx
        movq    (%rcx), %r11
        addq    $8, %rsp
        jmpq    *%r11  # TAILCALL
.LBB1_3:                                # %cVz
        movq    $16, 184(%r13)
        movl    $Main_zdwf_closure, %ebx
        movq    %rax, %r12
        movq    -8(%r13), %r11
        addq    $8, %rsp
        jmpq    *%r11  # TAILCALL
        
#
# NATIVE BACK-END ASSEMBLY (x86_64)
#       
Main_zdwf_info:
.LcTh:
        addq $16,%r12
        cmpq 144(%r13),%r12
        ja .LcTk
        movq %r14,%rax
        testq %rax,%rax
        jne .LcTn
        movq $base_DataziMaybe_Just_con_info,-8(%r12)
        movq %rsi,(%r12)
        leaq -6(%r12),%rbx
        jmp *(%rbp)
.LcTk:
        movq $16,184(%r13)
        movl $Main_zdwf_closure,%ebx
        jmp *-8(%r13)
.LcTn:
        leaq -1(%rax),%r14
        addq $-16,%r12
        jmp Main_zdwf_info

_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc

Optimizing the placement of heap checks (trac #1498)

Reply via email to