I have some questions after starting to look at implementing an optimization in
the GHC back end related to Trac ticket #1498[1,2]. The goal of the
optimization is to eliminate unnecessary heap checks in recursive functions.
For example, if we have a recursive function such as
f :: Int -> Int -> Maybe Int
f = \x y -> case x of
0 -> (Just y)
_ -> f (x-1) y
we see that it only does an allocation in the base case of the recursion. GHC
HEAD generates great looking core for this function (-ddump-stg):
Main.$wf =
\r [ww_sSX w_sT1]
case ww_sSX of ds_sSZ {
__DEFAULT ->
case -# [ds_sSZ 1] of sat_sT8 {
__DEFAULT -> Main.$wf sat_sT8 w_sT1;
};
0 -> Data.Maybe.Just [w_sT1];
};
With the corresponding Cmm code (-ddump-cmm):
Main_zdwf_entry()
<...info table elided...>
cT6:
Hp = Hp + 16;
if (Hp > HpLim) goto cT9;
_sSS::I64 = R2;
if (_sSS::I64 != 0) goto cTc;
I64[Hp - 8] = base_DataziMaybe_Just_con_info;
I64[Hp + 0] = R3;
R1 = Hp - 6;
jump (I64[Sp + 0]) ();
cT9:
HpAlloc = 16;
R1 = Main_zdwf_closure;
jump stg_gc_fun ();
cTc:
_sT1::I64 = _sSS::I64 - 1;
R2 = _sT1::I64;
Hp = Hp - 16;
jump Main_zdwf_info ();
}
Now we have a nice looking loop, but the straightforward code generation done
in the native back end causes a heap check on every iteration giving these
extra overheads on each iteration:
1) incrementing the heap pointer
2) loading the heap limit
3) testing the heap pointer against the limit
4) decrementing the heap pointer
It was suggested in an earlier discussion[2] that we could get rid of this
overhead by pushing the heap check down into the branch that actually does the
allocation. It is noted that this is not a safe transformation in general
because the heap check can only be done at a "safe point" for the GC.
Interestingly, the LLVM backend gets good code by moving the heap check in the
opposite direction. It pulls the heap check up out of the loop that it
generates for the tail-recursive call. It ends up with the three instruction
loop that you would get if you were to write the loop by hand (decrement, test,
branch) and does the allocation in the fall through case of the loop.
The LLVM generated code shows the advantage of having powerful low-level
transformations in GHC. If the LLVM is invoked with -O instead of -O2, it still
transforms the recursive call to a loop, but does not hoist the heap check out
of the loop. The assembly codes from the two back ends are pasted at the end of
the message for the interested.
Looking into this issue brings me to ask a few questions. Any feedback is
welcomed.
1) Does it still make sense to pursue these optimizations in the native code
generator?
Given the strong optimizations already in LLVM is it worth investing time in
the GHC native backend? One real advantage I see with the native back end (and
an area I would like to explore) is that it is easier to pass information from
the high-level optimizer to the low-level optimizer when it all resides in the
same compiler. This may allow certain optimizations to fire that would be very
difficult to prove safe in the LLVM back end.
2) What is the definition of a safe point for the GC?
My understanding is that currently all heap checks are done on entry to a
closure, but this could be relaxed as long as we can provide a safe
continuation point where we can resume execution after GC. One idea that could
help in the example above would be to outline the allocation points into
separate functions at the cmm level so that f no longer allocates any data and
the check is only performed in the new outlined function in the base case. It
seems to me that there could be some interesting optimization opportunities for
code motion of heap checks.
3) What is the status of the new code generator?
I've seen a number of wiki pages describing the new code generation
pipeline[3,4], but it is a little difficult to tell the exact status of the
implementation. Are people still actively working on the new code generator or
has most of the work been completed?
Thanks!
-Dave
[1] http://hackage.haskell.org/trac/ghc/ticket/1498
[2] http://www.haskell.org/pipermail/cvs-ghc/2007-July/036496.html
[3] http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen
[4]
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline
#
# LLVM ASSEMBLY (x86_64)
#
Main_zdwf_entry: # @Main_zdwf_entry
# BB#0: # %cVo
subq $8, %rsp
leaq 16(%r12), %rax
cmpq 144(%r13), %rax
ja .LBB1_3
jmp .LBB1_1
.align 16
.LBB1_4: # %cVG
# in Loop: Header=BB1_1 Depth=1
decq %r14
.LBB1_1: # %nVA
# =>This Inner Loop Header: Depth=1
testq %r14, %r14
jne .LBB1_4
# BB#2: # %nVH
movq $base_DataziMaybe_Just_con_info, 8(%r12)
movq %rsi, 16(%r12)
movq %r12, %rbx
movq %rax, %r12
movq (%rbp), %rcx
addq $10, %rbx
movq (%rcx), %r11
addq $8, %rsp
jmpq *%r11 # TAILCALL
.LBB1_3: # %cVz
movq $16, 184(%r13)
movl $Main_zdwf_closure, %ebx
movq %rax, %r12
movq -8(%r13), %r11
addq $8, %rsp
jmpq *%r11 # TAILCALL
#
# NATIVE BACK-END ASSEMBLY (x86_64)
#
Main_zdwf_info:
.LcTh:
addq $16,%r12
cmpq 144(%r13),%r12
ja .LcTk
movq %r14,%rax
testq %rax,%rax
jne .LcTn
movq $base_DataziMaybe_Just_con_info,-8(%r12)
movq %rsi,(%r12)
leaq -6(%r12),%rbx
jmp *(%rbp)
.LcTk:
movq $16,184(%r13)
movl $Main_zdwf_closure,%ebx
jmp *-8(%r13)
.LcTn:
leaq -1(%rax),%r14
addq $-16,%r12
jmp Main_zdwf_info
_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc