David It would indeed be good to work on this. Moreover, I know exactly how! Indeed a Cambridge undergrad, Krzysztof Wos, may do this very thing in his undergrad project. If he has time, which is not certain. I'm ccing him.
I've just dumped in some notes about the new back end status here http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen You'll see that things are in a tremendous state of flux at the moment, which is not very helpful for you. It should settle down in a few months, esp after David Terei's internship. Basically, you want to modify the new code generator, which converts STG to CmmZ/CmmGADT (see the above link for what that is). The new pipeline makes it easy to add safe points wherever you want, which the old one does not. Hmm. I hate to respond like this, but it might make most sense to give us 2-3 months to get the new codegen framework settled, and then pick up the cudgels. Simon From: [email protected] [mailto:[email protected]] On Behalf Of David Peixotto Sent: 15 March 2010 18:36 To: [email protected] Subject: Optimizing the placement of heap checks (trac #1498) I have some questions after starting to look at implementing an optimization in the GHC back end related to Trac ticket #1498[1,2]. The goal of the optimization is to eliminate unnecessary heap checks in recursive functions. For example, if we have a recursive function such as f :: Int -> Int -> Maybe Int f = \x y -> case x of 0 -> (Just y) _ -> f (x-1) y we see that it only does an allocation in the base case of the recursion. GHC HEAD generates great looking core for this function (-ddump-stg): Main.$wf = \r [ww_sSX w_sT1] case ww_sSX of ds_sSZ { __DEFAULT -> case -# [ds_sSZ 1] of sat_sT8 { __DEFAULT -> Main.$wf sat_sT8 w_sT1; }; 0 -> Data.Maybe.Just [w_sT1]; }; With the corresponding Cmm code (-ddump-cmm): Main_zdwf_entry() <...info table elided...> cT6: Hp = Hp + 16; if (Hp > HpLim) goto cT9; _sSS::I64 = R2; if (_sSS::I64 != 0) goto cTc; I64[Hp - 8] = base_DataziMaybe_Just_con_info; I64[Hp + 0] = R3; R1 = Hp - 6; jump (I64[Sp + 0]) (); cT9: HpAlloc = 16; R1 = Main_zdwf_closure; jump stg_gc_fun (); cTc: _sT1::I64 = _sSS::I64 - 1; R2 = _sT1::I64; Hp = Hp - 16; jump Main_zdwf_info (); } Now we have a nice looking loop, but the straightforward code generation done in the native back end causes a heap check on every iteration giving these extra overheads on each iteration: 1) incrementing the heap pointer 2) loading the heap limit 3) testing the heap pointer against the limit 4) decrementing the heap pointer It was suggested in an earlier discussion[2] that we could get rid of this overhead by pushing the heap check down into the branch that actually does the allocation. It is noted that this is not a safe transformation in general because the heap check can only be done at a "safe point" for the GC. Interestingly, the LLVM backend gets good code by moving the heap check in the opposite direction. It pulls the heap check up out of the loop that it generates for the tail-recursive call. It ends up with the three instruction loop that you would get if you were to write the loop by hand (decrement, test, branch) and does the allocation in the fall through case of the loop. The LLVM generated code shows the advantage of having powerful low-level transformations in GHC. If the LLVM is invoked with -O instead of -O2, it still transforms the recursive call to a loop, but does not hoist the heap check out of the loop. The assembly codes from the two back ends are pasted at the end of the message for the interested. Looking into this issue brings me to ask a few questions. Any feedback is welcomed. 1) Does it still make sense to pursue these optimizations in the native code generator? Given the strong optimizations already in LLVM is it worth investing time in the GHC native backend? One real advantage I see with the native back end (and an area I would like to explore) is that it is easier to pass information from the high-level optimizer to the low-level optimizer when it all resides in the same compiler. This may allow certain optimizations to fire that would be very difficult to prove safe in the LLVM back end. 2) What is the definition of a safe point for the GC? My understanding is that currently all heap checks are done on entry to a closure, but this could be relaxed as long as we can provide a safe continuation point where we can resume execution after GC. One idea that could help in the example above would be to outline the allocation points into separate functions at the cmm level so that f no longer allocates any data and the check is only performed in the new outlined function in the base case. It seems to me that there could be some interesting optimization opportunities for code motion of heap checks. 3) What is the status of the new code generator? I've seen a number of wiki pages describing the new code generation pipeline[3,4], but it is a little difficult to tell the exact status of the implementation. Are people still actively working on the new code generator or has most of the work been completed? Thanks! -Dave [1] http://hackage.haskell.org/trac/ghc/ticket/1498 [2] http://www.haskell.org/pipermail/cvs-ghc/2007-July/036496.html [3] http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen [4] http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline # # LLVM ASSEMBLY (x86_64) # Main_zdwf_entry: # @Main_zdwf_entry # BB#0: # %cVo subq $8, %rsp leaq 16(%r12), %rax cmpq 144(%r13), %rax ja .LBB1_3 jmp .LBB1_1 .align 16 .LBB1_4: # %cVG # in Loop: Header=BB1_1 Depth=1 decq %r14 .LBB1_1: # %nVA # =>This Inner Loop Header: Depth=1 testq %r14, %r14 jne .LBB1_4 # BB#2: # %nVH movq $base_DataziMaybe_Just_con_info, 8(%r12) movq %rsi, 16(%r12) movq %r12, %rbx movq %rax, %r12 movq (%rbp), %rcx addq $10, %rbx movq (%rcx), %r11 addq $8, %rsp jmpq *%r11 # TAILCALL .LBB1_3: # %cVz movq $16, 184(%r13) movl $Main_zdwf_closure, %ebx movq %rax, %r12 movq -8(%r13), %r11 addq $8, %rsp jmpq *%r11 # TAILCALL # # NATIVE BACK-END ASSEMBLY (x86_64) # Main_zdwf_info: .LcTh: addq $16,%r12 cmpq 144(%r13),%r12 ja .LcTk movq %r14,%rax testq %rax,%rax jne .LcTn movq $base_DataziMaybe_Just_con_info,-8(%r12) movq %rsi,(%r12) leaq -6(%r12),%rbx jmp *(%rbp) .LcTk: movq $16,184(%r13) movl $Main_zdwf_closure,%ebx jmp *-8(%r13) .LcTn: leaq -1(%rax),%r14 addq $-16,%r12 jmp Main_zdwf_info
_______________________________________________ Cvs-ghc mailing list [email protected] http://www.haskell.org/mailman/listinfo/cvs-ghc
