On 15/03/2010 18:35, David Peixotto wrote:
> I have some questions after starting to look at implementing an
> optimization in the GHC back end related to Trac ticket #1498[1,2]. The
> goal of the optimization is to eliminate unnecessary heap checks in
> recursive functions.
Great!
> Now we have a nice looking loop, but the straightforward code generation
> done in the native back end causes a heap check on every iteration,
> giving these extra overheads on each iteration:
>   1) incrementing the heap pointer
>   2) loading the heap limit
>   3) testing the heap pointer against the limit
>   4) decrementing the heap pointer
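For concreteness: judging from the assembly pasted at the end of the message (a single Just cell allocated in the base case), the function being compiled is presumably of roughly this shape. The exact definition from the ticket may differ; this is a reconstruction:

```haskell
-- Presumed shape of the example (reconstructed from the assembly; the
-- real function in the ticket may differ): loop n times, then allocate
-- one Just constructor in the base case.
f :: Int -> a -> Maybe a
f 0 s = Just s          -- the only allocation in the whole function
f n s = f (n - 1) s     -- tail call; ideally just decrement/test/branch

main :: IO ()
main = print (f 1000000 (42 :: Int))
```

Only the base case allocates, which is why a heap check on every iteration of the loop is pure overhead.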
> It was suggested in an earlier discussion[2] that we could get rid of
> this overhead by pushing the heap check down into the branch that
> actually does the allocation. It was noted that this is not a safe
> transformation in general because the heap check can only be done at a
> "safe point" for the GC.
> Interestingly, the LLVM backend gets good code by moving the heap check
> in the opposite direction. It pulls the heap check up out of the loop
> that it generates for the tail-recursive call. It ends up with the
> three-instruction loop that you would get if you were to write the loop
> by hand (decrement, test, branch) and does the allocation in the
> fall-through case of the loop.
Yes, LLVM is pretty clever :)
> The LLVM-generated code shows the advantage of having powerful low-level
> transformations in GHC. If LLVM is invoked with -O instead of -O2, it
> still transforms the recursive call to a loop, but does not hoist the
> heap check out of the loop. The assembly code from the two back ends is
> pasted at the end of the message for those interested.
> Looking into this issue brings me to ask a few questions. Any feedback
> is welcome.
> 1) Does it still make sense to pursue these optimizations in the native
> code generator?
Well, this transformation is something that would be done earlier than
the native code generator, probably as part of the STG->C-- phase or in
one of the phases that follows it. It's important that we still do
transformations like these in GHC, even if in some cases LLVM can figure
out how to get the same results on its own, because in general we have
more information than LLVM and can make more informed decisions. A good
example of this is the fact that in GHC we know that stack and heap
addresses cannot alias, whereas LLVM cannot know that (is there a way to
tell it?).
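(As an aside on that question: LLVM does let one assert non-aliasing per pointer argument via the noalias parameter attribute; whether that is expressive enough for the Sp/Hp invariant here is unclear. A purely hypothetical fragment:)

```llvm
; Hypothetical sketch: if the stack and heap pointers were passed as
; distinct noalias arguments, LLVM could assume that stores through %sp
; never clobber loads through %hp, and vice versa.
define void @f(i64* noalias %sp, i64* noalias %hp) {
  ...
}
```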
> Given the strong optimizations already in LLVM, is it worth investing
> time in the GHC native backend?
It's probably not worth investing a lot of time in optimisations that
affect only the native backend, no.
However, most of the work happening in the "new back end" in GHC is
orthogonal to this, as it is essentially concerned with replacing the
single monolithic STG->C-- pass with a series of smaller passes and a
nicer C-- datatype. The LLVM backend will be able to take advantage of
this too.
> One real advantage I see with the native back end (and an area I would
> like to explore) is that it is easier to pass information from the
> high-level optimizer to the low-level optimizer when it all resides in
> the same compiler. This may allow certain optimizations to fire that
> would be very difficult to prove safe in the LLVM back end.
Quite.
> 2) What is the definition of a safe point for the GC?
Safe points are constructed, rather than identified. To make a safe
point, everything live has to be saved on the stack, and there needs to
be a return point with an info table describing the stack frame layout.
I believe the new code generator makes this more convenient to arrange
than it was previously.
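To make that concrete, the heap check currently emitted on entry corresponds to C-- along these lines (a hand-written sketch matching the native-backend assembly at the end of the message; the names are illustrative, not actual generated code):

```cmm
f_entry:
    Hp = Hp + 16;              // bump for the worst-case allocation in f
    if (Hp > HpLim) goto gc;   // the per-iteration check we want to avoid
    ...                        // loop body; base case fills in the Just cell
gc:
    HpAlloc = 16;              // tell the RTS how much we tried to allocate
    R1 = f_closure;            // safe point: everything live is reachable from
    jump stg_gc_fun;           //   the closure/stack, described by info tables
```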
> My understanding is that currently all heap checks are done on entry to
> a closure, but this could be relaxed as long as we can provide a safe
> continuation point where we can resume execution after GC. One idea that
> could help in the example above would be to outline the allocation
> points into separate functions at the cmm level so that f no longer
> allocates any data and the check is only performed in the new outlined
> function in the base case. It seems to me that there could be some
> interesting optimization opportunities for code motion of heap checks.
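Sketching that outlining idea in the same style (hypothetical C--; the point is that the loop itself becomes check-free, and the only heap check, and hence the only safe point, lives in the outlined allocator):

```cmm
f_entry:                        // f itself no longer allocates...
    if (R2 == 0) jump f_alloc;  // base case: defer to the outlined function
    R2 = R2 - 1;
    jump f_entry;               // ...so the loop is just decrement/test/branch

f_alloc:                        // outlined allocation point
    Hp = Hp + 16;
    if (Hp > HpLim) goto gc;    // the only heap check / GC safe point left
    ...                         // write the Just cell and return
gc:
    HpAlloc = 16;
    jump stg_gc_fun;            // illustrative; real code must set up R1 etc.
```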
> 3) What is the status of the new code generator?
> I've seen a number of wiki pages describing the new code generation
> pipeline[3,4], but it is a little difficult to tell the exact status of
> the implementation. Are people still actively working on the new code
> generator or has most of the work been completed?
It's still under development. As I understand it, the current code
works, for some value of "works" that includes running most of the
testsuite at least the normal way, but there are plans afoot to do some
refactoring. Those involved can hopefully elaborate (Simon, John, Norman).
Cheers,
Simon
> Thanks!
> -Dave
>
> [1] http://hackage.haskell.org/trac/ghc/ticket/1498
> [2] http://www.haskell.org/pipermail/cvs-ghc/2007-July/036496.html
> [3] http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGen
> [4] http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline
> #
> # LLVM ASSEMBLY (x86_64)
> #
> Main_zdwf_entry:                        # @Main_zdwf_entry
> # BB#0:                                 # %cVo
>         subq    $8, %rsp
>         leaq    16(%r12), %rax
>         cmpq    144(%r13), %rax
>         ja      .LBB1_3
>         jmp     .LBB1_1
>         .align  16
> .LBB1_4:                                # %cVG
>                                         # in Loop: Header=BB1_1 Depth=1
>         decq    %r14
> .LBB1_1:                                # %nVA
>                                         # =>This Inner Loop Header: Depth=1
>         testq   %r14, %r14
>         jne     .LBB1_4
> # BB#2:                                 # %nVH
>         movq    $base_DataziMaybe_Just_con_info, 8(%r12)
>         movq    %rsi, 16(%r12)
>         movq    %r12, %rbx
>         movq    %rax, %r12
>         movq    (%rbp), %rcx
>         addq    $10, %rbx
>         movq    (%rcx), %r11
>         addq    $8, %rsp
>         jmpq    *%r11                   # TAILCALL
> .LBB1_3:                                # %cVz
>         movq    $16, 184(%r13)
>         movl    $Main_zdwf_closure, %ebx
>         movq    %rax, %r12
>         movq    -8(%r13), %r11
>         addq    $8, %rsp
>         jmpq    *%r11                   # TAILCALL
> #
> # NATIVE BACK-END ASSEMBLY (x86_64)
> #
> Main_zdwf_info:
> .LcTh:
>         addq    $16,%r12
>         cmpq    144(%r13),%r12
>         ja      .LcTk
>         movq    %r14,%rax
>         testq   %rax,%rax
>         jne     .LcTn
>         movq    $base_DataziMaybe_Just_con_info,-8(%r12)
>         movq    %rsi,(%r12)
>         leaq    -6(%r12),%rbx
>         jmp     *(%rbp)
> .LcTk:
>         movq    $16,184(%r13)
>         movl    $Main_zdwf_closure,%ebx
>         jmp     *-8(%r13)
> .LcTn:
>         leaq    -1(%rax),%r14
>         addq    $-16,%r12
>         jmp     Main_zdwf_info
_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc