Hi guys,

Looks like there's a possible fix on that bug already.

This is pretty interesting discovery, and from the llvm-dev thread it sounds 
like those guys are aware of this but probably snowed under. I wonder how much 
mileage there might be in attempting to collar one of the devs for a chat (eg. 
irc, skype or whatever)? I wouldn't be surprised if they had a load of useful 
info to divulge, and could probably tell you whether tailcallopt is essential 
or not. 

Anyway, just a thought.

Cheers,
Sam

-----Original Message-----
From: [email protected] on behalf of David Terei
Sent: Mon 05/04/2010 15:03
To: Max Bolingbroke
Cc: GHC Development Mailing List
Subject: Re: Stack manipulation in LLVM backend
 
Yeah I had noticed this as well recently when I tried out the sibling
call optimisation improvements in LLVM.

The interesting thing is, can we get away with removing the
'-tailcallopt' flag and just hoping for the best? It makes me a little
worried that the tail calls aren't guaranteed but for the code GHC
generates LLVM really shouldn't have any troubles picking them up with
the plain sibling call optimisation.

I ran the testsuite (fast) against the LLVM back-end on x86-32 today
without the '-tailcallopt' argument to llc and had everything pass
fine. I also ran nofib a couple of times and got a nice performance
improvement of 2%. I'll try some more tests and on a different
platform or two as well but its probably worth while getting rid of
the '-tailcallopt' argument for now.

Thanks for filing the bug report, will be interesting to see if there
is much of a response.

NoFib Results

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
           anna          +2.0%     +0.0%      0.12      0.13
           ansi          +0.0%     +0.0%      0.00      0.00
           atom          +0.0%     +0.0%     +8.6%    +10.0%
         awards          +0.0%     +0.0%      0.00      0.00
         banner          +0.0%     +0.0%      0.00      0.00
     bernouilli          +0.0%     +0.0%     +0.5%     +0.5%
          boyer          +0.0%     +0.0%      0.04      0.05
         boyer2          +0.1%     +0.0%      0.01      0.01
           bspt          +0.5%     +0.0%      0.01      0.01
      cacheprof          +0.6%     +0.0%     +2.6%     +2.9%
       calendar          +0.0%     +0.0%      0.00      0.00
       cichelli          +0.1%     +0.0%      0.10      0.10
        circsim          +0.2%     +0.0%     +0.7%     +0.2%
       clausify          +0.0%     +0.0%      0.05      0.05
  comp_lab_zift          +0.1%     +0.0%     +0.0%     +0.9%
       compress          +0.1%     +0.0%      0.20      0.21
      compress2          +0.1%     +0.0%      0.16      0.18
    constraints          +0.1%     +0.0%     -1.4%     +1.5%
   cryptarithm1          +0.0%     +0.0%     +5.1%     +1.3%
   cryptarithm2          +0.1%     +0.0%      0.02      0.01
            cse          +0.1%     +0.0%      0.00      0.00
          eliza          +0.0%     +0.0%      0.00      0.00
          event          +0.0%     +0.0%      0.16      0.18
         exp3_8          +0.0%     +0.0%      0.11      0.12
         expert          +0.1%     +0.0%      0.00      0.00
            fem          +0.5%     -0.0%      0.02      0.03
            fft          +0.1%     +0.0%      0.03      0.03
           fft2          +0.1%     +0.0%      0.08      0.09
       fibheaps          +0.1%     +0.0%      0.03      0.04
           fish          +0.1%     +0.0%      0.02      0.02
          fluid          +0.8%     +0.0%      0.01      0.01
         fulsom          +0.6%     +0.0%     +2.3%     +0.7%
         gamteb          +0.2%     +0.0%      0.10      0.10
            gcd          +0.0%     +0.0%      0.03      0.03
    gen_regexps          +0.0%     +0.0%      0.00      0.00
         genfft          +0.0%     +0.0%      0.04      0.04
             gg          +0.6%     +0.0%      0.01      0.01
           grep          +0.2%     +0.0%      0.00      0.00
         hidden          +0.4%     -0.0%     +1.3%     -2.5%
            hpg          +0.4%     +0.0%      0.17     +2.7%
            ida          +0.1%     +0.0%      0.11      0.12
          infer          +0.4%     +0.0%      0.06      0.06
        integer          +0.0%     +0.0%     +3.0%     +4.2%
      integrate          +0.0%     +0.0%     -1.4%     -3.0%
        knights          +0.2%     +0.0%      0.00      0.01
           lcss          +0.0%     +0.0%     +3.8%     +1.1%
           life          +0.0%     +0.0%     +4.9%     +4.0%
           lift          +0.3%     +0.0%      0.00      0.00
      listcompr          +0.0%     +0.0%      0.11      0.11
       listcopy          +0.1%     +0.0%      0.11      0.12
       maillist          +0.0%     -0.0%      0.06      0.10
         mandel          +0.1%     -0.0%      0.13      0.13
        mandel2          +0.0%     +0.0%      0.01      0.01
        minimax          +0.1%     +0.0%      0.00      0.00
        mkhprog          +0.1%     +0.0%      0.00      0.00
     multiplier          +0.2%     +0.0%      0.13      0.14
       nucleic2          +0.2%     +0.0%      0.09      0.09
           para          +0.2%     +0.0%     +0.0%     -0.9%
      paraffins          +0.1%     +0.0%      0.10      0.11
         parser          +0.5%     +0.0%      0.04      0.04
        parstof          +0.3%     +0.0%      0.00      0.01
            pic          +0.2%     +0.0%      0.01      0.01
          power          +0.2%     +0.0%     +0.8%     +3.5%
         pretty          +0.1%     +0.0%      0.00      0.00
         primes          +0.0%     +0.0%      0.05      0.06
      primetest          +0.1%     +0.0%     -2.4%     -3.4%
         prolog          +0.2%     +0.0%      0.00      0.00
         puzzle          +0.1%     +0.0%      0.20     +5.9%
         queens          +0.0%     +0.0%      0.02      0.02
        reptile          +0.3%     +0.0%      0.02      0.02
        rewrite          +0.2%     +0.0%      0.02      0.02
           rfib          +0.0%     +0.0%      0.04      0.04
            rsa          +0.0%     +0.0%      0.09      0.09
            scc          +0.0%     +0.0%      0.00      0.00
          sched          +0.0%     +0.0%      0.03      0.03
            scs          +0.5%     -0.0%     -1.3%     -2.0%
         simple          +0.8%     +0.0%     +3.8%     +0.0%
          solid          +0.1%     +0.0%      0.18    -10.6%
        sorting          +0.0%     +0.0%      0.00      0.00
         sphere          +0.3%     +0.0%      0.12      0.13
         symalg          +0.4%     +0.0%      0.04      0.04
            tak          +0.0%     +0.0%      0.02      0.02
      transform          +0.4%     +0.0%     +4.3%     +0.5%
       treejoin          +0.0%     -0.0%     +1.7%    -12.3%
      typecheck          +0.1%     +0.0%     +1.6%     +0.7%
        veritas          +1.6%     +0.0%      0.00      0.00
           wang          +0.1%     +0.0%      0.10      0.11
      wave4main          +0.1%     +0.0%     +2.9%     +2.8%
   wheel-sieve1          +0.0%     +0.0%    +10.1%     +9.9%
   wheel-sieve2          +0.0%     +0.0%      0.17     -2.9%
           x2n1          +0.0%     +0.0%      0.02      0.03
--------------------------------------------------------------------------------
            Min          +0.0%     -0.0%     -2.4%    -12.3%
            Max          +2.0%     +0.0%    +10.1%    +10.0%
 Geometric Mean          +0.2%     -0.0%     +2.2%     +0.5%

On 4 April 2010 21:52, Max Bolingbroke <[email protected]> wrote:
> I've filed this as bug 6774 on the LLVM trac
> (http://llvm.org/bugs/show_bug.cgi?id=6774), so the issue at least
> stands a chance of being fixed.
>
> On 2 April 2010 16:02, Max Bolingbroke <[email protected]> wrote:
>> Hi,
>>
>> I've narrowed down one of the problems with the LLVM backend that
>> David descried in his thesis. LLVM sometimes generates gratuitous
>> extra stack manipulations.
>>
>> For example, consider this input program:
>>
>> {{{
>> module Toy where
>>
>> toy :: (Int -> Maybe Int) -> Int
>> toy f = case f 10 of Just x -> x; Nothing -> 20
>> }}}
>>
>> Compiled using ghc -O (which runs llvl's "opt -O2"), we get the
>> following LLVM IR:
>>
>> {{{
>> define cc10 void @Toy_toy_entry(i32 %stg_terei_baseArg, i32
>> %stg_terei_spArg, i32 %stg_terei_hpArg, i32 %stg_terei_r1Arg) nounwind
>> {
>> cdx:
>>  %ndz = add i32 %stg_terei_spArg, -4             ; <i32> [#uses=3]
>>  %ndB = add i32 %stg_terei_baseArg, 84           ; <i32> [#uses=1]
>>  %ndC = inttoptr i32 %ndB to i32*                ; <i32*> [#uses=1]
>>  %ndD = load i32* %ndC                           ; <i32> [#uses=1]
>>  %ndE = icmp ult i32 %ndz, %ndD                  ; <i1> [#uses=1]
>>  br i1 %ndE, label %cdG, label %ndH
>>
>> ndH:                                              ; preds = %cdx
>>  %ndJ = inttoptr i32 %stg_terei_spArg to i32*    ; <i32*> [#uses=2]
>>  %ndK = load i32* %ndJ                           ; <i32> [#uses=1]
>>  %ndP = inttoptr i32 %ndz to i32*                ; <i32*> [#uses=1]
>>  store i32 add (i32 ptrtoint (%Toy_toy1_closure_struct*
>> @Toy_toy2_closure to i32), i32 1), i32* %ndP
>>  store i32 ptrtoint (%sbo_info_struct* @sbo_info to i32), i32* %ndJ
>>  tail call cc10 void @stg_ap_p_fast(i32 %stg_terei_baseArg, i32 %ndz,
>> i32 %stg_terei_hpArg, i32 %ndK) nounwind
>>  ret void
>>
>> cdG:                                              ; preds = %cdx
>>  %ne1 = add i32 %stg_terei_baseArg, -4           ; <i32> [#uses=1]
>>  %ne2 = inttoptr i32 %ne1 to i32*                ; <i32*> [#uses=1]
>>  %ne3 = load i32* %ne2                           ; <i32> [#uses=1]
>>  %ne4 = inttoptr i32 %ne3 to void (i32, i32, i32, i32)* ; <void (i32,
>> i32, i32, i32)*> [#uses=1]
>>  tail call cc10 void %ne4(i32 %stg_terei_baseArg, i32
>> %stg_terei_spArg, i32 %stg_terei_hpArg, i32 ptrtoint
>> (%Toy_toy_closure_struct* @Toy_toy_closure to i32)) nounwind
>>  ret void
>> }
>> }}}
>>
>> So far so good. However, when we compile this with "llc -tailcallopt
>> -O2 -march=x86", we get:
>>
>> {{{
>> _Toy_toy_entry:                         ## @Toy_toy_entry
>> ## BB#0:                                ## %cdx
>>        subl    $12, %esp
>>        leal    -4(%ebp), %eax
>>        cmpl    84(%ebx), %eax
>>        jb      LBB2_2
>> ## BB#1:                                ## %ndH
>>        movl    (%ebp), %esi
>>        movl    $_Toy_toy2_closure+1, -4(%ebp)
>>        movl    $_sbo_info, (%ebp)
>>        movl    %eax, %ebp
>>        addl    $12, %esp
>>        jmp     _stg_ap_p_fast  # TAILCALL
>> LBB2_2:                                 ## %cdG
>>        movl    -4(%ebx), %eax
>>        movl    $_Toy_toy_closure, %esi
>>        addl    $12, %esp
>>        jmpl    *%eax  # TAILCALL
>>
>>        .globl  ___stginit_Toy_
>>        .align  4, 0x90
>> }}}
>>
>> Note the "subl  $12, %esp" and "addl    $12, %esp" pairs. %esp is never
>> actually used, so this fiddling is entirely pointless! This seems to
>> happen in every single function LLVM compiles, so I imagine fixing the
>> problem would be quite a big win.
>>
>> Without -tailcallopt, we get nice code:
>>
>> {{{
>> _Toy_toy_entry:                         ## @Toy_toy_entry
>> ## BB#0:                                ## %cdx
>>        leal    -4(%ebp), %eax
>>        cmpl    84(%ebx), %eax
>>        jb      LBB2_2
>> ## BB#1:                                ## %ndH
>>        movl    (%ebp), %esi
>>        movl    $_Toy_toy2_closure+1, -4(%ebp)
>>        movl    $_sbo_info, (%ebp)
>>        movl    %eax, %ebp
>>        jmp     _stg_ap_p_fast  # TAILCALL
>> LBB2_2:                                 ## %cdG
>>        movl    -4(%ebx), %eax
>>        movl    $_Toy_toy_closure, %esi
>>        jmpl    *%eax  # TAILCALL
>> }}}
>>
>> Unfortunately -tailcallopt is apparently the only way to get
>> *guaranteed* tail calls. The LLVM developers appear to know that
>> tailcallopt causes this sort of rubbish in the output
>> (http://old.nabble.com/Removing--tailcallopt--td27475523.html) but
>> obviously haven't fixed it (my results are with the very recent LLVM
>> r100183).
>>
>> I couldn't get the stack manipulations to go away with any amount of
>> "noreturn" and "unreachable" annotations around "tail call"s and
>> function definitions. It was worth a try though :-)
>>
>> In summary, it looks like anyone wanting to fix the excess stack
>> manipulations issue is going to have to get their hands messy and
>> delve into LLVM's LLC!
>>
>> Cheers,
>> Max
>>
>
> _______________________________________________
> Cvs-ghc mailing list
> [email protected]
> http://www.haskell.org/mailman/listinfo/cvs-ghc
>

_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc

_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc

Reply via email to