On 22/02/2010 16:49, Simon Marlow wrote:
On 22/02/2010 12:34, Simon Marlow wrote:

I'm currently running some benchmarks to see how much impact turning off
TNTC has on the -fasm backend.

Here are the results on x86-64/Linux:
[ snip ]
--------------------------------------------------------------------------------

Min            +4.7% -0.0%  -0.6%  -1.7%
Max            +8.9% +0.0% +16.9% +13.8%
Geometric Mean +6.1% -0.0%  +4.9%  +4.2%

and here are the results on x86/Linux:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
           anna          +6.9%     +0.0%     +7.1%     +7.4%
           ansi          +4.3%     +0.0%      0.00      0.00
           atom          +4.5%     +0.0%    +23.6%    +21.7%
         awards          +4.2%     +0.0%      0.00      0.00
         banner          +3.5%     +0.0%      0.00      0.00
     bernouilli          +4.2%     +0.0%     +2.7%     +1.8%
          boyer          +4.3%     +0.0%      0.10      0.11
         boyer2          +4.1%     +0.0%      0.01      0.02
           bspt          +5.5%     +0.0%      0.02      0.02
      cacheprof          +5.3%     +0.0%     +3.1%     +3.0%
       calendar          +4.2%     +0.0%      0.00      0.00
       cichelli          +4.2%     +0.0%      0.19      0.22
        circsim          +4.6%     +0.0%     +3.3%     +2.5%
       clausify          +4.3%     +0.0%      0.07      0.09
  comp_lab_zift          +4.5%     +0.0%    +15.3%    +14.4%
       compress          +4.4%     +0.0%     +4.1%     +4.3%
      compress2          +4.3%     +0.0%     +0.5%     +0.4%
    constraints          +4.5%     +0.0%     +6.4%     +5.9%
   cryptarithm1          +3.8%     +0.0%     +5.3%     +3.3%
   cryptarithm2          +4.0%     +0.0%      0.03      0.03
            cse          +3.9%     +0.0%      0.00      0.00
          eliza          +3.6%     +0.0%      0.00      0.00
          event          +4.3%     +0.0%     +7.9%     +7.5%
         exp3_8          +4.2%     +0.0%    +17.8%    +13.3%
         expert          +4.1%     +0.0%      0.00      0.00
            fem          +5.5%     +0.0%      0.06      0.06
            fft          +4.6%     +0.0%      0.09      0.10
           fft2          +4.9%     +0.0%      0.22    +12.3%
       fibheaps          +4.3%     +0.0%      0.08      0.08
           fish          +4.0%     +0.0%      0.05      0.06
          fluid          +6.3%     +0.0%      0.02      0.02
         fulsom          +6.1%     +0.0%     +3.4%     +3.2%
         gamteb          +5.0%     +0.0%      0.19      0.21
            gcd          +4.2%     +0.0%      0.06      0.07
    gen_regexps          +4.0%     +0.0%      0.00      0.00
         genfft          +4.2%     +0.0%      0.09      0.10
             gg          +5.1%     +0.0%      0.03      0.03
           grep          +4.5%     +0.0%      0.00      0.00
         hidden          +5.7%  (stdout)  (stdout)  (stdout)
            hpg          +5.2%     +0.0%     +6.1%     +2.0%
            ida          +4.4%     +0.0%    +10.2%     +6.6%
          infer          +4.9%     +0.0%      0.13      0.14
        integer          +4.2%     +0.0%     +1.2%     -0.2%
      integrate          +4.6%     +0.0%     +4.9%     +5.0%
        knights          +4.6%     +0.0%      0.01      0.01
           lcss          +4.2%     +0.0%     +8.5%     +7.7%
           life          +3.8%     +0.0%    +23.8%    +19.5%
           lift          +4.5%     +0.0%      0.00      0.00
      listcompr          +3.8%     +0.0%     +5.3%     +4.7%
       listcopy          +3.8%     +0.0%     +5.7%     +6.3%
       maillist          +4.0%     +0.0%      0.15     +6.1%
         mandel          +4.5%     +0.0%     -0.6%     -2.4%
        mandel2          +3.9%     +0.0%      0.02      0.02
        minimax          +4.2%     +0.0%      0.01      0.01
        mkhprog          +4.2%     +0.0%      0.00      0.01
     multiplier          +4.4%     +0.0%    +10.0%    +10.6%
       nucleic2          +4.6%     +0.0%    +16.8%    +15.0%
           para          +4.4%     +0.0%    +11.7%     +9.7%
      paraffins          +4.3%     +0.0%     -1.9%     +0.8%
         parser          +5.0%     +0.0%      0.08      0.08
        parstof          +4.8%     +0.0%      0.02      0.02
            pic          +5.0%     +0.0%      0.03      0.03
          power          +4.4%     +0.0%     +2.7%     +2.7%
         pretty          +4.4%     +0.0%      0.00      0.00
         primes          +4.2%     +0.0%      0.12      0.13
      primetest          +4.3%     +0.0%     -0.9%     +0.5%
         prolog          +4.2%     +0.0%      0.00      0.00
         puzzle          +4.1%     +0.0%     +8.7%     +7.8%
         queens          +4.2%     +0.0%      0.03      0.03
        reptile          +5.1%     +0.0%      0.03      0.04
        rewrite          +4.6%     +0.0%      0.02      0.03
           rfib          +4.5%     +0.0%      0.12      0.12
            rsa          +4.3%     +0.0%      0.17      0.18
            scc          +3.7%     +0.0%      0.00      0.00
          sched          +4.3%     +0.0%      0.05      0.05
            scs          +5.7%     +0.0%     +2.3%     +1.3%
         simple          +6.8%     +0.0%     +5.6%     +5.8%
          solid          +4.5%     +0.0%    +11.1%     +6.6%
        sorting          +4.0%     +0.0%      0.00      0.00
         sphere          +5.3%     +0.0%    +17.2%    +12.9%
         symalg          +5.3%     +0.0%      0.10      0.10
            tak          +4.2%     +0.0%      0.02      0.02
      transform          +4.9%     +0.0%     +2.2%     +2.1%
       treejoin          +3.7%     +0.0%     -0.4%     +2.7%
      typecheck          +4.3%     +0.0%    -23.8%    -24.1%
        veritas          +6.5%     +0.0%      0.00      0.00
           wang          +4.6%     +0.0%     +8.0%     +7.7%
      wave4main          +4.4%     +0.0%     +5.2%     +5.3%
   wheel-sieve1          +4.2%     +0.0%    +10.0%     +8.8%
   wheel-sieve2          +4.2%     +0.0%     +2.1%     +2.2%
           x2n1          +4.6%     +0.0%      0.06      0.06
--------------------------------------------------------------------------------
            Min          +3.5%     +0.0%    -23.8%    -24.1%
            Max          +6.9%     +0.0%    +23.8%    +21.7%
 Geometric Mean          +4.5%     -0.0%     +6.0%     +5.3%

Slightly worse than the x86_64 results, though this is an older processor.

The result for typecheck is very odd. It's repeatable, but only on this machine - I suspect a bad cache interaction or similar. I should probably re-run the tests on a machine with a more recent processor.
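
As an aside on reading the summary rows, here is a minimal sketch of a geometric mean over percentage changes. It assumes the mean is taken over the ratios (1 + p/100), and that rows showing an absolute time instead of a percentage are simply left out. geoMeanPercent is a made-up name for illustration, not a function from the benchmark tooling.

```haskell
-- | Geometric mean of a non-empty list of percentage changes.
-- Assumption: the mean is taken over the ratios (1 + p/100) and then
-- converted back to a percentage.
geoMeanPercent :: [Double] -> Double
geoMeanPercent ps = (exp meanLog - 1) * 100
  where
    ratios  = [ 1 + p / 100 | p <- ps ]
    -- Averaging the logs and exponentiating gives the n-th root of
    -- the product of the ratios.
    meanLog = sum (map log ratios) / fromIntegral (length ratios)
```

For example, geoMeanPercent [6.9, 4.3, 4.5] (the Size figures for anna, ansi and atom above) comes to roughly +5.2%.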

While I was at it, I measured the -fvia-C backend against the NCG:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
            Min          -7.7%    -47.3%    -33.0%    -30.2%
            Max          -3.8%     +0.0%    +29.5%    +28.8%
 Geometric Mean          -5.0%     -0.7%     -6.7%     -5.7%

While we weren't looking, the via-C backend has regressed a lot, at least on these "average" Haskell programs. The +29.5% outlier is typecheck again, since I'm using the same set of results for -fasm as above.

I think the main reason for the regression is code like this:

        movl    $stg_ap_n_fast, %eax
.L3:
        jmp     *%eax
.L2:
        movl    $8, 112(%edx)
        movl    -8(%ebx), %eax
        jmp .L3

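gcc is being too clever in commoning up the indirect jump: both paths share the single jmp *%eax at .L3, so the .L2 path has to take an extra direct jump to get there.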

Conclusion: don't use -fvia-C, even in 6.12, unless you are sure it speeds things up. I'm turning it off for our builds.

===============

So here's a crazy idea. Why don't we post-process the assembly code coming out of LLVM? Before you throw up your hands in horror, consider that

 - it's a simple transformation, just re-ordering blocks of code

 - we can do it in Haskell using ByteStrings; it would probably
   amount to a couple of hundred lines of code at the most.  Perhaps
   an Alex lexer would be the quickest way to split into blocks, then
   a bit of Haskell to glue them back into the correct order.  We may
   have to fiddle with the .aligns a bit.

 - we don't care too much about compile-time performance, since LLVM is
   a -O2 thing; we have the NCG for generating code fast

 - at the same time we can talk with the LLVM folks about adding
   support for TNTC, but we'd have a way to generate code in the
   meantime.
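
To make the block re-ordering concrete, here is a rough sketch of what the post-processor might look like. Everything in it is an assumption for illustration: it supposes that each top-level label sits on its own unindented line, that local labels start with ".L", and that an info table for code labelled foo_info is emitted under the hypothetical label foo_info_tbl. It ignores section directives and .align entirely, both of which a real version would have to deal with.

```haskell
import qualified Data.ByteString.Char8 as B
import           Data.List (partition)
import qualified Data.Map.Strict as Map

-- | A top-level label (without the trailing ':') and the lines that
-- follow it up to the next top-level label.
data Block = Block { blockLabel :: B.ByteString, blockBody :: [B.ByteString] }

-- | Hypothetical suffix marking an info-table block.
tableSuffix :: B.ByteString
tableSuffix = B.pack "_tbl"

-- | Assumption: a top-level label is an unindented line ending in ':'
-- that is not a local ".L" label. Trailing comments are not handled.
isLabel :: B.ByteString -> Bool
isLabel l = not (B.null l)
         && B.last l == ':'
         && B.head l `notElem` " \t"
         && not (B.pack ".L" `B.isPrefixOf` l)

-- | Split into the lines before the first label and the labelled blocks.
splitBlocks :: [B.ByteString] -> ([B.ByteString], [Block])
splitBlocks ls = (preamble, go rest)
  where
    (preamble, rest) = break isLabel ls
    go []       = []
    go (l : xs) = let (body, more) = break isLabel xs
                  in Block (B.init l) body : go more

-- | Move each table block so it sits directly before its code block.
-- Tables with no matching code block are kept at the end.
reorder :: [Block] -> [Block]
reorder blocks = concatMap place code ++ orphans
  where
    (tables, code) = partition (B.isSuffixOf tableSuffix . blockLabel) blocks
    tableMap       = Map.fromList [ (blockLabel t, t) | t <- tables ]
    codeLabels     = map blockLabel code
    place b = case Map.lookup (blockLabel b `B.append` tableSuffix) tableMap of
                Just t  -> [t, b]
                Nothing -> [b]
    orphans     = [ t | t <- tables, baseLabel t `notElem` codeLabels ]
    baseLabel t = B.take (B.length (blockLabel t) - B.length tableSuffix)
                         (blockLabel t)

-- | Glue the preamble and blocks back into assembly text.
render :: [B.ByteString] -> [Block] -> B.ByteString
render preamble blocks = B.unlines (preamble ++ concatMap blockLines blocks)
  where blockLines (Block l body) = (l `B.append` B.pack ":") : body

-- | The whole transformation on one assembly file.
reorderAsm :: B.ByteString -> B.ByteString
reorderAsm src = render preamble (reorder blocks)
  where (preamble, blocks) = splitBlocks (B.lines src)
```

Run as a filter, this would be main = B.interact reorderAsm. Splitting on lines with B.lines stands in for the Alex lexer mentioned above; a lexer would cope better with comments and directives that share a line with a label.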

Just a thought...

Cheers,
        Simon

_______________________________________________
Cvs-ghc mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/cvs-ghc
