On 22/02/2010 16:49, Simon Marlow wrote:
On 22/02/2010 12:34, Simon Marlow wrote:
I'm currently running some benchmarks to see how much impact turning off
TNTC has on the -fasm backend.
Here are the results on x86-64/Linux:
[ snip ]
--------------------------------------------------------------------------------
Min                +4.7%    -0.0%    -0.6%    -1.7%
Max                +8.9%    +0.0%   +16.9%   +13.8%
Geometric Mean     +6.1%    -0.0%    +4.9%    +4.2%
and here are the results on x86/Linux:
--------------------------------------------------------------------------------
Program             Size   Allocs  Runtime  Elapsed
--------------------------------------------------------------------------------
anna               +6.9%    +0.0%    +7.1%    +7.4%
ansi               +4.3%    +0.0%     0.00     0.00
atom               +4.5%    +0.0%   +23.6%   +21.7%
awards             +4.2%    +0.0%     0.00     0.00
banner             +3.5%    +0.0%     0.00     0.00
bernouilli         +4.2%    +0.0%    +2.7%    +1.8%
boyer              +4.3%    +0.0%     0.10     0.11
boyer2             +4.1%    +0.0%     0.01     0.02
bspt               +5.5%    +0.0%     0.02     0.02
cacheprof          +5.3%    +0.0%    +3.1%    +3.0%
calendar           +4.2%    +0.0%     0.00     0.00
cichelli           +4.2%    +0.0%     0.19     0.22
circsim            +4.6%    +0.0%    +3.3%    +2.5%
clausify           +4.3%    +0.0%     0.07     0.09
comp_lab_zift      +4.5%    +0.0%   +15.3%   +14.4%
compress           +4.4%    +0.0%    +4.1%    +4.3%
compress2          +4.3%    +0.0%    +0.5%    +0.4%
constraints        +4.5%    +0.0%    +6.4%    +5.9%
cryptarithm1       +3.8%    +0.0%    +5.3%    +3.3%
cryptarithm2       +4.0%    +0.0%     0.03     0.03
cse                +3.9%    +0.0%     0.00     0.00
eliza              +3.6%    +0.0%     0.00     0.00
event              +4.3%    +0.0%    +7.9%    +7.5%
exp3_8             +4.2%    +0.0%   +17.8%   +13.3%
expert             +4.1%    +0.0%     0.00     0.00
fem                +5.5%    +0.0%     0.06     0.06
fft                +4.6%    +0.0%     0.09     0.10
fft2               +4.9%    +0.0%     0.22   +12.3%
fibheaps           +4.3%    +0.0%     0.08     0.08
fish               +4.0%    +0.0%     0.05     0.06
fluid              +6.3%    +0.0%     0.02     0.02
fulsom             +6.1%    +0.0%    +3.4%    +3.2%
gamteb             +5.0%    +0.0%     0.19     0.21
gcd                +4.2%    +0.0%     0.06     0.07
gen_regexps        +4.0%    +0.0%     0.00     0.00
genfft             +4.2%    +0.0%     0.09     0.10
gg                 +5.1%    +0.0%     0.03     0.03
grep               +4.5%    +0.0%     0.00     0.00
hidden             +5.7% (stdout) (stdout) (stdout)
hpg                +5.2%    +0.0%    +6.1%    +2.0%
ida                +4.4%    +0.0%   +10.2%    +6.6%
infer              +4.9%    +0.0%     0.13     0.14
integer            +4.2%    +0.0%    +1.2%    -0.2%
integrate          +4.6%    +0.0%    +4.9%    +5.0%
knights            +4.6%    +0.0%     0.01     0.01
lcss               +4.2%    +0.0%    +8.5%    +7.7%
life               +3.8%    +0.0%   +23.8%   +19.5%
lift               +4.5%    +0.0%     0.00     0.00
listcompr          +3.8%    +0.0%    +5.3%    +4.7%
listcopy           +3.8%    +0.0%    +5.7%    +6.3%
maillist           +4.0%    +0.0%     0.15    +6.1%
mandel             +4.5%    +0.0%    -0.6%    -2.4%
mandel2            +3.9%    +0.0%     0.02     0.02
minimax            +4.2%    +0.0%     0.01     0.01
mkhprog            +4.2%    +0.0%     0.00     0.01
multiplier         +4.4%    +0.0%   +10.0%   +10.6%
nucleic2           +4.6%    +0.0%   +16.8%   +15.0%
para               +4.4%    +0.0%   +11.7%    +9.7%
paraffins          +4.3%    +0.0%    -1.9%    +0.8%
parser             +5.0%    +0.0%     0.08     0.08
parstof            +4.8%    +0.0%     0.02     0.02
pic                +5.0%    +0.0%     0.03     0.03
power              +4.4%    +0.0%    +2.7%    +2.7%
pretty             +4.4%    +0.0%     0.00     0.00
primes             +4.2%    +0.0%     0.12     0.13
primetest          +4.3%    +0.0%    -0.9%    +0.5%
prolog             +4.2%    +0.0%     0.00     0.00
puzzle             +4.1%    +0.0%    +8.7%    +7.8%
queens             +4.2%    +0.0%     0.03     0.03
reptile            +5.1%    +0.0%     0.03     0.04
rewrite            +4.6%    +0.0%     0.02     0.03
rfib               +4.5%    +0.0%     0.12     0.12
rsa                +4.3%    +0.0%     0.17     0.18
scc                +3.7%    +0.0%     0.00     0.00
sched              +4.3%    +0.0%     0.05     0.05
scs                +5.7%    +0.0%    +2.3%    +1.3%
simple             +6.8%    +0.0%    +5.6%    +5.8%
solid              +4.5%    +0.0%   +11.1%    +6.6%
sorting            +4.0%    +0.0%     0.00     0.00
sphere             +5.3%    +0.0%   +17.2%   +12.9%
symalg             +5.3%    +0.0%     0.10     0.10
tak                +4.2%    +0.0%     0.02     0.02
transform          +4.9%    +0.0%    +2.2%    +2.1%
treejoin           +3.7%    +0.0%    -0.4%    +2.7%
typecheck          +4.3%    +0.0%   -23.8%   -24.1%
veritas            +6.5%    +0.0%     0.00     0.00
wang               +4.6%    +0.0%    +8.0%    +7.7%
wave4main          +4.4%    +0.0%    +5.2%    +5.3%
wheel-sieve1       +4.2%    +0.0%   +10.0%    +8.8%
wheel-sieve2       +4.2%    +0.0%    +2.1%    +2.2%
x2n1               +4.6%    +0.0%     0.06     0.06
--------------------------------------------------------------------------------
Min                +3.5%    +0.0%   -23.8%   -24.1%
Max                +6.9%    +0.0%   +23.8%   +21.7%
Geometric Mean     +4.5%    -0.0%    +6.0%    +5.3%
Slightly worse than the x86_64 results, though this is an older processor.
The result for typecheck is very odd. It's repeatable, but only on this
machine - I suspect a bad cache interaction or similar. I should
probably re-run the tests on a machine with a more recent processor.
While I was at it, I measured the -fvia-C backend against the NCG:
--------------------------------------------------------------------------------
Program             Size   Allocs  Runtime  Elapsed
--------------------------------------------------------------------------------
Min                -7.7%   -47.3%   -33.0%   -30.2%
Max                -3.8%    +0.0%   +29.5%   +28.8%
Geometric Mean     -5.0%    -0.7%    -6.7%    -5.7%
While we weren't looking, the via-C backend has regressed a lot, at
least on these "average" Haskell programs. The +29.5% outlier is
typecheck again, since I'm using the same set of results for -fasm as
above.
I think the main reason for the regression is code like this:
        movl $stg_ap_n_fast, %eax
    .L3:
        jmp *%eax
    .L2:
        movl $8, 112(%edx)
        movl -8(%ebx), %eax
        jmp .L3
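gcc is being too clever in commoning up the indirect jump: both paths
share the single "jmp *%eax" at .L3, so the .L2 path has to take an extra
direct jump just to reach it.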
Conclusion: don't use -fvia-C, even in 6.12, unless you are sure it
speeds things up. I'm turning it off for our builds.
===============
So here's a crazy idea. Why don't we post-process the assembly code
coming out of LLVM? Before you throw up your hands in horror, consider that
- it's a simple transformation, just re-ordering blocks of code
- we can do it in Haskell using ByteStrings; it would probably
  amount to a couple of hundred lines of code at the most. Perhaps
  an Alex lexer would be the quickest way to split into blocks, then
  a bit of Haskell to glue them back into the correct order. We may
  have to fiddle with the .aligns a bit.
- we don't care too much about compile-time performance, since LLVM is
  a -O2 thing and we have the NCG for generating code fast
- at the same time we can talk with the LLVM folks about adding
  support for TNTC, but we'd have a way to generate code in the
  meantime.
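
To make the block re-ordering idea concrete, here is a rough sketch of what such a post-processor might look like. It is only an illustration under stated assumptions, not a worked-out design: it assumes (hypothetically) that the info table for an entry label "foo_info" is emitted under a label "foo_info_tbl", it splits blocks with a naive line test rather than an Alex lexer, and it ignores section directives and .align handling entirely. The names Block, labelOf, splitBlocks, reorder and mangle are all made up for this sketch.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as B
import           Data.List  (partition)
import           Data.Maybe (isJust)

-- | One top-level label together with the lines that follow it.
data Block = Block
  { blockLabel :: B.ByteString
  , blockLines :: [B.ByteString]
  }

-- ASSUMPTION: hypothetical naming scheme. The table for entry code
-- "foo_info" is assumed to live under the label "foo_info_tbl".
tableSuffix :: B.ByteString
tableSuffix = B.pack "_tbl"

-- | A line starts a new block if it is unindented, does not start with
-- '.' (directives and local labels such as ".L3:"), and ends in ':'.
labelOf :: B.ByteString -> Maybe B.ByteString
labelOf l
  | B.null l               = Nothing
  | B.head l `elem` " \t." = Nothing
  | B.last l == ':'        = Just (B.init l)
  | otherwise              = Nothing

-- | Split into the lines before the first label and the labelled blocks.
splitBlocks :: [B.ByteString] -> ([B.ByteString], [Block])
splitBlocks ls = (preamble, go rest)
  where
    (preamble, rest) = break (isJust . labelOf) ls
    go [] = []
    go (l : more)
      | Just name <- labelOf l =
          let (body, next) = break (isJust . labelOf) more
          in  Block name (l : body) : go next
      | otherwise = go more

-- | Move every table block so it sits directly before its entry code.
-- Tables with no matching entry block are kept, appended at the end.
reorder :: [Block] -> [Block]
reorder blocks = concatMap place code ++ orphans
  where
    (tables, code) = partition isTable blocks
    isTable    = B.isSuffixOf tableSuffix . blockLabel
    tableFor b = [ t | t <- tables
                     , blockLabel t == blockLabel b `B.append` tableSuffix ]
    place b    = tableFor b ++ [b]
    entryOf t  = B.take (B.length (blockLabel t) - B.length tableSuffix)
                        (blockLabel t)
    orphans    = [ t | t <- tables
                     , entryOf t `notElem` map blockLabel code ]

-- | Whole-file transformation: split, re-order, glue back together.
mangle :: B.ByteString -> B.ByteString
mangle src = B.unlines (preamble ++ concatMap blockLines (reorder blocks))
  where
    (preamble, blocks) = splitBlocks (B.lines src)

-- | Tiny made-up input: the table for f_info appears after the code.
sample :: B.ByteString
sample = B.unlines (map B.pack
  [ "\t.text"
  , "f_info:"
  , "\tjmp *%eax"
  , "g_info:"
  , "\tret"
  , "f_info_tbl:"
  , "\t.long 0"
  ])

main :: IO ()
main = B.putStr (mangle sample)
```

On the made-up sample, the f_info_tbl block is moved so that it sits directly in front of f_info, and g_info is left where it was. A real version would also need to cope with the table and the code being emitted into different sections, which this sketch does not attempt.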
Just a thought...
Cheers,
Simon