I tested the greedy and PBQP register allocators with a GHC HEAD build
from today and LLVM v3.0svn (revision 139459). On nofib, both actually
seem to give worse results than linear scan.
Results of GHC's regular NCG vs LLVM:
NoFib Results
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
...
Geometric Mean +0.1% +0.0% +0.8% +0.8% -0.1%
GHC regular native code generator vs LLVM with greedy register
allocation, by saying:
$ make clean && make boot && make -k EXTRA_HC_OPTS="-fllvm -optlc \"-regalloc=greedy\" " >& ghc-llvm-greed
Results:
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
...
Geometric Mean +0.1% -0.0% +1.7% +1.7% -0.0%
GHC regular code generator vs LLVM PBQP, by saying:
$
Results:
--------------------------------------------------------------------------------
Program Size Allocs Runtime Elapsed TotalMem
--------------------------------------------------------------------------------
...
--------------------------------------------------------------------------------
...
Geometric Mean +1.3% +0.0% +12.5% +12.7% -0.0%
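For reading the "Geometric Mean" rows above, here is a minimal sketch of how such a summary can be computed. It assumes the row is the geometric mean of the per-program new/old ratios, reported as a percentage change. The function name and the sample numbers are made up for illustration; they are not taken from nofib-analyse or from the tables.

```python
import math

def geometric_mean_change(percent_changes):
    """Geometric mean of per-program changes, returned as a percentage.

    Each input is a percentage change such as +1.7 or -0.3. It is
    converted to a ratio (new / old) and averaged in log space.
    """
    if not percent_changes:
        raise ValueError("need at least one measurement")
    ratios = [1.0 + p / 100.0 for p in percent_changes]
    if any(r <= 0.0 for r in ratios):
        raise ValueError("ratios must be positive")
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(mean_log) - 1.0) * 100.0

# Hypothetical runtime changes for three programs (not real nofib data).
example = [10.0, -5.0, 2.0]
print(f"{geometric_mean_change(example):+.1f}%")
```

Averaging ratios in log space means a program that gets twice as slow and one that gets twice as fast cancel out, which a plain arithmetic mean of percentages would not do.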
The machine is a Core i5 running Linux (2.6.38). I haven't tested any
of this with Max's analysis plugin because it seems to trip a bug in
my Linux build of LLVM. I'll update to a more recent LLVM revision and
try it on my Mac OS X box; LLVM trunk is generally more stable there
anyway.
I also wonder if shuffling the order of optimisations in LLVM around
could bring any performance benefit. There may be a much better set of
default optimisations for GHC when combined with the alias analysis.
I'm reminded of Don Stewart using ACOVEA to try to find the optimal
set of flags to pass to the optimiser. Perhaps doing this over some
nofib benchmarks with different options could give some insight,
although it takes a while. All references to official ACOVEA
development seem to have mysteriously disappeared off the net, so I
don't know if it's maintained, but there are packages in Ubuntu...
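A rough sketch of what such a search could look like: a plain random search over pass orderings that keeps the best-scoring one. This is not ACOVEA, which uses a genetic algorithm. The pass list is just a handful of real LLVM `opt` pass names chosen for illustration, not the pipeline GHC actually uses. `measure` is a hypothetical callback that would build and time nofib with the given flags, and `fake_measure` is a stand-in so the sketch runs on its own.

```python
import random

# A few real LLVM `opt` pass flags, picked only for illustration;
# this is not the pipeline GHC actually uses.
CANDIDATE_PASSES = ["-mem2reg", "-instcombine", "-simplifycfg", "-gvn", "-licm"]

def random_search(measure, passes=CANDIDATE_PASSES, trials=20, seed=0):
    """Try `trials` random pass orderings and return (best_order, best_score).

    `measure` is a hypothetical callback: it takes a list of pass flags
    and returns a cost (e.g. nofib runtime in seconds); lower is better.
    """
    rng = random.Random(seed)
    best_order, best_score = None, float("inf")
    for _ in range(trials):
        order = list(passes)
        rng.shuffle(order)
        score = measure(order)
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score

def fake_measure(order):
    """Stand-in cost for demonstration: pretends -mem2reg should run early."""
    return float(order.index("-mem2reg"))

best, cost = random_search(fake_measure)
print(" ".join(best), cost)
```

In a real run, `measure` would presumably hand the flags to the LLVM optimiser through GHC's `-optlo` option and time a nofib run, which is where the "it takes a while" part comes in.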
On Sun, Oct 2, 2011 at 1:53 PM, David Terei <[email protected]> wrote:
> I've tried PBQP in the past. Not sure if I've tried with 2.9. In the
> past it had compilation problems and didn't really have much of an
> effect on the performance. It would be worthwhile trying again in 3.0
> and with the alias analysis pass.
>
> On 2 October 2011 10:24, Nathan Howell <[email protected]> wrote:
>> On Fri, Sep 30, 2011 at 4:08 PM, David Terei <[email protected]> wrote:
>>>
>>> Are you using LLVM 2.9 or the unreleased 3.0? I'm
>>> excited about the new register allocator in 3.0. I don't know if it
>>> will help but more than any other improvement in the last few releases
>>> it seems to have the potential to. I was hoping the new register
>>> allocator combined with your alias pass would be a winning
>>> combination.
>>
>> Have you tried using PBQP on 2.9? I've seen it perform better (sometimes
>> much better) than the linear scan allocator... though this was on very large
>> function bodies compared to the typical code emitted by GHC.
>> _______________________________________________
>> Cvs-ghc mailing list
>> [email protected]
>> http://www.haskell.org/mailman/listinfo/cvs-ghc
--
Regards,
Austin