Re: LTO: Speedup -- some preliminary SPEC2000 results

Richard Guenther Wed, 07 Oct 2009 07:39:58 -0700

On Wed, Oct 7, 2009 at 4:25 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> Jan Hubicka wrote:
>>
>> So things seem to work now more or less as expected.  I.e. LTO builds
>> seem similar to combined builds, and whole-program improves code size
>> quite noticeably.
>> Runtime results for gzip are pretty much unchanged, but that is
>> expected.  I am quite curious about a full SPEC run.
>>
> Before the fix (Jan's two latest patches), the LTO results were
> disappointing.  In brief, the results when I checked SPEC2000 a week
> ago on the lto branch on a Core i7 (-O3 vs -O3 -flto with optional
> -fwhole-program) were:
>  o Using LTO made the compiler 1.9 times slower (in CPU time) for
>   SPECInt2000 and 2.2 times slower for SPECFP2000 on x86 and x86_64.
>  o LTO generated 16-17% bigger code for Int2000 and 13-14% bigger for
> FP2000.
>  o There is 0.6% improvement for SPECFP2000 on x86 and 1% for
>   SPECInt2000 on x86_64 (only because of 20% improvement on vortex,
>   all other tests were actually worse than without LTO).
>  o No improvement for Int2000 on x86 and FP2000 on x86_64.
>  o 252.eon and 176.gcc crashed the compiler when LTO was used.
>
> With Jan's latest fixes, the results (for -O3 vs -O3 -flto
> -fwhole-program) are:
>
> x86:
>  o Int2000:
>   - LTO crashes the compiler on vortex.  LTO generates
>     wrong code for vpr, gcc, perlbmk, and gap.
>   - Compiler is 1.85 times slower with LTO
>   - Average code size is almost 6% smaller:
>
>        4.615%          44287          46331 164.gzip
>       -3.145%         144101         139569 175.vpr
>        0.261%        1566926        1571009 176.gcc
>      -12.118%          12279          10791 181.mcf
>       11.130%         209956         233324 186.crafty
>      -29.735%         155358         109162 197.parser
>      -23.075%         497347         382585 252.eon
>        8.904%         552163         601327 253.perlbmk
>        1.516%         503006         510630 254.gap
>      -20.891%          47465          37549 256.bzip2
>       -3.047%         198365         192321 300.twolf
>       Average = -5.96236%
>
>    - Performance is improved by almost 4%
>
>      164.gzip    1668   1629  -2.33813%
>      181.mcf     5011   5020   0.17960%
>      186.crafty  2268   2277   0.39682%
>      197.parser  1928   1925  -0.15560%
>      252.eon     2477   2950  19.0957%
>      256.bzip2   1894   1956   3.2735%
>      300.twolf   2806   3026   7.84034%
>      GeoMean     2416   2509   3.84934%
>
>  o FP2000
>   - LTO generates wrong code for mgrid, applu, galgel, facerec,
>     fma3d, sixtrack, and apsi.
>   - Compiler is 2.1 times slower with LTO
>   - Average code size is almost 1.7% smaller:
>
>      -8.771%          27544          25128 168.wupwise
>       2.328%           9108           9320 171.swim
>       2.127%          18193          18580 172.mgrid
>       0.004%          76584          76587 173.applu
>      -5.938%         576270         542049 177.mesa
>      -2.046%         183667         179910 178.galgel
>     -10.635%          15881          14192 179.art
>     -16.292%          28812          24118 183.equake
>      -3.177%          67239          65103 187.facerec
>      10.989%         125273         139039 188.ammp
>      -0.735%          49137          48776 189.lucas
>      -0.856%        1144550        1134756 191.fma3d
>      11.457%         935941        1043168 200.sixtrack
>      Average = -1.65735%
>
>    - Performance is improved by almost 6%
>
>      168.wupwise    2349    3266  39.0379%
>      171.swim       3511    3529   0.51267%
>      177.mesa       1970    2008   1.92893%
>      179.art        7097    7293   2.76173%
>      183.equake     3844    4138   7.64828%
>      188.ammp       2423    2401  -0.90796%
>      189.lucas      2825    2718  -3.78761%
>      GeoMean        3144    3332   5.97964%
>
> x86_64:
>  o Int2000:
>   - LTO crashes the compiler on gcc.  LTO generates
>     wrong code for vpr, perlbmk, gap, and vortex
>   - Compiler is 1.8 times slower with LTO
>   - Average code size is more than 8% smaller:
>
>        1.376%          49119          49795 164.gzip
>       -4.348%         158389         151503 175.vpr
>      -16.964%          14949          12413 181.mcf
>       12.875%         195234         220370 186.crafty
>      -29.519%         180780         127416 197.parser
>      -22.894%         521614         402197 252.eon
>        9.507%         645749         707141 253.perlbmk
>        6.550%         585164         623492 254.gap
>      -22.493%         660414         511866 255.vortex
>      -18.343%          55825          45585 256.bzip2
>       -5.295%         212727         201463 300.twolf
>      Average = -8.14068%
>
>    - Performance is improved by 2.1%
>
>      164.gzip     1804    1773  -1.7184%
>      181.mcf      3480    3460  -0.5747%
>      186.crafty   3397    3406   0.2649%
>      197.parser   1847    1803  -2.3822%
>      252.eon      4071    4537  11.4468%
>      256.bzip2    2197    2249   2.3668%
>      300.twolf    2878    3048   5.9068%
>      GeoMean      2688    2744   2.0833%
>
>  o FP2000
>   - LTO crashes the compiler on apsi.  LTO generates wrong code for
>     mgrid, applu, galgel, facerec, fma3d, and sixtrack.
>   - Compiler is 2.1 times slower with LTO
>   - Average code size is 2.7% smaller:
>
>      27.674%          33902          43284 168.wupwise
>      -3.107%          15704          15216 171.swim
>      -0.685%          22929          22772 172.mgrid
>      -1.167%         103280         102075 173.applu
>      -8.346%         678724         622079 177.mesa
>      -4.304%         249773         239024 178.galgel
>     -25.801%          20375          15118 179.art
>     -28.805%          37514          26708 183.equake
>      -1.577%          76837          75625 187.facerec
>       1.570%         168235         170877 188.ammp
>      -1.168%          57271          56602 189.lucas
>      -0.940%        1276316        1264314 191.fma3d
>      10.949%        1106507        1227658 200.sixtrack
>     Average = -2.74672%
>
>    - Performance is improved by almost 6%
>
>      168.wupwise     2532   3708  46.4455%
>      171.swim        3740   3729  -0.2941%
>      177.mesa        2969   2946  -0.7746%
>      179.art         7278   7092  -2.5556%
>      183.equake      3978   4227   6.2594%
>      188.ammp        2490   2515   1.0040%
>      189.lucas       3886   3806  -2.0586%
>      GeoMean         3603   3812   5.8007%
>
> LTO is quite promising.  Actually it is in line with, or even better
> than, the improvement obtained from other compilers (pathscale is the
> most convenient compiler for checking LTO separately: there LTO gave
> up to 5% improvement on SPECFP2000 and 3.5% on SPECInt2000, while
> making the compiler about 50% slower and the generated code up to 30%
> bigger).  LTO in GCC actually results in a significant code size
> reduction, which is quite different from pathscale.  To my mind, this
> is one of the rare cases where a specific optimization actually works
> better in gcc than in other optimizing compilers.  So congratulations
> to all the people who worked on LTO!
>
> I think the biggest winners from LTO will be big C++ programs (eon
> shows that).  Additional optimizations (like devirtualization) could
> improve those results even more.  I think the next big thing would be
> using subtarget-specialized functions.

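The "Average" lines in the quoted code-size tables appear to be plain
arithmetic means of the per-benchmark percentage changes.  A minimal
Python sketch of that calculation, using the x86 Int2000 sizes quoted
above and assuming the first size column is the -O3 build and the
second the -O3 -flto -fwhole-program build (the names SIZES,
percent_change and average_change are illustrative, not from the
original mail):

```python
# Code sizes copied from the quoted x86 Int2000 table.
# Assumption: each pair is (size without LTO, size with LTO),
# following the column order of the table.
SIZES = {
    "164.gzip": (44287, 46331),
    "175.vpr": (144101, 139569),
    "176.gcc": (1566926, 1571009),
    "181.mcf": (12279, 10791),
    "186.crafty": (209956, 233324),
    "197.parser": (155358, 109162),
    "252.eon": (497347, 382585),
    "253.perlbmk": (552163, 601327),
    "254.gap": (503006, 510630),
    "256.bzip2": (47465, 37549),
    "300.twolf": (198365, 192321),
}


def percent_change(base, lto):
    """Relative size change in percent; negative means LTO is smaller."""
    return (lto - base) / base * 100.0


def average_change(sizes):
    """Arithmetic mean of the per-benchmark percentage changes."""
    changes = [percent_change(base, lto) for base, lto in sizes.values()]
    return sum(changes) / len(changes)


if __name__ == "__main__":
    for name, (base, lto) in SIZES.items():
        print(f"{percent_change(base, lto):8.3f}%  {name}")
    print(f"Average = {average_change(SIZES):.5f}%")
```

Note that this weights every benchmark equally.  The change in total
bytes across the whole suite is a different measure, dominated by the
large programs such as gcc, and generally gives a different number.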

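The GeoMean rows in the quoted performance tables can be reproduced
with a geometric mean over the per-benchmark scores.  A sketch,
assuming the two numeric columns are SPEC ratios without and with LTO
(higher is better); the names RATIOS, geomean and geomean_improvement
are illustrative.  The result may differ slightly from the quoted
percentage, since the mail seems to derive it from the rounded GeoMean
values:

```python
import math

# SPEC ratios copied from the quoted x86 Int2000 performance table.
# Assumption: each pair is (ratio without LTO, ratio with LTO).
RATIOS = {
    "164.gzip": (1668, 1629),
    "181.mcf": (5011, 5020),
    "186.crafty": (2268, 2277),
    "197.parser": (1928, 1925),
    "252.eon": (2477, 2950),
    "256.bzip2": (1894, 1956),
    "300.twolf": (2806, 3026),
}


def geomean(values):
    """Geometric mean, computed via logarithms to avoid large products."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_improvement(ratios):
    """Percentage change of the LTO geomean relative to the base geomean."""
    base = geomean(b for b, _ in ratios.values())
    lto = geomean(l for _, l in ratios.values())
    return (lto / base - 1.0) * 100.0


if __name__ == "__main__":
    print(f"GeoMean improvement: {geomean_improvement(RATIOS):.2f}%")
```
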
Note that there are daily runs for SPEC2000 and SPEC2006 on
x86_64 with -flto (and now -fwhopr) at gcc.opensuse.org.

SPEC2000 all compiles and runs successfully for me with -flto,
with the exception of gcc, which is non-conforming C code.

SPEC2006 is a different story: a bunch of tests do not have
enough memory to compile, and another bunch miscompare or
crash.

Note that today we had additional breakage due to IPA-SRA;
once that is fixed, results should look a lot better.

My performance observations from before Honza's patch were
disappointing as well - just some minor speedups / slowdowns.

Richard.
