On Wed, Oct 7, 2009 at 4:25 PM, Vladimir Makarov <vmaka...@redhat.com> wrote:
> Jan Hubicka wrote:
>>
>> So things seem to work now more or less as expected.  I.e. LTO builds
>> seem similar to combined builds, and whole-program improves code size
>> quite noticeably.
>> Runtime results for gzip are pretty much unchanged, but that is
>> expected.  I am quite curious about a full SPEC run.
>>
> Before the fix (Jan's two latest patches), the LTO results were
> disappointing.  In brief, the results when I checked SPEC2000 a week ago
> on the lto branch on a Core i7 (-O3 vs -O3 -flto with optional
> -fwhole-program) were:
>  o Usage of LTO made the compiler 1.9 times slower (in CPU time) for
>    SPECInt2000 and 2.2 times slower for SPECFP2000 on x86 and x86_64.
>  o LTO generated 16-17% bigger code for Int2000 and 13-14% bigger for
>    FP2000.
>  o There is a 0.6% improvement for SPECFP2000 on x86 and 1% for
>    SPECInt2000 on x86_64 (only because of a 20% improvement on vortex;
>    all other tests were actually worse than without LTO).
>  o No improvement for Int2000 on x86 and FP2000 on x86_64.
>  o 252.eon and 176.gcc crashed the compiler when LTO was used.
>
> With Jan's latest fixes, the results (for -O3 vs -O3 -flto
> -fwhole-program) are:
>
> x86:
>  o Int2000:
>     - LTO crashes the compiler on vortex.  LTO generates
>       wrong code for vpr, gcc, perlbmk, and gap.
>     - Compiler is 1.85 times slower with LTO
>     - Average code size is almost 6% smaller:
>
>         4.615%     44287     46331  164.gzip
>        -3.145%    144101    139569  175.vpr
>         0.261%   1566926   1571009  176.gcc
>       -12.118%     12279     10791  181.mcf
>        11.130%    209956    233324  186.crafty
>       -29.735%    155358    109162  197.parser
>       -23.075%    497347    382585  252.eon
>         8.904%    552163    601327  253.perlbmk
>         1.516%    503006    510630  254.gap
>       -20.891%     47465     37549  256.bzip2
>        -3.047%    198365    192321  300.twolf
>       Average = -5.96236%
>
>     - Performance is improved by almost 4%:
>
>       164.gzip       1668    1629   -2.33813%
>       181.mcf        5011    5020    0.17960%
>       186.crafty     2268    2277    0.39682%
>       197.parser     1928    1925   -0.15560%
>       252.eon        2477    2950   19.0957%
>       256.bzip2      1894    1956    3.2735%
>       300.twolf      2806    3026    7.84034%
>       GeoMean        2416    2509    3.84934%
>
>  o FP2000:
>     - LTO generates wrong code for mgrid, applu, galgel, facerec,
>       fma3d, sixtrack, and apsi.
>     - Compiler is 2.1 times slower with LTO
>     - Average code size is almost 1.7% smaller:
>
>        -8.771%     27544     25128  168.wupwise
>         2.328%      9108      9320  171.swim
>         2.127%     18193     18580  172.mgrid
>         0.004%     76584     76587  173.applu
>        -5.938%    576270    542049  177.mesa
>        -2.046%    183667    179910  178.galgel
>       -10.635%     15881     14192  179.art
>       -16.292%     28812     24118  183.equake
>        -3.177%     67239     65103  187.facerec
>        10.989%    125273    139039  188.ammp
>        -0.735%     49137     48776  189.lucas
>        -0.856%   1144550   1134756  191.fma3d
>        11.457%    935941   1043168  200.sixtrack
>       Average = -1.65735%
>
>     - Performance is improved by almost 6%:
>
>       168.wupwise    2349    3266   39.0379%
>       171.swim       3511    3529    0.51267%
>       177.mesa       1970    2008    1.92893%
>       179.art        7097    7293    2.76173%
>       183.equake     3844    4138    7.64828%
>       188.ammp       2423    2401   -0.90796%
>       189.lucas      2825    2718   -3.78761%
>       GeoMean        3144    3332    5.97964%
>
> x86_64:
>  o Int2000:
>     - LTO crashes the compiler on gcc.
>       LTO generates wrong code for vpr, perlbmk, gap, and vortex.
>     - Compiler is 1.8 times slower with LTO
>     - Average code size is more than 8% smaller:
>
>         1.376%     49119     49795  164.gzip
>        -4.348%    158389    151503  175.vpr
>       -16.964%     14949     12413  181.mcf
>        12.875%    195234    220370  186.crafty
>       -29.519%    180780    127416  197.parser
>       -22.894%    521614    402197  252.eon
>         9.507%    645749    707141  253.perlbmk
>         6.550%    585164    623492  254.gap
>       -22.493%    660414    511866  255.vortex
>       -18.343%     55825     45585  256.bzip2
>        -5.295%    212727    201463  300.twolf
>       Average = -8.14068%
>
>     - Performance is improved by 2.1%:
>
>       164.gzip       1804    1773   -1.7184%
>       181.mcf        3480    3460   -0.5747%
>       186.crafty     3397    3406    0.2649%
>       197.parser     1847    1803   -2.3822%
>       252.eon        4071    4537   11.4468%
>       256.bzip2      2197    2249    2.3668%
>       300.twolf      2878    3048    5.9068%
>       GeoMean        2688    2744    2.0833%
>
>  o FP2000:
>     - LTO crashes the compiler on apsi.  LTO generates wrong code for
>       mgrid, applu, galgel, facerec, fma3d, and sixtrack.
>     - Compiler is 2.1 times slower with LTO
>     - Average code size is 2.7% smaller:
>
>        27.674%     33902     43284  168.wupwise
>        -3.107%     15704     15216  171.swim
>        -0.685%     22929     22772  172.mgrid
>        -1.167%    103280    102075  173.applu
>        -8.346%    678724    622079  177.mesa
>        -4.304%    249773    239024  178.galgel
>       -25.801%     20375     15118  179.art
>       -28.805%     37514     26708  183.equake
>        -1.577%     76837     75625  187.facerec
>         1.570%    168235    170877  188.ammp
>        -1.168%     57271     56602  189.lucas
>        -0.940%   1276316   1264314  191.fma3d
>        10.949%   1106507   1227658  200.sixtrack
>       Average = -2.74672%
>
>     - Performance is improved by almost 6%:
>
>       168.wupwise    2532    3708   46.4455%
>       171.swim       3740    3729   -0.2941%
>       177.mesa       2969    2946   -0.7746%
>       179.art        7278    7092   -2.5556%
>       183.equake     3978    4227    6.2594%
>       188.ammp       2490    2515    1.0040%
>       189.lucas      3886    3806   -2.0586%
>       GeoMean        3603    3812    5.8007%
>
> LTO is quite promising.
> Actually it is in line with, or even better than, the
> improvement seen with other compilers (pathscale is the most convenient
> compiler to check LTO separately: LTO gave there up to 5% improvement
> on SPECFP2000 and 3.5% for SPECInt2000, making the compiler about 50%
> slower and the generated code up to 30% bigger).  LTO in GCC actually
> results in significant code size reduction, which is quite different
> from pathscale.  That is one of the rare cases, in my mind, when a
> specific optimization actually works better in GCC than in other
> optimizing compilers.  So congratulations to all the people who worked
> on LTO!
>
> I think the biggest winner of LTO will be big C++ programs (eon shows
> that).  Additional optimizations (like devirtualization) could improve
> those results even more.  I think the next big thing would be using
> subtarget-specialized functions.

Note that there are daily runs for SPEC2000 and SPEC2006 on x86_64
with -flto (and now -fwhopr) beyond gcc.opensuse.org.  SPEC2000 all
compile and run successfully for me with -flto, with the exception of
gcc, which is non-conforming C code.  SPEC2006 is a different story:
a bunch of tests do not have enough memory to compile, and another
bunch miscompare or crash.

Note that today we had additional breakage due to IPA-SRA; after that
is fixed, results should look a lot better.

My performance observations before Honza's patch are disappointing as
well - just some minor speedups / slowdowns.

Richard.
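
For anyone wanting to re-check the percentage columns, the "Average"
rows, and the GeoMean rows in the quoted tables, here is a minimal
sketch in Python.  It assumes the two numeric columns are the -O3
figure followed by the -O3 -flto -fwhole-program figure (this is
consistent with the quoted percentages, but the original post does not
label the columns), and the helper names pct_change and geomean are
made up for illustration.

```python
import math

def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0

def geomean(values):
    """Geometric mean, computed via logs to avoid overflow on long lists."""
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

# x86 SPECInt2000 scores copied from the quoted table.
# Assumption: first column is -O3, second is -O3 -flto -fwhole-program.
base_scores = [1668, 5011, 2268, 1928, 2477, 1894, 2806]
lto_scores = [1629, 5020, 2277, 1925, 2950, 1956, 3026]

# x86 SPECInt2000 code-size changes (percent) copied from the quoted table.
size_changes = [4.615, -3.145, 0.261, -12.118, 11.130, -29.735,
                -23.075, 8.904, 1.516, -20.891, -3.047]

if __name__ == "__main__":
    for b, n in zip(base_scores, lto_scores):
        print(f"{b:6d} {n:6d} {pct_change(b, n):9.4f}%")
    gm_base = geomean(base_scores)
    gm_lto = geomean(lto_scores)
    print(f"GeoMean {gm_base:.1f} {gm_lto:.1f} "
          f"{pct_change(gm_base, gm_lto):.4f}%")
    # The "Average" row is the plain arithmetic mean of the percentages.
    print(f"Average = {sum(size_changes) / len(size_changes):.5f}%")
```

Small differences in the last digits are to be expected, since the
inputs here are the already-rounded values from the tables.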