Hello,
I have also redone most of my Firefox testing, similar to the tests I published at
http://hubicka.blogspot.cz/2014/04/linktime-optimization-in-gcc-2-firefox.html
(thanks to Martin Liska, who got LTO builds to work again).

I am attaching statistics on binary sizes.  Interestingly, for Firefox LTO is
quite a good size optimization (about 16% on text), and FDO similarly reduces
text size and combines well with LTO, which is a bit different from Martin's
GCC stats.  I have looked into this only very briefly, and one issue seems to
be the way we determine the hot/cold threshold.

binary size                text      relocations  data      EH        rest
gcc6 -O3                   90448658  12887358     13720073  13035704  257839
gcc6 -O3 -flto             75810786  12145211     12390185  8422776   240002
gcc6 -O3 + FDO             67087824  13008294     13655305  13719944  259585
gcc6 -O3 -flto + FDO       60206898  12169803     12334113  9083088   240050
gcc7 -O3                   93233440  12928831     13780313  13578224  257408
gcc7 -O3 -flto             76764274  12128031     12405369  8420448   240662
gcc7 -O3 + FDO             67500688  12994279     13650185  13661760  263400
gcc7 -O3 -flto + FDO       59776994  12151360     12325217  8971344   239501
gcc8 -O2                   80311120  12939568     13763033  12948752  258711
gcc8 -O2 -flto             69156752  12109236     12475801  8501152   240163
gcc8 -O3                   89913648  12924468     13790393  13374328  256867
gcc8 -O3 -flto             75971122  12138528     12426649  8593024   239861
gcc8 -O3 + FDO             67047632  12996890     13707017  13146232  263413
gcc8 -O3 -flto + FDO       58951410  12146008     12377161  8634152   241765
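
To make the text-size comparison easier to check, here is a small sketch that
computes the reduction relative to the plain -O3 build.  It uses only the gcc8
text figures from the table; the helper name reduction_percent is made up for
this illustration.

```python
# Text-section sizes in bytes, copied from the gcc8 rows of the table.
text_bytes = {
    "gcc8 -O3": 89913648,
    "gcc8 -O3 -flto": 75971122,
    "gcc8 -O3 + FDO": 67047632,
    "gcc8 -O3 -flto + FDO": 58951410,
}


def reduction_percent(baseline, value):
    """Percentage by which `value` is smaller than `baseline` (illustrative helper)."""
    return 100.0 * (baseline - value) / baseline


baseline = text_bytes["gcc8 -O3"]
reductions = {
    name: reduction_percent(baseline, size) for name, size in text_bytes.items()
}

for name, pct in reductions.items():
    print(f"{name:22s} {pct:5.1f}% smaller text than -O3")
```

The same calculation can be repeated for the gcc6 and gcc7 rows by swapping in
their figures.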

I also did some builds with clang.  The observation is that clang's -O3 binary
is smaller than ours, while our LTO/FDO builds are smaller than clang's (the
LTO+FDO build quite substantially).
Our EH is bigger than clang's, which is probably something to look into.  One
problem I am aware of is that our nothrow pass is not type sensitive and thus
won't figure out if a program throws an exception of a specific type and
catches it later.

binary size                text      relocations  data      EH        rest
clang6 -O3                 84754848  13032018     13597433  10791528  371429
clang6 -O3 -flto           90757024  12273574     12258521  6841424   350585
clang6 -O3 -flto=thin      92940576  12376724     12479233  7974856   353171
clang6 -O3 + FDO           81776880  13136428     13574489  11501344  385123
clang6 -O3 -flto=thin+FDO  88374432  12405075     12434297  9574416   356508
clang6 -O3 -flto + FDO     90637168  12288433     12244265  9023304   349078
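
As a rough way to see where the EH difference shows up, the following sketch
divides the gcc8 EH size by the clang6 EH size for the configurations that
appear in both tables.  It is only arithmetic on the numbers posted here, not
an analysis of the cause.

```python
# EH sizes in bytes for matching configurations, copied from the two tables.
eh_bytes = {
    "-O3":             {"gcc8": 13374328, "clang6": 10791528},
    "-O3 -flto":       {"gcc8": 8593024,  "clang6": 6841424},
    "-O3 + FDO":       {"gcc8": 13146232, "clang6": 11501344},
    "-O3 -flto + FDO": {"gcc8": 8634152,  "clang6": 9023304},
}

# Ratio above 1.0 means the gcc8 EH data is larger than clang6's.
eh_ratio = {
    config: sizes["gcc8"] / sizes["clang6"] for config, sizes in eh_bytes.items()
}

for config, ratio in eh_ratio.items():
    print(f"{config:16s} gcc8/clang6 EH = {ratio:.2f}")
```

By these figures the ratio is above 1 for -O3, -O3 -flto and -O3 + FDO, and
drops below 1 for -O3 -flto + FDO.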

I also did some benchmarking and found at least one issue: -flto -O3 hits
--param inline-unit-growth a bit too early, so we do not get much benefit (clang
does, but it also does not reduce binary size).  -O3 -flto + FDO and -O2
-flto seem to work well.  I will summarize the results later.

Firefox developer Tom Ritter has tested LTO with and without FDO here (it is
a rather nice interface; I like that one can click on the graph and see the
results in the context of other tests done recently).  This was done with gcc6.

Tracking bug:
https://bugzilla.mozilla.org/show_bug.cgi?format=default&id=521435

non-FDO build:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=12ce14a5bcac9975b41a1f901bfc3a8dcb2d791b&framework=1&showOnlyImportant=1&selectedTimeRange=172800

FDO build:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=7e5bd52e36fcc1703ced01fe87e831a716677295&framework=1&showOnlyImportant=1&selectedTimeRange=172800

Honza
