Hello,
I have also redone most of my Firefox testing, similar to the tests I published at http://hubicka.blogspot.cz/2014/04/linktime-optimization-in-gcc-2-firefox.html (thanks to Martin Liska, who got LTO builds to work again).
I am attaching statistics on binary sizes. Interestingly, for Firefox LTO is quite a good size optimization (about 16% on text), and FDO similarly reduces text size and combines well with LTO, which is a bit different from Martin's gcc stats. I have looked into this only very briefly, and one issue seems to be the way we determine the hot/cold threshold.

binary size
                          text  relocations      data        EH    rest
gcc6 -O3              90448658     12887358  13720073  13035704  257839
gcc6 -O3 -flto        75810786     12145211  12390185   8422776  240002
gcc6 -O3 + FDO        67087824     13008294  13655305  13719944  259585
gcc6 -O3 -flto + FDO  60206898     12169803  12334113   9083088  240050
gcc7 -O3              93233440     12928831  13780313  13578224  257408
gcc7 -O3 -flto        76764274     12128031  12405369   8420448  240662
gcc7 -O3 + FDO        67500688     12994279  13650185  13661760  263400
gcc7 -O3 -flto + FDO  59776994     12151360  12325217   8971344  239501
gcc8 -O2              80311120     12939568  13763033  12948752  258711
gcc8 -O2 -flto        69156752     12109236  12475801   8501152  240163
gcc8 -O3              89913648     12924468  13790393  13374328  256867
gcc8 -O3 -flto        75971122     12138528  12426649   8593024  239861
gcc8 -O3 + FDO        67047632     12996890  13707017  13146232  263413
gcc8 -O3 -flto + FDO  58951410     12146008  12377161   8634152  241765

I also did some builds with clang. The observation is that clang's -O3 binary is smaller than ours, while our LTO/FDO builds are smaller than clang's (the LTO+FDO build quite substantially). Our EH is bigger than clang's, which is probably something to look into. One problem I am aware of is that our nothrow pass is not type sensitive and thus won't figure out when a program throws an exception of a specific type and catches it later.
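
To make the nothrow case concrete, here is a small made-up example of the pattern (it is not taken from Firefox, and the names ParseError, parse_digit and parse_digit_or are invented for illustration). It is only a sketch of the kind of code where tracking exception types would help; I have not checked what the current passes actually do with it.

```cpp
#include <stdexcept>

// Hypothetical error type, invented for this illustration.
struct ParseError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// May throw, but the only thing it ever throws is ParseError.
static int parse_digit(char c) {
    if (c < '0' || c > '9')
        throw ParseError("not a digit");
    return c - '0';
}

// Every exception that can reach this function is a ParseError, and the
// handler below catches exactly that type, so nothing escapes.
// Proving this needs the *type* of what parse_digit throws; an analysis
// that only tracks "may throw" vs. "cannot throw" has to assume the call
// might throw something the handler does not catch.
int parse_digit_or(char c, int fallback) {
    try {
        return parse_digit(c);
    } catch (const ParseError &) {
        return fallback;
    }
}
```

If parse_digit_or could be proved nothrow, its callers would not need unwind information for the call, which is where some of the EH size could presumably be saved.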

                                 text  relocations      data        EH    rest
clang6 -O3                   84754848     13032018  13597433  10791528  371429
clang6 -O3 -flto             90757024     12273574  12258521   6841424  350585
clang6 -O3 -flto=thin        92940576     12376724  12479233   7974856  353171
clang6 -O3 + FDO             81776880     13136428  13574489  11501344  385123
clang6 -O3 -flto=thin + FDO  88374432     12405075  12434297   9574416  356508
clang6 -O3 -flto + FDO       90637168     12288433  12244265   9023304  349078

I also did some benchmarking and found at least one issue: -flto -O3 hits --param inline-unit-growth a bit too early, so we do not get much benefit (clang does, but it also does not reduce binary size). -O3 -flto + FDO and -O2 -flto seem to work well. I will summarize the results later.

Firefox developer Tom Ritter has tested LTO with and without FDO at the links below (it is a rather nice interface; I like that one can click on the graph and see the results in the context of other tests done recently). This was done with gcc6.

Tracking bug: https://bugzilla.mozilla.org/show_bug.cgi?format=default&id=521435
Non-FDO build: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=12ce14a5bcac9975b41a1f901bfc3a8dcb2d791b&framework=1&showOnlyImportant=1&selectedTimeRange=172800
FDO build: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=7e5bd52e36fcc1703ced01fe87e831a716677295&framework=1&showOnlyImportant=1&selectedTimeRange=172800

Honza