On 09/27/2014 01:27 AM, Jan Hubicka wrote:
>> While a plain Firefox -flto build works fine. LTO/PGO build fails with:
>>
>> lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
>> 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
>>         ../../gcc/gcc/ipa-utils.c:540
>> 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
>>         ../../gcc/gcc/ipa-icf.c:753
>> 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
>>         ../../gcc/gcc/ipa-icf.c:2706
>> 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
>>         ../../gcc/gcc/ipa-icf.c:2098
>> 0xf1d3f1 ipa_icf_driver
>>         ../../gcc/gcc/ipa-icf.c:2784
>> 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
>>         ../../gcc/gcc/ipa-icf.c:2831
>>
>>
>> The pass is also very memory hungry (from 3GB without ICF to 4GB during
>> libxul link), while the code size savings are in the 1% range.
Most of the problem comes from the groups of candidates that are built
according to a hash. The hash value is based on the number of arguments, the
number of BBs, the number of gimple statements and the types of these
statements. It groups functions into classes. In WPA (before the body of any
function is loaded) I get the following histogram:

Dump after WPA based types groups
Congruence classes: 97204 (unique hash values: 88725), with total: 191457 items
Class size histogram [num of members]: number of classe number of classess
[1]: 86453 classes
[2]: 5680 classes
[3]: 1541 classes
[4]: 915 classes
[5]: 446 classes
[6]: 346 classes
[7]: 200 classes
[8]: 181 classes
[9]: 154 classes
[10]: 109 classes
[11]: 87 classes
[12]: 87 classes
[13]: 68 classes
[14]: 58 classes
[15]: 58 classes
[16]: 41 classes
[17]: 25 classes
[18]: 33 classes
[19]: 28 classes
[20]: 25 classes
[21]: 19 classes
[22]: 30 classes
[23]: 24 classes
[24]: 33 classes
[25]: 17 classes
[26]: 15 classes
[27]: 10 classes
[28]: 13 classes
[29]: 18 classes
[30]: 10 classes

This means that every class with more than one member has to be iterated and
its functions compared. And yes, that is the root of the problem: I have to
load the function body to do the deep function comparison. As you can see, we
have almost 200k functions, and more than half of them sit in a group with
more than one member. So the extra 1GB of memory usage is caused by these
bodies:

Init called for 105004 items (54.84%).

The memory footprint could be reduced significantly if one could load a body,
release it, and have the memory actually freed. I asked Honza about it, but it
looks like the GGC mechanism cannot easily be forced to release it.

>
> Thanks for checking.  I was just thinking about doing that myself.  Would
> you mind posting -ftime-report of firefox WPA stage?
>
> It seems that in this case we reject too many of equality candidates?
> I think the original numbers were about 4-5% but later some equivalences were
> disabled because of devirt/aliasing issues.
> Do you compare it with gold ICF enabled?  There are quite a few obvious
> improvements to the analysis that can be done, but I guess we need to
> analyze the interesting cases one by one.

You are right, the numbers were quite promising, but over time I had to
reduce the "aggressivity" of the pass. As Honza said, it can be improved
step-by-step.

>
> One thing that Martin can try is to hook into lto-symtab and try to check
> that the COMDAT functions that are known to be same pass the equality check.
> I suppose we will learn interesting things this way.

Good point, I will try it.

Martin

> I think the patch adds quite important infrastructure for gimple semantic
> equality checking and function merging.  I went through the majority of code
> and I think it is mostly ready to mainline (i.e. cleaner than what we have in
> tree-ssa-tailmerge) so hope we can finish the review process next week.
> We will need to get better cost/benefits ratio to enable it for -O2 that is
> something I would really like to see for 5.0, but it seems to be easier to
> handle this incrementally....

Thank you for the review,
Martin

>
> Honza
>
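
For anyone following the thread, the hash-based grouping described above can
be sketched roughly as below. This is only an illustration in Python, not the
code from ipa-icf.c: FuncSummary, group_by_key, size_histogram and
bodies_to_load are hypothetical names, and the sketch groups purely on the
cheap key, whereas the dump above shows more classes than unique hash values,
so the real pass evidently splits further.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Tuple


class FuncSummary(NamedTuple):
    """Hypothetical cheap summary, available before any body is loaded."""
    name: str
    num_args: int
    num_bbs: int
    stmt_kinds: Tuple[str, ...]  # kinds of the gimple statements, in order


def cheap_key(f):
    # Same ingredients as named in the mail: argument count, BB count,
    # statement count and the statement kinds. The tuple itself is used as
    # the dictionary key, so this sketch has no hash collisions.
    return (f.num_args, f.num_bbs, len(f.stmt_kinds), f.stmt_kinds)


def group_by_key(funcs):
    """Partition functions into candidate classes by their cheap key."""
    classes = defaultdict(list)
    for f in funcs:
        classes[cheap_key(f)].append(f)
    return list(classes.values())


def size_histogram(classes):
    """Map class size -> number of classes of that size."""
    return Counter(len(c) for c in classes)


def bodies_to_load(classes):
    """Only members of classes with more than one member need a deep
    comparison, and therefore a loaded body."""
    return sum(len(c) for c in classes if len(c) > 1)


if __name__ == "__main__":
    funcs = [
        FuncSummary("a", 1, 2, ("assign", "return")),
        FuncSummary("b", 1, 2, ("assign", "return")),
        FuncSummary("c", 0, 1, ("return",)),
        FuncSummary("d", 1, 2, ("call", "return")),
    ]
    classes = group_by_key(funcs)
    print(dict(size_histogram(classes)))
    print(bodies_to_load(classes), "of", len(funcs), "bodies needed")
```

The same counting applied to the dump above gives 191457 - 86453 = 105004
items outside singleton classes, which matches the "Init called for 105004
items (54.84%)" line.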