On 09/27/2014 01:27 AM, Jan Hubicka wrote:
>> While a plain Firefox -flto build works fine, the LTO/PGO build fails with:
>>
>> lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
>> 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
>>         ../../gcc/gcc/ipa-utils.c:540
>> 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
>>         ../../gcc/gcc/ipa-icf.c:753
>> 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
>>         ../../gcc/gcc/ipa-icf.c:2706
>> 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
>>         ../../gcc/gcc/ipa-icf.c:2098
>> 0xf1d3f1 ipa_icf_driver
>>         ../../gcc/gcc/ipa-icf.c:2784
>> 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
>>         ../../gcc/gcc/ipa-icf.c:2831
>>
>>
>> The pass is also very memory hungry (from 3GB without ICF to 4GB during
>> libxul link), while the code size savings are in the 1% range.


Most of the problem comes from the groups of candidates that are built
according to a hash.
The hash value is based on the number of arguments, the number of BBs, the
number of gimple statements and the types of these statements.
It groups functions into classes. In WPA (before the body of any function is
loaded) I get the following histogram:

Dump after WPA based types groups
Congruence classes: 97204 (unique hash values: 88725), with total: 191457 items
Class size histogram [num of members]: number of classe number of classess
[1]: 86453 classes
[2]: 5680 classes
[3]: 1541 classes
[4]: 915 classes
[5]: 446 classes
[6]: 346 classes
[7]: 200 classes
[8]: 181 classes
[9]: 154 classes
[10]: 109 classes
[11]: 87 classes
[12]: 87 classes
[13]: 68 classes
[14]: 58 classes
[15]: 58 classes
[16]: 41 classes
[17]: 25 classes
[18]: 33 classes
[19]: 28 classes
[20]: 25 classes
[21]: 19 classes
[22]: 30 classes
[23]: 24 classes
[24]: 33 classes
[25]: 17 classes
[26]: 15 classes
[27]: 10 classes
[28]: 13 classes
[29]: 18 classes
[30]: 10 classes
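
To make the grouping step concrete, here is a minimal sketch in Python of how a
histogram like the one above can be produced. It is only an illustration, not
the ipa-icf code: FunctionSummary, cheap_key and the field names are
hypothetical, and the key simply mirrors the properties listed above (argument
count, BB count, statement count and statement types).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FunctionSummary:
    """Hypothetical per-function summary that is available without the body."""
    name: str
    num_args: int
    num_bbs: int
    stmt_kinds: Tuple[str, ...]  # kinds of the gimple statements, in order


def cheap_key(f: FunctionSummary) -> tuple:
    """Grouping key built only from cheap properties (assumed, not GCC's hash)."""
    return (f.num_args, f.num_bbs, len(f.stmt_kinds), f.stmt_kinds)


def group_by_key(functions: List[FunctionSummary]) -> Dict[tuple, List[FunctionSummary]]:
    """Put functions with an identical key into the same candidate class."""
    classes: Dict[tuple, List[FunctionSummary]] = defaultdict(list)
    for f in functions:
        classes[cheap_key(f)].append(f)
    return classes


def class_size_histogram(classes: Dict[tuple, List[FunctionSummary]]) -> Counter:
    """Map class size -> number of classes of that size."""
    return Counter(len(members) for members in classes.values())


functions = [
    FunctionSummary("a", 1, 2, ("assign", "return")),
    FunctionSummary("b", 1, 2, ("assign", "return")),  # same key as "a"
    FunctionSummary("c", 2, 1, ("call",)),
]
histogram = class_size_histogram(group_by_key(functions))
for size in sorted(histogram):
    print(f"[{size}]: {histogram[size]} classes")
```

Functions that land in the same class are only candidates: equal keys say
nothing about equal bodies, which is why a deeper comparison is still needed.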

The histogram means that each class with more than one member needs to be
iterated and its functions compared with each other. And yes, that is the root
of the problem.
I have to load the function body to do the deep function comparison. As you can
see, we have almost 200k functions, and more than half of them are in a group
with more than one member. So the 1GB of extra memory usage is caused by these
bodies:

Init called for 105004 items (54.84%).
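
The count above can be cross-checked against the dump. A quick sketch of the
arithmetic, assuming (my reading, the dump does not say so) that a body is
loaded for every item that is not alone in its class:

```python
# Numbers copied from the dump above.
total_items = 191457        # "with total: 191457 items"
singleton_classes = 86453   # "[1]: 86453 classes", one item each

# Assumption: every item in a class with 2+ members needs its body loaded.
items_needing_bodies = total_items - singleton_classes
share = 100.0 * items_needing_bodies / total_items

print(items_needing_bodies, f"{share:.2f}%")  # prints: 105004 54.84%
```

Both values agree with the "Init called" line.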

The memory footprint could be significantly reduced if one could load a body,
compare it, and release it so that the memory is actually freed. I asked Honza
about it, but it looks like the GGC mechanism cannot easily be forced to
release it.
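
To show why releasing bodies would matter, here is a small Python sketch that
only counts how many bodies are live at once. It is a toy model, not GCC code:
peak_loaded_bodies is a hypothetical helper, and it assumes that releasing a
body really returns its memory, which is exactly what GGC does not easily
allow.

```python
from typing import Iterable


def peak_loaded_bodies(class_sizes: Iterable[int], release_after_class: bool) -> int:
    """Return the largest number of function bodies live at the same time."""
    live = 0
    peak = 0
    for size in class_sizes:
        if size < 2:
            continue  # singleton class: nothing to compare, body never loaded
        live += size  # load every member of the class for deep comparison
        peak = max(peak, live)
        if release_after_class:
            live -= size  # assumed: memory is really freed at this point
    return peak


sizes = [1, 1, 2, 3, 2, 1, 4]  # made-up class sizes, for illustration only
keep_all = peak_loaded_bodies(sizes, release_after_class=False)
release = peak_loaded_bodies(sizes, release_after_class=True)
print(keep_all, release)  # prints: 11 4
```

In this model the peak drops from the total number of items in non-singleton
classes to the size of the largest class.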

> 
> Thanks for checking. I was just thinking about doing that myself.  Would
> you mind posting -ftime-report of firefox WPA stage?
> 
> It seems that in this case we reject too many of equality candidates?
> I think the original numbers were about 4-5% but later some equivalences were
> disabled because of devirt/aliasing issues. Do you compare it with gold ICF
> enabled? There are quite a few obvious improvements to the analysis that can
> be done, but I guess we need to analyze the interesting cases one by one.

You are right, the numbers were quite promising, but over time I had to
reduce the aggressiveness of the pass. As Honza said, it can be improved
step by step.

> 
> One thing that Martin can try is to hook into lto-symtab and try to check
> that the COMDAT functions that are known to be same pass the equality check.
> I suppose we will learn interesting things this way.
 
Good point, I will try it.

Martin


> I think the patch adds quite important infrastructure for gimple semantic
> equality checking and function merging. I went through the majority of the
> code and I think it is mostly ready for mainline (i.e. cleaner than what we
> have in tree-ssa-tailmerge), so I hope we can finish the review process next
> week. We will need to get a better cost/benefit ratio to enable it for -O2,
> which is something I would really like to see for 5.0, but it seems to be
> easier to handle this incrementally....

Thank you for the review,
Martin

> 
> Honza
> 
