[Bug ipa/92074] [10 regression] 26% performance regression on Spec2017 548.exchange2_r

luoxhu at cn dot ibm.com Tue, 15 Oct 2019 01:18:24 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92074


--- Comment #3 from Xiong Hu XS Luo <luoxhu at cn dot ibm.com> ---
(In reply to Jan Hubicka from comment #2)
> The regression is because we now inline covered into digits2:
> 
> IPA function summary for digits_2/29 inlinable
>   global time:     1553.078985
>   self size:       1295
>   global size:     1295
>   min size:       0
>   self stack:      261
>   global stack:    261
>     size:981.000000, time:1505.442572
>     size:3.000000, time:1.999121,  executed if:(not inlined)
>     size:0.500000, time:0.500000,  executed if:(not inlined),  nonconst if:(op0[ref offset: 0] changed) && (not inlined)
>     size:210.500000, time:27.456610,  nonconst if:(op0[ref offset: 0] changed)
>     size:21.000000, time:3.795164,  executed if:(op0[ref offset: 0] == 5)
>     size:6.000000, time:0.334389,  executed if:(op0[ref offset: 0] != 8)
>     size:1.000000, time:0.033237,  executed if:(op0[ref offset: 0] != 8),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
>     size:66.000000, time:13.130882,  executed if:(op0[ref offset: 0] == 8)
>   loop iterations:(op0[ref offset: 0] changed)
>   calls:
>     digits_2/29 function not considered for inlining
>       loop depth: 9 freq:0.03 size: 2 time: 11callee size:647 stack:261 predicate: (op0[ref offset: 0] != 8)
>        op0 is compile time invariant
>     covered.constprop/93 function not considered for inlining
>       loop depth: 9 freq:0.00 size: 4 time: 13callee size:214 stack:1472 predicate: (op0[ref offset: 0] == 8)
>        op0 is compile time invariant
>        op1 is compile time invariant
> 
> digits_2 is quite deeply recursive and inlining quite expensive function
> "covered" does not help. 

Hi Honza,

I am analyzing the recursive call digits_2(int k) in exchange2; this is not
related to the current PR. Sorry for the distraction.

In Fortran, k is passed by reference rather than by value. With some
workaround, the new IPA-SRA could perform SRA on it and convert it to
pass-by-value, but ipa-sra runs after ipa-cp, and ipa-cp cannot make use of
the SRA results in the WPA stage.
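To illustrate the difference, here is a minimal C sketch, assuming C pointers are a fair stand-in for Fortran's by-reference dummy arguments; the function names are hypothetical and not taken from exchange2. The first version roughly mirrors a by-reference digits_2(k), where the recursive call needs an addressable temporary for k+1. The second is the by-value form an IPA-SRA-style rewrite aims for, which is only legal because k is read but never written through or allowed to escape.

```c
#include <assert.h>

/* Hypothetical analogue of a Fortran procedure: the dummy argument k is
   passed by reference, modeled here as a pointer. */
static int depth_sum_by_ref(const int *k) {
    assert(k);                      /* a by-reference argument is never null */
    if (*k > 9)
        return 0;
    int next = *k + 1;              /* k+1 must live in memory to be passed */
    return *k + depth_sum_by_ref(&next);
}

/* The by-value form an IPA-SRA-style rewrite aims for: k is only read,
   so it can be passed as a plain scalar. */
static int depth_sum_by_val(int k) {
    if (k > 9)
        return 0;
    return k + depth_sum_by_val(k + 1);
}
```

Both compute the same sum; in the by-value form a constant k is visible directly as an argument, whereas in the by-reference form it has to be tracked through memory (*k).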

Since digits_2 consumes most of the run time and its input parameter increases
from 1 to 9, manually converting the recursive call into non-recursive calls
like:
 case(1) call digits_2_1(); ... case(9) call digits_2_9();

improves performance by about 60%.
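The manual conversion can be sketched in C on a toy search, assuming (hypothetically) a depth limit of 3 instead of 9 and a loop whose trip count depends on k; walk, walk_1 to walk_3 and walk_dispatch are invented names, not exchange2 code. Each clone has k fixed, so its loop bound is a compile-time constant and it calls the next clone directly instead of recursing.

```c
#include <assert.h>

#define MAX_K 3  /* toy depth limit; digits_2 in exchange2 uses k = 1..9 */

/* Generic recursive search: k is the recursion depth and sets the trip
   count, so knowing k at compile time would let the loop be folded. */
static long walk(int k) {
    if (k > MAX_K)
        return 1;                                  /* leaf */
    long total = 0;
    for (int choice = 0; choice <= k; choice++)    /* k + 1 choices */
        total += walk(k + 1);
    return total;
}

/* Hand-specialized clones, one per depth, in the spirit of
   "case(1) call digits_2_1(); ... case(9) call digits_2_9()". */
static long walk_3(void) {
    long total = 0;
    for (int choice = 0; choice <= 3; choice++)
        total += 1;                                /* walk(4) is a leaf */
    return total;
}
static long walk_2(void) {
    long total = 0;
    for (int choice = 0; choice <= 2; choice++)
        total += walk_3();
    return total;
}
static long walk_1(void) {
    long total = 0;
    for (int choice = 0; choice <= 1; choice++)
        total += walk_2();
    return total;
}

/* Entry point: known depths go to a clone, anything else falls back
   to the generic recursive version. */
static long walk_dispatch(int k) {
    assert(k >= 1);
    switch (k) {
    case 1: return walk_1();
    case 2: return walk_2();
    case 3: return walk_3();
    default: return walk(k);
    }
}
```

Both entry points return the same count; any speedup comes from what the compiler can do once k is constant inside each clone, which this toy does not measure.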

There are two possible ways to implement this kind of optimization:
1. Enable value-range profiling with probabilities, recording the input
parameter k's value range as [1, 9] with 90% probability and the anti-range
~[1, 9] with 10%; ipa-cp and ipa-sra could then do recursive constant
propagation for digits_2 to generate digits_2.constprop1, digits_2.constprop2,
etc. This would be a combined optimization of ipa-profile, ipa-cp and ipa-sra.
It would be complicated, as ipa-cp does not yet support recursive constant
propagation or propagation of by-reference parameters through arithmetic
operations (like *(&k)+1).
2. Alternatively, use an independent pass (I am not sure whether one already
exists in current GCC) to convert HOT recursive calls into non-recursive calls,
as in the manual conversion above; ipa-cp could then do constant propagation
as usual.
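The recursive constant propagation in option 1 can be modeled, very roughly, as walking the self-recursive edge k -> k + step from the constant at the external call site and stopping once k leaves the profiled range. This is a hypothetical toy, not GCC's ipa-cp lattice; collect_clone_values and its cap parameter (a stand-in for a clone-size budget) are invented for illustration. Values in the anti-range ~[1, 9] would keep using the original, unspecialized function.

```c
#include <assert.h>

/* Hypothetical model, not GCC's ipa-cp: the external call site passes the
   constant `seed` and the only self-recursive edge passes k + step.
   Collect the constants that would each get a specialized clone, stopping
   when k leaves [lo, hi] or the clone budget `cap` is exhausted. */
static int collect_clone_values(int seed, int step, int lo, int hi,
                                int cap, int *out) {
    assert(step != 0);
    int n = 0;
    for (int k = seed; k >= lo && k <= hi && n < cap; k += step)
        out[n++] = k;   /* a clone like digits_2.constprop<n> would fix k */
    return n;
}
```

With seed 1, step 1 and range [1, 9] this yields nine values, one clone per depth; a smaller cap truncates the chain, and the remaining depths fall back to the generic function.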

Any suggestions about this? Thanks.



> 
> This can be solved by --param inline-heuristics-hint-percent=600
> the current default of 1600 is way too high and I scheduled some benchmarks
> to tune it down but unfortunately our LNT benchmarking is down currently. (I
> would like to see it reduced to even lower value if polyhedron and SPEC
> testing is happy about that)
> 
> Generally it would be nice if inliner understood that inlining into self
> recursive functions on the path that is not going to recursion may be
> harmful. This we do not model and thus this works/does not work sort of
> randomly.
