https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92074
--- Comment #3 from Xiong Hu XS Luo <luoxhu at cn dot ibm.com> ---
(In reply to Jan Hubicka from comment #2)
> The regression is because we now inline covered into digits2:
>
> IPA function summary for digits_2/29 inlinable
>   global time: 1553.078985
>   self size: 1295
>   global size: 1295
>   min size: 0
>   self stack: 261
>   global stack: 261
>     size:981.000000, time:1505.442572
>     size:3.000000, time:1.999121, executed if:(not inlined)
>     size:0.500000, time:0.500000, executed if:(not inlined), nonconst
> if:(op0[ref offset: 0] changed) && (not inlined)
>     size:210.500000, time:27.456610, nonconst if:(op0[ref offset: 0]
> changed)
>     size:21.000000, time:3.795164, executed if:(op0[ref offset: 0] == 5)
>     size:6.000000, time:0.334389, executed if:(op0[ref offset: 0] != 8)
>     size:1.000000, time:0.033237, executed if:(op0[ref offset: 0] != 8),
> nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
>     size:66.000000, time:13.130882, executed if:(op0[ref offset: 0] == 8)
>   loop iterations:(op0[ref offset: 0] changed)
>   calls:
>     digits_2/29 function not considered for inlining
>       loop depth: 9 freq:0.03 size: 2 time: 11callee size:647 stack:261
>        predicate: (op0[ref offset: 0] != 8)
>        op0 is compile time invariant
>     covered.constprop/93 function not considered for inlining
>       loop depth: 9 freq:0.00 size: 4 time: 13callee size:214 stack:1472
>        predicate: (op0[ref offset: 0] == 8)
>        op0 is compile time invariant
>        op1 is compile time invariant
>
> digits_2 is quite deeply recursive and inlining quite expensive function
> "covered" does not help.

Hi Honza,

I am analyzing the recursive call digits_2(int k) in exchange2, which is not
relevant to the current PR. Sorry for the distraction.

In Fortran, k is passed by reference rather than by value. The new IPA-SRA
could perform SRA and convert it to pass by value with some workaround, but
ipa-sra runs after ipa-cp, so ipa-cp is not able to leverage the SRA results
in the WPA stage.

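To make the pass-by-reference point concrete, here is a minimal sketch in C
(used only as an analogue of the Fortran case). The names visit_by_ref and
visit_by_val are hypothetical stand-ins and are not taken from exchange2 or
from GCC. The first function has the shape a Fortran recursive call takes,
where the incremented value is stored to a temporary and its address is
passed on; the second is the by-value form that a pass such as IPA-SRA could,
under the assumptions above, turn it into.

```c
/* Hypothetical stand-in for digits_2: counts how many levels 1..9 are
   visited.  'k' is passed by reference, as a Fortran dummy argument is. */
int visit_by_ref(const int *k)
{
    if (*k > 9)
        return 0;
    /* The "*(&k)+1" shape: load through the pointer, add one, store the
       result in a temporary, then pass the temporary's address down. */
    int next = *k + 1;
    return 1 + visit_by_ref(&next);
}

/* The same recursion with 'k' passed by value.  Here the argument of the
   recursive call is a plain arithmetic expression on the parameter. */
int visit_by_val(int k)
{
    if (k > 9)
        return 0;
    return 1 + visit_by_val(k + 1);
}
```

Both functions compute the same result for any starting level; only the way
the argument reaches the callee differs, which is the part that matters for
constant propagation.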
As digits_2 consumes most of the run time, and the input parameter value
increases from 1 to 9, manually converting the recursive call into
non-recursive calls like:

    case(1)
      call digits_2_1();
    ...
    case(9)
      call digits_2_9();

improves performance by about 60%. So there may be possible ways to do this
kind of optimization:

1. Enable profiling with value range and probability, and record the value
   range of the input parameter k as [1, 9] 90%, ~[1, 9] 10%. Then ipa-cp and
   ipa-sra could do recursive constant propagation for digits_2 to generate
   digits_2.constprop1, digits_2.constprop2, etc. This would be a combined
   optimization of ipa-profile, ipa-cp and ipa-sra. It would be complicated,
   as ipa-cp does not yet support recursive constant propagation or
   propagation of pass-by-reference values with operands (like *(&k)+1).

2. Or use an independent pass (I am not sure whether one already exists in
   current GCC) to convert HOT recursive calls into non-recursive calls, as
   in the manual conversion, so that ipa-cp could then do constant
   propagation as usual.

Any suggestions about this, please? Thanks.

>
> This can be solved by --param inline-heuristics-hint-percent=600
> the current default of 1600 is way too high and I scheduled some benchmarks
> to tune it down but unfortunately our LNT benchmarking is down currently. (I
> would like to see it reduced to even lower value if polyhedron and SPEC
> testing is happy about that)
>
> Generally it would be nice if inliner understood that inlining into self
> recursive functions on the path that is not going to recursion may be
> harmful. This we do not model and thus this works/does not work sort of
> randomly.
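
The manual conversion described in the reply (one specialized copy per
recursion level, selected by a case dispatch) can be sketched in C as follows.
This is an illustration only: the depth is reduced from 9 to 3, and work(),
digits_2_generic() and digits_2_dispatch() are hypothetical names that do not
appear in exchange2.

```c
/* Hypothetical per-level work; stands in for the real body of digits_2. */
static int work(int level)
{
    return level * level;
}

/* Generic recursive form: the level is only known at run time. */
int digits_2_generic(int k)
{
    if (k > 3)
        return 0;
    return work(k) + digits_2_generic(k + 1);
}

/* One specialized copy per level.  Each copy sees its level as a
   compile-time constant and calls the next copy directly. */
int digits_2_3(void) { return work(3); }
int digits_2_2(void) { return work(2) + digits_2_3(); }
int digits_2_1(void) { return work(1) + digits_2_2(); }

/* Dispatcher corresponding to the case(1) ... case(9) selection; values
   outside the specialized range fall back to the generic version. */
int digits_2_dispatch(int k)
{
    switch (k) {
    case 1:  return digits_2_1();
    case 2:  return digits_2_2();
    case 3:  return digits_2_3();
    default: return digits_2_generic(k);
    }
}
```

The specialized chain contains no self-recursive call, so each copy can be
optimized for its own constant level, which is roughly what the proposed
digits_2.constprop1, digits_2.constprop2, ... clones would achieve.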