https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> --- On Wed, 28 Feb 2024, liuhongt at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 > > --- Comment #14 from Hongtao Liu <liuhongt at gcc dot gnu.org> --- > (In reply to rguent...@suse.de from comment #13) > > On Tue, 27 Feb 2024, liuhongt at gcc dot gnu.org wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325 > > > > > > --- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> --- > > > > > > > Loop body is likely going to simplify further, this is difficult > > > > to guess, we just decrease the result by 1/3. */ > > > > > > > > > > This is introduced by r0-68074-g91a01f21abfe19 > > > > > > /* Estimate number of insns of completely unrolled loop. We assume > > > + that the size of the unrolled loop is decreased in the > > > + following way (the numbers of insns are based on what > > > + estimate_num_insns returns for appropriate statements): > > > + > > > + 1) exit condition gets removed (2 insns) > > > + 2) increment of the control variable gets removed (2 insns) > > > + 3) All remaining statements are likely to get simplified > > > + due to constant propagation. Hard to estimate; just > > > + as a heuristics we decrease the rest by 1/3. > > > + > > > + NINSNS is the number of insns in the loop before unrolling. > > > + NUNROLL is the number of times the loop is unrolled. */ > > > + > > > +static unsigned HOST_WIDE_INT > > > +estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns, > > > + unsigned HOST_WIDE_INT nunroll) > > > +{ > > > + HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3; > > > + if (unr_insns <= 0) > > > + unr_insns = 1; > > > + unr_insns *= (nunroll + 1); > > > + > > > + return unr_insns; > > > +} > > > > > > And r0-93444-g08f1af2ed022e0 try do it more accurately by marking > > > likely_eliminated stmt and minus that from total insns, But 2 / 3 is still > > > keeped. > > > > > > +/* Estimate number of insns of completely unrolled loop. > > > + It is (NUNROLL + 1) * size of loop body with taking into account > > > + the fact that in last copy everything after exit conditional > > > + is dead and that some instructions will be eliminated after > > > + peeling. > > > > > > - NINSNS is the number of insns in the loop before unrolling. > > > - NUNROLL is the number of times the loop is unrolled. */ > > > + Loop body is likely going to simplify futher, this is difficult > > > + to guess, we just decrease the result by 1/3. */ > > > > > > static unsigned HOST_WIDE_INT > > > -estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns, > > > +estimated_unrolled_size (struct loop_size *size, > > > unsigned HOST_WIDE_INT nunroll) > > > { > > > - HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3; > > > + HOST_WIDE_INT unr_insns = ((nunroll) > > > + * (HOST_WIDE_INT) (size->overall > > > + - > > > size->eliminated_by_peeling)); > > > + if (!nunroll) > > > + unr_insns = 0; > > > + unr_insns += size->last_iteration - > > > size->last_iteration_eliminated_by_peeling; > > > + > > > + unr_insns = unr_insns * 2 / 3; > > > if (unr_insns <= 0) > > > unr_insns = 1; > > > - unr_insns *= (nunroll + 1); > > > > > > It looks to me 1 / 3 overestimates the instructions that can be optimised > > > away, > > > especially if we've subtracted eliminated_by_peeling > > > > Yes, that 1/3 reduction is a bit odd - you could have the same effect > > by increasing the instruction limit by 1/3, but that means it doesn't > > really matter, does it? It would be interesting to see if increasing > > the limit by 1/3 and removing the above is neutral on SPEC? > > Remove 1/3 reduction get ~2% improvement for 525.x264_r on SPR with > -march=native -O3, no big impact on other integer benchmark. 454.calculix was always the benchmark to cross check as that benefits from much unrolling. I'm all for removing the 1/3 for innermost loop handling (in cunroll the unrolled loop is then innermost). I'm more concerned about unrolling more than one level which is exactly what's required for 454.calculix.