https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325

--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 28 Feb 2024, liuhongt at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
> 
> --- Comment #14 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to rguent...@suse.de from comment #13)
> > On Tue, 27 Feb 2024, liuhongt at gcc dot gnu.org wrote:
> > 
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325
> > > 
> > > --- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> > > 
> > > >    Loop body is likely going to simplify further, this is difficult
> > > >    to guess, we just decrease the result by 1/3.  */
> > > > 
> > > 
> > > This is introduced by r0-68074-g91a01f21abfe19
> > > 
> > > /* Estimate number of insns of completely unrolled loop.  We assume
> > > +   that the size of the unrolled loop is decreased in the
> > > +   following way (the numbers of insns are based on what
> > > +   estimate_num_insns returns for appropriate statements):
> > > +
> > > +   1) exit condition gets removed (2 insns)
> > > +   2) increment of the control variable gets removed (2 insns)
> > > +   3) All remaining statements are likely to get simplified
> > > +      due to constant propagation.  Hard to estimate; just
> > > +      as a heuristics we decrease the rest by 1/3.
> > > +
> > > +   NINSNS is the number of insns in the loop before unrolling.
> > > +   NUNROLL is the number of times the loop is unrolled.  */
> > > +
> > > +static unsigned HOST_WIDE_INT
> > > +estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
> > > +                        unsigned HOST_WIDE_INT nunroll)
> > > +{
> > > +  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
> > > +  if (unr_insns <= 0)
> > > +    unr_insns = 1;
> > > +  unr_insns *= (nunroll + 1);
> > > +
> > > +  return unr_insns;
> > > +}
> > > 
> > > And r0-93444-g08f1af2ed022e0 try do it more accurately by marking
> > > likely_eliminated stmt and minus that from total insns, But 2 / 3 is still
> > > keeped.
> > > 
> > > +/* Estimate number of insns of completely unrolled loop.
> > > +   It is (NUNROLL + 1) * size of loop body with taking into account
> > > +   the fact that in last copy everything after exit conditional
> > > +   is dead and that some instructions will be eliminated after
> > > +   peeling.
> > > 
> > > -   NINSNS is the number of insns in the loop before unrolling.
> > > -   NUNROLL is the number of times the loop is unrolled.  */
> > > +   Loop body is likely going to simplify futher, this is difficult
> > > +   to guess, we just decrease the result by 1/3.  */
> > > 
> > >  static unsigned HOST_WIDE_INT
> > > -estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
> > > +estimated_unrolled_size (struct loop_size *size,
> > >                          unsigned HOST_WIDE_INT nunroll)
> > >  {
> > > -  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
> > > +  HOST_WIDE_INT unr_insns = ((nunroll)
> > > +                            * (HOST_WIDE_INT) (size->overall
> > > +                                               -
> > > size->eliminated_by_peeling));
> > > +  if (!nunroll)
> > > +    unr_insns = 0;
> > > +  unr_insns += size->last_iteration -
> > > size->last_iteration_eliminated_by_peeling;
> > > +
> > > +  unr_insns = unr_insns * 2 / 3;
> > >    if (unr_insns <= 0)
> > >      unr_insns = 1;
> > > -  unr_insns *= (nunroll + 1);
> > > 
> > > It looks to me 1 / 3 overestimates the instructions that can be optimised 
> > > away,
> > > especially if we've subtracted eliminated_by_peeling
> > 
> > Yes, that 1/3 reduction is a bit odd - you could have the same effect
> > by increasing the instruction limit by 1/3, but that means it doesn't
> > really matter, does it?  It would be interesting to see if increasing
> > the limit by 1/3 and removing the above is neutral on SPEC?
> 
> Remove 1/3 reduction get ~2% improvement for 525.x264_r on SPR with
> -march=native -O3, no big impact on other integer benchmark.

454.calculix was always the benchmark to cross check as that benefits
from much unrolling.

I'm all for removing the 1/3 for innermost loop handling (in cunroll
the unrolled loop is then innermost).  I'm more concerned about
unrolling more than one level which is exactly what's required for
454.calculix.

Reply via email to