On Tue, Aug 25, 2020 at 8:47 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi Bin,
>
> >>
> >> For one particular case like:
> >>
> >>             for (i = 0; i < SIZE; i++)
> >>               y[i] = a * x[i] + z[i];
> >>
> >> we will mark reg_offset_p for IV candidates on x as below:
> >>    - (unsigned long) (x_18(D) + 8)    // only mark this before.
> >>    - x_18(D) + 8
> >>    - (unsigned long) (x_18(D) + 24)
> >>    - (unsigned long) ((vector(2) double *) (x_18(D) + 8) + 
> >> 18446744073709551600)
> >>    ...
> >>
> >> Do you mind to have a review again?  Thanks in advance!
> > I trust you with the change.
>
> Thanks again!  Sorry for the late since it took some time to investigate
> the exposed issues.
>
> >>
> >> Bootstrapped/regtested on powerpc64le-linux-gnu P8 and P9.
> >>
> >> SPEC2017 P9 performance run has no remarkable degradations/improvements.
> > Is this run with unroll-loops?
>
> Yes, the options what I used were:
>
>    -Ofast -mcpu=power9 -fpeel-loops -mrecip -funroll-loops
>
> I also re-tested the newly attached patch, nothing changes for SPEC2017 data.
>
> > Could you exercise the code with unroll-loops enabled when
> > bootstrap/regtest please?  It doesn't matter if cases fail with
> > unroll-loops, just making sure there is no fallout.  Otherwise it's
> > fine with me.
>
> Great idea!  With explicitly specified -funroll-loops, it's bootstrapped
> but the regression testing did show one failure (the only one):
>
>   PASS->FAIL: gcc.dg/sms-4.c scan-rtl-dump-times sms "SMS succeeded" 1
>
> It exposes two issues:
>
> 1) Currently address_cost hook on rs6000 always return zero, but at least
> from Power7, pre_inc/pre_dec kind instructions are cracked, it means we
> have to take the address update into account (scalar normal operation).
> Since IVOPTs reduces the cost_step for ainc candidates, it makes us prefer
> ainc candidates.  In this case, the cand/group cost is -4 (minus cost_step),
> with scaling up, the off becomes much.  With one simple hack on for pre_inc/
> pre_dec in rs6000 address_cost, the case passed.  It should be handled in
> one separated issue.
>
> 2) This case makes me think we should exclude ainc candidates in function
> mark_reg_offset_candidates.  The justification is that: ainc candidate
> handles step update itself and when we calculate the cost for it against
> its ainc_use, the cost_step has been reduced. When unrolling happens,
> the ainc computations are replicated and it doesn't save step updates
> like normal reg_offset_p candidates.
Though auto-inc candidate embeds stepping operation into memory
access, we might want to avoid it in case of unroll if there are many
sequences of memory accesses, and if the unroll factor is big.  The
rationale is embedded stepping is a u-arch operation and does have its
cost.

>
> I've updated the patch to punt ainc_use candidates as below:
>
> > +         /* Skip AINC candidate since it contains address update itself,
> > +            the replicated AINC computations when unrolling still have
> > +            updates, unlike reg_offset_p candidates can save step updates
> > +            when unrolling.  */
> > +         if (cand->ainc_use)
> > +           continue;
> > +
>
> For this new attached patch, it's bootstrapped and regress-tested without
> explicit unrolling, while the only one failure has been identified as
> rs6000 specific address_cost issue in bootstrapping and regression testing
> with explicit unrolling.
>
> By the way, with above simple hack of address_cost, I also did one
> bootstrapping and regression testing with explicit unrolling, the above
> sms-4.c did pass as I expected but had two failures instead:
>
>   PASS->FAIL: gcc.dg/sms-compare-debug-1.c (test for excess errors)
>   PASS->FAIL: gcc.dg/tree-ssa/ivopts-lt.c scan-tree-dump-times ivopts "PHI" 2
>
> By further investigation, the 2nd one is expected due to the adddress_cost 
> hook
> hack, while the 1st one one exposed -fcompare-debug issue in sms.  The RTL
> sequence starts to off from sms, just some NOTE_INSN_DELETED positions change.
> I believe it's just exposed by this combination unluckily/luckily ;-) I will
> send a patch separately for it once I got time to look into it, but it should
> be unrelated to this patch series for sure.
This is the kind of situation I intended to avoid before.  IMHO, this
isn't a neat change (it can't be given we are predicting the future
transformation in compilation pipeline), accumulation of such changes
could make IVOPTs break in one way or another.  So as long as you make
sure it doesn't have functional impact in case of no-rtl_unroll, I am
fine.
>
> In a word, bootstrapping/regress-testing with unroll-loops enabled shows this
> patch looks fine.
>
> BR,
> Kewen
> -----
> gcc/ChangeLog:
>
>         * tree-ssa-loop-ivopts.c (struct iv_cand): New field reg_offset_p.
>         (struct ivopts_data): New field consider_reg_offset_for_unroll_p.
>         (mark_reg_offset_candidates): New function.
>         (add_candidate_1): Set reg_offset_p to false for new candidate.
>         (set_group_iv_cost): Scale up group cost with estimate_unroll_factor 
> if
>         consider_reg_offset_for_unroll_p.
>         (determine_iv_cost): Increase step cost with estimate_unroll_factor if
>         consider_reg_offset_for_unroll_p.
>         (tree_ssa_iv_optimize_loop): Call estimate_unroll_factor, update
>         consider_reg_offset_for_unroll_p.

Reply via email to