On Thu, Oct 12, 2017 at 12:13 PM, Richard Biener <rguent...@suse.de> wrote:
> On Thu, 12 Oct 2017, Bin.Cheng wrote:
>
>> On Wed, Oct 11, 2017 at 3:43 PM, Richard Biener <rguent...@suse.de> wrote:
>> >
>> > For PR82355 I introduced a fake dimension to ISL to allow CHRECs
>> > having an evolution in a loop that isn't fully part of the SESE
>> > region we are processing.  That was easier than fending off those
>> > CHRECs (without simply giving up on SESE regions with those).
>> >
>> > But it didn't fully solve the issue as PR82451 shows where we run
>> > into the issue that we eventually have to code-gen those
>> > evolutions and thus in theory need a canonical IV of that containing loop.
>> >
>> > So I decided (after Micha pressuring me a bit...) to revisit the
>> > original issue and make SCEV analysis "properly" handle SE regions.
>> > It turns out that it is mostly instantiate_scev lacking proper support
>> > plus the necessary interfacing change (really just cosmetic in some sense)
>> > from a instantiate_before basic-block to a instantiate_before edge.
>> >
>> > data-ref interfaces have been similarly adjusted, here changing
>> > the "loop nest" loop parameter to the entry edge for the SE region
>> > and passing that down accordingly.
>> >
>> > I've for now tried to keep other high-level loop-based interfaces the
>> > same by simply using the loop preheader edge as entry where appropriate
>> > (needing loop_preheader_edge cope with the loop root tree for simplicity).
>> >
>> > In the process I ran into issues with us too overly aggressive
>> > instantiating random trees and thus I cut those down.  That part
>> > doesn't successfully test separately (when I remove the strange
>> > ARRAY_REF instantiation), so it's part of this patch.  I've also
>> > run into an SSA verification fail (the id-27.f90 testcase) which
>> > shows we _do_ need to clear the SCEV cache after introducing
>> > the versioned CFG (and added a comment before it).
>> >
>> > On the previously failing testcases I've verified we produce
>> > sensible instantiations for those pesky refs residing in "no" loop
>> > in the SCOP and that we get away with the result in terms of
>> > optimizing.
>> >
>> > SPEC 2k6 testing shows
>> >
>> > loop nest optimized: 311
>> > loop nest not optimized, code generation error: 0
>> > loop nest not optimized, optimized schedule is identical to original
>> > schedule: 173
>> > loop nest not optimized, optimization timed out: 59
>> > loop nest not optimized, ISL signalled an error: 9
>> > loop nest: 552
>> >
>> > for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity
>> > still reveals some codegen errors:
>> >
>> > loop nest optimized: 437
>> > loop nest not optimized, code generation error: 25
>> > loop nest not optimized, optimized schedule is identical to original
>> > schedule: 169
>> > loop nest not optimized, optimization timed out: 60
>> > loop nest not optimized, ISL signalled an error: 9
>> > loop nest: 700
>> >
>> > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu
>> > (with and without -fgraphite-identity -floop-nest-optimize).
>> >
>> > Ok?
>> >
>> > Thanks,
>> > Richard.
>> >
>>
>> > Index: gcc/tree-scalar-evolution.c
>> > ===================================================================
>> > --- gcc/tree-scalar-evolution.c (revision 253645)
>> > +++ gcc/tree-scalar-evolution.c (working copy)
>> > @@ -2344,7 +2348,7 @@ static tree instantiate_scev_r (basic_bl
>> >     instantiated, and to stop if it exceeds some limit.  */
>> >
>> >  static tree
>> > -instantiate_scev_name (basic_block instantiate_below,
>> > +instantiate_scev_name (edge instantiate_below,
>> >                        struct loop *evolution_loop, struct loop 
>> > *inner_loop,
>> >                        tree chrec,
>> >                        bool *fold_conversions,
>> > @@ -2358,7 +2362,7 @@ instantiate_scev_name (basic_block insta
>> >       evolutions in outer loops), nothing to do.  */
>> >    if (!def_bb
>> >        || loop_depth (def_bb->loop_father) == 0
>> > -      || dominated_by_p (CDI_DOMINATORS, instantiate_below, def_bb))
>> > +      || ! dominated_by_p (CDI_DOMINATORS, def_bb, 
>> > instantiate_below->dest))
>> >      return chrec;
>> >
>> >    /* We cache the value of instantiated variable to avoid exponential
>> > @@ -2380,6 +2384,51 @@ instantiate_scev_name (basic_block insta
>> >
>> >    def_loop = find_common_loop (evolution_loop, def_bb->loop_father);
>> >
>> > +  if (! dominated_by_p (CDI_DOMINATORS,
>> > +                       def_loop->header, instantiate_below->dest))
>> > +    {
>> > +      gimple *def = SSA_NAME_DEF_STMT (chrec);
>> > +      if (gassign *ass = dyn_cast <gassign *> (def))
>> > +       {
>> > +         switch (gimple_assign_rhs_class (ass))
>> > +           {
>> > +           case GIMPLE_UNARY_RHS:
>> > +             {
>> > +               tree op0 = instantiate_scev_r (instantiate_below, 
>> > evolution_loop,
>> > +                                              inner_loop, 
>> > gimple_assign_rhs1 (ass),
>> > +                                              fold_conversions, 
>> > size_expr);
>> > +               if (op0 == chrec_dont_know)
>> > +                 return chrec_dont_know;
>> > +               res = fold_build1 (gimple_assign_rhs_code (ass),
>> > +                                  TREE_TYPE (chrec), op0);
>> > +               break;
>> > +             }
>> > +           case GIMPLE_BINARY_RHS:
>> > +             {
>> > +               tree op0 = instantiate_scev_r (instantiate_below, 
>> > evolution_loop,
>> > +                                              inner_loop, 
>> > gimple_assign_rhs1 (ass),
>> > +                                              fold_conversions, 
>> > size_expr);
>> > +               if (op0 == chrec_dont_know)
>> > +                 return chrec_dont_know;
>> > +               tree op1 = instantiate_scev_r (instantiate_below, 
>> > evolution_loop,
>> > +                                              inner_loop, 
>> > gimple_assign_rhs2 (ass),
>> > +                                              fold_conversions, 
>> > size_expr);
>> > +               if (op1 == chrec_dont_know)
>> > +                 return chrec_dont_know;
>> > +               res = fold_build2 (gimple_assign_rhs_code (ass),
>> > +                                  TREE_TYPE (chrec), op0, op1);
>> > +               break;
>> > +             }
>> > +           default:
>> > +             res = chrec_dont_know;
>> > +           }
>> > +       }
>> > +      else
>> > +       res = chrec_dont_know;
>> > +      global_cache->set (si, res);
>> > +      return res;
>> > +    }
>> > +
>> >    /* If the analysis yields a parametric chrec, instantiate the
>> >       result again.  */
>> >    res = analyze_scalar_evolution (def_loop, chrec);
>>
>> IIUC, after changing instantiate_scev_r from loop based to region
>> based, there are
>> two cases.  In one case, def_loop is dominated by instantiate_edge,
>> which we'd like
>> to analyze/instantiate scev wrto the new def_loop; In the other case,
>> def_loop is not
>> fully part of sese region, which we'd like to expand ssa_name wrto the
>> basic block
>> instantiate_edge->dest.  It's simply ssa expanding and no loop is involved.
>
> Note there can be still a loop involved if the SSA chain arrives at
> a DEF that is defined within a loop inbetween the current def and
> instantiate_edge->dest.  In that case we need to process the
> compute_overall_effect_of_inner_loop case.
>
>> So how about factor out above big if-statement into a function with name like
>> expand_scev_name (Other better names?).  The code like:
>>
>>   /* Some comment explaining the two cases in region based instantiation.  */
>>   if (dominated_by_p (CDI_DOMINATORS, def_loop->header,
>> instantiate_below->dest))
>>     res = analyze_scalar_evolution (def_loop, chrec);
>>   else
>>     res = expand_scev_name (instantiate_below, chrec);
>>
>> could be easier to read?
>
> Note it would be
>
>    else
>      {
>        res = expand_scev_name (instantiate_below, chrec);
>        global_cache->set (si, res);
>        return res;
>      }
>
> and expand_scev_name would still need to recurse via instantiate_scev.
> It isn't merely gathering all expressions up to instantiate_edge->dest
> from the stmts (see above).
I was thinking only do expand here and have last piece of code in function
instantiate_scev_name do the recursive call:

  else if (res != chrec_dont_know)
    {
      if (inner_loop
      && def_bb->loop_father != inner_loop
      && !flow_loop_nested_p (def_bb->loop_father, inner_loop))
    /* ???  We could try to compute the overall effect of the loop here.  */
    res = chrec_dont_know;
      else
    res = instantiate_scev_r (instantiate_below, evolution_loop,
                  inner_loop, res,
                  fold_conversions, size_expr);
    }

Not sure if it's feasible. We might need to stop expansion at either
instantiate_edge
or at in between loop.

Thanks,
bin
>
> But yes, factoring the above might be a good idea.  The above is also
> not 100% equivalent in capabilities as the rest of the instantiation
> machinery -- it lacks condition PHI handling and it fails to capture
> the loop-closed PHI handling (so the above mentioned case wouldn't
> be handled right now).
>
> I sent the patch out for comments early before fleshing out all
> those pesky details ;)  It's at least conservative correct
> right now.
>
> Richard.

Reply via email to