On Wed, Oct 11, 2017 at 3:43 PM, Richard Biener <rguent...@suse.de> wrote: > > For PR82355 I introduced a fake dimension to ISL to allow CHRECs > having an evolution in a loop that isn't fully part of the SESE > region we are processing. That was easier than fending off those > CHRECs (without simply giving up on SESE regions with those). > > But it didn't fully solve the issue as PR82451 shows where we run > into the issue that we eventually have to code-gen those > evolutions and thus in theory need a canonical IV of that containing loop. > > So I decided (after Micha pressuring me a bit...) to revisit the > original issue and make SCEV analysis "properly" handle SE regions. > It turns out that it is mostly instantiate_scev lacking proper support > plus the necessary interfacing change (really just cosmetic in some sense) > from a instantiate_before basic-block to a instantiate_before edge. > > data-ref interfaces have been similarly adjusted, here changing > the "loop nest" loop parameter to the entry edge for the SE region > and passing that down accordingly. > > I've for now tried to keep other high-level loop-based interfaces the > same by simply using the loop preheader edge as entry where appropriate > (needing loop_preheader_edge cope with the loop root tree for simplicity). > > In the process I ran into issues with us too overly aggressive > instantiating random trees and thus I cut those down. That part > doesn't successfully test separately (when I remove the strange > ARRAY_REF instantiation), so it's part of this patch. I've also > run into an SSA verification fail (the id-27.f90 testcase) which > shows we _do_ need to clear the SCEV cache after introducing > the versioned CFG (and added a comment before it). > > On the previously failing testcases I've verified we produce > sensible instantiations for those pesky refs residing in "no" loop > in the SCOP and that we get away with the result in terms of > optimizing. > > SPEC 2k6 testing shows > > loop nest optimized: 311 > loop nest not optimized, code generation error: 0 > loop nest not optimized, optimized schedule is identical to original > schedule: 173 > loop nest not optimized, optimization timed out: 59 > loop nest not optimized, ISL signalled an error: 9 > loop nest: 552 > > for SPEC 2k6 and -floop-nest-optimize while adding -fgraphite-identity > still reveals some codegen errors: > > loop nest optimized: 437 > loop nest not optimized, code generation error: 25 > loop nest not optimized, optimized schedule is identical to original > schedule: 169 > loop nest not optimized, optimization timed out: 60 > loop nest not optimized, ISL signalled an error: 9 > loop nest: 700 > > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu > (with and without -fgraphite-identity -floop-nest-optimize). > > Ok? > > Thanks, > Richard. >
> Index: gcc/tree-scalar-evolution.c > =================================================================== > --- gcc/tree-scalar-evolution.c (revision 253645) > +++ gcc/tree-scalar-evolution.c (working copy) > @@ -2344,7 +2348,7 @@ static tree instantiate_scev_r (basic_bl > instantiated, and to stop if it exceeds some limit. */ > > static tree > -instantiate_scev_name (basic_block instantiate_below, > +instantiate_scev_name (edge instantiate_below, > struct loop *evolution_loop, struct loop *inner_loop, > tree chrec, > bool *fold_conversions, > @@ -2358,7 +2362,7 @@ instantiate_scev_name (basic_block insta > evolutions in outer loops), nothing to do. */ > if (!def_bb > || loop_depth (def_bb->loop_father) == 0 > - || dominated_by_p (CDI_DOMINATORS, instantiate_below, def_bb)) > + || ! dominated_by_p (CDI_DOMINATORS, def_bb, instantiate_below->dest)) > return chrec; > > /* We cache the value of instantiated variable to avoid exponential > @@ -2380,6 +2384,51 @@ instantiate_scev_name (basic_block insta > > def_loop = find_common_loop (evolution_loop, def_bb->loop_father); > > + if (! dominated_by_p (CDI_DOMINATORS, > + def_loop->header, instantiate_below->dest)) > + { > + gimple *def = SSA_NAME_DEF_STMT (chrec); > + if (gassign *ass = dyn_cast <gassign *> (def)) > + { > + switch (gimple_assign_rhs_class (ass)) > + { > + case GIMPLE_UNARY_RHS: > + { > + tree op0 = instantiate_scev_r (instantiate_below, > evolution_loop, > + inner_loop, gimple_assign_rhs1 > (ass), > + fold_conversions, size_expr); > + if (op0 == chrec_dont_know) > + return chrec_dont_know; > + res = fold_build1 (gimple_assign_rhs_code (ass), > + TREE_TYPE (chrec), op0); > + break; > + } > + case GIMPLE_BINARY_RHS: > + { > + tree op0 = instantiate_scev_r (instantiate_below, > evolution_loop, > + inner_loop, gimple_assign_rhs1 > (ass), > + fold_conversions, size_expr); > + if (op0 == chrec_dont_know) > + return chrec_dont_know; > + tree op1 = instantiate_scev_r (instantiate_below, > evolution_loop, > + inner_loop, gimple_assign_rhs2 > (ass), > + fold_conversions, size_expr); > + if (op1 == chrec_dont_know) > + return chrec_dont_know; > + res = fold_build2 (gimple_assign_rhs_code (ass), > + TREE_TYPE (chrec), op0, op1); > + break; > + } > + default: > + res = chrec_dont_know; > + } > + } > + else > + res = chrec_dont_know; > + global_cache->set (si, res); > + return res; > + } > + > /* If the analysis yields a parametric chrec, instantiate the > result again. */ > res = analyze_scalar_evolution (def_loop, chrec); IIUC, after changing instantiate_scev_r from loop based to region based, there are two cases. In one case, def_loop is dominated by instantiate_edge, which we'd like to analyze/instantiate scev wrto the new def_loop; In the other case, def_loop is not fully part of sese region, which we'd like to expand ssa_name wrto the basic block instantiate_edge->dest. It's simply ssa expanding and no loop is involved. So how about factor out above big if-statement into a function with name like expand_scev_name (Other better names?). The code like: /* Some comment explaining the two cases in region based instantiation. */ if (dominated_by_p (CDI_DOMINATORS, def_loop->header, instantiate_below->dest)) res = analyze_scalar_evolution (def_loop, chrec); else res = expand_scev_name (instantiate_below, chrec); could be easier to read? Thanks, bin