Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

Richard Biener via Gcc-patches Thu, 02 Sep 2021 03:37:59 -0700

On Thu, 2 Sep 2021, Xionghu Luo wrote:

> 
> 
> On 2021/9/2 16:50, Richard Biener wrote:
> > On Thu, 2 Sep 2021, Richard Biener wrote:
> > 
> >> On Thu, 2 Sep 2021, Xionghu Luo wrote:
> >>
> >>>
> >>>
> >>> On 2021/9/1 17:58, Richard Biener wrote:
> >>>> This fixes the CFG walk order of fill_always_executed_in to use
> >>>> RPO oder rather than the dominator based order computed by
> >>>> get_loop_body_in_dom_order.  That fixes correctness issues with
> >>>> unordered dominator children.
> >>>>
> >>>> The RPO order computed by rev_post_order_and_mark_dfs_back_seme in
> >>>> its for-iteration mode is a good match for the algorithm.
> >>>>
> >>>> Xionghu, I've tried to only fix the CFG walk order issue and not
> >>>> change anything else with this so we have a more correct base
> >>>> to work against.  The code still walks inner loop bodies
> >>>> up to loop depth times and thus is quadratic in the loop depth.
> >>>>
> >>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't
> >>>> have any comments I plan to push this and then revisit what we
> >>>> were circling around.
> >>>
> >>> LGTM, thanks.
> >>
> >> I pushed it, thought again in the attempt to build a testcase and
> >> concluded I was wrong with the appearant mishandling of
> >> contains_call - get_loop_body_in_dom_order seems to be exactly
> >> correct for this specific case.  So I reverted the commit again.
> > 
> > And I figured what the
> > 
> >                /* In a loop that is always entered we may proceed anyway.
> >                   But record that we entered it and stop once we leave it.
> > */
> > 
> > comment was about.  The code was present before the fix for PR78185
> > and it was supposed to catch the case where the entered inner loop
> > is not finite.  Just as the testcase from PR78185 shows the
> > stopping was done too late when the exit block was already marked
> > as to be always executed.  A simpler fix for PR78185 would have been
> > to move
> > 
> >            if (!flow_bb_inside_loop_p (inn_loop, bb))
> >              break;
> > 
> > before setting of last = bb.  In fact the installed fix was more
> > pessimistic than that given it terminated already when entering
> > a possibly infinite loop.  So we can improve that by doing
> > sth like which should also improve the situation for some of
> > the cases you were looking at?
> > 
> > What remains is that we continue to stop when entering a
> > not always executed loop:
> > 
> >            if (bb->loop_father->header == bb)
> >              {
> >                if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >                  break;
> 
> Yes.  This will cause blocks after inner loop missed to be check
> if they are actually ALWAYS_EXECUTED.   I am afraid O(N^2) is 
> inevitable here...


Yes.  What we can try is pre-computing whether a loop
has a call or an inner loop that might not terminate and then
when that's true for the loop to be entered continue to break;
but when not, skip processing that loop blocks (but we still
fill the blocks array, and we do need to do this in the order
for the loop we're processing ...).

So what I was thinking was to somehow embed the dominator
walk of get_loop_body_in_dom_order and instead of pre-recording
the above info (call, infinite loop) for loops, pre-record
it on the dominator tree so that we can ask "in any of our
dominator children, is there a call or an infinite loop" and
thus cut the dominator walk at loop header blocks that are
not dominating the outer loop latch ...

Of course the simplistic solution might be to simply do

   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)
       && ((loop_depth (bb->loop_father) - loop_depth (loop))
           > param_max_lim_loop_depth_lookahead)))
     break;

and thus limit the processing of conditionally executed inner
loops by relative depth ... as you say the actual processing
is unlikely to be the bottleneck for the degenerate cases
of a very deep nest of conditionally executed loops.

But still for this case get_loop_body_in_dom_order is
doing quadratic processing so we can also say that
another linear walk over the produced array does not
increase complexity.

> > 
> > that I can at this point only explain by possible efficiency
> > concerns?  Any better idea on that one?
> From experiment, early break from inner loop seems not cost shorter
> time than full inner loop walk.  I will take more precise
> measurement and larger data set on the function fill_always_executed_in_1
> if necessary.   
> 
> My previous v2 patch also tried to update inn_loop level by level
> when exiting from inn_loops, but it is proved to be  unnecessary
> but you worried about the dominance order by get_loop_body_in_dom_order.
> 
> > 
> > I'm going to test the patch below which improves the situation for
> > 
> > volatile int flag, bar;
> > double foo (double *valp)
> > {
> >    double sum = 0;
> >    for (int i = 0; i < 256; ++i)
> >      {
> >        for (int j = 0; j < 256; ++j)
> >          bar = flag;
> >        if (flag)
> >          sum += 1.;
> >        sum += *valp;
> >      }
> >    return sum;
> > }
> 
> The patch still fails to handle cases like this:
> 
> 
> struct X { int i; int j; int k;};
> volatile int m;
> 
> void bar (struct X *x, int n, int l, int k)
> {
>   for (int i = 0; i < l; i++)
>     {
>      if (k)
>         for (int j = 0; j < l; j++)
>           {
>             if (m)
>               x->i = m;
>             else
>               x->i = 1 - m;
> 
>             int *r = &x->k;
>             int tem2 = *r;
>             x->k += tem2 * j;
>         }
> 
>       x->i = m;
>     }
> }
> 
> x->i is still not marked ALWAYS_EXECUTED for outer loop.

Yes.

Richard.

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

Reply via email to