https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155

--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 23 Mar 2017, thopre01 at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
> 
> --- Comment #15 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> ---
> (In reply to Thomas Preud'homme from comment #14)
> > (In reply to rguent...@suse.de from comment #13)
> > > On Thu, 23 Mar 2017, thopre01 at gcc dot gnu.org wrote:
> > > 
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80155
> > > > 
> > > > --- Comment #12 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> 
> > > > ---
> > > > (In reply to Richard Biener from comment #9)
> > > > > Ah, the patches do not fix the testcase because the testcase is _not_ 
> > > > > the
> > > > > PRE-creates-IV case.  It's indeed simply hoisting/PRE at work 
> > > > > transforming
> > > > > 
> > > > >   # a_14 = PHI <a_10(, a_5(D)>
> > > > >   if (!b)
> > > > >     a_8 = a_14 + 1;
> > > > > 
> > > > >   # a_2 = PHI <a_14(10), a_8(4)>
> > > > >   a_10 = a_2 + 1;
> > > > >   ... = *(a_2 + 1);
> > > > > 
> > > > > to
> > > > > 
> > > > >   # a_14 = PHI <prephimp_12, a_5(D)>
> > > > >   _4 = a_14 + 1;
> > > > >   if (b)
> > > > >     _3 = _4 + 1;
> > > > > 
> > > > >   # a_2 = PHI <a_14, _4>
> > > > >   # prephitmp_12 = PHI <_4, _3>
> > > > >   ... = *(a_2 + 1);
> > > > > 
> > > > > increasing register pressure mainly because nothing figures that a_2 
> > > > > + 1
> > > > > in the dereference can be replaced by prephitmp_12 ...
> > > > > 
> > > > > So this is a missed SLSR opportunity or, in this simple form, a missed
> > > > > PRE/CSE opportunity.  Fixing that with the following (otherwise 
> > > > > untested)
> > > > > restores good code generation for the testcase:
> > > > > 
> > > > > Index: gcc/tree-ssa-pre.c
> > > > > ===================================================================
> > > > > --- gcc/tree-ssa-pre.c  (revision 246414)
> > > > > +++ gcc/tree-ssa-pre.c  (working copy)
> > > > > @@ -4636,6 +4610,35 @@ eliminate_dom_walker::before_dom_childre
> > > > >             }
> > > > >         }
> > > > >  
> > > > > +      if (gimple_has_ops (stmt))
> > > > > +       for (unsigned i = 0; i < gimple_num_ops (stmt); ++i)
> > > > > +         {
> > > > > +           tree op = gimple_op (stmt, i);
> > > > > +           if (op)
> > > > > +             op = get_base_address (op);
> > > > > +           if (op
> > > > > +               && TREE_CODE (op) == MEM_REF
> > > > > +               && ! integer_zerop (TREE_OPERAND (op, 1)))
> > > > > +             {
> > > > > +               tree ops[2];
> > > > > +               vn_nary_op_t nary;
> > > > > +               ops[0] = TREE_OPERAND (op, 0);
> > > > > +               ops[1] = TREE_OPERAND (op, 1);
> > > > > +               tree res = vn_nary_op_lookup_pieces (2, 
> > > > > POINTER_PLUS_EXPR,
> > > > > +                                                    TREE_TYPE 
> > > > > (ops[0]),
> > > > > +                                                    ops, &nary);
> > > > > +               if (res && TREE_CODE (res) == SSA_NAME)
> > > > > +                 res = eliminate_avail (res);
> > > > > +               if (res)
> > > > > +                 {
> > > > > +                   TREE_OPERAND (op, 0) = res;
> > > > > +                   TREE_OPERAND (op, 1)
> > > > > +                     = build_int_cst (TREE_TYPE (TREE_OPERAND (op, 
> > > > > 1)), 0);
> > > > > +                   gimple_set_modified (stmt, true);
> > > 
> > > Add
> > > 
> > >                     if (TREE_CODE (res) == SSA_NAME
> > >                         && !is_gimple_debug (stmt))
> > >                       gimple_set_plf (SSA_NAME_DEF_STMT (res),
> > >                                       NECESSARY, true);
> > > 
> > > here.
> > > 
> > > > > +                 }
> > > > > +             }
> > > > > +         }
> > > > > +
> > > > >        if (gimple_modified_p (stmt))
> > > > >         {
> > > > >           /* If a formerly non-invariant ADDR_EXPR is turned into an
> > > > > 
> > > > > note that in general optimzing
> > > > > 
> > > > >    q = p + 1;
> > > > >      = ...*(p + 1);
> > > > > 
> > > > > "back" to *q will be undone by forwprop/stmt folding later but in this
> > > > > case the feeding stmt is a PHI node and not a pointer-plus.  It still
> > > > > means that the change might be a bit too disruptive at this point
> > > > > (we could restricit it a bit by only handling the case where we don't
> > > > > replace with a pointer-plus result).
> > > > 
> > > > Thanks for your work on this! Sadly GCC ICEs with this patch:
> > > > 
> > > > 0xd36f53 update_dep_bb
> > > >         gcc/tree-ssa-tail-merge.c:411
> > > > 0xd38f54 stmt_update_dep_bb
> > > >         gcc/tree-ssa-tail-merge.c:429
> > > > 0xd38f54 same_succ_hash
> > > >         gcc/tree-ssa-tail-merge.c:452
> > > > 0xd38f54 find_same_succ_bb
> > > >         gcc/tree-ssa-tail-merge.c:717
> > > > 0xd39927 find_same_succ
> > > >         gcc/tree-ssa-tail-merge.c:748
> > > > 0xd39927 init_worklist
> > > >         gcc/tree-ssa-tail-merge.c:767
> > > > 0xd39927 tail_merge_optimize(unsigned int)
> > > >         gcc/tree-ssa-tail-merge.c:1726
> > > > 0xce2d6a execute
> > > >         gcc/tree-ssa-pre.c:5164
> > > > 
> > > >
> > 
> > There's progress. Performance is improved but not as much as disabling code
> > hoisting. I'll try to reduce the testcase again with that patch to see if I
> > can find a testcase that expose all issues.
> 
> Funnily this led back to the Cortex-M0+ reduced testcase. With the patch in
> comment #13 applied we can still see a difference in the push (one register
> pushed Vs 0).

I can't reproduce zero pushes here I get three with/without 
-fno-code-hoisting.  code hoisting hoists the two loads inside
the switch before it so we have

  pretmp_36 = MEM[(char *)b_15 + 1B];
  switch (a.1_14) <default: <L2> [50.00%], case 0: <L1> [50.00%]>

<L1> [42.78%]:
  if (pretmp_36 != 0)
    goto <bb 6>; [92.51%]
  else
    goto <bb 9>; [7.49%]
...

<L2> [42.82%]:
  if (pretmp_36 != 0)
    goto <bb 13>; [92.49%]
  else
    goto <bb 11>; [7.51%]
...

that seems to be an opportunity for threading to f*ck things up.
Though to me the final assembly looks better than w/o code-hoisting.

Reply via email to