RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

JiangNing OS Thu, 20 Jun 2019 02:53:35 -0700

Hi Jeff,

Appreciate your effort to review my patch! I've updated my patch as attached. 
See my answers below.


> in current function, so the store speculation can be avoided.
> So at a high level should we be doing this in gimple rather than RTL?
> We're going to have a lot more information about types, better
> infrastructure for looking at uses/defs, access to the alias oracle, we should
> be able to accurately distinguish between potentially shared objects vs those
> which are local to the thread, etc.  We lose the low level costing information
> though.
> 
> I'm still going to go through the patch and do some level of review, but I do
> think we need to answer the higher level question though.
> 
I have the following reasons,

1) Following the clue Richard B gave me before about parameter --param 
allow-store-data-races,
I did check the middle-end pass tree-if-conv, but I think this pass at the 
moment doesn't work
for the issue I'm trying to solve. Tree-if-conv is to do if conversion for 
loop, and its final goal is to
help loop vectorization, while my case doesn't have a loop at all. 
2) My current solution fits into current back-end if-conversion pass very well. 
I don't want to invent
a new framework to solve this relatively small issue. Besides, this back-end 
patch doesn't only
enhance store speculation detection, but also fix a bug in the original code. 

> Nits: We typically refer to parameters, variables, etc in comments using
> upper case.  You'll need to review the entire patch for these its.
> 
> So perhaps the comment should be something like:
> 
> /* Return true of X, a MEM expression, is on the stack.  A_INSN contains
>    X if A_INSN exists.  */
> 
Fixed in attached new patch.

> 
> Just from a design standpoint, what are the consequences if this function
> returns true for something that isn't actually in the stack or false for
> something that is on the stack?
> 
If noce_mem_is_on_stack returns true for something that isn't actually in the 
stack, 
it could potentially introduce store speculation, then the if-conversion 
optimization
will be incorrect. If this function returns false for something that is on 
stack, it doesn't
matter, because the optimization will not be triggered. 

> Nit: Space between the function name and its open paren for arguments.  ie
> 
> if (fixed_base_plus_p (a)
>                      ^
> I see other instances of this nit, please review the patch and correct them.
> 
Fixed in attached new patch.

> 
> > +
> > +  if (!a_insn)
> > +    return false;
> I'm not sure what the calling profile is for this function, but this is a 
> cheaper
> test, so you might consider moving it before the test of fixed_base_plus_p.
> 
Fixed in attached new patch.

> 
> > +
> > +  if (!reg_mentioned_p (x, a_insn))
> > +    return false;
> > +
> > +  /* Check if x is on stack. Assume a mem expression using registers
> > +     related to stack register is always on stack. */
> > + FOR_EACH_INSN_USE (use, a_insn)
> > +    if (reg_mentioned_p (DF_REF_REG (use), x)
> > +        && bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use)))
> > +      return true;
> > +
> > +  return false;
> > +}
> So is X always a MEM?      Just wanted to make sure I understand.
> reg_mentioned_p will do the right thing (using rtx_equal_p) for the
> comparison.
> 
Yes. X is always a MEM. There is an assertion for this in the code above.

> 
> > +
> > +/* Always return true, if there is a dominating write.
> > +
> > +   When there is a dominating read from memory on stack,
> > +   1) if x = a is a memory read, return true.
> > +   2) if x = a is a memory write, return true if the memory is on stack.
> > +      This is the guarantee the memory is *not* readonly. */
> > +
> > +static bool
> > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> > +                           const_rtx x, bool is_store) {
> > +  rtx_insn *insn;
> > +  rtx set;
> > +
> > +  gcc_assert (MEM_P (x));
> > +
> > +  FOR_BB_INSNS (bb, insn)
> > +    {
> > +      set = single_set (insn);
> > +      if (!set)
> > +        continue;
> > +
> > +      /* Dominating store */
> > +      if (rtx_equal_p (x, SET_DEST (set)))
> > +        return true;
> > +
> > +      /* Dominating load */
> > +      if (rtx_equal_p (x, SET_SRC (set)))
> > +        if (is_store && noce_mem_is_on_stack (a_insn, x))
> > +          return true;
> > +    }
> > +
> > +  return false;
> > +}
> So what would be the consequences here of returning false when in fact
> there was a dominating read or write?  That could easily happen if the
> dominating read or write was not a single_set.
If return false when in fact there is a dominating read or write, the 
optimization will not
be triggered, because noce_mem_maybe_invalid_p will be true, which is playing 
the same
role as may_trap_or_fault_p.

> 
> I'm guessing that from a design standpoint you're trying to find cases where
> you know an object was written to within the block and does not escape.  So
> returning false when there was a dominating write is safe.
> Returning true when there was not would be disastrous.  Right?
> 
YES! You've completely understood my code! 😊

> 
> 
> 
> 
> > @@ -5347,6 +5462,234 @@
> >
> >    return FALSE;
> >  }
> > +
> > +
> > +/* Find all insns that must be stack address and store REGNO into
> > +   bitmap bba_sets_must_be_sfp. */
> > +
> > +static void
> > +find_all_must_be_sfp_insns (void)
> > +{
> > +  rtx_insn *a_insn;
> > +  basic_block bb;
> > +  bool sfp_found;
> > +  auto_vec<int, 1> def_count;
> > +  df_ref def, use;
> > +
> > +  /* Calculate def counts for each insn. */
> > + def_count.safe_grow_cleared (max_reg_num () + 1);  FOR_ALL_BB_FN
> > + (bb, cfun)
> > +    FOR_BB_INSNS (bb, a_insn)
> > +      {
> > +        if (!INSN_P (a_insn))
> > +          continue;
> > +
> > +        FOR_EACH_INSN_DEF (def, a_insn)
> > +          def_count[DF_REF_REGNO (def)]++;
> > +      }Is there a reason why you can't use the information computed by
> reginfo?
> 
Fixed in attached new patch.

> 
> 
> 
> > +
> > +  /* Iterate all insns until finding all stack pointers, and store
> > +     them into bba_sets_must_be_sfp. */  bba_sets_must_be_sfp =
> > + BITMAP_ALLOC (&reg_obstack);  do
> > +    {
> > +      sfp_found = false;
> > +      FOR_ALL_BB_FN (bb, cfun)
> > +        FOR_BB_INSNS (bb, a_insn)
> > +          {
> > +            rtx sset_a = single_set (a_insn);
> > +            if (!sset_a)
> > +              continue;
> > +            rtx src = SET_SRC (sset_a);
> > +            rtx dest = SET_DEST (sset_a);
> So again, we can get things that aren't a single_set here and they'll be
> ignored.  But I believe that's safe as you're trying to identify insns which 
> store
> stack addresses into a REG.  Missing a more complicated case where a stack
> address is stored into a REG is safe.  Right?
> 
Yes. Missing some pointers to stack is safe here, because we will use it to 
check if a memory
in if-conversion candidate doesn't have store speculation. If we miss it, the 
optimization will
not be triggered, so there will not be risky.

> 
> 
> 
> > +
> > +            if (!REG_P (dest))
> > +              continue;
> > +
> > +            /* For the case below,
> > +                 Control flow: B1->B3, B2->B3
> > +                 B1: p = &local_var
> > +                 B2: p = &global_var
> > +                 B3: ... = *p
> > +               pointer p is an address for either local or global variable.
> > +               so we don't treat p as a stack address pointer. To make
> > +               algorithm simple, we ignore all non-SSA cases. */
> > +            bool skip_insn = false;
> > +            unsigned int dest_regno = 0;
> > +            FOR_EACH_INSN_DEF (def, a_insn)
> > +              {
> > +                dest_regno = DF_REF_REGNO (def);
> > +                /* Skip current insn if
> > +                   1) already marked as stack pointer.
> > +                   2) or see multiple definition points. */
> > +                if (bitmap_bit_p (bba_sets_must_be_sfp, dest_regno)
> > +                    || def_count[dest_regno] > 1)
> > +                  {
> > +                    skip_insn = true;
> > +                    break;
> > +                  }
> > +              }
> > +            if (skip_insn)
> > +              continue;
> Right.  Again I wonder if you should be using the info computed by regstat.
> You can get single use information easily from that.
> 
Yes. Fixed in attached new patch.

> > +
> > +            /* Handle case like "x1 = sp + offset". */
> > +            if (fixed_base_plus_p (src))
> > +              {
> > +           bitmap_set_bit (bba_sets_must_be_sfp, dest_regno);
> > +                sfp_found = true;
> > +                continue;
> > +              }
> Looks like your you've got an indentation problem here.  Most likely you've
> got a case where you've got 8 spaces that should be replaced with a tab.
> 
Fixed in attached new patch.

> 
> > +
> > +            /* Handle case like "x2 = x1 + offset", in which x1 is already
> > +               identified as a stack pointer. */
> > +            if ((GET_CODE (src) == PLUS || GET_CODE (src) == MINUS)
> > +                && CONST_INT_P (XEXP (src, 1)))
> > +              {
> > +                rtx x1 = XEXP (src, 0);
> > +                if (!REG_P (x1))
> > +                  continue;
> > +
> > +                FOR_EACH_INSN_USE (use, a_insn)
> > +                  {
> > +                    if (!rtx_equal_p (x1, DF_REF_REG (use)))
> > +                      continue;
> > +
> > +                    if (bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO
> (use)))
> > +                      {
> > +                        bitmap_set_bit (bba_sets_must_be_sfp, dest_regno);
> > +                        sfp_found = true;
> > +                        break;
> > +                      }
> > +                  }
> So how much is there to be gained from going from this kind of localized
> computation to global?  It shouldn't be hard to turn this into a truly global
> algorithm, but I'm not sure it's worth the effort.
> 
> I _do_ think it's worth visiting blocks in dominator order.  It's better than
> what you're doing, but nowhere near as expensive as a global propagation
> engine.
> 
Fixed in attached new patch using dom_walk class. Please review again.

> Is it worth handling simple copies in the code above?
I thought it less likely to have a copy for pointer to stack before register 
allocation. Right?

> 
> 
> > +         }
> > +          }
> > +    }
> > +  while (sfp_found);
> > +}
> > +
> > +/* Find all insns that may be stack pointer and store REGNO into
> > +   bitmap bba_sets_may_be_sfp. We iterate all insns in current func
> > +   until no more latent insns generating stack address are found.
> > +   The collection of stack pointer bba_sets_may_be_sfp will be used
> > +   to help analyze local stack variable address taken. Stack variable
> > +   address can be passed out of current frame if only any stack pointer
> > +   is passed to hard register or stored into memory. */
> Shouldn't "may be stack pointer" be "must be stack pointer"?
> 
There is a typo in the comments. Fixed in attached new patch.

> 
> > +
> > +static void
> > +find_all_may_be_sfp_insns (void)
> > +{
> > +  rtx_insn *a_insn;
> > +  basic_block bb;
> > +  bool sfp_found;
> > +  bba_sets_may_be_sfp = BITMAP_ALLOC (&reg_obstack);
> > +
> > +  /* Iterate all insns that may be stack pointers, and store them into
> > +     bitmap bba_sets_must_be_sfp. */
> > +  do
> > +    {
> > +      sfp_found = false;
> > +      FOR_ALL_BB_FN (bb, cfun)
> Again, you may ultimately be better off with a dominator walk here.
Yes. Fixed in attached new patch.

> 
> 
> > +        FOR_BB_INSNS (bb, a_insn)
> > +          {
> > +            /* Assume stack pointers can only be generated by single SET. 
> > */
> > +            rtx sset_a = single_set (a_insn);
> > +            if (!sset_a)
> > +              continue;
> > +            rtx src = SET_SRC (sset_a);
> > +            rtx dest = SET_DEST (sset_a);
> I'm not sure that's a safe assumption.
Fixed in attached new patch.

> 
> 
> > +
> > +            /* If we see "hard_register = ...", which implies passing out
> > +               of current frame potentially, we don't collect this case.
> > +               It can be treated as address taken in 
> > no_stack_address_taken */
> > +            if (HARD_REGISTER_P (dest))
> > +              continue;
> > +
> > +            /* Memory load and store are not to generate stack pointer. */
> > +            if (MEM_P (src) || MEM_P (dest))
> > +              continue;
> > +
> > +            /* Skip insn that is already identified as stack pointer. */
> > +            bool skip_insn = false;
> > +            df_ref def;
> > +            unsigned int dest_regno = 0;
> > +            FOR_EACH_INSN_DEF (def, a_insn)
> > +              {
> > +                dest_regno = DF_REF_REGNO (def);
> > +                if (bitmap_bit_p (bba_sets_may_be_sfp, dest_regno))
> > +                  {
> > +                    skip_insn = true;
> > +                    break;
> > +                  }
> > +              }
> So earlier you've verified you've got a single set.  I'm not sure you need an
> iterator over the defs.  Can't you just look at SET_DEST
> (sset_a) which is conveniently in DEST?
Fixed in attached new patch.

> 
> 
> I don't like the term "stack pointer" here -- most of us read "stack pointer"
> and we think the register holding the current stack pointer.
> Would "pointer into the stack" be more appropriate?
> 
Fixed in attached new patch.

> 
> > +/* Return true if current function doesn't pass stack address out of
> > +frame. */
> > +
> > +static bool
> > +no_stack_address_taken (void)
> > +{
> > +  basic_block bb;
> > +  rtx_insn *a_insn;
> > +  df_ref use;
> > +
> > +  FOR_ALL_BB_FN (bb, cfun)
> > +    FOR_BB_INSNS (bb, a_insn)
> > +      {
> > +        if (!INSN_P (a_insn))
> > +          continue;
> > +
> > +        rtx sset_a = single_set (a_insn);
> > +        if (!sset_a)
> > +          continue;
> > +        rtx src = SET_SRC (sset_a);
> > +        rtx dest = SET_DEST (sset_a);
> > +
> > +        /* Skip insn that is already identified as frame pointers. */
> > +        if (bitmap_bit_p (bba_sets_may_be_sfp, REGNO (dest)))
> > +          continue;
> > +
> > +        FOR_EACH_INSN_USE (use, a_insn)
> > +          {
> > +            /* Check if current insn is using any stack pointer. Ignore
> > +               if it doesn't use frame pointers at all. */
> > +            if (!bitmap_bit_p (bba_sets_may_be_sfp, DF_REF_REGNO (use)))
> > +              continue;
> > +
> > +            /* Load cannot introduce address taken. */
> > +            if (MEM_P (src))
> > +              continue;
> > +
> > +            /* Store is safe if only the src doesn't contain stack 
> > pointer. */
> > +            if (MEM_P (dest)
> > +                && !reg_mentioned_p (DF_REF_REG (use), src))
> > +              continue;
> > +
> > +            /* All other cases potentially introduce address taken. */
> > +            return false;
> > +          }
> So what if you have a PARALLEL where one entry causes an escape of a
> pointer into the stack and another entry does some other operation?  If so,
> don't you miss the fact that you've got an escaping pointer into the stack?  I
> don't think you can restrict yourself to single_sets here.
Yes. You are right. I have added code to handle PARALLEL case. I handle it in a 
rather conservative way though. Review again, please!

> 
> Jeff

Thanks,
-Jiangning

csel4.patch
Description: csel4.patch

RE: [PATCH] improve ifcvt optimization (PR rtl-optimization/89430)

Reply via email to