Hi Jeff, Appreciate your effort to review my patch! I've updated my patch as attached. See my answers below.
> in current function, so the store speculation can be avoided.
> So at a high level should we be doing this in gimple rather than RTL?
> We're going to have a lot more information about types, better
> infrastructure for looking at uses/defs, access to the alias oracle, we should
> be able to accurately distinguish between potentially shared objects vs those
> which are local to the thread, etc. We lose the low level costing information
> though.
>
> I'm still going to go through the patch and do some level of review, but I do
> think we need to answer the higher level question though.
>
I have the following reasons,
1) Following the clue Richard B gave me before about parameter --param
allow-store-data-races,
I did check the middle-end pass tree-if-conv, but I think this pass at the
moment doesn't work
for the issue I'm trying to solve. Tree-if-conv is to do if conversion for
loop, and its final goal is to
help loop vectorization, while my case doesn't have a loop at all.
2) My current solution fits into current back-end if-conversion pass very well.
I don't want to invent
a new framework to solve this relatively small issue. Besides, this back-end
patch doesn't only
enhance store speculation detection, but also fix a bug in the original code.
> Nits: We typically refer to parameters, variables, etc in comments using
> upper case. You'll need to review the entire patch for these its.
>
> So perhaps the comment should be something like:
>
> /* Return true of X, a MEM expression, is on the stack. A_INSN contains
> X if A_INSN exists. */
>
Fixed in attached new patch.
>
> Just from a design standpoint, what are the consequences if this function
> returns true for something that isn't actually in the stack or false for
> something that is on the stack?
>
If noce_mem_is_on_stack returns true for something that isn't actually in the
stack,
it could potentially introduce store speculation, then the if-conversion
optimization
will be incorrect. If this function returns false for something that is on
stack, it doesn't
matter, because the optimization will not be triggered.
> Nit: Space between the function name and its open paren for arguments. ie
>
> if (fixed_base_plus_p (a)
> ^
> I see other instances of this nit, please review the patch and correct them.
>
Fixed in attached new patch.
>
> > +
> > + if (!a_insn)
> > + return false;
> I'm not sure what the calling profile is for this function, but this is a
> cheaper
> test, so you might consider moving it before the test of fixed_base_plus_p.
>
Fixed in attached new patch.
>
> > +
> > + if (!reg_mentioned_p (x, a_insn))
> > + return false;
> > +
> > + /* Check if x is on stack. Assume a mem expression using registers
> > + related to stack register is always on stack. */
> > + FOR_EACH_INSN_USE (use, a_insn)
> > + if (reg_mentioned_p (DF_REF_REG (use), x)
> > + && bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO (use)))
> > + return true;
> > +
> > + return false;
> > +}
> So is X always a MEM? Just wanted to make sure I understand.
> reg_mentioned_p will do the right thing (using rtx_equal_p) for the
> comparison.
>
Yes. X is always a MEM. There is an assertion for this in the code above.
>
> > +
> > +/* Always return true, if there is a dominating write.
> > +
> > + When there is a dominating read from memory on stack,
> > + 1) if x = a is a memory read, return true.
> > + 2) if x = a is a memory write, return true if the memory is on stack.
> > + This is the guarantee the memory is *not* readonly. */
> > +
> > +static bool
> > +noce_valid_for_dominating (basic_block bb, rtx_insn *a_insn,
> > + const_rtx x, bool is_store) {
> > + rtx_insn *insn;
> > + rtx set;
> > +
> > + gcc_assert (MEM_P (x));
> > +
> > + FOR_BB_INSNS (bb, insn)
> > + {
> > + set = single_set (insn);
> > + if (!set)
> > + continue;
> > +
> > + /* Dominating store */
> > + if (rtx_equal_p (x, SET_DEST (set)))
> > + return true;
> > +
> > + /* Dominating load */
> > + if (rtx_equal_p (x, SET_SRC (set)))
> > + if (is_store && noce_mem_is_on_stack (a_insn, x))
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> So what would be the consequences here of returning false when in fact
> there was a dominating read or write? That could easily happen if the
> dominating read or write was not a single_set.
If return false when in fact there is a dominating read or write, the
optimization will not
be triggered, because noce_mem_maybe_invalid_p will be true, which is playing
the same
role as may_trap_or_fault_p.
>
> I'm guessing that from a design standpoint you're trying to find cases where
> you know an object was written to within the block and does not escape. So
> returning false when there was a dominating write is safe.
> Returning true when there was not would be disastrous. Right?
>
YES! You've completely understood my code! 😊
>
>
>
>
> > @@ -5347,6 +5462,234 @@
> >
> > return FALSE;
> > }
> > +
> > +
> > +/* Find all insns that must be stack address and store REGNO into
> > + bitmap bba_sets_must_be_sfp. */
> > +
> > +static void
> > +find_all_must_be_sfp_insns (void)
> > +{
> > + rtx_insn *a_insn;
> > + basic_block bb;
> > + bool sfp_found;
> > + auto_vec<int, 1> def_count;
> > + df_ref def, use;
> > +
> > + /* Calculate def counts for each insn. */
> > + def_count.safe_grow_cleared (max_reg_num () + 1); FOR_ALL_BB_FN
> > + (bb, cfun)
> > + FOR_BB_INSNS (bb, a_insn)
> > + {
> > + if (!INSN_P (a_insn))
> > + continue;
> > +
> > + FOR_EACH_INSN_DEF (def, a_insn)
> > + def_count[DF_REF_REGNO (def)]++;
> > + }Is there a reason why you can't use the information computed by
> reginfo?
>
Fixed in attached new patch.
>
>
>
> > +
> > + /* Iterate all insns until finding all stack pointers, and store
> > + them into bba_sets_must_be_sfp. */ bba_sets_must_be_sfp =
> > + BITMAP_ALLOC (®_obstack); do
> > + {
> > + sfp_found = false;
> > + FOR_ALL_BB_FN (bb, cfun)
> > + FOR_BB_INSNS (bb, a_insn)
> > + {
> > + rtx sset_a = single_set (a_insn);
> > + if (!sset_a)
> > + continue;
> > + rtx src = SET_SRC (sset_a);
> > + rtx dest = SET_DEST (sset_a);
> So again, we can get things that aren't a single_set here and they'll be
> ignored. But I believe that's safe as you're trying to identify insns which
> store
> stack addresses into a REG. Missing a more complicated case where a stack
> address is stored into a REG is safe. Right?
>
Yes. Missing some pointers to stack is safe here, because we will use it to
check if a memory
in if-conversion candidate doesn't have store speculation. If we miss it, the
optimization will
not be triggered, so there will not be risky.
>
>
>
> > +
> > + if (!REG_P (dest))
> > + continue;
> > +
> > + /* For the case below,
> > + Control flow: B1->B3, B2->B3
> > + B1: p = &local_var
> > + B2: p = &global_var
> > + B3: ... = *p
> > + pointer p is an address for either local or global variable.
> > + so we don't treat p as a stack address pointer. To make
> > + algorithm simple, we ignore all non-SSA cases. */
> > + bool skip_insn = false;
> > + unsigned int dest_regno = 0;
> > + FOR_EACH_INSN_DEF (def, a_insn)
> > + {
> > + dest_regno = DF_REF_REGNO (def);
> > + /* Skip current insn if
> > + 1) already marked as stack pointer.
> > + 2) or see multiple definition points. */
> > + if (bitmap_bit_p (bba_sets_must_be_sfp, dest_regno)
> > + || def_count[dest_regno] > 1)
> > + {
> > + skip_insn = true;
> > + break;
> > + }
> > + }
> > + if (skip_insn)
> > + continue;
> Right. Again I wonder if you should be using the info computed by regstat.
> You can get single use information easily from that.
>
Yes. Fixed in attached new patch.
> > +
> > + /* Handle case like "x1 = sp + offset". */
> > + if (fixed_base_plus_p (src))
> > + {
> > + bitmap_set_bit (bba_sets_must_be_sfp, dest_regno);
> > + sfp_found = true;
> > + continue;
> > + }
> Looks like your you've got an indentation problem here. Most likely you've
> got a case where you've got 8 spaces that should be replaced with a tab.
>
Fixed in attached new patch.
>
> > +
> > + /* Handle case like "x2 = x1 + offset", in which x1 is already
> > + identified as a stack pointer. */
> > + if ((GET_CODE (src) == PLUS || GET_CODE (src) == MINUS)
> > + && CONST_INT_P (XEXP (src, 1)))
> > + {
> > + rtx x1 = XEXP (src, 0);
> > + if (!REG_P (x1))
> > + continue;
> > +
> > + FOR_EACH_INSN_USE (use, a_insn)
> > + {
> > + if (!rtx_equal_p (x1, DF_REF_REG (use)))
> > + continue;
> > +
> > + if (bitmap_bit_p (bba_sets_must_be_sfp, DF_REF_REGNO
> (use)))
> > + {
> > + bitmap_set_bit (bba_sets_must_be_sfp, dest_regno);
> > + sfp_found = true;
> > + break;
> > + }
> > + }
> So how much is there to be gained from going from this kind of localized
> computation to global? It shouldn't be hard to turn this into a truly global
> algorithm, but I'm not sure it's worth the effort.
>
> I _do_ think it's worth visiting blocks in dominator order. It's better than
> what you're doing, but nowhere near as expensive as a global propagation
> engine.
>
Fixed in attached new patch using dom_walk class. Please review again.
> Is it worth handling simple copies in the code above?
I thought it less likely to have a copy for pointer to stack before register
allocation. Right?
>
>
> > + }
> > + }
> > + }
> > + while (sfp_found);
> > +}
> > +
> > +/* Find all insns that may be stack pointer and store REGNO into
> > + bitmap bba_sets_may_be_sfp. We iterate all insns in current func
> > + until no more latent insns generating stack address are found.
> > + The collection of stack pointer bba_sets_may_be_sfp will be used
> > + to help analyze local stack variable address taken. Stack variable
> > + address can be passed out of current frame if only any stack pointer
> > + is passed to hard register or stored into memory. */
> Shouldn't "may be stack pointer" be "must be stack pointer"?
>
There is a typo in the comments. Fixed in attached new patch.
>
> > +
> > +static void
> > +find_all_may_be_sfp_insns (void)
> > +{
> > + rtx_insn *a_insn;
> > + basic_block bb;
> > + bool sfp_found;
> > + bba_sets_may_be_sfp = BITMAP_ALLOC (®_obstack);
> > +
> > + /* Iterate all insns that may be stack pointers, and store them into
> > + bitmap bba_sets_must_be_sfp. */
> > + do
> > + {
> > + sfp_found = false;
> > + FOR_ALL_BB_FN (bb, cfun)
> Again, you may ultimately be better off with a dominator walk here.
Yes. Fixed in attached new patch.
>
>
> > + FOR_BB_INSNS (bb, a_insn)
> > + {
> > + /* Assume stack pointers can only be generated by single SET.
> > */
> > + rtx sset_a = single_set (a_insn);
> > + if (!sset_a)
> > + continue;
> > + rtx src = SET_SRC (sset_a);
> > + rtx dest = SET_DEST (sset_a);
> I'm not sure that's a safe assumption.
Fixed in attached new patch.
>
>
> > +
> > + /* If we see "hard_register = ...", which implies passing out
> > + of current frame potentially, we don't collect this case.
> > + It can be treated as address taken in
> > no_stack_address_taken */
> > + if (HARD_REGISTER_P (dest))
> > + continue;
> > +
> > + /* Memory load and store are not to generate stack pointer. */
> > + if (MEM_P (src) || MEM_P (dest))
> > + continue;
> > +
> > + /* Skip insn that is already identified as stack pointer. */
> > + bool skip_insn = false;
> > + df_ref def;
> > + unsigned int dest_regno = 0;
> > + FOR_EACH_INSN_DEF (def, a_insn)
> > + {
> > + dest_regno = DF_REF_REGNO (def);
> > + if (bitmap_bit_p (bba_sets_may_be_sfp, dest_regno))
> > + {
> > + skip_insn = true;
> > + break;
> > + }
> > + }
> So earlier you've verified you've got a single set. I'm not sure you need an
> iterator over the defs. Can't you just look at SET_DEST
> (sset_a) which is conveniently in DEST?
Fixed in attached new patch.
>
>
> I don't like the term "stack pointer" here -- most of us read "stack pointer"
> and we think the register holding the current stack pointer.
> Would "pointer into the stack" be more appropriate?
>
Fixed in attached new patch.
>
> > +/* Return true if current function doesn't pass stack address out of
> > +frame. */
> > +
> > +static bool
> > +no_stack_address_taken (void)
> > +{
> > + basic_block bb;
> > + rtx_insn *a_insn;
> > + df_ref use;
> > +
> > + FOR_ALL_BB_FN (bb, cfun)
> > + FOR_BB_INSNS (bb, a_insn)
> > + {
> > + if (!INSN_P (a_insn))
> > + continue;
> > +
> > + rtx sset_a = single_set (a_insn);
> > + if (!sset_a)
> > + continue;
> > + rtx src = SET_SRC (sset_a);
> > + rtx dest = SET_DEST (sset_a);
> > +
> > + /* Skip insn that is already identified as frame pointers. */
> > + if (bitmap_bit_p (bba_sets_may_be_sfp, REGNO (dest)))
> > + continue;
> > +
> > + FOR_EACH_INSN_USE (use, a_insn)
> > + {
> > + /* Check if current insn is using any stack pointer. Ignore
> > + if it doesn't use frame pointers at all. */
> > + if (!bitmap_bit_p (bba_sets_may_be_sfp, DF_REF_REGNO (use)))
> > + continue;
> > +
> > + /* Load cannot introduce address taken. */
> > + if (MEM_P (src))
> > + continue;
> > +
> > + /* Store is safe if only the src doesn't contain stack
> > pointer. */
> > + if (MEM_P (dest)
> > + && !reg_mentioned_p (DF_REF_REG (use), src))
> > + continue;
> > +
> > + /* All other cases potentially introduce address taken. */
> > + return false;
> > + }
> So what if you have a PARALLEL where one entry causes an escape of a
> pointer into the stack and another entry does some other operation? If so,
> don't you miss the fact that you've got an escaping pointer into the stack? I
> don't think you can restrict yourself to single_sets here.
Yes. You are right. I have added code to handle PARALLEL case. I handle it in a
rather conservative way though. Review again, please!
>
> Jeff
Thanks,
-Jiangning
csel4.patch
Description: csel4.patch
