On Wed, Feb 19, 2025 at 2:10 PM H.J. Lu <[email protected]> wrote:
>
> On Wed, Feb 19, 2025 at 8:16 PM Uros Bizjak <[email protected]> wrote:
> >
> > On Wed, Feb 19, 2025 at 12:53 PM H.J. Lu <[email protected]> wrote:
> > >
> > > Since stack can only be accessed by GPR, check GENERAL_REG_P, instead of
> > > REG_P, in ix86_find_all_reg_use_1.
> > >
> > > gcc/
> > >
> > > PR target/118936
> > > * config/i386/i386.cc (ix86_find_all_reg_use_1): Replace REG_P
> > > with GENERAL_REG_P.
> > >
> > > gcc/testsuite/
> > >
> > > PR target/118936
> > > * gcc.target/i386/pr118936.c: New test.
> > >
> > > OK for master?
> >
> > As said in the PR, this change "fixes" only this particular problem,
> > where we have:
> >
> > 97: ax:DI='g_1680'
> > 170: xmm0:DI=ax:DI
> > 98: xmm0:V2DI=vec_duplicate(xmm0:DI)
> > REG_EQUIV vec_duplicate(`g_1680')
> > 101: [`g_1679']=xmm0:V2DI
> > 103: [const(`g_1679'+0x10)]=xmm0:V2DI
> > 105: [const(`g_1679'+0x20)]=xmm0:V2DI
> > 107: [const(`g_1679'+0x30)]=xmm0:V2DI
> >
> > But does not fix the fact that your algorithm considers these stores
> > even when they are in no way connected to stack or frame pointer.
> > These do not even use registers in their address.
> >
> > Previous approach did this:
> > - if (check_stack_slot)
> > - {
> > - /* Find the maximum stack alignment. */
> > - subrtx_iterator::array_type array;
> > - FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
> > - if (MEM_P (*iter)
> > - && (reg_mentioned_p (stack_pointer_rtx,
> > - *iter)
> > - || reg_mentioned_p (frame_pointer_rtx,
> > - *iter)))
> > - {
> > - unsigned int alignment = MEM_ALIGN (*iter);
> > - if (alignment > stack_alignment)
> > - stack_alignment = alignment;
> > - }
> > - }
> >
> > So, anywhere in the insn SP or FP was mentioned (either memory store
> > or load), the alignment was increased as requested by MEM. The issue
> > here is that another register that is calculated from SP or FP can be
> > used to access the stack slot. I'm not sure I know how your algorithm
> > works id detail, it detects e.g.:
> >
> > 103: [const(`g_1679'+0x10)]=xmm0:V2DI
> >
> > as a candidate to increase the alignment, but its address doesn't even
> > use registers, let alone SP or FP. Also,
> >
> > + note_stores (insn, ix86_update_stack_alignment,
> > + &stack_alignment);
> >
> > in your algorithm does not consider that loads from the stack also
> > increase alignment requirements.
> >
> > Can you please explain the approach in your original patch in some
> > more detail? I'd expect a variant of the original approach, where the
> > compiler keeps an up-to-date list of registers that depend on SP or
> > FP, and instead of a naive:
> >
> > reg_mentioned_p (stack_pointer_rtx, *iter)
> >
> > it goes through a list, formed of SP, FP and their dependent registers
> > and updates stack alignment if the insn memory address uses the
> > register on the list.
> >
> > Uros.
>
> My algorithm keeps a list of registers which can access the stack
> starting with SP and FP. If any registers are derived from the list,
> add them to the list until all used registers are exhausted. If
> any stores use the register on the list, update the stack alignment.
> The missing part is that it doesn't check if the register is actually
> used for memory access.
>
> Here is the v2 patch to also check if the register is used for memory
> access.
@@ -8473,16 +8482,15 @@ static void
ix86_update_stack_alignment (rtx, const_rtx pat, void *data)
{
/* This insn may reference stack slot. Update the maximum stack slot
- alignment. */
+ alignment if the memory is reference by the specified register. */
referenced
+ stack_access_data *p = (stack_access_data *) data;
subrtx_iterator::array_type array;
FOR_EACH_SUBRTX (iter, array, pat, ALL)
- if (MEM_P (*iter))
+ if (MEM_P (*iter) && reg_mentioned_p (p->reg, *iter))
@@ -8494,7 +8502,7 @@ ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET
&stack_slot_access,
auto_bitmap &worklist)
{
rtx dest = SET_DEST (set);
- if (!REG_P (dest))
+ if (!GENERAL_REG_P (dest))
return;
The above change is not needed now that the register in the memory
reference is checked. The compiler can generate GPR->XMM->GPR
sequence.
@@ -8630,8 +8638,8 @@ ix86_find_max_used_stack_alignment (unsigned int
&stack_alignment,
if (!NONJUMP_INSN_P (insn))
continue;
- note_stores (insn, ix86_update_stack_alignment,
- &stack_alignment);
+ stack_access_data data = { DF_REF_REG (ref), &stack_alignment };
+ note_stores (insn, ix86_update_stack_alignment, &data);
Do we need FOR_EACH_SUBRTX here instead of note_stores to also process
reads from stack? Reads should also be aligned.
Uros.