Alex Coplan <[email protected]> writes:
> On 11/10/2024 14:30, Richard Biener wrote:
>> On Fri, 11 Oct 2024, Richard Sandiford wrote:
>>
>> > Alex Coplan <[email protected]> writes:
>> > > Hi,
>> > >
>> > > As the PR shows, pair-fusion was tricking memory_modified_in_insn_p into
>> > > returning false when a common base register (in this case, x1) was
>> > > modified between the mem and the store insn. This lead to wrong code as
>> > > the accesses really did alias.
>> > >
>> > > To avoid this sort of problem, this patch avoids invoking RTL alias
>> > > analysis altogether (and assume an alias conflict) if the two insns to
>> > > be compared share a common address register R, and the insns see
>> > > different
>> > > definitions of R (i.e. it was modified in between).
>> > >
>> > > Bootstrapped/regtested on aarch64-linux-gnu (all languages, both regular
>> > > bootstrap and LTO+PGO bootstrap). OK for trunk?
>> >
>> > Sorry for the slow review. The patch looks good to me, but...
>
> Thanks for the review. I'd missed that you'd sent this, sorry for not
> responding sooner.
>
>> >
>> > > @@ -2544,11 +2624,37 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p,
>> > > unsigned access_size,
>> > > && bitmap_bit_p (&m_tombstone_bitmap, insn->uid ());
>> > > };
>> > >
>> > > + // Maximum number of distinct regnos we expect to appear in a single
>> > > + // MEM (and thus in a candidate insn).
>> > > + static constexpr int max_mem_regs = 2;
>> > > + auto_vec<access_info *, max_mem_regs> addr_use_vec[2];
>> > > + use_array addr_uses[2];
>> > > +
>> > > + // Collect the lists of register uses that occur in the candidate
>> > > MEMs.
>> > > + for (int i = 0; i < 2; i++)
>> > > + {
>> > > + // N.B. it's safe for us to ignore uses that only occur in notes
>> > > + // here (e.g. in a REG_EQUIV expression) since we only pass the
>> > > + // MEM down to the alias machinery, so it can't see any insn-level
>> > > + // notes.
>> > > + for (auto use : insns[i]->uses ())
>> > > + if (use->is_reg ()
>> > > + && use->includes_address_uses ()
>> > > + && !use->only_occurs_in_notes ())
>> > > + {
>> > > + gcc_checking_assert (addr_use_vec[i].length () <
>> > > max_mem_regs);
>> > > + addr_use_vec[i].quick_push (use);
>> >
>> > ...if possible, I think it would be better to just use safe_push here,
>> > without the assert. There'd then be no need to split max_mem_regs out;
>> > it could just be hard-coded in the addr_use_vec declaration.
>
> I hadn't realised at the time that quick_push () already does a
> gcc_checking_assert to make sure that we don't overflow. It does:
>
> template<typename T, typename A>
> inline T *
> vec<T, A, vl_embed>::quick_push (const T &obj)
> {
> gcc_checking_assert (space (1));
> T *slot = &address ()[m_vecpfx.m_num++];
> ::new (static_cast<void*>(slot)) T (obj);
> return slot;
> }
>
> (I checked the behaviour by writing a quick selftest in vec.cc, and it
> indeed aborts as expected with quick_push on overflow for a
> stack-allocated auto_vec with N = 2.)
>
> This means that the assert above is indeed redundant, so I agree that
> we should be able to drop the assert and drop the max_mem_regs constant,
> using a literal inside the auto_vec template instead (all while still
> using quick_push).
>
> Does that sound OK to you, or did you have another reason to prefer
> safe_push? AIUI the behaviour of safe_push on overflow would be to
> allocate a new (heap-allocated) vector instead of asserting.
I just thought it looked odd/unexpected. Normally the intent of:
auto_vec<foo, 16> bar;
is to reserve a sensible amount of stack space for the common case,
but still support the general case of arbitrarily many elements.
The common on-stack case will be fast with both quick_push and
safe_push[*]. The difference is just whether growing beyond the
static space would abort the compiler or work as expected.
quick_push makes sense if an earlier loop has calculated the runtime
length of the vector and if we've already reserved that amount, or if
there is a static property that guarantees a static limit. But the limit
of 2 looked more like a general assumption, rather than something that
had been definitively checked by earlier code.
I was also wondering whether using safe_push on an array of auto_vecs
caused issues, and so you were having to work around that. (I remember
sometimes hitting a warning about attempts to delete an on-stack buffer,
presumably due to code duplication creating contradictory paths that
jump threading couldn't optimise away as dead.)
No real objection though. Just wanted to clarify what I meant. :)
Thanks,
Richard
[*] well, ok, quick_push will be slightly faster in release builds,
since quick_push won't do a bounds check in that case. But the
check in safe_push would be highly predictable.