https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125
--- Comment #5 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> We document
>
> class dr_with_seg_len
> {
> ...
> /* The minimum common alignment of DR's start address, SEG_LEN and
> ACCESS_SIZE. */
> unsigned int align;
>
> but here we have access_size == 1 and align == 4. It's also said
>
> /* All addresses involved are known to have a common alignment ALIGN.
> We can therefore subtract ALIGN from an exclusive endpoint to get
> an inclusive endpoint. In the best (and common) case, ALIGN is the
> same as the access sizes of both DRs, and so subtracting ALIGN
> cancels out the addition of an access size. */
> unsigned int align = MIN (dr_a.align, dr_b.align);
> poly_uint64 last_chunk_a = dr_a.access_size - align;
> poly_uint64 last_chunk_b = dr_b.access_size - align;
>
> and
>
> We also know
> that last_chunk_b <= |step|; this is checked elsewhere if it isn't
> guaranteed at compile time.
>
> step == 4, but last_chunk_a/b are -3U. I couldn't find the "elsewhere"
> to check what we validate there.
The assumption that access_size is a multiple of align is crucial, so like you
say, it all falls apart if that doesn't hold. In this case, that means that
last_chunk_* should never have been negative.
But I agree that the “elsewhere” doesn't seem to exist after all. That is, the
step can be arbitrarily smaller than the access size. Somewhat relatedly, we
seem to vectorise:
struct s { int x; } __attribute__((packed));

void f (char *xc, char *yc, int z)
{
  for (int i = 0; i < 100; ++i)
    {
      struct s *x = (struct s *) xc;
      struct s *y = (struct s *) yc;
      x->x += y->x;
      xc += z;
      yc += z;
    }
}
on aarch64 even with -mstrict-align -fno-vect-cost-model, generating
elementwise accesses that assume that the ints are aligned. E.g.:
_71 = (char *) ivtmp.19_21;
_30 = ivtmp.29_94 - _26;
_60 = (char *) _30;
_52 = __MEM <int> ((int *)_71);
_53 = (char *) ivtmp.25_18;
_54 = __MEM <int> ((int *)_53);
_55 = (char *) ivtmp.26_16;
_56 = __MEM <int> ((int *)_55);
_57 = (char *) ivtmp.27_88;
_58 = __MEM <int> ((int *)_57);
_59 = _Literal (int [[gnu::vector_size(16)]]) {_52, _54, _56, _58};
But the vector loop is executed even for a step of 1 (byte), provided that x
and y don't overlap.
> I think the case of align > access_size can easily happen with grouped
> accesses with a gap at the end (see vect_vfa_access_size), so simply
> failing the address-based check for this case is too pessimistic.
Yeah. Something similar happened in PR115192, and I think this PR shows that
the fix for PR115192 was misplaced. We should fix the overly large alignment
at source, to meet the dr_with_seg_len precondition you quoted above.