[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

peter at cordes dot ca via Gcc-bugs Sat, 26 Mar 2022 16:35:38 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508


Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #14 from Peter Cordes <peter at cordes dot ca> ---
This bug is mis-categorized; it's not a sanitizer bug, it's a bug in the
implementation _mm_load_ss / sd.

It currently derefs the  `float const*` arg directly, which is not
strict-aliasing or alignment safe.  alignof(float) is 4, but Intel's
documentation for this API still says "mem_addr does not need to be aligned on
any particular boundary."

_mm_load_ss (float const *__P)
{
  return _mm_set_ss (*__P);
}


As discussed on PR99754 _mm_load_si32(const void*) *is* strict-aliasing and
alignment safe.  But it only existed recently, and GCC11's implementation of it
is buggy (shuffling the element to the wrong place).  Before that, one safe way
to do a 32-bit SIMD load is with _mm_load_ss and _mm_castps_si128.  Or it was
supposed to be safe, but isn't!!

Clang uses a packed may_alias struct containing a float to get a safe load
done.  Another way would be casting the pointer to

typdef float aliasing_unaligned_f32 __attribute__((aligned(1),may_alias));

This is similar to what we do with __m32_u for use in aliasing-safe integer
load/store, except we define that as int with
vector_size(4),may_alias,aligned(1) for some reason.  Perhaps influenced by
__m64_u which is a vector of 2 ints.

MSVC is like gcc -fno-strict-aliasing, so however it handles intrinsics,
they're always aliasing-safe.

I'm not 100% sure about what ICC formally guarantees, but in practice it
doesn't move aliasing short*  stores across a _mm_load_ss( (float*)pshort )
load.
https://godbolt.org/z/6s76v71xz  I didn't test with _mm_store_ss aliasing with
short loads, only vice versa.

So GCC is the odd one out, out of the major 4 compilers that support Intel's
intrinsics API.  All our narrow load/store intrinsics should be strict-aliasing
and alignment safe, regardless of what pointer type they accept.

Intel's early design of taking float* and double* instead of void* could be
considered poor design.  Their naming with just load/store instead of
_mm_loadu_ss / storeu is also poor design, clearly motivated by the asm
differences rather than an actual intrinsic API difference.

In x86 asm, loads/stores narrower than 16 bytes never require alignment (unless
the AC bit is set in EFLAGS).  Assuming Intel modeled their intrinsics API
after their asm, then it makes sense to have load and loadu for ps and si128,
but only load/store with an implied lack of alignment for intrinsics that wrap
instructions like movlps / movhps / movss / movsd, and movd / movq, which do
narrower memory accesses.

That of course *doesn't* make sense in C terms, where it's always potentially a
problem to dereference misaligned pointers to narrow objects, even when
compiling for x86-64:
https://stackoverflow.com/questions/47510783/why-does-unaligned-access-to-mmaped-memory-sometimes-segfault-on-amd64
has an example and links some others, showing that compilers *don't* define the
behaviour of deref of misaligned pointers.

I'm pretty certain that Intel always intended their narrow load/store
intrinsics to not have any alignment requirements, like the asm instructions
that wrap them, but weren't thinking in C terms when naming them.  And were
sloppily in their choices of which ones to provide until decades later, since
it seems they thought that _mm_cvtsi32_si128(*x) was sufficient for a movd
load.  (Only the case on a compiler without strict-aliasing or alignment, since
the deref happens on the user's plain int*).

Anyway, hopefully this refutes the argument that _mm_load_sd should be aligned
because of the name, and clarifies what Intel might have been thinking when
naming these.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

Reply via email to