https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508
Peter Cordes <peter at cordes dot ca> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |peter at cordes dot ca --- Comment #14 from Peter Cordes <peter at cordes dot ca> --- This bug is mis-categorized; it's not a sanitizer bug, it's a bug in the implementation _mm_load_ss / sd. It currently derefs the `float const*` arg directly, which is not strict-aliasing or alignment safe. alignof(float) is 4, but Intel's documentation for this API still says "mem_addr does not need to be aligned on any particular boundary." _mm_load_ss (float const *__P) { return _mm_set_ss (*__P); } As discussed on PR99754 _mm_load_si32(const void*) *is* strict-aliasing and alignment safe. But it only existed recently, and GCC11's implementation of it is buggy (shuffling the element to the wrong place). Before that, one safe way to do a 32-bit SIMD load is with _mm_load_ss and _mm_castps_si128. Or it was supposed to be safe, but isn't!! Clang uses a packed may_alias struct containing a float to get a safe load done. Another way would be casting the pointer to typdef float aliasing_unaligned_f32 __attribute__((aligned(1),may_alias)); This is similar to what we do with __m32_u for use in aliasing-safe integer load/store, except we define that as int with vector_size(4),may_alias,aligned(1) for some reason. Perhaps influenced by __m64_u which is a vector of 2 ints. MSVC is like gcc -fno-strict-aliasing, so however it handles intrinsics, they're always aliasing-safe. I'm not 100% sure about what ICC formally guarantees, but in practice it doesn't move aliasing short* stores across a _mm_load_ss( (float*)pshort ) load. https://godbolt.org/z/6s76v71xz I didn't test with _mm_store_ss aliasing with short loads, only vice versa. So GCC is the odd one out, out of the major 4 compilers that support Intel's intrinsics API. All our narrow load/store intrinsics should be strict-aliasing and alignment safe, regardless of what pointer type they accept. Intel's early design of taking float* and double* instead of void* could be considered poor design. Their naming with just load/store instead of _mm_loadu_ss / storeu is also poor design, clearly motivated by the asm differences rather than an actual intrinsic API difference. In x86 asm, loads/stores narrower than 16 bytes never require alignment (unless the AC bit is set in EFLAGS). Assuming Intel modeled their intrinsics API after their asm, then it makes sense to have load and loadu for ps and si128, but only load/store with an implied lack of alignment for intrinsics that wrap instructions like movlps / movhps / movss / movsd, and movd / movq, which do narrower memory accesses. That of course *doesn't* make sense in C terms, where it's always potentially a problem to dereference misaligned pointers to narrow objects, even when compiling for x86-64: https://stackoverflow.com/questions/47510783/why-does-unaligned-access-to-mmaped-memory-sometimes-segfault-on-amd64 has an example and links some others, showing that compilers *don't* define the behaviour of deref of misaligned pointers. I'm pretty certain that Intel always intended their narrow load/store intrinsics to not have any alignment requirements, like the asm instructions that wrap them, but weren't thinking in C terms when naming them. And were sloppily in their choices of which ones to provide until decades later, since it seems they thought that _mm_cvtsi32_si128(*x) was sufficient for a movd load. (Only the case on a compiler without strict-aliasing or alignment, since the deref happens on the user's plain int*). Anyway, hopefully this refutes the argument that _mm_load_sd should be aligned because of the name, and clarifies what Intel might have been thinking when naming these.