https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #17 from Peter Cordes <peter at cordes dot ca> ---
(In reply to Andrew Pinski from comment #16)
> >According to Intel (
> > https://software.intel.com/sites/landingpage/IntrinsicsGuide), there are no
> > alignment requirements for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd. For
> > example, from _mm_load_sd:
> 
> I disagree with saying there is no alignment requirement.
> 
> The alignment requirement comes from the type of the argument (double
> const*). [...]
> Pointers themselves have an alignment requirement not just at the time of
> the load/store of them.

The intrinsics are badly designed to take pointer args with types other than
void*, despite how they're expected to work.  This is something we just need to
accept.  Starting with AVX-512, any new intrinsics take void*, but they haven't
redefined the old ones.

_mm_loadu_si128 takes a __m128i*, same as _mm_load_si128.  alignof(__m128i) ==
16, so _mm_loadu_si128 must not simply dereference it, that's what
_mm_load_si128 does.

Intel's intrinsics API requires you to do unaligned 16-byte loads by creating a
misaligned pointer and passing it to a loadu intrinsic.  (This in turn requires
that implementations supporting these intrinsics define the behaviour of
creating such a pointer without deref; in ISO C that alone would be UB.)

This additional unaligned-pointer behaviour that implementations must define
(at least for __m128i* and float/double*) is something I wrote about in an SO
answer:
https://stackoverflow.com/questions/52112605/is-reinterpret-casting-between-hardware-simd-vector-pointer-and-the-correspond


_mm_loadu_ps (like _mm_load_ps) takes a float*, but its entire purpose it to
not require alignment.

_mm512_loadu_ps takes a void* arg, so we can infer that earlier FP load
intrinsics really are intended to work on data with any alignment, not just
with the alignment of a float.

They're unlike a normal deref of a float* in aliasing rules, although that's
separate from creating a misaligned float* in code outside the intrinsic.  A
hypothetical low-performance portable emulation of intrinsics that ended up
dereferencing that float* arg directly would be broken for strict-aliasing as
well.

The requirement to define the behaviour of having a misaligned float* can be
blamed on Intel in 1995 (when SSE1 was new). Later extensions like AVX
_mm256_loadu_ps just followed the same pattern of taking float* until they
finally used void* for intrinsics introduced with or after AVX-512.

The introduction of _mm_loadu_si32 and si16 is another step in the right
direction, recognizing that _mm_cvtsi32_si128( *int_ptr ) isn't strict-aliasing
safe.  When those were new, it might have been around the time Intel started
exploring replacing ICC with the LLVM-based ICX.

Anyway, the requirement to support misaligned vector and float/double pointers
implies that _mm_load_ss/sd taking float*/double* doesn't imply alignof(float)
or alignof(double).

>  So either the intrinsics definition needs to be changed to be
> correct or GCC is correct.

That's an option; I'd love it if all the load/store intrinsics were changed
across all compilers to take void*.  It's ugly and a pain to type  
_mm_loadu_si128( (const __m128i*)ptr )
as well as creating cognitive dissonance because alignof(__m128i) == 16.

I'm not sure if it could break anything to change the intrinsics to take void*
even for older ones; possibly only C++ overload resolution for insane code that
defines a _mm_loadu_ps( other_type * ) and relies on float* args picking the
intrinsic.

If we changed just GCC, without getting buy-in from other compilers, taking
void* would let people's code compile on GCC without casts from stuff like
int*, when it wouldn't compile on other compilers.

That could be considered a bad thing if people test their code with GCC and are
surprised to get reports of failure from people using compilers that follow
Intel's documentation for the intrinsic function arg types. 
(https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html).  It
would basically be a case of being overly permissive for the feature / API that
people are trying to write portable code against.

Reply via email to