[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-30 Thread pcordes at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #27 from Peter Cordes  ---
(In reply to Hongtao Liu from comment #26)
> (In reply to Hongtao Liu from comment #25)
> > (In reply to Peter Cordes from comment #22)
> > > Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic
> > > for MOVLPD?
> > > 
> > From the Intel intrinsics guide[1], there's an explicit "mem_addr does not
> > need to be aligned on any particular boundary" for _mm_store_sd, but not for
> > _mm_storel_pd.
> > [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
> > 
> But for _mm_loadl_pd it also says no alignment is needed; I need to confirm
> with my peers whether there's any specific purpose behind that.
> And yes, for <16-byte memory access, there's no alignment requirement
> functionally.

Interesting; yes, some entries explicitly say the memory can be unaligned, and
some don't.  But I don't think we should read that as alignment being required
by default when it isn't stated.  Every intrinsic that does require alignment
explicitly says so (like _mm_load_si128).  We could make the same argument in
the other direction: if an alignment requirement isn't mentioned, we should
assume there isn't one.

And I already posted earlier about why we shouldn't assume C semantics based on
the pointer type, as Andrew Pinski had thought.  Intel's intrinsics docs were
originally written for ICC (classic), which takes intrinsics very literally: an
intrinsic in the C source will (almost?) always compile to the corresponding
asm instruction.  It presumably also doesn't optimize based on pointer-alignment
UB, even on a dereference, and definitely not based on strict-aliasing UB.

So the C defaults for dereferencing a double* or __m64* shouldn't be assumed
even when the docs don't say anything about alignment.  They also don't mention
aliasing, but we know from Intel's examples of how to use intrinsics (I think)
that the load/store intrinsics are all may_alias accesses.

Intel's current ICX compiler is based on LLVM, which does care about aliasing
and alignment UB when optimizing, but their intrinsics docs still read like
they're thinking more in terms of asm than in terms of the C abstract machine.
They probably haven't been rewritten with that in mind, since Intel implements
the intrinsics (in their own compilers) so that they Just Work even when
aliasing other types or when unaligned.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #26 from Hongtao Liu  ---
(In reply to Hongtao Liu from comment #25)
> (In reply to Peter Cordes from comment #22)
> > Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic
> > for MOVLPD?
> > 
> From the Intel intrinsics guide[1], there's an explicit "mem_addr does not
> need to be aligned on any particular boundary" for _mm_store_sd, but not for
> _mm_storel_pd.
> [1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
> 
But for _mm_loadl_pd it also says no alignment is needed; I need to confirm with
my peers whether there's any specific purpose behind that.
And yes, for <16-byte memory access, there's no alignment requirement
functionally.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #25 from Hongtao Liu  ---
(In reply to Peter Cordes from comment #22)
> Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic
> for MOVLPD?
> 
From the Intel intrinsics guide[1], there's an explicit "mem_addr does not need
to be aligned on any particular boundary" for _mm_store_sd, but not for
_mm_storel_pd.
[1] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html

> Also, should the type name  double_u   be changed to something with __ to
> avoid polluting the namespace?
Yes, __double_u makes sense.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-29 Thread pcordes at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #24 from Peter Cordes  ---
(In reply to Jeffrey Walton from comment #23)
> (In reply to Peter Cordes from comment #22)
> > [...]
> > That instruction is useless and should never be used in asm except for
> > code-alignment reasons (1 byte longer than MOVLPS, same length as MOVSD, all
> > three doing the same thing for the memory-destination form).  But it's easy
> > to imagine some code using that intrinsic to store an unaligned double into
> > a byte buffer.
> 
> Reading from and writing to an [unaligned] byte stream in 4- or 8-byte chunks
> is our use case. Eventually, we need to perform traditional SIMD processing.
> But the loads and stores have to occur using these old intrinsics due to
> the word types, data stream format and supported ISAs.
> 
> I believe the other option is to memcpy the byte stream into a properly
> aligned intermediate buffer. But that could incur a performance hit if the
> optimizer misses the opportunity (and fails to elide the memcpy).


Apparently GCC has been "broken" for ages, making it UB to use misaligned
pointers with any of the intrinsics that only just now had their alignment
requirements removed, and with _mm_storel_pd, which is unchanged.  Usually this
didn't result in miscompilation, though.

Going forward, simply avoid _mm_storel_pd.
Use _mm_store_sd (MOVSD) or _mm_storel_pi (MOVLPS), which have been fixed by
this patch.

_mm_store_sd derefs a  double_u  pointer, __attribute__((aligned(1),may_alias))

_mm_storel_pi uses __builtin_ia32_storelps.
It didn't change in this patch, so presumably it has been correct for longer.  If
you can put up with the amount of casting required to use it for the low double
of a __m128d (perhaps in a wrapper function that takes a void* and a vector),
_mm_storel_pi might be your best bet, unless there's anything weird about the
GCC internals for __builtin_ia32_storelps.

The asm instruction you want is MOVLPS (1 byte shorter than the others in
non-AVX code), so this intrinsic also has the advantage of hinting GCC to use
that instruction.
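
A minimal sketch of such a wrapper (the helper name is mine, not from the
thread), assuming plain SSE2:

#include <xmmintrin.h>
#include <emmintrin.h>

/* Store the low double of v to a possibly-unaligned address via MOVLPS. */
static inline void
store_low_double (void *p, __m128d v)
{
  _mm_storel_pi ((__m64 *) p, _mm_castpd_ps (v));
}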

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-29 Thread noloader at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #23 from Jeffrey Walton  ---
(In reply to Peter Cordes from comment #22)
> [...]
> That instruction is useless and should never be used in asm except for
> code-alignment reasons (1 byte longer than MOVLPS, same length as MOVSD, all
> three doing the same thing for the memory-destination form).  But it's easy to
> imagine some code using that intrinsic to store an unaligned double into a
> byte buffer.

Reading from and writing to an [unaligned] byte stream in 4- or 8-byte chunks is
our use case. Eventually, we need to perform traditional SIMD processing. But
the loads and stores have to occur using these old intrinsics due to the word
types, data stream format and supported ISAs.

I believe the other option is to memcpy the byte stream into a properly aligned
intermediate buffer. But that could incur a performance hit if the optimizer
misses the opportunity (and fails to elide the memcpy).
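
One minimal sketch of that memcpy route, copying into a local double rather
than a whole intermediate buffer (the helper name is mine; whether the copy is
elided depends on the optimizer, as noted above):

#include <emmintrin.h>
#include <string.h>

/* Load 8 possibly-unaligned, possibly-aliasing bytes into the low lane,
   zeroing the upper lane, via a plain memcpy into a local double. */
static inline __m128d
load_double_from_bytes (const unsigned char *p)
{
  double d;
  memcpy (&d, p, sizeof d);   /* well-defined for any alignment and aliasing */
  return _mm_set_sd (d);      /* GCC/Clang at -O2 usually fold this to one MOVSD load */
}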

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-29 Thread pcordes at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #22 from Peter Cordes  ---
Why are we adding an alignment requirement to _mm_storel_pd, the intrinsic for
MOVLPD?

It was defined in terms of _mm_store_sd (which this patch correctly changes to
remove the alignment requirement), so we're technically not *adding* an
alignment requirement, but rather keeping it from the old definition of
_mm_store_sd.

This is a bad thing; we should be removing the alignment requirement on it,
too.

That instruction is useless and should never be used in asm except for
code-alignment reasons (1 byte longer than MOVLPS, same length as MOVSD, all
three doing the same thing for the memory-destination form).  But it's easy to
imagine some code using that intrinsic to store an unaligned double into a byte
buffer.
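
For example, something like this (a hypothetical snippet of mine, not from any
real codebase) would trip -fsanitize=alignment with an 8-byte-aligned double_u,
even though the MOVLPD/MOVSD store itself is fine with any address:

#include <emmintrin.h>

/* Write the low double of v into a byte buffer at an odd offset. */
void
store_result (unsigned char *buf, __m128d v)
{
  _mm_storel_pd ((double *) (buf + 1), v);
}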

IDK if there's any authoritative documentation from Intel on which intrinsics
support unaligned pointers, but for intrinsics which are documented as
corresponding to one specific instruction (unlike _mm_set), the sensible
assumption would be that the intrinsic has the same alignment requirement as
the instruction.  For everything narrower than 16 bytes, that means no
alignment requirement.  I think most programmers would find it surprising if
that weren't the case, especially since GCC, AFAIK, doesn't document the
intrinsics itself to specify anything else.

(And with Intel intrinsics, I think they're all intended to allow aliasing,
e.g. pointing a double* at a buffer also accessed with some struct type.)

Also, should the type name  double_u   be changed to something with __ to avoid
polluting the namespace?

(In reply to GCC Commits from comment #19)
> The master branch has been updated by Hu :
> 
> https://gcc.gnu.org/g:5967696c0f6300da4387fea5d102be5bc9f23233
> 
> commit r15-337-g5967696c0f6300da4387fea5d102be5bc9f23233
...
> (_mm_storel_pd): Add alignment requirement.
> * config/i386/xmmintrin.h

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #21 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:e02b5683e77c2b4317b23be72e43b6e6cc6c8e5b

commit r15-350-ge02b5683e77c2b4317b23be72e43b6e6cc6c8e5b
Author: Jakub Jelinek 
Date:   Thu May 9 20:59:05 2024 +0200

testsuite: Fix up pr84508* tests [PR84508]

The tests FAIL on x86_64-linux with
/usr/bin/ld: cannot find -lubsan
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: gcc.target/i386/pr84508-1.c (test for excess errors)
Excess errors:
/usr/bin/ld: cannot find -lubsan

The problem is that only *.dg/ubsan/ubsan.exp calls ubsan_init
which adds the needed search paths to libubsan library.
So, link/run tests for -fsanitize=undefined need to go into
gcc.dg/ubsan/ or g++.dg/ubsan/, even when they are target specific.

2024-05-09  Jakub Jelinek  

PR target/84508
* gcc.target/i386/pr84508-1.c: Move to ...
* gcc.dg/ubsan/pr84508-1.c: ... here.  Restrict to i?86/x86_64
non-ia32 targets.
* gcc.target/i386/pr84508-2.c: Move to ...
* gcc.dg/ubsan/pr84508-2.c: ... here.  Restrict to i?86/x86_64
non-ia32 targets.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Hongtao Liu  changed:

   What|Removed |Added

 CC||liuhongt at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #20 from Hongtao Liu  ---
Fixed in GCC15.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2024-05-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #19 from GCC Commits  ---
The master branch has been updated by Hu :

https://gcc.gnu.org/g:5967696c0f6300da4387fea5d102be5bc9f23233

commit r15-337-g5967696c0f6300da4387fea5d102be5bc9f23233
Author: Hu, Lin1 
Date:   Fri Jan 19 15:22:10 2024 +0800

i386: Fix some intrinsics without alignment requirements.

gcc/ChangeLog:

PR target/84508
* config/i386/emmintrin.h
(_mm_load_sd): Remove alignment requirement.
(_mm_store_sd): Ditto.
(_mm_loadh_pd): Ditto.
(_mm_loadl_pd): Ditto.
(_mm_storel_pd): Add alignment requirement.
* config/i386/xmmintrin.h
(_mm_loadh_pi): Remove alignment requirement.
(_mm_loadl_pi): Ditto.
(_mm_load_ss): Ditto.
(_mm_store_ss): Ditto.

gcc/testsuite/ChangeLog:

PR target/84508
* gcc.target/i386/pr84508-1.c: New test.
* gcc.target/i386/pr84508-2.c: Ditto.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2023-12-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Andrew Pinski  changed:

   What|Removed |Added

 CC||pobrn at protonmail dot com

--- Comment #18 from Andrew Pinski  ---
*** Bug 113053 has been marked as a duplicate of this bug. ***

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2022-03-28 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #17 from Peter Cordes  ---
(In reply to Andrew Pinski from comment #16)
> >According to Intel (
> > https://software.intel.com/sites/landingpage/IntrinsicsGuide), there are no
> > alignment requirements for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd. For
> > example, from _mm_load_sd:
> 
> I disagree with saying there is no alignment requirement.
> 
> The alignment requirement comes from the type of the argument (double
> const*). [...]
> Pointers themselves have an alignment requirement not just at the time of
> the load/store of them.

The intrinsics are badly designed to take pointer args with types other than
void*, despite how they're expected to work.  This is something we just need to
accept.  Starting with AVX-512, any new intrinsics take void*, but they haven't
redefined the old ones.

_mm_loadu_si128 takes a __m128i*, same as _mm_load_si128.  alignof(__m128i) ==
16, so _mm_loadu_si128 must not simply dereference it; that's what
_mm_load_si128 does.

Intel's intrinsics API requires you to do unaligned 16-byte loads by creating a
misaligned pointer and passing it to a loadu intrinsic.  (This in turn requires
that implementations supporting these intrinsics define the behaviour of
creating such a pointer without deref; in ISO C that alone would be UB.)
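
A minimal sketch of that pattern (the function name is mine):

#include <emmintrin.h>

/* 16-byte load from an arbitrarily aligned buffer: the cast creates a
   (possibly misaligned) __m128i pointer, and _mm_loadu_si128 is defined
   to accept it; only _mm_load_si128 would require 16-byte alignment. */
static inline __m128i
load16 (const unsigned char *p)
{
  return _mm_loadu_si128 ((const __m128i *) p);
}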

This additional unaligned-pointer behaviour that implementations must define
(at least for __m128i* and float/double*) is something I wrote about in an SO
answer:
https://stackoverflow.com/questions/52112605/is-reinterpret-casting-between-hardware-simd-vector-pointer-and-the-correspond


_mm_loadu_ps (like _mm_load_ps) takes a float*, but its entire purpose is to
not require alignment.

_mm512_loadu_ps takes a void* arg, so we can infer that earlier FP load
intrinsics really are intended to work on data with any alignment, not just
with the alignment of a float.

They're unlike a normal deref of a float* in aliasing rules, although that's
separate from creating a misaligned float* in code outside the intrinsic.  A
hypothetical low-performance portable emulation of intrinsics that ended up
dereferencing that float* arg directly would be broken for strict-aliasing as
well.

The requirement to define the behaviour of having a misaligned float* can be
blamed on Intel in 1999 (when SSE1 was new). Later extensions like AVX
_mm256_loadu_ps just followed the same pattern of taking float* until they
finally used void* for intrinsics introduced with or after AVX-512.

The introduction of _mm_loadu_si32 and _mm_loadu_si16 is another step in the
right direction, recognizing that _mm_cvtsi32_si128( *int_ptr ) isn't
strict-aliasing safe.  When those were new, it might have been around the time
Intel started exploring replacing ICC with the LLVM-based ICX.
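
A minimal sketch of the contrast (function names are mine; _mm_loadu_si32 needs
a new enough compiler and header):

#include <immintrin.h>

/* Old idiom: the dereference happens on the user's int*, so it inherits
   C's alignment and strict-aliasing rules. */
static inline __m128i
load32_old (const int *p)
{
  return _mm_cvtsi32_si128 (*p);
}

/* Newer intrinsic: takes const void*, documented to work for any alignment,
   and doesn't assert a pointee type. */
static inline __m128i
load32_new (const void *p)
{
  return _mm_loadu_si32 (p);
}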

Anyway, the requirement to support misaligned vector and float/double pointers
implies that _mm_load_ss/sd taking float*/double* doesn't imply alignof(float)
or alignof(double).

>  So either the intrinsics definition needs to be changed to be
> correct or GCC is correct.

That's an option; I'd love it if all the load/store intrinsics were changed
across all compilers to take void*.  It's ugly and a pain to type
_mm_loadu_si128( (const __m128i*)ptr ), and it creates cognitive dissonance
because alignof(__m128i) == 16.

I'm not sure if it could break anything to change the intrinsics to take void*
even for older ones; possibly only C++ overload resolution for insane code that
defines a _mm_loadu_ps( other_type * ) and relies on float* args picking the
intrinsic.

If we changed just GCC, without getting buy-in from other compilers, taking
void* would let people's code compile on GCC without casts from stuff like
int*, when it wouldn't compile on other compilers.

That could be considered a bad thing if people test their code with GCC and are
surprised to get reports of failure from people using compilers that follow
Intel's documentation for the intrinsic function arg types. 
(https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html).  It
would basically be a case of being overly permissive for the feature / API that
people are trying to write portable code against.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2022-03-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #16 from Andrew Pinski  ---
>According to Intel (
> https://software.intel.com/sites/landingpage/IntrinsicsGuide), there are no
> alignment requirements for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd. For
> example, from _mm_load_sd:

I disagree with saying there is no alignment requirement.

The alignment requirement comes from the type of the argument (double const*).
So either the intrinsics definition needs to be changed to be correct or GCC is
correct.
Pointers themselves have an alignment requirement not just at the time of the
load/store of them.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2022-03-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #15 from Hongtao.liu  ---

Clang's implementation:

static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_load_ss(const float *__p)
{
  struct __mm_load_ss_struct {
    float __u;
  } __attribute__((__packed__, __may_alias__));
  float __u = ((const struct __mm_load_ss_struct*)__p)->__u;
  return __extension__ (__m128){ __u, 0, 0, 0 };
}

I guess we can do similar things; will handle it in GCC 13.
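
A sketch of the equivalent approach using GCC attributes instead of Clang's
packed struct (the typedef and function names are mine, not the committed
patch): dereference through a 1-aligned, may_alias float so the load is both
alignment- and aliasing-safe.

#include <xmmintrin.h>

typedef float __float_u __attribute__ ((__may_alias__, __aligned__ (1)));

/* Like _mm_load_ss, but safe for any alignment and aliasing. */
static inline __m128
load_ss_unaligned (const float *p)
{
  return _mm_set_ss (*(__float_u const *) p);
}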

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2022-03-26 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Peter Cordes  changed:

   What|Removed |Added

 CC||peter at cordes dot ca

--- Comment #14 from Peter Cordes  ---
This bug is mis-categorized; it's not a sanitizer bug, it's a bug in the
implementation of _mm_load_ss / _mm_load_sd.

It currently derefs the  `float const*` arg directly, which is not
strict-aliasing or alignment safe.  alignof(float) is 4, but Intel's
documentation for this API still says "mem_addr does not need to be aligned on
any particular boundary."

extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_load_ss (float const *__P)
{
  return _mm_set_ss (*__P);
}


As discussed on PR99754, _mm_loadu_si32(const void*) *is* strict-aliasing and
alignment safe.  But it was only added recently, and GCC 11's implementation of
it is buggy (shuffling the element to the wrong place).  Before that, one safe
way to do a 32-bit SIMD load was with _mm_load_ss and _mm_castps_si128.  Or it
was supposed to be safe, but isn't!

Clang uses a packed may_alias struct containing a float to get a safe load
done.  Another way would be to cast the pointer to

typedef float aliasing_unaligned_f32 __attribute__((aligned(1),may_alias));

This is similar to what we do with __m32_u for use in aliasing-safe integer
load/store, except we define that as int with
vector_size(4),may_alias,aligned(1) for some reason.  Perhaps influenced by
__m64_u which is a vector of 2 ints.

MSVC is like gcc -fno-strict-aliasing, so however it handles intrinsics,
they're always aliasing-safe.

I'm not 100% sure about what ICC formally guarantees, but in practice it
doesn't move aliasing short* stores across a _mm_load_ss( (float*)pshort )
load: https://godbolt.org/z/6s76v71xz.  I didn't test _mm_store_ss aliasing
with short loads, only vice versa.

So GCC is the odd one out of the four major compilers that support Intel's
intrinsics API.  All our narrow load/store intrinsics should be strict-aliasing
and alignment safe, regardless of what pointer type they accept.

Intel's early choice of float* and double* instead of void* could be considered
poor design.  Their naming with just load/store instead of _mm_loadu_ss /
_mm_storeu_ss is also poor, clearly motivated by the asm differences rather
than an actual difference in the intrinsics API.

In x86 asm, loads/stores narrower than 16 bytes never require alignment (unless
the AC bit is set in EFLAGS).  Assuming Intel modeled their intrinsics API
after their asm, it makes sense to have load and loadu for ps and si128, but
only load/store with an implied lack of alignment for intrinsics that wrap
instructions like movlps / movhps / movss / movsd, and movd / movq, which do
narrower memory accesses.

That of course *doesn't* make sense in C terms, where it's always potentially a
problem to dereference misaligned pointers to narrow objects, even when
compiling for x86-64:
https://stackoverflow.com/questions/47510783/why-does-unaligned-access-to-mmaped-memory-sometimes-segfault-on-amd64
has an example and links some others, showing that compilers *don't* define the
behaviour of deref of misaligned pointers.

I'm pretty certain that Intel always intended their narrow load/store
intrinsics to not have any alignment requirements, like the asm instructions
that wrap them, but they weren't thinking in C terms when naming them.  And
they were sloppy in their choices of which ones to provide until decades later,
since it seems they thought that _mm_cvtsi32_si128(*x) was sufficient for a
movd load.  (That's only the case on a compiler without strict-aliasing or
alignment UB, since the deref happens on the user's plain int*.)

Anyway, hopefully this refutes the argument that _mm_load_sd should be aligned
because of the name, and clarifies what Intel might have been thinking when
naming these.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2021-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.5 |---

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2020-03-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.4 |8.5

--- Comment #13 from Jakub Jelinek  ---
GCC 8.4.0 has been released, adjusting target milestone.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2019-02-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.3 |8.4

--- Comment #12 from Jakub Jelinek  ---
GCC 8.3 has been released.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-07-26 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.2 |8.3

--- Comment #11 from Jakub Jelinek  ---
GCC 8.2 has been released.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-05-02 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Jakub Jelinek  changed:

   What|Removed |Added

   Target Milestone|8.0 |8.2

--- Comment #10 from Jakub Jelinek  ---
GCC 8.1 has been released.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-27 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #9 from Richard Biener  ---
unaligned loads from non-aggregates should be fully supported these days even
on strict-align targets where they will result in bitfield extracts.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-24 Thread noloader at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #8 from Jeffrey Walton  ---
(In reply to Jeffrey Walton from comment #0)
> According to Intel
> (https://software.intel.com/sites/landingpage/IntrinsicsGuide), there are no
> alignment requirements for _mm_load_sd, _mm_store_sd and _mm_loaddup_pd. For
> example, from _mm_load_sd:
> 
>   Description
> 
> Load a double-precision (64-bit) floating-point element from
> memory into the lower of dst, and zero the upper element.
> mem_addr does not need to be aligned on any particular boundary.

The opening sentence was incorrect.  The *_sd functions do not have alignment
requirements.  Searching through the Intrinsics Guide reveals that the
following functions do not have alignment requirements:

  * _mm_load_sd
  * _mm_store_sd
  * _mm_load_ss
  * _mm_loadh_pi
  * _mm_loadl_pd
  * _mm_loadl_pi
  * _mm_loadu_pd
  * _mm_loadu_ps
  * _mm_loadu_si128

  * _mm_store_sd
  * _mm_store_ss
  * _mm_storeu_pd
  * _mm_storeu_ps
  * _mm_storeu_si128

All the functions listed above specifically state "mem_addr does not need to be
aligned on any particular boundary."

The remaining functions have natural alignment or 16-byte alignment
requirements.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-22 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #7 from Marc Glisse  ---
Unless vectors count as aggregates (more or less), in which case we can ignore
my previous comment.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-22 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #6 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #5)
> (In reply to Marc Glisse from comment #3)
> > Workaround: define a typedef for double with
> > __attribute__((__aligned__(1))), and use _mm_set_sd(*(newtype*)p), that's
> > how it will likely be done if we change emmintrin.h."
> 
> I don't think we really support misaligned loads from non-aggregates, there
> have been lengthy discussions about that in the past.

We already use

typedef long long __m128i_u __attribute__ ((__vector_size__ (16),
                                            __may_alias__, __aligned__ (1)));

extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_loadu_si128 (__m128i_u const *__P)
{
  return *__P;
}

So if that doesn't work, it needs fixing in a number of places...

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #5 from Jakub Jelinek  ---
(In reply to Marc Glisse from comment #3)
> Workaround: define a typedef for double with
> __attribute__((__aligned__(1))), and use _mm_set_sd(*(newtype*)p), that's
> how it will likely be done if we change emmintrin.h."

I don't think we really support misaligned loads from non-aggregates, there
have been lengthy discussions about that in the past.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #4 from Jakub Jelinek  ---
I don't see how this is related to sanitizer, sanitizer just checks what it
sees.
Say _mm_load_sd is implemented as
/* Create a vector with element 0 as *P and the rest zero.  */
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm_load_sd (double const *__P)
{
  return _mm_set_sd (*__P);
}
and so pedantically requires an aligned load; it is like any other double *
dereference.

If these intrinsics really allow misaligned loads, then we need to use
something different.  I'm not sure if e.g.
  struct S __attribute__((packed)) { double d; } const *p = (struct S const *)
(void *) __P;
  return _mm_set_sd (p->d);
would be ok from an aliasing POV, or if we'd need to introduce a builtin to
load a potentially misaligned float or double.  I guess the most important
thing would be that it doesn't slow down code.

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-22 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #3 from Marc Glisse  ---
Copying from
https://gcc.gnu.org/ml/gcc-help/2017-12/msg00031.html

"The way _mm_load_sd is currently implemented in gcc, yes, sanitizers are right
to complain. Intel could have named the thing _mm_loadu_sd if that's what they
meant. It would be simple to change if we decide to do so, please file a PR in
bugzilla.

Workaround: define a typedef for double with __attribute__((__aligned__(1))),
and use _mm_set_sd(*(newtype*)p), that's how it will likely be done if we
change emmintrin.h."
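
A minimal sketch of that workaround at the call site (the typedef and helper
names are mine; may_alias is added on top of the aligned(1) Marc mentions so
the load is also safe when the bytes were written through another type):

#include <emmintrin.h>

typedef double unaligned_double __attribute__ ((__aligned__ (1), __may_alias__));

/* Load a double from a possibly-misaligned, possibly-aliasing address
   into the low lane, without changing emmintrin.h. */
static inline __m128d
load_sd_any (const void *p)
{
  return _mm_set_sd (*(const unaligned_double *) p);
}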

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-22 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

Martin Liška  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-02-22
   Target Milestone|--- |8.0
 Ever confirmed|0   |1
  Known to fail||6.4.0, 7.3.0, 8.0

--- Comment #2 from Martin Liška  ---
Confirmed for all releases supporting -fsanitize=alignment.
Clang does not emit the error.  Jakub, can you please take a look?

[Bug sanitizer/84508] Load of misaligned address using _mm_load_sd

2018-02-21 Thread noloader at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84508

--- Comment #1 from Jeffrey Walton  ---
__m128d is a tad bit cleaner. It does not require the casts.

$ cat test.c
#include <emmintrin.h>
int main(int argc, char* argv[])
{
  unsigned char t[16+1];
  __m128d x = _mm_load_sd((const double *)(t+1));
  _mm_store_pd((double*)t, x);
  return 0;
}

$ gcc -fsanitize=undefined test.c -o test.exe

$ ./test.exe
/usr/lib/gcc/x86_64-linux-gnu/6/include/emmintrin.h:140:10: runtime error: load
of misaligned address 0x7ffd1cf2dd11 for type 'const double', which requires 8
byte alignment
0x7ffd1cf2dd11: note: pointer points here
 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
00 00  c0 c8 e0 ba c4