https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113025

juki at gcc dot mail.kapsi.fi changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |FIXED

--- Comment #2 from juki at gcc dot mail.kapsi.fi ---
Unfortunately, the alignment of the cast type was not what caused this issue.

I changed all calls to intrinsics that the GCC headers declare with __m128i_u
or __m128d_u parameter types so that the pointer is cast to those types before
the unaligned intrinsic is called.

For example LOAD_SI128 macro looks like the following:

#define LOAD_SI128(ptr) \
        ( ((uintptr_t)(ptr) & 15) == 0 ) ? _mm_load_si128((__m128i*)(ptr)) : \
        _mm_loadu_si128((__m128i_u*)(ptr))
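
For illustration only (this is not the code from our sources), the run-time
dispatch that the macro above performs can also be written as an inline
function. This is a minimal sketch, assuming an x86 target with SSE2 and a
compiler whose <emmintrin.h> provides __m128i_u; the names load_si128_any and
load_matches_memory are hypothetical and exist only for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, __m128i and __m128i_u */
#include <stdint.h>     /* uintptr_t */
#include <string.h>     /* memcmp */

/* Hypothetical helper: choose the aligned or the unaligned load at run
   time, based on whether the address is a multiple of 16. */
static inline __m128i load_si128_any(const void *ptr)
{
    if (((uintptr_t)ptr & 15) == 0)
        return _mm_load_si128((const __m128i *)ptr);
    return _mm_loadu_si128((const __m128i_u *)ptr);
}

/* Hypothetical check: returns 1 if the 16 bytes that were loaded are the
   same 16 bytes that are in memory at ptr, 0 otherwise. */
int load_matches_memory(const void *ptr)
{
    unsigned char out[16];
    __m128i v = load_si128_any(ptr);

    /* Store through the unaligned intrinsic so that out needs no
       particular alignment. */
    _mm_storeu_si128((__m128i_u *)out, v);
    return memcmp(out, ptr, 16) == 0;
}
```

Calling load_matches_memory on a 16-byte aligned buffer and on the same buffer
offset by one byte exercises both branches of the dispatch.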

My changes only moved the debug information locations; they did not lead to
the generation of a different kind of load operation. In fact, the generated
assembly was identical apart from the debug line information:

$ diff -u0 orig.s fixed.s|grep movdq| wc
      0       0       0

But if the aligned-load branch is disabled entirely, so that only unaligned
loads are used (even with the wrong pointer type), no invalid aligned loads
are generated and the assembly changes significantly in its movdq*
instructions:

#define LOAD_SI128(ptr) \
        ( 0 ) ? _mm_load_si128((__m128i*)(ptr)) : \
        _mm_loadu_si128((__m128i*)(ptr))

$ diff -u0 orig.s align-loads-removed.s|grep movdq| wc
  11001   44004  263376

The macro above eliminates all of the invalid aligned loads we were seeing,
whereas merely using the correct types does not.

While I can't share the related sources, I could still run different tests
locally to see what is causing the issue. What could I do next to help solve
this? I do have reliable test cases to work with.
