[Bug target/115161] [15 Regression] highway-1.0.7 miscompilation of some SSE2 intrinsics

slyfox at gcc dot gnu.org via Gcc-bugs Fri, 24 May 2024 15:00:43 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115161


--- Comment #22 from Sergei Trofimovich <slyfox at gcc dot gnu.org> ---
(In reply to Sergei Trofimovich from comment #21)

gcc generates the following assembly for this C code:

> int main() {
>   const __m128i su = _mm_set1_epi32(0x4f800000);
>   const __m128  sf = _mm_castsi128_ps(su);
> 
>   const __m128  overflow_mask_f32 = _mm_cmpge_ps(sf, _mm_set1_ps(2147483648.0f));
>   const __m128i overflow_mask = _mm_castps_si128(overflow_mask_f32);
> 
>   const __m128i conv = _mm_cvttps_epi32(sf); // overflows
>   const __m128i yes = _mm_set1_epi32(INT32_MAX);
> 
>   const __m128i a = _mm_and_si128(overflow_mask, yes);
>   const __m128i na = _mm_andnot_si128(overflow_mask, conv);
> 
>   const __m128i conv_masked = _mm_or_si128(a, na);

Dump of assembler code for function main:
   0x0000000000401020 <+0>:     sub    $0x8,%rsp
   0x0000000000401024 <+4>:     movss  0xfdc(%rip),%xmm2        # 0x402008
   0x000000000040102c <+12>:    movss  0xfd0(%rip),%xmm0        # 0x402004
   0x0000000000401034 <+20>:    movss  0xfd0(%rip),%xmm3        # 0x40200c
   0x000000000040103c <+28>:    shufps $0x0,%xmm2,%xmm2
   0x0000000000401040 <+32>:    shufps $0x0,%xmm0,%xmm0
   0x0000000000401044 <+36>:    cmpleps %xmm2,%xmm0
   0x0000000000401048 <+40>:    cvttps2dq %xmm2,%xmm2
   0x000000000040104c <+44>:    shufps $0x0,%xmm3,%xmm3
   0x0000000000401050 <+48>:    movdqa %xmm0,%xmm1
   0x0000000000401054 <+52>:    andps  %xmm3,%xmm0
   0x0000000000401057 <+55>:    pandn  %xmm2,%xmm1
   0x000000000040105b <+59>:    por    %xmm0,%xmm1

All of this looks fine.
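
To double-check the register contents up to the `por`, here is a scalar model of a single 32-bit lane of the sequence above. It is only a sketch: the function names are mine (not from highway or gcc), and it hard-codes the documented x86 behaviour that `cvttps2dq` yields 0x80000000 for out-of-range or NaN inputs.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a float (one lane of _mm_castsi128_ps). */
float lane_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* One lane of cvttps2dq: truncating conversion; out-of-range or NaN
   inputs give the "integer indefinite" value 0x80000000. */
uint32_t lane_cvttps2dq(uint32_t bits)
{
    float f = lane_as_float(bits);
    if (!(f >= -2147483648.0f && f < 2147483648.0f))
        return 0x80000000u;
    return (uint32_t)(int32_t)f;
}

/* One lane of _mm_cmpge_ps(a, b), emitted above as cmpleps with the
   operands swapped: all-ones when a >= b, zero otherwise. */
uint32_t lane_cmpge_ps(uint32_t a_bits, uint32_t b_bits)
{
    return lane_as_float(a_bits) >= lane_as_float(b_bits) ? 0xFFFFFFFFu : 0u;
}

/* One lane of the whole sequence up to `por` (result lands in %xmm1). */
uint32_t lane_conv_masked(uint32_t sf_bits)
{
    const uint32_t limit_bits = 0x4f000000u;                 /* 2147483648.0f */
    const uint32_t yes  = 0x7fffffffu;                       /* %xmm3: INT32_MAX */
    uint32_t mask = lane_cmpge_ps(sf_bits, limit_bits);      /* cmpleps   -> %xmm0 */
    uint32_t conv = lane_cvttps2dq(sf_bits);                 /* cvttps2dq -> %xmm2 */
    uint32_t a    = mask & yes;                              /* andps     -> %xmm0 */
    uint32_t na   = ~mask & conv;                            /* pandn     -> %xmm1 */
    return a | na;                                           /* por       -> %xmm1 */
}
```

With the input from the test case, `lane_conv_masked(0x4f800000)` gives 0x7fffffff while `lane_cvttps2dq(0x4f800000)` gives 0x80000000, so under this model `%xmm1` and `%xmm2` hold different values after the `por`.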

>   const __m128i actual = _mm_cmpeq_epi32(conv_masked, _mm_set1_epi32(INT32_MAX));
>   const __m128i expected = _mm_set1_epi32(-1);

   0x000000000040105f <+63>:    pcmpeqd %xmm0,%xmm0
   0x0000000000401063 <+67>:    pcmpeqd %xmm2,%xmm1
   0x0000000000401067 <+71>:    call   0x401160 <_ZL9assert_eqDv2_xS_>

Here `pcmpeqd %xmm2,%xmm1` is the problematic instruction. Why does `gcc` use
`%xmm2` (the result of `cvttps2dq`) instead of, say, `%xmm0`, which contains the
`0xFFFFffff` pattern?

>   assert_eq(expected, actual);
> }
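
The effect of that `pcmpeqd %xmm2,%xmm1` on one lane can be sketched as follows. The lane values are assumptions read off the listing, not measured: 0x7fffffff for `conv_masked` in `%xmm1`, and 0x80000000 for the overflowed `cvttps2dq` result in `%xmm2`. The helper and macro names are hypothetical.

```c
#include <stdint.h>

/* One 32-bit lane of pcmpeqd: all-ones if the lanes are equal, else zero. */
uint32_t lane_pcmpeqd(uint32_t a, uint32_t b)
{
    return a == b ? 0xFFFFFFFFu : 0u;
}

/* Lane values assumed from the disassembly (not measured). */
#define CONV_MASKED     0x7fffffffu  /* %xmm1 after por */
#define CONV_RAW        0x80000000u  /* %xmm2 after cvttps2dq on overflow */
#define INT32_MAX_SPLAT 0x7fffffffu  /* what _mm_set1_epi32(INT32_MAX) holds */

/* What the emitted code computes: conv_masked compared with raw conv. */
uint32_t actual_emitted(void)
{
    return lane_pcmpeqd(CONV_MASKED, CONV_RAW);
}

/* What the source asks for: conv_masked compared with INT32_MAX. */
uint32_t actual_from_source(void)
{
    return lane_pcmpeqd(CONV_MASKED, INT32_MAX_SPLAT);
}
```

Under these assumptions `actual_emitted()` is 0 while `actual_from_source()` is 0xFFFFFFFF, so only the latter matches `expected` and the emitted comparison makes `assert_eq` fail.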
