https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122818

--- Comment #5 from stoventtas at gmail dot com ---
(In reply to Matthias Kretz (Vir) from comment #2)
> This is due to the use of `fixed_size_simd`. The type implements an
> additional ABI guarantee, so that it is safe to use over ABI boundaries
> (e.g. when passing function arguments between TUs compiled with and without
> AVX512). It therefore implements masks as *bitmasks*. That's why you see the
> useless conversion.
> 
> Replace:
> 
> -using fixed_simd_t  = stdx::fixed_size_simd<uint32_t, 8>;
> +using fixed_simd_t  = stdx::simd<uint32_t, stdx::simd_abi::deduce_t<uint32_t, 8>>;
> 
> With AVX2 you should get the expected code-gen.
> 
> Wrt. the optimizer, if I had a way to convert vec-mask -> bit-mask ->
> vec-mask in a way that the compiler knows what I'm doing, I'm sure it would
> just optimize it away. ;-)
> 
> FWIW, the C++26 implementation will not have such an "ABI stable" type
> anymore and std::simd::vec<uint32_t, 8> will behave as you expected.
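
To make sure I understand the vec-mask -> bit-mask -> vec-mask conversion
mentioned above, here is a minimal sketch in plain C++ without any SIMD
library. The names (VecMask, to_bitmask, to_vecmask, check_round_trip) are
hypothetical and only for illustration; this is not how libstdc++ implements
its masks. It assumes 8 lanes of uint32_t where each lane is either 0 or
all-ones.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; none of this is libstdc++ code.
constexpr int kLanes = 8;

// Vector-style mask: one full-width lane per element, 0 or 0xFFFFFFFF.
using VecMask = std::array<std::uint32_t, kLanes>;

// Collapse a vector mask to one bit per lane (bit i describes lane i).
inline std::uint8_t to_bitmask(const VecMask& m) {
    std::uint8_t bits = 0;
    for (int i = 0; i < kLanes; ++i) {
        if (m[i] != 0) {
            bits = static_cast<std::uint8_t>(bits | (1u << i));
        }
    }
    return bits;
}

// Expand one bit per lane back into full-width lanes.
inline VecMask to_vecmask(std::uint8_t bits) {
    VecMask m{};
    for (int i = 0; i < kLanes; ++i) {
        m[i] = ((bits >> i) & 1u) ? 0xFFFFFFFFu : 0u;
    }
    return m;
}

// For well-formed masks the round trip is the identity, which is why the
// conversion pair is redundant work when both ends stay in one function.
inline void check_round_trip() {
    const VecMask m = {0xFFFFFFFFu, 0, 0, 0xFFFFFFFFu, 0, 0xFFFFFFFFu, 0, 0};
    assert(to_bitmask(m) == 0x29);           // lanes 0, 3 and 5 are set
    assert(to_vecmask(to_bitmask(m)) == m);  // back to the original lanes
}
```

Since to_vecmask(to_bitmask(m)) returns m unchanged for any mask whose lanes
are 0 or all-ones, I would have hoped the pair could be dropped when nothing
crosses an ABI boundary.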

Ah thanks, good to know for C++26!
I wasn't aware of the ABI guarantees, but do they have to be respected within
a single function? I ask because deduce_t only helps when the requested size
matches what the target provides natively; with -msse4.2 the issue appears
again.
I'm not a C++ expert, so I may be missing what an ABI guarantee implies for
code generation, even within a single translation unit.
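
For what it's worth, here is a small sketch of how the deduced ABI could be
inspected under different -m flags. It assumes libstdc++'s
<experimental/simd>; the aliases and helpers (deduced_abi_t, deduced_simd_t,
fell_back_to_fixed_size, sum_of_broadcast, check_lane_count) are my own
hypothetical names. My assumption, which I have not verified for every target,
is that deduce_t falls back to fixed_size<8> when no native ABI holds
8 x uint32_t, which would explain the -msse4.2 result.

```cpp
#include <cassert>
#include <cstdint>
#include <experimental/simd>
#include <type_traits>

namespace stdx = std::experimental;

// Same alias as suggested in comment #2, split in two for inspection.
using deduced_abi_t  = stdx::simd_abi::deduce_t<std::uint32_t, 8>;
using deduced_simd_t = stdx::simd<std::uint32_t, deduced_abi_t>;

// True when the deduced ABI is the fixed-size one, i.e. the type is the
// same as fixed_size_simd<uint32_t, 8>. The value depends on the -m flags.
constexpr bool fell_back_to_fixed_size =
    std::is_same_v<deduced_abi_t, stdx::simd_abi::fixed_size<8>>;

// The lane count is 8 whichever ABI was chosen.
static_assert(deduced_simd_t::size() == 8, "deduce_t must yield 8 lanes");

// Broadcast v to all lanes and add the lanes up; the result does not
// depend on the ABI, only the generated code does.
inline std::uint32_t sum_of_broadcast(std::uint32_t v) {
    deduced_simd_t x(v);     // broadcast constructor
    return stdx::reduce(x);  // horizontal sum with the default std::plus<>
}

// Runtime sanity check that is valid for any ABI choice.
inline void check_lane_count() {
    assert(sum_of_broadcast(1) == 8);
}
```

Compiling this once with -mavx2 and once with -msse4.2 and looking at
fell_back_to_fixed_size (for example through a static_assert) should show
whether the alias silently becomes the fixed-size type again.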
