https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122818
--- Comment #5 from stoventtas at gmail dot com ---
(In reply to Matthias Kretz (Vir) from comment #2)
> This is due to the use of `fixed_size_simd`. The type implements an
> additional ABI guarantee, so that it is safe to use over ABI boundaries
> (e.g. when passing function arguments between TUs compiled with and without
> AVX512). It therefore implements masks as *bitmasks*. That's why you see the
> useless conversion.
>
> Replace:
>
> -using fixed_simd_t = stdx::fixed_size_simd<uint32_t, 8>;
> +using fixed_simd_t = stdx::simd<uint32_t,
> stdx::simd_abi::deduce_t<uint32_t, 8>>;
>
> With AVX2 you should get the expected code-gen.
>
> Wrt. the optimizer, if I had a way to convert vec-mask -> bit-mask ->
> vec-mask in a way that the compiler knows what I'm doing, I'm sure it would
> just optimize it away. ;-)
>
> FWIW, the C++26 implementation will not have such an "ABI stable" type
> anymore and std::simd::vec<uint32_t, 8> will behave as you expected.

Ah thanks, good to know for C++26! I wasn't aware of the ABI guarantees, but do
they have to be respected within a single function? I ask because deduce_t only
works when the size matches: with -msse4.2 the issue appears again.

I'm not a C++ expert, so I may not be aware of what an ABI guarantee implies
for code generation, including within a single translation unit.
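
To make the "vec-mask -> bit-mask -> vec-mask" round trip from comment #2 concrete, here is a minimal scalar model of it. This is only an illustrative sketch, not libstdc++'s implementation: the names vec_mask, compare_greater, to_bitmask and from_bitmask are hypothetical, and plain std::array loops stand in for the real vector instructions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of an 8-lane vector mask:
// each lane is all-ones (true) or zero (false), as an AVX2-style compare yields.
using lanes8 = std::array<std::uint32_t, 8>;
using vec_mask = lanes8;

// Lane-wise a > b, producing a vector mask.
inline vec_mask compare_greater(const lanes8& a, const lanes8& b) {
    vec_mask m{};
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = a[i] > b[i] ? 0xFFFFFFFFu : 0u;
    return m;
}

// Pack the vector mask into one bit per lane (the bitmask representation).
inline std::uint8_t to_bitmask(const vec_mask& m) {
    std::uint8_t bits = 0;
    for (std::size_t i = 0; i < m.size(); ++i)
        if (m[i] != 0)
            bits |= static_cast<std::uint8_t>(1u << i);
    return bits;
}

// Unpack a bitmask back into a vector mask.
inline vec_mask from_bitmask(std::uint8_t bits) {
    vec_mask m{};
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = ((bits >> i) & 1u) ? 0xFFFFFFFFu : 0u;
    return m;
}
```

In this model, from_bitmask(to_bitmask(m)) always returns m unchanged, so the round trip does no useful work when the mask is both produced and consumed as a vector mask. That redundancy is what the "useless conversion" in the generated code corresponds to.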
