https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108804
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
I think the point here is: although the conversion is uint64_t -> float, the
range of x and y can be represented as int32
((k & 0x007FFFFF) | 0x3F800000), so we can use int32 -> float instructions,
which are supported by the backend. So it looks to me like a middle-end issue.

A simple testcase for which clang generates vcvtdq2ps but gcc doesn't
vectorize:

#include <stdint.h>

uint64_t d[512];
float f[1024];

void foo()
{
  for (int i = 0; i < 512; ++i)
    {
      uint64_t k = d[i];
      /* uint64_t -> float; the mask keeps the value within int32 range.  */
      f[i] = (k & 0x3F30FFFF);
    }
}

If the conversion to int is added manually, gcc can vectorize the loop as well:

#include <stdint.h>

uint64_t d[512];
float f[1024];

void foo()
{
  for (int i = 0; i < 512; ++i)
    {
      uint64_t k = d[i];
      /* Explicit cast to int, so the conversion becomes int32 -> float.  */
      f[i] = (int)(k & 0x3F30FFFF);
    }
}
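
To illustrate why the narrowing is safe, here is a minimal sketch (not part of the original report) that compares the two conversions on scalar values. It assumes a 32-bit int; the helper names cvt_u64, cvt_i32, conversions_agree and check_samples, and the sample inputs, are hypothetical. Since the mask 0x3F30FFFF clears bits 30 and above, the masked value always fits in a non-negative int32, so both conversions should round the same integer to the same float.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask from the testcase; bits 30 and above are clear, so the masked
   value is always below 2^31.  */
#define MASK 0x3F30FFFFu

/* Conversion as written in the first testcase: uint64_t -> float.  */
float cvt_u64(uint64_t k)
{
  return (float)(k & MASK);
}

/* Conversion with the manual cast from the second testcase: int -> float.
   Assumes int is 32 bits wide.  */
float cvt_i32(uint64_t k)
{
  return (float)(int)(k & MASK);
}

/* Returns 1 when both conversions produce the same float for k.  */
int conversions_agree(uint64_t k)
{
  return cvt_u64(k) == cvt_i32(k);
}

/* Checks a few hypothetical sample inputs, including values with the
   high bits set, which the mask discards.  */
void check_samples(void)
{
  static const uint64_t samples[] = {
    0u, 1u, 0x3F30FFFFu, 0x0123456789ABCDEFull,
    0x8000000000000000ull, 0xFFFFFFFFFFFFFFFFull
  };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
    assert(conversions_agree(samples[i]));
}
```

This only checks the scalar equivalence that the proposed middle-end transformation would rely on; whether and how the vectorizer should derive the range from the mask is the open question in this bug.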