https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551
--- Comment #6 from Moncef Mechri <moncef.mechri at gmail dot com> ---
I confirm the extra mov disappears thanks to Roger's patch.

However, the codegen still seems suboptimal to me when using -march=haswell
or newer, even with Roger's patch:

uint64_t mulx64(uint64_t x)
{
    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
    return (uint64_t)r ^ (uint64_t)(r >> 64);
}

With -O2:

mulx64(unsigned long):
        movabs  rax, -7046029254386353131
        mul     rdi
        xor     rax, rdx
        ret

With -O2 -march=haswell:

mulx64(unsigned long):
        movabs  rdx, -7046029254386353131
        mulx    rdi, rsi, rdi
        mov     rax, rdi
        xor     rax, rsi
        ret

So it looks like there is still one extra mov, since I think the optimal
codegen using mulx should be:

mulx64(unsigned long):
        movabs  rdx, -7046029254386353131
        mulx    rax, rsi, rdi
        xor     rax, rsi
        ret