https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551
--- Comment #6 from Moncef Mechri <moncef.mechri at gmail dot com> ---
I confirm the extra mov disappears thanks to Roger's patch.

However, the codegen still seems suboptimal to me when using -march=haswell
or newer, even with Roger's patch:

uint64_t mulx64(uint64_t x)
{
    __uint128_t r = (__uint128_t)x * 0x9E3779B97F4A7C15ull;
    return (uint64_t)r ^ (uint64_t)(r >> 64);
}

With -O2:

mulx64(unsigned long):
        movabs  rax, -7046029254386353131
        mul     rdi
        xor     rax, rdx
        ret

With -O2 -march=haswell:

mulx64(unsigned long):
        movabs  rdx, -7046029254386353131
        mulx    rdi, rsi, rdi
        mov     rax, rdi
        xor     rax, rsi
        ret

So it looks like there is still one extra mov, since I think the optimal
codegen using mulx should be:

mulx64(unsigned long):
        movabs  rdx, -7046029254386353131
        mulx    rax, rsi, rdi
        xor     rax, rsi
        ret