https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113560

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #2 from Roger Sayle <roger at nextmovesoftware dot com> ---
The costs look sane, and I'd expect the synth_mult generated sequence to be
faster, though it would be good to get some microbenchmarking.
A reduced test case is:
__int128 foo(__int128 x) { return x*100; }
The x86 backend thinks that a 128-bit (TImode) multiplication would take 14
cycles, so instead generates:
x2 = x+x        2 cycles
x3 = x2+x       2 cycles
x24 = x3<<3     2 cycles
x25 = x24+x     2 cycles
x100 = x25<<2   2 cycles
which is a total of 10 cycles, and predicted to be faster than the generic
implementation (requiring 2 IMULQ, 1 MULQ and 2 ADDQ) for
__int128 bar(__int128 x, int y) { return x*y; }
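
For anyone wanting to check the arithmetic, here is a minimal C sketch of the
two strategies being compared. The function names mul100_shift_add and
mul_generic are hypothetical, and the mapping of C operations to MULQ/IMULQ/ADDQ
in the comments is an assumption about a plausible lowering, not the code GCC
actually emits. Both work in unsigned arithmetic to avoid undefined behaviour
on signed overflow, and rely on the GCC/Clang __int128 extension.

```c
#include <stdint.h>

/* Shift-and-add sequence for x*100: 100 = ((2+1)*8 + 1)*4. */
__int128 mul100_shift_add(__int128 x)
{
    unsigned __int128 u    = (unsigned __int128)x;
    unsigned __int128 x2   = u + u;      /* 2x   */
    unsigned __int128 x3   = x2 + u;     /* 3x   */
    unsigned __int128 x24  = x3 << 3;    /* 24x  */
    unsigned __int128 x25  = x24 + u;    /* 25x  */
    unsigned __int128 x100 = x25 << 2;   /* 100x */
    return (__int128)x100;
}

/* Generic 128-bit multiply by a sign-extended int, split into 64-bit
   halves: one widening multiply, two truncating multiplies, two adds. */
__int128 mul_generic(__int128 x, int y)
{
    uint64_t lo = (uint64_t)x;
    uint64_t hi = (uint64_t)((unsigned __int128)x >> 64);
    uint64_t yl = (uint64_t)(int64_t)y;        /* low word of y        */
    uint64_t yh = (y < 0) ? UINT64_MAX : 0;    /* sign-extension word  */

    unsigned __int128 p = (unsigned __int128)lo * yl;  /* 64x64->128 (MULQ-like) */
    uint64_t cross = hi * yl + lo * yh;                /* two truncating multiplies, one add */
    p += (unsigned __int128)cross << 64;               /* fold cross terms into the high word */
    return (__int128)p;
}
```

Both functions should agree with plain x*100 whenever the result fits in
128 bits, which makes them a convenient starting point for the
microbenchmarking mentioned above.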
