https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82418
--- Comment #2 from Antony Polukhin <antoshkka at gmail dot com> ---
I've checked the instruction costs against "4. Instruction tables" by Agner Fog, Technical University of Denmark. For Skylake:

                                ; recip throughp  Latency  Ports     μops
mov     edx, 1374389535         ; 0.25            0
mul     edx                     ; 1               4        p1+p0156  3
mov     eax, edx                ; 0.25            0-1
shr     eax, 5                  ; 0.5             1
                        Total:  ; 2               5-6

vs

                                ; recip throughp  Latency  Ports     μops
imul    rax, rax, 1374389535    ; 1               3        p1        1
shr     rax, 37                 ; 0.5             1
                        Total:  ; 1.5             4

So it seems that the imul version has a lower total reciprocal throughput (fewer core clock cycles per instruction on average) and a shorter delay in the dependency chain (latency). imul r64,r64,i occupies fewer ports than mul r32 and needs fewer μops in both the fused and the unfused domain.

Finally, "imul rax,rax,0x51eb851f" consumes 10 bytes in the binary, while mov+mul+mov consumes 8+5+5 == 18 bytes.
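For readers who want to reproduce the comparison, here is a minimal C sketch of the computation both sequences perform. It assumes the operation is an unsigned 32-bit division by 100, which is what the constant 1374389535 (0x51eb851f, i.e. ceil(2^37 / 100)) and the total shift of 37 bits suggest. The names div100_mulshift, div100_plain and div100_selfcheck are hypothetical and not taken from the bug report, and which instructions a given compiler emits for either function is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x / 100 for a 32-bit unsigned x, written as one
   64-bit multiply followed by a shift, mirroring the imul + shr sequence.
   1374389535 == 0x51eb851f == ceil(2^37 / 100). */
static inline uint32_t div100_mulshift(uint32_t x)
{
    /* The 64-bit product cannot overflow: x < 2^32 and the constant < 2^31. */
    return (uint32_t)(((uint64_t)x * 1374389535u) >> 37);
}

/* Reference version that leaves the instruction selection to the compiler. */
static inline uint32_t div100_plain(uint32_t x)
{
    return x / 100u;
}

/* Spot-check on boundary values; a loop over all 2^32 inputs is also feasible. */
static void div100_selfcheck(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 99u, 100u, 101u, 199u, 200u, 4294967295u
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(div100_mulshift(samples[i]) == div100_plain(samples[i]));
}
```

Compiling both functions with optimizations enabled and inspecting the generated assembly (for example with -S) shows which of the two sequences the compiler picks for each form.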