https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70703
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 41094
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41094&action=edit
gcc7-pr70703-widen.patch

The widening_mult change.  We get a tiny bit better code with it on the #c0
testcase:
-       movl    $6700417, %ecx
-       movl    %ecx, %eax
+       movl    $6700417, %edx
+       movl    %edx, %eax
        mull    4(%esp)
-       movl    %edx, %ecx
-       movl    %ecx, %eax
+       movl    %edx, %eax
but it is still not ideal.  On the other hand, we regress with -m64:
unsigned long
foo (unsigned long x)
{
  return ((__uint128_t) x * 0x663d811234567ULL) >> 64;
}
-       movabsq $1798629511873895, %rax
-       mulq    %rdi
+       movq    %rdi, %rax
+       movabsq $1798629511873895, %rdx
+       mulq    %rdx

Another option is to deal with this at combine time.  On the unpatched
compiler I see:
Failed to match this instruction:
(set (reg:SI 95)
    (subreg:SI (mult:DI (zero_extend:DI (mem/c:SI (reg/f:SI 16 argp) [1 x+0 S4 A32]))
            (const_int 6700417 [0x663d81])) 4))
Maybe we could add some define_insn_and_split that would deal with this,
make sure the constant is forced into a register (if the constant has,
depending on <s>, all upper bits zero or all upper bits set), and transform
it into the highpart insns?  Though, I'm worried about the regression above
that we got with the TImode highpart.
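
For readers following along, here is a minimal C sketch of what the unmatched
combine pattern computes, assuming that byte offset 4 of the DImode subreg
selects the upper 32 bits on little-endian x86.  The function names
highpart_u32, highpart_u64 and highpart_selfcheck are hypothetical and appear
neither in the attached patch nor in the #c0 testcase; only the two constants
and the body of foo come from the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit analogue of the RTL above: zero-extend x to 64 bits,
   multiply by the constant 6700417 (0x663d81), keep the upper 32 bits.  */
static uint32_t
highpart_u32 (uint32_t x)
{
  return (uint32_t) (((uint64_t) x * 6700417u) >> 32);
}

#ifdef __SIZEOF_INT128__
/* Same shape as foo from the comment, for the TImode highpart.
   Only compiled where the compiler provides __uint128_t.  */
static uint64_t
highpart_u64 (uint64_t x)
{
  return (uint64_t) (((__uint128_t) x * 0x663d811234567ULL) >> 64);
}
#endif

/* Small self-check.  641 * 6700417 == 2^32 + 1, so 641 is the smallest
   input for which the upper half becomes nonzero.  */
static void
highpart_selfcheck (void)
{
  assert (highpart_u32 (0) == 0);
  assert (highpart_u32 (640) == 0);
  assert (highpart_u32 (641) == 1);
#ifdef __SIZEOF_INT128__
  assert (highpart_u64 (1) == 0);
  assert (highpart_u64 ((uint64_t) 1 << 63) == (0x663d811234567ULL >> 1));
#endif
}
```

Both functions only need the high half of a widening multiply by a constant,
which is the case the highpart insns mentioned above are meant to cover.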