http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50256
--- Comment #6 from Georg-Johann Lay <gjl at gcc dot gnu.org> 2011-09-04 22:19:53 UTC ---
You don't need R20: simply use %D0, which is cleared anyway. As %0 is
early-clobber, it is not an input, and you can clear it at the beginning.

You don't need to clear R4/R5 (similarly R6/R7): just rearrange the
multiplications and use (note that R6 is implicitly 0 at that time)

    mul  %A1,%C2
    movw r4,r0

instead of

    mul  %A1,%C2
    add  r4,r0
    adc  r5,r1

You don't need to move the result into the answer by hand; just use %A0
instead of R5 etc., and you save moves and register footprint. (Notice that
this interferes with the previous hint, because it changes which registers
are even and which are odd, and movw needs even-aligned register pairs; it's
up to you to work it out and find the smartest arrangement for your
assembler code.)

Finally, you could let the compiler allocate temporary registers for you,
i.e. a 16-bit operand instead of R2/R3 etc. The compiler knows better which
registers are best and will try to use call-clobbered registers instead of
expensive call-saved ones.

All in all, you will get a much greater performance gain by tweaking your
code than the compiler could ever achieve by saving a few register moves ;-)
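
As a rough illustration of the hints above (compiler-allocated operands and
movw in place of clear/add/adc), here is a minimal sketch of an 8x8 -> 16 bit
multiply. It is not the code from this report: the names mul8x8 and
mul8x8_matches_reference are made up for the example, and the AVR branch
assumes avr-gcc on a device with a hardware multiplier. On any other target
the plain C fallback is used, so the snippet still compiles and can be
checked on a host.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the bug report: 8x8 -> 16 bit multiply. */
static inline uint16_t mul8x8(uint8_t a, uint8_t b)
{
#if defined(__AVR__) && defined(__AVR_HAVE_MUL__)
    uint16_t result;
    /* No hard-coded registers: the compiler picks the registers for
       result, a and b.  A 16-bit operand lives in an even-aligned
       register pair, so movw can copy the product out of r1:r0 in one
       instruction.  The output needs no early-clobber here because
       both inputs are read by mul before %A0 is written. */
    __asm__ ("mul  %1,%2"       "\n\t"  /* r1:r0 = a * b               */
             "movw %A0,r0"      "\n\t"  /* result = r1:r0              */
             "clr  __zero_reg__"        /* restore r1 = 0 for avr-gcc  */
             : "=r" (result)
             : "r" (a), "r" (b));
    return result;
#else
    /* Portable fallback so the sketch also builds off-target. */
    return (uint16_t)((unsigned int)a * (unsigned int)b);
#endif
}

/* Returns 1 if mul8x8 agrees with plain C arithmetic for all inputs. */
static int mul8x8_matches_reference(void)
{
    unsigned int a, b;
    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            if (mul8x8((uint8_t)a, (uint8_t)b) != (uint16_t)(a * b)) {
                return 0;
            }
        }
    }
    return 1;
}
```

Because the operands are ordinary "r" constraints, the register allocator is
free to place them in call-clobbered registers, so no push/pop is needed
around the asm. A wider multiply along the lines discussed above would add
further mul steps and would need "=&r" on the output once it is written
before all inputs have been read.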