Weddington, Eric wrote:
>
>> Eric, can you review the assembler routines and say if such reuse is ok or
>> if you'd prefer a speed-optimized version of __mulsi3 like in the current
>> libgcc?
>
> Hi Johann,
>
> Typically a penalty on speed is preferred over a penalty on code size. Do you
> already have information on how it compares on code size with the old
> routines?
>
> Eric

The old sizes are

    62 __mulsi3
    26 __mulhisi3
    22 __umulhisi3
    10 __xmulhisi3

where the __[u]mulhisi3 will drag in __xmulhisi3 and the insns don't
combine with constants.

The new implementation has more fragments. The indented modules are dragged
in, i.e. used, by the respective function:

    12 __mulhisi3
           __umulhisi3
           __usmulhisi3_tail
    30 __umulhisi3
    02 __usmulhisi3
    10 __usmulhisi3_tail
    20 __muluhisi3
           __umulhisi3
    08 __mulohisi3
    04 __mulshisi3
           __muluhisi3
    30 __mulsi3
           __muluhisi3

This means that a pure __mulsi3 will have 30+30+20 = 80 bytes (+18 compared
to the old 62), because it drags in __muluhisi3, which in turn drags in
__umulhisi3. If all functions are used they occupy 116 bytes (-4 compared to
the old total of 120), so they actually save a little space if they are all
used, with the benefit that they can also one-extend, extend 32 = 16*32 as
well as 32 = 16*16, and work for small (17-bit signed) constants.

__umulhisi3 reads:

DEFUN __umulhisi3
    mul     A0, B0
    movw    C0, r0
    mul     A1, B1
    movw    C2, r0
    mul     A0, B1
    add     C1, r0
    adc     C2, r1
    clr     __zero_reg__
    adc     C3, __zero_reg__
    mul     A1, B0
    add     C1, r0
    adc     C2, r1
    clr     __zero_reg__
    adc     C3, __zero_reg__
    ret
ENDF __umulhisi3

It could be compressed to the following sequence, i.e. 24 bytes instead of
30, but I think that goes too far in squeezing the last byte out of the
code:

DEFUN __umulhisi3
    mul     A0, B0
    movw    C0, r0
    mul     A1, B1
    movw    C2, r0
    mul     A0, B1
    rcall   1f
    mul     A1, B0
1:  add     C1, r0
    adc     C2, r1
    clr     __zero_reg__
    adc     C3, __zero_reg__
    ret
ENDF __umulhisi3

In the absence of real-world code that uses 32-bit arithmetic I trust my
intuition that code size will decrease in general ;-)

Tiny examples are sometimes misleading because of additional moves from
unpleasant register allocation, but that's a different story...

Johann
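
For reference, here is a minimal C sketch of what the __umulhisi3 sequence above computes: a 16x16 -> 32 unsigned product built from four 8x8 -> 16 partial products, with the two cross products added in at a byte offset. It is only a model for checking the arithmetic, not the libgcc source. The names umulhisi3_model and umulhisi3_mismatches are hypothetical, and the mapping of A0/A1, B0/B1 and C0..C3 to the low and high bytes of the operands and result is an assumption read off the register names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the __umulhisi3 sequence (not the libgcc code).
 * Assumes A1:A0 and B1:B0 are the 16-bit inputs and C3..C0 the 32-bit
 * result, least significant byte in C0. */
static uint32_t umulhisi3_model(uint16_t a, uint16_t b)
{
    uint8_t a0 = (uint8_t)(a & 0xff), a1 = (uint8_t)(a >> 8);
    uint8_t b0 = (uint8_t)(b & 0xff), b1 = (uint8_t)(b >> 8);

    /* mul A0,B0 ; movw C0,r0   ->  C1:C0 = A0 * B0 */
    uint32_t c = (uint32_t)(a0 * b0);

    /* mul A1,B1 ; movw C2,r0   ->  C3:C2 = A1 * B1 */
    c |= (uint32_t)(a1 * b1) << 16;

    /* mul A0,B1 ; add/adc/adc  ->  C3:C1 += A0 * B1 (carry runs into C3) */
    c += (uint32_t)(a0 * b1) << 8;

    /* mul A1,B0 ; add/adc/adc  ->  C3:C1 += A1 * B0 */
    c += (uint32_t)(a1 * b0) << 8;

    return c;
}

/* Compare the model against a plain 32-bit multiply on a sampled grid.
 * 'step' must be non-zero; returns the number of mismatches found. */
static unsigned long umulhisi3_mismatches(uint32_t step)
{
    unsigned long bad = 0;
    for (uint32_t a = 0; a <= 0xFFFF; a += step)
        for (uint32_t b = 0; b <= 0xFFFF; b += step)
            if (umulhisi3_model((uint16_t)a, (uint16_t)b) != a * b)
                bad++;
    return bad;
}
```

Calling umulhisi3_mismatches(257) from a test harness samples 256 values per operand. The compressed 24-byte variant performs the same four steps, it only shares the add/adc tail through the rcall, so the same model applies to it.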