2011/11/20 Georg-Johann Lay <a...@gjlay.de>: > This implements assembler drop-in replacement for 64-bit division/remainder. > > The original libgcc implementation is extremely resource gulping because it > uses inline in several places and DImode is resource gulping, anyway. > > With the patch the sizes (accumulated over all modules with same name) are: > > Col #2 = Original libgcc (C-Code) > Col #3 = New implementation in Asm > Col #4 = Change (relative) > Col #5 = Change (absolute) > > _udivmod64.o : 0 1362 100000.0% 1362 > _negdi2.o : 1296 288 -77.8% -1008 > _umoddi3.o : 22326 0 -100.0% -22326 > _udivdi3.o : 26878 246 -99.1% -26632 > _moddi3.o : 28986 0 -100.0% -28986 > _divdi3.o : 33076 840 -97.5% -32236 > :::::: Total ::::::: : 362188 252362 -30.3% -109826 > > The detailed size report: > > avr3/libgcc.a!_udivmod64.o : 0 174 100000.0% 174 > avr31/libgcc.a!_udivmod64.o : 0 174 100000.0% 174 > avr6/libgcc.a!_udivmod64.o : 0 162 100000.0% 162 > avr5/libgcc.a!_udivmod64.o : 0 162 100000.0% 162 > avr35/libgcc.a!_udivmod64.o : 0 162 100000.0% 162 > avr51/libgcc.a!_udivmod64.o : 0 162 100000.0% 162 > avr25/libgcc.a!_udivmod64.o : 0 126 100000.0% 126 > avr4/libgcc.a!_udivmod64.o : 0 126 100000.0% 126 > libgcc.a!_udivmod64.o : 0 114 100000.0% 114 > avr4/libgcc.a!_negdi2.o : 140 32 -77.1% -108 > avr25/libgcc.a!_negdi2.o : 140 32 -77.1% -108 > avr5/libgcc.a!_negdi2.o : 144 32 -77.8% -112 > avr6/libgcc.a!_negdi2.o : 144 32 -77.8% -112 > libgcc.a!_negdi2.o : 144 32 -77.8% -112 > avr35/libgcc.a!_negdi2.o : 144 32 -77.8% -112 > avr51/libgcc.a!_negdi2.o : 144 32 -77.8% -112 > avr31/libgcc.a!_negdi2.o : 148 32 -78.4% -116 > avr3/libgcc.a!_negdi2.o : 148 32 -78.4% -116 > avr4/libgcc.a!_umoddi3.o : 2304 0 -100.0% -2304 > avr6/libgcc.a!_umoddi3.o : 2360 0 -100.0% -2360 > avr51/libgcc.a!_umoddi3.o : 2360 0 -100.0% -2360 > avr5/libgcc.a!_umoddi3.o : 2360 0 -100.0% -2360 > avr25/libgcc.a!_umoddi3.o : 2364 0 -100.0% -2364 > avr35/libgcc.a!_umoddi3.o : 2420 0 -100.0% -2420 > libgcc.a!_umoddi3.o : 2682 0 -100.0% -2682 > avr3/libgcc.a!_umoddi3.o : 2738 0 -100.0% -2738 > avr31/libgcc.a!_umoddi3.o : 2738 0 -100.0% -2738 > avr4/libgcc.a!_udivdi3.o : 2784 26 -99.1% -2758 > avr25/libgcc.a!_udivdi3.o : 2828 26 -99.1% -2802 > avr5/libgcc.a!_udivdi3.o : 2852 28 -99.0% -2824 > avr6/libgcc.a!_udivdi3.o : 2852 28 -99.0% -2824 > avr51/libgcc.a!_udivdi3.o : 2852 28 -99.0% -2824 > avr35/libgcc.a!_udivdi3.o : 2896 28 -99.0% -2868 > avr4/libgcc.a!_moddi3.o : 3016 0 -100.0% -3016 > avr5/libgcc.a!_moddi3.o : 3072 0 -100.0% -3072 > avr6/libgcc.a!_moddi3.o : 3072 0 -100.0% -3072 > avr51/libgcc.a!_moddi3.o : 3072 0 -100.0% -3072 > avr25/libgcc.a!_moddi3.o : 3124 0 -100.0% -3124 > avr35/libgcc.a!_moddi3.o : 3180 0 -100.0% -3180 > libgcc.a!_udivdi3.o : 3226 26 -99.2% -3200 > avr31/libgcc.a!_udivdi3.o : 3294 28 -99.1% -3266 > avr3/libgcc.a!_udivdi3.o : 3294 28 -99.1% -3266 > avr4/libgcc.a!_divdi3.o : 3396 86 -97.5% -3310 > avr5/libgcc.a!_divdi3.o : 3464 98 -97.2% -3366 > avr51/libgcc.a!_divdi3.o : 3464 98 -97.2% -3366 > avr6/libgcc.a!_divdi3.o : 3464 98 -97.2% -3366 > libgcc.a!_moddi3.o : 3446 0 -100.0% -3446 > avr25/libgcc.a!_divdi3.o : 3578 86 -97.6% -3492 > avr31/libgcc.a!_moddi3.o : 3502 0 -100.0% -3502 > avr3/libgcc.a!_moddi3.o : 3502 0 -100.0% -3502 > avr35/libgcc.a!_divdi3.o : 3646 98 -97.3% -3548 > libgcc.a!_divdi3.o : 3976 78 -98.0% -3898 > avr31/libgcc.a!_divdi3.o : 4044 100 -97.5% -3944 > avr3/libgcc.a!_divdi3.o : 4044 98 -97.6% -3946 > :::::: Total ::::::: : 362188 252362 -30.3% -109826 > > The implementation is basically the same as the division/modulo already > present > in lib1funcs.S. However, the algorithm does not compute the complement by > recycling the carry bit, instead it expands directly to the quotient. That > way, > n instructions can be saved when dealing with n-byte values. > > The implementation provides speed-up of the algorithm for the case when there > is enough flash. The assumption is that speed for arithmetic matters. > > As you can see above, the size of __udivmod64 varies from 114 to 174 bytes: > > 114 = small devices without MOVW (no speed-up: SPEED_DIV = 0) > 126 = small devices with MOVW (small speed-up: SPEED_DIV = 16) > 162, 174 = devices >= 16k (best speed-up: SPEED_DIV = 8) > > Passed without regressions. > > Moreover, the algorithm is individually tested against the old implementation. > The only difference I observed was for divisor = 0. > > Ok for trunk? > > Johann > > libgcc/ > PR target/49313 > * config/avr/t-avr (LIB2FUNCS_EXCLUDE): Add _moddi3, _umoddi3. > (LIB1ASMFUNCS): Add _divdi3, _udivdi3, _udivmod64, _negdi2. > > * config/avr/lib1funcs.S (wmov): New assembler macro. > (__umoddi3, __udivdi3, __udivdi3_umoddi3): New functions. > (__moddi3, __divdi3, __divdi3_moddi3): New functions. > (__udivmod64): New function. > (__negdi2): New function. >
Approved. Denis.