Denis Chertykov wrote:
> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>> Denis Chertykov wrote:
>>> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>>>> Denis Chertykov wrote:
>>>>> 2011/10/18 Georg-Johann Lay <a...@gjlay.de>:
>>>>>> This patch does some tweaks to addhi3, like adding a QI scratch
>>>>>> register.
>>>>>>
>>>>>> The original *addhi3 insn is still there and located prior to the new
>>>>>> addhi3_clobber insn because addhi3 is special to reload (thanks Denis
>>>>>> for this note), so that there is a version with and a version without
>>>>>> scratch register.
>>>>>>
>>>>>> Patch passes without regressions.
>>>>>>
>>>>> Which improvements are added by this patch?
>>>>>
>>>>> Denis.
>>>> If the addhi3 is expanded early, the addition happens with a QI scratch,
>>>> which avoids a reload of the constant if the target register is in NO_LD.
>>>> It also reduces register pressure, as only a QI register is needed and
>>>> the constant is not reloaded to HI.
>>>>
>>>> Otherwise, there might be sequences like
>>>>
>>>>     ldi r31, 2              ; *reload_inhi
>>>>     mov r12, r31
>>>>     clr r13
>>>>
>>>>     add r14, r12            ; *addhi3
>>>>     adc r15, r13
>>>>
>>>> which now will be
>>>>
>>>>     ldi r31, 2              ; addhi3_clobber
>>>>     add r14, r31
>>>>     adc r15, __zero_reg__
>>>>
>>>> Similar applies if the reload of the constant happens to LD regs:
>>>>
>>>>     ldi r30, 2              ; *movhi
>>>>     clr r31
>>>>
>>>>     add r14, r12            ; *addhi3
>>>>     adc r15, r13
>>>>
>>>> will become
>>>>
>>>>     ldi r30, 2              ; addhi3_clobber
>>>>     add r14, r30
>>>>     adc r15, __zero_reg__
>>>>
>>>> For *addhi3 insns the register pressure is not reduced, but the insn
>>>> sequence might be smarter if peep2 comes up with a QI scratch or if it
>>>> detects a *reload_inhi insn just prior to the addition (and the reg that
>>>> holds the reloaded constant dies after the addition).
>>>>
>>>> As *addhi3 is special to reload, there is still an "ordinary" addhi insn
>>>> without scratch. This is easier because, e.g., prologue and epilogue
>>>> generation generate add insns (not by means of the addhi3 expander but by
>>>> explicit gen_rtx_PLUS). Yet the addhi3 expander factors out the
>>>> situations when an addhi3 insn is to be generated via the addhi3 expander
>>>> late in the compilation process.
>>> Please provide any real world example.
>>>
>>> Denis.
>> Consider avr-libc (under the assumption that it is "real world" code):
>>
>> In avr-libc's build directory, and with the patch integrated:
>>
>> $ cd avr/lib/avr4
>> $ make clean && make CFLAGS='-save-temps -dp -Os'
>> $ grep -A 2 'addhi3_clobber\/2' *.s > out-nopeep2.txt (see attachment)
>> $ grep 'addhi3_clobber\/2' *.s | wc -l
>> 33
>>
>> This shows that the insns are already there before peep2 and thus no
>> reload of a 16-bit constant is needed; an 8-bit scratch is sufficient.
>>
>> Alternatively, the implementation could omit the expansion to
>> addhi3_clobber in the addhi3 expander and instead rely completely on
>> peep2. However, that does not reduce register pressure because a 16-bit
>> register will be allocated and the peep2 just prints things smarter and
>> needs just a QI scratch to call avr_out_plus_clobber.
>>
>> For +/-1, the addition with SEC/ADD/ADC or SEC/SBC/SBC, respectively,
>> leaves cc0 in a mess. As most loops use +/-1 on the counter variable,
>> LDI/SUB/SBC is not shorter but better because it sets cc0.
>>
>> So you like this patch?
>> Or prefer a patch that is neutral with respect to the register allocator
>> and just uses peep2 to print things smarter?
>
> I'm interested in code improvements.
> What is the difference in size of avr-libc?
>
> Denis.

I have no tool for smart size analysis, so here is just a diff:

After rebuilding avr-libc with the respective compiler version, I ran
(once per build):

$ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-orig.txt
$ find . -name 'lib[mc].a' -exec avr-size {} ';' > size-patch.txt

and then

$ diff -U 0 size-orig.txt size-patch.txt > size.diff

As far as I can see, the gain is not big. The only object that grows is
realloc.o, by 2 bytes for avr2, avr3 and avr31; all other changed objects
shrink.

For some files like ./avr/lib/avr2/libc.a:dtoa_prf.o the size gain is 3%.
For ./avr/lib/avr4/libc.a:vfprintf_std.o it's 1.7%, and for others it is
just one instruction better.

Johann

--- size-orig.txt 2011-10-18 19:59:52.000000000 +0200
+++ size-patch.txt 2011-10-18 19:50:59.000000000 +0200
@@ -7 +7 @@
-  750 0 0   750 2ee dtoa_prf.o (ex ./avr/lib/avr51/libc.a)
+  724 0 0   724 2d4 dtoa_prf.o (ex ./avr/lib/avr51/libc.a)
@@ -11 +11 @@
-  722 6 0   728 2d8 malloc.o (ex ./avr/lib/avr51/libc.a)
+  720 6 0   726 2d6 malloc.o (ex ./avr/lib/avr51/libc.a)
@@ -15,2 +15,2 @@
-  510 0 0   510 1fe realloc.o (ex ./avr/lib/avr51/libc.a)
-  747 0 0   747 2eb strtod.o (ex ./avr/lib/avr51/libc.a)
+  506 0 0   506 1fa realloc.o (ex ./avr/lib/avr51/libc.a)
+  739 0 0   739 2e3 strtod.o (ex ./avr/lib/avr51/libc.a)
@@ -18 +18 @@
-  536 0 0   536 218 strtoul.o (ex ./avr/lib/avr51/libc.a)
+  530 0 0   530 212 strtoul.o (ex ./avr/lib/avr51/libc.a)
@@ -246,2 +246,2 @@
- 1042 0 0  1042 412 vfprintf_std.o (ex ./avr/lib/avr51/libc.a)
- 1490 0 0  1490 5d2 vfscanf_std.o (ex ./avr/lib/avr51/libc.a)
+ 1026 0 0  1026 402 vfprintf_std.o (ex ./avr/lib/avr51/libc.a)
+ 1488 0 0  1488 5d0 vfscanf_std.o (ex ./avr/lib/avr51/libc.a)
@@ -423 +423 @@
-  688 0 0   688 2b0 dtoa_prf.o (ex ./avr/lib/avr35/libc.a)
+  670 0 0   670 29e dtoa_prf.o (ex ./avr/lib/avr35/libc.a)
@@ -427 +427 @@
-  708 6 0   714 2ca malloc.o (ex ./avr/lib/avr35/libc.a)
+  706 6 0   712 2c8 malloc.o (ex ./avr/lib/avr35/libc.a)
@@ -431,3 +431,3 @@
-  440 0 0   440 1b8 realloc.o (ex ./avr/lib/avr35/libc.a)
-  733 0 0   733 2dd strtod.o (ex ./avr/lib/avr35/libc.a)
-  564 0 0   564 234 strtol.o (ex ./avr/lib/avr35/libc.a)
+  436 0 0   436 1b4 realloc.o (ex ./avr/lib/avr35/libc.a)
+  725 0 0   725 2d5 strtod.o (ex ./avr/lib/avr35/libc.a)
+  562 0 0   562 232 strtol.o (ex ./avr/lib/avr35/libc.a)
@@ -662,2 +662,2 @@
-  964 0 0   964 3c4 vfprintf_std.o (ex ./avr/lib/avr35/libc.a)
- 1352 0 0  1352 548 vfscanf_std.o (ex ./avr/lib/avr35/libc.a)
+  948 0 0   948 3b4 vfprintf_std.o (ex ./avr/lib/avr35/libc.a)
+ 1350 0 0  1350 546 vfscanf_std.o (ex ./avr/lib/avr35/libc.a)
@@ -815 +815 @@
-  682 0 0   682 2aa dtoa_prf.o (ex ./avr/lib/avr25/libc.a)
+  664 0 0   664 298 dtoa_prf.o (ex ./avr/lib/avr25/libc.a)
@@ -819 +819 @@
-  704 6 0   710 2c6 malloc.o (ex ./avr/lib/avr25/libc.a)
+  702 6 0   708 2c4 malloc.o (ex ./avr/lib/avr25/libc.a)
@@ -823,3 +823,3 @@
-  426 0 0   426 1aa realloc.o (ex ./avr/lib/avr25/libc.a)
-  713 0 0   713 2c9 strtod.o (ex ./avr/lib/avr25/libc.a)
-  554 0 0   554 22a strtol.o (ex ./avr/lib/avr25/libc.a)
+  422 0 0   422 1a6 realloc.o (ex ./avr/lib/avr25/libc.a)
+  705 0 0   705 2c1 strtod.o (ex ./avr/lib/avr25/libc.a)
+  552 0 0   552 228 strtol.o (ex ./avr/lib/avr25/libc.a)
@@ -1054,2 +1054,2 @@
-  930 0 0   930 3a2 vfprintf_std.o (ex ./avr/lib/avr25/libc.a)
- 1286 0 0  1286 506 vfscanf_std.o (ex ./avr/lib/avr25/libc.a)
+  914 0 0   914 392 vfprintf_std.o (ex ./avr/lib/avr25/libc.a)
+ 1284 0 0  1284 504 vfscanf_std.o (ex ./avr/lib/avr25/libc.a)
@@ -1447 +1447 @@
-  758 0 0   758 2f6 dtoa_prf.o (ex ./avr/lib/avr31/libc.a)
+  734 0 0   734 2de dtoa_prf.o (ex ./avr/lib/avr31/libc.a)
@@ -1451 +1451 @@
-  752 6 0   758 2f6 malloc.o (ex ./avr/lib/avr31/libc.a)
+  750 6 0   756 2f4 malloc.o (ex ./avr/lib/avr31/libc.a)
@@ -1455,4 +1455,4 @@
-  464 0 0   464 1d0 realloc.o (ex ./avr/lib/avr31/libc.a)
-  811 0 0   811 32b strtod.o (ex ./avr/lib/avr31/libc.a)
-  634 0 0   634 27a strtol.o (ex ./avr/lib/avr31/libc.a)
-  616 0 0   616 268 strtoul.o (ex ./avr/lib/avr31/libc.a)
+  466 0 0   466 1d2 realloc.o (ex ./avr/lib/avr31/libc.a)
+  809 0 0   809 329 strtod.o (ex ./avr/lib/avr31/libc.a)
+  630 0 0   630 276 strtol.o (ex ./avr/lib/avr31/libc.a)
+  614 0 0   614 266 strtoul.o (ex ./avr/lib/avr31/libc.a)
@@ -1686,2 +1686,2 @@
- 1064 0 0  1064 428 vfprintf_std.o (ex ./avr/lib/avr31/libc.a)
- 1582 0 0  1582 62e vfscanf_std.o (ex ./avr/lib/avr31/libc.a)
+ 1046 0 0  1046 416 vfprintf_std.o (ex ./avr/lib/avr31/libc.a)
+ 1580 0 0  1580 62c vfscanf_std.o (ex ./avr/lib/avr31/libc.a)
@@ -1791 +1791 @@
-  750 0 0   750 2ee dtoa_prf.o (ex ./avr/lib/avr6/libc.a)
+  724 0 0   724 2d4 dtoa_prf.o (ex ./avr/lib/avr6/libc.a)
@@ -1795 +1795 @@
-  722 6 0   728 2d8 malloc.o (ex ./avr/lib/avr6/libc.a)
+  720 6 0   726 2d6 malloc.o (ex ./avr/lib/avr6/libc.a)
@@ -1799,2 +1799,2 @@
-  508 0 0   508 1fc realloc.o (ex ./avr/lib/avr6/libc.a)
-  747 0 0   747 2eb strtod.o (ex ./avr/lib/avr6/libc.a)
+  504 0 0   504 1f8 realloc.o (ex ./avr/lib/avr6/libc.a)
+  739 0 0   739 2e3 strtod.o (ex ./avr/lib/avr6/libc.a)
@@ -1802 +1802 @@
-  536 0 0   536 218 strtoul.o (ex ./avr/lib/avr6/libc.a)
+  530 0 0   530 212 strtoul.o (ex ./avr/lib/avr6/libc.a)
@@ -2030,2 +2030,2 @@
- 1042 0 0  1042 412 vfprintf_std.o (ex ./avr/lib/avr6/libc.a)
- 1490 0 0  1490 5d2 vfscanf_std.o (ex ./avr/lib/avr6/libc.a)
+ 1026 0 0  1026 402 vfprintf_std.o (ex ./avr/lib/avr6/libc.a)
+ 1488 0 0  1488 5d0 vfscanf_std.o (ex ./avr/lib/avr6/libc.a)
@@ -2135 +2135 @@
-  758 0 0   758 2f6 dtoa_prf.o (ex ./avr/lib/avr3/libc.a)
+  734 0 0   734 2de dtoa_prf.o (ex ./avr/lib/avr3/libc.a)
@@ -2139 +2139 @@
-  752 6 0   758 2f6 malloc.o (ex ./avr/lib/avr3/libc.a)
+  750 6 0   756 2f4 malloc.o (ex ./avr/lib/avr3/libc.a)
@@ -2143,4 +2143,4 @@
-  464 0 0   464 1d0 realloc.o (ex ./avr/lib/avr3/libc.a)
-  811 0 0   811 32b strtod.o (ex ./avr/lib/avr3/libc.a)
-  634 0 0   634 27a strtol.o (ex ./avr/lib/avr3/libc.a)
-  616 0 0   616 268 strtoul.o (ex ./avr/lib/avr3/libc.a)
+  466 0 0   466 1d2 realloc.o (ex ./avr/lib/avr3/libc.a)
+  809 0 0   809 329 strtod.o (ex ./avr/lib/avr3/libc.a)
+  630 0 0   630 276 strtol.o (ex ./avr/lib/avr3/libc.a)
+  614 0 0   614 266 strtoul.o (ex ./avr/lib/avr3/libc.a)
@@ -2374,2 +2374,2 @@
- 1064 0 0  1064 428 vfprintf_std.o (ex ./avr/lib/avr3/libc.a)
- 1582 0 0  1582 62e vfscanf_std.o (ex ./avr/lib/avr3/libc.a)
+ 1046 0 0  1046 416 vfprintf_std.o (ex ./avr/lib/avr3/libc.a)
+ 1580 0 0  1580 62c vfscanf_std.o (ex ./avr/lib/avr3/libc.a)
@@ -2527 +2527 @@
-  688 0 0   688 2b0 dtoa_prf.o (ex ./avr/lib/avr5/libc.a)
+  670 0 0   670 29e dtoa_prf.o (ex ./avr/lib/avr5/libc.a)
@@ -2531 +2531 @@
-  708 6 0   714 2ca malloc.o (ex ./avr/lib/avr5/libc.a)
+  706 6 0   712 2c8 malloc.o (ex ./avr/lib/avr5/libc.a)
@@ -2535,2 +2535,2 @@
-  440 0 0   440 1b8 realloc.o (ex ./avr/lib/avr5/libc.a)
-  719 0 0   719 2cf strtod.o (ex ./avr/lib/avr5/libc.a)
+  436 0 0   436 1b4 realloc.o (ex ./avr/lib/avr5/libc.a)
+  711 0 0   711 2c7 strtod.o (ex ./avr/lib/avr5/libc.a)
@@ -2538 +2538 @@
-  492 0 0   492 1ec strtoul.o (ex ./avr/lib/avr5/libc.a)
+  486 0 0   486 1e6 strtoul.o (ex ./avr/lib/avr5/libc.a)
@@ -2766,2 +2766,2 @@
-  960 0 0   960 3c0 vfprintf_std.o (ex ./avr/lib/avr5/libc.a)
- 1352 0 0  1352 548 vfscanf_std.o (ex ./avr/lib/avr5/libc.a)
+  944 0 0   944 3b0 vfprintf_std.o (ex ./avr/lib/avr5/libc.a)
+ 1350 0 0  1350 546 vfscanf_std.o (ex ./avr/lib/avr5/libc.a)
@@ -3855 +3855 @@
-  682 0 0   682 2aa dtoa_prf.o (ex ./avr/lib/avr4/libc.a)
+  664 0 0   664 298 dtoa_prf.o (ex ./avr/lib/avr4/libc.a)
@@ -3859 +3859 @@
-  704 6 0   710 2c6 malloc.o (ex ./avr/lib/avr4/libc.a)
+  702 6 0   708 2c4 malloc.o (ex ./avr/lib/avr4/libc.a)
@@ -3863,2 +3863,2 @@
-  426 0 0   426 1aa realloc.o (ex ./avr/lib/avr4/libc.a)
-  697 0 0   697 2b9 strtod.o (ex ./avr/lib/avr4/libc.a)
+  422 0 0   422 1a6 realloc.o (ex ./avr/lib/avr4/libc.a)
+  689 0 0   689 2b1 strtod.o (ex ./avr/lib/avr4/libc.a)
@@ -3866 +3866 @@
-  482 0 0   482 1e2 strtoul.o (ex ./avr/lib/avr4/libc.a)
+  476 0 0   476 1dc strtoul.o (ex ./avr/lib/avr4/libc.a)
@@ -4094,2 +4094,2 @@
-  930 0 0   930 3a2 vfprintf_std.o (ex ./avr/lib/avr4/libc.a)
- 1286 0 0  1286 506 vfscanf_std.o (ex ./avr/lib/avr4/libc.a)
+  914 0 0   914 392 vfprintf_std.o (ex ./avr/lib/avr4/libc.a)
+ 1284 0 0  1284 504 vfscanf_std.o (ex ./avr/lib/avr4/libc.a)
@@ -4379 +4379 @@
-  752 0 0   752 2f0 dtoa_prf.o (ex ./avr/lib/avr2/libc.a)
+  728 0 0   728 2d8 dtoa_prf.o (ex ./avr/lib/avr2/libc.a)
@@ -4383 +4383 @@
-  748 6 0   754 2f2 malloc.o (ex ./avr/lib/avr2/libc.a)
+  746 6 0   752 2f0 malloc.o (ex ./avr/lib/avr2/libc.a)
@@ -4387,4 +4387,4 @@
-  450 0 0   450 1c2 realloc.o (ex ./avr/lib/avr2/libc.a)
-  791 0 0   791 317 strtod.o (ex ./avr/lib/avr2/libc.a)
-  624 0 0   624 270 strtol.o (ex ./avr/lib/avr2/libc.a)
-  606 0 0   606 25e strtoul.o (ex ./avr/lib/avr2/libc.a)
+  452 0 0   452 1c4 realloc.o (ex ./avr/lib/avr2/libc.a)
+  789 0 0   789 315 strtod.o (ex ./avr/lib/avr2/libc.a)
+  620 0 0   620 26c strtol.o (ex ./avr/lib/avr2/libc.a)
+  604 0 0   604 25c strtoul.o (ex ./avr/lib/avr2/libc.a)
@@ -4618,2 +4618,2 @@
- 1030 0 0  1030 406 vfprintf_std.o (ex ./avr/lib/avr2/libc.a)
- 1516 0 0  1516 5ec vfscanf_std.o (ex ./avr/lib/avr2/libc.a)
+ 1012 0 0  1012 3f4 vfprintf_std.o (ex ./avr/lib/avr2/libc.a)
+ 1514 0 0  1514 5ea vfscanf_std.o (ex ./avr/lib/avr2/libc.a)
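
For a per-object summary rather than a raw diff, the two avr-size listings
could be compared with a small script. The following is only a sketch: it
assumes the Berkeley-style "text data bss dec hex member (ex archive)" lines
seen in the diff above, and the names parse_sizes, size_changes and main are
hypothetical helpers, not part of any existing tool.

```python
import re
import sys

# One avr-size line for an archive member, e.g.
#   "750 0 0 750 2ee dtoa_prf.o (ex ./avr/lib/avr51/libc.a)"
# The header line ("text data bss dec hex filename") does not match.
SIZE_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([0-9a-fA-F]+)\s+(\S+)\s+\(ex (\S+)\)\s*$"
)


def parse_sizes(text):
    """Map (archive, member) to the .text size in bytes (hypothetical helper)."""
    sizes = {}
    for line in text.splitlines():
        m = SIZE_LINE.match(line)
        if m:
            sizes[(m.group(7), m.group(6))] = int(m.group(1))
    return sizes


def size_changes(orig_text, patch_text):
    """Return (archive, member, old, new, percent) for every changed object."""
    orig = parse_sizes(orig_text)
    patch = parse_sizes(patch_text)
    rows = []
    # Only objects present in both listings are compared.
    for key in sorted(orig.keys() & patch.keys()):
        old, new = orig[key], patch[key]
        if old != new and old > 0:
            rows.append((key[0], key[1], old, new, 100.0 * (new - old) / old))
    return rows


def main(orig_path, patch_path):
    with open(orig_path) as f:
        orig_text = f.read()
    with open(patch_path) as f:
        patch_text = f.read()
    for archive, member, old, new, pct in size_changes(orig_text, patch_text):
        print(f"{archive}:{member}  {old} -> {new}  ({pct:+.1f}%)")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed usage: python sizecmp.py size-orig.txt size-patch.txt
    main(sys.argv[1], sys.argv[2])
```

A negative percentage means the object got smaller with the patch; a
positive one flags an object that grew and may deserve a closer look.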