[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |12.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #24 from Georg-Johann Lay ---
(In reply to Georg-Johann Lay from comment #23)
> As it appears, this bug is not fixed completely. For the -mmcu=avrtiny
> architecture, there is still bloat for even the smallest test cases like:

Different story; follow-up in PR113927.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #23 from Georg-Johann Lay ---
Created attachment 55130
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55130&action=edit
Test case for -Os -mmcu=attiny40

As it appears, this bug is not fixed completely. For the -mmcu=avrtiny
architecture, there is still bloat for even the smallest test cases like:

$ avr-gcc bloat.c -mmcu=attiny40 -Os -S

char func3 (char c)
{
    return 1 + c;
}

"GCC: (GNU) 14.0.0 20230520 (experimental)" compiles this to:

func3:
        push r28                ; 22  [c=4 l=1]  pushqi1/0
        push r29                ; 23  [c=4 l=1]  pushqi1/0
        push __tmp_reg__        ; 27  [c=4 l=1]  *addhi3_sp
        in r28,__SP_L__         ; 38  [c=4 l=2]  *movhi/7
        in r29,__SP_H__
/* prologue: function */
/* frame size = 1 */
/* stack size = 3 */
        mov r20,r24             ; 18  [c=4 l=1]  movqi_insn/0
        subi r20,lo8(-(1))      ; 19  [c=4 l=1]  *addqi3/1
        mov r24,r20             ; 21  [c=4 l=1]  movqi_insn/0
/* epilogue start */
        pop __tmp_reg__         ; 33  [c=4 l=1]  *addhi3_sp
        pop r29                 ; 34  [c=4 l=1]  popqi
        pop r28                 ; 35  [c=4 l=1]  popqi
        ret                     ; 36  [c=0 l=1]  return_from_epilogue

For reference, avr-gcc v8 generates for this function:

func3:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
        subi r24,lo8(-(1))      ; 6   [c=4 l=1]  addqi3/1
/* epilogue start */
        ret                     ; 17  [c=0 l=1]  return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |10.0, 11.0, 12.0, 9.0
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
      Known to work|                            |13.0

--- Comment #22 from Georg-Johann Lay ---
Fixed in 12.3+.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #21 from Vladimir Makarov ---
(In reply to CVS Commits from comment #20)
> The releases/gcc-12 branch has been updated by Vladimir Makarov:
>
> https://gcc.gnu.org/g:88792f04e5c63025506244b9ac7186a3cc10c25a

The trunk with the patch behaved well for a few weeks, so I backported it to
the gcc-12 branch. The gcc-12 branch with the patch was successfully tested
and bootstrapped on x86-64.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #20 from CVS Commits ---
The releases/gcc-12 branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:88792f04e5c63025506244b9ac7186a3cc10c25a

commit r12-9372-g88792f04e5c63025506244b9ac7186a3cc10c25a
Author: Vladimir N. Makarov
Date:   Thu Mar 2 16:29:05 2023 -0500

    IRA: Use minimal cost for hard register movement

    This is the 2nd attempt to fix PR90706.  IRA calculates wrong AVR
    costs for moving general hard regs of SFmode.  This was the reason for
    spilling a pseudo in the PR.  In this patch we use smaller move cost of
    hard reg in its natural and operand modes.

            PR rtl-optimization/90706

    gcc/ChangeLog:

            * ira-costs.cc: Include print-rtl.h.
            (record_reg_classes, scan_one_insn): Add code to print debug info.
            (record_operand_costs): Find and use smaller cost for hard reg
            move.

    gcc/testsuite/ChangeLog:

            * gcc.target/avr/pr90706.c: New.
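The idea behind "use smaller move cost of hard reg in its natural and operand modes" can be shown with a toy C model. This is not the code in ira-costs.cc. The mode enum, the cost table values, and the function name `hard_reg_move_cost` are all hypothetical. The table deliberately inflates one mode's cost so the effect of taking the minimum is visible.

```c
#include <assert.h>

/* Hypothetical, heavily reduced set of machine modes for this sketch. */
enum toy_mode { TOY_QImode, TOY_SImode, TOY_SFmode, TOY_NUM_MODES };

/* Hypothetical cost of moving a general hard register in each mode.
   The SFmode entry is deliberately inflated to mimic a mis-costed mode;
   none of these numbers are AVR's real costs. */
static const int toy_move_cost[TOY_NUM_MODES] = { 2, 8, 100 };

/* Cost of moving a hard register: the smaller of the cost in the
   register's natural mode and the cost in the operand's mode. */
static int hard_reg_move_cost(enum toy_mode natural_mode,
                              enum toy_mode operand_mode)
{
    assert(natural_mode < TOY_NUM_MODES && operand_mode < TOY_NUM_MODES);
    int natural_cost = toy_move_cost[natural_mode];
    int operand_cost = toy_move_cost[operand_mode];
    return natural_cost < operand_cost ? natural_cost : operand_cost;
}
```

With these made-up numbers, `hard_reg_move_cost(TOY_QImode, TOY_SFmode)` yields 2 rather than 100. An inflated cost in one mode therefore no longer makes a register look more expensive than memory.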
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |8.5.0

--- Comment #19 from Georg-Johann Lay ---
(In reply to CVS Commits from comment #18)
> https://gcc.gnu.org/g:2639f9d2313664e6b4ed2f8131fefa60aeeb0518
>
> commit r13-6424-g2639f9d2313664e6b4ed2f8131fefa60aeeb0518
> Author: Vladimir N. Makarov
> Date:   Thu Mar 2 16:29:05 2023 -0500
>
>     IRA: Use minimal cost for hard register movement

Thank you; the code looks clean now. (For my test case from comment #16 I
needed -fno-split-wide-types, which is a different story.)

Is there any chance your fix will be back-ported?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #18 from CVS Commits ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:2639f9d2313664e6b4ed2f8131fefa60aeeb0518

commit r13-6424-g2639f9d2313664e6b4ed2f8131fefa60aeeb0518
Author: Vladimir N. Makarov
Date:   Thu Mar 2 16:29:05 2023 -0500

    IRA: Use minimal cost for hard register movement

    This is the 2nd attempt to fix PR90706.  IRA calculates wrong AVR
    costs for moving general hard regs of SFmode.  This was the reason for
    spilling a pseudo in the PR.  In this patch we use smaller move cost of
    hard reg in its natural and operand modes.

            PR rtl-optimization/90706

    gcc/ChangeLog:

            * ira-costs.cc: Include print-rtl.h.
            (record_reg_classes, scan_one_insn): Add code to print debug info.
            (record_operand_costs): Find and use smaller cost for hard reg
            move.

    gcc/testsuite/ChangeLog:

            * gcc.target/avr/pr90706.c: New.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #17 from Vladimir Makarov ---
I've reverted my patch because it caused two new PRs. I'll do more work on
this PR, starting in January.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gjl at gcc dot gnu.org

--- Comment #16 from Georg-Johann Lay ---
Created attachment 54113
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54113&action=edit
More elaborate C test case.

This is a more complicated test case. Compile it with:

> avr-gcc -c pi-i.c -mmcu=atmega8 -Os -mcall-prologues -fno-tree-loop-optimize -fno-move-loop-invariants && avr-size pi-i.o

Code sizes are:

  664 with avr-gcc v8.5
  992 with avr-gcc v11.3
  834 with avr-gcc master with the change from comment #13

So there is a clear improvement with patch #13, but the size is still +25%
compared to v8. -fno-split-wide-types also has an effect. The test case
mostly operates on float; unfortunately I don't have a similar test case for
32-bit integers at hand.
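The "+25%" figure can be checked from the sizes listed above. The small C helper below is a hypothetical illustration, not part of the report. It computes the relative change of a size against a baseline in percent.

```c
#include <assert.h>

/* Relative change of `size` against `baseline`, in percent. */
static double percent_change(double baseline, double size)
{
    assert(baseline > 0.0);
    return (size - baseline) / baseline * 100.0;
}
```

Applied to the reported sizes, `percent_change(664, 834)` is about +25.6 and `percent_change(664, 992)` is about +49.4. Going from v11.3 to patched master, `percent_change(992, 834)` gives roughly -15.9.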
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #15 from CVS Commits ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:12abd5a7d13209f79664ea603b3f3517f71b8c4f

commit r13-4727-g12abd5a7d13209f79664ea603b3f3517f71b8c4f
Author: Vladimir N. Makarov
Date:   Thu Dec 15 14:11:05 2022 -0500

    IRA: Check that reg classes contain a hard reg of given mode in reg move cost calculation

    IRA calculates wrong AVR costs for moving general hard regs of SFmode.
    To calculate the costs we did not exclude sub-classes which do not
    contain hard regs of given mode.  This was the reason for spilling a
    pseudo in the PR.  The patch fixes this.

            PR rtl-optimization/90706

    gcc/ChangeLog:

            * ira-costs.cc: Include print-rtl.h.
            (record_reg_classes, scan_one_insn): Add code to print debug info.
            * ira.cc (ira_init_register_move_cost): Check that at least one
            hard reg of the mode are in the class contents to calculate the
            register move costs.

    gcc/testsuite/ChangeLog:

            * gcc.target/avr/pr90706.c: New.
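The kind of check this commit describes, whether a register class contains at least one hard register that can hold a given mode, can be modelled in C. This is a simplified sketch, not GCC's implementation. Classes are represented as hypothetical 32-bit masks over r0..r31, and the function `class_has_reg_for_mode` is hypothetical. The sketch assumes that a mode needing `nregs` bytes fits when `nregs` consecutive registers all lie in the class. GCC's real test goes through the target's register/mode validity hooks instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a register class is a bitmask over r0..r31. */
typedef uint32_t toy_reg_class;

/* Simplified stand-in for "class contains a hard reg of this mode":
   true if some start register has `nregs` consecutive registers, all of
   which belong to the class. */
static bool class_has_reg_for_mode(toy_reg_class cls, unsigned nregs)
{
    assert(nregs >= 1 && nregs <= 32);
    for (unsigned start = 0; start + nregs <= 32; start++) {
        bool fits = true;
        for (unsigned r = start; r < start + nregs; r++) {
            if (!(cls & (UINT32_C(1) << r))) {
                fits = false;
                break;
            }
        }
        if (fits)
            return true;
    }
    return false;
}
```

Under this model, a one-register class such as `UINT32_C(1) << 24` cannot hold a 4-byte SFmode value, so a move-cost calculation for SFmode would skip it. A class covering r24..r27 (`0x0F000000`) would still be considered.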
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Vladimir Makarov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #14 from Vladimir Makarov ---
What I see is that the input to RA has changed significantly since gcc-8 (see
the insns marked by !). A lot of subregs are generated now, and there is no
promotion of (argument) hard regs (insns 44-47) because of
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-10/msg01356.html.

gcc-8:

    1: NOTE_INSN_DELETED
    4: NOTE_INSN_BASIC_BLOCK 2
    2: r44:SF=r22:SF
      REG_DEAD r22:SF
    3: NOTE_INSN_FUNCTION_BEG
    6: r45:QI=0x1
      REG_EQUAL 0x1
    7: r18:SF=0.0
!   8: r22:SF=r44:SF
      REG_DEAD r44:SF
    9: r24:QI=call [`__gtsf2'] argc:0
      REG_DEAD r25:QI
      REG_DEAD r23:QI
      REG_DEAD r22:QI
      REG_DEAD r18:SF
      REG_CALL_DECL `__gtsf2'
      REG_EH_REGION 0x8000
   10: NOTE_INSN_DELETED
   11: cc0=cmp(r24:QI,0)
      REG_DEAD r24:QI
   12: pc={(cc0>0)?L14:pc}
      REG_BR_PROB 633507684
   22: NOTE_INSN_BASIC_BLOCK 3
   13: r45:QI=0
      REG_EQUAL 0
   14: L14:
   23: NOTE_INSN_BASIC_BLOCK 4
   19: r24:QI=r45:QI
      REG_DEAD r45:QI
   20: use r24:QI

Current:

    1: NOTE_INSN_DELETED
    4: NOTE_INSN_BASIC_BLOCK 2
   44: r56:QI=r22:QI
      REG_DEAD r22:QI
   45: r57:QI=r23:QI
      REG_DEAD r23:QI
   46: r58:QI=r24:QI
      REG_DEAD r24:QI
   47: r59:QI=r25:QI
      REG_DEAD r25:QI
   48: r52:QI=r56:QI
      REG_DEAD r56:QI
   49: r53:QI=r57:QI
      REG_DEAD r57:QI
   50: r54:QI=r58:QI
      REG_DEAD r58:QI
   51: r55:QI=r59:QI
      REG_DEAD r59:QI
    3: NOTE_INSN_FUNCTION_BEG
    6: r46:QI=0x1
      REG_EQUAL 0x1
    7: r18:SF=0.0
!  52: clobber r60:SI
!  53: r60:SI#0=r52:QI
      REG_DEAD r52:QI
!  54: r60:SI#1=r53:QI
      REG_DEAD r53:QI
!  55: r60:SI#2=r54:QI
      REG_DEAD r54:QI
!  56: r60:SI#3=r55:QI
      REG_DEAD r55:QI
!  57: r22:SF=r60:SI#0
      REG_DEAD r60:SI
    9: r24:QI=call [`__gtsf2'] argc:0
      REG_DEAD r25:QI
      REG_DEAD r23:QI
      REG_DEAD r22:QI
      REG_DEAD r18:SF
      REG_CALL_DECL `__gtsf2'
      REG_EH_REGION 0x8000
   34: r50:QI=r24:QI
      REG_DEAD r24:QI
   10: NOTE_INSN_DELETED
   11: pc={(r50:QI>0)?L13:pc}
      REG_DEAD r50:QI
      REG_BR_PROB 633507684
   21: NOTE_INSN_BASIC_BLOCK 3
   12: r46:QI=0
      REG_EQUAL 0
   13: L13:
   22: NOTE_INSN_BASIC_BLOCK 4
   18: r24:QI=r46:QI
      REG_DEAD r46:QI
   19: use r24:QI

Currently, GCC generates the following AVR code:

check:
        push r28
        push r29
        rcall .
        rcall .
        push __tmp_reg__
        in r28,__SP_L__
        in r29,__SP_H__
/* prologue: function */
/* frame size = 5 */
/* stack size = 7 */
.L__stack_usage = 7
        ldi r18,lo8(1)
        std Y+5,r18
        ldi r18,0
        ldi r19,0
        ldi r20,0
        ldi r21,0
!       std Y+1,r22
!       std Y+2,r23
!       std Y+3,r24
!       std Y+4,r25
!       ldd r22,Y+1
!
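A rough C-level analogy may help with what the !-marked insns 52 to 57 in the current dump do. The four QImode pieces of the incoming argument are written into byte subregs #0..#3 of a 32-bit pseudo, which is then reinterpreted as SFmode. The sketch below is not GCC code, and the function name `assemble_sf_from_bytes` is hypothetical. It assumes byte #0 is the least significant byte (as on little-endian AVR) and that `float` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Analogy only: build a 32-bit value from four byte-sized pieces, then
   reinterpret its bits as a float. */
static float assemble_sf_from_bytes(const uint8_t piece[4])
{
    uint32_t si = 0;                          /* cf. insn 52: fresh SImode value */
    for (int i = 0; i < 4; i++)
        si |= (uint32_t)piece[i] << (8 * i);  /* cf. insns 53-56: r60:SI#i = piece */

    float sf;
    memcpy(&sf, &si, sizeof sf);              /* cf. insn 57: reinterpret as SFmode */
    return sf;
}
```

In gcc-8, the same value travelled as a single SFmode copy (insns 2 and 8). In the current dump, this byte-wise round trip is what the allocator ends up placing in the stack frame, as the std/ldd pairs in the assembly show.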
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #13 from Georg-Johann Lay ---
Created attachment 53812
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53812&action=edit
Test case with 32-bit integer.

This problem is still present in current master (future v13) and also occurs
with 32-bit integers.

> avr-gcc -S -Os mul.c -fdump-rtl-ira

With v8, mul.s has 15 instructions. With newer versions, mul.s has 26
additional instructions:

* 12 silly, useless stores into / loads from the frame.
* 12 instructions to set up the frame.
* More instructions due to sub-optimal register allocation.
* It uses a 6-byte stack frame where v8 needs no frame at all.

In the IRA dump, there is:

Pass 0 for finding pseudo/allocno costs

    a0 (r53,l0) best NO_REGS, allocno NO_REGS
    a2 (r49,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a1 (r48,l0) best NO_REGS, allocno NO_REGS
...
Pass 1 for finding pseudo/allocno costs

    r53: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
    r49: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r48: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
...
      Spill a0(r53,l0)
      Spill a1(r48,l0)
      Allocno a2r49 of GENERAL_REGS(30) ...

So there are 2 register spills for no reason, and they lead to that code bloat.
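"best NO_REGS" for r53 and r48 means the cost pass judged memory cheaper than any register class for those pseudos, which is why they were spilled. The toy C model below illustrates that decision and shows how an overestimated register-move cost alone can flip a pseudo to NO_REGS. It is a hypothetical sketch, not IRA's algorithm. The names `toy_class`, `toy_costs` and `preferred_class`, and all cost numbers, are made up.

```c
#include <assert.h>

/* Toy model (hypothetical): keep the pseudo in GENERAL_REGS, or prefer
   NO_REGS, meaning it lives in a stack slot. */
enum toy_class { TOY_GENERAL_REGS, TOY_NO_REGS };

struct toy_costs {
    int reg_cost; /* total cost of its uses if it lives in a register */
    int mem_cost; /* total cost of its uses if it lives in memory */
};

static enum toy_class preferred_class(struct toy_costs c)
{
    assert(c.reg_cost >= 0 && c.mem_cost >= 0);
    /* Ties go to the register class. */
    return c.reg_cost <= c.mem_cost ? TOY_GENERAL_REGS : TOY_NO_REGS;
}
```

With `{ .reg_cost = 8, .mem_cost = 24 }` the pseudo stays in GENERAL_REGS. If a mis-costed move inflates `reg_cost` to 100, the same pseudo is sent to memory, which matches the pattern of useless frame stores and loads described above.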
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #12 from Jakub Jelinek ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.5                         |10.4

--- Comment #11 from Richard Biener ---
GCC 9 branch is being closed