[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2024-05-18 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.5                        |12.3

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2024-03-05 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #24 from Georg-Johann Lay  ---
(In reply to Georg-Johann Lay from comment #23)
> As it appears, this bug is not fixed completely.  For the -mmcu=avrtiny
> architecture, there is still bloat for even the smallest test cases like:

Different story, f'up to PR113927.

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2023-05-21 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #23 from Georg-Johann Lay  ---
Created attachment 55130
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55130&action=edit
Test case for -Os -mmcu=attiny40

As it appears, this bug is not fixed completely.  For the -mmcu=avrtiny
architecture, there is still bloat for even the smallest test cases like:

$ avr-gcc bloat.c -mmcu=attiny40 -Os -S

char func3 (char c)
{
return 1 + c;
}

"GCC: (GNU) 14.0.0 20230520 (experimental)" compiles this to:

func3:
push r28 ;  22  [c=4 l=1]  pushqi1/0
push r29 ;  23  [c=4 l=1]  pushqi1/0
push __tmp_reg__ ;  27  [c=4 l=1]  *addhi3_sp
in r28,__SP_L__  ;  38  [c=4 l=2]  *movhi/7
in r29,__SP_H__
/* prologue: function */
/* frame size = 1 */
/* stack size = 3 */
mov r20,r24  ;  18  [c=4 l=1]  movqi_insn/0
subi r20,lo8(-(1))   ;  19  [c=4 l=1]  *addqi3/1
mov r24,r20  ;  21  [c=4 l=1]  movqi_insn/0
/* epilogue start */
pop __tmp_reg__  ;  33  [c=4 l=1]  *addhi3_sp
pop r29  ;  34  [c=4 l=1]  popqi
pop r28  ;  35  [c=4 l=1]  popqi
ret  ;  36  [c=0 l=1]  return_from_epilogue

For reference, avr-gcc v8 generates for this function:

func3:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
subi r24,lo8(-(1))   ;  6   [c=4 l=1]  addqi3/1
/* epilogue start */
ret  ;  17  [c=0 l=1]  return

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2023-03-31 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |10.0, 11.0, 12.0, 9.0
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
      Known to work|                            |13.0

--- Comment #22 from Georg-Johann Lay  ---
Fixed in 12.3+

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2023-03-31 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #21 from Vladimir Makarov  ---
(In reply to CVS Commits from comment #20)
> The releases/gcc-12 branch has been updated by Vladimir Makarov:
>
> https://gcc.gnu.org/g:88792f04e5c63025506244b9ac7186a3cc10c25a

The trunk with the patch behaved well for a few weeks, so I backported it to
the gcc-12 branch.  The gcc-12 branch with the patch was successfully
bootstrapped and tested on x86-64.

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2023-03-31 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #20 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:88792f04e5c63025506244b9ac7186a3cc10c25a

commit r12-9372-g88792f04e5c63025506244b9ac7186a3cc10c25a
Author: Vladimir N. Makarov 
Date:   Thu Mar 2 16:29:05 2023 -0500

IRA: Use minimal cost for hard register movement

This is the 2nd attempt to fix PR90706.  IRA calculates wrong AVR
costs for moving general hard regs of SFmode.  This was the reason for
spilling a pseudo in the PR.  In this patch we use smaller move cost
of hard reg in its natural and operand modes.

PR rtl-optimization/90706

gcc/ChangeLog:

* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
(record_operand_costs): Find and use smaller cost for hard reg
move.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr90706.c: New.
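
To illustrate the idea described in the commit message above, here is a
minimal C sketch.  It is not the code from ira-costs.cc: the `mode` enum, the
`toy_move_cost` table with its invented numbers, and `hard_reg_move_cost` are
all hypothetical and only show "take the smaller of the cost in the natural
mode and the cost in the operand mode".

```c
#include <assert.h>

/* Hypothetical toy model, NOT the code in ira-costs.cc. */
enum mode { MODE_QI, MODE_SF, NUM_MODES };

/* Invented per-mode costs of a hard-reg-to-hard-reg move. */
static const int toy_move_cost[NUM_MODES] = {
    [MODE_QI] = 2,
    [MODE_SF] = 8,
};

/* Cost of moving a hard register that appears as an operand:
   use the smaller of the cost in the register's natural mode
   and the cost in the mode of the operand. */
static int hard_reg_move_cost(enum mode natural_mode, enum mode operand_mode)
{
    assert(natural_mode < NUM_MODES && operand_mode < NUM_MODES);
    int natural_cost = toy_move_cost[natural_mode];
    int operand_cost = toy_move_cost[operand_mode];
    return natural_cost < operand_cost ? natural_cost : operand_cost;
}
```

With these made-up numbers, a hard register whose natural mode is MODE_QI but
which is used as a MODE_SF operand is charged the cheaper MODE_QI cost, so the
move no longer looks expensive enough to justify a spill.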

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2023-03-04 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |8.5.0

--- Comment #19 from Georg-Johann Lay  ---
(In reply to CVS Commits from comment #18)
> https://gcc.gnu.org/g:2639f9d2313664e6b4ed2f8131fefa60aeeb0518
> 
> commit r13-6424-g2639f9d2313664e6b4ed2f8131fefa60aeeb0518
> Author: Vladimir N. Makarov 
> Date:   Thu Mar 2 16:29:05 2023 -0500
> 
> IRA: Use minimal cost for hard register movement

Thank you; the code looks clean now. (For my test case from comment #16 I
needed -fno-split-wide-types, which is a different story.)

Is there any chance your fix will be back-ported?

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2023-03-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #18 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:2639f9d2313664e6b4ed2f8131fefa60aeeb0518

commit r13-6424-g2639f9d2313664e6b4ed2f8131fefa60aeeb0518
Author: Vladimir N. Makarov 
Date:   Thu Mar 2 16:29:05 2023 -0500

IRA: Use minimal cost for hard register movement

This is the 2nd attempt to fix PR90706.  IRA calculates wrong AVR
costs for moving general hard regs of SFmode.  This was the reason for
spilling a pseudo in the PR.  In this patch we use smaller move cost
of hard reg in its natural and operand modes.

PR rtl-optimization/90706

gcc/ChangeLog:

* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
(record_operand_costs): Find and use smaller cost for hard reg
move.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr90706.c: New.

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2022-12-16 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #17 from Vladimir Makarov  ---
I've reverted my patch as it resulted in two new PRs.  I'll do more work on
this PR and I'll start this job in Jan.

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2022-12-16 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Georg-Johann Lay  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gjl at gcc dot gnu.org

--- Comment #16 from Georg-Johann Lay  ---
Created attachment 54113
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54113&action=edit
More elaborate C test case.

This is a more complicated test case, compile with

> avr-gcc -c pi-i.c -mmcu=atmega8 -Os -mcall-prologues -fno-tree-loop-optimize 
> -fno-move-loop-invariants && avr-size pi-i.o

Code sizes are:

664 with avr-gcc v8.5
992 with avr-gcc v11.3
834 with avr-gcc master with the change from comment #13

So there is a clear improvement with patch #13, but size is still +25% compared
to v8. What also has an effect is -fno-split-wide-types.

The test case mostly operates on float; unfortunately I don't have a similar
test-case for 32-bit integers at hand.
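
The relative size changes quoted above can be checked with a few lines of C.
This is only a sketch of the arithmetic; the helper name `percent_change` is
made up for the example.

```c
/* Relative change of `value` against `base`, in percent. */
static double percent_change(double base, double value)
{
    return (value - base) / base * 100.0;
}
```

Taking v8.5 (664) as the base, percent_change(664, 834) is about 25.6 for
patched master and percent_change(664, 992) is about 49.4 for v11.3.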

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2022-12-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #15 from CVS Commits  ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:12abd5a7d13209f79664ea603b3f3517f71b8c4f

commit r13-4727-g12abd5a7d13209f79664ea603b3f3517f71b8c4f
Author: Vladimir N. Makarov 
Date:   Thu Dec 15 14:11:05 2022 -0500

IRA: Check that reg classes contain a hard reg of given mode in reg move
cost calculation

IRA calculates wrong AVR costs for moving general hard regs of SFmode.  To
calculate the costs we did not exclude sub-classes which do not contain
hard regs of given mode.  This was the reason for spilling a pseudo in the
PR. The patch fixes this.

PR rtl-optimization/90706

gcc/ChangeLog:

* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
* ira.cc (ira_init_register_move_cost): Check that at least one hard
reg of the mode are in the class contents to calculate the
register move costs.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr90706.c: New.
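
As a rough illustration of the check described in this commit message, the
following C sketch models a register class as a bitmask of hard regs.  It is a
toy model, not the code from ira.cc: `class_has_reg_of_mode`, `max_move_cost`,
the bitmask representation, and the choice of aggregating costs as a maximum
over sub-classes are all assumptions made for the example.

```c
#include <stdint.h>

/* Toy model, NOT GCC code: a register class is a bitmask of hard regs,
   and ok_for_mode is the bitmask of hard regs able to hold the mode. */
static int class_has_reg_of_mode(uint32_t class_regs, uint32_t ok_for_mode)
{
    return (class_regs & ok_for_mode) != 0;
}

/* Largest move cost over the candidate sub-classes, skipping every
   sub-class that contains no hard reg able to hold the mode.
   Returns 0 if no sub-class qualifies. */
static int max_move_cost(const uint32_t *sub_classes, const int *costs,
                         int n, uint32_t ok_for_mode)
{
    int worst = 0;
    for (int i = 0; i < n; i++) {
        if (!class_has_reg_of_mode(sub_classes[i], ok_for_mode))
            continue;           /* irrelevant for this mode */
        if (costs[i] > worst)
            worst = costs[i];
    }
    return worst;
}
```

In this model, an expensive sub-class that cannot hold a value of the mode at
all no longer inflates the cost, which is the kind of over-estimate the commit
message blames for the spill.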

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2022-12-13 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Vladimir Makarov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #14 from Vladimir Makarov  ---
What I see is that the input to RA has changed significantly since gcc-8 (see
the insns marked by !).  A lot of subregs are generated now and there is no
promotion of (argument) hard regs (insns 44-47) because of
https://gcc.gnu.org/legacy-ml/gcc-patches/2018-10/msg01356.html.


Input to RA with gcc-8:

    1: NOTE_INSN_DELETED
    4: NOTE_INSN_BASIC_BLOCK 2
    2: r44:SF=r22:SF
      REG_DEAD r22:SF
    3: NOTE_INSN_FUNCTION_BEG
    6: r45:QI=0x1
      REG_EQUAL 0x1
    7: r18:SF=0.0
!   8: r22:SF=r44:SF
      REG_DEAD r44:SF
    9: r24:QI=call [`__gtsf2'] argc:0
      REG_DEAD r25:QI
      REG_DEAD r23:QI
      REG_DEAD r22:QI
      REG_DEAD r18:SF
      REG_CALL_DECL `__gtsf2'
      REG_EH_REGION 0x8000
   10: NOTE_INSN_DELETED
   11: cc0=cmp(r24:QI,0)
      REG_DEAD r24:QI
   12: pc={(cc0>0)?L14:pc}
      REG_BR_PROB 633507684
   22: NOTE_INSN_BASIC_BLOCK 3
   13: r45:QI=0
      REG_EQUAL 0
   14: L14:
   23: NOTE_INSN_BASIC_BLOCK 4
   19: r24:QI=r45:QI
      REG_DEAD r45:QI
   20: use r24:QI

Input to RA now:

    1: NOTE_INSN_DELETED
    4: NOTE_INSN_BASIC_BLOCK 2
   44: r56:QI=r22:QI
      REG_DEAD r22:QI
   45: r57:QI=r23:QI
      REG_DEAD r23:QI
   46: r58:QI=r24:QI
      REG_DEAD r24:QI
   47: r59:QI=r25:QI
      REG_DEAD r25:QI
   48: r52:QI=r56:QI
      REG_DEAD r56:QI
   49: r53:QI=r57:QI
      REG_DEAD r57:QI
   50: r54:QI=r58:QI
      REG_DEAD r58:QI
   51: r55:QI=r59:QI
      REG_DEAD r59:QI
    3: NOTE_INSN_FUNCTION_BEG
    6: r46:QI=0x1
      REG_EQUAL 0x1
    7: r18:SF=0.0
!  52: clobber r60:SI
!  53: r60:SI#0=r52:QI
      REG_DEAD r52:QI
!  54: r60:SI#1=r53:QI
      REG_DEAD r53:QI
!  55: r60:SI#2=r54:QI
      REG_DEAD r54:QI
!  56: r60:SI#3=r55:QI
      REG_DEAD r55:QI
!  57: r22:SF=r60:SI#0
      REG_DEAD r60:SI
    9: r24:QI=call [`__gtsf2'] argc:0
      REG_DEAD r25:QI
      REG_DEAD r23:QI
      REG_DEAD r22:QI
      REG_DEAD r18:SF
      REG_CALL_DECL `__gtsf2'
      REG_EH_REGION 0x8000
   34: r50:QI=r24:QI
      REG_DEAD r24:QI
   10: NOTE_INSN_DELETED
   11: pc={(r50:QI>0)?L13:pc}
      REG_DEAD r50:QI
      REG_BR_PROB 633507684
   21: NOTE_INSN_BASIC_BLOCK 3
   12: r46:QI=0
      REG_EQUAL 0
   13: L13:
   22: NOTE_INSN_BASIC_BLOCK 4
   18: r24:QI=r46:QI
      REG_DEAD r46:QI
   19: use r24:QI

Currently, GCC generates the following AVR code:

check:
push r28
push r29
rcall .
rcall .
push __tmp_reg__
in r28,__SP_L__
in r29,__SP_H__
/* prologue: function */
/* frame size = 5 */
/* stack size = 7 */
.L__stack_usage = 7
ldi r18,lo8(1)
std Y+5,r18
ldi r18,0
ldi r19,0
ldi r20,0
ldi r21,0
!   std Y+1,r22
!   std Y+2,r23
!   std Y+3,r24
!   std Y+4,r25
!   ldd r22,Y+1
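
For readers without the original attachment: the RTL and assembly above are
consistent with a very small C function.  The source below is a reconstruction
from the dump (a float argument, a call to `__gtsf2` against 0.0, and a result
of 1 or 0), so treat it as an assumption rather than the exact test case.

```c
/* Reconstructed from the RTL, not copied from the thread:
   the argument is compared against 0.0 via the libgcc helper
   __gtsf2, and the function returns 1 if x is greater, else 0. */
unsigned char check(float x)
{
    return x > 0.0f;
}
```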

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2022-11-01 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

--- Comment #13 from Georg-Johann Lay  ---
Created attachment 53812
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53812&action=edit
Test case with 32-bit integer.

This problem is still present in current master (future v13) and also occurs
with 32-bit integers.

> avr-gcc -S -Os mul.c -fdump-rtl-ira

With v8, mul.s has 15 instructions.

With newer versions, mul.s has 26 additional instructions: 
* 12 silly, useless stores into / loads from the frame.
* 12 instructions to set up the frame.
* More instructions due to sub-optimal register alloc.
* Uses 6 bytes stack frame where v8 needs no frame at all.

In the IRA dump, there is:

Pass 0 for finding pseudo/allocno costs
a0 (r53,l0) best NO_REGS, allocno NO_REGS
a2 (r49,l0) best GENERAL_REGS, allocno GENERAL_REGS
a1 (r48,l0) best NO_REGS, allocno NO_REGS
...
Pass 1 for finding pseudo/allocno costs
r53: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
r49: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
r48: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
...
  Spill a0(r53,l0)
  Spill a1(r48,l0)
  Allocno a2r49 of GENERAL_REGS(30) ...

So there are 2 register spills for no reason that lead to that code bloat.
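
When comparing dumps across compiler versions, it can help to count the spill
lines mechanically.  The helper below is a small sketch that assumes the dump
format shown above, where each spill is reported on its own line starting with
"Spill"; the function name `count_spills` is made up for the example.

```c
#include <string.h>

/* Count the lines of an IRA dump whose first non-blank text is "Spill ". */
static int count_spills(const char *dump)
{
    int n = 0;
    const char *p = dump;
    while (*p) {
        /* Skip leading blanks on this line. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (strncmp(p, "Spill ", 6) == 0)
            n++;
        /* Advance to the start of the next line, if there is one. */
        const char *nl = strchr(p, '\n');
        if (!nl)
            break;
        p = nl + 1;
    }
    return n;
}
```

Fed the excerpt above, it reports the two spills of a0 (r53) and a1 (r48).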

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2022-06-28 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|10.4                        |10.5

--- Comment #12 from Jakub Jelinek  ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.

[Bug rtl-optimization/90706] [10/11/12/13 Regression] Useless code generated for stack / register operations on AVR

2022-05-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90706

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|9.5                         |10.4

--- Comment #11 from Richard Biener  ---
GCC 9 branch is being closed