Re: Enabling -frename-registers?
On 05/02/2016 07:31 AM, Jeff Law wrote: On 04/29/2016 07:32 AM, Bernd Schmidt wrote: On 04/29/2016 03:02 PM, David Edelsohn wrote: How has this show general benefit for all architectures to deserve enabling it by default at -O2? It should improve postreload scheduling in general, and it can also help clear up bad code generation left behind by register allocation. Right. ISTM the round-robin renaming reduces the false dependencies that are inherently created by register allocation. It should benefit any architecture that utilizes post-reload scheduling. Following up to myself... As was seen on x86-64, renaming can cause problems with partial register stalls. FWIW, partial register stalls are not strictly limited to the x86-64. In fact the PA 2.0 processor had partial register stalls -- they were so bad that I ended up just turning off the ability to access both halves of the 64bit FP registers independently. I wouldn't be totally surprised if other targets have micro-architectures that would tickle this issue. jeff
Re: Enabling -frename-registers?
On May 5, 2016, at 6:00 AM, Wilco Dijkstrawrote: > > Ramana Radhakrishnan wrote: >> >> Can you file a bugzilla entry with a testcase that folks can look at please ? > > I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961. Unfortunately > I don't have a simple testcase that I can share. Simple has never been a requirement for bug submission. It is merely nice.
Re: Enabling -frename-registers?
Ramana Radhakrishnan wrote: > > Can you file a bugzilla entry with a testcase that folks can look at please ? I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961. Unfortunately I don't have a simple testcase that I can share. Wilco
Re: Enabling -frename-registers?
On Wed, May 4, 2016 at 4:20 PM, Wilco Dijkstrawrote: > Bernd Schmidt wrote: >> On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote: >>> On ARM / AArch32 I haven't seen any performance data yet - the one place we >>> are concerned >>> about the impact is on Thumb2 code size as regrename may end up >>> inadvertently putting more >>> things in high registers. >> >> In theory at least arm_preferred_rename_class is designed to make the >> opposite happen. Bernd > > I do not see that working unfortunately - Thumb-2 codesize increases by a few > percent even with -Os. > This is primarily due to replacing a low register with IP, which often > changes a 16-bit instruction like: > > movsr2, #8 > > into a 32-bit one: > > mov ip, #8 > > This will also affect other targets with multiple instruction sizes. So I > think it should check the > size of the new instruction patterns and only accept a rename if it is not > larger (certainly with -Os). Can you file a bugzilla entry with a testcase that folks can look at please ? Ramana >
Re: Enabling -frename-registers?
On 05/04/2016 10:20 AM, Wilco Dijkstra wrote: > Also when people claim they can't see any benefit, did they check the > codesize difference on SPEC2006? > On AArch64 codesize reduced uniformly due to fewer moves (and in a few cases > significantly so). I expect > that to be true for other RISC targets. Simply put, reduced codesize at no > performance loss = gain. Comparing text size on powerpc64 for CPU2006 executables, 20 decreased in size, 3 stayed the same, and 6 increased in size. -Pat
Re: Enabling -frename-registers?
> I do not see that working unfortunately - Thumb-2 codesize increases by a > few percent even with -Os. This is primarily due to replacing a low > register with IP, which often changes a 16-bit instruction like: > > movsr2, #8 > > into a 32-bit one: > > mov ip, #8 > > This will also affect other targets with multiple instruction sizes. So I > think it should check the size of the new instruction patterns and only > accept a rename if it is not larger (certainly with -Os). I'd rather let the back-end do that, either through preferred_rename_class or another hook. -- Eric Botcazou
Re: Enabling -frename-registers?
On Wed, May 04, 2016 at 12:11:04PM +0200, Bernd Schmidt wrote: > Given how many latent bugs it has shown up I think that alone would make > it valuable to have enabled at -O2. It is finding so many latent bugs simply because it is changing register allocation so much, and very aggressively. That does not mean enabling it by default is a good thing, quite the opposite, if there is no performance (or code size, etc.) advantage to doing so. Segher
Re: Enabling -frename-registers?
Bernd Schmidt wrote: > On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote: >> On ARM / AArch32 I haven't seen any performance data yet - the one place we >> are concerned >> about the impact is on Thumb2 code size as regrename may end up >> inadvertently putting more >> things in high registers. > > In theory at least arm_preferred_rename_class is designed to make the > opposite happen. Bernd I do not see that working unfortunately - Thumb-2 codesize increases by a few percent even with -Os. This is primarily due to replacing a low register with IP, which often changes a 16-bit instruction like: movsr2, #8 into a 32-bit one: mov ip, #8 This will also affect other targets with multiple instruction sizes. So I think it should check the size of the new instruction patterns and only accept a rename if it is not larger (certainly with -Os). Also when people claim they can't see any benefit, did they check the codesize difference on SPEC2006? On AArch64 codesize reduced uniformly due to fewer moves (and in a few cases significantly so). I expect that to be true for other RISC targets. Simply put, reduced codesize at no performance loss = gain. Wilco
Re: Enabling -frename-registers?
On Wed, May 4, 2016 at 2:37 PM, Bernd Schmidtwrote: > On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote: >> >> On ARM / AArch32 I haven't seen any performance data yet - the one >> place we are concerned about the impact is on Thumb2 code size as >> regrename may end up inadvertently putting more things in high >> registers. > > > In theory at least arm_preferred_rename_class is designed to make the > opposite happen. Indeed, yes - I'd forgotten that hook - but yes we should see if something sticks out. regards Ramana > > > Bernd
Re: Enabling -frename-registers?
On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote: On ARM / AArch32 I haven't seen any performance data yet - the one place we are concerned about the impact is on Thumb2 code size as regrename may end up inadvertently putting more things in high registers. In theory at least arm_preferred_rename_class is designed to make the opposite happen. Bernd
Re: Enabling -frename-registers?
On 04/05/16 11:26, Eric Botcazou wrote: >> Given how many latent bugs it has shown up I think that alone would make >> it valuable to have enabled at -O2. > > It might be worthwhile to test it on embedded architectures because modern > x86 > and PowerPC processors are probably not very sensitive to this kind of tweaks. > On ARM / AArch32 I haven't seen any performance data yet - the one place we are concerned about the impact is on Thumb2 code size as regrename may end up inadvertently putting more things in high registers. We'll check on that separately, having been away recently I haven't done any recent measurements with CSiBe - I'll try and get that done this week. Our bots have been a bit too flaky recently for me to say this with any certainty at the minute. On AArch64 (thanks to Wilco for some benchmarking), we've generally seen a small upside in our benchmarks (a couple of proprietary suites that I cannot name and SPEC2k(6)) aside from one major regression which is arguably an issue in the backend pattern for that particular intrinsic (aes) and would have been visible with -frename-regs anyway ! That needs to be fixed irrespective of turning this on by default. Wilco tells me that on SPEC2k6 we see a code size improvement by removing quite a lot of redundant moves though the performance difference appears to be in the noise on SPEC2k6. It does appear to look positive for aarch64 at first glance. regards Ramana
Re: Enabling -frename-registers?
> Given how many latent bugs it has shown up I think that alone would make > it valuable to have enabled at -O2. It might be worthwhile to test it on embedded architectures because modern x86 and PowerPC processors are probably not very sensitive to this kind of tweaks. -- Eric Botcazou
Re: Enabling -frename-registers?
On 05/04/2016 12:03 PM, Eric Botcazou wrote: The IBM LTC team has tested the benefit of -frename-registers at -O2 and sees no net benefit for PowerPC -- some benchmarks improve slightly but others degrade slightly (a few percent). You mentioned no overall benefit for x86. Although you mentioned benefit for Itanium, it is not a primary nor secondary architecture target for GCC and continues to have limited adoption. Andreas also reported a bootstrap comparison failure for Itanium due to the change. ...which had nothing to do with -frename-registers but was a latent issue in the speculation support of the scheduler, see the audit trail. Yeah, I'd forgotten to add this to the list of issues found. Also, Alan's morestack representation issue. Given how many latent bugs it has shown up I think that alone would make it valuable to have enabled at -O2. Bernd
Re: Enabling -frename-registers?
> The IBM LTC team has tested the benefit of -frename-registers at -O2 > and sees no net benefit for PowerPC -- some benchmarks improve > slightly but others degrade slightly (a few percent). You mentioned > no overall benefit for x86. Although you mentioned benefit for > Itanium, it is not a primary nor secondary architecture target for GCC > and continues to have limited adoption. Andreas also reported a > bootstrap comparison failure for Itanium due to the change. ...which had nothing to do with -frename-registers but was a latent issue in the speculation support of the scheduler, see the audit trail. -- Eric Botcazou
Re: Enabling -frename-registers?
On Wed, May 04, 2016 at 11:08:47AM +0200, Bernd Schmidt wrote: > On 05/04/2016 11:05 AM, Alan Modra wrote: > >I agree it's good to find these things.. Another nasty bug to add to > >the list is complete breakage of gccgo on powerpc64le. I see register > >renaming around the prologue call to __morestack, which trashes > >function arguments. > > How does this come about, is there something incorrect about the RTL > representation of __morestack? That's highly likely. call_fusage is just r12, it's non-standard parameter. -- Alan Modra Australia Development Lab, IBM
Re: Enabling -frename-registers?
On 05/04/2016 11:05 AM, Alan Modra wrote: I agree it's good to find these things.. Another nasty bug to add to the list is complete breakage of gccgo on powerpc64le. I see register renaming around the prologue call to __morestack, which trashes function arguments. How does this come about, is there something incorrect about the RTL representation of __morestack? Bernd
Re: Enabling -frename-registers?
On Wed, May 04, 2016 at 12:52:47AM +0200, Bernd Schmidt wrote: > I must say I find the argumentation about the fallout not compelling. It's a > normal consequence of development work, and by enabling it at -O2, we have > found: > * a Linux kernel bug > * a rs6000 testsuite bug > * some i386.md issues that can cause performance problems > * and a compare-debug problem in regrename itself. > All of these are _good_ things. If we don't want to run into such issues > we'll have to cease all development work. I agree it's good to find these things.. Another nasty bug to add to the list is complete breakage of gccgo on powerpc64le. I see register renaming around the prologue call to __morestack, which trashes function arguments. Dump of assembler code for function main: 0x100014e0 <+0>: lis r2,4098 0x100014e4 <+4>: addir2,r2,-18176 0x100014e8 <+8>: ld r0,-28736(r13) 0x100014ec <+12>:addir12,r1,-16496 0x100014f0 <+16>:nop 0x100014f4 <+20>:cmpld cr7,r12,r0 0x100014f8 <+24>:blt cr7,0x100015500x100014fc <+28>:mflrr5 0x10001500 <+32>:std r30,-16(r1) 0x10001504 <+36>:std r31,-8(r1) 0x10001508 <+40>:li r8,0 0x1000150c <+44>:nop 0x10001510 <+48>:ld r10,-32664(r2) 0x10001514 <+52>:nop 0x10001518 <+56>:ld r6,-32672(r2) 0x1000151c <+60>:std r5,16(r1) 0x10001520 <+64>:stdur1,-112(r1) 0x10001524 <+68>:stb r8,0(r6) 0x10001528 <+72>:lbz r9,0(r10) 0x1000152c <+76>:cmpwi r9,0 0x10001530 <+80>:beq 0x1000156c 0x10001534 <+84>:li r3,0 0x10001538 <+88>:addir1,r1,112 0x1000153c <+92>:ld r0,16(r1) 0x10001540 <+96>:ld r30,-16(r1) 0x10001544 <+100>: ld r31,-8(r1) 0x10001548 <+104>: mtlrr0 0x1000154c <+108>: blr => 0x10001550 <+112>: mflrr3 # argc trashed 0x10001554 <+116>: std r3,16(r1) 0x10001558 <+120>: bl 0x10001818 <__morestack> 0x1000155c <+124>: ld r4,16(r1) 0x10001560 <+128>: mtlrr4 0x10001564 <+132>: blr 0x10001568 <+136>: b 0x100014fc __morestack has a non-standard calling convention. After buying more stack, it calls its caller! In this case that means a call to 0x10001568. So regrename can't use r3 at 0x10001550 as it is a parameter passing reg for main. The use of r4 at 0x1000155c would also be wrong for a function that returns a value in r4. -- Alan Modra Australia Development Lab, IBM
Re: Enabling -frename-registers?
On Wed, May 4, 2016 at 12:52 AM, Bernd Schmidtwrote: > On 05/03/2016 11:26 PM, David Edelsohn wrote: >> >> Optimizations enabled by default at -O2 should show an overall net >> benefit -- that is the general justification that we have used in the >> past. I request that this change be reverted until more compelling >> evidence of benefit is presented. > > > Shrug. Done. I was going to look at adding some more smarts to it to clean > up after the register allocator; I guess I shan't bother. > > I must say I find the argumentation about the fallout not compelling. It's a > normal consequence of development work, and by enabling it at -O2, we have > found: > * a Linux kernel bug > * a rs6000 testsuite bug > * some i386.md issues that can cause performance problems > * and a compare-debug problem in regrename itself. > All of these are _good_ things. If we don't want to run into such issues > we'll have to cease all development work. > I'll still submit final versions of the fixes for the i386 and compare-debug > issues. I agree with you here, enabling at -O2 was fine. Reverting will have regressed PR38825 again (curiously also some SSE scheduling issue). Thus I wonder if regrename could be integrated with sched2 itself. Richard. > > Bernd
Re: Enabling -frename-registers?
On Tue, May 3, 2016 at 6:52 PM, Bernd Schmidtwrote: > On 05/03/2016 11:26 PM, David Edelsohn wrote: >> >> Optimizations enabled by default at -O2 should show an overall net >> benefit -- that is the general justification that we have used in the >> past. I request that this change be reverted until more compelling >> evidence of benefit is presented. > > Shrug. Done. I was going to look at adding some more smarts to it to clean > up after the register allocator; I guess I shan't bother. Why not add the smarts and then x86, POWER, ARM and other architectures can test the performance benefit? If that change makes the benefit more consistent, the motivation for making the option the default at -O2 will be clearer. Thanks, David
Re: Enabling -frename-registers?
On 05/03/2016 11:26 PM, David Edelsohn wrote: Optimizations enabled by default at -O2 should show an overall net benefit -- that is the general justification that we have used in the past. I request that this change be reverted until more compelling evidence of benefit is presented. Shrug. Done. I was going to look at adding some more smarts to it to clean up after the register allocator; I guess I shan't bother. I must say I find the argumentation about the fallout not compelling. It's a normal consequence of development work, and by enabling it at -O2, we have found: * a Linux kernel bug * a rs6000 testsuite bug * some i386.md issues that can cause performance problems * and a compare-debug problem in regrename itself. All of these are _good_ things. If we don't want to run into such issues we'll have to cease all development work. I'll still submit final versions of the fixes for the i386 and compare-debug issues. Bernd
Re: Enabling -frename-registers?
On Fri, Apr 29, 2016 at 10:21 AM, Richard Bienerwrote: > On April 29, 2016 3:48:37 PM GMT+02:00, David Edelsohn > wrote: >>On Fri, Apr 29, 2016 at 9:44 AM, Bernd Schmidt >>wrote: >>> >>> >>> On 04/29/2016 03:42 PM, David Edelsohn wrote: On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt wrote: > > On 04/29/2016 03:02 PM, David Edelsohn wrote: >> >> >> How has this show general benefit for all architectures to deserve >> enabling it by default at -O2? > > > > It should improve postreload scheduling in general, and it can also >>help > clear up bad code generation left behind by register allocation. Did you test the actual performance benefit on any architectures, especially architectures other than x86? >>> >>> >>> No. If that's the standard, I'll back out the change. >> >>It seems rather strange to enable an optimization by default across >>all targets without even knowing the performance impact. >> >>I'm eager to learn the opinion of others about this. > > It shows overall benefit on Itanic and ups and downs on x86. > > It's stage1 and the easiest to get feedback for all archs is to enable it by > default. Bernd and Richard, The IBM LTC team has tested the benefit of -frename-registers at -O2 and sees no net benefit for PowerPC -- some benchmarks improve slightly but others degrade slightly (a few percent). You mentioned no overall benefit for x86. Although you mentioned benefit for Itanium, it is not a primary nor secondary architecture target for GCC and continues to have limited adoption. Andreas also reported a bootstrap comparison failure for Itanium due to the change. If I understand correctly, the change to enable -frename-registers was motivated by PR 59173 for SSE. The performance experiments, bootstrap failure, and testsuite fallout do not provide good justification for enabling this optimization by default at -O2 for all targets. If this helps SSE, this optimization can be enabled using target-specific override mechanisms for that target. Optimizations enabled by default at -O2 should show an overall net benefit -- that is the general justification that we have used in the past. I request that this change be reverted until more compelling evidence of benefit is presented. Thanks, David
Re: Enabling -frename-registers?
On Mon, May 2, 2016 at 3:20 PM, Bernd Schmidtwrote: > On 05/02/2016 01:57 PM, Uros Bizjak wrote: > >> With the referred testcase, the compare-debug failure trips on (debug_insn >> 101) > > > Ok, INDEX_REG_CLASS is NO_REGS on alpha, and apparently the contents of the > MEM isn't a valid address. > > Try this? Yes, the patch fixes bootstrap miscomparison. Thanks, Uros.
Re: Enabling -frename-registers?
On 04/29/2016 07:32 AM, Bernd Schmidt wrote: On 04/29/2016 03:02 PM, David Edelsohn wrote: How has this show general benefit for all architectures to deserve enabling it by default at -O2? It should improve postreload scheduling in general, and it can also help clear up bad code generation left behind by register allocation. Right. ISTM the round-robin renaming reduces the false dependencies that are inherently created by register allocation. It should benefit any architecture that utilizes post-reload scheduling. jeff
Re: Enabling -frename-registers?
On 05/02/2016 01:57 PM, Uros Bizjak wrote: With the referred testcase, the compare-debug failure trips on (debug_insn 101) Ok, INDEX_REG_CLASS is NO_REGS on alpha, and apparently the contents of the MEM isn't a valid address. Try this? Bernd Index: gcc/regrename.c === --- gcc/regrename.c (revision 235753) +++ gcc/regrename.c (working copy) @@ -1238,6 +1238,19 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc, } } +/* A wrapper around base_reg_class which returns ALL_REGS if INSN is a + DEBUG_INSN. The arguments MODE, AS, CODE and INDEX_CODE are as for + base_reg_class. */ + +static reg_class +base_reg_class_for_rename (rtx_insn *insn, machine_mode mode, addr_space_t as, + rtx_code code, rtx_code index_code) +{ + if (DEBUG_INSN_P (insn)) +return ALL_REGS; + return base_reg_class (mode, as, code, index_code); +} + /* Adapted from find_reloads_address_1. CL is INDEX_REG_CLASS or BASE_REG_CLASS depending on how the register is being considered. */ @@ -1343,12 +1356,16 @@ scan_rtx_address (rtx_insn *insn, rtx *l } if (locI) - scan_rtx_address (insn, locI, INDEX_REG_CLASS, action, mode, as); + { + reg_class iclass = DEBUG_INSN_P (insn) ? ALL_REGS : INDEX_REG_CLASS; + scan_rtx_address (insn, locI, iclass, action, mode, as); + } if (locB) - scan_rtx_address (insn, locB, - base_reg_class (mode, as, PLUS, index_code), - action, mode, as); - + { + reg_class bclass = base_reg_class_for_rename (insn, mode, as, PLUS, + index_code); + scan_rtx_address (insn, locB, bclass, action, mode, as); + } return; } @@ -1366,10 +1383,13 @@ scan_rtx_address (rtx_insn *insn, rtx *l break; case MEM: - scan_rtx_address (insn, (x, 0), - base_reg_class (GET_MODE (x), MEM_ADDR_SPACE (x), - MEM, SCRATCH), - action, GET_MODE (x), MEM_ADDR_SPACE (x)); + { + reg_class bclass = base_reg_class_for_rename (insn, GET_MODE (x), + MEM_ADDR_SPACE (x), + MEM, SCRATCH); + scan_rtx_address (insn, (x, 0), bclass, action, GET_MODE (x), + MEM_ADDR_SPACE (x)); + } return; case REG: @@ -1416,10 +1436,14 @@ scan_rtx (rtx_insn *insn, rtx *loc, enum return; case MEM: - scan_rtx_address (insn, (x, 0), - base_reg_class (GET_MODE (x), MEM_ADDR_SPACE (x), - MEM, SCRATCH), - action, GET_MODE (x), MEM_ADDR_SPACE (x)); + { + reg_class bclass = base_reg_class_for_rename (insn, GET_MODE (x), + MEM_ADDR_SPACE (x), + MEM, SCRATCH); + + scan_rtx_address (insn, (x, 0), bclass, action, GET_MODE (x), + MEM_ADDR_SPACE (x)); + } return; case SET:
Re: Enabling -frename-registers?
On Mon, May 2, 2016 at 1:21 PM, Uros Bizjakwrote: On 04/17/2016 08:59 PM, Jeff Law wrote: > invoke.texi has an independent list (probably incomplete! ;( of all the > things that -O2 enables. Make sure to add -frename-registers to that > list and this is Ok for the trunk (gcc-7). This is what I committed. >>> >>> >>> The patch introduced bootstrap failure on alpha-linux-gnu. >>> >>> Non-bootstrapped build regresses: >>> >>> FAIL: gcc.dg/torture/pr69542.c -O2 (test for excess errors) >>> FAIL: gcc.dg/torture/pr69542.c -O2 -flto -fno-use-linker-plugin >>> -flto-partition=none (test for excess errors) >>> FAIL: gcc.dg/torture/pr69542.c -Os (test for excess errors) >>> >>> where all failures are -fcompare-debug failures. >> >> >> Is the bootstrap failure also of this nature, or do you suspect something >> else? > > I'm analysing the dumps, but I strongly suspect that it is the same problem. > > (Please also note that the compiler includes Eric's PR 70886 patch [1]). > > [1] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00026.html With the referred testcase, the compare-debug failure trips on (debug_insn 101) ;; basic block 8, loop depth 0, count 0, freq 9788, maybe hot ;; prev block 7, next block 9, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 7 [100.0%] (FALLTHRU) ;; bb 8 artificial_defs: { } ;; bb 8 artificial_uses: { u-1(30){ }} ;; lr in 1 [$1] 2 [$2] 3 [$3] 4 [$4] 5 [$5] 6 [$6] 29 [$29] 30 [$30] ;; lr use 1 [$1] 2 [$2] 4 [$4] 30 [$30] ;; lr def 1 [$1] 2 [$2] 4 [$4] ;; live in 1 [$1] 2 [$2] 3 [$3] 4 [$4] 5 [$5] 6 [$6] 29 [$29] 30 [$30] ;; live gen 1 [$1] 2 [$2] 4 [$4] ;; live kill (note 100 99 101 8 [bb 8] NOTE_INSN_BASIC_BLOCK) (debug_insn 101 100 104 8 (var_location:DI y (mem/f:DI (plus:DI (plus:DI (ashift:DI (reg:DI 2 $2 [orig:110 _19 ] [110]) (const_int 3 [0x3])) (reg/v/f:DI 1 $1 [orig:109 n ] [109])) (const_int 8 [0x8])) [3 n_18->h S8 A64])) pr69542.c:31 -1 (nil)) (insn 104 101 105 8 (set (reg:DI 17 $17 [154]) (ashift:DI (reg:DI 2 $2 [orig:110 _19 ] [110]) (const_int 3 [0x3]))) pr69542.c:31 64 {ashldi3} (nil)) The difference starts in rnreg pass, where without debug: processing block 8: opening incoming chain Creating chain $1 (53) opening incoming chain Creating chain $2 (54) opening incoming chain Creating chain $3 (55) opening incoming chain Creating chain $4 (56) opening incoming chain Creating chain $5 (57) opening incoming chain Creating chain $6 (58) opening incoming chain Creating chain $29 (59) opening incoming chain Creating chain $30 (60) Closing chain $2 (54) at insn 92 (terminate_write, superset) Creating chain $2 (61) at insn 92 Closing chain $1 (53) at insn 93 (terminate_dead, superset) ... and with debug: processing block 8: opening incoming chain Creating chain $1 (53) opening incoming chain Creating chain $2 (54) opening incoming chain Creating chain $3 (55) opening incoming chain Creating chain $4 (56) opening incoming chain Creating chain $5 (57) opening incoming chain Creating chain $6 (58) opening incoming chain Creating chain $29 (59) opening incoming chain Creating chain $30 (60) ==> Cannot rename chain $1 (53) at insn 101 (mark_read) ==> Widening register in chain $1 (53) at insn 101 Closing chain $2 (54) at insn 104 (terminate_write, superset) Creating chain $2 (61) at insn 104 Closing chain $1 (53) at insn 105 (terminate_dead, superset) ... > Uros.
Re: Enabling -frename-registers?
On Mon, May 2, 2016 at 1:15 PM, Bernd Schmidtwrote: > On 05/02/2016 01:12 PM, Uros Bizjak wrote: > >>> On 04/17/2016 08:59 PM, Jeff Law wrote: >>> invoke.texi has an independent list (probably incomplete! ;( of all the things that -O2 enables. Make sure to add -frename-registers to that list and this is Ok for the trunk (gcc-7). >>> >>> >>> This is what I committed. >> >> >> The patch introduced bootstrap failure on alpha-linux-gnu. >> >> Non-bootstrapped build regresses: >> >> FAIL: gcc.dg/torture/pr69542.c -O2 (test for excess errors) >> FAIL: gcc.dg/torture/pr69542.c -O2 -flto -fno-use-linker-plugin >> -flto-partition=none (test for excess errors) >> FAIL: gcc.dg/torture/pr69542.c -Os (test for excess errors) >> >> where all failures are -fcompare-debug failures. > > > Is the bootstrap failure also of this nature, or do you suspect something > else? I'm analysing the dumps, but I strongly suspect that it is the same problem. (Please also note that the compiler includes Eric's PR 70886 patch [1]). [1] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00026.html Uros.
Re: Enabling -frename-registers?
On 05/02/2016 01:12 PM, Uros Bizjak wrote: On 04/17/2016 08:59 PM, Jeff Law wrote: invoke.texi has an independent list (probably incomplete! ;( of all the things that -O2 enables. Make sure to add -frename-registers to that list and this is Ok for the trunk (gcc-7). This is what I committed. The patch introduced bootstrap failure on alpha-linux-gnu. Non-bootstrapped build regresses: FAIL: gcc.dg/torture/pr69542.c -O2 (test for excess errors) FAIL: gcc.dg/torture/pr69542.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: gcc.dg/torture/pr69542.c -Os (test for excess errors) where all failures are -fcompare-debug failures. Is the bootstrap failure also of this nature, or do you suspect something else? Bernd
Re: Enabling -frename-registers?
Hello! > On 04/17/2016 08:59 PM, Jeff Law wrote: > >> invoke.texi has an independent list (probably incomplete! ;( of all the >> things that -O2 enables. Make sure to add -frename-registers to that >> list and this is Ok for the trunk (gcc-7). > > This is what I committed. The patch introduced bootstrap failure on alpha-linux-gnu. Non-bootstrapped build regresses: FAIL: gcc.dg/torture/pr69542.c -O2 (test for excess errors) FAIL: gcc.dg/torture/pr69542.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors) FAIL: gcc.dg/torture/pr69542.c -Os (test for excess errors) where all failures are -fcompare-debug failures. The failure can be reproduced with a crosscompiler to alpha-linux-gnu: $ /ssd/uros/gcc-build-alpha/gcc/xgcc -B /ssd/uros/gcc-build-alpha/gcc -O2 -S -fcompare-debug pr69542.c xgcc: error: pr69542.c: -fcompare-debug failure $ /ssd/uros/gcc-build-alpha/gcc/xgcc -B /ssd/uros/gcc-build-alpha/gcc -O2 -S -fcompare-debug -fno-rename-registers pr69542.c [OK] Uros.
Re: Enabling -frename-registers?
Richard Bienerwrites: > It shows overall benefit on Itanic and ups and downs on x86. It's causing bootstrap comparison failures on ia64. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: Enabling -frename-registers?
On Fri, Apr 29, 2016 at 03:32:53PM +0200, Bernd Schmidt wrote: > On 04/29/2016 03:02 PM, David Edelsohn wrote: > >As an aside, this change seems to be the source of a new code > >generation bug affecting the PPC kernel. > > Please file a PR. -frename-registers merely exposed a pre-existing kernel bug; see https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-April/142568.html Segher
Re: Enabling -frename-registers?
On April 29, 2016 3:48:37 PM GMT+02:00, David Edelsohnwrote: >On Fri, Apr 29, 2016 at 9:44 AM, Bernd Schmidt >wrote: >> >> >> On 04/29/2016 03:42 PM, David Edelsohn wrote: >>> >>> On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt >>> wrote: On 04/29/2016 03:02 PM, David Edelsohn wrote: > > > How has this show general benefit for all architectures to deserve > enabling it by default at -O2? It should improve postreload scheduling in general, and it can also >help clear up bad code generation left behind by register allocation. >>> >>> >>> Did you test the actual performance benefit on any architectures, >>> especially architectures other than x86? >> >> >> No. If that's the standard, I'll back out the change. > >It seems rather strange to enable an optimization by default across >all targets without even knowing the performance impact. > >I'm eager to learn the opinion of others about this. It shows overall benefit on Itanic and ups and downs on x86. It's stage1 and the easiest to get feedback for all archs is to enable it by default. Richard. >Thanks, David
Re: Enabling -frename-registers?
On 04/29/2016 03:42 PM, David Edelsohn wrote: On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidtwrote: On 04/29/2016 03:02 PM, David Edelsohn wrote: How has this show general benefit for all architectures to deserve enabling it by default at -O2? It should improve postreload scheduling in general, and it can also help clear up bad code generation left behind by register allocation. Did you test the actual performance benefit on any architectures, especially architectures other than x86? No. If that's the standard, I'll back out the change. Bernd
Re: Enabling -frename-registers?
On Fri, Apr 29, 2016 at 9:44 AM, Bernd Schmidtwrote: > > > On 04/29/2016 03:42 PM, David Edelsohn wrote: >> >> On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidt >> wrote: >>> >>> On 04/29/2016 03:02 PM, David Edelsohn wrote: How has this show general benefit for all architectures to deserve enabling it by default at -O2? >>> >>> >>> >>> It should improve postreload scheduling in general, and it can also help >>> clear up bad code generation left behind by register allocation. >> >> >> Did you test the actual performance benefit on any architectures, >> especially architectures other than x86? > > > No. If that's the standard, I'll back out the change. It seems rather strange to enable an optimization by default across all targets without even knowing the performance impact. I'm eager to learn the opinion of others about this. Thanks, David
Re: Enabling -frename-registers?
On Fri, Apr 29, 2016 at 9:32 AM, Bernd Schmidtwrote: > On 04/29/2016 03:02 PM, David Edelsohn wrote: >> >> How has this show general benefit for all architectures to deserve >> enabling it by default at -O2? > > > It should improve postreload scheduling in general, and it can also help > clear up bad code generation left behind by register allocation. Did you test the actual performance benefit on any architectures, especially architectures other than x86? Thanks, David
Re: Enabling -frename-registers?
On 04/29/2016 03:02 PM, David Edelsohn wrote: How has this show general benefit for all architectures to deserve enabling it by default at -O2? It should improve postreload scheduling in general, and it can also help clear up bad code generation left behind by register allocation. As an aside, this change seems to be the source of a new code generation bug affecting the PPC kernel. Please file a PR. Bernd
Re: Enabling -frename-registers?
How has this show general benefit for all architectures to deserve enabling it by default at -O2? As an aside, this change seems to be the source of a new code generation bug affecting the PPC kernel. Thanks, David
Re: Enabling -frename-registers?
> Index: invoke.texi > === > --- invoke.texi (revision 235475) > +++ invoke.texi (working copy) > @@ -8574,7 +8574,7 @@ make debugging impossible, since variabl > a ``home register''. > > Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}, > -and also enabled at levels @option{-O2} and @option{-O3}. > +and also enabled at levels @option{-O2}, @option{-O3} and @option{-Os}. > > @item -fschedule-fusion > @opindex fschedule-fusion Yes, this looks good to me (unless -frename-registers is specifically disabled at -Os in some other way, I didn't look into the details). -- Eric Botcazou
Re: Enabling -frename-registers?
On 04/27/2016 09:11 AM, Eric Botcazou wrote: @@ -8562,7 +8563,8 @@ debug information format adopted by the make debugging impossible, since variables no longer stay in a ``home register''. -Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}. +Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}, +and also enabled at levels @option{-O2} and @option{-O3}. OPT_LEVELS_2_PLUS includes -Os since it's basically -O2. So, this? Bernd Index: invoke.texi === --- invoke.texi (revision 235475) +++ invoke.texi (working copy) @@ -8574,7 +8574,7 @@ make debugging impossible, since variabl a ``home register''. Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}, -and also enabled at levels @option{-O2} and @option{-O3}. +and also enabled at levels @option{-O2}, @option{-O3} and @option{-Os}. @item -fschedule-fusion @opindex fschedule-fusion
Re: Enabling -frename-registers?
> @@ -8562,7 +8563,8 @@ debug information format adopted by the > make debugging impossible, since variables no longer stay in > a ``home register''. > > -Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}. > +Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}, > +and also enabled at levels @option{-O2} and @option{-O3}. OPT_LEVELS_2_PLUS includes -Os since it's basically -O2. -- Eric Botcazou
Re: Enabling -frename-registers?
On 04/17/2016 08:59 PM, Jeff Law wrote: invoke.texi has an independent list (probably incomplete! ;( of all the things that -O2 enables. Make sure to add -frename-registers to that list and this is Ok for the trunk (gcc-7). This is what I committed. Bernd Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 235441) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2016-04-26 Bernd Schmidt+ + PR rtl-optimization/57193 + * opts.c (default_options_table): Add OPT_frename_registers at -O2 + and above. + * doc/invoke.texi (-frename-registers, -O2): Update documentation. + 2016-04-26 Bin Cheng * tree-if-conv.c (any_pred_load_store): New static variable. Index: gcc/opts.c === --- gcc/opts.c (revision 235441) +++ gcc/opts.c (working copy) @@ -498,6 +498,7 @@ static const struct default_options defa { OPT_LEVELS_2_PLUS, OPT_fstrict_overflow, NULL, 1 }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_freorder_blocks_algorithm_, NULL, REORDER_BLOCKS_ALGORITHM_STC }, +{ OPT_LEVELS_2_PLUS, OPT_frename_registers, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 }, Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 235441) +++ gcc/doc/invoke.texi (working copy) @@ -6255,6 +6255,7 @@ also turns on the following optimization -foptimize-strlen @gol -fpartial-inlining @gol -fpeephole2 @gol +-frename-registers @gol -freorder-blocks-algorithm=stc @gol -freorder-blocks-and-partition -freorder-functions @gol -frerun-cse-after-loop @gol @@ -8562,7 +8563,8 @@ debug information format adopted by the make debugging impossible, since variables no longer stay in a ``home register''. -Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}. +Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}, +and also enabled at levels @option{-O2} and @option{-O3}. @item -fschedule-fusion @opindex fschedule-fusion
Re: Enabling -frename-registers?
On 01/29/2016 10:34 AM, Bernd Schmidt wrote: So PR57193 has an example of sub-optimal code generation, with some unnecessary register moves left after LRA. These seem to be difficult to prevent, but last year Robert Suchanek made some modifications to regrename that allow it to clean up such cases. Enabling -frename-registers removes one of the two unnecessary copies, and I'm pretty sure I could make it eliminate the other one as well with a bit more work. Hence, this patch. The renamer has seen a lot of fixes over the years and should be in pretty good shape IMO. Still, I won't deny that this is a bit riskier than the usual bugfix patch at this stage. Bootstrapped and tested on x86_64-linux, with my earlier patch to fix some i386 tests. Thoughts? Should we do this for gcc-7 at least? invoke.texi has an independent list (probably incomplete! ;( of all the things that -O2 enables. Make sure to add -frename-registers to that list and this is Ok for the trunk (gcc-7). jeff
Enabling -frename-registers?
So PR57193 has an example of sub-optimal code generation, with some unnecessary register moves left after LRA. These seem to be difficult to prevent, but last year Robert Suchanek made some modifications to regrename that allow it to clean up such cases. Enabling -frename-registers removes one of the two unnecessary copies, and I'm pretty sure I could make it eliminate the other one as well with a bit more work. Hence, this patch. The renamer has seen a lot of fixes over the years and should be in pretty good shape IMO. Still, I won't deny that this is a bit riskier than the usual bugfix patch at this stage. Bootstrapped and tested on x86_64-linux, with my earlier patch to fix some i386 tests. Thoughts? Should we do this for gcc-7 at least? Bernd PR rtl-optimization/57193 * opts.c (default_options_table): Add OPT_frename_registers at -O2 and above. * doc/invoke.texi (-frename-registers): Update documentation. Index: gcc/opts.c === --- gcc/opts.c (revision 232689) +++ gcc/opts.c (working copy) @@ -498,6 +498,7 @@ static const struct default_options defa { OPT_LEVELS_2_PLUS, OPT_fstrict_overflow, NULL, 1 }, { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_freorder_blocks_algorithm_, NULL, REORDER_BLOCKS_ALGORITHM_STC }, +{ OPT_LEVELS_2_PLUS, OPT_frename_registers, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_freorder_functions, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 }, { OPT_LEVELS_2_PLUS, OPT_ftree_pre, NULL, 1 }, Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 232689) +++ gcc/doc/invoke.texi (working copy) @@ -8467,7 +8467,8 @@ debug information format adopted by the make debugging impossible, since variables no longer stay in a ``home register''. -Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}. +Enabled by default with @option{-funroll-loops} and @option{-fpeel-loops}, +and also enabled at levels @option{-O2} and @option{-O3}. @item -fschedule-fusion @opindex fschedule-fusion
Re: Enabling -frename-registers?
On January 29, 2016 6:34:50 PM GMT+01:00, Bernd Schmidt <bschm...@redhat.com> wrote: >So PR57193 has an example of sub-optimal code generation, with some >unnecessary register moves left after LRA. These seem to be difficult >to >prevent, but last year Robert Suchanek made some modifications to >regrename that allow it to clean up such cases. Enabling >-frename-registers removes one of the two unnecessary copies, and I'm >pretty sure I could make it eliminate the other one as well with a bit >more work. > >Hence, this patch. The renamer has seen a lot of fixes over the years >and should be in pretty good shape IMO. Still, I won't deny that this >is >a bit riskier than the usual bugfix patch at this stage. > >Bootstrapped and tested on x86_64-linux, with my earlier patch to fix >some i386 tests. Thoughts? Should we do this for gcc-7 at least? I don't think it's appropriate at this stage. Richard. > >Bernd