Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload
On 11/17/23 01:27, John David Anglin wrote:
On 2023-11-16 10:00 p.m., Jeff Law wrote:

I'm not seeing any obvious problems in the gcc testsuite. It needs testing on packages that do extensive floating point calculations.

OK, I'll focus on those.

The more likely scenario is xmpy, which is used for integer multiply, but the operands have to be moved into FP registers because the operation happens in the FPU.

There are lots of xmpyu instructions in cc1 and cc1plus. For example,

  9fee8c:  08 03 02 5c  copy r3,ret0
  9fee90:  37 9c 00 20  ldo 10(ret0),ret0
  9fee94:  27 80 10 17  fldw 0(ret0),fr23
  9fee98:  08 03 02 5c  copy r3,ret0
  9fee9c:  37 9c 00 28  ldo 14(ret0),ret0
  9feea0:  27 80 10 16  fldw 0(ret0),fr22
  9feea4:  3a f6 47 18  xmpyu fr23,fr22,fr24
  9feea8:  2f c1 12 18  fstd fr24,-10(sp)
  9feeac:  0f c1 10 9c  ldw -10(sp),ret0
  9feeb0:  0f c9 10 9d  ldw -c(sp),ret1

There are 2169 xmpyu instructions in cc1plus in my current gcc bootstrap on linux.

Yea, and as I said, it'll likely work for quite a while until you get spills in just the right scenario. We ran bootstraps, OS builds (our BSD and Mach systems), spec89, spec92, etc. Eventually it breaks :(

jeff
Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload
On 2023-11-16 10:00 p.m., Jeff Law wrote:

I'm not seeing any obvious problems in the gcc testsuite. It needs testing on packages that do extensive floating point calculations.

OK, I'll focus on those.

The more likely scenario is xmpy, which is used for integer multiply, but the operands have to be moved into FP registers because the operation happens in the FPU.

There are lots of xmpyu instructions in cc1 and cc1plus. For example,

  9fee8c:  08 03 02 5c  copy r3,ret0
  9fee90:  37 9c 00 20  ldo 10(ret0),ret0
  9fee94:  27 80 10 17  fldw 0(ret0),fr23
  9fee98:  08 03 02 5c  copy r3,ret0
  9fee9c:  37 9c 00 28  ldo 14(ret0),ret0
  9feea0:  27 80 10 16  fldw 0(ret0),fr22
  9feea4:  3a f6 47 18  xmpyu fr23,fr22,fr24
  9feea8:  2f c1 12 18  fstd fr24,-10(sp)
  9feeac:  0f c1 10 9c  ldw -10(sp),ret0
  9feeb0:  0f c9 10 9d  ldw -c(sp),ret1

There are 2169 xmpyu instructions in cc1plus in my current gcc bootstrap on linux.

Dave
--
John David Anglin dave.ang...@bell.net
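To make the xmpyu listing above a little more concrete, here is a minimal C sketch of the kind of source operation it corresponds to: an unsigned 32x32 multiply producing a 64-bit product, which is then read back as two 32-bit words. The names `widening_umul`, `split_product`, and `struct word_pair` are hypothetical and used only for illustration. Whether a compiler actually emits xmpyu for this code depends on the target and options, and reading ret0 as the high word and ret1 as the low word is an interpretation of the listing, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type: the two 32-bit halves of a 64-bit product. */
struct word_pair {
    uint32_t hi; /* most significant word (read here as ret0 in the listing) */
    uint32_t lo; /* least significant word (read here as ret1 in the listing) */
};

/* Unsigned 32x32 -> 64 multiply. Both operands are widened before the
   multiply so the full 64-bit product is kept. */
static uint64_t widening_umul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* Split the 64-bit product into two words, mirroring the
   "store doubleword, then load two words" sequence in the listing. */
static struct word_pair split_product(uint64_t product)
{
    struct word_pair p;
    p.hi = (uint32_t)(product >> 32);
    p.lo = (uint32_t)(product & 0xFFFFFFFFu);
    /* Sanity check: the halves recombine to the original product. */
    assert((((uint64_t)p.hi << 32) | p.lo) == product);
    return p;
}
```

The point of the sketch is only that the multiply is an integer operation whose inputs and result live in integer variables. On a target where the multiply happens in the FPU, those values have to pass through FP registers, which is where the addressing restrictions for floating-point loads and stores come into play.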
Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload
On 11/16/23 18:20, Sam James wrote:

Jeff, I don't suppose you could dig out the old bugs/commits just out of interest?

That work goes back to the early 90s when I was primarily responsible for the PA platform. But the core issue hasn't changed: not enough context is provided for reload to know how to deal with these problems.

So, digging out those testcases/codes would be quite difficult; at the time we didn't have standard procedures where tests were added to the testsuite for most changes or even discussed.

I'm not seeing any obvious problems in the gcc testsuite. It needs testing on packages that do extensive floating point calculations.

OK, I'll focus on those.

The more likely scenario is xmpy, which is used for integer multiply, but the operands have to be moved into FP registers because the operation happens in the FPU.

jeff
Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload
Sam James writes: > John David Anglin writes: > >> On 2023-11-16 4:52 p.m., Jeff Law wrote: >>> >>> >>> On 11/16/23 10:54, John David Anglin wrote: Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11. Committed to trunk. This patch works around problem compiling python3.11 by improving REG+D address handling. The change results in smaller code and reduced register pressure. Dave --- hppa: Revise REG+D address support to allow long displacements before reload In analyzing PR rtl-optimization/112415, I realized that restricting REG+D offsets to 5-bits before reload results in very poor code and complexities in optimizing these instructions after reload. The general problem is long displacements are not allowed for floating point accesses when generating PA 1.1 code. Even with PA 2.0, there is a ELF linker bug that prevents using long displacements for floating point loads and stores. In the past, enabling long displacements before reload caused issues in reload. However, there have been fixes in the handling of reloads for floating-point accesses. This change allows long displacements before reload and corrects a couple of issues in the constraint handling for integer and floating-point accesses. 2023-11-16 John David Anglin gcc/ChangeLog: PR rtl-optimization/112415 * config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit displacements before reload. Simplify logic flow. Revise comments. * config/pa/pa.h (TARGET_ELF64): New define. (INT14_OK_STRICT): Update define and comment. * config/pa/pa64-linux.h (TARGET_ELF64): Define. * config/pa/predicates.md (base14_operand): Don't check alignment of short displacements. (integer_store_memory_operand): Don't return true when reload_in_progress is true. Remove INT_5_BITS check. (floating_point_store_memory_operand): Don't return true when reload_in_progress is true. Use INT14_OK_STRICT to check whether long displacements are always okay. >>> I strongly suspect this is going to cause problems in the end. 
>>>
>>> I've already done what you're trying to do. It'll likely look fine
>>> for an extended period of time, but it will almost certainly break
>>> one day.
>
> Jeff, I don't suppose you could dig out the old bugs/commits just out of
> interest?
>
>> It could happen. If it happens and can't be fixed, it's easy enough to
>> return false in pa_legitimate_address_p before reload. Maybe we could
>> add an optimization option for this.

I might hack in an option for local testing so I can quickly check with/without...

>>
>> As it stands, the code improvement for python is significant. I don't think
>> f-m-o can fix things after reload.
>>
>> Hopefully, Sam will test the change with various package builds on gentoo.
>> Debian is still on gcc-13.
>
> Yeah, happy to do that. We haven't got GCC 14 deployed in the wild, but
> we have it available for people who want to test and opt-in to it.
>
> Fingers crossed it's calm. I'll let you know if it isn't ;)
>
>> I'm not seeing any obvious problems in the gcc testsuite. It needs testing
>> on packages that do extensive floating point calculations.
>
> OK, I'll focus on those.
>
>>
>> Dave
Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload
John David Anglin writes: > On 2023-11-16 4:52 p.m., Jeff Law wrote: >> >> >> On 11/16/23 10:54, John David Anglin wrote: >>> Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11. Committed >>> to trunk. >>> >>> This patch works around problem compiling python3.11 by improving >>> REG+D address handling. The change results in smaller code and >>> reduced register pressure. >>> >>> Dave >>> --- >>> >>> hppa: Revise REG+D address support to allow long displacements before reload >>> >>> In analyzing PR rtl-optimization/112415, I realized that restricting >>> REG+D offsets to 5-bits before reload results in very poor code and >>> complexities in optimizing these instructions after reload. The >>> general problem is long displacements are not allowed for floating >>> point accesses when generating PA 1.1 code. Even with PA 2.0, there >>> is a ELF linker bug that prevents using long displacements for >>> floating point loads and stores. >>> >>> In the past, enabling long displacements before reload caused issues >>> in reload. However, there have been fixes in the handling of reloads >>> for floating-point accesses. This change allows long displacements >>> before reload and corrects a couple of issues in the constraint >>> handling for integer and floating-point accesses. >>> >>> 2023-11-16 John David Anglin >>> >>> gcc/ChangeLog: >>> >>> PR rtl-optimization/112415 >>> * config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit >>> displacements before reload. Simplify logic flow. Revise >>> comments. >>> * config/pa/pa.h (TARGET_ELF64): New define. >>> (INT14_OK_STRICT): Update define and comment. >>> * config/pa/pa64-linux.h (TARGET_ELF64): Define. >>> * config/pa/predicates.md (base14_operand): Don't check >>> alignment of short displacements. >>> (integer_store_memory_operand): Don't return true when >>> reload_in_progress is true. Remove INT_5_BITS check. >>> (floating_point_store_memory_operand): Don't return true when >>> reload_in_progress is true. 
Use INT14_OK_STRICT to check
>>> whether long displacements are always okay.
>> I strongly suspect this is going to cause problems in the end.
>>
>> I've already done what you're trying to do. It'll likely look fine
>> for an extended period of time, but it will almost certainly break
>> one day.

Jeff, I don't suppose you could dig out the old bugs/commits just out of interest?

> It could happen. If it happens and can't be fixed, it's easy enough to
> return false in pa_legitimate_address_p before reload. Maybe we could
> add an optimization option for this.
>
> As it stands, the code improvement for python is significant. I don't think
> f-m-o can fix things after reload.
>
> Hopefully, Sam will test the change with various package builds on gentoo.
> Debian is still on gcc-13.

Yeah, happy to do that. We haven't got GCC 14 deployed in the wild, but we have it available for people who want to test and opt-in to it.

Fingers crossed it's calm. I'll let you know if it isn't ;)

> I'm not seeing any obvious problems in the gcc testsuite. It needs testing
> on packages that do extensive floating point calculations.

OK, I'll focus on those.

>
> Dave
Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload
On 2023-11-16 4:52 p.m., Jeff Law wrote:
> On 11/16/23 10:54, John David Anglin wrote:
>> Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11. Committed
>> to trunk.
>>
>> This patch works around a problem compiling python3.11 by improving
>> REG+D address handling. The change results in smaller code and
>> reduced register pressure.
>>
>> Dave
>> ---
>>
>> hppa: Revise REG+D address support to allow long displacements before reload
>>
>> In analyzing PR rtl-optimization/112415, I realized that restricting
>> REG+D offsets to 5 bits before reload results in very poor code and
>> complexities in optimizing these instructions after reload. The
>> general problem is long displacements are not allowed for floating
>> point accesses when generating PA 1.1 code. Even with PA 2.0, there
>> is an ELF linker bug that prevents using long displacements for
>> floating point loads and stores.
>>
>> In the past, enabling long displacements before reload caused issues
>> in reload. However, there have been fixes in the handling of reloads
>> for floating-point accesses. This change allows long displacements
>> before reload and corrects a couple of issues in the constraint
>> handling for integer and floating-point accesses.
>>
>> 2023-11-16 John David Anglin
>>
>> gcc/ChangeLog:
>>
>> PR rtl-optimization/112415
>> * config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
>> displacements before reload. Simplify logic flow. Revise
>> comments.
>> * config/pa/pa.h (TARGET_ELF64): New define.
>> (INT14_OK_STRICT): Update define and comment.
>> * config/pa/pa64-linux.h (TARGET_ELF64): Define.
>> * config/pa/predicates.md (base14_operand): Don't check
>> alignment of short displacements.
>> (integer_store_memory_operand): Don't return true when
>> reload_in_progress is true. Remove INT_5_BITS check.
>> (floating_point_store_memory_operand): Don't return true when
>> reload_in_progress is true. Use INT14_OK_STRICT to check
>> whether long displacements are always okay.
> I strongly suspect this is going to cause problems in the end.
>
> I've already done what you're trying to do. It'll likely look fine
> for an extended period of time, but it will almost certainly break
> one day.

It could happen. If it happens and can't be fixed, it's easy enough to return false in pa_legitimate_address_p before reload. Maybe we could add an optimization option for this.

As it stands, the code improvement for python is significant. I don't think f-m-o can fix things after reload.

Hopefully, Sam will test the change with various package builds on gentoo. Debian is still on gcc-13.

I'm not seeing any obvious problems in the gcc testsuite. It needs testing on packages that do extensive floating point calculations.

Dave
--
John David Anglin dave.ang...@bell.net
Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload
On 11/16/23 10:54, John David Anglin wrote:
> Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11. Committed
> to trunk.
>
> This patch works around a problem compiling python3.11 by improving
> REG+D address handling. The change results in smaller code and
> reduced register pressure.
>
> Dave
> ---
>
> hppa: Revise REG+D address support to allow long displacements before reload
>
> In analyzing PR rtl-optimization/112415, I realized that restricting
> REG+D offsets to 5 bits before reload results in very poor code and
> complexities in optimizing these instructions after reload. The
> general problem is long displacements are not allowed for floating
> point accesses when generating PA 1.1 code. Even with PA 2.0, there
> is an ELF linker bug that prevents using long displacements for
> floating point loads and stores.
>
> In the past, enabling long displacements before reload caused issues
> in reload. However, there have been fixes in the handling of reloads
> for floating-point accesses. This change allows long displacements
> before reload and corrects a couple of issues in the constraint
> handling for integer and floating-point accesses.
>
> 2023-11-16 John David Anglin
>
> gcc/ChangeLog:
>
> PR rtl-optimization/112415
> * config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
> displacements before reload. Simplify logic flow. Revise
> comments.
> * config/pa/pa.h (TARGET_ELF64): New define.
> (INT14_OK_STRICT): Update define and comment.
> * config/pa/pa64-linux.h (TARGET_ELF64): Define.
> * config/pa/predicates.md (base14_operand): Don't check
> alignment of short displacements.
> (integer_store_memory_operand): Don't return true when
> reload_in_progress is true. Remove INT_5_BITS check.
> (floating_point_store_memory_operand): Don't return true when
> reload_in_progress is true. Use INT14_OK_STRICT to check
> whether long displacements are always okay.

I strongly suspect this is going to cause problems in the end.

I've already done what you're trying to do. It'll likely look fine for an extended period of time, but it will almost certainly break one day.

Jeff
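For readers following the 5-bit versus 14-bit discussion in the patch description, the following is a minimal C sketch of the displacement ranges involved. It is a simplified model under stated assumptions and not the logic in pa.cc or pa.h: the helper names `fits_signed` and `disp_ok` and the flag `long_fp_disp_ok` are hypothetical, displacements are treated as plain signed fields, and alignment rules and reload state are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if disp fits in a signed two's-complement
   field that is `bits` wide (for example 5 or 14). */
static bool fits_signed(int64_t disp, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    int64_t lo = -((int64_t)1 << (bits - 1));     /* -16 for 5, -8192 for 14 */
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;  /*  15 for 5,  8191 for 14 */
    return disp >= lo && disp <= hi;
}

/* Simplified model (an assumption, not GCC's code): a short 5-bit
   displacement is always accepted; a long 14-bit displacement is accepted
   for integer accesses, and for floating-point accesses only when the
   caller says long FP displacements are usable. */
static bool disp_ok(int64_t disp, bool is_fp_access, bool long_fp_disp_ok)
{
    if (fits_signed(disp, 5))
        return true;
    if (is_fp_access && !long_fp_disp_ok)
        return false;
    return fits_signed(disp, 14);
}
```

In this model an offset of 16 falls just outside the signed 5-bit range (-16 to 15) but well inside the 14-bit range (-8192 to 8191). That is consistent with the ldo/fldw pairs in the disassembly quoted elsewhere in the thread, assuming the displacements there are printed in hexadecimal.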