Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
Michael Meissner meiss...@linux.vnet.ibm.com writes: PR target/65240 * config/rs6000/predicates.md (easy_fp_constant): Remove special -ffast-math handling that kept non-0 constants live in the RTL until reload. Remove logic testing the number of instructions it took to create a constant in a GPR that was never used, due to a test for soft-float earlier. (memory_fp_constant): Delete, no longer used. spawn /daten/gcc/gcc-20150403/Build/gcc/xgcc -B/daten/gcc/gcc-20150403/Build/gcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O2 -w -c -m64 -o pr33855.o /daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c. /daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c: In function 'foo':. /daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c:27:1: error: unrecognizable insn:. (insn 136 135 48 5 (set (reg:DF 10 10). (mem/u/c:DF (symbol_ref/u:DI (*.LC1) [flags 0x82]) [8 S8 A64])) /daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c:20 -1. (nil)). /daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c:27:1: internal compiler error: in extract_insn, at recog.c:2343. 0x1062355b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*). ../../gcc/rtl-error.c:110. 0x106235af _fatal_insn_not_found(rtx_def const*, char const*, int, char const*). ../../gcc/rtl-error.c:118. 0x105eba47 extract_insn(rtx_insn*). ../../gcc/recog.c:2343. 0x105ebb07 extract_insn_cached(rtx_insn*). ../../gcc/recog.c:2234. 0x103aa267 cleanup_subreg_operands(rtx_insn*). ../../gcc/final.c:3137. 0x10620d57 reload(rtx_insn*, int). ../../gcc/reload1.c:1278. 0x104d8ea7 do_reload. ../../gcc/ira.c:5430. 0x104d8ea7 execute. ../../gcc/ira.c:5589. http://gcc.gnu.org/ml/gcc-testresults/2015-04/msg00386.html Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Sat, Apr 04, 2015 at 08:25:10PM +0200, Andreas Schwab wrote: /daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c:27:1: error: unrecognizable insn:. Should be fixed with r221862. -- Alan Modra Australia Development Lab, IBM
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Wed, Mar 18, 2015 at 12:08:47PM +0100, Richard Biener wrote: Did you double-check if there are any differences in generated code? Esp. the SPEC INT benchmarks look odd - they don't contain any FP code. SpecINT does contain some amount of floating point. Off the top of my head: 1) Bzip2 does a percentage calculation for fprintf; 2) Libquantum uses sin/cos (which the compiler optimizes to sincos) in the initial setup to set up the data. 3) I don't recall if the version of GCC used in Spec had switched over to using floating point for the register allocator or not. 4) Perlbench has a lot of code that does floating pointing point, but the main loop excerised in the Spec runs probably doesn't use FP. Sorry I couldn't respond earlier, the corporate IMAP email server was down for a period of time. Any way, I do see code changes. Before the fix was made, if you used -ffast-math, it kept the floating point constants around until reload. When reload could not find a reload to load the constant, it would push the constant to memory, and do validize_mem on the address. This created the sequence: addis 9,2,.LC0@toc@ha addi 9,9,.LC0@toc@l lfd 0,0(9) Because the address was a single register, it could also load the value into the traditional Altivec registers: addis 9,2,.LC0@toc@ha addi 9,9,.LC0@toc@l lxsdx 32,0(9) And in fact the register allocator seemed to prefer loading constants into the traditional Altivec registers instead of the traditional floating point registers. Once I made the change to force the constant to memory earlier, it would use normal addressing, and generate something like: addis 9,2,.LC0@toc@ha lfs 0,.LC0@toc@l(9) This meant that a lot of addi's are no longer generated, because the addi is folded into the lfs/lfd instruction. In addition, due to the VSX memory instructions being only register+register, whenever the code wanted to load floating point constants, it would prefer the traditional floating point registers which had register+offset addressing. This meant in turn, that any instruction that used a FP constant could potentially be changed from a VSX form (i.e. xsmuldp) into a traditional FP form (i.e. fmul) if all of the inputs and outputs were traditional floating point registers. Finally, I suspect pushing the address out earlier, means that it might be keeping the address live a little bit longer, which could change things if we are spilling GPRs. One of the things I am working on for GCC 6.0 is going back to reworking on the address support to do a better job with fusion, and I believe it will reduce the life time for some of these address temporaries. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Wed, Mar 18, 2015 at 12:18 AM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: On Thu, Mar 12, 2015 at 11:37:14AM -0400, David Edelsohn wrote: Please check on the performance implications of removing the special constant support. I know that it is late, but I think that ripping it out is less risky than trying to fix this, if the performance impact is not bad. Now, I haven't drilled down to exactly what is causing the performance differences, but I've done some Spec 2006 runs comparing subversion id 221194, with the two patches. The first patch is a rewrite of the code that I originally put into the compiler to move floating point constants under -ffast-math during the first split pass. A minor tweak would need to be done to the original patch so that it works with -mcmodel=small or -m32 options. The second patch completely eliminates keeping the non-0 constant around in RTL, and pushes it to memory during the initial RTL generation, since it is felt that the RTL optimizations no longer need the constant in RTL to convert division by constant into multiplication by the reciprocal. The benchmarcks that show a difference are. Note, I do not count benchmarks that differ by less than 2% to be significant. Percentages more than 100% mean the benchmark ran faster: Benchmark Patch-1 Patch-2 = === === 401.bzip2 102.59% 103.51% 462.libquantum 100.28% 97.52% 483.xalancbmk97.72% 97.90% 435.gromacs 104.48% 99.39% 436.cactusADM 102.19% 102.90% 470.lbm 100.39% 97.45% Spec INT score 99.86% 99.86% Spec FP score 100.50% 99.81% Patch #1 had 3 faster benchmarks and 1 slower benchmark. Patch #2 had 2 faster benchmarks, and 3 slower benchmarks. Did you double-check if there are any differences in generated code? Esp. the SPEC INT benchmarks look odd - they don't contain any FP code. Richard. I tend to feel patch #2 is cleaner, though it is slightly slower. However, I can go with patch #1 if desired. Patch #2 bootstrapped fine, and had no regressions in the test suite. Did you want me to install patch #1, patch #2, or do you want more information? [gcc] 2015-03-17 Michael Meissner meiss...@linux.vnet.ibm.com PR target/65240 * config/rs6000/predicates.md (easy_fp_constant): Remove special -ffast-math handling that kept non-0 constants live in the RTL until reload. Remove logic testing the number of instructions it took to create a constant in a GPR that was never used, due to a test for soft-float earlier. (memory_fp_constant): Delete, no longer used. * config/rs6000/rs6000.md (movMODE_hardfloat): Remove alternatives for loading non-0 constants into GPRs for hard floating point that is no longer needed due to changes in easy_fp_constant. Add support for loading 0.0 into GPRs. (movmode_hardfloat32): Likewise. (movmode_hardfloat64): Likewise. (movmode_64bit_dm): Likewise. (movtd_64bit_nodm): Likewise. (pre-reload move FP constant define_split): Delete define_split, since it is no longer used. (extenddftf2_internal): Remove GHF constraints that are not valid for extenddftf2. [gcc/testsuite] 2015-03-17 Michael Meissner meiss...@linux.vnet.ibm.com PR target/65240 * gcc/testsuite/g++.dg/pr65240.h: Add tests for PR 65240. * gcc/testsuite/g++.dg/pr65240-1.C: Likewise. * gcc/testsuite/g++.dg/pr65240-2.C: Likewise. * gcc/testsuite/g++.dg/pr65240-3.C: Likewise. * gcc/testsuite/g++.dg/pr65240-4.C: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Wed, Mar 11, 2015 at 08:52:54PM -0400, David Edelsohn wrote: On Wed, Mar 11, 2015 at 6:21 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote: I am concerned with the create_TOC_reference use for TARGET_TOC. Has this been tested with big endian -mcmodel=small? Yes, that was a problem. Patch coming up soon. Thanks. Can you call rs6000_emit_move_directly? Well, I can, but I would have to have some sort of flag that says after the split1 pass not to allow FP constants in move (other than 0.0). It is doable, but it does touch more areas in the rs6000 back end. I am starting to think that it is just simpler to rip out all of the special fast math handling of constants, considering the multiply by reciprocal support has moved to SSA/tree and away from RTL. Did you want me to investigate the performance implications of removing it now (rather than waiting to GCC 6.0), or just do the more limited patch that I've been pursuing. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Thu, Mar 12, 2015 at 11:29 AM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: On Wed, Mar 11, 2015 at 08:52:54PM -0400, David Edelsohn wrote: On Wed, Mar 11, 2015 at 6:21 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote: I am concerned with the create_TOC_reference use for TARGET_TOC. Has this been tested with big endian -mcmodel=small? Yes, that was a problem. Patch coming up soon. Thanks. Can you call rs6000_emit_move_directly? Well, I can, but I would have to have some sort of flag that says after the split1 pass not to allow FP constants in move (other than 0.0). It is doable, but it does touch more areas in the rs6000 back end. I am starting to think that it is just simpler to rip out all of the special fast math handling of constants, considering the multiply by reciprocal support has moved to SSA/tree and away from RTL. Did you want me to investigate the performance implications of removing it now (rather than waiting to GCC 6.0), or just do the more limited patch that I've been pursuing. Please check on the performance implications of removing the special constant support. I know that it is late, but I think that ripping it out is less risky than trying to fix this, if the performance impact is not bad. Thanks, David
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Thu, Mar 5, 2015 at 3:06 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: This patch fixes PR 65240, which was a latent bug that was introduced when I added the -mupper-regs support to the PowerPC compiler. In the PowerPC compiler, if you use -ffast-math, the compiler allows floating point constants in move RTLs until register allocation time, in order to allow the optimizations that replace division by a constant with multiplication by the reciprocal. If the register being loaded up is a traditional Altivec register, the load will fail, since there is no D-FORM (register+offset) addressing mode for the traditional altivec registers. I had had a define_split to try and handle this situation, but it didn't work all of the time. I rewrote the insns, so there is an UNSPEC in it to prevent over optimization. I have done bootstrap builds and make check's on both a big endian power7 system, and a little endian power8 system with no regressions in the tests. I have also built both power7 and power8 spec 2006 (v5) versions, and Spec built. I have run the Spec floating point tests on a power7 box, and all of the tests performed about at the same speed as before the bug fix was done, with the exception of gromacs, which is 4% faster with the fix. Looking at the static instruction counts, the big change is to fold the addi instruction into the load of the constant if it is loading the value to a traditional floating point register (previously, it would load up the constant with addis/addi and then do an indirect load). I do see other instructions change, including a few more floating point loads, but with the exception of gromacs, most of the changes are in the noise level. Is this ok to install into the trunk? At present, it is not needed for GCC 4.9, since that branch does not have the -mupper-regs support. [gcc] 2015-03-05 Michael Meissner meiss...@linux.vnet.ibm.com PR target/65240 * config/rs6000/rs6000.md (UNSPEC_FUSION_FP_CONSTANT): New unspec. (Vsx_load): New mode attribute to give appropriate VSX load instruction. (move floating point constant define_split): Use an UNSPEC to make sure that the load of the constant is kept as a memory address, instead of being converted back into a move of the constant. (load_mode_constant): Likewise. [gcc/testsuite] 2015-03-05 Michael Meissner meiss...@linux.vnet.ibm.com PR target/65240 * gcc.target/powerpc/pr65240.c: New test. I am concerned with the create_TOC_reference use for TARGET_TOC. Has this been tested with big endian -mcmodel=small? Thanks, David
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Wed, Mar 11, 2015 at 6:21 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote: I am concerned with the create_TOC_reference use for TARGET_TOC. Has this been tested with big endian -mcmodel=small? Yes, that was a problem. Patch coming up soon. Thanks. Can you call rs6000_emit_move_directly? Thanks, David
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote: I am concerned with the create_TOC_reference use for TARGET_TOC. Has this been tested with big endian -mcmodel=small? Yes, that was a problem. Patch coming up soon. Thanks. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Fri, Mar 06, 2015 at 01:05:31PM +0100, Richard Biener wrote: On Thu, Mar 5, 2015 at 9:06 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: This patch fixes PR 65240, which was a latent bug that was introduced when I added the -mupper-regs support to the PowerPC compiler. In the PowerPC compiler, if you use -ffast-math, the compiler allows floating point constants in move RTLs until register allocation time, in order to allow the optimizations that replace division by a constant with multiplication by the reciprocal. Don't we perform this optimization on the GIMPLE/tree level already? Richard. I've suspected that the code is a hold over from the days when the multiply by the reciprocal was done in RTL. I didn't want to change the logic right now, but it something that should be investigated when GCC 6.0 opens up. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math
On Thu, Mar 5, 2015 at 9:06 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: This patch fixes PR 65240, which was a latent bug that was introduced when I added the -mupper-regs support to the PowerPC compiler. In the PowerPC compiler, if you use -ffast-math, the compiler allows floating point constants in move RTLs until register allocation time, in order to allow the optimizations that replace division by a constant with multiplication by the reciprocal. Don't we perform this optimization on the GIMPLE/tree level already? Richard. If the register being loaded up is a traditional Altivec register, the load will fail, since there is no D-FORM (register+offset) addressing mode for the traditional altivec registers. I had had a define_split to try and handle this situation, but it didn't work all of the time. I rewrote the insns, so there is an UNSPEC in it to prevent over optimization. I have done bootstrap builds and make check's on both a big endian power7 system, and a little endian power8 system with no regressions in the tests. I have also built both power7 and power8 spec 2006 (v5) versions, and Spec built. I have run the Spec floating point tests on a power7 box, and all of the tests performed about at the same speed as before the bug fix was done, with the exception of gromacs, which is 4% faster with the fix. Looking at the static instruction counts, the big change is to fold the addi instruction into the load of the constant if it is loading the value to a traditional floating point register (previously, it would load up the constant with addis/addi and then do an indirect load). I do see other instructions change, including a few more floating point loads, but with the exception of gromacs, most of the changes are in the noise level. Is this ok to install into the trunk? At present, it is not needed for GCC 4.9, since that branch does not have the -mupper-regs support. [gcc] 2015-03-05 Michael Meissner meiss...@linux.vnet.ibm.com PR target/65240 * config/rs6000/rs6000.md (UNSPEC_FUSION_FP_CONSTANT): New unspec. (Vsx_load): New mode attribute to give appropriate VSX load instruction. (move floating point constant define_split): Use an UNSPEC to make sure that the load of the constant is kept as a memory address, instead of being converted back into a move of the constant. (load_mode_constant): Likewise. [gcc/testsuite] 2015-03-05 Michael Meissner meiss...@linux.vnet.ibm.com PR target/65240 * gcc.target/powerpc/pr65240.c: New test. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797