Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-04-04 Thread Andreas Schwab
Michael Meissner meiss...@linux.vnet.ibm.com writes:

   PR target/65240
   * config/rs6000/predicates.md (easy_fp_constant): Remove special
   -ffast-math handling that kept non-0 constants live in the RTL
   until reload.  Remove logic testing the number of instructions it
   took to create a constant in a GPR that was never used, due to a
   test for soft-float earlier.
   (memory_fp_constant): Delete, no longer used.

spawn /daten/gcc/gcc-20150403/Build/gcc/xgcc 
-B/daten/gcc/gcc-20150403/Build/gcc/ -fno-diagnostics-show-caret 
-fdiagnostics-color=never -O2 -w -c -m64 -o pr33855.o 
/daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c.
/daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c: In 
function 'foo':.
/daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c:27:1: 
error: unrecognizable insn:.
(insn 136 135 48 5 (set (reg:DF 10 10).
(mem/u/c:DF (symbol_ref/u:DI (*.LC1) [flags 0x82]) [8  S8 A64])) 
/daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c:20 -1.
 (nil)).
/daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c:27:1: 
internal compiler error: in extract_insn, at recog.c:2343.
0x1062355b _fatal_insn(char const*, rtx_def const*, char const*, int, char 
const*).
../../gcc/rtl-error.c:110.
0x106235af _fatal_insn_not_found(rtx_def const*, char const*, int, char const*).
../../gcc/rtl-error.c:118.
0x105eba47 extract_insn(rtx_insn*).
../../gcc/recog.c:2343.
0x105ebb07 extract_insn_cached(rtx_insn*).
../../gcc/recog.c:2234.
0x103aa267 cleanup_subreg_operands(rtx_insn*).
../../gcc/final.c:3137.
0x10620d57 reload(rtx_insn*, int).
../../gcc/reload1.c:1278.
0x104d8ea7 do_reload.
../../gcc/ira.c:5430.
0x104d8ea7 execute.
../../gcc/ira.c:5589.

http://gcc.gnu.org/ml/gcc-testresults/2015-04/msg00386.html

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-04-04 Thread Alan Modra
On Sat, Apr 04, 2015 at 08:25:10PM +0200, Andreas Schwab wrote:
 /daten/gcc/gcc-20150403/gcc/testsuite/gcc.c-torture/compile/pr33855.c:27:1: 
 error: unrecognizable insn:.

Should be fixed with r221862.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-20 Thread Michael Meissner
On Wed, Mar 18, 2015 at 12:08:47PM +0100, Richard Biener wrote:
 Did you double-check if there are any differences in generated code?
 Esp. the SPEC INT benchmarks look odd - they don't contain any
 FP code.

SpecINT does contain some amount of floating point.  Off the top of my head:

   1)   Bzip2 does a percentage calculation for fprintf;
   2)   Libquantum uses sin/cos (which the compiler optimizes to sincos) in the
initial setup to set up the data.
   3)   I don't recall if the version of GCC used in Spec had switched over to
using floating point for the register allocator or not.
   4)   Perlbench has a lot of code that does floating pointing point, but the
main loop excerised in the Spec runs probably doesn't use FP.

Sorry I couldn't respond earlier, the corporate IMAP email server was down for
a period of time.

Any way, I do see code changes.

Before the fix was made, if you used -ffast-math, it kept the floating point
constants around until reload.  When reload could not find a reload to load the
constant, it would push the constant to memory, and do validize_mem on the
address.  This created the sequence:

addis 9,2,.LC0@toc@ha
addi 9,9,.LC0@toc@l
lfd 0,0(9)

Because the address was a single register, it could also load the value into
the traditional Altivec registers:

addis 9,2,.LC0@toc@ha
addi 9,9,.LC0@toc@l
lxsdx 32,0(9)

And in fact the register allocator seemed to prefer loading constants into the
traditional Altivec registers instead of the traditional floating point
registers.

Once I made the change to force the constant to memory earlier, it would use
normal addressing, and generate something like:

addis 9,2,.LC0@toc@ha
lfs 0,.LC0@toc@l(9)

This meant that a lot of addi's are no longer generated, because the addi is
folded into the lfs/lfd instruction.

In addition, due to the VSX memory instructions being only register+register,
whenever the code wanted to load floating point constants, it would prefer the
traditional floating point registers which had register+offset addressing.
This meant in turn, that any instruction that used a FP constant could
potentially be changed from a VSX form (i.e. xsmuldp) into a traditional FP
form (i.e. fmul) if all of the inputs and outputs were traditional floating
point registers.

Finally, I suspect pushing the address out earlier, means that it might be
keeping the address live a little bit longer, which could change things if we
are spilling GPRs.  One of the things I am working on for GCC 6.0 is going back
to reworking on the address support to do a better job with fusion, and I
believe it will reduce the life time for some of these address temporaries.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-18 Thread Richard Biener
On Wed, Mar 18, 2015 at 12:18 AM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 On Thu, Mar 12, 2015 at 11:37:14AM -0400, David Edelsohn wrote:
 Please check on the performance implications of removing the special
 constant support.  I know that it is late, but I think that ripping it
 out is less risky than trying to fix this, if the performance impact
 is not bad.

 Now, I haven't drilled down to exactly what is causing the performance
 differences, but I've done some Spec 2006 runs comparing subversion id 221194,
 with the two patches.

 The first patch is a rewrite of the code that I originally put into the
 compiler to move floating point constants under -ffast-math during the first
 split pass.  A minor tweak would need to be done to the original patch so that
 it works with -mcmodel=small or -m32 options.

 The second patch completely eliminates keeping the non-0 constant around in
 RTL, and pushes it to memory during the initial RTL generation, since it is
 felt that the RTL optimizations no longer need the constant in RTL to convert
 division by constant into multiplication by the reciprocal.

 The benchmarcks that show a difference are.  Note, I do not count benchmarks
 that differ by less than 2% to be significant.  Percentages more than 100% 
 mean
 the benchmark ran faster:

 Benchmark   Patch-1 Patch-2
 =   === ===
 401.bzip2   102.59% 103.51%
 462.libquantum  100.28%  97.52%
 483.xalancbmk97.72%  97.90%
 435.gromacs 104.48%  99.39%
 436.cactusADM   102.19% 102.90%
 470.lbm 100.39%  97.45%
 Spec INT score   99.86%  99.86%
 Spec FP score   100.50%  99.81%

 Patch #1 had 3 faster benchmarks and 1 slower benchmark.  Patch #2 had 2 
 faster
 benchmarks, and 3 slower benchmarks.

Did you double-check if there are any differences in generated code?
Esp. the SPEC INT benchmarks look odd - they don't contain any
FP code.

Richard.

 I tend to feel patch #2 is cleaner, though it is slightly slower.  However, I
 can go with patch #1 if desired.

 Patch #2 bootstrapped fine, and had no regressions in the test suite.  Did
 you want me to install patch #1, patch #2, or do you want more information?

 [gcc]
 2015-03-17  Michael Meissner  meiss...@linux.vnet.ibm.com

 PR target/65240
 * config/rs6000/predicates.md (easy_fp_constant): Remove special
 -ffast-math handling that kept non-0 constants live in the RTL
 until reload.  Remove logic testing the number of instructions it
 took to create a constant in a GPR that was never used, due to a
 test for soft-float earlier.
 (memory_fp_constant): Delete, no longer used.

 * config/rs6000/rs6000.md (movMODE_hardfloat): Remove
 alternatives for loading non-0 constants into GPRs for hard
 floating point that is no longer needed due to changes in
 easy_fp_constant.  Add support for loading 0.0 into GPRs.
 (movmode_hardfloat32): Likewise.
 (movmode_hardfloat64): Likewise.
 (movmode_64bit_dm): Likewise.
 (movtd_64bit_nodm): Likewise.
 (pre-reload move FP constant define_split): Delete define_split,
 since it is no longer used.
 (extenddftf2_internal): Remove GHF constraints that are not valid
 for extenddftf2.

 [gcc/testsuite]
 2015-03-17  Michael Meissner  meiss...@linux.vnet.ibm.com

 PR target/65240
 * gcc/testsuite/g++.dg/pr65240.h: Add tests for PR 65240.
 * gcc/testsuite/g++.dg/pr65240-1.C: Likewise.
 * gcc/testsuite/g++.dg/pr65240-2.C: Likewise.
 * gcc/testsuite/g++.dg/pr65240-3.C: Likewise.
 * gcc/testsuite/g++.dg/pr65240-4.C: Likewise.

 --
 Michael Meissner, IBM
 IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
 email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-12 Thread Michael Meissner
On Wed, Mar 11, 2015 at 08:52:54PM -0400, David Edelsohn wrote:
 On Wed, Mar 11, 2015 at 6:21 PM, Michael Meissner
 meiss...@linux.vnet.ibm.com wrote:
  On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote:
  I am concerned with the create_TOC_reference use for TARGET_TOC.  Has
  this been tested with big endian -mcmodel=small?
 
  Yes, that was a problem.  Patch coming up soon.  Thanks.
 
 Can you call rs6000_emit_move_directly?

Well, I can, but I would have to have some sort of flag that says after the
split1 pass not to allow FP constants in move (other than 0.0).  It is doable,
but it does touch more areas in the rs6000 back end.

I am starting to think that it is just simpler to rip out all of the special
fast math handling of constants, considering the multiply by reciprocal support
has moved to SSA/tree and away from RTL.  Did you want me to investigate the
performance implications of removing it now (rather than waiting to GCC 6.0),
or just do the more limited patch that I've been pursuing.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-12 Thread David Edelsohn
On Thu, Mar 12, 2015 at 11:29 AM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 On Wed, Mar 11, 2015 at 08:52:54PM -0400, David Edelsohn wrote:
 On Wed, Mar 11, 2015 at 6:21 PM, Michael Meissner
 meiss...@linux.vnet.ibm.com wrote:
  On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote:
  I am concerned with the create_TOC_reference use for TARGET_TOC.  Has
  this been tested with big endian -mcmodel=small?
 
  Yes, that was a problem.  Patch coming up soon.  Thanks.

 Can you call rs6000_emit_move_directly?

 Well, I can, but I would have to have some sort of flag that says after the
 split1 pass not to allow FP constants in move (other than 0.0).  It is doable,
 but it does touch more areas in the rs6000 back end.

 I am starting to think that it is just simpler to rip out all of the special
 fast math handling of constants, considering the multiply by reciprocal 
 support
 has moved to SSA/tree and away from RTL.  Did you want me to investigate the
 performance implications of removing it now (rather than waiting to GCC 6.0),
 or just do the more limited patch that I've been pursuing.

Please check on the performance implications of removing the special
constant support.  I know that it is late, but I think that ripping it
out is less risky than trying to fix this, if the performance impact
is not bad.

Thanks, David


Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-11 Thread David Edelsohn
On Thu, Mar 5, 2015 at 3:06 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 This patch fixes PR 65240, which was a latent bug that was introduced when I
 added the -mupper-regs support to the PowerPC compiler.  In the PowerPC
 compiler, if you use -ffast-math, the compiler allows floating point constants
 in move RTLs until register allocation time, in order to allow the
 optimizations that replace division by a constant with multiplication by the
 reciprocal.  If the register being loaded up is a traditional Altivec 
 register,
 the load will fail, since there is no D-FORM (register+offset) addressing mode
 for the traditional altivec registers.

 I had had a define_split to try and handle this situation, but it didn't work
 all of the time.  I rewrote the insns, so there is an UNSPEC in it to prevent
 over optimization.

 I have done bootstrap builds and make check's on both a big endian power7
 system, and a little endian power8 system with no regressions in the tests.  I
 have also built both power7 and power8 spec 2006 (v5) versions, and Spec
 built.  I have run the Spec floating point tests on a power7 box, and all of
 the tests performed about at the same speed as before the bug fix was done,
 with the exception of gromacs, which is 4% faster with the fix.

 Looking at the static instruction counts, the big change is to fold the addi
 instruction into the load of the constant if it is loading the value to a
 traditional floating point register (previously, it would load up the constant
 with addis/addi and then do an indirect load).  I do see other instructions
 change, including a few more floating point loads, but with the exception of
 gromacs, most of the changes are in the noise level.

 Is this ok to install into the trunk?  At present, it is not needed for GCC
 4.9, since that branch does not have the -mupper-regs support.

 [gcc]
 2015-03-05  Michael Meissner  meiss...@linux.vnet.ibm.com

 PR target/65240
 * config/rs6000/rs6000.md (UNSPEC_FUSION_FP_CONSTANT): New unspec.
 (Vsx_load): New mode attribute to give appropriate VSX load
 instruction.
 (move floating point constant define_split): Use an UNSPEC to make
 sure that the load of the constant is kept as a memory address,
 instead of being converted back into a move of the constant.
 (load_mode_constant): Likewise.

 [gcc/testsuite]
 2015-03-05  Michael Meissner  meiss...@linux.vnet.ibm.com

 PR target/65240
 * gcc.target/powerpc/pr65240.c: New test.

I am concerned with the create_TOC_reference use for TARGET_TOC.  Has
this been tested with big endian -mcmodel=small?

Thanks, David


Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-11 Thread David Edelsohn
On Wed, Mar 11, 2015 at 6:21 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote:
 I am concerned with the create_TOC_reference use for TARGET_TOC.  Has
 this been tested with big endian -mcmodel=small?

 Yes, that was a problem.  Patch coming up soon.  Thanks.

Can you call rs6000_emit_move_directly?

Thanks, David


Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-11 Thread Michael Meissner
On Wed, Mar 11, 2015 at 01:02:06PM -0400, David Edelsohn wrote:
 I am concerned with the create_TOC_reference use for TARGET_TOC.  Has
 this been tested with big endian -mcmodel=small?

Yes, that was a problem.  Patch coming up soon.  Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-09 Thread Michael Meissner
On Fri, Mar 06, 2015 at 01:05:31PM +0100, Richard Biener wrote:
 On Thu, Mar 5, 2015 at 9:06 PM, Michael Meissner
 meiss...@linux.vnet.ibm.com wrote:
  This patch fixes PR 65240, which was a latent bug that was introduced when I
  added the -mupper-regs support to the PowerPC compiler.  In the PowerPC
  compiler, if you use -ffast-math, the compiler allows floating point 
  constants
  in move RTLs until register allocation time, in order to allow the
  optimizations that replace division by a constant with multiplication by the
  reciprocal.
 
 Don't we perform this optimization on the GIMPLE/tree level already?
 
 Richard.

I've suspected that the code is a hold over from the days when the
multiply by the reciprocal was done in RTL.  I didn't want to change the logic
right now, but it something that should be investigated when GCC 6.0 opens up.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] PR target/65240, Fix Power{7,8} insn constraint issue with -O3 -ffast-math

2015-03-06 Thread Richard Biener
On Thu, Mar 5, 2015 at 9:06 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 This patch fixes PR 65240, which was a latent bug that was introduced when I
 added the -mupper-regs support to the PowerPC compiler.  In the PowerPC
 compiler, if you use -ffast-math, the compiler allows floating point constants
 in move RTLs until register allocation time, in order to allow the
 optimizations that replace division by a constant with multiplication by the
 reciprocal.

Don't we perform this optimization on the GIMPLE/tree level already?

Richard.

  If the register being loaded up is a traditional Altivec register,
 the load will fail, since there is no D-FORM (register+offset) addressing mode
 for the traditional altivec registers.

 I had had a define_split to try and handle this situation, but it didn't work
 all of the time.  I rewrote the insns, so there is an UNSPEC in it to prevent
 over optimization.

 I have done bootstrap builds and make check's on both a big endian power7
 system, and a little endian power8 system with no regressions in the tests.  I
 have also built both power7 and power8 spec 2006 (v5) versions, and Spec
 built.  I have run the Spec floating point tests on a power7 box, and all of
 the tests performed about at the same speed as before the bug fix was done,
 with the exception of gromacs, which is 4% faster with the fix.

 Looking at the static instruction counts, the big change is to fold the addi
 instruction into the load of the constant if it is loading the value to a
 traditional floating point register (previously, it would load up the constant
 with addis/addi and then do an indirect load).  I do see other instructions
 change, including a few more floating point loads, but with the exception of
 gromacs, most of the changes are in the noise level.

 Is this ok to install into the trunk?  At present, it is not needed for GCC
 4.9, since that branch does not have the -mupper-regs support.

 [gcc]
 2015-03-05  Michael Meissner  meiss...@linux.vnet.ibm.com

 PR target/65240
 * config/rs6000/rs6000.md (UNSPEC_FUSION_FP_CONSTANT): New unspec.
 (Vsx_load): New mode attribute to give appropriate VSX load
 instruction.
 (move floating point constant define_split): Use an UNSPEC to make
 sure that the load of the constant is kept as a memory address,
 instead of being converted back into a move of the constant.
 (load_mode_constant): Likewise.

 [gcc/testsuite]
 2015-03-05  Michael Meissner  meiss...@linux.vnet.ibm.com

 PR target/65240
 * gcc.target/powerpc/pr65240.c: New test.

 --
 Michael Meissner, IBM
 IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
 email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797