Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Segher Boessenkool
On Wed, Jan 24, 2018 at 05:00:39PM -0500, Michael Meissner wrote:
> Replacing 'int' with 'unsigned long' allows the test to succeed once again.  I
> have checked this on a big endian power8 (both 32-bit and 64-bit) and on a
> little endian power8 (64-bit only), and it passes in all three environments.
> Can I install this on the trunk?

Yes, this is.  Please install on trunk.  Thanks!


Segher


> [gcc/testsuite]
> 2018-01-24  Michael Meissner  
> 
>   PR target/81550
>   * gcc.target/powerpc/loop_align.c: Use unsigned long for the loop
>   index instead of int, which allows IVOPTs to properly optimize the
>   loop.


Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Segher Boessenkool
On Wed, Jan 24, 2018 at 03:19:00PM -0500, Michael Meissner wrote:
> On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> > Although, hrm, in your patch you also change "int i" to "long i"; that
> > alone seems to be enough to fix everything?  Could you check that please?
> 
> Changing i and n to either 'long' or 'long unsigned' makes the test work.
> 
> It is interesting that -mcpu=power7 -mbig does not seem to be able to create
> LFDU and STFDU, but either setting cpu to power8/power9 or setting -mbig to
> -mlittle or -m32 it can generate those instructions.

Yeah, dunno...  I suspect we have some target costs a bit wrong,
influencing the ivopts etc. decisions.

The auto_inc_dec pass says (-mcpu=power7 -mabi=elfv2 -mbig):

   23: r147:DF=[r126:DI]
found mem(23) *(r[126]+0)
trying SIMPLE_PRE_INC
cost failure old=16 new=408

(where -mlittle thinks it is fine; does not say what costs, but the 408
for -mbig looks suspicious of course -- sounds like the call to
rs6000_slow_unaligned_access in rs6000_rtx_costs misfired.  Yet another
reason to use insn_cost instead ;-) )


Segher


Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Michael Meissner
On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> > 
> > As Segher and I were discussing over private IRC, the root cause of this 
> > bug is
> > the compiler no long generates the BDNZ instruction for a count down loop,
> > instead it decrements the index in a GPR and does a branch/comparison on it.
> 
> Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
> keep one with stride -1 for the loop counter, for optimal code; it also
> does three separate increments for the three memory accesses, which is
> a bit excessive here).
> 
> > In doing so, it now unrolls the loop twice, and and the resulting loop is 
> > too
> > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
> > isn't aligned to a 32 byte boundary.
> 
> It's not really unrolling, it is bb-reorder copying an RTL block.  However,
> even if you disable it you still get 9 insns on some configurations, so
> your patch does not hide the problem :-(
> 
> Although, hrm, in your patch you also change "int i" to "long i"; that
> alone seems to be enough to fix everything?  Could you check that please?

Replacing 'int' with 'unsigned long' allows the test to succeed once again.  I
have checked this on a big endian power8 (both 32-bit and 64-bit) and on a
little endian power8 (64-bit only), and it passes in all three environments.
Can I install this on the trunk?

[gcc/testsuite]
2018-01-24  Michael Meissner  

PR target/81550
* gcc.target/powerpc/loop_align.c: Use unsigned long for the loop
index instead of int, which allows IVOPTs to properly optimize the
loop.

Index: gcc/testsuite/gcc.target/powerpc/loop_align.c
===
--- gcc/testsuite/gcc.target/powerpc/loop_align.c   (revision 256992)
+++ gcc/testsuite/gcc.target/powerpc/loop_align.c   (working copy)
@@ -4,8 +4,8 @@
 /* { dg-options "-O2 -mcpu=power7 -falign-functions=16" } */
 /* { dg-final { scan-assembler ".p2align 5,,31" } } */
 
-void f(double *a, double *b, double *c, int n) {
-  int i;
+void f(double *a, double *b, double *c, unsigned long n) {
+  unsigned long i;
   for (i=0; i < n; i++)
 a[i] = b[i] + c[i];
 }

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Michael Meissner
On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> > 
> > As Segher and I were discussing over private IRC, the root cause of this 
> > bug is
> > the compiler no long generates the BDNZ instruction for a count down loop,
> > instead it decrements the index in a GPR and does a branch/comparison on it.
> 
> Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
> keep one with stride -1 for the loop counter, for optimal code; it also
> does three separate increments for the three memory accesses, which is
> a bit excessive here).
> 
> > In doing so, it now unrolls the loop twice, and and the resulting loop is 
> > too
> > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
> > isn't aligned to a 32 byte boundary.
> 
> It's not really unrolling, it is bb-reorder copying an RTL block.  However,
> even if you disable it you still get 9 insns on some configurations, so
> your patch does not hide the problem :-(
> 
> Although, hrm, in your patch you also change "int i" to "long i"; that
> alone seems to be enough to fix everything?  Could you check that please?

Changing i and n to either 'long' or 'long unsigned' makes the test work.

It is interesting that -mcpu=power7 -mbig does not seem to be able to create
LFDU and STFDU, but either setting cpu to power8/power9 or setting -mbig to
-mlittle or -m32 it can generate those instructions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-24 Thread Segher Boessenkool
Hi!

On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> 
> As Segher and I were discussing over private IRC, the root cause of this bug 
> is
> the compiler no long generates the BDNZ instruction for a count down loop,
> instead it decrements the index in a GPR and does a branch/comparison on it.

Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
keep one with stride -1 for the loop counter, for optimal code; it also
does three separate increments for the three memory accesses, which is
a bit excessive here).

> In doing so, it now unrolls the loop twice, and and the resulting loop is too
> big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
> isn't aligned to a 32 byte boundary.

It's not really unrolling, it is bb-reorder copying an RTL block.  However,
even if you disable it you still get 9 insns on some configurations, so
your patch does not hide the problem :-(

Although, hrm, in your patch you also change "int i" to "long i"; that
alone seems to be enough to fix everything?  Could you check that please?


Segher


[PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook

2018-01-23 Thread Michael Meissner
The loop_align.c test has been broken for some time, since I put in patches to
eliminate some debug hooks (-mno-upper-regs-{df,di,sf}) that we deemed to no
longer be needed.

As Segher and I were discussing over private IRC, the root cause of this bug is
the compiler no long generates the BDNZ instruction for a count down loop,
instead it decrements the index in a GPR and does a branch/comparison on it.
In doing so, it now unrolls the loop twice, and and the resulting loop is too
big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
isn't aligned to a 32 byte boundary.

While it is important to ultimately fix the code generation bug to once again
generate the BDNZ instruction, it may be more involved in fixing the bug.  So,
I decided to rewrite the test to be simpler, and the resulting code fits within
the 4-8 instruction window the target hook is looking for.

I have tested this on a little endian power8 system, and a big endian power8
system, doing both bootstrap builds and regression tests.  The only change to
the regression test is that loop_align.c now passes on little endian 64-bit and
big endian 64-bit (big endian 32-bit did not fail with the new changes).  Can I
install this on the trunk?  Back ports to GCC 7/6 are not needed, since the
original code works on those systems.

[gcc/testsuite]
2018-01-24  Michael Meissner  

PR target/81550
* gcc.target/powerpc/loop_align.c: Rewrite test so that the loop
remains small enough that it tests the alignment of loops
specified by the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/testsuite/gcc.target/powerpc/loop_align.c
===
--- gcc/testsuite/gcc.target/powerpc/loop_align.c   (revision 256992)
+++ gcc/testsuite/gcc.target/powerpc/loop_align.c   (working copy)
@@ -1,11 +1,16 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
-/* { dg-options "-O2 -mcpu=power7 -falign-functions=16" } */
+/* { dg-options "-O2 -mcpu=power7 -falign-functions=16 -fno-reorder-blocks" } 
*/
 /* { dg-final { scan-assembler ".p2align 5,,31" } } */
 
-void f(double *a, double *b, double *c, int n) {
-  int i;
+/* This test must be crafted so that the loop is less than 8 insns (and more
+   than 4) in order for the TARGET_ASM_LOOP_ALIGN_MAX_SKIP target hook to fire
+   and align the loop properly.  Originally the loop used double, and various
+   optimizations caused the loop to double in size, and fail the test.  */
+
+void f(long *a, long *b, long *c, long n) {
+  long i;
   for (i=0; i < n; i++)
 a[i] = b[i] + c[i];
 }