Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
On Wed, Jan 24, 2018 at 05:00:39PM -0500, Michael Meissner wrote: > Replacing 'int' with 'unsigned long' allows the test to succeed once again. I > have checked this on a big endian power8 (both 32-bit and 64-bit) and on a > little endian power8 (64-bit only), and it passes in all three environments. > Can I install this on the trunk? Yes, this is. Please install on trunk. Thanks! Segher > [gcc/testsuite] > 2018-01-24 Michael Meissner> > PR target/81550 > * gcc.target/powerpc/loop_align.c: Use unsigned long for the loop > index instead of int, which allows IVOPTs to properly optimize the > loop.
Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
On Wed, Jan 24, 2018 at 03:19:00PM -0500, Michael Meissner wrote: > On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote: > > Although, hrm, in your patch you also change "int i" to "long i"; that > > alone seems to be enough to fix everything? Could you check that please? > > Changing i and n to either 'long' or 'long unsigned' makes the test work. > > It is interesting that -mcpu=power7 -mbig does not seem to be able to create > LFDU and STFDU, but either setting cpu to power8/power9 or setting -mbig to > -mlittle or -m32 it can generate those instructions. Yeah, dunno... I suspect we have some target costs a bit wrong, influencing the ivopts etc. decisions. The auto_inc_dec pass says (-mcpu=power7 -mabi=elfv2 -mbig): 23: r147:DF=[r126:DI] found mem(23) *(r[126]+0) trying SIMPLE_PRE_INC cost failure old=16 new=408 (where -mlittle thinks it is fine; does not say what costs, but the 408 for -mbig looks suspicious of course -- sounds like the call to rs6000_slow_unaligned_access in rs6000_rtx_costs misfired. Yet another reason to use insn_cost instead ;-) ) Segher
Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote: > Hi! > > On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote: > > > > As Segher and I were discussing over private IRC, the root cause of this > > bug is > > the compiler no long generates the BDNZ instruction for a count down loop, > > instead it decrements the index in a GPR and does a branch/comparison on it. > > Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should > keep one with stride -1 for the loop counter, for optimal code; it also > does three separate increments for the three memory accesses, which is > a bit excessive here). > > > In doing so, it now unrolls the loop twice, and and the resulting loop is > > too > > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP. This means the loop > > isn't aligned to a 32 byte boundary. > > It's not really unrolling, it is bb-reorder copying an RTL block. However, > even if you disable it you still get 9 insns on some configurations, so > your patch does not hide the problem :-( > > Although, hrm, in your patch you also change "int i" to "long i"; that > alone seems to be enough to fix everything? Could you check that please? Replacing 'int' with 'unsigned long' allows the test to succeed once again. I have checked this on a big endian power8 (both 32-bit and 64-bit) and on a little endian power8 (64-bit only), and it passes in all three environments. Can I install this on the trunk? [gcc/testsuite] 2018-01-24 Michael MeissnerPR target/81550 * gcc.target/powerpc/loop_align.c: Use unsigned long for the loop index instead of int, which allows IVOPTs to properly optimize the loop. Index: gcc/testsuite/gcc.target/powerpc/loop_align.c === --- gcc/testsuite/gcc.target/powerpc/loop_align.c (revision 256992) +++ gcc/testsuite/gcc.target/powerpc/loop_align.c (working copy) @@ -4,8 +4,8 @@ /* { dg-options "-O2 -mcpu=power7 -falign-functions=16" } */ /* { dg-final { scan-assembler ".p2align 5,,31" } } */ -void f(double *a, double *b, double *c, int n) { - int i; +void f(double *a, double *b, double *c, unsigned long n) { + unsigned long i; for (i=0; i < n; i++) a[i] = b[i] + c[i]; } -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote: > Hi! > > On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote: > > > > As Segher and I were discussing over private IRC, the root cause of this > > bug is > > the compiler no long generates the BDNZ instruction for a count down loop, > > instead it decrements the index in a GPR and does a branch/comparison on it. > > Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should > keep one with stride -1 for the loop counter, for optimal code; it also > does three separate increments for the three memory accesses, which is > a bit excessive here). > > > In doing so, it now unrolls the loop twice, and and the resulting loop is > > too > > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP. This means the loop > > isn't aligned to a 32 byte boundary. > > It's not really unrolling, it is bb-reorder copying an RTL block. However, > even if you disable it you still get 9 insns on some configurations, so > your patch does not hide the problem :-( > > Although, hrm, in your patch you also change "int i" to "long i"; that > alone seems to be enough to fix everything? Could you check that please? Changing i and n to either 'long' or 'long unsigned' makes the test work. It is interesting that -mcpu=power7 -mbig does not seem to be able to create LFDU and STFDU, but either setting cpu to power8/power9 or setting -mbig to -mlittle or -m32 it can generate those instructions. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
Hi! On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote: > > As Segher and I were discussing over private IRC, the root cause of this bug > is > the compiler no long generates the BDNZ instruction for a count down loop, > instead it decrements the index in a GPR and does a branch/comparison on it. Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should keep one with stride -1 for the loop counter, for optimal code; it also does three separate increments for the three memory accesses, which is a bit excessive here). > In doing so, it now unrolls the loop twice, and and the resulting loop is too > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP. This means the loop > isn't aligned to a 32 byte boundary. It's not really unrolling, it is bb-reorder copying an RTL block. However, even if you disable it you still get 9 insns on some configurations, so your patch does not hide the problem :-( Although, hrm, in your patch you also change "int i" to "long i"; that alone seems to be enough to fix everything? Could you check that please? Segher
[PATCH], PR target/81550, Rewrite PowerPC loop_align test so it still tests the original target hook
The loop_align.c test has been broken for some time, since I put in patches to eliminate some debug hooks (-mno-upper-regs-{df,di,sf}) that we deemed to no longer be needed. As Segher and I were discussing over private IRC, the root cause of this bug is the compiler no long generates the BDNZ instruction for a count down loop, instead it decrements the index in a GPR and does a branch/comparison on it. In doing so, it now unrolls the loop twice, and and the resulting loop is too big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP. This means the loop isn't aligned to a 32 byte boundary. While it is important to ultimately fix the code generation bug to once again generate the BDNZ instruction, it may be more involved in fixing the bug. So, I decided to rewrite the test to be simpler, and the resulting code fits within the 4-8 instruction window the target hook is looking for. I have tested this on a little endian power8 system, and a big endian power8 system, doing both bootstrap builds and regression tests. The only change to the regression test is that loop_align.c now passes on little endian 64-bit and big endian 64-bit (big endian 32-bit did not fail with the new changes). Can I install this on the trunk? Back ports to GCC 7/6 are not needed, since the original code works on those systems. [gcc/testsuite] 2018-01-24 Michael MeissnerPR target/81550 * gcc.target/powerpc/loop_align.c: Rewrite test so that the loop remains small enough that it tests the alignment of loops specified by the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/testsuite/gcc.target/powerpc/loop_align.c === --- gcc/testsuite/gcc.target/powerpc/loop_align.c (revision 256992) +++ gcc/testsuite/gcc.target/powerpc/loop_align.c (working copy) @@ -1,11 +1,16 @@ /* { dg-do compile { target { powerpc*-*-* } } } */ /* { dg-skip-if "" { powerpc*-*-darwin* powerpc-ibm-aix* } } */ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ -/* { dg-options "-O2 -mcpu=power7 -falign-functions=16" } */ +/* { dg-options "-O2 -mcpu=power7 -falign-functions=16 -fno-reorder-blocks" } */ /* { dg-final { scan-assembler ".p2align 5,,31" } } */ -void f(double *a, double *b, double *c, int n) { - int i; +/* This test must be crafted so that the loop is less than 8 insns (and more + than 4) in order for the TARGET_ASM_LOOP_ALIGN_MAX_SKIP target hook to fire + and align the loop properly. Originally the loop used double, and various + optimizations caused the loop to double in size, and fail the test. */ + +void f(long *a, long *b, long *c, long n) { + long i; for (i=0; i < n; i++) a[i] = b[i] + c[i]; }