On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> > 
> > As Segher and I were discussing over private IRC, the root cause of this 
> > bug is
> > the compiler no long generates the BDNZ instruction for a count down loop,
> > instead it decrements the index in a GPR and does a branch/comparison on it.
> 
> Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
> keep one with stride -1 for the loop counter, for optimal code; it also
> does three separate increments for the three memory accesses, which is
> a bit excessive here).
> 
> > In doing so, it now unrolls the loop twice, and and the resulting loop is 
> > too
> > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the loop
> > isn't aligned to a 32 byte boundary.
> 
> It's not really unrolling, it is bb-reorder copying an RTL block.  However,
> even if you disable it you still get 9 insns on some configurations, so
> your patch does not hide the problem :-(
> 
> Although, hrm, in your patch you also change "int i" to "long i"; that
> alone seems to be enough to fix everything?  Could you check that please?

Replacing 'int' with 'unsigned long' allows the test to succeed once again.  I
have checked this on a big endian power8 (both 32-bit and 64-bit) and on a
little endian power8 (64-bit only), and it passes in all three environments.
Can I install this on the trunk?

[gcc/testsuite]
2018-01-24  Michael Meissner  <meiss...@linux.vnet.ibm.com>

        PR target/81550
        * gcc.target/powerpc/loop_align.c: Use unsigned long for the loop
        index instead of int, which allows IVOPTs to properly optimize the
        loop.

Index: gcc/testsuite/gcc.target/powerpc/loop_align.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/loop_align.c       (revision 256992)
+++ gcc/testsuite/gcc.target/powerpc/loop_align.c       (working copy)
@@ -4,8 +4,8 @@
 /* { dg-options "-O2 -mcpu=power7 -falign-functions=16" } */
 /* { dg-final { scan-assembler ".p2align 5,,31" } } */
 
-void f(double *a, double *b, double *c, int n) {
-  int i;
+void f(double *a, double *b, double *c, unsigned long n) {
+  unsigned long i;
   for (i=0; i < n; i++)
     a[i] = b[i] + c[i];
 }

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Reply via email to