On Wed, Jan 24, 2018 at 12:35:38PM -0600, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Jan 24, 2018 at 12:27:55AM -0500, Michael Meissner wrote:
> > As Segher and I were discussing over private IRC, the root cause of this
> > bug is the compiler no longer generates the BDNZ instruction for a count
> > down loop, instead it decrements the index in a GPR and does a
> > branch/comparison on it.
>
> Yes, ivopts makes a bad decision (it uses stride 8 for all IVs, it should
> keep one with stride -1 for the loop counter, for optimal code; it also
> does three separate increments for the three memory accesses, which is
> a bit excessive here).
>
> > In doing so, it now unrolls the loop twice, and the resulting loop is too
> > big for the target hook TARGET_ASM_LOOP_ALIGN_MAX_SKIP.  This means the
> > loop isn't aligned to a 32 byte boundary.
>
> It's not really unrolling, it is bb-reorder copying an RTL block.  However,
> even if you disable it you still get 9 insns on some configurations, so
> your patch does not hide the problem :-(
>
> Although, hrm, in your patch you also change "int i" to "long i"; that
> alone seems to be enough to fix everything?  Could you check that please?

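For readers without the test case at hand, here is a minimal sketch of the kind of loop being discussed. It is not the actual test: the function names, array names, and loop body are hypothetical, chosen only so that there are three double-precision memory accesses per iteration (hence the stride of 8). The two functions differ only in the type of the index and bound.

```c
/* Hypothetical stand-in for the loop in the test case (not the real test).
   Three double accesses per iteration: two loads and one store. */

/* Variant with 'int' index and bound, as in the original test. */
void add_int_index(double *a, const double *b, const double *c, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Variant with 'long' index and bound, as in the patched test. */
void add_long_index(double *a, const double *b, const double *c, long n)
{
    long i;
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Both variants compute the same result; any difference would only show up in the generated assembly (for instance, whether bdnz, lfdu, and stfdu appear when compiling with -S and the -mcpu/-mbig/-mlittle/-m32 options mentioned in this thread).
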
Changing i and n to either 'long' or 'long unsigned' makes the test work.

It is interesting that -mcpu=power7 -mbig does not seem to be able to create
LFDU and STFDU, but with the cpu set to power8/power9, or with -mbig changed
to -mlittle or -m32, it can generate those instructions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797