On Wed, May 28, 2014 at 9:30 AM, Richard Earnshaw <rearn...@arm.com> wrote: > Ah, light dawns (maybe).
> > I guess the problems stem from the attempts to combine Neon with ARMv5. > Neon shouldn't be used with anything prior to ARMv7, since that's the > earliest version of the architecture that can support it. > > I guess that what is happening is that we see we have Neon, so start to > generate a Neon-based copy sequence, but then notice that we don't have > misaligned access (something that must exist if we have Neon) and > generate VLDR instructions in a mistaken attempt to work around the > first inconsistency. > > Maybe we should tie -mfpu=neon to having at least ARMv7 (though ARMv6 > also has misaligned access support). Sigh. I can think of 2-3 options. 1. Allow Neon to imply unaligned access if the core architecture is below v7-a. 2. Disallow Neon for any earlier options - options breakage and regression from earlier releases ? We just have to document this some place - This is what you suggest. 3. Allow Neon to imply unaligned access only in the Neon unit but not on the integer side but I suspect that breaks the original testcase. And : 1 is confusing because there is no real implementation unless you want to drop down the architecture level on real v7-a implementation and you probably want lower architecture levels to imply no unaligned access on the core. (i.e. no unaligned ldr(h) instructions). 2 - clean break but probably need an override switch of I-know-what-I-am-doing in case people try such combinations for tuning purposes. 3 - Again confusing and a source of more testsuite failures and other cases. Sounds like #2 maybe the least worst option. regards Ramana > > R. > > On 28/05/14 00:03, Maciej W. Rozycki wrote: >> On Tue, 27 May 2014, Kyrill Tkachov wrote: >> >>>>> This change however has regressed gcc.dg/vect/vect-72.c on the >>>>> arm-linux-gnueabi target, -march=armv5te, in particular in 4.8. >>>> And what are all the configure flags you are using in case some one >>>> has to reproduce this issue ? >>> >>> Second that. My recently built 4.8 (gcc version 4.8.2 20130531) for vect-72 >>> with options: >>> -O2 -ftree-vectorize -march=armv5te -mfpu=neon -mfloat-abi=hard >>> -fno-vect-cost-model -fno-common >>> >>> gives code the same as your original one: >>> .L14: >>> sub r1, r3, #16 >>> add r3, r3, #16 >>> vld1.8 {q8}, [r1] >>> cmp r3, r0 >>> vst1.64 {d16-d17}, [r2:64]! >>> bne .L14 >>> ldr r3, .L22+12 >>> add ip, r3, #128 >>> add r2, r3, #129 >> >> I have this: >> >> .L14: >> sub r1, r3, #16 @ 130 *arm_addsi3/7 [length = 4] >> add r3, r3, #16 @ 135 *arm_addsi3/2 [length = 4] >> vld1.8 {q8}, [r1] @ 131 *movmisalignv16qi_neon_load >> [length = 4] >> cmp r3, r0 @ 136 *arm_cmpsi_insn/3 [length = 4] >> vst1.64 {d16-d17}, [r2:64]! @ 133 *neon_movv16qi/2 >> [length = 8] >> bne .L14 @ 137 arm_cond_branch [length = 4] >> >> without and this: >> >> .L14: >> vldr d16, [r3, #-16] @ 130 *neon_movv16qi/4 [length = 8] >> vldr d17, [r3, #-8] >> add r3, r3, #16 @ 133 *arm_addsi3/2 [length = 4] >> cmp r3, r1 @ 134 *arm_cmpsi_insn/3 [length = 4] >> vst1.64 {d16-d17}, [r2:64]! @ 131 *neon_movv16qi/2 >> [length = 8] >> bne .L14 @ 135 arm_cond_branch [length = 4] >> >> with your change applied respectively so clearly it's making its intended >> effect of disabling the use of movmisalignv16qi_neon_load. >> >> However VLD1.8 can also be produced from other RTL patterns, or maybe you >> don't have `unaligned_access' set to zero for some reason, which you >> should (for ARMv5TE) according to this gcc/config/arm/arm.c piece: >> >> /* Enable -munaligned-access by default for >> - all ARMv6 architecture-based processors >> - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors. >> - ARMv8 architecture-base processors. >> >> Disable -munaligned-access by default for >> - all pre-ARMv6 architecture-based processors >> - ARMv6-M architecture-based processors. */ >> >> if (unaligned_access == 2) >> { >> if (arm_arch6 && (arm_arch_notm || arm_arch7)) >> unaligned_access = 1; >> else >> unaligned_access = 0; >> } >> else if (unaligned_access == 1 >> && !(arm_arch6 && (arm_arch_notm || arm_arch7))) >> { >> warning (0, "target CPU does not support unaligned accesses"); >> unaligned_access = 0; >> } >> >> -- can you build vect-72.c with -dp and see which pattern VLD1.8 is >> produced from in your case? >> >> As to the GCC configure options I have as I say nothing there beyond >> making -march=armv5te the default (surely you're not interested in >> --prefix, etc.). For the record the test framework adds these options on >> top of that for this particular case: >> >> -fno-diagnostics-show-caret -mfpu=neon -mfloat-abi=softfp -ffast-math >> -ftree-vectorize -fno-vect-cost-model -fno-common -O2 >> -fdump-tree-vect-details >> >> but these should be no different in your case (except perhaps from >> `-mfloat-abi=', but that shouldn't matter as this is no FP code). >> >> I have double-checked with current (r210984) 4.8 now and the issue is >> still there. >> >> Maciej >> > >