On Tue, 13 Aug 2013, Kyrylo Tkachov wrote:

> > On 08/09/13 11:01, Julian Brown wrote:
> > > On Thu, 8 Aug 2013 15:44:17 +0100
> > > Kyrylo Tkachov <kyrylo.tkac...@arm.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> The recently added gcc.target/arm/pr58041.c test exposed a bug in the
> > >> backend. When compiling for NEON and with -mno-unaligned-access we
> > >> end up generating the vld1.64 and vst1.64 instructions instead of
> > >> doing the accesses one byte at a time like -mno-unaligned-access
> > >> expects. This patch fixes that by enabling the NEON expander and
> > >> insns that produce these instructions only when unaligned accesses
> > >> are allowed.
> > >>
> > >> Bootstrapped on arm-linux-gnueabihf. Tested arm-none-eabi on qemu.
> > >>
> > >> Ok for trunk and 4.8?
> > >
> > > I'm not sure if this is right, FWIW -- do the instructions in question
> > > trap if the CPU is set to disallow unaligned accesses? I thought that
> > > control bit only affected ARM core loads & stores, not NEON ones.
> >
> > Thinking again - the ARM-ARM says - the alignment check is for element
> > size, so an alternative might be to use vld1.8 instead to allow for this
> > at which point we might as well do something else with the test. I note
> > that these patterns are not allowed for BYTES_BIG_ENDIAN so that might
> > be a better alternative than completely disabling it.
>
> Looking at the section on unaligned accesses, it seems that the
> ldrb/strb-class instructions are the only ones that are unaffected by the
> SCTLR.A bit and do not produce alignment faults in any case.
> The NEON load/store instructions, including vld1.8 can still cause an
> alignment fault when SCTLR.A is set. So it seems we can only use the byte-wise
> core memory instructions for unaligned data.
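Kyrylo's conclusion above is that, with -mno-unaligned-access, only byte-wise core loads and stores (the ldrb/strb class) are safe for data of unknown alignment. The C sketch below illustrates that access pattern at the source level, together with an alignment test on an odd address. The helper names (is_aligned, copy_bytewise, check_odd_offset_copy) are hypothetical, and relying on volatile to stop GCC from merging the byte accesses into wider or NEON ones is an assumption about typical GCC behaviour, not something the C standard guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of align
   (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: copy n bytes one element at a time.
   Assumption: accessing through volatile-qualified pointers makes GCC
   emit one byte-sized load and store per element (typically ldrb/strb
   on ARM) instead of merging them into wider or NEON accesses. */
void copy_bytewise(unsigned char *dst, const unsigned char *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Self-check: copy 16 bytes starting at an odd offset into an
   8-byte aligned buffer, i.e. from a source address that is only
   byte aligned. */
void check_odd_offset_copy(void)
{
    static _Alignas(8) unsigned char buf[32];
    unsigned char out[16];

    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)i;

    assert(is_aligned(buf, 8));
    assert(!is_aligned(buf + 1, 8)); /* odd address: not 8-byte aligned */
    assert(is_aligned(buf + 1, 1));  /* every address is byte aligned */

    copy_bytewise(out, buf + 1, sizeof out);
    assert(out[0] == 1 && out[15] == 16);
}
```

Calling check_odd_offset_copy() from a test driver exercises both helpers. A plain memcpy would express the same copy, but leaves the choice of access width entirely to the compiler, which is exactly what the thread is about.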
 This change, however, has regressed gcc.dg/vect/vect-72.c on the
arm-linux-gnueabi target, -march=armv5te, in particular in 4.8.

 Beforehand the code fragment in question produced was:

.L14:
	sub	r1, r3, #16
	add	r3, r3, #16
	vld1.8	{q8}, [r1]
	cmp	r3, r0
	vst1.64	{d16-d17}, [r2:64]!
	bne	.L14

 Afterwards it is:

.L14:
	vldr	d16, [r3, #-16]
	vldr	d17, [r3, #-8]
	add	r3, r3, #16
	cmp	r3, r1
	vst1.64	{d16-d17}, [r2:64]!
	bne	.L14

and the second VLDR instruction traps with SIGILL (the value in R3 is
0x10b29, odd as you'd expect, pointing into `ib'). I don't know why this
happens, and especially why only the second of the two VLDRs traps
(regrettably I've been unable to track down an instruction reference
detailed enough to specify what exceptions VLDR can produce and under what
conditions).

 Interestingly enough, the trap does not happen when the program is
single-stepped under GDB (via gdbserver); however, the program then aborts
once this copy loop has completed, because `ia' contains rubbish and fails
the test.

 Is there a fix that needs backporting to 4.8, or is this an issue that was
unknown so far?

 Hardware and Linux used:

$ cat /proc/cpuinfo
Processor : ARMv7 Processor rev 2 (v7l)
processor : 0
BogoMIPS : 2013.49

processor : 1
BogoMIPS : 1963.08

Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x1
CPU part : 0xc09
CPU revision : 2

Hardware : OMAP4430 Panda Board
Revision : 0020
Serial : 0000000000000000
$ uname -a
Linux panda2 2.6.35-903-omap4 #14-Ubuntu SMP PREEMPT Wed Oct 6 17:23:24 UTC 2010 armv7l GNU/Linux
$

 Maciej