We currently generate very poor code for tests like: #include <arm_neon.h>
void foo (uint32_t *a, uint32_t *b, uint32_t *c) { uint32x4x3_t x, y; x = vld3q_u32 (a); y = vld3q_u32 (b); x.val[0] = vaddq_u32 (x.val[0], y.val[0]); x.val[1] = vaddq_u32 (x.val[1], y.val[1]); x.val[2] = vaddq_u32 (x.val[2], y.val[2]); vst3q_u32 (a, x); } This is because we force the uint32x4x3_t values to the stack and then load and store the individual vectors. What we actually want is for the uint32x4x3_t values to be stored in registers, and for the individual vectors to be accessed as subregs of those registers. The first part involves some middle-end mode changes (see recent gcc@ thread), while the second part requires a change to ARM's CANNOT_CHANGE_MODE_CLASS. CANNOT_CHANGE_MODE_CLASS is defined as: /* FPA registers can't do subreg as all values are reformatted to internal precision. VFP registers may only be accessed in the mode they were set. */ #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \ (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \ ? reg_classes_intersect_p (FPA_REGS, (CLASS)) \ || reg_classes_intersect_p (VFP_REGS, (CLASS)) \ : 0) But this VFP restriction appears to apply only to VFPv1; thanks to Peter Maydell for the archaeology. Tested on arm-linux-gnueabi. OK to install? This doesn't have any direct benefit without the middle-end mode change, but it needs to go in first in order for that change not to regress. Richard gcc/ * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Restrict FPA_REGS case to VFPv1. Index: gcc/config/arm/arm.h =================================================================== --- gcc/config/arm/arm.h 2011-03-24 13:47:14.000000000 +0000 +++ gcc/config/arm/arm.h 2011-03-24 15:26:19.000000000 +0000 @@ -1167,12 +1167,14 @@ #define IRA_COVER_CLASSES \ } /* FPA registers can't do subreg as all values are reformatted to internal - precision. VFP registers may only be accessed in the mode they + precision. VFPv1 registers may only be accessed in the mode they were set. */ -#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \ - (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \ - ? reg_classes_intersect_p (FPA_REGS, (CLASS)) \ - || reg_classes_intersect_p (VFP_REGS, (CLASS)) \ +#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \ + (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \ + ? (reg_classes_intersect_p (FPA_REGS, (CLASS)) \ + || (TARGET_VFP \ + && arm_fpu_desc->rev == 1 \ + && reg_classes_intersect_p (VFP_REGS, (CLASS)))) \ : 0) /* The class value for index registers, and the one for base regs. */