On Thu, 2011-03-24 at 15:40 +0000, Richard Sandiford wrote: > We currently generate very poor code for tests like: > > #include <arm_neon.h> > > void > foo (uint32_t *a, uint32_t *b, uint32_t *c) > { > uint32x4x3_t x, y; > > x = vld3q_u32 (a); > y = vld3q_u32 (b); > x.val[0] = vaddq_u32 (x.val[0], y.val[0]); > x.val[1] = vaddq_u32 (x.val[1], y.val[1]); > x.val[2] = vaddq_u32 (x.val[2], y.val[2]); > vst3q_u32 (a, x); > } > > This is because we force the uint32x4x3_t values to the stack and > then load and store the individual vectors. > > What we actually want is for the uint32x4x3_t values to be stored > in registers, and for the individual vectors to be accessed as > subregs of those registers. The first part involves some middle-end > mode changes (see recent gcc@ thread), while the second part requires > a change to ARM's CANNOT_CHANGE_MODE_CLASS. > > CANNOT_CHANGE_MODE_CLASS is defined as: > > /* FPA registers can't do subreg as all values are reformatted to internal > precision. VFP registers may only be accessed in the mode they > were set. */ > #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \ > (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO) \ > ? reg_classes_intersect_p (FPA_REGS, (CLASS)) \ > || reg_classes_intersect_p (VFP_REGS, (CLASS)) \ > : 0) > > But this VFP restriction appears to apply only to VFPv1; thanks to > Peter Maydell for the archaeology. > > Tested on arm-linux-gnueabi. OK to install? > > This doesn't have any direct benefit without the middle-end mode change, > but it needs to go in first in order for that change not to regress. > > Richard > > > gcc/ > * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Restrict FPA_REGS > case to VFPv1. >
GCC doesn't support VFPv1 (see the all_fpus table), and I don't think many chips based on that ever escaped into the wild world, so I'm not worried about trying to add that now. So it's probably safe to just kill that check for VFP entirely. R.