We currently generate very poor code for tests like:

#include <arm_neon.h>

void
foo (uint32_t *a, uint32_t *b, uint32_t *c)
{
  uint32x4x3_t x, y;

  x = vld3q_u32 (a);
  y = vld3q_u32 (b);
  x.val[0] = vaddq_u32 (x.val[0], y.val[0]);
  x.val[1] = vaddq_u32 (x.val[1], y.val[1]);
  x.val[2] = vaddq_u32 (x.val[2], y.val[2]);
  vst3q_u32 (a, x);
}

This is because we force the uint32x4x3_t values to the stack and
then load and store the individual vectors.
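
For reference, here is a portable scalar sketch of what the test computes.
It is not GCC or arm_neon.h code: the model_* names and the struct layout
are hypothetical, and it assumes the usual de-interleaving semantics of
vld3q_u32 and vst3q_u32.  The point is only that the test boils down to
twelve independent additions, so nothing in it inherently needs a trip
through the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for uint32x4x3_t: three vectors of four lanes.  */
typedef struct { uint32_t val[3][4]; } model_u32x4x3;

/* Model of vld3q_u32: lane i of val[j] is loaded from p[3 * i + j].  */
static model_u32x4x3
model_vld3q_u32 (const uint32_t *p)
{
  model_u32x4x3 r;
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 3; j++)
      r.val[j][i] = p[3 * i + j];
  return r;
}

/* Model of vst3q_u32: the inverse interleave of the load above.  */
static void
model_vst3q_u32 (uint32_t *p, model_u32x4x3 v)
{
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 3; j++)
      p[3 * i + j] = v.val[j][i];
}

/* Model of foo from the test (the unused third argument is dropped).  */
static void
model_foo (uint32_t *a, const uint32_t *b)
{
  assert (a && b);
  model_u32x4x3 x = model_vld3q_u32 (a);
  model_u32x4x3 y = model_vld3q_u32 (b);
  for (int j = 0; j < 3; j++)
    for (int i = 0; i < 4; i++)
      x.val[j][i] += y.val[j][i];
  model_vst3q_u32 (a, x);
}
```

Under these assumptions, calling model_foo on two 12-element arrays leaves
a[i] + b[i] in each a[i].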

What we actually want is for the uint32x4x3_t values to be stored
in registers, and for the individual vectors to be accessed as
subregs of those registers.  The first part involves some middle-end
mode changes (see recent gcc@ thread), while the second part requires
a change to ARM's CANNOT_CHANGE_MODE_CLASS.

CANNOT_CHANGE_MODE_CLASS is defined as:

/* FPA registers can't do subreg as all values are reformatted to internal
   precision.  VFP registers may only be accessed in the mode they
   were set.  */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)       \
  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)           \
   ? reg_classes_intersect_p (FPA_REGS, (CLASS))        \
     || reg_classes_intersect_p (VFP_REGS, (CLASS))     \
   : 0)

But this VFP restriction appears to apply only to VFPv1; thanks to
Peter Maydell for the archaeology.
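
As a rough illustration of the intended behaviour, the predicate in the
patch below can be modelled outside GCC roughly as follows.  This is a toy
sketch: the MODEL_* bitmask and the function name are hypothetical, and
the target_vfp and fpu_rev parameters stand in for TARGET_VFP and
arm_fpu_desc->rev.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register-class bits; GCC's real classes are an enum and
   the overlap test is reg_classes_intersect_p.  */
enum { MODEL_GENERAL_REGS = 1, MODEL_FPA_REGS = 2, MODEL_VFP_REGS = 4 };

/* Return true if a register in RCLASS must not be accessed in a mode of
   TO_SIZE bytes when it was set in a mode of FROM_SIZE bytes.  */
static bool
model_cannot_change_mode_class (unsigned from_size, unsigned to_size,
                                unsigned rclass, bool target_vfp,
                                int fpu_rev)
{
  assert (from_size > 0 && to_size > 0);

  /* Same-size mode changes are always allowed.  */
  if (from_size == to_size)
    return false;

  /* FPA registers reformat values to internal precision.  */
  if (rclass & MODEL_FPA_REGS)
    return true;

  /* Only VFPv1 keeps the "access in the mode it was set" restriction.  */
  return target_vfp && fpu_rev == 1 && (rclass & MODEL_VFP_REGS) != 0;
}
```

In this model, taking a 16-byte vector out of a 48-byte value held in
MODEL_VFP_REGS is rejected when fpu_rev is 1 and allowed for later
revisions, which is the case the NEON test needs.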

Tested on arm-linux-gnueabi.  OK to install?

This doesn't have any direct benefit without the middle-end mode change,
but it needs to go in first in order for that change not to regress.

Richard


gcc/
        * config/arm/arm.h (CANNOT_CHANGE_MODE_CLASS): Restrict VFP_REGS
        case to VFPv1.

Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h        2011-03-24 13:47:14.000000000 +0000
+++ gcc/config/arm/arm.h        2011-03-24 15:26:19.000000000 +0000
@@ -1167,12 +1167,14 @@ #define IRA_COVER_CLASSES \
 }
 
 /* FPA registers can't do subreg as all values are reformatted to internal
-   precision.  VFP registers may only be accessed in the mode they
+   precision.  VFPv1 registers may only be accessed in the mode they
    were set.  */
-#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)      \
-  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)          \
-   ? reg_classes_intersect_p (FPA_REGS, (CLASS))       \
-     || reg_classes_intersect_p (VFP_REGS, (CLASS))    \
+#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)              \
+  (GET_MODE_SIZE (FROM) != GET_MODE_SIZE (TO)                  \
+   ? (reg_classes_intersect_p (FPA_REGS, (CLASS))              \
+      || (TARGET_VFP                                           \
+         && arm_fpu_desc->rev == 1                             \
+         && reg_classes_intersect_p (VFP_REGS, (CLASS))))      \
    : 0)
 
 /* The class value for index registers, and the one for base regs.  */
