Re: Add an array_mode_supported_p target hook

2011-05-06 Thread Richard Earnshaw

On Fri, 2011-05-06 at 11:35 +0100, Richard Sandiford wrote:
 Richard Earnshaw rearn...@arm.com writes:
  +/* Implements target hook array_mode_supported_p.  */
  +
  +static bool
  +arm_array_mode_supported_p (enum machine_mode mode,
  +  unsigned HOST_WIDE_INT nelems)
  +{
  +  if (TARGET_NEON
  +   (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
  +   (nelems = 2  nelems = 4))
  +return true;
  +
  +  return false;
  +}
 
  I'm not sure I understand why this is limited to 4 or fewer elements. A
  Q reg of chars would surely be 16 elements.
 
 The mode here is the mode of the array element, which for the cases
 we're interested in would be something like V4HI (D) or V4SI (Q).
 nelems says how many of those (in our case, vector) elements there
 are in the array.
 
 The element range we want is 1-4 because that matches the number
 of vectors that can be loaded by the vld1-vld4 instructions.
 We don't include 1 because arrays of one element are already
 treated as having the same mode as their element.
 
 Richard

I understand now...

Ok.

R.
 




Re: Add an array_mode_supported_p target hook

2011-05-06 Thread Richard Sandiford
Richard Earnshaw rearn...@arm.com writes:
 +/* Implements target hook array_mode_supported_p.  */
 +
 +static bool
 +arm_array_mode_supported_p (enum machine_mode mode,
 +unsigned HOST_WIDE_INT nelems)
 +{
 +  if (TARGET_NEON
 +   (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
 +   (nelems = 2  nelems = 4))
 +return true;
 +
 +  return false;
 +}

 I'm not sure I understand why this is limited to 4 or fewer elements. A
 Q reg of chars would surely be 16 elements.

The mode here is the mode of the array element, which for the cases
we're interested in would be something like V4HI (D) or V4SI (Q).
nelems says how many of those (in our case, vector) elements there
are in the array.

The element range we want is 1-4 because that matches the number
of vectors that can be loaded by the vld1-vld4 instructions.
We don't include 1 because arrays of one element are already
treated as having the same mode as their element.

Richard


Re: Add an array_mode_supported_p target hook

2011-04-21 Thread Richard Sandiford
To get back to this...

Richard Sandiford richard.sandif...@linaro.org writes:
 Richard Guenther richard.guent...@gmail.com writes:
 On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
 richard.sandif...@linaro.org wrote:
 This patch adds an array_mode_supported_p hook, which says whether
 MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
 It follows on from the discussion here:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html

 The intended use of the hook is to allow small arrays of vectors
 to have a non-BLK mode, and hence to be stored in rtl registers.
 These arrays are used both in the ARM arm_neon.h API and in the
 optabs proposed in:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html

 The tail end of the thread was about the definition of TYPE_MODE:

 #define TYPE_MODE(NODE) \
  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
   ? vector_type_mode (NODE) : (NODE)-type.mode)

 with this outcome:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html

 To summarise my take on it:

 - The current definition of TYPE_MODE isn't sufficient even for vector
  modes and vector_mode_supported_p, because non-vector types can have
  vector modes.

 - We should no longer treat types as having one mode everywhere.
  We should instead replace TYPE_MODE with a function that takes
  a context.  Tests of things like vector_mode_supported_p would
  move from layout_type to this new function.

 I think this patch fits within that scheme.  array_mode_supported_p
 would be treated in the same way as vector_mode_supported_p.

 I realise the ideal would be to get rid of TYPE_MODE first.
 But that's going to be a longer-term thing.  Now that there's
 at least a plan, I'd like to press ahead with the array stuff
 on the basis that

 (a) although the new hook won't work with the target attribute,
    our current mode handling doesn't work in just the same way.

 (b) the new hook doesn't interfere with the plan.

 (c) getting good code from the intrinsics (and support for these
    instructions in the vectoriser) is going to be much more important
    to most ARM users than the ability to turn Neon on and off for
    individual functions in a TU.

 To give an example of the difference, the Neon code posted here:

    http://hilbert-space.de/?p=22

 produces this inner loop before the patch (but with
 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):

 .L3:
        vld3.8  {d16-d18}, [r1]!
        vstmia  ip, {d16-d18}
        fldd    d19, [sp, #24]
        adr     r5, .L6
        ldmia   r5, {r4-r5}
        fldd    d16, [sp, #32]
        vmov    d18, r4, r5  @ v8qi
        vmull.u8        q9, d19, d18
        adr     r5, .L6+8
        ldmia   r5, {r4-r5}
        vmov    d17, r4, r5  @ v8qi
        vstmia  sp, {d18-d19}
        vmlal.u8        q9, d16, d17
        fldd    d16, [sp, #40]
        adr     r5, .L6+16
        ldmia   r5, {r4-r5}
        vmov    d17, r4, r5  @ v8qi
        vmlal.u8        q9, d16, d17
        add     r3, r3, #1
        vshrn.i16       d16, q9, #8
        cmp     r3, r2
        vst1.8  {d16}, [r0]!
        bne     .L3

 With both patches applied, the inner loop is:

 .L3:
        vld3.8  {d18-d20}, [r1]!
        vmull.u8        q8, d18, d21
        vmlal.u8        q8, d19, d22
        vmlal.u8        q8, d20, d23
        add     r3, r3, #1
        vshrn.i16       d16, q8, #8
        cmp     r3, r2
        vst1.8  {d16}, [r0]!
        bne     .L3

 Tested on arm-linux-gnueabi.  OK to install?

 It looks reasonable given the past discussion, but - can you move forward
 with the Neon stuff a bit to see if it really fits?  Or is this all
 that is needed
 for the load/store lane support as well (apart from vectorizer changes of
 course).

 Yeah, I have a prototype that hacks up some C support for generating the
 (otherwise internal-only) load/store built-in functions that the vectoriser
 is suppsoed to generate.  This patch is all that seems to be needed for the
 types and optabs generation to work in the natural way.

 I'm happy to leave it until the vectoriser stuff is in a more
 submittable state though.

The vectorisation stuff has now been approved and uses this hook to
detect whether interleaved loads  stores are supported.  Also...

 Especially given:

 Can you check the code generated by for example

 float foo(char *p)
 {
   float a[2];
   int i;
   ((char *)a)[0] = p[0];
   ((char *)a)[1] = p[1];
   ((char *)a)[2] = p[2];
   ((char *)a)[3] = p[3];
   ((char *)a)[4] = p[4];
   ((char *)a)[5] = p[5];
   ((char *)a)[6] = p[6];
   ((char *)a)[7] = p[7];
   return a[0] + a[1];
 }

 for an array a that would get such a larger mode?  Thus, check what
 happens with partial defs of different types (just to avoid ICEs like the
 ones Jakub was fixing yesterday).

 OK, I tried:

 #include arm_neon.h

 uint32x2_t foo(char *p)
 {
   uint32x2_t a[2];
   int i;
   ((char *)a)[0] = p[0];
   ((char *)a)[1] = p[1];
   ((char *)a)[2] = p[2];
   ((char *)a)[3] = p[3];
   ((char 

Re: Add an array_mode_supported_p target hook

2011-04-21 Thread Richard Guenther
On Thu, Apr 21, 2011 at 11:50 AM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 To get back to this...

 Richard Sandiford richard.sandif...@linaro.org writes:
 Richard Guenther richard.guent...@gmail.com writes:
 On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
 richard.sandif...@linaro.org wrote:
 This patch adds an array_mode_supported_p hook, which says whether
 MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
 It follows on from the discussion here:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html

 The intended use of the hook is to allow small arrays of vectors
 to have a non-BLK mode, and hence to be stored in rtl registers.
 These arrays are used both in the ARM arm_neon.h API and in the
 optabs proposed in:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html

 The tail end of the thread was about the definition of TYPE_MODE:

 #define TYPE_MODE(NODE) \
  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
   ? vector_type_mode (NODE) : (NODE)-type.mode)

 with this outcome:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html

 To summarise my take on it:

 - The current definition of TYPE_MODE isn't sufficient even for vector
  modes and vector_mode_supported_p, because non-vector types can have
  vector modes.

 - We should no longer treat types as having one mode everywhere.
  We should instead replace TYPE_MODE with a function that takes
  a context.  Tests of things like vector_mode_supported_p would
  move from layout_type to this new function.

 I think this patch fits within that scheme.  array_mode_supported_p
 would be treated in the same way as vector_mode_supported_p.

 I realise the ideal would be to get rid of TYPE_MODE first.
 But that's going to be a longer-term thing.  Now that there's
 at least a plan, I'd like to press ahead with the array stuff
 on the basis that

 (a) although the new hook won't work with the target attribute,
    our current mode handling doesn't work in just the same way.

 (b) the new hook doesn't interfere with the plan.

 (c) getting good code from the intrinsics (and support for these
    instructions in the vectoriser) is going to be much more important
    to most ARM users than the ability to turn Neon on and off for
    individual functions in a TU.

 To give an example of the difference, the Neon code posted here:

    http://hilbert-space.de/?p=22

 produces this inner loop before the patch (but with
 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):

 .L3:
        vld3.8  {d16-d18}, [r1]!
        vstmia  ip, {d16-d18}
        fldd    d19, [sp, #24]
        adr     r5, .L6
        ldmia   r5, {r4-r5}
        fldd    d16, [sp, #32]
        vmov    d18, r4, r5  @ v8qi
        vmull.u8        q9, d19, d18
        adr     r5, .L6+8
        ldmia   r5, {r4-r5}
        vmov    d17, r4, r5  @ v8qi
        vstmia  sp, {d18-d19}
        vmlal.u8        q9, d16, d17
        fldd    d16, [sp, #40]
        adr     r5, .L6+16
        ldmia   r5, {r4-r5}
        vmov    d17, r4, r5  @ v8qi
        vmlal.u8        q9, d16, d17
        add     r3, r3, #1
        vshrn.i16       d16, q9, #8
        cmp     r3, r2
        vst1.8  {d16}, [r0]!
        bne     .L3

 With both patches applied, the inner loop is:

 .L3:
        vld3.8  {d18-d20}, [r1]!
        vmull.u8        q8, d18, d21
        vmlal.u8        q8, d19, d22
        vmlal.u8        q8, d20, d23
        add     r3, r3, #1
        vshrn.i16       d16, q8, #8
        cmp     r3, r2
        vst1.8  {d16}, [r0]!
        bne     .L3

 Tested on arm-linux-gnueabi.  OK to install?

 It looks reasonable given the past discussion, but - can you move forward
 with the Neon stuff a bit to see if it really fits?  Or is this all
 that is needed
 for the load/store lane support as well (apart from vectorizer changes of
 course).

 Yeah, I have a prototype that hacks up some C support for generating the
 (otherwise internal-only) load/store built-in functions that the vectoriser
 is suppsoed to generate.  This patch is all that seems to be needed for the
 types and optabs generation to work in the natural way.

 I'm happy to leave it until the vectoriser stuff is in a more
 submittable state though.

 The vectorisation stuff has now been approved and uses this hook to
 detect whether interleaved loads  stores are supported.  Also...

 Especially given:

 Can you check the code generated by for example

 float foo(char *p)
 {
   float a[2];
   int i;
   ((char *)a)[0] = p[0];
   ((char *)a)[1] = p[1];
   ((char *)a)[2] = p[2];
   ((char *)a)[3] = p[3];
   ((char *)a)[4] = p[4];
   ((char *)a)[5] = p[5];
   ((char *)a)[6] = p[6];
   ((char *)a)[7] = p[7];
   return a[0] + a[1];
 }

 for an array a that would get such a larger mode?  Thus, check what
 happens with partial defs of different types (just to avoid ICEs like the
 ones Jakub was fixing yesterday).

 OK, I tried:

 #include arm_neon.h

 uint32x2_t foo(char *p)
 {
   uint32x2_t a[2];
   int i;
   ((char *)a)[0] = 

Re: Add an array_mode_supported_p target hook

2011-03-31 Thread Richard Guenther
On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:
 This patch adds an array_mode_supported_p hook, which says whether
 MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
 It follows on from the discussion here:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html

 The intended use of the hook is to allow small arrays of vectors
 to have a non-BLK mode, and hence to be stored in rtl registers.
 These arrays are used both in the ARM arm_neon.h API and in the
 optabs proposed in:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html

 The tail end of the thread was about the definition of TYPE_MODE:

 #define TYPE_MODE(NODE) \
  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
   ? vector_type_mode (NODE) : (NODE)-type.mode)

 with this outcome:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html

 To summarise my take on it:

 - The current definition of TYPE_MODE isn't sufficient even for vector
  modes and vector_mode_supported_p, because non-vector types can have
  vector modes.

 - We should no longer treat types as having one mode everywhere.
  We should instead replace TYPE_MODE with a function that takes
  a context.  Tests of things like vector_mode_supported_p would
  move from layout_type to this new function.

 I think this patch fits within that scheme.  array_mode_supported_p
 would be treated in the same way as vector_mode_supported_p.

 I realise the ideal would be to get rid of TYPE_MODE first.
 But that's going to be a longer-term thing.  Now that there's
 at least a plan, I'd like to press ahead with the array stuff
 on the basis that

 (a) although the new hook won't work with the target attribute,
    our current mode handling doesn't work in just the same way.

 (b) the new hook doesn't interfere with the plan.

 (c) getting good code from the intrinsics (and support for these
    instructions in the vectoriser) is going to be much more important
    to most ARM users than the ability to turn Neon on and off for
    individual functions in a TU.

 To give an example of the difference, the Neon code posted here:

    http://hilbert-space.de/?p=22

 produces this inner loop before the patch (but with
 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):

 .L3:
        vld3.8  {d16-d18}, [r1]!
        vstmia  ip, {d16-d18}
        fldd    d19, [sp, #24]
        adr     r5, .L6
        ldmia   r5, {r4-r5}
        fldd    d16, [sp, #32]
        vmov    d18, r4, r5  @ v8qi
        vmull.u8        q9, d19, d18
        adr     r5, .L6+8
        ldmia   r5, {r4-r5}
        vmov    d17, r4, r5  @ v8qi
        vstmia  sp, {d18-d19}
        vmlal.u8        q9, d16, d17
        fldd    d16, [sp, #40]
        adr     r5, .L6+16
        ldmia   r5, {r4-r5}
        vmov    d17, r4, r5  @ v8qi
        vmlal.u8        q9, d16, d17
        add     r3, r3, #1
        vshrn.i16       d16, q9, #8
        cmp     r3, r2
        vst1.8  {d16}, [r0]!
        bne     .L3

 With both patches applied, the inner loop is:

 .L3:
        vld3.8  {d18-d20}, [r1]!
        vmull.u8        q8, d18, d21
        vmlal.u8        q8, d19, d22
        vmlal.u8        q8, d20, d23
        add     r3, r3, #1
        vshrn.i16       d16, q8, #8
        cmp     r3, r2
        vst1.8  {d16}, [r0]!
        bne     .L3

 Tested on arm-linux-gnueabi.  OK to install?

It looks reasonable given the past discussion, but - can you move forward
with the Neon stuff a bit to see if it really fits?  Or is this all
that is needed
for the load/store lane support as well (apart from vectorizer changes of
course).

Can you check the code generated by for example

float foo(char *p)
{
  float a[2];
  int i;
  ((char *)a)[0] = p[0];
  ((char *)a)[1] = p[1];
  ((char *)a)[2] = p[2];
  ((char *)a)[3] = p[3];
  ((char *)a)[4] = p[4];
  ((char *)a)[5] = p[5];
  ((char *)a)[6] = p[6];
  ((char *)a)[7] = p[7];
  return a[0] + a[1];
}

for an array a that would get such a larger mode?  Thus, check what
happens with partial defs of different types (just to avoid ICEs like the
ones Jakub was fixing yesterday).

Thanks,
Richard.

 Richard


 gcc/
        * hooks.h (hook_bool_mode_uhwi_false): Declare.
        * hooks.c (hook_bool_mode_uhwi_false): New function.
        * target.def (array_mode_supported_p): New hook.
        * doc/tm.texi.in (TARGET_ARRAY_MODE_SUPPORTED_P): Add @hook.
        * doc/tm.texi: Regenerate.
        * stor-layout.c (mode_for_array): New function.
        (layout_type): Use it.
        * config/arm/arm.c (arm_array_mode_supported_p): New function.
        (TARGET_ARRAY_MODE_SUPPORTED_P): Define.

 Index: gcc/hooks.h
 ===
 --- gcc/hooks.h 2011-03-31 10:57:26.0 +0100
 +++ gcc/hooks.h 2011-03-31 14:18:21.0 +0100
 @@ -34,6 +34,8 @@ extern bool hook_bool_mode_false (enum m
  extern bool hook_bool_mode_true (enum machine_mode);
  extern bool hook_bool_mode_const_rtx_false (enum machine_mode, 

Re: Add an array_mode_supported_p target hook

2011-03-31 Thread Richard Sandiford
Richard Guenther richard.guent...@gmail.com writes:
 On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford
 richard.sandif...@linaro.org wrote:
 This patch adds an array_mode_supported_p hook, which says whether
 MAX_FIXED_MODE_SIZE should be ignored for a given type of array.
 It follows on from the discussion here:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html

 The intended use of the hook is to allow small arrays of vectors
 to have a non-BLK mode, and hence to be stored in rtl registers.
 These arrays are used both in the ARM arm_neon.h API and in the
 optabs proposed in:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html

 The tail end of the thread was about the definition of TYPE_MODE:

 #define TYPE_MODE(NODE) \
  (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \
   ? vector_type_mode (NODE) : (NODE)-type.mode)

 with this outcome:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html

 To summarise my take on it:

 - The current definition of TYPE_MODE isn't sufficient even for vector
  modes and vector_mode_supported_p, because non-vector types can have
  vector modes.

 - We should no longer treat types as having one mode everywhere.
  We should instead replace TYPE_MODE with a function that takes
  a context.  Tests of things like vector_mode_supported_p would
  move from layout_type to this new function.

 I think this patch fits within that scheme.  array_mode_supported_p
 would be treated in the same way as vector_mode_supported_p.

 I realise the ideal would be to get rid of TYPE_MODE first.
 But that's going to be a longer-term thing.  Now that there's
 at least a plan, I'd like to press ahead with the array stuff
 on the basis that

 (a) although the new hook won't work with the target attribute,
    our current mode handling doesn't work in just the same way.

 (b) the new hook doesn't interfere with the plan.

 (c) getting good code from the intrinsics (and support for these
    instructions in the vectoriser) is going to be much more important
    to most ARM users than the ability to turn Neon on and off for
    individual functions in a TU.

 To give an example of the difference, the Neon code posted here:

    http://hilbert-space.de/?p=22

 produces this inner loop before the patch (but with
 http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied):

 .L3:
        vld3.8  {d16-d18}, [r1]!
        vstmia  ip, {d16-d18}
        fldd    d19, [sp, #24]
        adr     r5, .L6
        ldmia   r5, {r4-r5}
        fldd    d16, [sp, #32]
        vmov    d18, r4, r5  @ v8qi
        vmull.u8        q9, d19, d18
        adr     r5, .L6+8
        ldmia   r5, {r4-r5}
        vmov    d17, r4, r5  @ v8qi
        vstmia  sp, {d18-d19}
        vmlal.u8        q9, d16, d17
        fldd    d16, [sp, #40]
        adr     r5, .L6+16
        ldmia   r5, {r4-r5}
        vmov    d17, r4, r5  @ v8qi
        vmlal.u8        q9, d16, d17
        add     r3, r3, #1
        vshrn.i16       d16, q9, #8
        cmp     r3, r2
        vst1.8  {d16}, [r0]!
        bne     .L3

 With both patches applied, the inner loop is:

 .L3:
        vld3.8  {d18-d20}, [r1]!
        vmull.u8        q8, d18, d21
        vmlal.u8        q8, d19, d22
        vmlal.u8        q8, d20, d23
        add     r3, r3, #1
        vshrn.i16       d16, q8, #8
        cmp     r3, r2
        vst1.8  {d16}, [r0]!
        bne     .L3

 Tested on arm-linux-gnueabi.  OK to install?

 It looks reasonable given the past discussion, but - can you move forward
 with the Neon stuff a bit to see if it really fits?  Or is this all
 that is needed
 for the load/store lane support as well (apart from vectorizer changes of
 course).

Yeah, I have a prototype that hacks up some C support for generating the
(otherwise internal-only) load/store built-in functions that the vectoriser
is suppsoed to generate.  This patch is all that seems to be needed for the
types and optabs generation to work in the natural way.

I'm happy to leave it until the vectoriser stuff is in a more
submittable state though.  Especially given:

 Can you check the code generated by for example

 float foo(char *p)
 {
   float a[2];
   int i;
   ((char *)a)[0] = p[0];
   ((char *)a)[1] = p[1];
   ((char *)a)[2] = p[2];
   ((char *)a)[3] = p[3];
   ((char *)a)[4] = p[4];
   ((char *)a)[5] = p[5];
   ((char *)a)[6] = p[6];
   ((char *)a)[7] = p[7];
   return a[0] + a[1];
 }

 for an array a that would get such a larger mode?  Thus, check what
 happens with partial defs of different types (just to avoid ICEs like the
 ones Jakub was fixing yesterday).

OK, I tried:

#include arm_neon.h

uint32x2_t foo(char *p)
{
  uint32x2_t a[2];
  int i;
  ((char *)a)[0] = p[0];
  ((char *)a)[1] = p[1];
  ((char *)a)[2] = p[2];
  ((char *)a)[3] = p[3];
  ((char *)a)[4] = p[4];
  ((char *)a)[5] = p[5];
  ((char *)a)[6] = p[6];
  ((char *)a)[7] = p[7];
  ((char *)a)[8] = p[8];
  ((char *)a)[9] = p[9];
  ((char *)a)[10] = p[10];
  ((char *)a)[11] = p[11];
  ((char *)a)[12] = p[12];
  ((char