Re: Add an array_mode_supported_p target hook
On Fri, 2011-05-06 at 11:35 +0100, Richard Sandiford wrote: Richard Earnshaw rearn...@arm.com writes: +/* Implements target hook array_mode_supported_p. */ + +static bool +arm_array_mode_supported_p (enum machine_mode mode, + unsigned HOST_WIDE_INT nelems) +{ + if (TARGET_NEON + (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)) + (nelems = 2 nelems = 4)) +return true; + + return false; +} I'm not sure I understand why this is limited to 4 or fewer elements. A Q reg of chars would surely be 16 elements. The mode here is the mode of the array element, which for the cases we're interested in would be something like V4HI (D) or V4SI (Q). nelems says how many of those (in our case, vector) elements there are in the array. The element range we want is 1-4 because that matches the number of vectors that can be loaded by the vld1-vld4 instructions. We don't include 1 because arrays of one element are already treated as having the same mode as their element. Richard I understand now... Ok. R.
Re: Add an array_mode_supported_p target hook
Richard Earnshaw rearn...@arm.com writes: +/* Implements target hook array_mode_supported_p. */ + +static bool +arm_array_mode_supported_p (enum machine_mode mode, +unsigned HOST_WIDE_INT nelems) +{ + if (TARGET_NEON + (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)) + (nelems = 2 nelems = 4)) +return true; + + return false; +} I'm not sure I understand why this is limited to 4 or fewer elements. A Q reg of chars would surely be 16 elements. The mode here is the mode of the array element, which for the cases we're interested in would be something like V4HI (D) or V4SI (Q). nelems says how many of those (in our case, vector) elements there are in the array. The element range we want is 1-4 because that matches the number of vectors that can be loaded by the vld1-vld4 instructions. We don't include 1 because arrays of one element are already treated as having the same mode as their element. Richard
Re: Add an array_mode_supported_p target hook
To get back to this... Richard Sandiford richard.sandif...@linaro.org writes: Richard Guenther richard.guent...@gmail.com writes: On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford richard.sandif...@linaro.org wrote: This patch adds an array_mode_supported_p hook, which says whether MAX_FIXED_MODE_SIZE should be ignored for a given type of array. It follows on from the discussion here: http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html The intended use of the hook is to allow small arrays of vectors to have a non-BLK mode, and hence to be stored in rtl registers. These arrays are used both in the ARM arm_neon.h API and in the optabs proposed in: http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html The tail end of the thread was about the definition of TYPE_MODE: #define TYPE_MODE(NODE) \ (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \ ? vector_type_mode (NODE) : (NODE)-type.mode) with this outcome: http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html To summarise my take on it: - The current definition of TYPE_MODE isn't sufficient even for vector modes and vector_mode_supported_p, because non-vector types can have vector modes. - We should no longer treat types as having one mode everywhere. We should instead replace TYPE_MODE with a function that takes a context. Tests of things like vector_mode_supported_p would move from layout_type to this new function. I think this patch fits within that scheme. array_mode_supported_p would be treated in the same way as vector_mode_supported_p. I realise the ideal would be to get rid of TYPE_MODE first. But that's going to be a longer-term thing. Now that there's at least a plan, I'd like to press ahead with the array stuff on the basis that (a) although the new hook won't work with the target attribute, our current mode handling doesn't work in just the same way. (b) the new hook doesn't interfere with the plan. (c) getting good code from the intrinsics (and support for these instructions in the vectoriser) is going to be much more important to most ARM users than the ability to turn Neon on and off for individual functions in a TU. To give an example of the difference, the Neon code posted here: http://hilbert-space.de/?p=22 produces this inner loop before the patch (but with http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied): .L3: vld3.8 {d16-d18}, [r1]! vstmia ip, {d16-d18} fldd d19, [sp, #24] adr r5, .L6 ldmia r5, {r4-r5} fldd d16, [sp, #32] vmov d18, r4, r5 @ v8qi vmull.u8 q9, d19, d18 adr r5, .L6+8 ldmia r5, {r4-r5} vmov d17, r4, r5 @ v8qi vstmia sp, {d18-d19} vmlal.u8 q9, d16, d17 fldd d16, [sp, #40] adr r5, .L6+16 ldmia r5, {r4-r5} vmov d17, r4, r5 @ v8qi vmlal.u8 q9, d16, d17 add r3, r3, #1 vshrn.i16 d16, q9, #8 cmp r3, r2 vst1.8 {d16}, [r0]! bne .L3 With both patches applied, the inner loop is: .L3: vld3.8 {d18-d20}, [r1]! vmull.u8 q8, d18, d21 vmlal.u8 q8, d19, d22 vmlal.u8 q8, d20, d23 add r3, r3, #1 vshrn.i16 d16, q8, #8 cmp r3, r2 vst1.8 {d16}, [r0]! bne .L3 Tested on arm-linux-gnueabi. OK to install? It looks reasonable given the past discussion, but - can you move forward with the Neon stuff a bit to see if it really fits? Or is this all that is needed for the load/store lane support as well (apart from vectorizer changes of course). Yeah, I have a prototype that hacks up some C support for generating the (otherwise internal-only) load/store built-in functions that the vectoriser is suppsoed to generate. This patch is all that seems to be needed for the types and optabs generation to work in the natural way. I'm happy to leave it until the vectoriser stuff is in a more submittable state though. The vectorisation stuff has now been approved and uses this hook to detect whether interleaved loads stores are supported. Also... Especially given: Can you check the code generated by for example float foo(char *p) { float a[2]; int i; ((char *)a)[0] = p[0]; ((char *)a)[1] = p[1]; ((char *)a)[2] = p[2]; ((char *)a)[3] = p[3]; ((char *)a)[4] = p[4]; ((char *)a)[5] = p[5]; ((char *)a)[6] = p[6]; ((char *)a)[7] = p[7]; return a[0] + a[1]; } for an array a that would get such a larger mode? Thus, check what happens with partial defs of different types (just to avoid ICEs like the ones Jakub was fixing yesterday). OK, I tried: #include arm_neon.h uint32x2_t foo(char *p) { uint32x2_t a[2]; int i; ((char *)a)[0] = p[0]; ((char *)a)[1] = p[1]; ((char *)a)[2] = p[2]; ((char *)a)[3] = p[3]; ((char
Re: Add an array_mode_supported_p target hook
On Thu, Apr 21, 2011 at 11:50 AM, Richard Sandiford richard.sandif...@linaro.org wrote: To get back to this... Richard Sandiford richard.sandif...@linaro.org writes: Richard Guenther richard.guent...@gmail.com writes: On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford richard.sandif...@linaro.org wrote: This patch adds an array_mode_supported_p hook, which says whether MAX_FIXED_MODE_SIZE should be ignored for a given type of array. It follows on from the discussion here: http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html The intended use of the hook is to allow small arrays of vectors to have a non-BLK mode, and hence to be stored in rtl registers. These arrays are used both in the ARM arm_neon.h API and in the optabs proposed in: http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html The tail end of the thread was about the definition of TYPE_MODE: #define TYPE_MODE(NODE) \ (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \ ? vector_type_mode (NODE) : (NODE)-type.mode) with this outcome: http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html To summarise my take on it: - The current definition of TYPE_MODE isn't sufficient even for vector modes and vector_mode_supported_p, because non-vector types can have vector modes. - We should no longer treat types as having one mode everywhere. We should instead replace TYPE_MODE with a function that takes a context. Tests of things like vector_mode_supported_p would move from layout_type to this new function. I think this patch fits within that scheme. array_mode_supported_p would be treated in the same way as vector_mode_supported_p. I realise the ideal would be to get rid of TYPE_MODE first. But that's going to be a longer-term thing. Now that there's at least a plan, I'd like to press ahead with the array stuff on the basis that (a) although the new hook won't work with the target attribute, our current mode handling doesn't work in just the same way. (b) the new hook doesn't interfere with the plan. (c) getting good code from the intrinsics (and support for these instructions in the vectoriser) is going to be much more important to most ARM users than the ability to turn Neon on and off for individual functions in a TU. To give an example of the difference, the Neon code posted here: http://hilbert-space.de/?p=22 produces this inner loop before the patch (but with http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied): .L3: vld3.8 {d16-d18}, [r1]! vstmia ip, {d16-d18} fldd d19, [sp, #24] adr r5, .L6 ldmia r5, {r4-r5} fldd d16, [sp, #32] vmov d18, r4, r5 @ v8qi vmull.u8 q9, d19, d18 adr r5, .L6+8 ldmia r5, {r4-r5} vmov d17, r4, r5 @ v8qi vstmia sp, {d18-d19} vmlal.u8 q9, d16, d17 fldd d16, [sp, #40] adr r5, .L6+16 ldmia r5, {r4-r5} vmov d17, r4, r5 @ v8qi vmlal.u8 q9, d16, d17 add r3, r3, #1 vshrn.i16 d16, q9, #8 cmp r3, r2 vst1.8 {d16}, [r0]! bne .L3 With both patches applied, the inner loop is: .L3: vld3.8 {d18-d20}, [r1]! vmull.u8 q8, d18, d21 vmlal.u8 q8, d19, d22 vmlal.u8 q8, d20, d23 add r3, r3, #1 vshrn.i16 d16, q8, #8 cmp r3, r2 vst1.8 {d16}, [r0]! bne .L3 Tested on arm-linux-gnueabi. OK to install? It looks reasonable given the past discussion, but - can you move forward with the Neon stuff a bit to see if it really fits? Or is this all that is needed for the load/store lane support as well (apart from vectorizer changes of course). Yeah, I have a prototype that hacks up some C support for generating the (otherwise internal-only) load/store built-in functions that the vectoriser is suppsoed to generate. This patch is all that seems to be needed for the types and optabs generation to work in the natural way. I'm happy to leave it until the vectoriser stuff is in a more submittable state though. The vectorisation stuff has now been approved and uses this hook to detect whether interleaved loads stores are supported. Also... Especially given: Can you check the code generated by for example float foo(char *p) { float a[2]; int i; ((char *)a)[0] = p[0]; ((char *)a)[1] = p[1]; ((char *)a)[2] = p[2]; ((char *)a)[3] = p[3]; ((char *)a)[4] = p[4]; ((char *)a)[5] = p[5]; ((char *)a)[6] = p[6]; ((char *)a)[7] = p[7]; return a[0] + a[1]; } for an array a that would get such a larger mode? Thus, check what happens with partial defs of different types (just to avoid ICEs like the ones Jakub was fixing yesterday). OK, I tried: #include arm_neon.h uint32x2_t foo(char *p) { uint32x2_t a[2]; int i; ((char *)a)[0] =
Re: Add an array_mode_supported_p target hook
On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford richard.sandif...@linaro.org wrote: This patch adds an array_mode_supported_p hook, which says whether MAX_FIXED_MODE_SIZE should be ignored for a given type of array. It follows on from the discussion here: http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html The intended use of the hook is to allow small arrays of vectors to have a non-BLK mode, and hence to be stored in rtl registers. These arrays are used both in the ARM arm_neon.h API and in the optabs proposed in: http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html The tail end of the thread was about the definition of TYPE_MODE: #define TYPE_MODE(NODE) \ (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \ ? vector_type_mode (NODE) : (NODE)-type.mode) with this outcome: http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html To summarise my take on it: - The current definition of TYPE_MODE isn't sufficient even for vector modes and vector_mode_supported_p, because non-vector types can have vector modes. - We should no longer treat types as having one mode everywhere. We should instead replace TYPE_MODE with a function that takes a context. Tests of things like vector_mode_supported_p would move from layout_type to this new function. I think this patch fits within that scheme. array_mode_supported_p would be treated in the same way as vector_mode_supported_p. I realise the ideal would be to get rid of TYPE_MODE first. But that's going to be a longer-term thing. Now that there's at least a plan, I'd like to press ahead with the array stuff on the basis that (a) although the new hook won't work with the target attribute, our current mode handling doesn't work in just the same way. (b) the new hook doesn't interfere with the plan. (c) getting good code from the intrinsics (and support for these instructions in the vectoriser) is going to be much more important to most ARM users than the ability to turn Neon on and off for individual functions in a TU. To give an example of the difference, the Neon code posted here: http://hilbert-space.de/?p=22 produces this inner loop before the patch (but with http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied): .L3: vld3.8 {d16-d18}, [r1]! vstmia ip, {d16-d18} fldd d19, [sp, #24] adr r5, .L6 ldmia r5, {r4-r5} fldd d16, [sp, #32] vmov d18, r4, r5 @ v8qi vmull.u8 q9, d19, d18 adr r5, .L6+8 ldmia r5, {r4-r5} vmov d17, r4, r5 @ v8qi vstmia sp, {d18-d19} vmlal.u8 q9, d16, d17 fldd d16, [sp, #40] adr r5, .L6+16 ldmia r5, {r4-r5} vmov d17, r4, r5 @ v8qi vmlal.u8 q9, d16, d17 add r3, r3, #1 vshrn.i16 d16, q9, #8 cmp r3, r2 vst1.8 {d16}, [r0]! bne .L3 With both patches applied, the inner loop is: .L3: vld3.8 {d18-d20}, [r1]! vmull.u8 q8, d18, d21 vmlal.u8 q8, d19, d22 vmlal.u8 q8, d20, d23 add r3, r3, #1 vshrn.i16 d16, q8, #8 cmp r3, r2 vst1.8 {d16}, [r0]! bne .L3 Tested on arm-linux-gnueabi. OK to install? It looks reasonable given the past discussion, but - can you move forward with the Neon stuff a bit to see if it really fits? Or is this all that is needed for the load/store lane support as well (apart from vectorizer changes of course). Can you check the code generated by for example float foo(char *p) { float a[2]; int i; ((char *)a)[0] = p[0]; ((char *)a)[1] = p[1]; ((char *)a)[2] = p[2]; ((char *)a)[3] = p[3]; ((char *)a)[4] = p[4]; ((char *)a)[5] = p[5]; ((char *)a)[6] = p[6]; ((char *)a)[7] = p[7]; return a[0] + a[1]; } for an array a that would get such a larger mode? Thus, check what happens with partial defs of different types (just to avoid ICEs like the ones Jakub was fixing yesterday). Thanks, Richard. Richard gcc/ * hooks.h (hook_bool_mode_uhwi_false): Declare. * hooks.c (hook_bool_mode_uhwi_false): New function. * target.def (array_mode_supported_p): New hook. * doc/tm.texi.in (TARGET_ARRAY_MODE_SUPPORTED_P): Add @hook. * doc/tm.texi: Regenerate. * stor-layout.c (mode_for_array): New function. (layout_type): Use it. * config/arm/arm.c (arm_array_mode_supported_p): New function. (TARGET_ARRAY_MODE_SUPPORTED_P): Define. Index: gcc/hooks.h === --- gcc/hooks.h 2011-03-31 10:57:26.0 +0100 +++ gcc/hooks.h 2011-03-31 14:18:21.0 +0100 @@ -34,6 +34,8 @@ extern bool hook_bool_mode_false (enum m extern bool hook_bool_mode_true (enum machine_mode); extern bool hook_bool_mode_const_rtx_false (enum machine_mode,
Re: Add an array_mode_supported_p target hook
Richard Guenther richard.guent...@gmail.com writes: On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford richard.sandif...@linaro.org wrote: This patch adds an array_mode_supported_p hook, which says whether MAX_FIXED_MODE_SIZE should be ignored for a given type of array. It follows on from the discussion here: http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html The intended use of the hook is to allow small arrays of vectors to have a non-BLK mode, and hence to be stored in rtl registers. These arrays are used both in the ARM arm_neon.h API and in the optabs proposed in: http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html The tail end of the thread was about the definition of TYPE_MODE: #define TYPE_MODE(NODE) \ (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \ ? vector_type_mode (NODE) : (NODE)-type.mode) with this outcome: http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html To summarise my take on it: - The current definition of TYPE_MODE isn't sufficient even for vector modes and vector_mode_supported_p, because non-vector types can have vector modes. - We should no longer treat types as having one mode everywhere. We should instead replace TYPE_MODE with a function that takes a context. Tests of things like vector_mode_supported_p would move from layout_type to this new function. I think this patch fits within that scheme. array_mode_supported_p would be treated in the same way as vector_mode_supported_p. I realise the ideal would be to get rid of TYPE_MODE first. But that's going to be a longer-term thing. Now that there's at least a plan, I'd like to press ahead with the array stuff on the basis that (a) although the new hook won't work with the target attribute, our current mode handling doesn't work in just the same way. (b) the new hook doesn't interfere with the plan. (c) getting good code from the intrinsics (and support for these instructions in the vectoriser) is going to be much more important to most ARM users than the ability to turn Neon on and off for individual functions in a TU. To give an example of the difference, the Neon code posted here: http://hilbert-space.de/?p=22 produces this inner loop before the patch (but with http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied): .L3: vld3.8 {d16-d18}, [r1]! vstmia ip, {d16-d18} fldd d19, [sp, #24] adr r5, .L6 ldmia r5, {r4-r5} fldd d16, [sp, #32] vmov d18, r4, r5 @ v8qi vmull.u8 q9, d19, d18 adr r5, .L6+8 ldmia r5, {r4-r5} vmov d17, r4, r5 @ v8qi vstmia sp, {d18-d19} vmlal.u8 q9, d16, d17 fldd d16, [sp, #40] adr r5, .L6+16 ldmia r5, {r4-r5} vmov d17, r4, r5 @ v8qi vmlal.u8 q9, d16, d17 add r3, r3, #1 vshrn.i16 d16, q9, #8 cmp r3, r2 vst1.8 {d16}, [r0]! bne .L3 With both patches applied, the inner loop is: .L3: vld3.8 {d18-d20}, [r1]! vmull.u8 q8, d18, d21 vmlal.u8 q8, d19, d22 vmlal.u8 q8, d20, d23 add r3, r3, #1 vshrn.i16 d16, q8, #8 cmp r3, r2 vst1.8 {d16}, [r0]! bne .L3 Tested on arm-linux-gnueabi. OK to install? It looks reasonable given the past discussion, but - can you move forward with the Neon stuff a bit to see if it really fits? Or is this all that is needed for the load/store lane support as well (apart from vectorizer changes of course). Yeah, I have a prototype that hacks up some C support for generating the (otherwise internal-only) load/store built-in functions that the vectoriser is suppsoed to generate. This patch is all that seems to be needed for the types and optabs generation to work in the natural way. I'm happy to leave it until the vectoriser stuff is in a more submittable state though. Especially given: Can you check the code generated by for example float foo(char *p) { float a[2]; int i; ((char *)a)[0] = p[0]; ((char *)a)[1] = p[1]; ((char *)a)[2] = p[2]; ((char *)a)[3] = p[3]; ((char *)a)[4] = p[4]; ((char *)a)[5] = p[5]; ((char *)a)[6] = p[6]; ((char *)a)[7] = p[7]; return a[0] + a[1]; } for an array a that would get such a larger mode? Thus, check what happens with partial defs of different types (just to avoid ICEs like the ones Jakub was fixing yesterday). OK, I tried: #include arm_neon.h uint32x2_t foo(char *p) { uint32x2_t a[2]; int i; ((char *)a)[0] = p[0]; ((char *)a)[1] = p[1]; ((char *)a)[2] = p[2]; ((char *)a)[3] = p[3]; ((char *)a)[4] = p[4]; ((char *)a)[5] = p[5]; ((char *)a)[6] = p[6]; ((char *)a)[7] = p[7]; ((char *)a)[8] = p[8]; ((char *)a)[9] = p[9]; ((char *)a)[10] = p[10]; ((char *)a)[11] = p[11]; ((char *)a)[12] = p[12]; ((char