Re: [PATCH ARM] Improve ARM memset inlining
On Fri, Jul 4, 2014 at 1:17 PM, Bin Cheng bin.ch...@arm.com wrote:
> Hi Ramana,
> This is the rebased patch; there is no conflict against the latest trunk.
> I am still doing some tests. Is it OK if tests are fine?
>
> Also, it depends on the patch at
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01923.html, I will update
> that patch too.

Hi Ramana,
Bootstrap and tests for this patch are done. Is it OK for me to submit?

Thanks,
bin
Re: [PATCH ARM] Improve ARM memset inlining
> Hi Ramana,
> This is the rebased patch; there is no conflict against the latest trunk.
> I am still doing some tests. Is it OK if tests are fine?
>
> Also, it depends on the patch at
> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01923.html, I will update
> that patch too.
>
> Thanks,
> bin
>
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c (revision 212295)
> +++ gcc/config/arm/arm.c (working copy)
> @@ -1588,34 +1588,38 @@ const struct tune_params arm_slowmul_tune =
>   {
>   arm_slowmul_rtx_costs,
>   NULL,
> - NULL,/* Sched adj cost. */
> - 3, /* Constant limit. */
> - 5, /* Max cond insns. */
> + NULL,/* Sched adj cost. */
> + 3, /* Constant limit. */
> + 5, /* Max cond insns. */

Please make sure alignment is maintained with comments as today. I'm not sure why I see the following diffs in your patch, since you shouldn't really be touching those lines; that applies to all the cost tables. I haven't called out in detail all the places where you appear to have unrelated formatting changes, but have done so in one cost table. Please re-create a patch that doesn't have these hunks.

>   ARM_PREFETCH_NOT_BENEFICIAL,
> - true,/* Prefer constant pool. */
> + true,/* Prefer constant pool. */

Likewise.

>   arm_default_branch_cost,
> - false, /* Prefer LDRD/STRD. */
> - {true, true},/* Prefer non short circuit. */
> - arm_default_vec_cost,/* Vectorizer costs. */
> - false,/* Prefer Neon for 64-bits bitops. */
> - false, false /* Prefer 32-bit encodings. */
> + false, /* Prefer LDRD/STRD. */
> + {true, true},/* Prefer non short circuit. */
> + arm_default_vec_cost, /* Vectorizer costs. */
> + false, /* Prefer Neon for 64-bits bitops. */
> + false, false,/* Prefer 32-bit encodings. */

Likewise.

> + false, /* Prefer Neon for stringops. */
> + 8/* Maximum insns to inline memset. */
>   };
>
>   const struct tune_params arm_fastmul_tune =
>   {
>   arm_fastmul_rtx_costs,
>   NULL,
> - NULL,/* Sched adj cost. */
> - 1, /* Constant limit. */
> - 5, /* Max cond insns. */
> + NULL,/* Sched adj cost. */
> + 1, /* Constant limit. */
> + 5, /* Max cond insns. */
>   ARM_PREFETCH_NOT_BENEFICIAL,
> - true,/* Prefer constant pool. */
> + true,/* Prefer constant pool. */
>   arm_default_branch_cost,
> - false, /* Prefer LDRD/STRD. */
> - {true, true},/* Prefer non short circuit. */
> - arm_default_vec_cost,/* Vectorizer costs. */
> - false,/* Prefer Neon for 64-bits bitops. */
> - false, false /* Prefer 32-bit encodings. */
> + false, /* Prefer LDRD/STRD. */
> + {true, true},/* Prefer non short circuit. */
> + arm_default_vec_cost, /* Vectorizer costs. */
> + false, /* Prefer Neon for 64-bits bitops. */
> + false, false,/* Prefer 32-bit encodings. */
> + false, /* Prefer Neon for stringops. */
> + 8/* Maximum insns to inline memset. */
>   };
>
>   /* StrongARM has early execution of branches, so a sequence that is worth
> @@ -1625,17 +1629,19 @@ const struct tune_params arm_strongarm_tune =
>   {
>   arm_fastmul_rtx_costs,
>   NULL,
> - NULL,/* Sched adj cost. */
> - 1, /* Constant limit. */
> - 3, /* Max cond insns. */
> + NULL,/* Sched adj cost. */
> + 1, /* Constant limit. */
> + 3, /* Max cond insns. */
>   ARM_PREFETCH_NOT_BENEFICIAL,
> - true,/* Prefer constant pool. */
> + true,
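
As a rough illustration of how a per-core limit such as the "Maximum insns to inline memset" entry in the hunk above could gate the expansion, here is a minimal C sketch. It is not the patch's arm_block_set_max_insns or profitability logic, which is not shown in this thread. The struct, its field names, the helper names, and the "one extra insn to set up the value" allowance are all assumptions made for the example; only the values false and 8 come from the quoted table.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the two entries the patch
   appends to each tuning table.  The field names are assumptions.  */
struct memset_tune
{
  bool prefer_neon_for_stringops;   /* "Prefer Neon for stringops."  */
  int max_insns_inline_memset;      /* "Maximum insns to inline memset."  */
};

/* The two values shown for arm_slowmul_tune in the quoted hunk.  */
static const struct memset_tune slowmul_like = { false, 8 };

/* Count the stores needed to set LENGTH bytes when the widest store
   writes UNIT bytes (a power of two) and the tail is finished with
   progressively narrower stores.  A simplified model only.  */
static int
count_stores (unsigned length, unsigned unit)
{
  int n = 0;

  assert (unit != 0 && (unit & (unit - 1)) == 0);
  while (unit > 0)
    {
      n += (int) (length / unit);
      length %= unit;
      unit >>= 1;
    }
  return n;
}

/* Assumed rule: expand inline only when the stores, plus one insn to
   put the fill value in a register, fit within the tuning limit.  */
static bool
inline_memset_profitable_p (const struct memset_tune *tune,
                            unsigned length, unsigned unit)
{
  return count_stores (length, unit) + 1 <= tune->max_insns_inline_memset;
}
```

Under this model a 28-byte set with 4-byte stores needs seven stores plus the setup insn and just fits a limit of 8, while a 32-byte set does not. A real back end would also have to account for alignment and for the cost of building the constant.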
Re: [PATCH ARM] Improve ARM memset inlining
On Tue, May 6, 2014 at 5:59 AM, bin.cheng bin.ch...@arm.com wrote:
> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of bin.cheng
> Sent: Monday, May 05, 2014 3:21 PM
> To: Richard Earnshaw
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH ARM] Improve ARM memset inlining
>
> Hi Richard,
> Thanks for reviewing. I embedded answers to your comments, also updated
> the patch.
>
> > -----Original Message-----
> > From: Richard Earnshaw
> > Sent: Friday, May 02, 2014 10:00 PM
> > To: Bin Cheng
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH ARM] Improve ARM memset inlining
> >
> > On 30/04/14 03:52, bin.cheng wrote:
> > > Hi,
> > > This patch expands small memset calls into direct memory set
> > > instructions by introducing the setmemsi pattern. For processors
> > > without NEON support, it expands memset using general store
> > > instructions, for example strd for 4-byte aligned addresses. For
> > > processors with NEON support, it expands memset using neon
> > > instructions like vstr and miscellaneous vst1.* instructions for both
> > > aligned and unaligned cases.
> > >
> > > This patch depends on
> > > http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01923.html
> > > otherwise vst1.64 will be generated for 32-bit aligned memory unit.
> > >
> > > There is also one leftover work of this patch: Since vst1.*
> > > instructions only support post-increment addressing mode, the inlined
> > > memset for unaligned neon cases should be like:
> > >     vmov.i32 q8, #...
> > >     vst1.8 {q8}, [r3]!
> > >     vst1.8 {q8}, [r3]!
> > >     vst1.8 {q8}, [r3]!
> > >     vst1.8 {q8}, [r3]
> >
> > Other than for zero, I'd expect the vmov to be vmov.i8 to move an
> > arbitrary byte value into all lanes in a vector.
>
> I just used vmov.i32 as an example. The element size is actually
> calculated by function neon_valid_immediate which works as expected I
> think.
>
> > After that, if the alignment is known to be more than 8-bit, I'd expect
> > the vst1 instructions (with the exception of the last store if the
> > length is not a multiple of the alignment) to use
> >     vst1.align {reg}, [addr-reg :align]!
> > Hence, for 16-bit aligned data, we want
> >     vst1.16 {q8}, [r3:16]!
>
> Did I miss something important? It seems to me the explicit alignment
> notes supported are 64/128/256. So what do you mean by 16 bits alignment
> here?
>
> > > But for now, gcc can't do this and below code is generated:
> > >     vmov.i32 q8, #...
> > >     vst1.8 {q8}, [r3]
> > >     add r2, r3, #16
> > >     add r3, r2, #16
> > >     vst1.8 {q8}, [r2]
> > >     vst1.8 {q8}, [r3]
> > >     add r2, r3, #16
> > >     vst1.8 {q8}, [r2]
> > >
> > > I investigated this issue. The root cause lies in rtx cost returned by
> > > ARM backend. Anyway, I think this is another issue and should be fixed
> > > in a separate patch.

Ok, looks like Charles B from Linaro has run into the same thing and has some fixes to suggest in costs.

> > > Bootstrap and reg-test on cortex-a15, with or without neon support.
> > > Is it OK?
> >
> > Some more comments inline.
> >
> > > Thanks,
> > > bin
> > >
> > > 2014-04-29  Bin Cheng  bin.ch...@arm.com
> > >
> > >     PR target/55701
> > >     * config/arm/arm.md (setmem): New pattern.
> > >     * config/arm/arm-protos.h (struct tune_params): New field.
> > >     (arm_gen_setmem): New prototype.
> > >     * config/arm/arm.c (arm_slowmul_tune): Initialize new field.
> > >     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
> > >     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
> > >     (arm_cortex_a8_tune, arm_cortex_a7_tune): Ditto.
> > >     (arm_cortex_a15_tune, arm_cortex_a53_tune): Ditto.
> > >     (arm_cortex_a57_tune, arm_cortex_a5_tune): Ditto.
> > >     (arm_cortex_a9_tune, arm_cortex_a12_tune): Ditto.
> > >     (arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune): Ditto.
> > >     (arm_const_inline_cost): New function.
> > >     (arm_block_set_max_insns): New function.
> > >     (arm_block_set_straight_profit_p): New function.
> > >     (arm_block_set_vect_profit_p): New function.
> > >     (arm_block_set_unaligned_vect): New function.
> > >     (arm_block_set_aligned_vect): New function.
> > >     (arm_block_set_unaligned_straight): New function.
> > >     (arm_block_set_aligned_straight): New function.
> > >     (arm_block_set_vect, arm_gen_setmem): New functions.
> > >
> > > gcc/testsuite/ChangeLog
> > > 2014-04-29  Bin Cheng  bin.ch...@arm.com
> > >
> > >     PR target/55701
> > >     * gcc.target/arm/memset-inline-1.c: New test.
> > >     * gcc.target/arm/memset-inline-2.c: New test.
> > >     * gcc.target/arm/memset-inline-3.c: New test.
> > >     * gcc.target/arm/memset-inline-4.c: New test.
> > >     * gcc.target/arm/memset-inline-5.c: New test.
> > >     * gcc.target/arm/memset-inline-6.c: New test.
> > >     * gcc.target/arm/memset-inline-7.c: New test.
> > >     * gcc.target/arm/memset-inline-8.c: New test.
> > >     * gcc.target/arm/memset-inline-9.c: New test.
> > >
> > > j1328-20140429.txt
> > >
> > > Index: gcc/config/arm/arm.c
> > > ===================================================================
> > > --- gcc/config/arm/arm.c (revision 209852)
> > > +++ gcc/config/arm/arm.c (working
RE: [PATCH ARM] Improve ARM memset inlining
Ping^4. The original thread is https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00182.html, also there is some info at https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00182.html in the same thread.

Thanks,
bin

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of bin.cheng
> Sent: Wednesday, May 28, 2014 4:53 PM
> To: Richard Earnshaw
> Cc: gcc-patches List
> Subject: RE: [PATCH ARM] Improve ARM memset inlining
>
> Ping^3
>
> > -----Original Message-----
> > From: Bin.Cheng [mailto:amker.ch...@gmail.com]
> > Sent: Monday, May 19, 2014 2:40 PM
> > To: Bin Cheng
> > Cc: Richard Earnshaw; gcc-patches List
> > Subject: Re: [PATCH ARM] Improve ARM memset inlining
> >
> > Ping^2
> >
> > Thanks,
> > bin
> >
> > On Mon, May 12, 2014 at 11:17 AM, Bin.Cheng amker.ch...@gmail.com wrote:
> > > Ping.
> > >
> > > Thanks,
> > > bin
> > >
> > > On Tue, May 6, 2014 at 12:59 PM, bin.cheng bin.ch...@arm.com wrote:
> > > > Precisely, I configured gcc with options --with-arch=armv7-a
> > > > --with-cpu|--with-tune=cortex-a9. I read gcc documents and realized
> > > > that -mcpu is ignored when -march is specified. I don't know why gcc
> > > > acts in this manner, but it leads to inconsistent
> > > > configuration/command line behavior.
> > > > If we configure GCC with --with-arch=armv7-a --with-cpu=cortex-a9,
> > > > then only -march=armv7-a is passed to cc1.
> > > > If we compile with -march=armv7-a -mcpu=cortex-a9, then gcc works
> > > > fine and passes -march=armv7-a -mcpu=cortex-a9 to cc1.
> > > > Even more weird cc1 warns that switch -mcpu=cortex-m4 conflicts with
> > > > -march=armv7-m switch.
> > > >
> > > > Thanks,
> > > > bin
> > >
> > > --
> > > Best Regards.
> >
> > --
> > Best Regards.
Re: [PATCH ARM] Improve ARM memset inlining
On 30/04/14 03:52, bin.cheng wrote:
> Hi,
> This patch expands small memset calls into direct memory set instructions
> by introducing the setmemsi pattern. For processors without NEON support,
> it expands memset using general store instructions, for example strd for
> 4-byte aligned addresses. For processors with NEON support, it expands
> memset using neon instructions like vstr and miscellaneous vst1.*
> instructions for both aligned and unaligned cases.
>
> This patch depends on
> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01923.html
> otherwise vst1.64 will be generated for 32-bit aligned memory unit.
>
> There is also one leftover work of this patch: Since vst1.* instructions
> only support post-increment addressing mode, the inlined memset for
> unaligned neon cases should be like:
>     vmov.i32 q8, #...
>     vst1.8 {q8}, [r3]!
>     vst1.8 {q8}, [r3]!
>     vst1.8 {q8}, [r3]!
>     vst1.8 {q8}, [r3]

Other than for zero, I'd expect the vmov to be vmov.i8 to move an arbitrary byte value into all lanes in a vector. After that, if the alignment is known to be more than 8-bit, I'd expect the vst1 instructions (with the exception of the last store if the length is not a multiple of the alignment) to use

    vst1.align {reg}, [addr-reg :align]!

Hence, for 16-bit aligned data, we want

    vst1.16 {q8}, [r3:16]!

> But for now, gcc can't do this and below code is generated:
>     vmov.i32 q8, #...
>     vst1.8 {q8}, [r3]
>     add r2, r3, #16
>     add r3, r2, #16
>     vst1.8 {q8}, [r2]
>     vst1.8 {q8}, [r3]
>     add r2, r3, #16
>     vst1.8 {q8}, [r2]
>
> I investigated this issue. The root cause lies in rtx cost returned by
> ARM backend. Anyway, I think this is another issue and should be fixed in
> a separate patch.
>
> Bootstrap and reg-test on cortex-a15, with or without neon support.
> Is it OK?

Some more comments inline.

> Thanks,
> bin
>
> 2014-04-29  Bin Cheng  bin.ch...@arm.com
>
>     PR target/55701
>     * config/arm/arm.md (setmem): New pattern.
>     * config/arm/arm-protos.h (struct tune_params): New field.
>     (arm_gen_setmem): New prototype.
>     * config/arm/arm.c (arm_slowmul_tune): Initialize new field.
>     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>     (arm_cortex_a8_tune, arm_cortex_a7_tune): Ditto.
>     (arm_cortex_a15_tune, arm_cortex_a53_tune): Ditto.
>     (arm_cortex_a57_tune, arm_cortex_a5_tune): Ditto.
>     (arm_cortex_a9_tune, arm_cortex_a12_tune): Ditto.
>     (arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune): Ditto.
>     (arm_const_inline_cost): New function.
>     (arm_block_set_max_insns): New function.
>     (arm_block_set_straight_profit_p): New function.
>     (arm_block_set_vect_profit_p): New function.
>     (arm_block_set_unaligned_vect): New function.
>     (arm_block_set_aligned_vect): New function.
>     (arm_block_set_unaligned_straight): New function.
>     (arm_block_set_aligned_straight): New function.
>     (arm_block_set_vect, arm_gen_setmem): New functions.
>
> gcc/testsuite/ChangeLog
> 2014-04-29  Bin Cheng  bin.ch...@arm.com
>
>     PR target/55701
>     * gcc.target/arm/memset-inline-1.c: New test.
>     * gcc.target/arm/memset-inline-2.c: New test.
>     * gcc.target/arm/memset-inline-3.c: New test.
>     * gcc.target/arm/memset-inline-4.c: New test.
>     * gcc.target/arm/memset-inline-5.c: New test.
>     * gcc.target/arm/memset-inline-6.c: New test.
>     * gcc.target/arm/memset-inline-7.c: New test.
>     * gcc.target/arm/memset-inline-8.c: New test.
>     * gcc.target/arm/memset-inline-9.c: New test.
>
> j1328-20140429.txt
>
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c (revision 209852)
> +++ gcc/config/arm/arm.c (working copy)
> @@ -1585,10 +1585,11 @@ const struct tune_params arm_slowmul_tune =
>   true, /* Prefer constant pool. */
>   arm_default_branch_cost,
>   false, /* Prefer LDRD/STRD. */
> - {true, true}, /* Prefer non short circuit. */
> - arm_default_vec_cost,/* Vectorizer costs. */
> - false,/* Prefer Neon for 64-bits bitops. */
> - false, false /* Prefer 32-bit encodings. */
> + {true, true}, /* Prefer non short circuit. */
> + arm_default_vec_cost,/* Vectorizer costs. */
> + false,/* Prefer Neon for 64-bits bitops. */
> + false, false, /* Prefer 32-bit encodings. */
> + false /* Prefer Neon for stringops. */
>   };

Please make sure that all the white space before the comments is using TAB, not spaces. Similarly for the other
[PATCH ARM] Improve ARM memset inlining
Hi,
This patch expands small memset calls into direct memory set instructions by introducing the setmemsi pattern. For processors without NEON support, it expands memset using general store instructions, for example strd for 4-byte aligned addresses. For processors with NEON support, it expands memset using neon instructions like vstr and miscellaneous vst1.* instructions for both aligned and unaligned cases.

This patch depends on
http://gcc.gnu.org/ml/gcc-patches/2014-04/msg01923.html
otherwise vst1.64 will be generated for 32-bit aligned memory unit.

There is also one leftover work of this patch: Since vst1.* instructions only support post-increment addressing mode, the inlined memset for unaligned neon cases should be like:
    vmov.i32 q8, #...
    vst1.8 {q8}, [r3]!
    vst1.8 {q8}, [r3]!
    vst1.8 {q8}, [r3]!
    vst1.8 {q8}, [r3]

But for now, gcc can't do this and below code is generated:
    vmov.i32 q8, #...
    vst1.8 {q8}, [r3]
    add r2, r3, #16
    add r3, r2, #16
    vst1.8 {q8}, [r2]
    vst1.8 {q8}, [r3]
    add r2, r3, #16
    vst1.8 {q8}, [r2]

I investigated this issue. The root cause lies in rtx cost returned by ARM backend. Anyway, I think this is another issue and should be fixed in a separate patch.

Bootstrap and reg-test on cortex-a15, with or without neon support.
Is it OK?

Thanks,
bin

2014-04-29  Bin Cheng  bin.ch...@arm.com

    PR target/55701
    * config/arm/arm.md (setmem): New pattern.
    * config/arm/arm-protos.h (struct tune_params): New field.
    (arm_gen_setmem): New prototype.
    * config/arm/arm.c (arm_slowmul_tune): Initialize new field.
    (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
    (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
    (arm_cortex_a8_tune, arm_cortex_a7_tune): Ditto.
    (arm_cortex_a15_tune, arm_cortex_a53_tune): Ditto.
    (arm_cortex_a57_tune, arm_cortex_a5_tune): Ditto.
    (arm_cortex_a9_tune, arm_cortex_a12_tune): Ditto.
    (arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune): Ditto.
    (arm_const_inline_cost): New function.
    (arm_block_set_max_insns): New function.
    (arm_block_set_straight_profit_p): New function.
    (arm_block_set_vect_profit_p): New function.
    (arm_block_set_unaligned_vect): New function.
    (arm_block_set_aligned_vect): New function.
    (arm_block_set_unaligned_straight): New function.
    (arm_block_set_aligned_straight): New function.
    (arm_block_set_vect, arm_gen_setmem): New functions.

gcc/testsuite/ChangeLog
2014-04-29  Bin Cheng  bin.ch...@arm.com

    PR target/55701
    * gcc.target/arm/memset-inline-1.c: New test.
    * gcc.target/arm/memset-inline-2.c: New test.
    * gcc.target/arm/memset-inline-3.c: New test.
    * gcc.target/arm/memset-inline-4.c: New test.
    * gcc.target/arm/memset-inline-5.c: New test.
    * gcc.target/arm/memset-inline-6.c: New test.
    * gcc.target/arm/memset-inline-7.c: New test.
    * gcc.target/arm/memset-inline-8.c: New test.
    * gcc.target/arm/memset-inline-9.c: New test.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c (revision 209852)
+++ gcc/config/arm/arm.c (working copy)
@@ -1585,10 +1585,11 @@ const struct tune_params arm_slowmul_tune =
  true,/* Prefer constant pool. */
  arm_default_branch_cost,
  false, /* Prefer LDRD/STRD. */
- {true, true},/* Prefer non short circuit. */
- arm_default_vec_cost,/* Vectorizer costs. */
- false,/* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ {true, true},/* Prefer non short circuit. */
+ arm_default_vec_cost,/* Vectorizer costs. */
+ false,/* Prefer Neon for 64-bits bitops. */
+ false, false, /* Prefer 32-bit encodings. */
+ false /* Prefer Neon for stringops. */
  };

  const struct tune_params arm_fastmul_tune =
@@ -1602,10 +1603,11 @@ const struct tune_params arm_fastmul_tune =
  true,/* Prefer constant pool. */
  arm_default_branch_cost,
  false, /* Prefer LDRD/STRD. */
- {true, true},/* Prefer non short circuit. */
- arm_default_vec_cost,/* Vectorizer costs. */
- false,/* Prefer Neon for 64-bits bitops. */
- false, false /* Prefer 32-bit encodings. */
+ {true, true},
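
For readers who want a concrete picture of the kind of call this patch is aimed at, the following is a minimal C sketch of a small memset whose length and value are both compile-time constants. It is not one of the memset-inline-N.c tests listed in the ChangeLog (their contents are not shown in this thread); the function name, the length of 15, and the fill byte are arbitrary choices for illustration. Whether a given compiler actually expands this into inline stores depends on its version, target options, and tuning.

```c
#include <assert.h>
#include <string.h>

/* Arbitrary small length picked for illustration; it is not taken from
   the patch or from its memset-inline-N.c tests.  */
#define LEN 15

/* Fill LEN bytes at DST with the byte 'A'.  Both the value and the
   length are compile-time constants, so a compiler may replace the
   library call with a short run of store instructions.  */
void
fill_small (char *dst)
{
  assert (dst != NULL);
  memset (dst, 'A', LEN);
}
```

Compiling such a function for an ARM target with optimization enabled and inspecting the assembly is one way to see whether the call was expanded inline or left as a call to memset.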