Re: [aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor
On Wednesday 24 January 2018 06:29 PM, Siddhesh Poyarekar wrote: >>> + /* Avoid register indexing for 128-bit stores when the >>> + AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE option is set. */ >>> + if (!optimize_size >>> + && type == ADDR_QUERY_STR >>> + && (aarch64_tune_params.extra_tuning_flags >>> + & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE) >>> + && (mode == TImode || mode == TFmode >>> + || aarch64_vector_data_mode_p (mode))) >>> + allow_reg_index_p = false; >> >> The aarch64_classify_vector_mode code has been reworked recently for SVE >> so I'm not entirely >> up to date with its logic, but I believe that >> "aarch64_classify_vector_mode (mode)" will >> allow 64-bit vector modes, which would not be using the 128-bit Q >> register, so you may be disabling >> register indexing for D-register memory stores. > > I check this and fix the condition if necessary. Looking back at the patch I remember why I used aarch64_vector_data_mode_p; this is to catch the pattern aarch64_simd_mov which optimizes a 64-bit store pair into a single quad word store. It should not avoid register indexing for any other vector modes since their patterns won't pass ADDR_QUERY_STR. In any case, I will be doing the CPU2017 run without -mcpu=falkor, so I'll report results from that. Siddhesh
Re: [aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor
On Wednesday 24 January 2018 05:50 PM, Kyrill Tkachov wrote: > I would tend towards making the costs usage more intelligent and > differentiating > between loads and stores but I agree that is definitely GCC 9 material. > Whether this approach is an acceptable stopgap for GCC 8 is up to the > aarch64 maintainers > and will depend, among other things, on the impact it has on generic > (non-Falkor) codegen. > A good experiment to help this approach would be to compile a large > codebase (for example CPU2017) > with a non-Falkor -mcpu setting and make sure that there are no assembly > changes (or minimal). > This would help justify the aarch64.md constraint changes. Thanks, I'll verify with CPU2017. > file paths don't need the gcc/ because the ChangeLog file is already in > the gcc/ directory > >> gcc/testsuite/ >> * gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case. > > Similarly, you don't need the gcc/testsuite/ prefix. > Also, since you have a bugzilla PR entry please reference it in the > ChangeLog > right above the file list: > PR target/82533 > > That way when the patch is committed the SVN hooks will pick it up > automagically and update bugzilla. Ugh, sorry, I was tardy about this - I'll fix it up. >> @@ -5530,6 +5530,16 @@ aarch64_classify_address (struct >> aarch64_address_info *info, >> || vec_flags == VEC_ADVSIMD >> || vec_flags == VEC_SVE_DATA)); >> >> + /* Avoid register indexing for 128-bit stores when the >> + AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE option is set. */ >> + if (!optimize_size >> + && type == ADDR_QUERY_STR >> + && (aarch64_tune_params.extra_tuning_flags >> + & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE) >> + && (mode == TImode || mode == TFmode >> + || aarch64_vector_data_mode_p (mode))) >> + allow_reg_index_p = false; > > The aarch64_classify_vector_mode code has been reworked recently for SVE > so I'm not entirely > up to date with its logic, but I believe that > "aarch64_classify_vector_mode (mode)" will > allow 64-bit vector modes, which would not be using the 128-bit Q > register, so you may be disabling > register indexing for D-register memory stores. I check this and fix the condition if necessary. Thanks, Siddhesh
Re: [aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor
Hi Siddhesh, On 23/01/18 15:41, Siddhesh Poyarekar wrote: Hi, Here's v2 of the patch to disable register offset addressing mode for stores of 128-bit values on Falkor because they're very costly. Differences from the last version: - Incorporated changes Jim made to his patch earlier that I missed, i.e. adding an extra tuning parameter called SLOW_REGOFFSET_QUADWORD_STORE instead of making it conditional on TUNE_FALKOR. - Added a new query type ADDR_QUERY_STR to indicate the queried address is used for a store. This way I can use it for other scenarios where stores are significantly more expensive than loads, such as pre/post modify addressing modes. - Incorporated the constraint functionality into aarch64_legitimate_address_p and aarch64_classify_address. I evaluated the suggestion of using costs to do this but it's not possible with the current costs as they do not differentiate between loads and stores. If modifying all passes that use these costs to identify loads vs stores is considered OK (ivopts seems to be the biggest user) then I can volunteer to do that work for gcc9 and evetually replace this. I would tend towards making the costs usage more intelligent and differentiating between loads and stores but I agree that is definitely GCC 9 material. Whether this approach is an acceptable stopgap for GCC 8 is up to the aarch64 maintainers and will depend, among other things, on the impact it has on generic (non-Falkor) codegen. A good experiment to help this approach would be to compile a large codebase (for example CPU2017) with a non-Falkor -mcpu setting and make sure that there are no assembly changes (or minimal). This would help justify the aarch64.md constraint changes. I have a couple of comments on the patch inline. Thanks, Kyrill On Falkor, because of an idiosyncracy of how the pipelines are designed, a quad-word store using a reg+reg addressing mode is almost twice as slow as an add followed by a quad-word store with a single reg addressing mode. So we get better performance if we disallow addressing modes using register offsets with quad-word stores. This patch improves fpspeed by 0.3% and intspeed by 0.22% in CPU2017, with omnetpp_s (4.3%) and pop2_s (2.6%) being the biggest winners. 2018-xx-xx Jim WilsonKugan Vivenakandarajah Siddhesh Poyarekar gcc/ * gcc/config/aarch64/aarch64-protos.h (aarch64_addr_query_type): New member ADDR_QUERY_STR. * gcc/config/aarch64/aarch64-tuning-flags.def (SLOW_REGOFFSET_QUADWORD_STORE): New. * gcc/config/aarch64/aarch64.c (qdf24xx_tunings): Add SLOW_REGOFFSET_QUADWORD_STORE to tuning flags. (aarch64_classify_address): Avoid register indexing for quad mode stores when SLOW_REGOFFSET_QUADWORD_STORE is set. * gcc/config/aarch64/constraints.md (Uts): New constraint. * gcc/config/aarch64/aarch64.md (movti_aarch64, movtf_aarch64): Use it. * gcc/config/aarch64/aarch64-simd.md (aarch64_simd_mov): Likewise. file paths don't need the gcc/ because the ChangeLog file is already in the gcc/ directory gcc/testsuite/ * gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case. Similarly, you don't need the gcc/testsuite/ prefix. Also, since you have a bugzilla PR entry please reference it in the ChangeLog right above the file list: PR target/82533 That way when the patch is committed the SVN hooks will pick it up automagically and update bugzilla. --- gcc/config/aarch64/aarch64-protos.h | 4 gcc/config/aarch64/aarch64-simd.md | 4 ++-- gcc/config/aarch64/aarch64-tuning-flags.def | 4 gcc/config/aarch64/aarch64.c| 12 +++- gcc/config/aarch64/aarch64.md | 6 +++--- gcc/config/aarch64/constraints.md | 7 +++ gcc/testsuite/gcc.target/aarch64/pr82533.c | 11 +++ 7 files changed, 42 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index ef1b0bc8e28..5fedc85f283 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -120,6 +120,9 @@ enum aarch64_symbol_type ADDR_QUERY_LDP_STP Query what is valid for a load/store pair. + ADDR_QUERY_STR + Query what is valid for a store. + ADDR_QUERY_ANY Query what is valid for at least one memory constraint, which may allow things that "m" doesn't. For example, the SVE LDR and STR @@ -128,6 +131,7 @@ enum aarch64_symbol_type enum aarch64_addr_query_type { ADDR_QUERY_M, ADDR_QUERY_LDP_STP, + ADDR_QUERY_STR, ADDR_QUERY_ANY }; diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index
[aarch64][PATCH v2] Disable reg offset in quad-word store for Falkor
Hi, Here's v2 of the patch to disable register offset addressing mode for stores of 128-bit values on Falkor because they're very costly. Differences from the last version: - Incorporated changes Jim made to his patch earlier that I missed, i.e. adding an extra tuning parameter called SLOW_REGOFFSET_QUADWORD_STORE instead of making it conditional on TUNE_FALKOR. - Added a new query type ADDR_QUERY_STR to indicate the queried address is used for a store. This way I can use it for other scenarios where stores are significantly more expensive than loads, such as pre/post modify addressing modes. - Incorporated the constraint functionality into aarch64_legitimate_address_p and aarch64_classify_address. I evaluated the suggestion of using costs to do this but it's not possible with the current costs as they do not differentiate between loads and stores. If modifying all passes that use these costs to identify loads vs stores is considered OK (ivopts seems to be the biggest user) then I can volunteer to do that work for gcc9 and evetually replace this. On Falkor, because of an idiosyncracy of how the pipelines are designed, a quad-word store using a reg+reg addressing mode is almost twice as slow as an add followed by a quad-word store with a single reg addressing mode. So we get better performance if we disallow addressing modes using register offsets with quad-word stores. This patch improves fpspeed by 0.3% and intspeed by 0.22% in CPU2017, with omnetpp_s (4.3%) and pop2_s (2.6%) being the biggest winners. 2018-xx-xx Jim WilsonKugan Vivenakandarajah Siddhesh Poyarekar gcc/ * gcc/config/aarch64/aarch64-protos.h (aarch64_addr_query_type): New member ADDR_QUERY_STR. * gcc/config/aarch64/aarch64-tuning-flags.def (SLOW_REGOFFSET_QUADWORD_STORE): New. * gcc/config/aarch64/aarch64.c (qdf24xx_tunings): Add SLOW_REGOFFSET_QUADWORD_STORE to tuning flags. (aarch64_classify_address): Avoid register indexing for quad mode stores when SLOW_REGOFFSET_QUADWORD_STORE is set. * gcc/config/aarch64/constraints.md (Uts): New constraint. * gcc/config/aarch64/aarch64.md (movti_aarch64, movtf_aarch64): Use it. * gcc/config/aarch64/aarch64-simd.md (aarch64_simd_mov): Likewise. gcc/testsuite/ * gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case. --- gcc/config/aarch64/aarch64-protos.h | 4 gcc/config/aarch64/aarch64-simd.md | 4 ++-- gcc/config/aarch64/aarch64-tuning-flags.def | 4 gcc/config/aarch64/aarch64.c| 12 +++- gcc/config/aarch64/aarch64.md | 6 +++--- gcc/config/aarch64/constraints.md | 7 +++ gcc/testsuite/gcc.target/aarch64/pr82533.c | 11 +++ 7 files changed, 42 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index ef1b0bc8e28..5fedc85f283 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -120,6 +120,9 @@ enum aarch64_symbol_type ADDR_QUERY_LDP_STP Query what is valid for a load/store pair. + ADDR_QUERY_STR + Query what is valid for a store. + ADDR_QUERY_ANY Query what is valid for at least one memory constraint, which may allow things that "m" doesn't. For example, the SVE LDR and STR @@ -128,6 +131,7 @@ enum aarch64_symbol_type enum aarch64_addr_query_type { ADDR_QUERY_M, ADDR_QUERY_LDP_STP, + ADDR_QUERY_STR, ADDR_QUERY_ANY }; diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 3d1f6a01cb7..48d92702723 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -131,9 +131,9 @@ (define_insn "*aarch64_simd_mov" [(set (match_operand:VQ 0 "nonimmediate_operand" - "=w, Umq, m, w, ?r, ?w, ?r, w") + "=w, Umq, Uts, w, ?r, ?w, ?r, w") (match_operand:VQ 1 "general_operand" - "m, Dz, w, w, w, r, r, Dn"))] + "m, Dz,w, w, w, r, r, Dn"))] "TARGET_SIMD && (register_operand (operands[0], mode) || aarch64_simd_reg_or_zero (operands[1], mode))" diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index ea9ead234cb..04baf5b6de6 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW) are not considered cheap. */ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND) +/* Don't use a register offset in a memory address for a quad-word store. */