Re: [AArch64] Fix vector multiplication costs
On Mon, 8 Feb 2021 at 18:10, Kyrylo Tkachov via Gcc-patches wrote:
>
> > -----Original Message-----
> > From: Andre Vieira (lists)
> > Sent: 03 February 2021 17:59
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov
> > Subject: [AArch64] Fix vector multiplication costs
> >
> > This patch introduces a vect.mul RTX cost and decouples the vector
> > multiplication costing from the scalar one.
> >
> > After Wilco's "AArch64: Add cost table for Cortex-A76" patch we saw a
> > regression in vector codegen, reproducible with the small test added
> > in this patch.
> > Upon further investigation we noticed 'aarch64_rtx_mult_cost' was
> > using scalar costs to calculate the cost of vector multiplication,
> > which was now lower, preventing 'choose_mult_variant' from making the
> > right choice to expand such vector multiplications with constants as
> > shifts and subs. I also added a special case for SSRA to use the
> > default vector cost rather than mult; SSRA seems to be costed via
> > 'aarch64_rtx_mult_cost', which to be fair is quite curious. I believe
> > we should have a better look at 'aarch64_rtx_costs' altogether and
> > completely decouple vector and scalar costs, though that requires
> > more rewriting than I believe should be done in Stage 4.
> >
> > I gave all targets a vect.mult cost of 4x the vect.alu cost, with the
> > exception of targets with cost 0 for vect.alu; those I gave the cost 4.
> >
> > Bootstrapped on aarch64.
> >
> > Is this OK for trunk?
>
> Ok.
> Thanks,
> Kyrill
>
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64-cost-tables.h: Add entries for vect.mul.
> > 	* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Use vect.mul for
> > 	vector multiplies and vect.alu for SSRA.
> > 	* config/arm/aarch-common-protos.h (struct vector_cost_table): Define
> > 	vect.mul cost field.
> > 	* config/arm/aarch-cost-tables.h: Add entries for vect.mul.
> > 	* config/arm/arm.c: Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/aarch64/asimd-mul-to-shl-sub.c: New test.

Hi Andre,

It seems you forgot to update a test, because I've noticed these failures since you committed this patch:

FAIL: gcc.target/aarch64/sve/mul_2.c -march=armv8.2-a+sve scan-assembler-times \\tmul\\tz[0-9]+\\.b, z[0-9]+\\.b, #-120\\n 1
FAIL: gcc.target/aarch64/sve/mul_2.c -march=armv8.2-a+sve scan-assembler-times \\tmul\\tz[0-9]+\\.h, z[0-9]+\\.h, #-128\\n 2
FAIL: gcc.target/aarch64/sve/mul_2.c -march=armv8.2-a+sve scan-assembler-times \\tmul\\tz[0-9]+\\.s, z[0-9]+\\.s, #-128\\n 1

Am I missing something?

Thanks,
Christophe
RE: [AArch64] Fix vector multiplication costs
> -----Original Message-----
> From: Andre Vieira (lists)
> Sent: 03 February 2021 17:59
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov
> Subject: [AArch64] Fix vector multiplication costs
>
> This patch introduces a vect.mul RTX cost and decouples the vector
> multiplication costing from the scalar one.
>
> After Wilco's "AArch64: Add cost table for Cortex-A76" patch we saw a
> regression in vector codegen, reproducible with the small test added in
> this patch.
> Upon further investigation we noticed 'aarch64_rtx_mult_cost' was using
> scalar costs to calculate the cost of vector multiplication, which was
> now lower, preventing 'choose_mult_variant' from making the right
> choice to expand such vector multiplications with constants as shifts
> and subs. I also added a special case for SSRA to use the default
> vector cost rather than mult; SSRA seems to be costed via
> 'aarch64_rtx_mult_cost', which to be fair is quite curious. I believe
> we should have a better look at 'aarch64_rtx_costs' altogether and
> completely decouple vector and scalar costs, though that requires more
> rewriting than I believe should be done in Stage 4.
>
> I gave all targets a vect.mult cost of 4x the vect.alu cost, with the
> exception of targets with cost 0 for vect.alu; those I gave the cost 4.
>
> Bootstrapped on aarch64.
>
> Is this OK for trunk?

Ok.
Thanks,
Kyrill

> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-cost-tables.h: Add entries for vect.mul.
> 	* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Use vect.mul for
> 	vector multiplies and vect.alu for SSRA.
> 	* config/arm/aarch-common-protos.h (struct vector_cost_table): Define
> 	vect.mul cost field.
> 	* config/arm/aarch-cost-tables.h: Add entries for vect.mul.
> 	* config/arm/arm.c: Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/asimd-mul-to-shl-sub.c: New test.
[AArch64] Fix vector multiplication costs
This patch introduces a vect.mul RTX cost and decouples the vector multiplication costing from the scalar one.

After Wilco's "AArch64: Add cost table for Cortex-A76" patch we saw a regression in vector codegen, reproducible with the small test added in this patch.
Upon further investigation we noticed 'aarch64_rtx_mult_cost' was using scalar costs to calculate the cost of vector multiplication, which was now lower, preventing 'choose_mult_variant' from making the right choice to expand such vector multiplications with constants as shifts and subs. I also added a special case for SSRA to use the default vector cost rather than mult; SSRA seems to be costed via 'aarch64_rtx_mult_cost', which to be fair is quite curious. I believe we should have a better look at 'aarch64_rtx_costs' altogether and completely decouple vector and scalar costs, though that requires more rewriting than I believe should be done in Stage 4.

I gave all targets a vect.mult cost of 4x the vect.alu cost, with the exception of targets with cost 0 for vect.alu; those I gave the cost 4.

Bootstrapped on aarch64.

Is this OK for trunk?

gcc/ChangeLog:

	* config/aarch64/aarch64-cost-tables.h: Add entries for vect.mul.
	* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Use vect.mul for
	vector multiplies and vect.alu for SSRA.
	* config/arm/aarch-common-protos.h (struct vector_cost_table): Define
	vect.mul cost field.
	* config/arm/aarch-cost-tables.h: Add entries for vect.mul.
	* config/arm/arm.c: Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/asimd-mul-to-shl-sub.c: New test.

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
index c309f88cbd56f0d2347996d860c982a3a6744492..dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -123,7 +123,8 @@ const struct cpu_cost_table qdf24xx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (4)  /* mult.  */
   }
 };
@@ -227,7 +228,8 @@ const struct cpu_cost_table thunderx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* Alu.  */
+    COSTS_N_INSNS (1),  /* Alu.  */
+    COSTS_N_INSNS (4)  /* mult.  */
   }
 };
@@ -330,7 +332,8 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* Alu.  */
+    COSTS_N_INSNS (1),  /* Alu.  */
+    COSTS_N_INSNS (4)  /* Mult.  */
   }
 };
@@ -433,7 +436,8 @@ const struct cpu_cost_table thunderx3t110_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* Alu.  */
+    COSTS_N_INSNS (1),  /* Alu.  */
+    COSTS_N_INSNS (4)  /* Mult.  */
   }
 };
@@ -537,7 +541,8 @@ const struct cpu_cost_table tsv110_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (4)  /* mult.  */
   }
 };
@@ -640,7 +645,8 @@ const struct cpu_cost_table a64fx_extra_costs =
   },
   /* Vector */
   {
-    COSTS_N_INSNS (1)  /* alu.  */
+    COSTS_N_INSNS (1),  /* alu.  */
+    COSTS_N_INSNS (4)  /* mult.  */
   }
 };

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b6192e55521004ae70cd13acbdb4dab142216845..146ed8c1b693d7204a754bc4e6d17025e0af544b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11568,7 +11568,6 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
   if (VECTOR_MODE_P (mode))
     {
       unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-      mode = GET_MODE_INNER (mode);
       if (vec_flags & VEC_ADVSIMD)
	{
	  /* The by-element versions of the instruction have the same costs as
@@ -11582,6 +11581,17 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
	  else if (GET_CODE (op1) == VEC_DUPLICATE)
	    op1 = XEXP (op1, 0);
	}
+      cost += rtx_cost (op0, mode, MULT, 0, speed);
+      cost += rtx_cost (op1, mode, MULT, 1, speed);
+      if (speed)
+	{
+	  if (GET_CODE (x) == MULT)
+	    cost += extra_cost->vect.mult;
+	  /* This is to catch the SSRA costing currently flowing here.  */
+	  else
+	    cost += extra_cost->vect.alu;
+	}
+      return cost;
     }

   /* Integer multiply/fma.  */

diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 251de3d61a833a2bb4b77e9211cac7fbc17c0b75..7a9cf3d324c103de74af741abe9ef30b76fea5ce 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -132,6 +132,7 @@ struct fp_cost_table
 struct vector_cost_table
 {
   const int alu;
+  const int mult;
 };

 struct cpu_cost_table
diff --git a/gcc/config/arm/aarch-cost-tables.h