Re: [AArch64] Emit square root using the Newton series
On 04/27/16 09:23, James Greenhalgh wrote: On Tue, Apr 12, 2016 at 01:14:51PM -0500, Evandro Menezes wrote: On 04/05/16 17:30, Evandro Menezes wrote: On 04/05/16 13:37, Wilco Dijkstra wrote: I can't get any of these to work... Not only do I get a large number of collisions and duplicated code between these patches, when I try to resolve them, all I get is crashes whenever I try to use sqrt (even rsqrt stopped working). Do you have a patchset that applies cleanly so I can try all approximation routines? The original patches should be independent of each other, so indeed they duplicate code. This patch suite should be suitable for testing. Take look at other patch sets posted to this list for examples of how to make review easier. Please send a series of emails tagged: [Patch 0/3 AArch64] Add infrastructure for more approximate FP operations [PATCH 1/3 AArch64] Add more choices for the reciprocal square root approximation [PATCH 2/3 AArch64] Emit square root using the Newton series [PATCH 3/3 AArch64] Emit division using the Newton series One patch per email, with the dependencies explicit like this, is infinitely easier to follow than the current structure of your patch set. I'm not trying to be pedantic for the sake of it, I'm genuinely unsure where the latest patch versions currently are and how I should apply them to a clean tree for review. I can certainly create such a series, but the patch above should be suitable for testing. Thank you, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On Tue, Apr 12, 2016 at 01:14:51PM -0500, Evandro Menezes wrote: > On 04/05/16 17:30, Evandro Menezes wrote: > >On 04/05/16 13:37, Wilco Dijkstra wrote: > >>I can't get any of these to work... Not only do I get a large > >>number of collisions and duplicated > >>code between these patches, when I try to resolve them, all I > >>get is crashes whenever I try > >>to use sqrt (even rsqrt stopped working). Do you have a patchset > >>that applies cleanly so I can > >>try all approximation routines? > > > >The original patches should be independent of each other, so > >indeed they duplicate code. > > > >This patch suite should be suitable for testing. Take look at other patch sets posted to this list for examples of how to make review easier. Please send a series of emails tagged: [Patch 0/3 AArch64] Add infrastructure for more approximate FP operations [PATCH 1/3 AArch64] Add more choices for the reciprocal square root approximation [PATCH 2/3 AArch64] Emit square root using the Newton series [PATCH 3/3 AArch64] Emit division using the Newton series One patch per email, with the dependencies explicit like this, is infinitely easier to follow than the current structure of your patch set. I'm not trying to be pedantic for the sake of it, I'm genuinely unsure where the latest patch versions currently are and how I should apply them to a clean tree for review. Thanks, James
RE: [AArch64] Emit square root using the Newton series
> On 04/05/16 17:30, Evandro Menezes wrote: > > On 04/05/16 13:37, Wilco Dijkstra wrote: > >> I can't get any of these to work... Not only do I get a large number > >> of collisions and duplicated code between these patches, when I try > >> to resolve them, all I get is crashes whenever I try to use sqrt > >> (even rsqrt stopped working). Do you have a patchset that applies > >> cleanly so I can try all approximation routines? > > > > The original patches should be independent of each other, so indeed > > they duplicate code. > > > > This patch suite should be suitable for testing. > > Ping^1 Ping^2 -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 04/05/16 17:30, Evandro Menezes wrote: On 04/05/16 13:37, Wilco Dijkstra wrote: I can't get any of these to work... Not only do I get a large number of collisions and duplicated code between these patches, when I try to resolve them, all I get is crashes whenever I try to use sqrt (even rsqrt stopped working). Do you have a patchset that applies cleanly so I can try all approximation routines? The original patches should be independent of each other, so indeed they duplicate code. This patch suite should be suitable for testing. Ping^1 -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 04/05/16 13:37, Wilco Dijkstra wrote: I can't get any of these to work... Not only do I get a large number of collisions and duplicated code between these patches, when I try to resolve them, all I get is crashes whenever I try to use sqrt (even rsqrt stopped working). Do you have a patchset that applies cleanly so I can try all approximation routines? Hi, Wilco. The original patches should be independent of each other, so indeed they duplicate code. This patch suite should be suitable for testing. HTH -- Evandro Menezes >From cbc2b62f7df5c3e2fef2a24157b1bdd1a6de191b Mon Sep 17 00:00:00 2001 From: Evandro MenezesDate: Mon, 4 Apr 2016 14:02:24 -0500 Subject: [PATCH 3/3] Emit division using the Newton series 2016-04-04 Evandro Menezes Wilco Dijkstra gcc/ * config/aarch64/aarch64-tuning-flags.def * config/aarch64/aarch64-protos.h (tune_params): Add new member "approx_div_modes". (aarch64_emit_approx_div): Declare new function. * config/aarch64/aarch64.c (generic_tunings): New member "approx_div_modes". (cortexa35_tunings): Likewise. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (cortexa72_tunings): Likewise. (exynosm1_tunings): Likewise. (thunderx_tunings): Likewise. (xgene1_tunings): Likewise. (aarch64_emit_approx_div): Define new function. * config/aarch64/aarch64.md ("div3"): New expansion. * config/aarch64/aarch64-simd.md ("div3"): Likewise. * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option. * doc/invoke.texi (-mlow-precision-div): Describe new option. --- gcc/config/aarch64/aarch64-protos.h | 2 + gcc/config/aarch64/aarch64-simd.md | 14 +- gcc/config/aarch64/aarch64.c| 85 + gcc/config/aarch64/aarch64.md | 19 +++-- gcc/config/aarch64/aarch64.opt | 5 +++ gcc/doc/invoke.texi | 10 + 6 files changed, 130 insertions(+), 5 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 85ad796..649faf7 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -244,6 +244,7 @@ struct tune_params } autoprefetcher_model; unsigned int extra_tuning_flags; + unsigned int approx_div_modes; unsigned int approx_sqrt_modes; unsigned int approx_rsqrt_modes; }; @@ -390,6 +391,7 @@ void aarch64_relayout_simd_types (void); void aarch64_reset_previous_fndecl (void); void aarch64_save_restore_target_globals (tree); bool aarch64_emit_approx_sqrt (rtx, rtx, bool); +bool aarch64_emit_approx_div (rtx, rtx, rtx); /* Initialize builtins for SIMD intrinsics. */ void init_aarch64_simd_builtins (void); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 47ccb18..7e99e16 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1509,7 +1509,19 @@ [(set_attr "type" "neon_fp_mul_")] ) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:VDQF 0 "register_operand") + (div:VDQF (match_operand:VDQF 1 "general_operand") + (match_operand:VDQF 2 "register_operand")))] + "TARGET_SIMD" +{ + if (aarch64_emit_approx_div (operands[0], operands[1], operands[2])) +DONE; + + operands[1] = force_reg (mode, operands[1]); +}) + +(define_insn "*div3" [(set (match_operand:VDQF 0 "register_operand" "=w") (div:VDQF (match_operand:VDQF 1 "register_operand" "w") (match_operand:VDQF 2 "register_operand" "w")))] diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4af2175..74310e8 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -417,6 +417,7 @@ static const struct tune_params generic_tunings = 0, /* cache_line_size. */ tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + (AARCH64_APPROX_NONE), /* approx_div_modes. */ (AARCH64_APPROX_NONE), /* approx_sqrt_modes. */ (AARCH64_APPROX_NONE) /* approx_rsqrt_modes. */ }; @@ -444,6 +445,7 @@ static const struct tune_params cortexa35_tunings = 0, /* cache_line_size. */ tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + (AARCH64_APPROX_NONE), /* approx_div_modes. */ (AARCH64_APPROX_NONE), /* approx_sqrt_modes. */ (AARCH64_APPROX_NONE) /* approx_rsqrt_modes. */ }; @@ -471,6 +473,7 @@ static const struct tune_params cortexa53_tunings = 0, /* cache_line_size. */ tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ (AARCH64_EXTRA_TUNE_NONE), /* tune_flags. */ + (AARCH64_APPROX_NONE), /* approx_div_modes. */ (AARCH64_APPROX_NONE), /* approx_sqrt_modes. */ (AARCH64_APPROX_NONE) /* approx_rsqrt_modes. */ }; @@ -498,6 +501,7 @@ static const struct tune_params cortexa57_tunings = 0, /* cache_line_size. */
Re: [AArch64] Emit square root using the Newton series
On 04/01/16 17:45, Evandro Menezes wrote: On 03/24/16 14:11, Evandro Menezes wrote: On 03/17/16 17:46, Evandro Menezes wrote: This patch refactors the function to emit the reciprocal square root approximation to also emit the square root approximation. This version of the patch cleans up the changes to the MD files and fixes some bugs introduced in it since the first proposal. This version of the patch uses the finer grained selection for the approximate sqrt() by the target firstly proposed at https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html Additionally, I changed the handling of the special case when the argument is 0.0 for scalars to be the same as for vectors. The reason is that by relying on the CC, a scarce resource, it hindered parallelism. By using up an additional register to hold the mask also for scalars, the code is more... scalable. Hopefully this patch gets close to what all have in mind. [AArch64] Emit square root using the Newton series 2016-04-04 Evandro MenezesWilco Dijkstra gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with new function "aarch64_emit_approx_sqrt". (AARCH64_APPROX_MODE): New macro. (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}: Likewise. (tune_params): New member "approx_sqrt_modes". * config/aarch64/aarch64.c (generic_tunings): New member "approx_rsqrt_modes". (cortexa35_tunings): Likewise. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (cortexa72_tunings): Likewise. (exynosm1_tunings): Likewise. (thunderx_tunings): Likewise. (xgene1_tunings): Likewise. (aarch64_emit_approx_rsqrt): Replace with new function "aarch64_emit_approx_sqrt". (aarch64_override_options_after_change_1): Handle new option. * config/aarch64/aarch64-simd.md (rsqrt2): Use new function instead. (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-sqrt): Add new option description. * doc/invoke.texi (mlow-precision-sqrt): Likewise. This version of the patch refactors the algorithm to shorten the dependency chain at the last iteration of the series. Thank you for your feedback. -- Evandro Menezes >From 8e463d55c89233a623aad2412fb3055021fdd066 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 4 Apr 2016 11:23:29 -0500 Subject: [PATCH] [AArch64] Emit square root using the Newton series 2016-04-04 Evandro Menezes Wilco Dijkstra gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with new function "aarch64_emit_approx_sqrt". (AARCH64_APPROX_MODE): New macro. (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}: Likewise. (tune_params): New member "approx_sqrt_modes". * config/aarch64/aarch64.c (generic_tunings): New member "approx_rsqrt_modes". (cortexa35_tunings): Likewise. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (cortexa72_tunings): Likewise. (exynosm1_tunings): Likewise. (thunderx_tunings): Likewise. (xgene1_tunings): Likewise. (aarch64_emit_approx_rsqrt): Replace with new function "aarch64_emit_approx_sqrt". (aarch64_override_options_after_change_1): Handle new option. * config/aarch64/aarch64-simd.md (rsqrt2): Use new function instead. (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-sqrt): Add new option description. * doc/invoke.texi (mlow-precision-sqrt): Likewise. --- gcc/config/aarch64/aarch64-protos.h | 29 - gcc/config/aarch64/aarch64-simd.md | 13 +++- gcc/config/aarch64/aarch64.c| 115 +--- gcc/config/aarch64/aarch64.md | 11 +++- gcc/config/aarch64/aarch64.opt | 9 ++- gcc/doc/invoke.texi | 10 6 files changed, 147 insertions(+), 40 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 58c9d0d..365572d 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -178,6 +178,32 @@ struct cpu_branch_cost const int unpredictable; /* Unpredictable branch or optimizing for speed. */ }; +/* Control approximate alternatives to certain FP operators. */ +#define AARCH64_APPROX_MODE(MODE) \ + ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \ + ? (1 << ((MODE) - MIN_MODE_FLOAT)) \ + : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \ + ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \ +
Re: [AArch64] Emit square root using the Newton series
On 03/24/16 14:11, Evandro Menezes wrote: On 03/17/16 17:46, Evandro Menezes wrote: This patch refactors the function to emit the reciprocal square root approximation to also emit the square root approximation. This version of the patch cleans up the changes to the MD files and fixes some bugs introduced in it since the first proposal. [AArch64] Emit square root using the Newton series 2016-03-30 Evandro MenezesWilco Dijkstra gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with new function "aarch64_emit_approx_sqrt". (AARCH64_APPROX_MODE): New macro. (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}: Likewise. (tune_params): New member "approx_sqrt_modes". * config/aarch64/aarch64.c (generic_tunings): New member "approx_rsqrt_modes". (cortexa35_tunings): Likewise. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (cortexa72_tunings): Likewise. (exynosm1_tunings): Likewise. (thunderx_tunings): Likewise. (xgene1_tunings): Likewise. (aarch64_emit_approx_rsqrt): Replace with new function "aarch64_emit_approx_sqrt". (aarch64_override_options_after_change_1): Handle new option. * config/aarch64/aarch64-simd.md (rsqrt2): Use new function instead. (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-sqrt): Add new option description. * doc/invoke.texi (mlow-precision-sqrt): Likewise. This version of the patch uses the finer grained selection for the approximate sqrt() by the target firstly proposed at https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html Additionally, I changed the handling of the special case when the argument is 0.0 for scalars to be the same as for vectors. The reason is that by relying on the CC, a scarce resource, it hindered parallelism. By using up an additional register to hold the mask also for scalars, the code is more... scalable. Hopefully this patch gets close to what all have in mind. Thank you, -- Evandro Menezes >From 6a508df89b9dde5506ec7c2fc40013850b1cd07c Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Thu, 17 Mar 2016 17:39:55 -0500 Subject: [PATCH] [AArch64] Emit square root using the Newton series 2016-03-30 Evandro Menezes Wilco Dijkstra gcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with new function "aarch64_emit_approx_sqrt". (AARCH64_APPROX_MODE): New macro. (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}: Likewise. (tune_params): New member "approx_sqrt_modes". * config/aarch64/aarch64.c (generic_tunings): New member "approx_rsqrt_modes". (cortexa35_tunings): Likewise. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (cortexa72_tunings): Likewise. (exynosm1_tunings): Likewise. (thunderx_tunings): Likewise. (xgene1_tunings): Likewise. (aarch64_emit_approx_rsqrt): Replace with new function "aarch64_emit_approx_sqrt". (aarch64_override_options_after_change_1): Handle new option. * config/aarch64/aarch64-simd.md (rsqrt2): Use new function instead. (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-sqrt): Add new option description. * doc/invoke.texi (mlow-precision-sqrt): Likewise. --- gcc/config/aarch64/aarch64-protos.h | 28 - gcc/config/aarch64/aarch64-simd.md | 13 +++- gcc/config/aarch64/aarch64.c| 114 +--- gcc/config/aarch64/aarch64.md | 11 +++- gcc/config/aarch64/aarch64.opt | 9 ++- gcc/config/aarch64/predicates.md| 2 +- gcc/doc/invoke.texi | 10 7 files changed, 146 insertions(+), 41 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index dced209..055ba7a 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -178,6 +178,31 @@ struct cpu_branch_cost const int unpredictable; /* Unpredictable branch or optimizing for speed. */ }; +/* Control approximate alternatives to certain FP operators. */ +#define AARCH64_APPROX_MODE(MODE) \ + ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \ + ? (1 << ((MODE) - MIN_MODE_FLOAT)) \ + : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \ + ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \ + : (0)) +#define AARCH64_APPROX_NONE (0) +#define AARCH64_APPROX_SP (AARCH64_APPROX_MODE (SFmode) \ + |
Re: [AArch64] Emit square root using the Newton series
On 03/17/16 17:46, Evandro Menezes wrote: This patch refactors the function to emit the reciprocal square root approximation to also emit the square root approximation. 2016-03-23 Evandro MenezesWilco Dijkstra gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros. * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with "aarch64_emit_approx_sqrt". (AARCH64_EXTRA_TUNE_APPROX_SQRT): New macro. * config/aarch64/aarch64.c (exynosm1_tunings): Use the new macro. (aarch64_emit_approx_sqrt): Define new function. (aarch64_override_options_after_change_1): Handle new option. * config/aarch64/aarch64.md (rsqrt2): Use new function instead. (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-simd.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-sqrt): Add new option description. * doc/invoke.texi (mlow-precision-sqrt): Likewise. This version of the patch cleans up the changes to the MD files and fixes some bugs introduced in it since the first proposal. Thanks for your feedback, -- Evandro Menezes >From 712e330bf651393bb788e85ebe7b3d9a37f54ae7 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Thu, 17 Mar 2016 17:39:55 -0500 Subject: [PATCH] [AArch64] Emit square root using the Newton series 2016-03-23 Evandro Menezes Wilco Dijkstra gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros. * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with "aarch64_emit_approx_sqrt". (AARCH64_EXTRA_TUNE_APPROX_SQRT): New macro. * config/aarch64/aarch64.c (exynosm1_tunings): Use the new macro. (aarch64_emit_approx_sqrt): Define new function. (aarch64_override_options_after_change_1): Handle new option. * config/aarch64/aarch64.md (rsqrt2): Use new function instead. (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-simd.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-sqrt): Add new option description. * doc/invoke.texi (mlow-precision-sqrt): Likewise. --- gcc/config/aarch64/aarch64-protos.h | 5 +- gcc/config/aarch64/aarch64-simd.md | 13 ++- gcc/config/aarch64/aarch64-tuning-flags.def | 3 +- gcc/config/aarch64/aarch64.c| 129 ++-- gcc/config/aarch64/aarch64.md | 11 ++- gcc/config/aarch64/aarch64.opt | 9 +- gcc/doc/invoke.texi | 10 +++ 7 files changed, 147 insertions(+), 33 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index dced209..24c2125 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags }; #undef AARCH64_EXTRA_TUNING_OPTION +#define AARCH64_EXTRA_TUNE_APPROX_SQRT \ + (AARCH64_EXTRA_TUNE_APPROX_SQRT_DF | AARCH64_EXTRA_TUNE_APPROX_SQRT_SF) + extern struct tune_params aarch64_tune_params; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); @@ -361,7 +364,7 @@ void aarch64_register_pragmas (void); void aarch64_relayout_simd_types (void); void aarch64_reset_previous_fndecl (void); void aarch64_save_restore_target_globals (tree); -void aarch64_emit_approx_rsqrt (rtx, rtx); +bool aarch64_emit_approx_sqrt (rtx, rtx, bool); /* Initialize builtins for SIMD intrinsics. */ void init_aarch64_simd_builtins (void); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index bd73bce..47ccb18 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -405,7 +405,7 @@ UNSPEC_RSQRT))] "TARGET_SIMD" { - aarch64_emit_approx_rsqrt (operands[0], operands[1]); + aarch64_emit_approx_sqrt (operands[0], operands[1], true); DONE; }) @@ -4307,7 +4307,16 @@ ;; sqrt -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:VDQF 0 "register_operand") + (sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + "TARGET_SIMD" +{ + if (aarch64_emit_approx_sqrt (operands[0], operands[1], false)) +DONE; +}) + +(define_insn "*sqrt2" [(set (match_operand:VDQF 0 "register_operand" "=w") (sqrt:VDQF (match_operand:VDQF 1 "register_operand" "w")))] "TARGET_SIMD" diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 7e45a0c..725a79c 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,5 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs",
Re: [AArch64] Emit square root using the Newton series
On Wed, Mar 16, 2016 at 02:45:37PM -0500, Evandro Menezes wrote: > On 03/08/16 16:08, Evandro Menezes wrote: > >On 02/16/16 14:56, Evandro Menezes wrote: > >>On 12/08/15 15:35, Evandro Menezes wrote: > >>>Emit square root using the Newton series > >>> > >>> 2015-12-03 Evandro Menezes> >>> > >>> gcc/ > >>>* config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): > >>> Declare new > >>>function. > >>>* config/aarch64/aarch64-simd.md (sqrt2): New > >>> expansion and > >>>insn definitions. > >>>* config/aarch64/aarch64-tuning-flags.def > >>>(AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. > >>>* config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define > >>> new function. > >>>* config/aarch64/aarch64.md (sqrt2): New expansion > >>> and insn > >>>definitions. > >>>* config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): > >>> Expand option > >>>description. > >>>* doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. > >>> > >>>This patch extends the patch that added support for > >>>implementing x^-1/2 using the Newton series by adding support > >>>for x^1/2 as well. > >>> > >>>Is it OK at this point of stage 3? > >>> > >>>Thank you, > >>> > >> > >>James, > >> > >>As I was saying, this patch results in some validation errors in > >>CPU2000 benchmarks using DF. Although proving the algorithm to > >>be pretty solid with a vast set of random values, I'm confused > >>why some benchmarks fail to validate with this implementation of > >>the Newton series for square root too, when they pass with the > >>Newton series for reciprocal square root. > >> > >>Since I had no problems with the same algorithm on x86-64, I > >>wonder if the initial estimate on AArch64, which offers just 8 > >>bits, whereas x86-64 offers 11 bits, has to do with it. Then > >>again, the algorithm iterated 1 less time on x86-64 than on > >>AArch64. > >> > >>Since it seems that the initial estimate is sufficient for > >>CPU2000 to validate when using SF, I'm leaning towards > >>restricting the Newton series for square root only for SF. > >> > >>Your thoughts on the matter are appreciated, > > > >Add choices for the reciprocal square root approximation > > > >Allow a target to prefer such operation depending on the FP > > precision. > > > >gcc/ > >* config/aarch64/aarch64-protos.h > >(AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. > >* config/aarch64/aarch64-tuning-flags.def > >(AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. > >(AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. > >* config/aarch64/aarch64.c > >(use_rsqrt_p): New argument for the mode. > >(aarch64_builtin_reciprocal): Devise mode from builtin. > >(aarch64_optab_supported_p): New argument for the mode. > > > > > >Now that the patch is attached, feedback is appreciated. > > Ping. Hi Evandro, I thought this was on hold while you looked in to the underlying issue for the failures in the other thread? With that said, I'm struggling to keep up with where we are on this, so maybe it is time for a clean break - a new thread for patch set v2, proposed as an explicit patch series (just to keep the dependencies clear to me). I'm not convinced of the value of this split, nor why we would stop here if it was useful (vector modes vs. scalar modes would also seem an important distinction). If you no longer need the workaround this enables then I'm not sure I see a good reason for it to go in, maybe I'm missing a target for which this would be important? Thanks, James
Re: [AArch64] Emit square root using the Newton series
On 03/17/16 09:55, James Greenhalgh wrote: On Wed, Mar 16, 2016 at 02:45:37PM -0500, Evandro Menezes wrote: On 03/08/16 16:08, Evandro Menezes wrote: On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? Thank you, James, As I was saying, this patch results in some validation errors in CPU2000 benchmarks using DF. Although proving the algorithm to be pretty solid with a vast set of random values, I'm confused why some benchmarks fail to validate with this implementation of the Newton series for square root too, when they pass with the Newton series for reciprocal square root. Since I had no problems with the same algorithm on x86-64, I wonder if the initial estimate on AArch64, which offers just 8 bits, whereas x86-64 offers 11 bits, has to do with it. Then again, the algorithm iterated 1 less time on x86-64 than on AArch64. Since it seems that the initial estimate is sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for square root only for SF. Your thoughts on the matter are appreciated, Add choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. Now that the patch is attached, feedback is appreciated. Ping. Hi Evandro, I thought this was on hold while you looked in to the underlying issue for the failures in the other thread? With that said, I'm struggling to keep up with where we are on this, so maybe it is time for a clean break - a new thread for patch set v2, proposed as an explicit patch series (just to keep the dependencies clear to me). I'm not convinced of the value of this split, nor why we would stop here if it was useful (vector modes vs. scalar modes would also seem an important distinction). If you no longer need the workaround this enables then I'm not sure I see a good reason for it to go in, maybe I'm missing a target for which this would be important? Hi, James. OK, I'll start a thread over. Thank you, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 03/08/16 16:08, Evandro Menezes wrote: On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? Thank you, James, As I was saying, this patch results in some validation errors in CPU2000 benchmarks using DF. Although proving the algorithm to be pretty solid with a vast set of random values, I'm confused why some benchmarks fail to validate with this implementation of the Newton series for square root too, when they pass with the Newton series for reciprocal square root. Since I had no problems with the same algorithm on x86-64, I wonder if the initial estimate on AArch64, which offers just 8 bits, whereas x86-64 offers 11 bits, has to do with it. Then again, the algorithm iterated 1 less time on x86-64 than on AArch64. Since it seems that the initial estimate is sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for square root only for SF. Your thoughts on the matter are appreciated, Add choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. Now that the patch is attached, feedback is appreciated. Ping. -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 03/10/16 19:06, Wilco Dijkstra wrote: Evandro Menezeswrote: That's what I had in mind too, but around the approximation for x^-1/2 and using masks for vector cases thusly: fcmne v3.4s, v0.4s, #0.0 frsqrte v1.4s, v0.4s fmulv2.4s, v1.4s, v1.4s frsqrts v2.4s, v0.4s, v2.4s fmulv1.4s, v1.4s, v2.4s fmulv2.4s, v1.4s, v1.4s frsqrts v2.4s, v0.4s, v2.4s fmulv1.4s, v1.4s, v2.4s and v1.4s, v3.4s fmulv0.4s, v1.4s, v0.4s That's possible but the overall latency is higher - according to exynos-1.md the above takes 44 cycles while my version would be 37. Emit square root using the Newton series 2016-03-16 Evandro Menezes Wilco Dijkstra gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros. * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with "aarch64_emit_approx_sqrt". (AARCH64_EXTRA_TUNE_APPROX_SQRT): New macro. * config/aarch64/aarch64.c (exynosm1_tunings): Use the new macro. (aarch64_emit_approx_sqrt): Define new function. * config/aarch64/aarch64.md (rsqrt2): Use new function instead. (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-simd.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch merges the function that emits the approximate reciprocal square root and the approximate square root, qualifying the latter for when the input argument is zero. It depends on the patch at https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00534.html I appreciate your feedback. Thank you, -- Evandro Menezes >From a69a80da4c3feab691d3c1df28906ef195e5631d Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Wed, 16 Mar 2016 15:21:00 -0500 Subject: [PATCH] Emit square root using the Newton series 2016-03-16 Evandro Menezes Wilco Dijkstra gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_SQRT_{SF,DF}): New tuning macros. * config/aarch64/aarch64-protos.h (aarch64_emit_approx_rsqrt): Replace with "aarch64_emit_approx_sqrt". (AARCH64_EXTRA_TUNE_APPROX_SQRT): New macro. * config/aarch64/aarch64.c (exynosm1_tunings): Use the new macro. (aarch64_emit_approx_sqrt): Define new function. * config/aarch64/aarch64.md (rsqrt2): Use new function instead. (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-simd.md: Likewise. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. --- gcc/config/aarch64/aarch64-protos.h | 4 +- gcc/config/aarch64/aarch64-simd.md | 27 +++- gcc/config/aarch64/aarch64-tuning-flags.def | 3 +- gcc/config/aarch64/aarch64.c| 97 +++-- gcc/config/aarch64/aarch64.md | 25 +++- gcc/config/aarch64/aarch64.opt | 4 +- gcc/doc/invoke.texi | 9 +-- 7 files changed, 138 insertions(+), 31 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 58e5d73..c9a5192 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -265,6 +265,8 @@ enum aarch64_extra_tuning_flags #define AARCH64_EXTRA_TUNE_APPROX_RSQRT \ (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF | AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF) +#define AARCH64_EXTRA_TUNE_APPROX_SQRT \ + (AARCH64_EXTRA_TUNE_APPROX_SQRT_DF | AARCH64_EXTRA_TUNE_APPROX_SQRT_SF) extern struct tune_params aarch64_tune_params; @@ -364,7 +366,7 @@ void aarch64_register_pragmas (void); void aarch64_relayout_simd_types (void); void aarch64_reset_previous_fndecl (void); void aarch64_save_restore_target_globals (tree); -void aarch64_emit_approx_rsqrt (rtx, rtx); +void aarch64_emit_approx_sqrt (rtx, rtx, bool); /* Initialize builtins for SIMD intrinsics. */ void init_aarch64_simd_builtins (void); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index bd73bce..31191bb 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -405,7 +405,7 @@ UNSPEC_RSQRT))] "TARGET_SIMD" { - aarch64_emit_approx_rsqrt (operands[0], operands[1]); + aarch64_emit_approx_sqrt (operands[0], operands[1], true); DONE; }) @@ -4307,7 +4307,30 @@ ;; sqrt -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:VDQF 0 "register_operand") + (sqrt:VDQF
Re: [AArch64] Emit square root using the Newton series
Evandro Menezeswrote: > > I got the scalar version going, but I'm stuck with the vector version. > As you can see above, I need to use the complement of the mask produced > by FCMEQ to squelch the offending vector element. However, the way in > which FCMEQ is defined in GCC, it produces an integer vector and the > SIMD AND only takes integer vectors. I'm stuck at how to pass an FP > vector to AND and then its integer vector back to an FP insn. You can use gen_rtx_SUBREG(mcmp, xsqrt, 0) to change the mode to an integer vector on the AND instruction and back to mode for the destination. Wilco
Re: [AArch64] Emit square root using the Newton series
On 03/10/16 19:06, Wilco Dijkstra wrote: Evandro Menezeswrote: That's what I had in mind too, but around the approximation for x^-1/2 and using masks for vector cases thusly: fcmne v3.4s, v0.4s, #0.0 frsqrte v1.4s, v0.4s fmulv2.4s, v1.4s, v1.4s frsqrts v2.4s, v0.4s, v2.4s fmulv1.4s, v1.4s, v2.4s fmulv2.4s, v1.4s, v1.4s frsqrts v2.4s, v0.4s, v2.4s fmulv1.4s, v1.4s, v2.4s and v1.4s, v3.4s fmulv0.4s, v1.4s, v0.4s That's possible but the overall latency is higher - according to exynos-1.md the above takes 44 cycles while my version would be 37. I'm currently working to get this prototyped without modifying the reciprocal square root. Once I'm done, I'll merge both functions together to generate better code. I got the scalar version going, but I'm stuck with the vector version. As you can see above, I need to use the complement of the mask produced by FCMEQ to squelch the offending vector element. However, the way in which FCMEQ is defined in GCC, it produces an integer vector and the SIMD AND only takes integer vectors. I'm stuck at how to pass an FP vector to AND and then its integer vector back to an FP insn. Here's how the function stands at the moment: void aarch64_emit_approx_sqrt (rtx dst, rtx src) { machine_mode mode = GET_MODE (src); gcc_assert (GET_MODE_INNER (mode) == SFmode || GET_MODE_INNER (mode) == DFmode); bool scalar = !VECTOR_MODE_P (mode); bool narrow = (mode == V2SFmode); rtx xsrc = gen_reg_rtx (mode); emit_move_insn (xsrc, src); rtx xcc, xne, xmsk; if (scalar) { /* fcmp */ xcc = aarch64_gen_compare_reg (NE, xsrc, CONST0_RTX (mode)); xne = gen_rtx_NE (VOIDmode, xcc, const0_rtx); } else { machine_mode mcmp = mode_for_vector (int_mode_for_mode (GET_MODE_INNER (mode)), GET_MODE_NUNITS (mode)); /* fcmne */ xmsk = gen_reg_rtx (mode); /* Just V4SF for now */ emit_insn (gen_aarch64_cmeqv4sf (xmsk, xsrc, CONST0_RTX (mode))); /* TODO: must use the complement of the this result. */ } /* Calculate the approximate reciprocal square root. */ rtx xrsqrt = gen_reg_rtx (mode); aarch64_emit_approx_rsqrt (xrsqrt, xsrc); /* Calculate the approximate square root. */ rtx xsqrt = gen_reg_rtx (mode); emit_set_insn (xsqrt, gen_rtx_MULT (mode, xrsqrt, xsrc)); /* Qualify the result for when the input is zero. */ rtx xdst = gen_reg_rtx (mode); if (scalar) /* fcsel */ emit_set_insn (xdst, gen_rtx_IF_THEN_ELSE (mode, xne, xsqrt, xsrc)); else /* and */ emit_set_insn (xdst, gen_rtx_AND (mode, xsqrt, xmsk)); emit_move_insn (dst, xdst); } Any help is welcome. Thank you, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 03/10/16 13:10, Wilco Dijkstra wrote: frsqrte s1, s0 fmul s2, s1, s1 frsqrts s2, s0, s2 fcmp s0, 0.0 fmul s1, s1, s2 fmul s2, s1, s1 fmul s1, s0, s1 frsqrts s2, s0, s2 fcsels1, s0, s1, eq fmul s0, s1, s2 That's what I had in mind too, but around the approximation for x^-1/2 and using masks for vector cases thusly: fcmne v3.4s, v0.4s, #0.0 frsqrte v1.4s, v0.4s fmulv2.4s, v1.4s, v1.4s frsqrts v2.4s, v0.4s, v2.4s fmulv1.4s, v1.4s, v2.4s fmulv2.4s, v1.4s, v1.4s frsqrts v2.4s, v0.4s, v2.4s fmulv1.4s, v1.4s, v2.4s and v1.4s, v3.4s fmulv0.4s, v1.4s, v0.4s Thanks, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 03/10/16 10:52, Wilco Dijkstra wrote: > Hi Evandro, > >> I have however encountered precision issues with DF, namely some benchmarks >> in the SPECfp CPU2000 suite would fail to validate. > Accuracy is not an issue, the computation is extremely accurate. The issue is > that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero, > and that causes the miscompares you're seeing. So support for the zero case > should be added. > > This would be a better expansion, supporting zero, and with lower latency > than the current sequence: Now I think of it, frsqrts returns 1.5 for the zero case, so we only need to fix up the estimated sqrt value before the final multiply. Since a FCSEL/VAND can be hidden completely behind the latency of frsqrts, both scalar and vector case could do this: frsqrte s1, s0 fmul s2, s1, s1 frsqrts s2, s0, s2 fcmp s0, 0.0 fmul s1, s1, s2 fmul s2, s1, s1 fmul s1, s0, s1 frsqrts s2, s0, s2 fcsels1, s0, s1, eq fmul s0, s1, s2 Wilco
Re: [AArch64] Emit square root using the Newton series
On 03/10/16 10:52, Wilco Dijkstra wrote: Hi Evandro, I have however encountered precision issues with DF, namely some benchmarks in the SPECfp CPU2000 suite would fail to validate. Accuracy is not an issue, the computation is extremely accurate. The issue is that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero, and that causes the miscompares you're seeing. So support for the zero case should be added. This would be a better expansion, supporting zero, and with lower latency than the current sequence: fcmps0, 0.0 beq zero frsqrtes1, s0 fmuls2, s1, s1 frsqrtss2, s0, s2 fmuls1, s1, s2 fmuls2, s1, s1 fmul s1, s0, s1 frsqrtss2, s0, s2 fmuls0, s1, s2 zero: For the vector variant you can't avoid the extra latency of an AND, but it should not be slower than it is today. Thanks for the pointer, Wilco. Will work it in the patch. -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
Hi Evandro, > I have however encountered precision issues with DF, namely some benchmarks > in the SPECfp CPU2000 suite would fail to validate. Accuracy is not an issue, the computation is extremely accurate. The issue is that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero, and that causes the miscompares you're seeing. So support for the zero case should be added. This would be a better expansion, supporting zero, and with lower latency than the current sequence: fcmp s0, 0.0 beq zero frsqrte s1, s0 fmul s2, s1, s1 frsqrts s2, s0, s2 fmul s1, s1, s2 fmul s2, s1, s1 fmul s1, s0, s1 frsqrts s2, s0, s2 fmul s0, s1, s2 zero: For the vector variant you can't avoid the extra latency of an AND, but it should not be slower than it is today. Cheers, Wilco
Re: [AArch64] Emit square root using the Newton series
On 03/08/16 16:08, Evandro Menezes wrote: On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? Thank you, James, As I was saying, this patch results in some validation errors in CPU2000 benchmarks using DF. Although proving the algorithm to be pretty solid with a vast set of random values, I'm confused why some benchmarks fail to validate with this implementation of the Newton series for square root too, when they pass with the Newton series for reciprocal square root. Since I had no problems with the same algorithm on x86-64, I wonder if the initial estimate on AArch64, which offers just 8 bits, whereas x86-64 offers 11 bits, has to do with it. Then again, the algorithm iterated 1 less time on x86-64 than on AArch64. Since it seems that the initial estimate is sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for square root only for SF. Your thoughts on the matter are appreciated, Add choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. Emit square root using the Newton series gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros. * config/aarch64/aarch64-protos.h (aarch64_emit_approx_sqrt): Declare new function. * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Define new function. * config/aarch64/aarch64.md (sqrt*2): New expansion and insn definitions. * config/aarch64/aarch64-simd.md (sqrt*2): Likewise. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch, which depends on https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00534.html, leverages the reciprocal square root approximation to emit a faster square root approximation. I have however encountered precision issues with DF, namely some benchmarks in the SPECfp CPU2000 suite would fail to validate. Perhaps the initial estimate, with just 8 bits, is not good enough for the series to converge given the workloads of such benchmarks; perhaps denormals, known to occur in some of these benchmarks, result in errors. This was the motivation to split the tuning flags between one specific for DF and the other, for SF in the previous related patch. Again, now with the patch attached, your feedback is appreciated. Thank you, -- Evandro Menezes >From 4f61f722f744339650a48aa034906dd685110ae2 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Tue, 8 Mar 2016 15:06:03 -0600 Subject: [PATCH] Emit square root using the Newton series gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros. * config/aarch64/aarch64-protos.h (aarch64_emit_approx_sqrt): Declare new function. * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Define new function. * config/aarch64/aarch64.md (sqrt*2): New expansion and insn definitions. * config/aarch64/aarch64-simd.md (sqrt*2): Likewise. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. --- gcc/config/aarch64/aarch64-protos.h | 3 +++
Re: [AArch64] Emit square root using the Newton series
On 03/08/16 16:08, Evandro Menezes wrote: On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? Thank you, James, As I was saying, this patch results in some validation errors in CPU2000 benchmarks using DF. Although proving the algorithm to be pretty solid with a vast set of random values, I'm confused why some benchmarks fail to validate with this implementation of the Newton series for square root too, when they pass with the Newton series for reciprocal square root. Since I had no problems with the same algorithm on x86-64, I wonder if the initial estimate on AArch64, which offers just 8 bits, whereas x86-64 offers 11 bits, has to do with it. Then again, the algorithm iterated 1 less time on x86-64 than on AArch64. Since it seems that the initial estimate is sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for square root only for SF. Your thoughts on the matter are appreciated, Add choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. Emit square root using the Newton series gcc/ * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_SQRT_{DF,SF}): New tuning macros. * config/aarch64/aarch64-protos.h (aarch64_emit_approx_sqrt): Declare new function. * config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Define new function. * config/aarch64/aarch64.md (sqrt*2): New expansion and insn definitions. * config/aarch64/aarch64-simd.md (sqrt*2): Likewise. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch, which depends on https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00534.html, leverages the reciprocal square root approximation to emit a faster square root approximation. I have however encountered precision issues with DF, namely some benchmarks in the SPECfp CPU2000 suite would fail to validate. Perhaps the initial estimate, with just 8 bits, is not good enough for the series to converge given the workloads of such benchmarks; perhaps denormals, known to occur in some of these benchmarks, result in errors. This was the motivation to split the tuning flags between one specific for DF and the other, for SF in the previous related patch. Again, your feedback is appreciated. Thank you, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? Thank you, James, As I was saying, this patch results in some validation errors in CPU2000 benchmarks using DF. Although proving the algorithm to be pretty solid with a vast set of random values, I'm confused why some benchmarks fail to validate with this implementation of the Newton series for square root too, when they pass with the Newton series for reciprocal square root. Since I had no problems with the same algorithm on x86-64, I wonder if the initial estimate on AArch64, which offers just 8 bits, whereas x86-64 offers 11 bits, has to do with it. Then again, the algorithm iterated 1 less time on x86-64 than on AArch64. Since it seems that the initial estimate is sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for square root only for SF. Your thoughts on the matter are appreciated, Add choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. Now that the patch is attached, feedback is appreciated. Thank you, -- Evandro Menezes >From 0bb413550e854c81cc5ab180a3afdd43cd4faf0b Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Thu, 3 Mar 2016 18:13:46 -0600 Subject: [PATCH] Add choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. --- gcc/config/aarch64/aarch64-protos.h | 3 +++ gcc/config/aarch64/aarch64-tuning-flags.def | 3 ++- gcc/config/aarch64/aarch64.c| 23 +++ 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index acf2062..ee3505c 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -263,6 +263,9 @@ enum aarch64_extra_tuning_flags }; #undef AARCH64_EXTRA_TUNING_OPTION +#define AARCH64_EXTRA_TUNE_APPROX_RSQRT \ + (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF | AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF) + extern struct tune_params aarch64_tune_params; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 7e45a0c..57d9588 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -29,5 +29,6 @@ AARCH64_TUNE_ to give an enum name. */ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) -AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT) +AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT_DF) +AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrtf", APPROX_RSQRT_SF) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 801f95a..39a1a47 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7464,12 +7464,16 @@ aarch64_memory_move_cost (machine_mode mode
Re: [AArch64] Emit square root using the Newton series
On 02/16/16 14:56, Evandro Menezes wrote: On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? Thank you, James, As I was saying, this patch results in some validation errors in CPU2000 benchmarks using DF. Although proving the algorithm to be pretty solid with a vast set of random values, I'm confused why some benchmarks fail to validate with this implementation of the Newton series for square root too, when they pass with the Newton series for reciprocal square root. Since I had no problems with the same algorithm on x86-64, I wonder if the initial estimate on AArch64, which offers just 8 bits, whereas x86-64 offers 11 bits, has to do with it. Then again, the algorithm iterated 1 less time on x86-64 than on AArch64. Since it seems that the initial estimate is sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for square root only for SF. Your thoughts on the matter are appreciated, Add choices for the reciprocal square root approximation Allow a target to prefer such operation depending on the FP precision. gcc/ * config/aarch64/aarch64-protos.h (AARCH64_EXTRA_TUNE_APPROX_RSQRT): New macro. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_APPROX_RSQRT_DF): New mask. (AARCH64_EXTRA_TUNE_APPROX_RSQRT_SF): Likewise. * config/aarch64/aarch64.c (use_rsqrt_p): New argument for the mode. (aarch64_builtin_reciprocal): Devise mode from builtin. (aarch64_optab_supported_p): New argument for the mode. Feedback appreciated. Thank you, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 02/26/16 17:42, Evandro Menezes wrote: On 02/26/16 08:59, James Greenhalgh wrote: On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote: In preparation for the patch adding the Newton series also for square root, I'd like to propose this patch changing the name of the existing tuning flag for the reciprocal square root. This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would also be OK. Pick your favourite! One comment on the replacement invoke.texi text below, otherwise this is OK to apply now. diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 5cbd4cd..155d2bd 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -151,5 +151,5 @@ PC relative literal loads. mlow-precision-recip-sqrt Common Var(flag_mrecip_low_precision_sqrt) Optimization -When calculating a sqrt approximation, run fewer steps. +Calculate the reciprocal square-root approximation in fewer steps. This reduces precision, but can result in faster computation. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 490df93..eeff24d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12879,12 +12879,10 @@ corresponding flag to the linker. @item -mno-low-precision-recip-sqrt @opindex -mlow-precision-recip-sqrt @opindex -mno-low-precision-recip-sqrt -The square root estimate uses two steps instead of three for double-precision, -and one step instead of two for single-precision. -Thus reducing latency and precision. -This is only relevant if @option{-ffast-math} activates -reciprocal square root estimate instructions. -Which in turn depends on the target processor. +The reciprocal square root approximation uses one step less than otherwise, +thus reducing latency and precision. When calculating the reciprocal square root approximation, use one less step than otherwise, thus reducing latency and precision. Checked in as r233772. But not without some log hiccups, sorry... -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 02/26/16 08:59, James Greenhalgh wrote: On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote: In preparation for the patch adding the Newton series also for square root, I'd like to propose this patch changing the name of the existing tuning flag for the reciprocal square root. This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would also be OK. Pick your favourite! One comment on the replacement invoke.texi text below, otherwise this is OK to apply now. diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt index 5cbd4cd..155d2bd 100644 --- a/gcc/config/aarch64/aarch64.opt +++ b/gcc/config/aarch64/aarch64.opt @@ -151,5 +151,5 @@ PC relative literal loads. mlow-precision-recip-sqrt Common Var(flag_mrecip_low_precision_sqrt) Optimization -When calculating a sqrt approximation, run fewer steps. +Calculate the reciprocal square-root approximation in fewer steps. This reduces precision, but can result in faster computation. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 490df93..eeff24d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12879,12 +12879,10 @@ corresponding flag to the linker. @item -mno-low-precision-recip-sqrt @opindex -mlow-precision-recip-sqrt @opindex -mno-low-precision-recip-sqrt -The square root estimate uses two steps instead of three for double-precision, -and one step instead of two for single-precision. -Thus reducing latency and precision. -This is only relevant if @option{-ffast-math} activates -reciprocal square root estimate instructions. -Which in turn depends on the target processor. +The reciprocal square root approximation uses one step less than otherwise, +thus reducing latency and precision. When calculating the reciprocal square root approximation, use one less step than otherwise, thus reducing latency and precision. Checked in as r233772. Thank you, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On Mon, Feb 22, 2016 at 06:50:44PM -0600, Evandro Menezes wrote: > In preparation for the patch adding the Newton series also for > square root, I'd like to propose this patch changing the name of the > existing tuning flag for the reciprocal square root. This is fine, other names like sw_rsqrt, expand_rsqrt, nr_rsqrt would also be OK. Pick your favourite! One comment on the replacement invoke.texi text below, otherwise this is OK to apply now. > diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt > index 5cbd4cd..155d2bd 100644 > --- a/gcc/config/aarch64/aarch64.opt > +++ b/gcc/config/aarch64/aarch64.opt > @@ -151,5 +151,5 @@ PC relative literal loads. > > mlow-precision-recip-sqrt > Common Var(flag_mrecip_low_precision_sqrt) Optimization > -When calculating a sqrt approximation, run fewer steps. > +Calculate the reciprocal square-root approximation in fewer steps. > This reduces precision, but can result in faster computation. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 490df93..eeff24d 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -12879,12 +12879,10 @@ corresponding flag to the linker. > @item -mno-low-precision-recip-sqrt > @opindex -mlow-precision-recip-sqrt > @opindex -mno-low-precision-recip-sqrt > -The square root estimate uses two steps instead of three for > double-precision, > -and one step instead of two for single-precision. > -Thus reducing latency and precision. > -This is only relevant if @option{-ffast-math} activates > -reciprocal square root estimate instructions. > -Which in turn depends on the target processor. > +The reciprocal square root approximation uses one step less than otherwise, > +thus reducing latency and precision. When calculating the reciprocal square root approximation, use one less step than otherwise, thus reducing latency and precision. Thanks, James
Re: [AArch64] Emit square root using the Newton series
On 12/10/15 04:30, Kyrill Tkachov wrote: On 09/12/15 18:50, Evandro Menezes wrote: On 12/09/2015 11:16 AM, Kyrill Tkachov wrote: On 09/12/15 17:02, Kyrill Tkachov wrote: On 09/12/15 16:59, Evandro Menezes wrote: On 12/09/2015 10:52 AM, Kyrill Tkachov wrote: Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, other targets when this is also true, using the "fast_sqrt" option might make sense. Sure, but the way your patch is written, we will emit the series when "fast_sqrt" is set, rather than the other way around, unless I'm misreading the logic in: Sorry, what I meant to say is it would be clearer, IMO, to describe the compiler action that is being taken (e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using a series. Kyrill diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 030a101..f6d2da4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4280,7 +4280,23 @@ ;; sqrt -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:VDQF 0 "register_operand") +(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + "TARGET_SIMD" +{ + if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags) + && !optimize_function_for_size_p (cfun) + && flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations) +{ + aarch64_emit_swsqrt (operands[0], operands[1]); + DONE; +} +}) Kyrill, How about "approx_sqrt" for, you guessed it, approximate square root? The same adjective would perhaps describe "recip_sqrt" better too. Sounds good to me. Sorry for the bikeshedding. Rename the reciprocal square root tuning flag Rename the tuning option to enable the Newton series for the reciprocal square root to reflect its approximative characteristic. 2016-02-22 Evandro Menezes gcc/ * config/aarch64/aarch64-tuning-flags.def: Rename tuning flag to AARCH64_EXTRA_TUNE_APPROX_RSQRT. * config/aarch64/aarch64.c (xgene1_tunings): Use new name. (use_rsqrt_p): Likewise. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Reword the text explaining this option. * doc/invoke.texi (-mlow-precision-recip-sqrt): Likewise. In preparation for the patch adding the Newton series also for square root, I'd like to propose this patch changing the name of the existing tuning flag for the reciprocal square root. Thank you, -- Evandro Menezes >From 7043444f83c12de0ab50627a8b386e3070050591 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 22 Feb 2016 17:49:09 -0600 Subject: [PATCH] Rename the reciprocal square root tuning flag Rename the tuning option to enable the Newton series for the reciprocal square root to reflect its approximative characteristic. 2016-02-22 Evandro Menezes gcc/ * config/aarch64/aarch64-tuning-flags.def: Rename tuning flag to
Re: [AArch64] Emit square root using the Newton series
On 12/08/15 15:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? Thank you, James, As I was saying, this patch results in some validation errors in CPU2000 benchmarks using DF. Although proving the algorithm to be pretty solid with a vast set of random values, I'm confused why some benchmarks fail to validate with this implementation of the Newton series for square root too, when they pass with the Newton series for reciprocal square root. Since I had no problems with the same algorithm on x86-64, I wonder if the initial estimate on AArch64, which offers just 8 bits, whereas x86-64 offers 11 bits, has to do with it. Then again, the algorithm iterated 1 less time on x86-64 than on AArch64. Since it seems that the initial estimate is sufficient for CPU2000 to validate when using SF, I'm leaning towards restricting the Newton series for square root only for SF. Your thoughts on the matter are appreciated, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 09/12/15 18:50, Evandro Menezes wrote: On 12/09/2015 11:16 AM, Kyrill Tkachov wrote: On 09/12/15 17:02, Kyrill Tkachov wrote: On 09/12/15 16:59, Evandro Menezes wrote: On 12/09/2015 10:52 AM, Kyrill Tkachov wrote: Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, other targets when this is also true, using the "fast_sqrt" option might make sense. Sure, but the way your patch is written, we will emit the series when "fast_sqrt" is set, rather than the other way around, unless I'm misreading the logic in: Sorry, what I meant to say is it would be clearer, IMO, to describe the compiler action that is being taken (e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using a series. Kyrill diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 030a101..f6d2da4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4280,7 +4280,23 @@ ;; sqrt -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:VDQF 0 "register_operand") +(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + "TARGET_SIMD" +{ + if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags) + && !optimize_function_for_size_p (cfun) + && flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations) +{ + aarch64_emit_swsqrt (operands[0], operands[1]); + DONE; +} +}) Kyrill, How about "approx_sqrt" for, you guessed it, approximate square root? The same adjective would perhaps describe "recip_sqrt" better too. Sounds good to me. Sorry for the bikeshedding. Kyrill Thanks,
Re: [AArch64] Emit square root using the Newton series
On 8 December 2015 at 21:35, Evandro Menezeswrote: >Emit square root using the Newton series > >2015-12-03 Evandro Menezes > >gcc/ > * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): >Declare new > function. > * config/aarch64/aarch64-simd.md (sqrt2): New >expansion and > insn definitions. > * config/aarch64/aarch64-tuning-flags.def > (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. > * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define >new function. > * config/aarch64/aarch64.md (sqrt2): New expansion >and insn > definitions. > * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): >Expand option > description. > * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. > > This patch extends the patch that added support for implementing x^-1/2 > using the Newton series by adding support for x^1/2 as well. Hi Evandro, What benchmarking have you done on this patch? /M
RE: [AArch64] Emit square root using the Newton series
Hi, Marcus. I've run Geekbench, SPEC CPU2000 and synthetic benchmarks. I can share these results iterating an array with values between 1 and 100 and taking their square root: Million Operations/sJuno A53 @850MHz A57 @1100MHz X^½ DP Canon31 37 Newton 13 39 %Δ -57%6% SP Canon48 144 Newton 18 62 %Δ -63%-57% X^-½DP Canon17 16 Newton 14 42 %Δ -17%155% SP Canon28 70 Newton 20 62 %Δ -30%-11% As you can see, it's a mixed result for A57 and a definite regression for A53. In mnost benchmarks overall, this is not a good optimization for A57. That's why I left it as a target-specific tuning. Thank you, -- Evandro Menezes Austin, TX > -Original Message- > From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com] > Sent: Wednesday, December 09, 2015 8:06 > To: Evandro Menezes > Cc: GCC Patches; Marcus Shawcroft; James Greenhalgh; Andrew Pinski; Benedikt > Huber; philipp.toms...@theobroma-systems.com > Subject: Re: [AArch64] Emit square root using the Newton series > > On 8 December 2015 at 21:35, Evandro Menezes <e.mene...@samsung.com> wrote: > >Emit square root using the Newton series > > > >2015-12-03 Evandro Menezes <e.mene...@samsung.com> > > > >gcc/ > > * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): > >Declare new > > function. > > * config/aarch64/aarch64-simd.md (sqrt2): New > >expansion and > > insn definitions. > > * config/aarch64/aarch64-tuning-flags.def > > (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. > > * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define > >new function. > > * config/aarch64/aarch64.md (sqrt2): New expansion > >and insn > > definitions. > > * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): > >Expand option > > description. > > * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. > > > > This patch extends the patch that added support for implementing > > x^-1/2 using the Newton series by adding support for x^1/2 as well. > > Hi Evandro, What benchmarking have you done on this patch? > /M
Re: [AArch64] Emit square root using the Newton series
Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Thanks, Kyrill
Re: [AArch64] Emit square root using the Newton series
On 09/12/15 17:02, Kyrill Tkachov wrote: On 09/12/15 16:59, Evandro Menezes wrote: On 12/09/2015 10:52 AM, Kyrill Tkachov wrote: Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, other targets when this is also true, using the "fast_sqrt" option might make sense. Sure, but the way your patch is written, we will emit the series when "fast_sqrt" is set, rather than the other way around, unless I'm misreading the logic in: Sorry, what I meant to say is it would be clearer, IMO, to describe the compiler action that is being taken (e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using a series. Kyrill diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 030a101..f6d2da4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4280,7 +4280,23 @@ ;; sqrt -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:VDQF 0 "register_operand") +(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + "TARGET_SIMD" +{ + if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags) + && !optimize_function_for_size_p (cfun) + && flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations) +{ + aarch64_emit_swsqrt (operands[0], operands[1]); + DONE; +} +}) Thanks, Kyrill
Re: [AArch64] Emit square root using the Newton series
On 12/09/2015 11:16 AM, Kyrill Tkachov wrote: On 09/12/15 17:02, Kyrill Tkachov wrote: On 09/12/15 16:59, Evandro Menezes wrote: On 12/09/2015 10:52 AM, Kyrill Tkachov wrote: Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, other targets when this is also true, using the "fast_sqrt" option might make sense. Sure, but the way your patch is written, we will emit the series when "fast_sqrt" is set, rather than the other way around, unless I'm misreading the logic in: Sorry, what I meant to say is it would be clearer, IMO, to describe the compiler action that is being taken (e.g. the rename_fma_regs tuning flag), in this case it's estimating sqrt using a series. Kyrill diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 030a101..f6d2da4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4280,7 +4280,23 @@ ;; sqrt -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:VDQF 0 "register_operand") +(sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + "TARGET_SIMD" +{ + if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags) + && !optimize_function_for_size_p (cfun) + && flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations) +{ + aarch64_emit_swsqrt (operands[0], operands[1]); + DONE; +} +}) Kyrill, How about "approx_sqrt" for, you guessed it, approximate square root? The same adjective would perhaps describe "recip_sqrt" better too. Thanks, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 12/09/2015 10:52 AM, Kyrill Tkachov wrote: Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, other targets when this is also true, using the "fast_sqrt" option might make sense. Thank you, -- Evandro Menezes
Re: [AArch64] Emit square root using the Newton series
On 09/12/15 16:59, Evandro Menezes wrote: On 12/09/2015 10:52 AM, Kyrill Tkachov wrote: Hi Evandro, On 08/12/15 21:35, Evandro Menezes wrote: Emit square root using the Newton series 2015-12-03 Evandro Menezesgcc/ * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): Declare new function. * config/aarch64/aarch64-simd.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define new function. * config/aarch64/aarch64.md (sqrt2): New expansion and insn definitions. * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): Expand option description. * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. This patch extends the patch that added support for implementing x^-1/2 using the Newton series by adding support for x^1/2 as well. Is it OK at this point of stage 3? A comment on the patch itself from me... diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def index 6f7dbce..11c6c9a 100644 --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -30,4 +30,4 @@ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) - +AARCH64_EXTRA_TUNING_OPTION ("fast_sqrt", FAST_SQRT) That seems like a misleading name to me. If we're doing this, that means that the sqrt instruction is not faster than doing the inverse sqrt estimation followed by a multiply. I think a name like "synth_sqrt" or "estimate_sqrt" or something along those lines is more appropriate. Unfortunately, this is the case on Exynos M1: the series is faster than the instruction. :-( So, other targets when this is also true, using the "fast_sqrt" option might make sense. Sure, but the way your patch is written, we will emit the series when "fast_sqrt" is set, rather than the other way around, unless I'm misreading the logic in: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 030a101..f6d2da4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4280,7 +4280,23 @@ ;; sqrt -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:VDQF 0 "register_operand") + (sqrt:VDQF (match_operand:VDQF 1 "register_operand")))] + "TARGET_SIMD" +{ + if ((AARCH64_EXTRA_TUNE_FAST_SQRT & aarch64_tune_params.extra_tuning_flags) + && !optimize_function_for_size_p (cfun) + && flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations) +{ + aarch64_emit_swsqrt (operands[0], operands[1]); + DONE; +} +}) Thanks, Kyrill