Re: [PATCH 3/3][AArch64] Emit division using the Newton series

James Greenhalgh Tue, 31 May 2016 02:29:45 -0700

On Fri, May 27, 2016 at 05:57:30PM -0500, Evandro Menezes wrote:
> On 05/25/16 11:16, James Greenhalgh wrote:
> >On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:
> >>    gcc/
> >>         * config/aarch64/aarch64-protos.h
> >>         (tune_params): Add new member "approx_div_modes".
> >>         (aarch64_emit_approx_div): Declare new function.
> >>         * config/aarch64/aarch64.c
> >>         (generic_tunings): New member "approx_div_modes".
> >>         (cortexa35_tunings): Likewise.
> >>         (cortexa53_tunings): Likewise.
> >>         (cortexa57_tunings): Likewise.
> >>         (cortexa72_tunings): Likewise.
> >>         (exynosm1_tunings): Likewise.
> >>         (thunderx_tunings): Likewise.
> >>         (xgene1_tunings): Likewise.
> >>         (aarch64_emit_approx_div): Define new function.
> >>         * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
> >>         * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
> >>         * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
> >>         * doc/invoke.texi (-mlow-precision-div): Describe new option.
> >My comments from the other two patches around using a structure to
> >group up the tuning flags and whether we really want the new option
> >apply here too.
> >
> >This code has no consumers by default and is only used for
> >-mlow-precision-div. Is this option likely to be useful to our users in
> >practice? It might all be more palatable under something like the rs6000's
> >-mrecip=opt .
> 
> I agree.  OK as a follow up?


Works for me.

> >>diff --git a/gcc/config/aarch64/aarch64-simd.md 
> >>b/gcc/config/aarch64/aarch64-simd.md
> >>index 47ccb18..7e99e16 100644
> >>--- a/gcc/config/aarch64/aarch64-simd.md
> >>+++ b/gcc/config/aarch64/aarch64-simd.md
> >>@@ -1509,7 +1509,19 @@
> >>    [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
> >>  )
> >>-(define_insn "div<mode>3"
> >>+(define_expand "div<mode>3"
> >>+ [(set (match_operand:VDQF 0 "register_operand")
> >>+       (div:VDQF (match_operand:VDQF 1 "general_operand")
> >What does this relaxation to general_operand give you?
> 
> Hold that thought...
> 
> >>+            (match_operand:VDQF 2 "register_operand")))]
> >>+ "TARGET_SIMD"
> >>+{
> >>+  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> >>+    DONE;
> >>+
> >>+  operands[1] = force_reg (<MODE>mode, operands[1]);
> >...other than the need to do this (sorry if I've missed something obvious).
> 
> Hold on...
> 
> >>+  if (num != CONST1_RTX (mode))
> >>+    {
> >>+      /* Calculate the approximate division.  */
> >>+      rtx xnum = force_reg (mode, num);
> >>+      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> >>+    }
> 
> About that relaxation, as you can see here, since the series
> approximates the reciprocal of the denominator, if the numerator is
> 1.0, a register can be spared, as the result is ready and the
> numerator is not needed.

But, in the case that the multiplication is by 1, can we not rely on the
other optimization machinery eliminating it? I mean, I see the optimization
that this enables for you, but can't you rely on future passes to do the
cleanup, and save yourself the few lines of special casing?

> +/* Emit the instruction sequence to compute the approximation for the 
> division
> +   of NUM by DEN and return whether the sequence was emitted or not.  */

Needs a brief mention of what we use QUO for :).

> +
> +bool
> +aarch64_emit_approx_div (rtx quo, rtx num, rtx den)
> +{
> +  machine_mode mode = GET_MODE (quo);
> +  bool use_approx_division_p = (flag_mlow_precision_div
> +                             || (aarch64_tune_params.approx_modes->division
> +                                 & AARCH64_APPROX_MODE (mode)));
> +
> +  if (!flag_finite_math_only
> +      || flag_trapping_math
> +      || !flag_unsafe_math_optimizations
> +      || optimize_function_for_size_p (cfun)
> +      || !use_approx_division_p)
> +    return false;
> +
> +  /* Estimate the approximate reciprocal.  */
> +  rtx xrcp = gen_reg_rtx (mode);
> +  emit_insn ((*get_recpe_type (mode)) (xrcp, den));
> +
> +  /* Iterate over the series twice for SF and thrice for DF.  */
> +  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
> +
> +  /* Optionally iterate over the series once less for faster performance,
> +     while sacrificing the accuracy.  */
> +  if (flag_mlow_precision_div)
> +    iterations--;
> +
> +  /* Iterate over the series to calculate the approximate reciprocal.  */
> +  rtx xtmp = gen_reg_rtx (mode);
> +  while (iterations--)
> +    {
> +      emit_insn ((*get_recps_type (mode)) (xtmp, xrcp, den));
> +
> +      if (iterations > 0)
> +     emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
> +    }
> +
> +  if (num != CONST1_RTX (mode))
> +    {
> +      /* As the approximate reciprocal of the denominator is already 
> calculated,
> +         only calculate the approximate division when the numerator is not 
> 1.0.  */

Long lines.

> +      rtx xnum = force_reg (mode, num);
> +      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> +    }
> +
> +  /* Finalize the approximation.  */
> +  emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
> +  return true;
> +}
> +
>  /* Return the number of instructions that can be issued per cycle.  */
>  static int
>  aarch64_sched_issue_rate (void)

Thanks,
James

Re: [PATCH 3/3][AArch64] Emit division using the Newton series

Reply via email to