> On 04/04/16 14:06, Evandro Menezes wrote:
> > On 04/01/16 17:52, Evandro Menezes wrote:
> >> On 04/01/16 17:45, Wilco Dijkstra wrote:
> >>> Evandro Menezes wrote:
> >>>
> >>>> However, I don't think that there's the need to handle any special
> >>>> case for division.  The only case when the approximation differs
> >>>> from division is when the numerator is infinity and the
> >>>> denominator, zero, when the approximation returns infinity and the
> >>>> division, NAN.  So I don't think that it's a special case that
> >>>> deserves being handled.
> >>>> IOW, the result of the approximate reciprocal is always needed.
> >>> No, the result of the approximate reciprocal is not needed.
> >>>
> >>> Basically a NR approximation produces a correction factor that is
> >>> very close to 1.0, and then multiplies that with the previous
> >>> estimate to get a more accurate estimate. The final calculation for
> >>> x * recip(y) is:
> >>>
> >>> result = (reciprocal_correction * reciprocal_estimate) * x
> >>>
> >>> while what I am suggesting is a trivial reassociation:
> >>>
> >>> result = reciprocal_correction * (reciprocal_estimate * x)
> >>>
> >>> The computation of the final reciprocal_correction is on the
> >>> critical latency path, while reciprocal_estimate is computed
> >>> earlier, so we can compute (reciprocal_estimate * x) without
> >>> increasing the overall latency.  I.e., we save a multiply on the
> >>> critical path.
> >>>
> >>> In principle this could be done as a separate optimization pass that
> >>> tries to reassociate to reduce latency. However I'm not too
> >>> convinced this would be easy to implement in GCC's scheduler, so
> >>> it's best to do it explicitly.
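For illustration only, a minimal C sketch of the reassociation described above. The helper names (recip_estimate, recip_correction, div_original, div_reassociated) are hypothetical and this is not the code from the patch. The coarse estimate is emulated in software as a stand-in for a hardware estimate instruction such as FRECPE, and the correction factor has the 2 - d * est form that FRECPS computes. Zero and non-finite inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: compute 1/d
   and discard all but the top 8 mantissa bits, so the result is only a
   coarse approximation.  A real implementation would not divide here. */
static float recip_estimate(float d)
{
    float r = 1.0f / d;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;            /* keep sign, exponent, 8 mantissa bits */
    memcpy(&r, &bits, sizeof bits);
    return r;
}

/* Newton-Raphson correction factor, very close to 1.0 for a good estimate. */
static float recip_correction(float d, float est)
{
    return 2.0f - d * est;
}

/* Original association: result = (correction * estimate) * x.
   Two dependent multiplies follow the final correction. */
float div_original(float x, float d)
{
    float est = recip_estimate(d);
    est = est * recip_correction(d, est);    /* first refinement step */
    float corr = recip_correction(d, est);   /* final correction */
    return (corr * est) * x;
}

/* Reassociated: result = correction * (estimate * x).
   (est * x) does not depend on corr, so it can be computed while corr
   is still in flight; only one multiply follows the final correction. */
float div_reassociated(float x, float d)
{
    float est = recip_estimate(d);
    est = est * recip_correction(d, est);    /* first refinement step */
    float xe = est * x;                      /* off the critical path */
    float corr = recip_correction(d, est);   /* final correction */
    return corr * xe;
}
```

Both functions perform the same number of multiplies. The difference is only in which of them depend on the final correction, so the two results may differ in the last bits because floating-point multiplication is not associative.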
> >>
> >> I think that I see what you mean.  I'll hack something tomorrow.
> >
> >    [AArch64] Emit division using the Newton series
> >
> >    2016-04-04  Evandro Menezes  <e.mene...@samsung.com>
> >                 Wilco Dijkstra <wilco.dijks...@arm.com>
> >
> >    gcc/
> >             * config/aarch64/aarch64-tuning-flags.def
> >             * config/aarch64/aarch64-protos.h
> >             (AARCH64_APPROX_MODE): New macro.
> >             (AARCH64_EXTRA_TUNE_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}):
> >             New tuning macros.
> >             (tune_params): Add new member "approx_div_modes".
> >             (aarch64_emit_approx_div): Declare new function.
> >             * config/aarch64/aarch64.c
> >             (generic_tunings): New member "approx_div_modes".
> >             (cortexa35_tunings): Likewise.
> >             (cortexa53_tunings): Likewise.
> >             (cortexa57_tunings): Likewise.
> >             (cortexa72_tunings): Likewise.
> >             (exynosm1_tunings): Likewise.
> >             (thunderx_tunings): Likewise.
> >             (xgene1_tunings): Likewise.
> >             (aarch64_emit_approx_div): Define new function.
> >             * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
> >             * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
> >             * config/aarch64/aarch64.opt (-mlow-precision-div): Add new
> >             option.
> >             * doc/invoke.texi (-mlow-precision-div): Describe new option.
> >
> >
> > This version of the patch has a shorter dependency chain at the last
> > iteration of the series.
> 
> Ping^1

Ping^2

-- 
Evandro Menezes                              Austin, TX
