Re: [ARM] Implement division using vrecpe, vrecps

Prathamesh Kulkarni Sun, 04 Nov 2018 20:56:50 -0800

On Fri, 2 Nov 2018 at 19:08, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>
> Prathamesh Kulkarni wrote:
>
> > This is a rebased version of patch that adds a pattern to neon.md for
> > implementing division with multiplication by reciprocal using
> > vrecpe/vrecps with -funsafe-math-optimizations excluding -Os.
> > The newly added test-cases are not vectorized on armeb target with
> > -O2. I posted the analysis for that here:
> > https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01765.html
>
> I don't think doing this unconditionally for any CPU is a good idea. On AArch64
> we don't enable this for any core since it's not really faster (newer CPUs have
> significantly improved division and the reciprocal instructions reduce throughput
> of other FMAs). On wrf doing reciprocal square root is far better than reciprocal
> division, but it's only faster on some specific CPUs, so it's not enabled by
> default.
Hi Wilco,
Thanks for the suggestions. The last time I benchmarked the patch (around
Jan 2016), I got the following SPEC2006 results with it:

a15: +0.64% overall, 481.wrf: +6.46%
a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
a57: +0.35% overall, 481.wrf: +3.84%
(https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01209.html)

Do these numbers look acceptable?
I am now benchmarking the patch on ToT (top of trunk) and will report any
performance improvements it shows.

Thanks,
Prathamesh
>
> Wilco
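The quoted patch description replaces a division with a multiplication by a reciprocal. VRECPE produces a low-precision estimate of that reciprocal, and VRECPS steps refine it. The sketch below illustrates this Newton-Raphson refinement in portable Python. It is not the code GCC emits. It assumes that VRECPS computes 2 - a*b, and it emulates the VRECPE estimate by truncating an exact reciprocal to about 8 bits. The helper names `recip_estimate`, `recip_step` and `div_by_recip` are hypothetical. The number of refinement steps the patch actually uses is not stated in this thread.

```python
import math


def recip_estimate(d: float) -> float:
    """Stand-in for VRECPE: 1/d truncated to ~8 significant bits.

    Hypothetical emulation only: it is NOT the architectural VRECPE
    lookup, it just reproduces the estimate's low precision.
    """
    m, e = math.frexp(1.0 / d)         # 1/d == m * 2**e, 0.5 <= |m| < 1
    m = math.trunc(m * 256.0) / 256.0  # keep only 8 fractional bits of m
    return math.ldexp(m, e)


def recip_step(a: float, b: float) -> float:
    """Scalar analogue of VRECPS (assumed semantics): returns 2 - a*b."""
    return 2.0 - a * b


def div_by_recip(n: float, d: float, steps: int = 2) -> float:
    """Compute n / d as n * (1/d), refining 1/d with Newton-Raphson.

    Each step x' = x * (2 - d*x) roughly squares the relative error,
    i.e. roughly doubles the number of correct bits.
    """
    x = recip_estimate(d)
    for _ in range(steps):
        x = x * recip_step(d, x)
    return n * x


if __name__ == "__main__":
    exact = 10.0 / 3.0
    for k in range(4):
        q = div_by_recip(10.0, 3.0, steps=k)
        print(k, q, abs(q - exact) / exact)  # step count, quotient, relative error
```

Printing the relative error for 0 to 3 steps shows it shrinking quadratically. Each step adds multiply-type instructions, which relates to the throughput concern raised above. The result is also not guaranteed to be correctly rounded, which is why the transformation is only enabled under -funsafe-math-optimizations.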
