On Fri, 2 Nov 2018 at 19:08, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
>
> Prathamesh Kulkarni wrote:
>
> > This is a rebased version of patch that adds a pattern to neon.md for
> > implementing division with multiplication by reciprocal using
> > vrecpe/vrecps with -funsafe-math-optimizations excluding -Os.
> > The newly added test-cases are not vectorized on armeb target with
> > -O2. I posted the analysis for that here:
> > https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01765.html
>
> I don't think doing this unconditionally for any CPU is a good idea. On
> AArch64 we don't enable this for any core since it's not really faster
> (newer CPUs have significantly improved division and the reciprocal
> instructions reduce throughput of other FMAs). On wrf doing reciprocal
> square root is far better than reciprocal division, but it's only faster
> on some specific CPUs, so it's not enabled by default.

Hi Wilco,
Thanks for the suggestions. The last time I benchmarked the patch
(around Jan 2016) I got the following results with the patch for SPEC2006:

a15: +0.64% overall, 481.wrf: +6.46%
a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
a57: +0.35% overall, 481.wrf: +3.84%
(https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01209.html)

Do these numbers look acceptable? I am benchmarking the patch on ToT
and will report any performance improvements found with the patch.

Thanks,
Prathamesh
>
> Wilco