Re: [ARM] Implement division using vrecpe, vrecps

Wilco Dijkstra Mon, 05 Nov 2018 05:36:02 -0800

Hi Prathamesh,

Prathamesh Kulkarni wrote:
> Thanks for the suggestions. The last time I benchmarked the patch
> (around Jan 2016)
> I got following results with the patch for SPEC2006:
>
> a15: +0.64% overall, 481.wrf: +6.46%
> a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
> a57: +0.35% overall, 481.wrf: +3.84%
> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01209.html)
>
> Do these numbers look acceptable ?
> I am benchmarking the patch on ToT, and will report if there are any
> performance improvements found with the patch.


Yes, those results are quite good - in fact they seemed too good to be true at
first. However, looking at arm/neon.md, there isn't a division pattern. So I
think it's worth mentioning in the description that your patch actually adds
vectorization of division. Disassembling the AArch64 wrf binary shows several
hundred vector division instructions - so the speedup makes sense now since
many more loops are being vectorized.
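
For anyone following along, here is a rough scalar sketch in portable C of the
estimate-plus-refinement scheme that a vrecpe/vrecps expansion is built on. It
is not the code from the patch. The names recip_estimate, recip_step and
approx_div are made up for illustration, and the estimate is only emulated (by
truncating 1/d to about 8 mantissa bits) rather than taken from the real
hardware table. The step function models what vrecps computes, 2 - d * x.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for vrecpe: emulate a low-precision reciprocal
   estimate by keeping only the top 8 mantissa bits of 1/d. The real
   instruction uses a hardware estimate, not a division. */
float recip_estimate(float d)
{
    float r = 1.0f / d;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;            /* sign, exponent, 8 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Models the Newton-Raphson step factor that vrecps computes. */
float recip_step(float d, float x)
{
    return 2.0f - d * x;
}

/* n / d via estimate plus two refinement steps. Each step roughly
   doubles the number of correct bits (about 8 -> 16 -> 32). */
float approx_div(float n, float d)
{
    float x = recip_estimate(d);
    x = x * recip_step(d, x);
    x = x * recip_step(d, x);
    return n * x;
}
```

Calling approx_div(1.0f, 3.0f) should give a value close to 1/3. The result is
not guaranteed to be correctly rounded, and special cases (zero, infinity,
denormals) are not handled in this sketch.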

It's a shame this pattern wasn't added many years ago... It's a good idea to
add a vectorized (r)sqrt too, as this will improve wrf even further.
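
The same idea carries over to reciprocal square root. Below is a minimal scalar
sketch in portable C, again only an illustration and not a proposed
implementation. The names rsqrt_estimate, rsqrt_step and approx_rsqrt are
hypothetical, and the starting estimate uses the well-known integer
bit-manipulation trick as a stand-in for vrsqrte, so its accuracy differs from
the hardware estimate. The step function models what vrsqrts computes,
(3 - a * b) / 2.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for vrsqrte: a crude 1/sqrt(d) estimate from
   the classic bit trick. Valid only for positive, normal d. */
float rsqrt_estimate(float d)
{
    uint32_t bits;
    float r;
    memcpy(&bits, &d, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Models the Newton-Raphson step factor that vrsqrts computes. */
float rsqrt_step(float a, float b)
{
    return (3.0f - a * b) / 2.0f;
}

/* 1/sqrt(d) via estimate plus two refinement steps. With this cruder
   estimate, two steps do not reach full single precision. */
float approx_rsqrt(float d)
{
    float x = rsqrt_estimate(d);
    x = x * rsqrt_step(d * x, x);
    x = x * rsqrt_step(d * x, x);
    return x;
}
```

A square root can then be formed as d * approx_rsqrt(d) for positive d, again
without any guarantee of correct rounding.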

Wilco
