Hi Prathamesh,

Prathamesh Kulkarni wrote:
> Thanks for the suggestions. The last time I benchmarked the patch
> (around Jan 2016)
> I got following results with the patch for SPEC2006:
>
> a15: +0.64% overall, 481.wrf: +6.46%
> a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
> a57: +0.35% overall, 481.wrf: +3.84%
> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01209.html)
>
> Do these numbers look acceptable ?
> I am benchmarking the patch on ToT, and will report if there are any
> performance improvements found with the patch.

Yes, those results are quite good. In fact, they seemed too good to be
true at first. However, looking at arm/neon.md, there isn't a division
pattern, so I think it's worth mentioning in the description that your
patch actually adds vectorization of division. Disassembling the AArch64
wrf binary shows several hundred vector division instructions, so the
speedup makes sense now: many more loops are being vectorized.
It's a shame this pattern wasn't added many years ago...

It would be a good idea to add a vectorized (r)sqrt too, as this will
improve wrf even further.

Wilco