On Sun, Jan 21, 2024 at 07:57:54PM +0530, Ajit Agarwal wrote:
>
> Hello All:
>
> New pass to replace adjacent memory addresses lxv with lxvp.
> Added common infrastructure for load store fusion for
> different targets.
>
> Common routines are refactored in fusion-common.h.
>
> AARCH64 load/store fusion pass is not changed with the
> common infrastructure.
>
> For AARCH64 architectures just include "fusion-common.h"
> and target dependent code can be added to that.
>
>
> Alex/Richard:
>
> If you would like me to add for AARCH64 I can do that for AARCH64.
>
> If you would like to do that is fine with me.
>
> Bootstrapped and regtested with powerpc64-linux-gnu.
>
> Improvement in performance is seen with Spec 2017 spec FP benchmarks.
This patch is a lot better than the previous patch in that it generates fewer extra instructions, and just replaces some of the load vector instructions with load vector pair instructions.

In compiling Spec 2017 with it, I see the following results:

Benchmarks that generate lxvp instead of lxv:

  500.perlbench_r    replace 10 LXVs with  5 LXVPs
  502.gcc_r          replace  2 LXVs with  1 LXVP
  510.parest_r       replace 28 LXVs with 14 LXVPs
  511.povray_r       replace  4 LXVs with  2 LXVPs
  521.wrf_r          replace 12 LXVs with  6 LXVPs
  527.cam4_r         replace 12 LXVs with  6 LXVPs
  557.xz_r           replace 10 LXVs with  5 LXVPs

A few of the benchmarks generated a different number of NOPs, based on how prefixed addresses were generated. I tend to feel this is minor compared to the others.

  507.cactuBSSN_r     17 fewer alignment NOPs
  520.omnetpp_r      231 more alignment NOPs
  523.xalancbmk_r    246 fewer alignment NOPs
  531.deepsjeng_r      2 more alignment NOPs
  541.leela_r         28 more alignment NOPs
  549.fotonik3d_r     27 more alignment NOPs
  554.roms_r           8 more alignment NOPs

However, there were three benchmarks where the code regressed. In particular, it looks like there are more vector loads and stores to the stack, which indicates more spilling is going on.

  525.x264_r         16 more stack spills, but  84 LXVPs
  526.blender_r       4 more stack spills, but 149 LXVPs

One benchmark actually generated fewer stack spills as well as generating LXVPs.

  538.imagick_r      11 fewer stack spills, and 26 LXVPs

Note that these are changes to the static instructions generated. This does not evaluate whether the changes help or hurt performance.

--
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com