On Thu, Dec 19, 2024 at 12:15 AM Craig Blackmore <[email protected]> wrote:
>
> Changes since v7:
> - Fixed typo `bits` -> `bytes`
> - Tuned threshold for applying the optimization
> - Provided results for larger sizes requested by Max Chou
>
> This patch provides up to 60% speedup on the `memcpy` benchmark from:
>
> https://github.com/embecosm/rise-rvv-tcg-qemu-tooling/tree/main/strmem-benchmarks
>
> There is some variation in the measurements so results are attached for six
> runs on a single thread on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.
>
> The three graphs are:
>
> memcpy-594c0cb1ab-128-speedup.pdf: VLEN 128
>
> memcpy-594c0cb1ab-1024-speedup.pdf: VLEN 1024
>
> memcpy-594c0cb1ab-stdlib-speedup.pdf: Scalar (to further illustrate
> measurement variation as this version will not touch the function modified by
> this patch)
>
> Previous versions:
> - v1: https://lore.kernel.org/all/[email protected]/
> - v2: https://lore.kernel.org/all/[email protected]/
> - v3: https://lore.kernel.org/all/[email protected]/
> - v4: https://lore.kernel.org/all/[email protected]/
> - v5: https://lore.kernel.org/all/[email protected]/
> - v6: https://lore.kernel.org/all/[email protected]/
> - v7: https://lore.kernel.org/all/[email protected]/
>
> Cc: Richard Henderson <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Alistair Francis <[email protected]>
> Cc: Bin Meng <[email protected]>
> Cc: Weiwei Li <[email protected]>
> Cc: Daniel Henrique Barboza <[email protected]>
> Cc: Liu Zhiwei <[email protected]>
> Cc: Helene Chelin <[email protected]>
> Cc: Nathan Egge <[email protected]>
> Cc: Max Chou <[email protected]>
> Cc: Paolo Savini <[email protected]>
>
> Craig Blackmore (2):
>   target/riscv: rvv: fix typo in vext continuous ldst function names
>   target/riscv: rvv: speed up small unit-stride loads and stores

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  target/riscv/vector_helper.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> --
> 2.43.0
>
>
