Changes since v7:
- Fixed typo `bits` -> `bytes`
- Tuned threshold for applying the optimization
- Provided results for larger sizes requested by Max Chou

This patch provides up to 60% speedup on the `memcpy` benchmark from:

  https://github.com/embecosm/rise-rvv-tcg-qemu-tooling/tree/main/strmem-benchmarks

There is some variation in the measurements, so results are attached for six
runs on a single thread on an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.

The three graphs are:

  memcpy-594c0cb1ab-128-speedup.pdf: VLEN 128

  memcpy-594c0cb1ab-1024-speedup.pdf: VLEN 1024

  memcpy-594c0cb1ab-stdlib-speedup.pdf: Scalar (to further illustrate
  measurement variation, as this version does not touch the function
  modified by this patch)

Previous versions:
- v1: https://lore.kernel.org/all/[email protected]/
- v2: https://lore.kernel.org/all/[email protected]/
- v3: https://lore.kernel.org/all/[email protected]/
- v4: https://lore.kernel.org/all/[email protected]/
- v5: https://lore.kernel.org/all/[email protected]/
- v6: https://lore.kernel.org/all/[email protected]/
- v7: https://lore.kernel.org/all/[email protected]/

Cc: Richard Henderson<[email protected]>
Cc: Palmer Dabbelt<[email protected]>
Cc: Alistair Francis<[email protected]>
Cc: Bin Meng<[email protected]>
Cc: Weiwei Li<[email protected]>
Cc: Daniel Henrique Barboza<[email protected]>
Cc: Liu Zhiwei<[email protected]>
Cc: Helene Chelin<[email protected]>
Cc: Nathan Egge<[email protected]>
Cc: Max Chou<[email protected]>
Cc: Paolo Savini<[email protected]>

Craig Blackmore (2):
  target/riscv: rvv: fix typo in vext continuous ldst function names
  target/riscv: rvv: speed up small unit-stride loads and stores

 target/riscv/vector_helper.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

--
2.43.0

Attachment: memcpy-594c0cb1ab-graphs.tar.gz
Description: application/gzip
