Hi,

This version provides a subroutine that iterates over the load/store operations page by page. It also replaces the original continuous load/store loop with a memcpy call when the host and guest have the same endianness.
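
The per-page iteration and the memcpy fast path described above can be sketched in C as follows. This is a minimal illustration, not the helpers added by this series. It makes several assumptions:

- The page size is a fixed 4 KiB.
- Accesses are element aligned.
- A host buffer has already been resolved for the whole range.

The names SKETCH_PAGE_SIZE, for_each_page, copy_chunk and chunk_fn are hypothetical. The real QEMU code also has to handle TLB lookups, masking, watchpoints and elements that cross a page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed page size; QEMU's target page size is not fixed like this. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical per-chunk handler; each chunk lies entirely within one page. */
typedef void (*chunk_fn)(uint8_t *dst, const uint8_t *src, size_t len,
                         size_t esz, int same_endian);

/* Copy one chunk: a single memcpy when host and guest endianness match,
 * otherwise reverse the bytes of every esz-byte element. */
static void copy_chunk(uint8_t *dst, const uint8_t *src, size_t len,
                       size_t esz, int same_endian)
{
    if (same_endian) {
        memcpy(dst, src, len);
        return;
    }
    for (size_t off = 0; off < len; off += esz) {
        for (size_t b = 0; b < esz; b++) {
            dst[off + b] = src[off + esz - 1 - b];
        }
    }
}

/* Split the guest range [addr, addr + len) at page boundaries and call fn
 * once per page. Returns the number of chunks processed.
 * Assumption: addr, len and the page size are multiples of esz, so no
 * element is split across two chunks. */
static size_t for_each_page(uint64_t addr, uint8_t *dst, const uint8_t *src,
                            size_t len, size_t esz, int same_endian,
                            chunk_fn fn)
{
    assert(esz != 0 && addr % esz == 0 && len % esz == 0 &&
           SKETCH_PAGE_SIZE % esz == 0);
    size_t done = 0, calls = 0;
    while (done < len) {
        uint64_t cur = addr + done;
        size_t to_page_end = SKETCH_PAGE_SIZE - (size_t)(cur % SKETCH_PAGE_SIZE);
        size_t chunk = (len - done < to_page_end) ? len - done : to_page_end;
        fn(dst + done, src + done, chunk, esz, same_endian);
        done += chunk;
        calls++;
    }
    return calls;
}
```

For example, a 16-byte access that starts 8 bytes before a page boundary is handled as two chunks. When the endianness matches, each chunk is a single memcpy. The byte-reversal loop is only needed when it differs.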

Thanks to Richard Henderson's patch set for the user-only munmap race issue. This version is rebased on it and applies the [set|clear]_helper_retaddr calls.

I'm preparing a v6 that will:
* Improve the vector unit-stride fault-only-first load instructions
* Try to handle watchpoints by hand to increase the chance of direct host access

Some performance results for this version:

1. Test case provided in https://gitlab.com/qemu-project/qemu/-/issues/2137#note_1757501369
  - QEMU user mode (vlen=512):
    - Original: ~40.1 sec
    - v4: ~4.7 sec
    - v5: ~2.9 sec
  - QEMU system mode (vlen=512):
    - Original: ~112.5 sec
    - v4: ~6.5 sec
    - v5: ~3.1 sec
2. SPEC CPU2006 INT (ref input)
  - QEMU user mode (vlen=512):
    - Original: ~37.4 hr
    - v4: ~12.2 hr
    - v5: ~10.0 hr

Based-on: 20240710032814.104643-1-richard.hender...@linaro.org

Changes from v4:
- v4 patch 1
  - Queued
- patch 1
  - Separated from patch 2 of v4
- patch 2
  - Remove the mask bound checking flow
  - Provide a subroutine to iterate over the pages accessed by the instruction
  - Add [set|clear]_helper_retaddr to avoid the munmap race issue in user mode
- patch 3
  - Apply the subroutine to the unit-stride whole register load/store instructions
- patch 4
  - Replace the original loop with a memcpy call when the host and guest have the same endianness

Previous versions:
- v1: https://lore.kernel.org/all/20240215192823.729209-1-max.c...@sifive.com/
- v2: https://lore.kernel.org/all/20240531174504.281461-1-max.c...@sifive.com/
- v3: https://lore.kernel.org/all/20240613141906.1276105-1-max.c...@sifive.com/
- v4: https://lore.kernel.org/all/20240613175122.1299212-1-max.c...@sifive.com/

Max Chou (5):
  target/riscv: Set vdata.vm field for vector load/store whole register instructions
  target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store
  target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store
  target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions
  target/riscv: Inline unit-stride ld/st and corresponding functions for performance

 target/riscv/insn_trans/trans_rvv.c.inc |   3 +
 target/riscv/vector_helper.c            | 482 +++++++++++++++---------
 2 files changed, 307 insertions(+), 178 deletions(-)

-- 
2.34.1