On 6/13/24 10:51, Max Chou wrote:
This commit references the sve_ldN_r/sve_stN_r helper functions in ARM
target to optimize the vector unmasked unit-stride load/store
instructions by following items:
* Get the loose bound of activate elements
* Probing pages/resolving host memory address/handling watchpoint at beginning
* Provide new interface to direct access host memory
The original element load/store interface is replaced by the new element
load/store functions with _tlb & _host postfix that means doing the
element load/store through the original softmmu flow and the direct
access host memory flow.
Signed-off-by: Max Chou <max.c...@sifive.com>
---
target/riscv/insn_trans/trans_rvv.c.inc | 3 +
target/riscv/vector_helper.c | 637 +++++++++++++++++++-----
target/riscv/vector_internals.h | 48 ++
3 files changed, 551 insertions(+), 137 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target/riscv/insn_trans/trans_rvv.c.inc
index 3a3896ba06c..14e10568bd7 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -770,6 +770,7 @@ static bool ld_us_mask_op(DisasContext *s, arg_vlm_v *a,
uint8_t eew)
/* Mask destination register are always tail-agnostic */
data = FIELD_DP32(data, VDATA, VTA, s->cfg_vta_all_1s);
data = FIELD_DP32(data, VDATA, VMA, s->vma);
+ data = FIELD_DP32(data, VDATA, VM, 1);
return ldst_us_trans(a->rd, a->rs1, data, fn, s, false);
}
@@ -787,6 +788,7 @@ static bool st_us_mask_op(DisasContext *s, arg_vsm_v *a, uint8_t eew)
/* EMUL = 1, NFIELDS = 1 */
data = FIELD_DP32(data, VDATA, LMUL, 0);
data = FIELD_DP32(data, VDATA, NF, 1);
+ data = FIELD_DP32(data, VDATA, VM, 1);
return ldst_us_trans(a->rd, a->rs1, data, fn, s, true);
}
@@ -1106,6 +1108,7 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, uint32_t nf,
TCGv_i32 desc;
uint32_t data = FIELD_DP32(0, VDATA, NF, nf);
+ data = FIELD_DP32(data, VDATA, VM, 1);
dest = tcg_temp_new_ptr();
desc = tcg_constant_i32(simd_desc(s->cfg_ptr->vlenb,
s->cfg_ptr->vlenb, data));
This is ok, and would warrant a separate patch.
+ if (vm == 0) {
+ for (i = vstart; i < evl; ++i) {
+ if (vext_elem_mask(v0, i)) {
+ reg_idx_last = i;
+ if (reg_idx_first < 0) {
+ reg_idx_first = i;
+ }
+ }
+ }
This isn't great, and isn't used for now, since only unmasked unit-stride is handled so
far. I think this first patch should be simpler and *assume* VM is set.
+/*
+ * Resolve the guest virtual addresses to info->page[].
+ * Control the generation of page faults with @fault. Return false if
+ * there is no work to do, which can only happen with @fault == FAULT_NO.
+ */
+static bool vext_cont_ldst_pages(CPURISCVState *env, RVVContLdSt *info,
+ target_ulong addr, bool is_load,
+ uint32_t desc, uint32_t esz, uintptr_t ra,
+ bool is_us_whole)
+{
+ uint32_t vm = vext_vm(desc);
+ uint32_t nf = vext_nf(desc);
+ bool nofault = (vm == 1 ? false : true);
Why is nofault == "!vm"?
Also, it's silly to use ?: with true/false -- use the proper boolean expression in the
first place.
That said... faults with RVV must interact with vstart.
I'm not sure what the best code organization is.
Perhaps a subroutine, passed the first and last elements for a single page.
Update vstart, resolve the page, allowing the exception.
If watchpoints, one call to cpu_check_watchpoint for the entire memory range.
If ram, iterate through the rest of the page using host accesses; otherwise,
iterate through the rest of the page using tlb accesses, making sure vstart
is always up-to-date.
The main routine looks for the page_split, invokes the subroutine for the first (and
likely only) page. Special case any split-page element. Invoke the subroutine for the
second page.
r~