Re: [PULL 00/15] aspeed queue
On 7/21/24 18:13, Cédric Le Goater wrote:

> The following changes since commit a87a7c449e532130d4fa8faa391ff7e1f04ed660:
>
>   Merge tag 'pull-loongarch-20240719' of https://gitlab.com/gaosong/qemu into staging (2024-07-19 16:28:28 +1000)
>
> are available in the Git repository at:
>
>   https://github.com/legoater/qemu/ tags/pull-aspeed-20240721
>
> for you to fetch changes up to 4db1c16441923fc152142ae4bcc1cba23064cb8b:
>
>   aspeed: fix coding style (2024-07-21 07:46:38 +0200)
>
> aspeed queue:
>
> * SMC model fix (Coverity)
> * AST2600 boot for eMMC support and test
> * AST2700 ADC model
> * I2C model changes preparing AST2700 I2C support

Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate.

r~
Re: [PULL 0/3] loongarch-to-apply queue
On 7/19/24 12:26, Song Gao wrote:

>   Merge tag 'pull-target-arm-20240718' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-19 07:02:17 +1000)
>
> are available in the Git repository at:
>
>   https://gitlab.com/gaosong/qemu.git tags/pull-loongarch-20240719
>
> for you to fetch changes up to 3ed016f525c8010e66be62d3ca6829eaa9b7cfb5:
>
>   hw/loongarch: Modify flash block size to 256K (2024-07-19 10:40:04 +0800)
>
> pull-loongarch-20240719

Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate.

r~
Re: [PULL 00/26] target-arm queue
On 7/18/24 23:20, Peter Maydell wrote:

> Hi; hopefully this is the last arm pullreq before softfreeze.
> There's a handful of miscellaneous bug fixes here, but the bulk of
> the pullreq is Mostafa's implementation of 2-stage translation in
> the SMMUv3.
>
> thanks
> -- PMM
>
> The following changes since commit d74ec4d7dda6322bcc51d1b13ccbd993d3574795:
>
>   Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging (2024-07-18 10:07:23 +1000)
>
> are available in the Git repository at:
>
>   https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240718
>
> for you to fetch changes up to 30a1690f2402e6c1582d5b3ebcf7940bfe2fad4b:
>
>   hvf: arm: Do not advance PC when raising an exception (2024-07-18 13:49:30 +0100)
>
> target-arm queue:
>
> * Fix handling of LDAPR/STLR with negative offset
> * LDAPR should honour SCTLR_ELx.nAA
> * Use float_status copy in sme_fmopa_s
> * hw/display/bcm2835_fb: fix fb_use_offsets condition
> * hw/arm/smmuv3: Support and advertise nesting
> * Use FPST_F16 for SME FMOPA (widening)
> * tests/arm-cpu-features: Do not assume PMU availability
> * hvf: arm: Do not advance PC when raising an exception

Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate.

r~
[PATCH v3 11/12] target/s390x: Use set/clear_helper_retaddr in mem_helper.c
Avoid a race condition with munmap in another thread. For access_memset and access_memmove, manage the value within the helper. For uses of access_{get,set}_byte, manage the value across the for loops. Signed-off-by: Richard Henderson --- target/s390x/tcg/mem_helper.c | 43 ++- 1 file changed, 37 insertions(+), 6 deletions(-) diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c index 331a35b2e5..0e12dae2aa 100644 --- a/target/s390x/tcg/mem_helper.c +++ b/target/s390x/tcg/mem_helper.c @@ -238,14 +238,14 @@ static void do_access_memset(CPUS390XState *env, vaddr vaddr, char *haddr, static void access_memset(CPUS390XState *env, S390Access *desta, uint8_t byte, uintptr_t ra) { - +set_helper_retaddr(ra); do_access_memset(env, desta->vaddr1, desta->haddr1, byte, desta->size1, desta->mmu_idx, ra); -if (likely(!desta->size2)) { -return; +if (unlikely(desta->size2)) { +do_access_memset(env, desta->vaddr2, desta->haddr2, byte, + desta->size2, desta->mmu_idx, ra); } -do_access_memset(env, desta->vaddr2, desta->haddr2, byte, desta->size2, - desta->mmu_idx, ra); +clear_helper_retaddr(); } static uint8_t access_get_byte(CPUS390XState *env, S390Access *access, @@ -366,6 +366,8 @@ static uint32_t do_helper_nc(CPUS390XState *env, uint32_t l, uint64_t dest, access_prepare(, env, src, l, MMU_DATA_LOAD, mmu_idx, ra); access_prepare(, env, dest, l, MMU_DATA_LOAD, mmu_idx, ra); access_prepare(, env, dest, l, MMU_DATA_STORE, mmu_idx, ra); +set_helper_retaddr(ra); + for (i = 0; i < l; i++) { const uint8_t x = access_get_byte(env, , i, ra) & access_get_byte(env, , i, ra); @@ -373,6 +375,8 @@ static uint32_t do_helper_nc(CPUS390XState *env, uint32_t l, uint64_t dest, c |= x; access_set_byte(env, , i, x, ra); } + +clear_helper_retaddr(); return c != 0; } @@ -407,6 +411,7 @@ static uint32_t do_helper_xc(CPUS390XState *env, uint32_t l, uint64_t dest, return 0; } +set_helper_retaddr(ra); for (i = 0; i < l; i++) { const uint8_t x = access_get_byte(env, , i, ra) ^ 
access_get_byte(env, , i, ra); @@ -414,6 +419,7 @@ static uint32_t do_helper_xc(CPUS390XState *env, uint32_t l, uint64_t dest, c |= x; access_set_byte(env, , i, x, ra); } +clear_helper_retaddr(); return c != 0; } @@ -441,6 +447,8 @@ static uint32_t do_helper_oc(CPUS390XState *env, uint32_t l, uint64_t dest, access_prepare(, env, src, l, MMU_DATA_LOAD, mmu_idx, ra); access_prepare(, env, dest, l, MMU_DATA_LOAD, mmu_idx, ra); access_prepare(, env, dest, l, MMU_DATA_STORE, mmu_idx, ra); +set_helper_retaddr(ra); + for (i = 0; i < l; i++) { const uint8_t x = access_get_byte(env, , i, ra) | access_get_byte(env, , i, ra); @@ -448,6 +456,8 @@ static uint32_t do_helper_oc(CPUS390XState *env, uint32_t l, uint64_t dest, c |= x; access_set_byte(env, , i, x, ra); } + +clear_helper_retaddr(); return c != 0; } @@ -484,11 +494,13 @@ static uint32_t do_helper_mvc(CPUS390XState *env, uint32_t l, uint64_t dest, } else if (!is_destructive_overlap(env, dest, src, l)) { access_memmove(env, , , ra); } else { +set_helper_retaddr(ra); for (i = 0; i < l; i++) { uint8_t byte = access_get_byte(env, , i, ra); access_set_byte(env, , i, byte, ra); } +clear_helper_retaddr(); } return env->cc_op; @@ -514,10 +526,12 @@ void HELPER(mvcrl)(CPUS390XState *env, uint64_t l, uint64_t dest, uint64_t src) access_prepare(, env, src, l, MMU_DATA_LOAD, mmu_idx, ra); access_prepare(, env, dest, l, MMU_DATA_STORE, mmu_idx, ra); +set_helper_retaddr(ra); for (i = l - 1; i >= 0; i--) { uint8_t byte = access_get_byte(env, , i, ra); access_set_byte(env, , i, byte, ra); } +clear_helper_retaddr(); } /* move inverse */ @@ -534,11 +548,13 @@ void HELPER(mvcin)(CPUS390XState *env, uint32_t l, uint64_t dest, uint64_t src) src = wrap_address(env, src - l + 1); access_prepare(, env, src, l, MMU_DATA_LOAD, mmu_idx, ra); access_prepare(, env, dest, l, MMU_DATA_STORE, mmu_idx, ra); + +set_helper_retaddr(ra); for (i = 0; i < l; i++) { const uint8_t x = access_get_byte(env, , l - i - 1, ra); - access_set_byte(env, , i, x, ra); } 
+clear_helper_retaddr(); } /* move numerics */ @@ -555,12 +571,15 @@ void HELPER(mvn)(CPUS390XState *env, uint32_t l, uint64_t dest, uint64_t src) access_prepare(, env, src, l, MMU_DATA_LOAD, mmu_idx, ra); access_prepare(, env, dest, l, MMU_DATA_LOA
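The pattern throughout this patch is to bracket each per-byte loop with one set/clear pair, rather than toggling the marker around every individual access. A minimal stand-alone model of the do_helper_xc loop is shown below; `helper_retaddr` and the set/clear macros are reduced to plain assignments, since the SIGSEGV handler that consumes them is out of scope for a sketch:

```c
#include <stdint.h>
#include <stddef.h>

/* Modeled thread-local slot; in QEMU a SIGSEGV handler reads this. */
static _Thread_local uintptr_t helper_retaddr;
#define set_helper_retaddr(ra)  (helper_retaddr = (ra))
#define clear_helper_retaddr()  (helper_retaddr = 0)

/*
 * Model of do_helper_xc's loop: XOR src into dest, accumulate the
 * condition code, and hold the retaddr across the whole loop rather
 * than setting/clearing it per byte.
 */
static int xc_loop(uint8_t *dest, const uint8_t *src, size_t l, uintptr_t ra)
{
    uint8_t c = 0;

    set_helper_retaddr(ra);
    for (size_t i = 0; i < l; i++) {
        uint8_t x = dest[i] ^ src[i];
        c |= x;
        dest[i] = x;
    }
    clear_helper_retaddr();
    return c != 0;
}
```

Holding the marker across the loop keeps the fast path cheap: the barrier cost is paid once per helper invocation, not once per byte.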
[PATCH v3 08/12] target/ppc: Improve helper_dcbz for user-only
Mark the reserve_addr check unlikely.

Use tlb_vaddr_to_host instead of probe_write, relying on the memset
itself to test for page writability.

Use set/clear_helper_retaddr so that we can properly unwind on segfault.

With this, a trivial loop around guest memset will spend nearly 50%
of runtime within helper_dcbz and host memset.

Signed-off-by: Richard Henderson
---
 target/ppc/mem_helper.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 24bae3b80c..953dd08d5d 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -280,20 +280,27 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
     addr &= mask;
 
     /* Check reservation */
-    if ((env->reserve_addr & mask) == addr) {
+    if (unlikely((env->reserve_addr & mask) == addr)) {
         env->reserve_addr = (target_ulong)-1ULL;
     }
 
     /* Try fast path translate */
+#ifdef CONFIG_USER_ONLY
+    haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, mmu_idx);
+#else
     haddr = probe_write(env, addr, dcbz_size, mmu_idx, retaddr);
-    if (haddr) {
-        memset(haddr, 0, dcbz_size);
-    } else {
+    if (unlikely(!haddr)) {
         /* Slow path */
         for (int i = 0; i < dcbz_size; i += 8) {
             cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
         }
+        return;
     }
+#endif
+
+    set_helper_retaddr(retaddr);
+    memset(haddr, 0, dcbz_size);
+    clear_helper_retaddr();
 }
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, int mmu_idx)
--
2.43.0
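The address alignment and reservation check at the top of dcbz_common() are simple enough to model as pure functions. A sketch, with target_ulong reduced to uint64_t (dcbz_size must be a power of two, as it is for real cache-line sizes):

```c
#include <stdint.h>
#include <stdbool.h>

/* Align an address down to a power-of-two dcbz block, as dcbz_common()
 * does with "addr &= mask". */
static uint64_t dcbz_align(uint64_t addr, int dcbz_size)
{
    uint64_t mask = ~(uint64_t)(dcbz_size - 1);
    return addr & mask;
}

/* Whether the zeroed block covers the reservation granule, which forces
 * the (unlikely) clearing of env->reserve_addr. */
static bool dcbz_hits_reservation(uint64_t reserve_addr,
                                  uint64_t addr, int dcbz_size)
{
    uint64_t mask = ~(uint64_t)(dcbz_size - 1);
    return (reserve_addr & mask) == (addr & mask);
}
```

The `unlikely()` annotation in the patch encodes exactly this: a dcbz rarely lands on the line holding an active lwarx/stwcx reservation, so the branch predictor hint keeps the common path straight.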
[PATCH v3 01/12] accel/tcg: Move {set,clear}_helper_retaddr to cpu_ldst.h
Use of these in helpers goes hand-in-hand with tlb_vaddr_to_host and other probing functions. Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- accel/tcg/user-retaddr.h | 28 include/exec/cpu_ldst.h | 34 ++ accel/tcg/cpu-exec.c | 3 --- accel/tcg/user-exec.c| 1 - 4 files changed, 34 insertions(+), 32 deletions(-) delete mode 100644 accel/tcg/user-retaddr.h diff --git a/accel/tcg/user-retaddr.h b/accel/tcg/user-retaddr.h deleted file mode 100644 index e0f57e1994..00 --- a/accel/tcg/user-retaddr.h +++ /dev/null @@ -1,28 +0,0 @@ -#ifndef ACCEL_TCG_USER_RETADDR_H -#define ACCEL_TCG_USER_RETADDR_H - -#include "qemu/atomic.h" - -extern __thread uintptr_t helper_retaddr; - -static inline void set_helper_retaddr(uintptr_t ra) -{ -helper_retaddr = ra; -/* - * Ensure that this write is visible to the SIGSEGV handler that - * may be invoked due to a subsequent invalid memory operation. - */ -signal_barrier(); -} - -static inline void clear_helper_retaddr(void) -{ -/* - * Ensure that previous memory operations have succeeded before - * removing the data visible to the signal handler. - */ -signal_barrier(); -helper_retaddr = 0; -} - -#endif diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h index 71009f84f5..dac12bd8eb 100644 --- a/include/exec/cpu_ldst.h +++ b/include/exec/cpu_ldst.h @@ -379,4 +379,38 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr, MMUAccessType access_type, int mmu_idx); #endif +/* + * For user-only, helpers that use guest to host address translation + * must protect the actual host memory access by recording 'retaddr' + * for the signal handler. This is required for a race condition in + * which another thread unmaps the page between a probe and the + * actual access. 
+ */ +#ifdef CONFIG_USER_ONLY +extern __thread uintptr_t helper_retaddr; + +static inline void set_helper_retaddr(uintptr_t ra) +{ +helper_retaddr = ra; +/* + * Ensure that this write is visible to the SIGSEGV handler that + * may be invoked due to a subsequent invalid memory operation. + */ +signal_barrier(); +} + +static inline void clear_helper_retaddr(void) +{ +/* + * Ensure that previous memory operations have succeeded before + * removing the data visible to the signal handler. + */ +signal_barrier(); +helper_retaddr = 0; +} +#else +#define set_helper_retaddr(ra) do { } while (0) +#define clear_helper_retaddr() do { } while (0) +#endif + #endif /* CPU_LDST_H */ diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index 9010dad073..8163295f34 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -41,9 +41,6 @@ #include "tb-context.h" #include "internal-common.h" #include "internal-target.h" -#if defined(CONFIG_USER_ONLY) -#include "user-retaddr.h" -#endif /* -icount align implementation. */ diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c index 80d24540ed..7ddc47b0ba 100644 --- a/accel/tcg/user-exec.c +++ b/accel/tcg/user-exec.c @@ -33,7 +33,6 @@ #include "tcg/tcg-ldst.h" #include "internal-common.h" #include "internal-target.h" -#include "user-retaddr.h" __thread uintptr_t helper_retaddr; -- 2.43.0
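The pair of helpers this patch makes public can be modeled in a few lines. This is a sketch, not the QEMU code itself: `signal_barrier()` is reduced to a plain compiler barrier (which is what it expands to on typical hosts), and the SIGSEGV handler that reads `helper_retaddr` is omitted:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Thread-local slot a SIGSEGV handler would read to unwind the guest PC. */
static _Thread_local uintptr_t helper_retaddr;

/* Modeled as a pure compiler barrier, matching QEMU's signal_barrier(). */
#define signal_barrier() __asm__ __volatile__("" ::: "memory")

static inline void set_helper_retaddr(uintptr_t ra)
{
    helper_retaddr = ra;
    /* Publish the value before any host memory access that might fault. */
    signal_barrier();
}

static inline void clear_helper_retaddr(void)
{
    /* All guarded accesses must complete before the marker is dropped. */
    signal_barrier();
    helper_retaddr = 0;
}

/* Example guarded operation: a host memset bracketed by the markers. */
static void guarded_memset(void *host, int byte, size_t len, uintptr_t ra)
{
    set_helper_retaddr(ra);
    memset(host, byte, len);
    clear_helper_retaddr();
}
```

The barriers are compiler-only because the consumer is a signal handler running on the same thread: no cross-CPU ordering is needed, only protection against the compiler sinking the store past the faulting access.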
[PATCH v3 04/12] target/ppc/mem_helper.c: Remove a conditional from dcbz_common()
From: BALATON Zoltan

Instead of passing a bool and selecting a value within dcbz_common(),
let the callers pass in the right value to avoid this conditional
statement. On PPC, dcbz is often used to zero memory and some code
uses it a lot. This change improves the run time of a test case that
copies memory with a dcbz call in every iteration from 6.23 to 5.83
seconds.

Signed-off-by: BALATON Zoltan
Message-Id: <20240622204833.5f7c74e6...@zero.eik.bme.hu>
Reviewed-by: Richard Henderson
Signed-off-by: Richard Henderson
Reviewed-by: Nicholas Piggin
---
 target/ppc/mem_helper.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index f88155ad45..361fd72226 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -271,12 +271,11 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
 }
 
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
-                        uint32_t opcode, bool epid, uintptr_t retaddr)
+                        uint32_t opcode, int mmu_idx, uintptr_t retaddr)
 {
     target_ulong mask, dcbz_size = env->dcache_line_size;
     uint32_t i;
     void *haddr;
-    int mmu_idx = epid ? PPC_TLB_EPID_STORE : ppc_env_mmu_index(env, false);
 
 #if defined(TARGET_PPC64)
     /* Check for dcbz vs dcbzl on 970 */
@@ -309,12 +308,12 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-    dcbz_common(env, addr, opcode, false, GETPC());
+    dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
 }
 
 void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-    dcbz_common(env, addr, opcode, true, GETPC());
+    dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC());
 }
 
 void helper_icbi(CPUPPCState *env, target_ulong addr)
--
2.43.0
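The shape of the refactor, reduced to a toy example. The mmu-index values here are placeholders, not the real PPC constants; the point is that the callee's per-call branch disappears once each caller passes the value it statically knows:

```c
/* Hypothetical stand-ins for the two MMU index sources. */
enum { MMU_IDX_NORMAL = 0, MMU_IDX_EPID_STORE = 7 };

/* Before: the callee re-derives the index from a flag on every call. */
static int zero_line_before(int epid)
{
    int mmu_idx = epid ? MMU_IDX_EPID_STORE : MMU_IDX_NORMAL;
    return mmu_idx;
}

/* After: callers pass the index directly, as helper_dcbz and
 * helper_dcbzep do in the patch, so the shared path is branch-free. */
static int zero_line_after(int mmu_idx)
{
    return mmu_idx;
}
```

The conditional still exists logically, but it is resolved at each call site, where the answer is a compile-time constant, instead of being evaluated at run time on the hot path.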
[PATCH v3 02/12] target/arm: Use set/clear_helper_retaddr in helper-a64.c
Use these in helper_dc_dva and the FEAT_MOPS routines to avoid a race condition with munmap in another thread. Reviewed-by: Peter Maydell Signed-off-by: Richard Henderson --- target/arm/tcg/helper-a64.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c index 0ea8668ab4..c60d2a7ec9 100644 --- a/target/arm/tcg/helper-a64.c +++ b/target/arm/tcg/helper-a64.c @@ -928,6 +928,8 @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp) void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in) { +uintptr_t ra = GETPC(); + /* * Implement DC ZVA, which zeroes a fixed-length block of memory. * Note that we do not implement the (architecturally mandated) @@ -948,8 +950,6 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in) #ifndef CONFIG_USER_ONLY if (unlikely(!mem)) { -uintptr_t ra = GETPC(); - /* * Trap if accessing an invalid page. DC_ZVA requires that we supply * the original pointer for an invalid page. But watchpoints require @@ -971,7 +971,9 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in) } #endif +set_helper_retaddr(ra); memset(mem, 0, blocklen); +clear_helper_retaddr(); } void HELPER(unaligned_access)(CPUARMState *env, uint64_t addr, @@ -1120,7 +1122,9 @@ static uint64_t set_step(CPUARMState *env, uint64_t toaddr, } #endif /* Easy case: just memset the host memory */ +set_helper_retaddr(ra); memset(mem, data, setsize); +clear_helper_retaddr(); return setsize; } @@ -1163,7 +1167,9 @@ static uint64_t set_step_tags(CPUARMState *env, uint64_t toaddr, } #endif /* Easy case: just memset the host memory */ +set_helper_retaddr(ra); memset(mem, data, setsize); +clear_helper_retaddr(); mte_mops_set_tags(env, toaddr, setsize, *mtedesc); return setsize; } @@ -1497,7 +1503,9 @@ static uint64_t copy_step(CPUARMState *env, uint64_t toaddr, uint64_t fromaddr, } #endif /* Easy case: just memmove the host memory */ +set_helper_retaddr(ra); memmove(wmem, rmem, copysize); +clear_helper_retaddr(); 
return copysize; } @@ -1572,7 +1580,9 @@ static uint64_t copy_step_rev(CPUARMState *env, uint64_t toaddr, * Easy case: just memmove the host memory. Note that wmem and * rmem here point to the *last* byte to copy. */ +set_helper_retaddr(ra); memmove(wmem - (copysize - 1), rmem - (copysize - 1), copysize); +clear_helper_retaddr(); return copysize; } -- 2.43.0
[PATCH v3 03/12] target/arm: Use set/clear_helper_retaddr in SVE and SME helpers
Avoid a race condition with munmap in another thread. Use around blocks that exclusively use "host_fn". Keep the blocks as small as possible, but without setting and clearing for every operation on one page. Signed-off-by: Richard Henderson --- target/arm/tcg/sme_helper.c | 16 ++ target/arm/tcg/sve_helper.c | 42 + 2 files changed, 49 insertions(+), 9 deletions(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index e2e0575039..ab40ced38f 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -517,6 +517,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg, clr_fn(za, 0, reg_off); } +set_helper_retaddr(ra); + while (reg_off <= reg_last) { uint64_t pg = vg[reg_off >> 6]; do { @@ -529,6 +531,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg, } while (reg_off <= reg_last && (reg_off & 63)); } +clear_helper_retaddr(); + /* * Use the slow path to manage the cross-page misalignment. * But we know this is RAM and cannot trap. @@ -543,6 +547,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg, reg_last = info.reg_off_last[1]; host = info.page[1].host; +set_helper_retaddr(ra); + do { uint64_t pg = vg[reg_off >> 6]; do { @@ -554,6 +560,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg, reg_off += esize; } while (reg_off & 63); } while (reg_off <= reg_last); + +clear_helper_retaddr(); } } @@ -701,6 +709,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg, reg_last = info.reg_off_last[0]; host = info.page[0].host; +set_helper_retaddr(ra); + while (reg_off <= reg_last) { uint64_t pg = vg[reg_off >> 6]; do { @@ -711,6 +721,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg, } while (reg_off <= reg_last && (reg_off & 63)); } +clear_helper_retaddr(); + /* * Use the slow path to manage the cross-page misalignment. * But we know this is RAM and cannot trap. 
@@ -725,6 +737,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg, reg_last = info.reg_off_last[1]; host = info.page[1].host; +set_helper_retaddr(ra); + do { uint64_t pg = vg[reg_off >> 6]; do { @@ -734,6 +748,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg, reg_off += 1 << esz; } while (reg_off & 63); } while (reg_off <= reg_last); + +clear_helper_retaddr(); } } diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c index dd49e67d7a..f1ee0e060f 100644 --- a/target/arm/tcg/sve_helper.c +++ b/target/arm/tcg/sve_helper.c @@ -5738,6 +5738,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr, reg_last = info.reg_off_last[0]; host = info.page[0].host; +set_helper_retaddr(retaddr); + while (reg_off <= reg_last) { uint64_t pg = vg[reg_off >> 6]; do { @@ -5752,6 +5754,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr, } while (reg_off <= reg_last && (reg_off & 63)); } +clear_helper_retaddr(); + /* * Use the slow path to manage the cross-page misalignment. * But we know this is RAM and cannot trap. @@ -5771,6 +5775,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr, reg_last = info.reg_off_last[1]; host = info.page[1].host; +set_helper_retaddr(retaddr); + do { uint64_t pg = vg[reg_off >> 6]; do { @@ -5784,6 +5790,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr, mem_off += N << msz; } while (reg_off & 63); } while (reg_off <= reg_last); + +clear_helper_retaddr(); } } @@ -5934,15 +5942,11 @@ DO_LDN_2(4, dd, MO_64) /* * Load contiguous data, first-fault and no-fault. * - * For user-only, one could argue that we should hold the mmap_lock during - * the operation so that there is no race between page_check_range and the - * load operation. However, unmapping pages out from under a running thread - * is extraordinarily unlikely. This theoretical race condition also affects - * linux-user/ in its get_user/put_user macros. 
- * - * TODO: Construct some helpers, written in assembly, that interact with - * host_signal_handler to produce memory ops which can properly report errors - * without racing. + * For user-only, we control the race between page_check_range and + * another thread's munmap by using set/clear_helper_retaddr. Any + * SEGV that occurs between those markers is assumed to be b
[PATCH v3 05/12] target/ppc: Hoist dcbz_size out of dcbz_common
The 970 logic does not apply to dcbzep, which is an e500 insn. Reviewed-by: Nicholas Piggin Reviewed-by: BALATON Zoltan Signed-off-by: Richard Henderson --- target/ppc/mem_helper.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c index 361fd72226..5067919ff8 100644 --- a/target/ppc/mem_helper.c +++ b/target/ppc/mem_helper.c @@ -271,22 +271,12 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb, } static void dcbz_common(CPUPPCState *env, target_ulong addr, -uint32_t opcode, int mmu_idx, uintptr_t retaddr) +int dcbz_size, int mmu_idx, uintptr_t retaddr) { -target_ulong mask, dcbz_size = env->dcache_line_size; -uint32_t i; +target_ulong mask = ~(target_ulong)(dcbz_size - 1); void *haddr; -#if defined(TARGET_PPC64) -/* Check for dcbz vs dcbzl on 970 */ -if (env->excp_model == POWERPC_EXCP_970 && -!(opcode & 0x0020) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) { -dcbz_size = 32; -} -#endif - /* Align address */ -mask = ~(dcbz_size - 1); addr &= mask; /* Check reservation */ @@ -300,7 +290,7 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr, memset(haddr, 0, dcbz_size); } else { /* Slow path */ -for (i = 0; i < dcbz_size; i += 8) { +for (int i = 0; i < dcbz_size; i += 8) { cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr); } } @@ -308,12 +298,22 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr, void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode) { -dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC()); +int dcbz_size = env->dcache_line_size; + +#if defined(TARGET_PPC64) +/* Check for dcbz vs dcbzl on 970 */ +if (env->excp_model == POWERPC_EXCP_970 && +!(opcode & 0x0020) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) { +dcbz_size = 32; +} +#endif + +dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC()); } void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode) { 
-dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC()); +dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC()); } void helper_icbi(CPUPPCState *env, target_ulong addr) -- 2.43.0
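The hoisted 970 check can be modeled as a pure function of the HID5 value. The bit positions follow the patch (`(hid5 >> 7) & 0x3`); everything else here is scaffolding for the sketch:

```c
#include <stdint.h>

/* dcbz block size on a 970: the cache line size, unless the two-bit
 * field at HID5[8:7] selects the legacy 32-byte behaviour. */
static int dcbz_size_970(uint64_t hid5, int dcache_line_size)
{
    if (((hid5 >> 7) & 0x3) == 1) {
        return 32;
    }
    return dcache_line_size;
}
```

Because only the 970 exception model ever consults HID5, hoisting this into helper_dcbz (and later into a dedicated helper_dcbzl) keeps the check entirely off the path taken by e500's dcbzep and every other PPC model.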
[PATCH v3 07/12] target/ppc: Merge helper_{dcbz,dcbzep}
Merge the two and pass the mmu_idx directly from translation. Swap the argument order in dcbz_common to avoid extra swaps. Reviewed-by: Nicholas Piggin Signed-off-by: Richard Henderson --- target/ppc/helper.h | 3 +-- target/ppc/mem_helper.c | 14 -- target/ppc/translate.c | 4 ++-- 3 files changed, 7 insertions(+), 14 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index afc56855ff..4fa089cbf9 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -46,8 +46,7 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32) DEF_HELPER_4(lsw, void, env, tl, i32, i32) DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32) DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32) -DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl) -DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl) +DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, int) #ifdef TARGET_PPC64 DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl) #endif diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c index d4957efd6e..24bae3b80c 100644 --- a/target/ppc/mem_helper.c +++ b/target/ppc/mem_helper.c @@ -271,7 +271,7 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb, } static void dcbz_common(CPUPPCState *env, target_ulong addr, -int dcbz_size, int mmu_idx, uintptr_t retaddr) +int mmu_idx, int dcbz_size, uintptr_t retaddr) { target_ulong mask = ~(target_ulong)(dcbz_size - 1); void *haddr; @@ -296,15 +296,9 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr, } } -void helper_dcbz(CPUPPCState *env, target_ulong addr) +void helper_dcbz(CPUPPCState *env, target_ulong addr, int mmu_idx) { -dcbz_common(env, addr, env->dcache_line_size, -ppc_env_mmu_index(env, false), GETPC()); -} - -void helper_dcbzep(CPUPPCState *env, target_ulong addr) -{ -dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC()); +dcbz_common(env, addr, mmu_idx, env->dcache_line_size, GETPC()); } #ifdef TARGET_PPC64 @@ -320,7 +314,7 @@ void 
helper_dcbzl(CPUPPCState *env, target_ulong addr) dcbz_size = 32; } -dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC()); +dcbz_common(env, addr, ppc_env_mmu_index(env, false), dcbz_size, GETPC()); } #endif diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 9e472ab7ef..cba943a49d 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -4458,7 +4458,7 @@ static void gen_dcbz(DisasContext *ctx) } #endif -gen_helper_dcbz(tcg_env, tcgv_addr); +gen_helper_dcbz(tcg_env, tcgv_addr, tcg_constant_i32(ctx->mem_idx)); } /* dcbzep */ @@ -4468,7 +4468,7 @@ static void gen_dcbzep(DisasContext *ctx) gen_set_access_type(ctx, ACCESS_CACHE); gen_addr_reg_index(ctx, tcgv_addr); -gen_helper_dcbzep(tcg_env, tcgv_addr); +gen_helper_dcbz(tcg_env, tcgv_addr, tcg_constant_i32(PPC_TLB_EPID_STORE)); } /* dst / dstt */ -- 2.43.0
[PATCH v3 12/12] target/riscv: Simplify probing in vext_ldff
The current pairing of tlb_vaddr_to_host with extra checks is either
inefficient (user-only, with page_check_range) or incorrect (system,
with probe_pages). For proper non-fault behaviour, use
probe_access_flags with its nonfault parameter set to true.

Acked-by: Alistair Francis
Signed-off-by: Richard Henderson
---
 target/riscv/vector_helper.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 1b4d5a8e37..10a52ceb5b 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -474,7 +474,6 @@ vext_ldff(void *vd, void *v0, target_ulong base,
           vext_ldst_elem_fn *ldst_elem,
           uint32_t log2_esz, uintptr_t ra)
 {
-    void *host;
     uint32_t i, k, vl = 0;
     uint32_t nf = vext_nf(desc);
     uint32_t vm = vext_vm(desc);
@@ -493,27 +492,31 @@ vext_ldff(void *vd, void *v0, target_ulong base,
         }
         addr = adjust_addr(env, base + i * (nf << log2_esz));
         if (i == 0) {
+            /* Allow fault on first element. */
             probe_pages(env, addr, nf << log2_esz, ra, MMU_DATA_LOAD);
         } else {
-            /* if it triggers an exception, no need to check watchpoint */
             remain = nf << log2_esz;
             while (remain > 0) {
+                void *host;
+                int flags;
+
                 offset = -(addr | TARGET_PAGE_MASK);
-                host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_index);
-                if (host) {
-#ifdef CONFIG_USER_ONLY
-                    if (!page_check_range(addr, offset, PAGE_READ)) {
-                        vl = i;
-                        goto ProbeSuccess;
-                    }
-#else
-                    probe_pages(env, addr, offset, ra, MMU_DATA_LOAD);
-#endif
-                } else {
+
+                /* Probe nonfault on subsequent elements. */
+                flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
+                                           mmu_index, true, &host, 0);
+
+                /*
+                 * Stop if invalid (unmapped) or mmio (transaction may fail).
+                 * Do not stop if watchpoint, as the spec says that
+                 * first-fault should continue to access the same
+                 * elements regardless of any watchpoint.
+                 */
+                if (flags & ~TLB_WATCHPOINT) {
                     vl = i;
                     goto ProbeSuccess;
                 }
-                if (remain <= offset) {
+                if (remain <= offset) {
                     break;
                 }
                 remain -= offset;
--
2.43.0
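The new stop condition can be sketched in isolation. The flag values below are placeholders, not QEMU's real TLB_* bit positions; only the `flags & ~TLB_WATCHPOINT` logic is the point:

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for QEMU's TLB_* masks. */
enum {
    TLB_INVALID_MASK = 1 << 0,
    TLB_MMIO         = 1 << 1,
    TLB_WATCHPOINT   = 1 << 2,
};

/*
 * Model of vext_ldff's stop test for elements after the first:
 * keep loading across watchpoints (the RVV spec says first-fault
 * must access the same elements regardless of watchpoints), but
 * stop at unmapped pages or MMIO, where the access could fault
 * or the transaction could fail.
 */
static bool ldff_must_stop(int flags)
{
    return (flags & ~TLB_WATCHPOINT) != 0;
}
```

A nonfault probe_access_flags() returning zero (or only the watchpoint bit) means the page is plain RAM and safe to read, which is exactly the condition under which the vector load may continue without trimming vl.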
[PATCH v3 09/12] target/s390x: Use user_or_likely in do_access_memset
Eliminate the ifdef by using a predicate that is always true with
CONFIG_USER_ONLY.

Reviewed-by: Peter Maydell
Signed-off-by: Richard Henderson
---
 target/s390x/tcg/mem_helper.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 6cdbc34178..5311a15a09 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -225,10 +225,7 @@ static void do_access_memset(CPUS390XState *env, vaddr vaddr, char *haddr,
                              uint8_t byte, uint16_t size, int mmu_idx,
                              uintptr_t ra)
 {
-#ifdef CONFIG_USER_ONLY
-    memset(haddr, byte, size);
-#else
-    if (likely(haddr)) {
+    if (user_or_likely(haddr)) {
         memset(haddr, byte, size);
     } else {
         MemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
@@ -236,7 +233,6 @@ static void do_access_memset(CPUS390XState *env, vaddr vaddr, char *haddr,
             cpu_stb_mmu(env, vaddr + i, byte, oi, ra);
         }
     }
-#endif
 }
 
 static void access_memset(CPUS390XState *env, S390Access *desta,
--
2.43.0
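A model of the predicate: with CONFIG_USER_ONLY defined, the argument is ignored and the condition folds to a constant true, so the compiler eliminates the slow path as dead code; otherwise it behaves like a plain likely() hint. This is a sketch matching the described semantics, not QEMU's exact macro:

```c
#include <stdbool.h>

#ifdef CONFIG_USER_ONLY
#define user_or_likely(x)  true
#else
#define user_or_likely(x)  __builtin_expect(!!(x), 1)
#endif

/* Returns 1 for the fast path (direct host memset), 0 for the slow
 * per-byte cpu_stb_mmu path, mirroring do_access_memset's structure. */
static int memset_path(void *haddr)
{
    if (user_or_likely(haddr)) {
        return 1;
    }
    return 0;
}
```

In user-only builds the probe always yields a non-NULL host address, so folding the predicate to true is not just an optimization hint but a correctness-preserving simplification.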
[PATCH v3 06/12] target/ppc: Split out helper_dcbzl for 970
We can determine at translation time whether the insn is or is not dbczl. We must retain a runtime check against the HID5 register, but we can move that to a separate function that never affects other ppc models. Reviewed-by: Nicholas Piggin Reviewed-by: BALATON Zoltan Signed-off-by: Richard Henderson --- target/ppc/helper.h | 7 +-- target/ppc/mem_helper.c | 34 +- target/ppc/translate.c | 24 ++-- 3 files changed, 40 insertions(+), 25 deletions(-) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 76b8f25c77..afc56855ff 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -46,8 +46,11 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32) DEF_HELPER_4(lsw, void, env, tl, i32, i32) DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32) DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32) -DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, i32) -DEF_HELPER_FLAGS_3(dcbzep, TCG_CALL_NO_WG, void, env, tl, i32) +DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl) +DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl) +#ifdef TARGET_PPC64 +DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl) +#endif DEF_HELPER_FLAGS_2(icbi, TCG_CALL_NO_WG, void, env, tl) DEF_HELPER_FLAGS_2(icbiep, TCG_CALL_NO_WG, void, env, tl) DEF_HELPER_5(lscbx, tl, env, tl, i32, i32, i32) diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c index 5067919ff8..d4957efd6e 100644 --- a/target/ppc/mem_helper.c +++ b/target/ppc/mem_helper.c @@ -296,26 +296,34 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr, } } -void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode) +void helper_dcbz(CPUPPCState *env, target_ulong addr) { -int dcbz_size = env->dcache_line_size; - -#if defined(TARGET_PPC64) -/* Check for dcbz vs dcbzl on 970 */ -if (env->excp_model == POWERPC_EXCP_970 && -!(opcode & 0x0020) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) { -dcbz_size = 32; -} -#endif - -dcbz_common(env, addr, dcbz_size, 
ppc_env_mmu_index(env, false), GETPC()); +dcbz_common(env, addr, env->dcache_line_size, +ppc_env_mmu_index(env, false), GETPC()); } -void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode) +void helper_dcbzep(CPUPPCState *env, target_ulong addr) { dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC()); } +#ifdef TARGET_PPC64 +void helper_dcbzl(CPUPPCState *env, target_ulong addr) +{ +int dcbz_size = env->dcache_line_size; + +/* + * The translator checked for POWERPC_EXCP_970. + * All that's left is to check HID5. + */ +if (((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) { +dcbz_size = 32; +} + +dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC()); +} +#endif + void helper_icbi(CPUPPCState *env, target_ulong addr) { addr &= ~(env->dcache_line_size - 1); diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 0bc16d7251..9e472ab7ef 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -178,6 +178,7 @@ struct DisasContext { /* Translation flags */ MemOp default_tcg_memop_mask; #if defined(TARGET_PPC64) +powerpc_excp_t excp_model; bool sf_mode; bool has_cfar; bool has_bhrb; @@ -4445,27 +4446,29 @@ static void gen_dcblc(DisasContext *ctx) /* dcbz */ static void gen_dcbz(DisasContext *ctx) { -TCGv tcgv_addr; -TCGv_i32 tcgv_op; +TCGv tcgv_addr = tcg_temp_new(); gen_set_access_type(ctx, ACCESS_CACHE); -tcgv_addr = tcg_temp_new(); -tcgv_op = tcg_constant_i32(ctx->opcode & 0x03FF000); gen_addr_reg_index(ctx, tcgv_addr); -gen_helper_dcbz(tcg_env, tcgv_addr, tcgv_op); + +#ifdef TARGET_PPC64 +if (ctx->excp_model == POWERPC_EXCP_970 && !(ctx->opcode & 0x0020)) { +gen_helper_dcbzl(tcg_env, tcgv_addr); +return; +} +#endif + +gen_helper_dcbz(tcg_env, tcgv_addr); } /* dcbzep */ static void gen_dcbzep(DisasContext *ctx) { -TCGv tcgv_addr; -TCGv_i32 tcgv_op; +TCGv tcgv_addr = tcg_temp_new(); gen_set_access_type(ctx, ACCESS_CACHE); -tcgv_addr = tcg_temp_new(); -tcgv_op = tcg_constant_i32(ctx->opcode & 
0x03FF000); gen_addr_reg_index(ctx, tcgv_addr); -gen_helper_dcbzep(tcg_env, tcgv_addr, tcgv_op); +gen_helper_dcbzep(tcg_env, tcgv_addr); } /* dst / dstt */ @@ -6486,6 +6489,7 @@ static void ppc_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) ctx->default_tcg_memop_mask = ctx->le_mode ? MO_LE : MO_BE; ctx->flags = env->flags; #if defined(TARGET_PPC64) +ctx->excp_model = env->excp_model; ctx->sf_mode = (hflags >> HFLAGS_64) & 1; ctx->has_cfar = !!(env->flags & POWERPC_FLAG_CFAR); ctx->has_bhrb = !!(env->flags & POWERPC_FLAG_BHRB); -- 2.43.0
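As a quick sanity check of the split, the size selection that remains in the 970-only helper, together with dcbz's architectural block alignment, can be modeled as follows (a hedged Python sketch; the function names are illustrative, not QEMU's):

```python
def dcbzl_970_size(hid5, dcache_line_size=128):
    """Block size used by the 970-only helper: when HID5 bits [8:7]
    equal 1 the block is 32 bytes, otherwise the full cache line
    (mirroring the HID5 check kept in helper_dcbzl above)."""
    if ((hid5 >> 7) & 0x3) == 1:
        return 32
    return dcache_line_size

def dcbz_span(addr, size):
    """dcbz architecturally zeroes the naturally aligned block that
    contains addr, which is what dcbz_common implements."""
    base = addr & ~(size - 1)
    return base, size
```

The translation-time part of the split (excp_model and the opcode bit) never reaches this helper, so non-970 models pay nothing for the check.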
[PATCH v3 10/12] target/s390x: Use user_or_likely in access_memmove
Invert the conditional, indent the block, and use the macro that expands to true for user-only. Signed-off-by: Richard Henderson --- target/s390x/tcg/mem_helper.c | 54 +-- 1 file changed, 26 insertions(+), 28 deletions(-) diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c index 5311a15a09..331a35b2e5 100644 --- a/target/s390x/tcg/mem_helper.c +++ b/target/s390x/tcg/mem_helper.c @@ -296,41 +296,39 @@ static void access_memmove(CPUS390XState *env, S390Access *desta, S390Access *srca, uintptr_t ra) { int len = desta->size1 + desta->size2; -int diff; assert(len == srca->size1 + srca->size2); /* Fallback to slow access in case we don't have access to all host pages */ -if (unlikely(!desta->haddr1 || (desta->size2 && !desta->haddr2) || - !srca->haddr1 || (srca->size2 && !srca->haddr2))) { -int i; +if (user_or_likely(desta->haddr1 && + srca->haddr1 && + (!desta->size2 || desta->haddr2) && + (!srca->size2 || srca->haddr2))) { +int diff = desta->size1 - srca->size1; -for (i = 0; i < len; i++) { -uint8_t byte = access_get_byte(env, srca, i, ra); - -access_set_byte(env, desta, i, byte, ra); -} -return; -} - -diff = desta->size1 - srca->size1; -if (likely(diff == 0)) { -memmove(desta->haddr1, srca->haddr1, srca->size1); -if (unlikely(srca->size2)) { -memmove(desta->haddr2, srca->haddr2, srca->size2); -} -} else if (diff > 0) { -memmove(desta->haddr1, srca->haddr1, srca->size1); -memmove(desta->haddr1 + srca->size1, srca->haddr2, diff); -if (likely(desta->size2)) { -memmove(desta->haddr2, srca->haddr2 + diff, desta->size2); +if (likely(diff == 0)) { +memmove(desta->haddr1, srca->haddr1, srca->size1); +if (unlikely(srca->size2)) { +memmove(desta->haddr2, srca->haddr2, srca->size2); +} +} else if (diff > 0) { +memmove(desta->haddr1, srca->haddr1, srca->size1); +memmove(desta->haddr1 + srca->size1, srca->haddr2, diff); +if (likely(desta->size2)) { +memmove(desta->haddr2, srca->haddr2 + diff, desta->size2); +} +} else { +diff = -diff; +memmove(desta->haddr1, 
srca->haddr1, desta->size1); +memmove(desta->haddr2, srca->haddr1 + desta->size1, diff); +if (likely(srca->size2)) { +memmove(desta->haddr2 + diff, srca->haddr2, srca->size2); +} } } else { -diff = -diff; -memmove(desta->haddr1, srca->haddr1, desta->size1); -memmove(desta->haddr2, srca->haddr1 + desta->size1, diff); -if (likely(srca->size2)) { -memmove(desta->haddr2 + diff, srca->haddr2, srca->size2); +for (int i = 0; i < len; i++) { +uint8_t byte = access_get_byte(env, srca, i, ra); +access_set_byte(env, desta, i, byte, ra); } } } -- 2.43.0
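Restructured this way, the fast path is the interesting part: each operand may be split across two host-page fragments, and the three-memmove sequence depends on the sign of diff = desta->size1 - srca->size1. A hedged Python model of just that fast path (byte lists stand in for the host fragments; the byte-wise fallback branch is omitted):

```python
def access_memmove(dsta, srca):
    """dsta and srca are (frag1, frag2) pairs of byte lists with equal
    total length; copies src into dst the way the C fast path does."""
    d1, d2 = dsta
    s1, s2 = srca
    diff = len(d1) - len(s1)
    if diff == 0:
        d1[:] = s1
        d2[:] = s2
    elif diff > 0:
        d1[:len(s1)] = s1           # fill dst frag1 from src frag1...
        d1[len(s1):] = s2[:diff]    # ...then from the head of src frag2
        d2[:] = s2[diff:]
    else:
        diff = -diff
        d1[:] = s1[:len(d1)]
        d2[:diff] = s1[len(d1):]    # tail of src frag1 starts dst frag2
        d2[diff:] = s2
    return d1 + d2
```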
[PATCH v3 00/12] Fixes for user-only munmap races
Changes for v3: * Fix patch 3 (sve) vs goto do_fault (pmm) * Fix patch 12 (rvv) vs watchpoints and goto ProbeSuccess (max chou). * Apply r-b. r~ BALATON Zoltan (1): target/ppc/mem_helper.c: Remove a conditional from dcbz_common() Richard Henderson (11): accel/tcg: Move {set,clear}_helper_retaddr to cpu_ldst.h target/arm: Use set/clear_helper_retaddr in helper-a64.c target/arm: Use set/clear_helper_retaddr in SVE and SME helpers target/ppc: Hoist dcbz_size out of dcbz_common target/ppc: Split out helper_dcbzl for 970 target/ppc: Merge helper_{dcbz,dcbzep} target/ppc: Improve helper_dcbz for user-only target/s390x: Use user_or_likely in do_access_memset target/s390x: Use user_or_likely in access_memmove target/s390x: Use set/clear_helper_retaddr in mem_helper.c target/riscv: Simplify probing in vext_ldff accel/tcg/user-retaddr.h | 28 - include/exec/cpu_ldst.h | 34 +++ target/ppc/helper.h | 6 +- accel/tcg/cpu-exec.c | 3 - accel/tcg/user-exec.c | 1 - target/arm/tcg/helper-a64.c | 14 - target/arm/tcg/sme_helper.c | 16 ++ target/arm/tcg/sve_helper.c | 42 +++--- target/ppc/mem_helper.c | 52 + target/ppc/translate.c| 24 target/riscv/vector_helper.c | 31 +- target/s390x/tcg/mem_helper.c | 103 +- 12 files changed, 224 insertions(+), 130 deletions(-) delete mode 100644 accel/tcg/user-retaddr.h -- 2.43.0
[PATCH] tests/tcg/aarch64: Fix test-mte.py
Python 3.12 warns: TESTgdbstub MTE support on aarch64 /home/rth/qemu/src/tests/tcg/aarch64/gdbstub/test-mte.py:21: SyntaxWarning: invalid escape sequence '\(' PATTERN_0 = "Memory tags for address 0x[0-9a-f]+ match \(0x[0-9a-f]+\)." Double up the \ to pass one through to the pattern. Signed-off-by: Richard Henderson --- tests/tcg/aarch64/gdbstub/test-mte.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/tcg/aarch64/gdbstub/test-mte.py b/tests/tcg/aarch64/gdbstub/test-mte.py index 2db0663c1a..66f9c25f8a 100644 --- a/tests/tcg/aarch64/gdbstub/test-mte.py +++ b/tests/tcg/aarch64/gdbstub/test-mte.py @@ -18,7 +18,7 @@ from test_gdbstub import main, report -PATTERN_0 = "Memory tags for address 0x[0-9a-f]+ match \(0x[0-9a-f]+\)." +PATTERN_0 = "Memory tags for address 0x[0-9a-f]+ match \\(0x[0-9a-f]+\\)." PATTERN_1 = ".*(0x[0-9a-f]+)" -- 2.43.0
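The same fix can be seen in miniature below; a raw string would have worked equally well and is what many style guides prefer for regex patterns:

```python
import re

# Doubling the backslash (the fix applied in the patch)...
pattern_doubled = "Memory tags for address 0x[0-9a-f]+ match \\(0x[0-9a-f]+\\)."
# ...produces the same string as a raw literal:
pattern_raw = r"Memory tags for address 0x[0-9a-f]+ match \(0x[0-9a-f]+\)."

assert pattern_doubled == pattern_raw
assert re.search(pattern_raw, "Memory tags for address 0xdeadbeef match (0x5).")
```

With a single backslash, Python 3.12 emits the SyntaxWarning quoted above because `\(` is not a recognized string escape; the character pair happens to survive intact, which is why the test still passed on older Pythons.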
Re: [PATCH v5 00/19] Reconstruct loongson ipi driver
On 7/18/24 23:32, Philippe Mathieu-Daudé wrote: Since v4: - Fix build failure due to rebase (Song) - Loongarch -> LoongArch (Song) - Added Song's tags Since v3: - Use DEFINE_TYPES() macro (unreviewed patch #1) - Update MAINTAINERS - Added Bibo's tags Ho hum, I didn't notice v5 when I just reviewed v4. For the series: Reviewed-by: Richard Henderson r~
Re: [PATCH v4 18/18] hw/intc/loongson_ipi: Remove unused headers
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- hw/intc/loongson_ipi.c | 9 - 1 file changed, 9 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 16/18] hw/loongarch/virt: Replace loongson IPI with loongarch IPI
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao LoongArch IPI inherits from class LoongsonIPICommonClass, and it only contains LoongArch 3A5000 virt machine specific interfaces, rather than mixing different machine implementations together. Signed-off-by: Bibo Mao [PMD: Rebased] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- include/hw/loongarch/virt.h | 1 - hw/loongarch/virt.c | 4 ++-- hw/loongarch/Kconfig| 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 17/18] hw/intc/loongson_ipi: Restrict to MIPS
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao Now that the LoongArch target can use the TYPE_LOONGARCH_IPI model, restrict TYPE_LOONGSON_IPI to MIPS. Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- MAINTAINERS| 2 -- hw/intc/loongson_ipi.c | 14 -- 2 files changed, 16 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 14/18] hw/intc/loongson_ipi: Move common code to loongson_ipi_common.c
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao Move the common code from loongson_ipi.c to loongson_ipi_common.c, call parent_realize() instead of loongson_ipi_common_realize() in loongson_ipi_realize(). Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- hw/intc/loongson_ipi.c| 269 + hw/intc/loongson_ipi_common.c | 272 ++ 2 files changed, 274 insertions(+), 267 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 13/18] hw/intc/loongson_ipi: Expose loongson_ipi_core_read/write helpers
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao In order to access loongson_ipi_core_read/write helpers from loongson_ipi_common.c in the next commit, make their prototype declaration public. Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- include/hw/intc/loongson_ipi_common.h | 6 ++ hw/intc/loongson_ipi.c| 10 -- 2 files changed, 10 insertions(+), 6 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 12/18] hw/intc/loongson_ipi: Add LoongsonIPICommonClass::cpu_by_arch_id handler
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao Allow Loongson IPI implementations to have their own cpu_by_arch_id() handler. Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- include/hw/intc/loongson_ipi_common.h | 1 + hw/intc/loongson_ipi.c| 10 +++--- 2 files changed, 8 insertions(+), 3 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 11/18] hw/intc/loongson_ipi: Add LoongsonIPICommonClass::get_iocsr_as handler
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao Allow Loongson IPI implementations to have their own get_iocsr_as() handler. Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- include/hw/intc/loongson_ipi_common.h | 2 ++ hw/intc/loongson_ipi.c| 16 2 files changed, 14 insertions(+), 4 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 10/18] hw/intc/loongson_ipi: Pass LoongsonIPICommonState to send_ipi_data()
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao In order to get LoongsonIPICommonClass in send_ipi_data() in the next commit, propagate LoongsonIPICommonState. Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- hw/intc/loongson_ipi.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 09/18] hw/intc/loongson_ipi: Move IPICore structure to loongson_ipi_common.h
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao Move the IPICore structure and corresponding common fields of LoongsonIPICommonState to "hw/intc/loongson_ipi_common.h". Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- include/hw/intc/loongson_ipi.h| 17 include/hw/intc/loongson_ipi_common.h | 18 + hw/intc/loongson_ipi.c| 56 +-- hw/intc/loongson_ipi_common.c | 50 4 files changed, 77 insertions(+), 64 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 08/18] hw/intc/loongson_ipi: Move IPICore::mmio_mem to LoongsonIPIState
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao It is easier to manage one array of MMIO MR rather than one per vCPU. Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- include/hw/intc/loongson_ipi.h | 2 +- hw/intc/loongson_ipi.c | 9 ++--- 2 files changed, 7 insertions(+), 4 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 06/18] hw/intc/loongson_ipi: Add TYPE_LOONGSON_IPI_COMMON stub
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao Introduce LOONGSON_IPI_COMMON stubs, QDev parent of LOONGSON_IPI. Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- MAINTAINERS | 4 include/hw/intc/loongson_ipi.h| 13 +++-- include/hw/intc/loongson_ipi_common.h | 26 ++ hw/intc/loongson_ipi.c| 7 --- hw/intc/loongson_ipi_common.c | 22 ++ hw/intc/Kconfig | 4 hw/intc/meson.build | 1 + 7 files changed, 72 insertions(+), 5 deletions(-) create mode 100644 include/hw/intc/loongson_ipi_common.h create mode 100644 hw/intc/loongson_ipi_common.c Reviewed-by: Richard Henderson r~
Re: [PATCH v4 07/18] hw/intc/loongson_ipi: Move common definitions to loongson_ipi_common.h
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- include/hw/intc/loongson_ipi.h| 18 -- include/hw/intc/loongson_ipi_common.h | 19 +++ 2 files changed, 19 insertions(+), 18 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 05/18] hw/intc/loongson_ipi: Extract loongson_ipi_common_realize()
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao In preparation to extract common IPI code in a few commits, extract loongson_ipi_common_realize(). Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- hw/intc/loongson_ipi.c | 25 ++--- 1 file changed, 18 insertions(+), 7 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 04/18] hw/intc/loongson_ipi: Extract loongson_ipi_common_finalize()
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao In preparation to extract common IPI code in a few commits, extract loongson_ipi_common_finalize(). Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- hw/intc/loongson_ipi.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 03/18] hw/intc/loongson_ipi: Rename LoongsonIPI -> LoongsonIPIState
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: From: Bibo Mao We'll have to add LoongsonIPIClass in a few commits, so rename LoongsonIPI as LoongsonIPIState for clarity. Signed-off-by: Bibo Mao [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Bibo Mao Tested-by: Bibo Mao --- include/hw/intc/loongson_ipi.h | 6 +++--- hw/intc/loongson_ipi.c | 16 2 files changed, 11 insertions(+), 11 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 01/18] hw/intc/loongson_ipi: Declare QOM types using DEFINE_TYPES() macro
On 7/18/24 18:38, Philippe Mathieu-Daudé wrote: When multiple QOM types are registered in the same file, it is simpler to use the DEFINE_TYPES() macro. Replace the type_init() / type_register_static() combination. Signed-off-by: Philippe Mathieu-Daudé --- hw/intc/loongson_ipi.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PULL 00/30] riscv-to-apply queue
On 7/18/24 12:09, Alistair Francis wrote: The following changes since commit 58ee924b97d1c0898555647a31820c5a20d55a73: Merge tag 'for-upstream' ofhttps://gitlab.com/bonzini/qemu into staging (2024-07-17 15:40:28 +1000) are available in the Git repository at: https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20240718-1 for you to fetch changes up to daff9f7f7a457f78ce455e6abf19c2a37dfe7630: roms/opensbi: Update to v1.5 (2024-07-18 12:08:45 +1000) RISC-V PR for 9.1 * Support the zimop, zcmop, zama16b and zabha extensions * Validate the mode when setting vstvec CSR * Add decode support for Zawrs extension * Update the KVM regs to Linux 6.10-rc5 * Add smcntrpmf extension support * Raise an exception when CSRRS/CSRRC writes a read-only CSR * Re-insert and deprecate 'riscv,delegate' in virt machine device tree * roms/opensbi: Update to v1.5 Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
Re: [PULL 00/16] Trivial patches for 2024-07-17
On 7/17/24 21:06, Michael Tokarev wrote: The following changes since commit e2f346aa98646e84eabe0256f89d08e89b1837cf: Merge tag 'sdmmc-20240716' ofhttps://github.com/philmd/qemu into staging (2024-07-17 07:59:31 +1000) are available in the Git repository at: https://gitlab.com/mjt0k/qemu.git tags/pull-trivial-patches for you to fetch changes up to 66a8de9889ceb929e2abe7fb0e424f45210d9dda: meson: Update meson-buildoptions.sh (2024-07-17 14:04:15 +0300) trivial patches for 2024-07-17 Applied, thanks. r~
Re: [PULL 00/14] QAPI patches patches for 2024-07-17
On 7/17/24 20:48, Markus Armbruster wrote: The following changes since commit e2f346aa98646e84eabe0256f89d08e89b1837cf: Merge tag 'sdmmc-20240716' ofhttps://github.com/philmd/qemu into staging (2024-07-17 07:59:31 +1000) are available in the Git repository at: https://repo.or.cz/qemu/armbru.git tags/pull-qapi-2024-07-17 for you to fetch changes up to 3c5f6114d9ffc70bc9b1a7cc072a911f966d: qapi: remove "Example" doc section (2024-07-17 10:20:54 +0200) QAPI patches patches for 2024-07-17 Applied, thanks. r~
Re: [PULL 00/20] i386, bugfix changes for QEMU 9.1 soft freeze
On 7/17/24 15:03, Paolo Bonzini wrote: The following changes since commit 959269e910944c03bc13f300d65bf08b060d5d0f: Merge tag 'python-pull-request' ofhttps://gitlab.com/jsnow/qemu into staging (2024-07-16 06:45:23 +1000) are available in the Git repository at: https://gitlab.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to 6a079f2e68e1832ebca0e7d64bc31ffebde9b2dd: target/i386/tcg: save current task state before loading new one (2024-07-16 18:18:25 +0200) * target/i386/tcg: fixes for seg_helper.c * SEV: Don't allow automatic fallback to legacy KVM_SEV_INIT, but also don't use it by default * scsi: honor bootindex again for legacy drives * hpet, utils, scsi, build, cpu: miscellaneous bugfixes Applied, thanks. r~
[PATCH 11/17] target/arm: Fix whitespace near gen_srshr64_i64
Signed-off-by: Richard Henderson --- target/arm/tcg/gengvec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c index 47ac2634ce..b6c0d86bad 100644 --- a/target/arm/tcg/gengvec.c +++ b/target/arm/tcg/gengvec.c @@ -304,7 +304,7 @@ void gen_srshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh) tcg_gen_add_i32(d, d, t); } - void gen_srshr64_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh) +void gen_srshr64_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh) { TCGv_i64 t = tcg_temp_new_i64(); -- 2.43.0
[PATCH 16/17] target/arm: Convert SSHLL, USHLL to decodetree
Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 84 -- target/arm/tcg/a64.decode | 3 ++ 2 files changed, 43 insertions(+), 44 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 627d4311bb..2a9cb3fbe0 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -6972,6 +6972,45 @@ TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri) TRANS(SHL_v, do_vec_shift_imm, a, tcg_gen_gvec_shli) TRANS(SLI_v, do_vec_shift_imm, a, gen_gvec_sli); +static bool do_vec_shift_imm_wide(DisasContext *s, arg_qrri_e *a, bool is_u) +{ +TCGv_i64 tcg_rn, tcg_rd; +int esz = a->esz; +int esize; + +if (esz < 0 || esz >= MO_64) { +return false; +} +if (!fp_access_check(s)) { +return true; +} + +/* + * For the LL variants the store is larger than the load, + * so if rd == rn we would overwrite parts of our input. + * So load everything right now and use shifts in the main loop. + */ +tcg_rd = tcg_temp_new_i64(); +tcg_rn = tcg_temp_new_i64(); +read_vec_element(s, tcg_rn, a->rn, a->q, MO_64); + +esize = 8 << esz; +for (int i = 0, elements = 8 >> esz; i < elements; i++) { +if (is_u) { +tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize); +} else { +tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize); +} +tcg_gen_shli_i64(tcg_rd, tcg_rd, a->imm); +write_vec_element(s, tcg_rd, a->rd, i, esz + 1); +} +clear_vec_high(s, true, a->rd); +return true; +} + +TRANS(SSHLL_v, do_vec_shift_imm_wide, a, false) +TRANS(USHLL_v, do_vec_shift_imm_wide, a, true) + /* Shift a TCGv src by TCGv shift_amount, put result in dst. 
* Note that it is the caller's responsibility to ensure that the * shift amount is in range (ie 0..31 or 0..63) and provide the ARM @@ -10436,47 +10475,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } } -/* USHLL/SHLL - Vector shift left with widening */ -static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u, - int immh, int immb, int opcode, int rn, int rd) -{ -int size = 32 - clz32(immh) - 1; -int immhb = immh << 3 | immb; -int shift = immhb - (8 << size); -int dsize = 64; -int esize = 8 << size; -int elements = dsize/esize; -TCGv_i64 tcg_rn = tcg_temp_new_i64(); -TCGv_i64 tcg_rd = tcg_temp_new_i64(); -int i; - -if (size >= 3) { -unallocated_encoding(s); -return; -} - -if (!fp_access_check(s)) { -return; -} - -/* For the LL variants the store is larger than the load, - * so if rd == rn we would overwrite parts of our input. - * So load everything right now and use shifts in the main loop. - */ -read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64); - -for (i = 0; i < elements; i++) { -if (is_u) { -tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize); -} else { -tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize); -} -tcg_gen_shli_i64(tcg_rd, tcg_rd, shift); -write_vec_element(s, tcg_rd, rd, i, size + 1); -} -clear_vec_high(s, true, rd); -} - /* SHRN/RSHRN - Shift right with narrowing (and potential rounding) */ static void handle_vec_simd_shrn(DisasContext *s, bool is_q, int immh, int immb, int opcode, int rn, int rd) @@ -10566,9 +10564,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) handle_vec_simd_sqshrn(s, false, is_q, is_u, is_u, immh, immb, opcode, rn, rd); break; -case 0x14: /* SSHLL / USHLL */ -handle_vec_simd_wshli(s, is_q, is_u, immh, immb, opcode, rn, rd); -break; case 0x1c: /* SCVTF / UCVTF */ handle_simd_shift_intfp_conv(s, false, is_q, is_u, immh, immb, opcode, rn, rd); @@ -10593,6 +10588,7 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) case 0x06: /* 
SRSRA / URSRA (accum + rounding) */ case 0x08: /* SRI */ case 0x0a: /* SHL / SLI */ +case 0x14: /* SSHLL / USHLL */ unallocated_encoding(s); return; } diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index 6aa8a18240..d13d680589 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -1218,5 +1218,8 @@ FMOVI_v_h 0 q:1 00 0 ... 11 . rd:5 %abcdefgh SHL_v 0.00 0 ... 01010 1 . . @qlshifti SLI_v 0.10 0 ... 01010 1 . . @qlshifti + +SSHLL_v 0.00 0 ... 10100 1 . . @qlshifti +USHLL_v 0.10 0 ... 10100 1 . . @qlshifti ] } -- 2.43.0
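Per element, the converted SSHLL/USHLL path computes: extract (or sign-extract) an esize-bit lane from the 64-bit source, then shift left into a 2*esize-bit destination lane, which is why the store is larger than the load. A hedged Python sketch of one lane (the helper name is illustrative):

```python
def wshll_element(src64, lane, esize, shift, unsigned):
    """One lane of SSHLL/USHLL as the new decodetree path computes it:
    extract or sign-extract an esize-bit lane, then shift left; the
    result occupies a 2*esize-bit destination lane."""
    field = (src64 >> (lane * esize)) & ((1 << esize) - 1)
    if not unsigned and field & (1 << (esize - 1)):
        field -= 1 << esize          # sextract: sign-extend the lane
    return (field << shift) & ((1 << (2 * esize)) - 1)
```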
[PATCH 17/17] target/arm: Push tcg_rnd into handle_shri_with_rndacc
We always pass the same value for round; compute it within common code. Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 32 ++-- 1 file changed, 6 insertions(+), 26 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 2a9cb3fbe0..f4ff698257 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -9197,11 +9197,10 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) * the vector and scalar code. */ static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src, -TCGv_i64 tcg_rnd, bool accumulate, +bool round, bool accumulate, bool is_u, int size, int shift) { bool extended_result = false; -bool round = tcg_rnd != NULL; int ext_lshift = 0; TCGv_i64 tcg_src_hi; @@ -9219,6 +9218,7 @@ static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src, /* Deal with the rounding step */ if (round) { +TCGv_i64 tcg_rnd = tcg_constant_i64(1ull << (shift - 1)); if (extended_result) { TCGv_i64 tcg_zero = tcg_constant_i64(0); if (!is_u) { @@ -9286,7 +9286,6 @@ static void handle_scalar_simd_shri(DisasContext *s, bool insert = false; TCGv_i64 tcg_rn; TCGv_i64 tcg_rd; -TCGv_i64 tcg_round; if (!extract32(immh, 3, 1)) { unallocated_encoding(s); @@ -9312,12 +9311,6 @@ static void handle_scalar_simd_shri(DisasContext *s, break; } -if (round) { -tcg_round = tcg_constant_i64(1ULL << (shift - 1)); -} else { -tcg_round = NULL; -} - tcg_rn = read_fp_dreg(s, rn); tcg_rd = (accumulate || insert) ? read_fp_dreg(s, rd) : tcg_temp_new_i64(); @@ -9331,7 +9324,7 @@ static void handle_scalar_simd_shri(DisasContext *s, tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, 0, esize - shift); } } else { -handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round, +handle_shri_with_rndacc(tcg_rd, tcg_rn, round, accumulate, is_u, size, shift); } @@ -9384,7 +9377,7 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q, int elements = is_scalar ? 
1 : (64 / esize); bool round = extract32(opcode, 0, 1); MemOp ldop = (size + 1) | (is_u_shift ? 0 : MO_SIGN); -TCGv_i64 tcg_rn, tcg_rd, tcg_round; +TCGv_i64 tcg_rn, tcg_rd; TCGv_i32 tcg_rd_narrowed; TCGv_i64 tcg_final; @@ -9429,15 +9422,9 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q, tcg_rd_narrowed = tcg_temp_new_i32(); tcg_final = tcg_temp_new_i64(); -if (round) { -tcg_round = tcg_constant_i64(1ULL << (shift - 1)); -} else { -tcg_round = NULL; -} - for (i = 0; i < elements; i++) { read_vec_element(s, tcg_rn, rn, i, ldop); -handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round, +handle_shri_with_rndacc(tcg_rd, tcg_rn, round, false, is_u_shift, size+1, shift); narrowfn(tcg_rd_narrowed, tcg_env, tcg_rd); tcg_gen_extu_i32_i64(tcg_rd, tcg_rd_narrowed); @@ -10487,7 +10474,6 @@ static void handle_vec_simd_shrn(DisasContext *s, bool is_q, int shift = (2 * esize) - immhb; bool round = extract32(opcode, 0, 1); TCGv_i64 tcg_rn, tcg_rd, tcg_final; -TCGv_i64 tcg_round; int i; if (extract32(immh, 3, 1)) { @@ -10504,15 +10490,9 @@ static void handle_vec_simd_shrn(DisasContext *s, bool is_q, tcg_final = tcg_temp_new_i64(); read_vec_element(s, tcg_final, rd, is_q ? 1 : 0, MO_64); -if (round) { -tcg_round = tcg_constant_i64(1ULL << (shift - 1)); -} else { -tcg_round = NULL; -} - for (i = 0; i < elements; i++) { read_vec_element(s, tcg_rn, rn, i, size+1); -handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round, +handle_shri_with_rndacc(tcg_rd, tcg_rn, round, false, true, size+1, shift); tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, esize); -- 2.43.0
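The rounding constant being pushed into the helper is the usual round-to-nearest bias for a right shift: add 1 << (shift - 1) before shifting. A hedged sketch, ignoring the widening the C helper performs when that add can carry out of 64 bits:

```python
def rshr(value, shift, rounding, signed=False, bits=64):
    """Rounded or plain shift right, as handle_shri_with_rndacc computes
    it: add 1 << (shift - 1) before shifting when rounding is requested.
    `value` is a raw bits-wide pattern; `signed` selects arithmetic
    shift semantics."""
    if signed and value & (1 << (bits - 1)):
        value -= 1 << bits          # reinterpret as negative
    if rounding:
        value += 1 << (shift - 1)
    return value >> shift
```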
[PATCH 08/17] target/arm: Convert FMOVI (scalar, immediate) to decodetree
Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 74 -- target/arm/tcg/a64.decode | 4 ++ 2 files changed, 30 insertions(+), 48 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 2964279c00..6582816e4e 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -6847,6 +6847,31 @@ TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums) TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs) TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins) +/* + * Floating-point Immediate + */ + +static bool trans_FMOVI_s(DisasContext *s, arg_FMOVI_s *a) +{ +switch (a->esz) { +case MO_32: +case MO_64: +break; +case MO_16: +if (!dc_isar_feature(aa64_fp16, s)) { +return false; +} +break; +default: +return false; +} +if (fp_access_check(s)) { +uint64_t imm = vfp_expand_imm(a->esz, a->imm); +write_fp_dreg(s, a->rd, tcg_constant_i64(imm)); +} +return true; +} + /* Shift a TCGv src by TCGv shift_amount, put result in dst. 
* Note that it is the caller's responsibility to ensure that the * shift amount is in range (ie 0..31 or 0..63) and provide the ARM @@ -8584,53 +8609,6 @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) } } -/* Floating point immediate - * 31 30 29 28 24 23 22 21 2013 12 10 95 40 - * +---+---+---+---+--+---++---+--+--+ - * | M | 0 | S | 1 1 1 1 0 | type | 1 |imm8| 1 0 0 | imm5 | Rd | - * +---+---+---+---+--+---++---+--+--+ - */ -static void disas_fp_imm(DisasContext *s, uint32_t insn) -{ -int rd = extract32(insn, 0, 5); -int imm5 = extract32(insn, 5, 5); -int imm8 = extract32(insn, 13, 8); -int type = extract32(insn, 22, 2); -int mos = extract32(insn, 29, 3); -uint64_t imm; -MemOp sz; - -if (mos || imm5) { -unallocated_encoding(s); -return; -} - -switch (type) { -case 0: -sz = MO_32; -break; -case 1: -sz = MO_64; -break; -case 3: -sz = MO_16; -if (dc_isar_feature(aa64_fp16, s)) { -break; -} -/* fallthru */ -default: -unallocated_encoding(s); -return; -} - -if (!fp_access_check(s)) { -return; -} - -imm = vfp_expand_imm(sz, imm8); -write_fp_dreg(s, rd, tcg_constant_i64(imm)); -} - /* Handle floating point <=> fixed point conversions. Note that we can * also deal with fp <=> integer conversions as a special case (scale == 64) * OPTME: consider handling that special case specially or at least skipping @@ -9050,7 +9028,7 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) switch (ctz32(extract32(insn, 12, 4))) { case 0: /* [15:12] == xxx1 */ /* Floating point immediate */ -disas_fp_imm(s, insn); +unallocated_encoding(s); /* in decodetree */ break; case 1: /* [15:12] == xx10 */ /* Floating point compare */ diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index 117269803d..de763d3f12 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -1180,3 +1180,7 @@ FMAXV_s 0110 1110 00 11000 0 10 . . @rr_q1e2 FMINV_h 0.00 1110 10 11000 0 10 . . @qrr_h FMINV_s 0110 1110 10 11000 0 10 . . 
@rr_q1e2 + +# Floating-point Immediate + +FMOVI_s 0001 1110 .. 1 imm:8 100 0 rd:5 esz=%esz_hsd -- 2.43.0
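The immediate written here comes from vfp_expand_imm, which follows the Arm ARM's VFPExpandImm() pseudocode: sign = imm8<7>, exponent = NOT(imm8<6>) : Replicate(imm8<6>, 8) : imm8<5:4> for doubles, fraction = imm8<3:0> : Zeros(48). A hedged sketch of the double-precision case (helper names are illustrative):

```python
import struct

def vfp_expand_imm_64(imm8):
    """Expand an 8-bit FP immediate to an IEEE double bit pattern,
    per the Arm ARM VFPExpandImm() pseudocode."""
    sign = (imm8 >> 7) & 1
    b6 = (imm8 >> 6) & 1
    # 11-bit exponent: NOT(b6), then b6 replicated 8 times, then b5:b4
    exp = ((b6 ^ 1) << 10) | ((0xFF << 2) if b6 else 0) | ((imm8 >> 4) & 3)
    frac = (imm8 & 0xF) << 48
    return (sign << 63) | (exp << 52) | frac

def as_double(bits):
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]
```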
[PATCH 13/17] target/arm: Convert handle_vec_simd_shli to decodetree
This includes SHL and SLI. Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 40 +- target/arm/tcg/a64.decode | 6 + 2 files changed, 16 insertions(+), 30 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 1e482477c5..fd90752dee 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -84,6 +84,13 @@ static int rcount_immhb(DisasContext *s, int x) return (16 << size) - x; } +/* For Advanced SIMD shift by immediate, left shift count. */ +static int lcount_immhb(DisasContext *s, int x) +{ +int size = esz_immh(s, x >> 3); +return x - (8 << size); +} + /* * Include the generated decoders. */ @@ -6962,6 +6969,8 @@ TRANS(URSHR_v, do_vec_shift_imm, a, gen_gvec_urshr) TRANS(SRSRA_v, do_vec_shift_imm, a, gen_gvec_srsra) TRANS(URSRA_v, do_vec_shift_imm, a, gen_gvec_ursra) TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri) +TRANS(SHL_v, do_vec_shift_imm, a, tcg_gen_gvec_shli) +TRANS(SLI_v, do_vec_shift_imm, a, gen_gvec_sli); /* Shift a TCGv src by TCGv shift_amount, put result in dst. 
* Note that it is the caller's responsibility to ensure that the @@ -10427,33 +10436,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } } -/* SHL/SLI - Vector shift left */ -static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert, - int immh, int immb, int opcode, int rn, int rd) -{ -int size = 32 - clz32(immh) - 1; -int immhb = immh << 3 | immb; -int shift = immhb - (8 << size); - -/* Range of size is limited by decode: immh is a non-zero 4 bit field */ -assert(size >= 0 && size <= 3); - -if (extract32(immh, 3, 1) && !is_q) { -unallocated_encoding(s); -return; -} - -if (!fp_access_check(s)) { -return; -} - -if (insert) { -gen_gvec_fn2i(s, is_q, rd, rn, shift, gen_gvec_sli, size); -} else { -gen_gvec_fn2i(s, is_q, rd, rn, shift, tcg_gen_gvec_shli, size); -} -} - /* USHLL/SHLL - Vector shift left with widening */ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u, int immh, int immb, int opcode, int rn, int rd) @@ -10566,9 +10548,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) } switch (opcode) { -case 0x0a: /* SHL / SLI */ -handle_vec_simd_shli(s, is_q, is_u, immh, immb, opcode, rn, rd); -break; case 0x10: /* SHRN */ case 0x11: /* RSHRN / SQRSHRUN */ if (is_u) { @@ -10609,6 +10588,7 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) case 0x04: /* SRSHR / URSHR (rounding) */ case 0x06: /* SRSRA / URSRA (accum + rounding) */ case 0x08: /* SRI */ +case 0x0a: /* SHL / SLI */ unallocated_encoding(s); return; } diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index c525f5fc35..6aa8a18240 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -1191,9 +1191,12 @@ FMOVI_s 0001 1110 .. 1 imm:8 100 0 rd:5 esz=%esz_hsd %abcdefgh 16:3 5:5 %esz_immh 19:4 !function=esz_immh %rcount_immhb 16:7 !function=rcount_immhb +%lcount_immhb 16:7 !function=lcount_immhb @qrshifti . q:1 .. . ... . . 
rn:5 rd:5 \ _e esz=%esz_immh imm=%rcount_immhb +@qlshifti . q:1 .. . ... . . rn:5 rd:5 \ +_e esz=%esz_immh imm=%lcount_immhb FMOVI_v_h 0 q:1 00 0 ... 11 . rd:5 %abcdefgh @@ -1212,5 +1215,8 @@ FMOVI_v_h 0 q:1 00 0 ... 11 . rd:5 %abcdefgh SRSRA_v 0.00 0 ... 00110 1 . . @qrshifti URSRA_v 0.10 0 ... 00110 1 . . @qrshifti SRI_v 0.10 0 ... 01000 1 . . @qrshifti + +SHL_v 0.00 0 ... 01010 1 . . @qlshifti +SLI_v 0.10 0 ... 01010 1 . . @qlshifti ] } -- 2.43.0
[PATCH 15/17] target/arm: Use {,s}extract in handle_vec_simd_wshli
Combine the right shift with the extension via the tcg extract operations. Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index d0ad6c90bc..627d4311bb 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -10466,8 +10466,11 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u, read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64); for (i = 0; i < elements; i++) { -tcg_gen_shri_i64(tcg_rd, tcg_rn, i * esize); -ext_and_shift_reg(tcg_rd, tcg_rd, size | (!is_u << 2), 0); +if (is_u) { +tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize); +} else { +tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize); +} tcg_gen_shli_i64(tcg_rd, tcg_rd, shift); write_vec_element(s, tcg_rd, rd, i, size + 1); } -- 2.43.0
[PATCH 05/17] target/arm: Simplify do_reduction_op
Use simple shift and add instead of ctpop, ctz, shift and mask. Unlike SVE, there is no predicate to disable elements. Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 40 +++--- 1 file changed, 13 insertions(+), 27 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index e0314a1253..6d2e1a2d80 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -8986,34 +8986,23 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) * important for correct NaN propagation that we do these * operations in exactly the order specified by the pseudocode. * - * This is a recursive function, TCG temps should be freed by the - * calling function once it is done with the values. + * This is a recursive function. */ static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn, -int esize, int size, int vmap, TCGv_ptr fpst) +MemOp esz, int ebase, int ecount, TCGv_ptr fpst) { -if (esize == size) { -int element; -MemOp msize = esize == 16 ? 
MO_16 : MO_32; -TCGv_i32 tcg_elem; - -/* We should have one register left here */ -assert(ctpop8(vmap) == 1); -element = ctz32(vmap); -assert(element < 8); - -tcg_elem = tcg_temp_new_i32(); -read_vec_element_i32(s, tcg_elem, rn, element, msize); +if (ecount == 1) { +TCGv_i32 tcg_elem = tcg_temp_new_i32(); +read_vec_element_i32(s, tcg_elem, rn, ebase, esz); return tcg_elem; } else { -int bits = size / 2; -int shift = ctpop8(vmap) / 2; -int vmap_lo = (vmap >> shift) & vmap; -int vmap_hi = (vmap & ~vmap_lo); +int half = ecount >> 1; TCGv_i32 tcg_hi, tcg_lo, tcg_res; -tcg_hi = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_hi, fpst); -tcg_lo = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_lo, fpst); +tcg_hi = do_reduction_op(s, fpopcode, rn, esz, + ebase + half, half, fpst); +tcg_lo = do_reduction_op(s, fpopcode, rn, esz, + ebase, half, fpst); tcg_res = tcg_temp_new_i32(); switch (fpopcode) { @@ -9064,7 +9053,6 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) bool is_u = extract32(insn, 29, 1); bool is_fp = false; bool is_min = false; -int esize; int elements; int i; TCGv_i64 tcg_res, tcg_elt; @@ -9111,8 +9099,7 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) return; } -esize = 8 << size; -elements = (is_q ? 128 : 64) / esize; +elements = (is_q ? 16 : 8) >> size; tcg_res = tcg_temp_new_i64(); tcg_elt = tcg_temp_new_i64(); @@ -9167,9 +9154,8 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) */ TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR); int fpopcode = opcode | is_min << 4 | is_u << 5; -int vmap = (1 << elements) - 1; -TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, esize, - (is_q ? 128 : 64), vmap, fpst); +TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, size, + 0, elements, fpst); tcg_gen_extu_i32_i64(tcg_res, tcg_res32); } -- 2.43.0
[PATCH 06/17] target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree
Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 140 - target/arm/tcg/a64.decode | 12 +++ 2 files changed, 61 insertions(+), 91 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 6d2e1a2d80..055ba4695e 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -6753,6 +6753,47 @@ TRANS(FNMADD, do_fmadd, a, true, true) TRANS(FMSUB, do_fmadd, a, false, true) TRANS(FNMSUB, do_fmadd, a, true, false) +/* + * Advanced SIMD Across Lanes + */ + +static bool do_int_reduction(DisasContext *s, arg_qrr_e *a, bool widen, + MemOp src_sign, NeonGenTwo64OpFn *fn) +{ +TCGv_i64 tcg_res, tcg_elt; +MemOp src_mop = a->esz | src_sign; +int elements = (a->q ? 16 : 8) >> a->esz; + +/* Reject MO_64, and MO_32 without Q: a minimum of 4 elements. */ +if (elements < 4) { +return false; +} +if (!fp_access_check(s)) { +return true; +} + +tcg_res = tcg_temp_new_i64(); +tcg_elt = tcg_temp_new_i64(); + +read_vec_element(s, tcg_res, a->rn, 0, src_mop); +for (int i = 1; i < elements; i++) { +read_vec_element(s, tcg_elt, a->rn, i, src_mop); +fn(tcg_res, tcg_res, tcg_elt); +} + +tcg_gen_ext_i64(tcg_res, tcg_res, a->esz + widen); +write_fp_dreg(s, a->rd, tcg_res); +return true; +} + +TRANS(ADDV, do_int_reduction, a, false, 0, tcg_gen_add_i64) +TRANS(SADDLV, do_int_reduction, a, true, MO_SIGN, tcg_gen_add_i64) +TRANS(UADDLV, do_int_reduction, a, true, 0, tcg_gen_add_i64) +TRANS(SMAXV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smax_i64) +TRANS(UMAXV, do_int_reduction, a, false, 0, tcg_gen_umax_i64) +TRANS(SMINV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smin_i64) +TRANS(UMINV, do_int_reduction, a, false, 0, tcg_gen_umin_i64) + /* Shift a TCGv src by TCGv shift_amount, put result in dst. 
* Note that it is the caller's responsibility to ensure that the * shift amount is in range (ie 0..31 or 0..63) and provide the ARM @@ -9051,27 +9092,10 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) int opcode = extract32(insn, 12, 5); bool is_q = extract32(insn, 30, 1); bool is_u = extract32(insn, 29, 1); -bool is_fp = false; bool is_min = false; int elements; -int i; -TCGv_i64 tcg_res, tcg_elt; switch (opcode) { -case 0x1b: /* ADDV */ -if (is_u) { -unallocated_encoding(s); -return; -} -/* fall through */ -case 0x3: /* SADDLV, UADDLV */ -case 0xa: /* SMAXV, UMAXV */ -case 0x1a: /* SMINV, UMINV */ -if (size == 3 || (size == 2 && !is_q)) { -unallocated_encoding(s); -return; -} -break; case 0xc: /* FMAXNMV, FMINNMV */ case 0xf: /* FMAXV, FMINV */ /* Bit 1 of size field encodes min vs max and the actual size @@ -9080,7 +9104,6 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) * precision. */ is_min = extract32(size, 1, 1); -is_fp = true; if (!is_u && dc_isar_feature(aa64_fp16, s)) { size = 1; } else if (!is_u || !is_q || extract32(size, 0, 1)) { @@ -9091,6 +9114,10 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) } break; default: +case 0x3: /* SADDLV, UADDLV */ +case 0xa: /* SMAXV, UMAXV */ +case 0x1a: /* SMINV, UMINV */ +case 0x1b: /* ADDV */ unallocated_encoding(s); return; } @@ -9101,52 +9128,7 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) elements = (is_q ? 16 : 8) >> size; -tcg_res = tcg_temp_new_i64(); -tcg_elt = tcg_temp_new_i64(); - -/* These instructions operate across all lanes of a vector - * to produce a single result. 
We can guarantee that a 64 - * bit intermediate is sufficient: - * + for [US]ADDLV the maximum element size is 32 bits, and - *the result type is 64 bits - * + for FMAX*V, FMIN*V, ADDV the intermediate type is the - *same as the element size, which is 32 bits at most - * For the integer operations we can choose to work at 64 - * or 32 bits and truncate at the end; for simplicity - * we use 64 bits always. The floating point - * ops do require 32 bit intermediates, though. - */ -if (!is_fp) { -read_vec_element(s, tcg_res, rn, 0, size | (is_u ? 0 : MO_SIGN)); - -for (i = 1; i < elements; i++) { -read_vec_element(s, tcg_elt, rn, i, size | (is_u ? 0 : MO_SIGN)); - -switch (opcode) { -case 0x03: /* SADDLV / UADDLV */ -case 0x1b: /* ADDV */ -tcg_gen_add_i64(tcg_res, tcg_res, tcg_elt); -break; -case 0x0a: /* SMAXV / UMAXV */ -
[PATCH 10/17] target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr
Handle the two special cases within these new functions instead of higher in the call stack. Signed-off-by: Richard Henderson --- target/arm/tcg/translate.h | 5 + target/arm/tcg/gengvec.c| 19 +++ target/arm/tcg/translate-a64.c | 16 +--- target/arm/tcg/translate-neon.c | 25 ++--- 4 files changed, 27 insertions(+), 38 deletions(-) diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h index a8672c857c..d1a836ca6f 100644 --- a/target/arm/tcg/translate.h +++ b/target/arm/tcg/translate.h @@ -514,6 +514,11 @@ void gen_sqsub_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b); void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_sshr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, + int64_t shift, uint32_t opr_sz, uint32_t max_sz); +void gen_gvec_ushr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, + int64_t shift, uint32_t opr_sz, uint32_t max_sz); + void gen_gvec_ssra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, int64_t shift, uint32_t opr_sz, uint32_t max_sz); void gen_gvec_usra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c index 56a1dc1f75..47ac2634ce 100644 --- a/target/arm/tcg/gengvec.c +++ b/target/arm/tcg/gengvec.c @@ -88,6 +88,25 @@ GEN_CMP0(gen_gvec_cgt0, TCG_COND_GT) #undef GEN_CMP0 +void gen_gvec_sshr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, + int64_t shift, uint32_t opr_sz, uint32_t max_sz) +{ +/* Signed shift out of range results in all-sign-bits */ +shift = MIN(shift, (8 << vece) - 1); +tcg_gen_gvec_sari(vece, rd_ofs, rm_ofs, shift, opr_sz, max_sz); +} + +void gen_gvec_ushr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, + int64_t shift, uint32_t opr_sz, uint32_t max_sz) +{ +/* Unsigned shift out of range results in all-zero-bits */ +if (shift >= (8 << vece)) { +tcg_gen_gvec_dup_imm(vece, rd_ofs, opr_sz, max_sz, 0); +} else { +tcg_gen_gvec_shri(vece, rd_ofs, rm_ofs, shift, 
opr_sz, max_sz); +} +} + static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) { tcg_gen_vec_sar8i_i64(a, a, shift); diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 1fa9dc3172..d0a3450d75 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -10411,21 +10411,7 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, break; case 0x00: /* SSHR / USHR */ -if (is_u) { -if (shift == 8 << size) { -/* Shift count the same size as element size produces zero. */ -tcg_gen_gvec_dup_imm(size, vec_full_reg_offset(s, rd), - is_q ? 16 : 8, vec_full_reg_size(s), 0); -return; -} -gvec_fn = tcg_gen_gvec_shri; -} else { -/* Shift count the same size as element size produces all sign. */ -if (shift == 8 << size) { -shift -= 1; -} -gvec_fn = tcg_gen_gvec_sari; -} +gvec_fn = is_u ? gen_gvec_ushr : gen_gvec_sshr; break; case 0x04: /* SRSHR / URSHR (rounding) */ diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c index 915c9e56db..05d4016633 100644 --- a/target/arm/tcg/translate-neon.c +++ b/target/arm/tcg/translate-neon.c @@ -1068,29 +1068,8 @@ DO_2SH(VRSHR_S, gen_gvec_srshr) DO_2SH(VRSHR_U, gen_gvec_urshr) DO_2SH(VRSRA_S, gen_gvec_srsra) DO_2SH(VRSRA_U, gen_gvec_ursra) - -static bool trans_VSHR_S_2sh(DisasContext *s, arg_2reg_shift *a) -{ -/* Signed shift out of range results in all-sign-bits */ -a->shift = MIN(a->shift, (8 << a->size) - 1); -return do_vector_2sh(s, a, tcg_gen_gvec_sari); -} - -static void gen_zero_rd_2sh(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs, -int64_t shift, uint32_t oprsz, uint32_t maxsz) -{ -tcg_gen_gvec_dup_imm(vece, rd_ofs, oprsz, maxsz, 0); -} - -static bool trans_VSHR_U_2sh(DisasContext *s, arg_2reg_shift *a) -{ -/* Shift out of range is architecturally valid and results in zero. 
*/ -if (a->shift >= (8 << a->size)) { -return do_vector_2sh(s, a, gen_zero_rd_2sh); -} else { -return do_vector_2sh(s, a, tcg_gen_gvec_shri); -} -} +DO_2SH(VSHR_S, gen_gvec_sshr) +DO_2SH(VSHR_U, gen_gvec_ushr) static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a, NeonGenTwo64OpEnvFn *fn) -- 2.43.0
[PATCH 12/17] target/arm: Convert handle_vec_simd_shri to decodetree
This includes SSHR, USHR, SSRA, USRA, SRSHR, URSHR, SRSRA, URSRA, SRI. Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 109 +++-- target/arm/tcg/a64.decode | 27 +++- 2 files changed, 74 insertions(+), 62 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index d0a3450d75..1e482477c5 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -68,6 +68,22 @@ static int scale_by_log2_tag_granule(DisasContext *s, int x) return x << LOG2_TAG_GRANULE; } +/* + * For Advanced SIMD shift by immediate, extract esz from immh. + * The result must be validated by the translator: MO_8 <= x <= MO_64. + */ +static int esz_immh(DisasContext *s, int x) +{ +return 32 - clz32(x) - 1; +} + +/* For Advanced SIMD shift by immediate, right shift count. */ +static int rcount_immhb(DisasContext *s, int x) +{ +int size = esz_immh(s, x >> 3); +return (16 << size) - x; +} + /* * Include the generated decoders. */ @@ -6918,6 +6934,35 @@ static bool trans_Vimm(DisasContext *s, arg_Vimm *a) return true; } +/* + * Advanced SIMD Shift by Immediate + */ + +static bool do_vec_shift_imm(DisasContext *s, arg_qrri_e *a, GVecGen2iFn *fn) +{ +/* Validate result of esz_immh, for invalid immh == 0. */ +if (a->esz < 0) { +return false; +} +if (a->esz == MO_64 && !a->q) { +return false; +} +if (fp_access_check(s)) { +gen_gvec_fn2i(s, a->q, a->rd, a->rn, a->imm, fn, a->esz); +} +return true; +} + +TRANS(SSHR_v, do_vec_shift_imm, a, gen_gvec_sshr) +TRANS(USHR_v, do_vec_shift_imm, a, gen_gvec_ushr) +TRANS(SSRA_v, do_vec_shift_imm, a, gen_gvec_ssra) +TRANS(USRA_v, do_vec_shift_imm, a, gen_gvec_usra) +TRANS(SRSHR_v, do_vec_shift_imm, a, gen_gvec_srshr) +TRANS(URSHR_v, do_vec_shift_imm, a, gen_gvec_urshr) +TRANS(SRSRA_v, do_vec_shift_imm, a, gen_gvec_srsra) +TRANS(URSRA_v, do_vec_shift_imm, a, gen_gvec_ursra) +TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri) + /* Shift a TCGv src by TCGv shift_amount, put result in dst. 
* Note that it is the caller's responsibility to ensure that the * shift amount is in range (ie 0..31 or 0..63) and provide the ARM @@ -10382,53 +10427,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } } -/* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */ -static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, - int immh, int immb, int opcode, int rn, int rd) -{ -int size = 32 - clz32(immh) - 1; -int immhb = immh << 3 | immb; -int shift = 2 * (8 << size) - immhb; -GVecGen2iFn *gvec_fn; - -if (extract32(immh, 3, 1) && !is_q) { -unallocated_encoding(s); -return; -} -tcg_debug_assert(size <= 3); - -if (!fp_access_check(s)) { -return; -} - -switch (opcode) { -case 0x02: /* SSRA / USRA (accumulate) */ -gvec_fn = is_u ? gen_gvec_usra : gen_gvec_ssra; -break; - -case 0x08: /* SRI */ -gvec_fn = gen_gvec_sri; -break; - -case 0x00: /* SSHR / USHR */ -gvec_fn = is_u ? gen_gvec_ushr : gen_gvec_sshr; -break; - -case 0x04: /* SRSHR / URSHR (rounding) */ -gvec_fn = is_u ? gen_gvec_urshr : gen_gvec_srshr; -break; - -case 0x06: /* SRSRA / URSRA (accum + rounding) */ -gvec_fn = is_u ? 
gen_gvec_ursra : gen_gvec_srsra; -break; - -default: -g_assert_not_reached(); -} - -gen_gvec_fn2i(s, is_q, rd, rn, shift, gvec_fn, size); -} - /* SHL/SLI - Vector shift left */ static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert, int immh, int immb, int opcode, int rn, int rd) @@ -10568,18 +10566,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) } switch (opcode) { -case 0x08: /* SRI */ -if (!is_u) { -unallocated_encoding(s); -return; -} -/* fall through */ -case 0x00: /* SSHR / USHR */ -case 0x02: /* SSRA / USRA (accumulate) */ -case 0x04: /* SRSHR / URSHR (rounding) */ -case 0x06: /* SRSRA / URSRA (accum + rounding) */ -handle_vec_simd_shri(s, is_q, is_u, immh, immb, opcode, rn, rd); -break; case 0x0a: /* SHL / SLI */ handle_vec_simd_shli(s, is_q, is_u, immh, immb, opcode, rn, rd); break; @@ -10618,6 +10604,11 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) handle_simd_shift_fpint_conv(s, false, is_q, is_u, immh, immb, rn, rd); return; default: +case 0x00: /* SSHR / USHR */ +case 0x02: /* SSRA / USRA (accumulate) */ +case 0x04: /* SRSHR / URSHR (rounding) */ +case 0x06: /* SRSRA / URSRA (accum + rounding) */ +case 0x08: /* SR
[PATCH 14/17] target/arm: Clear high SVE elements in handle_vec_simd_wshli
AdvSIMD instructions are supposed to zero bits beyond 128. Affects SSHLL, USHLL, SSHLL2, USHLL2. Cc: qemu-sta...@nongnu.org Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 1 + 1 file changed, 1 insertion(+) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index fd90752dee..d0ad6c90bc 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -10471,6 +10471,7 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u, tcg_gen_shli_i64(tcg_rd, tcg_rd, shift); write_vec_element(s, tcg_rd, rd, i, size + 1); } +clear_vec_high(s, true, rd); } /* SHRN/RSHRN - Shift right with narrowing (and potential rounding) */ -- 2.43.0
[PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4
Flush before the queue gets too big. Also, there's a bug fix in patch 14. r~ Richard Henderson (17): target/arm: Use tcg_gen_extract2_i64 for EXT target/arm: Convert EXT to decodetree target/arm: Convert TBL, TBX to decodetree target/arm: Convert UZP, TRN, ZIP to decodetree target/arm: Simplify do_reduction_op target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree target/arm: Convert FMOVI (scalar, immediate) to decodetree target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to decodetree target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr target/arm: Fix whitespace near gen_srshr64_i64 target/arm: Convert handle_vec_simd_shri to decodetree target/arm: Convert handle_vec_simd_shli to decodetree target/arm: Clear high SVE elements in handle_vec_simd_wshli target/arm: Use {,s}extract in handle_vec_simd_wshli target/arm: Convert SSHLL, USHLL to decodetree target/arm: Push tcg_rnd into handle_shri_with_rndacc target/arm/tcg/translate.h |5 + target/arm/tcg/gengvec.c| 21 +- target/arm/tcg/translate-a64.c | 1123 +++ target/arm/tcg/translate-neon.c | 25 +- target/arm/tcg/a64.decode | 87 +++ 5 files changed, 520 insertions(+), 741 deletions(-) -- 2.43.0
[PATCH 03/17] target/arm: Convert TBL, TBX to decodetree
Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 47 ++ target/arm/tcg/a64.decode | 4 +++ 2 files changed, 18 insertions(+), 33 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 6ca24d9842..7e3bde93fe 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -4657,6 +4657,20 @@ static bool trans_EXTR(DisasContext *s, arg_extract *a) return true; } +static bool trans_TBL_TBX(DisasContext *s, arg_TBL_TBX *a) +{ +if (fp_access_check(s)) { +int len = (a->len + 1) * 16; + +tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->rd), + vec_full_reg_offset(s, a->rm), tcg_env, + a->q ? 16 : 8, vec_full_reg_size(s), + (len << 6) | (a->tbx << 5) | a->rn, + gen_helper_simd_tblx); +} +return true; +} + /* * Cryptographic AES, SHA, SHA512 */ @@ -8897,38 +8911,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) } } -/* TBL/TBX - * 31 30 29 24 23 22 21 20 16 15 14 13 12 11 10 95 40 - * +---+---+-+-+---+--+---+-++-+--+--+ - * | 0 | Q | 0 0 1 1 1 0 | op2 | 0 | Rm | 0 | len | op | 0 0 | Rn | Rd | - * +---+---+-+-+---+--+---+-++-+--+--+ - */ -static void disas_simd_tb(DisasContext *s, uint32_t insn) -{ -int op2 = extract32(insn, 22, 2); -int is_q = extract32(insn, 30, 1); -int rm = extract32(insn, 16, 5); -int rn = extract32(insn, 5, 5); -int rd = extract32(insn, 0, 5); -int is_tbx = extract32(insn, 12, 1); -int len = (extract32(insn, 13, 2) + 1) * 16; - -if (op2 != 0) { -unallocated_encoding(s); -return; -} - -if (!fp_access_check(s)) { -return; -} - -tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd), - vec_full_reg_offset(s, rm), tcg_env, - is_q ? 
16 : 8, vec_full_reg_size(s), - (len << 6) | (is_tbx << 5) | rn, - gen_helper_simd_tblx); -} - /* ZIP/UZP/TRN * 31 30 29 24 23 22 21 20 16 15 14 12 11 10 95 40 * +---+---+-+--+---+--+---+--+--+ @@ -11792,7 +11774,6 @@ static const AArch64DecodeTable data_proc_simd[] = { /* simd_mod_imm decode is a subset of simd_shift_imm, so must precede it */ { 0x0f000400, 0x9ff80400, disas_simd_mod_imm }, { 0x0f000400, 0x9f800400, disas_simd_shift_imm }, -{ 0x0e00, 0xbf208c00, disas_simd_tb }, { 0x0e000800, 0xbf208c00, disas_simd_zip_trn }, { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc }, { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm }, diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index 05927fade6..45896902d5 100644 --- a/target/arm/tcg/a64.decode +++ b/target/arm/tcg/a64.decode @@ -1141,3 +1141,7 @@ FNMSUB 0001 .. 1 . 1 . . . @_hsd EXT_d 0010 1110 00 0 rm:5 00 imm:3 0 rn:5 rd:5 EXT_q 0110 1110 00 0 rm:5 0 imm:4 0 rn:5 rd:5 + +# Advanced SIMD Table Lookup + +TBL_TBX 0 q:1 00 1110 000 rm:5 0 len:2 tbx:1 00 rn:5 rd:5 -- 2.43.0
[PATCH 09/17] target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to decodetree
Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 117 ++--- target/arm/tcg/a64.decode | 9 +++ 2 files changed, 59 insertions(+), 67 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 6582816e4e..1fa9dc3172 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -6872,6 +6872,52 @@ static bool trans_FMOVI_s(DisasContext *s, arg_FMOVI_s *a) return true; } +/* + * Advanced SIMD Modified Immediate + */ + +static bool trans_FMOVI_v_h(DisasContext *s, arg_FMOVI_v_h *a) +{ +if (!dc_isar_feature(aa64_fp16, s)) { +return false; +} +if (fp_access_check(s)) { +tcg_gen_gvec_dup_imm(MO_16, vec_full_reg_offset(s, a->rd), + a->q ? 16 : 8, vec_full_reg_size(s), + vfp_expand_imm(MO_16, a->abcdefgh)); +} +return true; +} + +static void gen_movi(unsigned vece, uint32_t dofs, uint32_t aofs, + int64_t c, uint32_t oprsz, uint32_t maxsz) +{ +tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, c); +} + +static bool trans_Vimm(DisasContext *s, arg_Vimm *a) +{ +GVecGen2iFn *fn; + +/* Handle decode of cmode/op here between ORR/BIC/MOVI */ +if ((a->cmode & 1) && a->cmode < 12) { +/* For op=1, the imm will be inverted, so BIC becomes AND. */ +fn = a->op ? tcg_gen_gvec_andi : tcg_gen_gvec_ori; +} else { +/* There is one unallocated cmode/op combination in this space */ +if (a->cmode == 15 && a->op == 1 && a->q == 0) { +return false; +} +fn = gen_movi; +} + +if (fp_access_check(s)) { +uint64_t imm = asimd_imm_const(a->abcdefgh, a->cmode, a->op); +gen_gvec_fn2i(s, a->q, a->rd, a->rd, imm, fn, MO_64); +} +return true; +} + /* Shift a TCGv src by TCGv shift_amount, put result in dst. 
* Note that it is the caller's responsibility to ensure that the * shift amount is in range (ie 0..31 or 0..63) and provide the ARM @@ -9051,69 +9097,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) } } -/* AdvSIMD modified immediate - * 31 30 29 28 19 18 16 15 12 11 10 9 5 40 - * +---+---++-+-+---++---+---+--+ - * | 0 | Q | op | 0 1 1 1 1 0 0 0 0 0 | abc | cmode | o2 | 1 | defgh | Rd | - * +---+---++-+-+---++---+---+--+ - * - * There are a number of operations that can be carried out here: - * MOVI - move (shifted) imm into register - * MVNI - move inverted (shifted) imm into register - * ORR - bitwise OR of (shifted) imm with register - * BIC - bitwise clear of (shifted) imm with register - * With ARMv8.2 we also have: - * FMOV half-precision - */ -static void disas_simd_mod_imm(DisasContext *s, uint32_t insn) -{ -int rd = extract32(insn, 0, 5); -int cmode = extract32(insn, 12, 4); -int o2 = extract32(insn, 11, 1); -uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5); -bool is_neg = extract32(insn, 29, 1); -bool is_q = extract32(insn, 30, 1); -uint64_t imm = 0; - -if (o2) { -if (cmode != 0xf || is_neg) { -unallocated_encoding(s); -return; -} -/* FMOV (vector, immediate) - half-precision */ -if (!dc_isar_feature(aa64_fp16, s)) { -unallocated_encoding(s); -return; -} -imm = vfp_expand_imm(MO_16, abcdefgh); -/* now duplicate across the lanes */ -imm = dup_const(MO_16, imm); -} else { -if (cmode == 0xf && is_neg && !is_q) { -unallocated_encoding(s); -return; -} -imm = asimd_imm_const(abcdefgh, cmode, is_neg); -} - -if (!fp_access_check(s)) { -return; -} - -if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) { -/* MOVI or MVNI, with MVNI negation handled above. */ -tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), is_q ? 16 : 8, - vec_full_reg_size(s), imm); -} else { -/* ORR or BIC, with BIC negation to AND handled above. 
*/ -if (is_neg) { -gen_gvec_fn2i(s, is_q, rd, rd, imm, tcg_gen_gvec_andi, MO_64); -} else { -gen_gvec_fn2i(s, is_q, rd, rd, imm, tcg_gen_gvec_ori, MO_64); -} -} -} - /* * Common SSHR[RA]/USHR[RA] - Shift right (optional rounding/accumulate) * @@ -10593,8 +10576,10 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) bool is_u = extract32(insn, 29, 1); bool is_q = extract32(insn, 30, 1); -/* data_proc_simd[] has sent immh == 0 to disas_simd_mod_imm. */ -assert(immh != 0); +if (immh == 0) { +unallocated_encoding(s);
[PATCH 02/17] target/arm: Convert EXT to decodetree
Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 121 + target/arm/tcg/a64.decode | 5 ++ 2 files changed, 53 insertions(+), 73 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index e4c8a20f39..6ca24d9842 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -6541,6 +6541,54 @@ static bool trans_FCSEL(DisasContext *s, arg_FCSEL *a) return true; } +/* + * Advanced SIMD Extract + */ + +static bool trans_EXT_d(DisasContext *s, arg_EXT_d *a) +{ +if (fp_access_check(s)) { +TCGv_i64 lo = read_fp_dreg(s, a->rn); +if (a->imm != 0) { +TCGv_i64 hi = read_fp_dreg(s, a->rm); +tcg_gen_extract2_i64(lo, lo, hi, a->imm * 8); +} +write_fp_dreg(s, a->rd, lo); +} +return true; +} + +static bool trans_EXT_q(DisasContext *s, arg_EXT_q *a) +{ +TCGv_i64 lo, hi; +int pos = (a->imm & 7) * 8; +int elt = a->imm >> 3; + +if (!fp_access_check(s)) { +return true; +} + +lo = tcg_temp_new_i64(); +hi = tcg_temp_new_i64(); + +read_vec_element(s, lo, a->rn, elt, MO_64); +elt++; +read_vec_element(s, hi, elt & 2 ? 
a->rm : a->rn, elt & 1, MO_64); +elt++; + +if (pos != 0) { +TCGv_i64 hh = tcg_temp_new_i64(); +tcg_gen_extract2_i64(lo, lo, hi, pos); +read_vec_element(s, hh, a->rm, elt & 1, MO_64); +tcg_gen_extract2_i64(hi, hi, hh, pos); +} + +write_vec_element(s, lo, a->rd, 0, MO_64); +write_vec_element(s, hi, a->rd, 1, MO_64); +clear_vec_high(s, true, a->rd); +return true; +} + /* * Floating-point data-processing (3 source) */ @@ -8849,78 +8897,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) } } -/* EXT - * 31 30 29 24 23 22 21 20 16 15 14 11 10 95 40 - * +---+---+-+-+---+--+---+--+---+--+--+ - * | 0 | Q | 1 0 1 1 1 0 | op2 | 0 | Rm | 0 | imm4 | 0 | Rn | Rd | - * +---+---+-+-+---+--+---+--+---+--+--+ - */ -static void disas_simd_ext(DisasContext *s, uint32_t insn) -{ -int is_q = extract32(insn, 30, 1); -int op2 = extract32(insn, 22, 2); -int imm4 = extract32(insn, 11, 4); -int rm = extract32(insn, 16, 5); -int rn = extract32(insn, 5, 5); -int rd = extract32(insn, 0, 5); -int pos = imm4 << 3; -TCGv_i64 tcg_resl, tcg_resh; - -if (op2 != 0 || (!is_q && extract32(imm4, 3, 1))) { -unallocated_encoding(s); -return; -} - -if (!fp_access_check(s)) { -return; -} - -tcg_resh = tcg_temp_new_i64(); -tcg_resl = tcg_temp_new_i64(); - -/* Vd gets bits starting at pos bits into Vm:Vn. This is - * either extracting 128 bits from a 128:128 concatenation, or - * extracting 64 bits from a 64:64 concatenation. 
- */ -if (!is_q) { -read_vec_element(s, tcg_resl, rn, 0, MO_64); -if (pos != 0) { -read_vec_element(s, tcg_resh, rm, 0, MO_64); -tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos); -} -} else { -TCGv_i64 tcg_hh; -typedef struct { -int reg; -int elt; -} EltPosns; -EltPosns eltposns[] = { {rn, 0}, {rn, 1}, {rm, 0}, {rm, 1} }; -EltPosns *elt = eltposns; - -if (pos >= 64) { -elt++; -pos -= 64; -} - -read_vec_element(s, tcg_resl, elt->reg, elt->elt, MO_64); -elt++; -read_vec_element(s, tcg_resh, elt->reg, elt->elt, MO_64); -elt++; -if (pos != 0) { -tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos); -tcg_hh = tcg_temp_new_i64(); -read_vec_element(s, tcg_hh, elt->reg, elt->elt, MO_64); -tcg_gen_extract2_i64(tcg_resh, tcg_resh, tcg_hh, pos); -} -} - -write_vec_element(s, tcg_resl, rd, 0, MO_64); -if (is_q) { -write_vec_element(s, tcg_resh, rd, 1, MO_64); -} -clear_vec_high(s, is_q, rd); -} - /* TBL/TBX * 31 30 29 24 23 22 21 20 16 15 14 13 12 11 10 95 40 * +---+---+-+-+---+--+---+-++-+--+--+ @@ -11818,7 +11794,6 @@ static const AArch64DecodeTable data_proc_simd[] = { { 0x0f000400, 0x9f800400, disas_simd_shift_imm }, { 0x0e00, 0xbf208c00, disas_simd_tb }, { 0x0e000800, 0xbf208c00, disas_simd_zip_trn }, -{ 0x2e00, 0xbf208400, disas_simd_ext }, { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc }, { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm }, { 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 }, diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode index 2922de700c..05927fade6 100644 --- a/target/arm/tcg/a64.
[PATCH 07/17] target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree
Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 176 ++--- target/arm/tcg/a64.decode | 14 +++ 2 files changed, 67 insertions(+), 123 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 055ba4695e..2964279c00 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -6794,6 +6794,59 @@ TRANS(UMAXV, do_int_reduction, a, false, 0, tcg_gen_umax_i64) TRANS(SMINV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smin_i64) TRANS(UMINV, do_int_reduction, a, false, 0, tcg_gen_umin_i64) +/* + * do_fp_reduction helper + * + * This mirrors the Reduce() pseudocode in the ARM ARM. It is + * important for correct NaN propagation that we do these + * operations in exactly the order specified by the pseudocode. + * + * This is a recursive function. + */ +static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz, +int ebase, int ecount, TCGv_ptr fpst, +NeonGenTwoSingleOpFn *fn) +{ +if (ecount == 1) { +TCGv_i32 tcg_elem = tcg_temp_new_i32(); +read_vec_element_i32(s, tcg_elem, rn, ebase, esz); +return tcg_elem; +} else { +int half = ecount >> 1; +TCGv_i32 tcg_hi, tcg_lo, tcg_res; + +tcg_hi = do_reduction_op(s, rn, esz, ebase + half, half, fpst, fn); +tcg_lo = do_reduction_op(s, rn, esz, ebase, half, fpst, fn); +tcg_res = tcg_temp_new_i32(); + +fn(tcg_res, tcg_lo, tcg_hi, fpst); +return tcg_res; +} +} + +static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a, + NeonGenTwoSingleOpFn *fn) +{ +if (fp_access_check(s)) { +MemOp esz = a->esz; +int elts = (a->q ? 16 : 8) >> esz; +TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? 
FPST_FPCR_F16 : FPST_FPCR); +TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn); +write_fp_sreg(s, a->rd, res); +} +return true; +} + +TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_maxnumh) +TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_minnumh) +TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_maxh) +TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_minh) + +TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums) +TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums) +TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs) +TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins) + /* Shift a TCGv src by TCGv shift_amount, put result in dst. * Note that it is the caller's responsibility to ensure that the * shift amount is in range (ie 0..31 or 0..63) and provide the ARM @@ -9020,128 +9073,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) } } -/* - * do_reduction_op helper - * - * This mirrors the Reduce() pseudocode in the ARM ARM. It is - * important for correct NaN propagation that we do these - * operations in exactly the order specified by the pseudocode. - * - * This is a recursive function. 
- */ -static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn, -MemOp esz, int ebase, int ecount, TCGv_ptr fpst) -{ -if (ecount == 1) { -TCGv_i32 tcg_elem = tcg_temp_new_i32(); -read_vec_element_i32(s, tcg_elem, rn, ebase, esz); -return tcg_elem; -} else { -int half = ecount >> 1; -TCGv_i32 tcg_hi, tcg_lo, tcg_res; - -tcg_hi = do_reduction_op(s, fpopcode, rn, esz, - ebase + half, half, fpst); -tcg_lo = do_reduction_op(s, fpopcode, rn, esz, - ebase, half, fpst); -tcg_res = tcg_temp_new_i32(); - -switch (fpopcode) { -case 0x0c: /* fmaxnmv half-precision */ -gen_helper_advsimd_maxnumh(tcg_res, tcg_lo, tcg_hi, fpst); -break; -case 0x0f: /* fmaxv half-precision */ -gen_helper_advsimd_maxh(tcg_res, tcg_lo, tcg_hi, fpst); -break; -case 0x1c: /* fminnmv half-precision */ -gen_helper_advsimd_minnumh(tcg_res, tcg_lo, tcg_hi, fpst); -break; -case 0x1f: /* fminv half-precision */ -gen_helper_advsimd_minh(tcg_res, tcg_lo, tcg_hi, fpst); -break; -case 0x2c: /* fmaxnmv */ -gen_helper_vfp_maxnums(tcg_res, tcg_lo, tcg_hi, fpst); -break; -case 0x2f: /* fmaxv */ -gen_helper_vfp_maxs(tcg_res, tcg_lo, tcg_hi, fpst); -break; -case 0x3c: /* fminnmv */ -gen_helper_vfp_minnums(tcg_res, tcg_lo, tcg_hi, fpst); -break; -case 0x3f: /* fminv */ -gen_helper_vfp_mins(tcg_res, tcg_lo, tcg_hi, fpst); -break; -default: -g_assert_not_reached
[PATCH 04/17] target/arm: Convert UZP, TRN, ZIP to decodetree
Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 158 ++--- target/arm/tcg/a64.decode | 9 ++ 2 files changed, 77 insertions(+), 90 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 7e3bde93fe..e0314a1253 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -4671,6 +4671,74 @@ static bool trans_TBL_TBX(DisasContext *s, arg_TBL_TBX *a) return true; } +typedef int simd_permute_idx_fn(int i, int part, int elements); + +static bool do_simd_permute(DisasContext *s, arg_qrrr_e *a, +simd_permute_idx_fn *fn, int part) +{ +MemOp esz = a->esz; +int datasize = a->q ? 16 : 8; +int elements = datasize >> esz; +TCGv_i64 tcg_res[2], tcg_ele; + +if (esz == MO_64 && !a->q) { +return false; +} +if (!fp_access_check(s)) { +return true; +} + +tcg_res[0] = tcg_temp_new_i64(); +tcg_res[1] = a->q ? tcg_temp_new_i64() : NULL; +tcg_ele = tcg_temp_new_i64(); + +for (int i = 0; i < elements; i++) { +int o, w, idx; + +idx = fn(i, part, elements); +read_vec_element(s, tcg_ele, (idx & elements ? 
a->rm : a->rn), + idx & (elements - 1), esz); + +w = (i << (esz + 3)) / 64; +o = (i << (esz + 3)) % 64; +if (o == 0) { +tcg_gen_mov_i64(tcg_res[w], tcg_ele); +} else { +tcg_gen_deposit_i64(tcg_res[w], tcg_res[w], tcg_ele, o, 8 << esz); +} +} + +for (int i = a->q; i >= 0; --i) { +write_vec_element(s, tcg_res[i], a->rd, i, MO_64); +} +clear_vec_high(s, a->q, a->rd); +return true; +} + +static int permute_load_uzp(int i, int part, int elements) +{ +return 2 * i + part; +} + +TRANS(UZP1, do_simd_permute, a, permute_load_uzp, 0) +TRANS(UZP2, do_simd_permute, a, permute_load_uzp, 1) + +static int permute_load_trn(int i, int part, int elements) +{ +return (i & 1) * elements + (i & ~1) + part; +} + +TRANS(TRN1, do_simd_permute, a, permute_load_trn, 0) +TRANS(TRN2, do_simd_permute, a, permute_load_trn, 1) + +static int permute_load_zip(int i, int part, int elements) +{ +return (i & 1) * elements + ((part * elements + i) >> 1); +} + +TRANS(ZIP1, do_simd_permute, a, permute_load_zip, 0) +TRANS(ZIP2, do_simd_permute, a, permute_load_zip, 1) + /* * Cryptographic AES, SHA, SHA512 */ @@ -8911,95 +8979,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) } } -/* ZIP/UZP/TRN - * 31 30 29 24 23 22 21 20 16 15 14 12 11 10 95 40 - * +---+---+-+--+---+--+---+--+--+ - * | 0 | Q | 0 0 1 1 1 0 | size | 0 | Rm | 0 | opc | 1 0 | Rn | Rd | - * +---+---+-+--+---+--+---+--+--+ - */ -static void disas_simd_zip_trn(DisasContext *s, uint32_t insn) -{ -int rd = extract32(insn, 0, 5); -int rn = extract32(insn, 5, 5); -int rm = extract32(insn, 16, 5); -int size = extract32(insn, 22, 2); -/* opc field bits [1:0] indicate ZIP/UZP/TRN; - * bit 2 indicates 1 vs 2 variant of the insn. - */ -int opcode = extract32(insn, 12, 2); -bool part = extract32(insn, 14, 1); -bool is_q = extract32(insn, 30, 1); -int esize = 8 << size; -int i; -int datasize = is_q ? 
128 : 64; -int elements = datasize / esize; -TCGv_i64 tcg_res[2], tcg_ele; - -if (opcode == 0 || (size == 3 && !is_q)) { -unallocated_encoding(s); -return; -} - -if (!fp_access_check(s)) { -return; -} - -tcg_res[0] = tcg_temp_new_i64(); -tcg_res[1] = is_q ? tcg_temp_new_i64() : NULL; -tcg_ele = tcg_temp_new_i64(); - -for (i = 0; i < elements; i++) { -int o, w; - -switch (opcode) { -case 1: /* UZP1/2 */ -{ -int midpoint = elements / 2; -if (i < midpoint) { -read_vec_element(s, tcg_ele, rn, 2 * i + part, size); -} else { -read_vec_element(s, tcg_ele, rm, - 2 * (i - midpoint) + part, size); -} -break; -} -case 2: /* TRN1/2 */ -if (i & 1) { -read_vec_element(s, tcg_ele, rm, (i & ~1) + part, size); -} else { -read_vec_element(s, tcg_ele, rn, (i & ~1) + part, size); -} -break; -case 3: /* ZIP1/2 */ -{ -int base = part * elements / 2; -if (i & 1) { -read_vec_element(s, tcg_ele, rm, base + (i >> 1), size); -} else { -read_vec_element(s, tcg_ele, rn, base + (i >> 1), size); -} -break; -} -
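To see that the three index formulas in the patch really reproduce the old per-opcode switch, they can be restated standalone. This is an illustrative sketch, not QEMU code: indices 0..elements-1 name elements of Rn and elements..2*elements-1 name elements of Rm, matching the patch's use of idx & elements to select the register and idx & (elements - 1) for the element number.

```c
#include <assert.h>

/* UZP: even (part=0) or odd (part=1) elements of the Rn:Rm pair. */
static int uzp_idx(int i, int part, int elements)
{
    (void)elements;  /* unused here; kept for a uniform signature */
    return 2 * i + part;
}

/* TRN: interleave element pairs (i & ~1) + part from Rn and Rm. */
static int trn_idx(int i, int part, int elements)
{
    return (i & 1) * elements + (i & ~1) + part;
}

/* ZIP: interleave the low (part=0) or high (part=1) halves. */
static int zip_idx(int i, int part, int elements)
{
    return (i & 1) * elements + ((part * elements + i) >> 1);
}
```

With elements = 4, UZP1 yields n0,n2,m0,m2; TRN1 yields n0,m0,n2,m2; ZIP1 yields n0,m0,n1,m1 — the same element selections the removed disas_simd_zip_trn switch computed.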
[PATCH 01/17] target/arm: Use tcg_gen_extract2_i64 for EXT
The extract2 tcg op performs the same operation as the do_ext64 function. Signed-off-by: Richard Henderson --- target/arm/tcg/translate-a64.c | 23 +++ 1 file changed, 3 insertions(+), 20 deletions(-) diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c index 559a6cd799..e4c8a20f39 100644 --- a/target/arm/tcg/translate-a64.c +++ b/target/arm/tcg/translate-a64.c @@ -8849,23 +8849,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn) } } -static void do_ext64(DisasContext *s, TCGv_i64 tcg_left, TCGv_i64 tcg_right, - int pos) -{ -/* Extract 64 bits from the middle of two concatenated 64 bit - * vector register slices left:right. The extracted bits start - * at 'pos' bits into the right (least significant) side. - * We return the result in tcg_right, and guarantee not to - * trash tcg_left. - */ -TCGv_i64 tcg_tmp = tcg_temp_new_i64(); -assert(pos > 0 && pos < 64); - -tcg_gen_shri_i64(tcg_right, tcg_right, pos); -tcg_gen_shli_i64(tcg_tmp, tcg_left, 64 - pos); -tcg_gen_or_i64(tcg_right, tcg_right, tcg_tmp); -} - /* EXT * 31 30 29 24 23 22 21 20 16 15 14 11 10 95 40 * +---+---+-+-+---+--+---+--+---+--+--+ @@ -8903,7 +8886,7 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn) read_vec_element(s, tcg_resl, rn, 0, MO_64); if (pos != 0) { read_vec_element(s, tcg_resh, rm, 0, MO_64); -do_ext64(s, tcg_resh, tcg_resl, pos); +tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos); } } else { TCGv_i64 tcg_hh; @@ -8924,10 +8907,10 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn) read_vec_element(s, tcg_resh, elt->reg, elt->elt, MO_64); elt++; if (pos != 0) { -do_ext64(s, tcg_resh, tcg_resl, pos); +tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos); tcg_hh = tcg_temp_new_i64(); read_vec_element(s, tcg_hh, elt->reg, elt->elt, MO_64); -do_ext64(s, tcg_hh, tcg_resh, pos); +tcg_gen_extract2_i64(tcg_resh, tcg_resh, tcg_hh, pos); } } -- 2.43.0
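For readers unfamiliar with the extract2 op this patch switches to: it takes 64 bits starting at bit pos of the 128-bit concatenation hi:lo, which is exactly what the removed do_ext64 computed with the shri/shli/or sequence. A standalone model (plain C, not TCG) for pos in 1..63:

```c
#include <assert.h>
#include <stdint.h>

/* extract2(lo, hi, pos): bits [pos, pos+63] of the concatenation hi:lo.
 * Equivalent to the removed do_ext64's shri/shli/or, with pos in 1..63. */
static uint64_t extract2_64(uint64_t lo, uint64_t hi, unsigned pos)
{
    return (lo >> pos) | (hi << (64 - pos));
}
```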
[PATCH v2 2/3] target/arm: Use FPST_F16 for SME FMOPA (widening)
This operation has float16 inputs and thus must use the FZ16 control not the FZ control. Cc: qemu-sta...@nongnu.org Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)") Reported-by: Daniyal Khan Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2374 Signed-off-by: Richard Henderson Reviewed-by: Alex Bennée --- target/arm/tcg/translate-sme.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index 46c7fce8b4..185a8a917b 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -304,6 +304,7 @@ static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz, } static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, +ARMFPStatusFlavour e_fpst, gen_helper_gvec_5_ptr *fn) { int svl = streaming_vec_reg_size(s); @@ -319,15 +320,18 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, zm = vec_full_reg_ptr(s, a->zm); pn = pred_full_reg_ptr(s, a->pn); pm = pred_full_reg_ptr(s, a->pm); -fpst = fpstatus_ptr(FPST_FPCR); +fpst = fpstatus_ptr(e_fpst); fn(za, zn, zm, pn, pm, fpst, tcg_constant_i32(desc)); return true; } -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_h) -TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_s) -TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_fmopa_d) +TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, + MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h) +TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, + MO_32, FPST_FPCR, gen_helper_sme_fmopa_s) +TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, + MO_64, FPST_FPCR, gen_helper_sme_fmopa_d) /* TODO: FEAT_EBF16 */ TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa) -- 2.43.0
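The reason a one-argument change suffices here is that each float_status carries its own flush-to-zero control, so selecting the FPST_FPCR_F16 status makes fp16 inputs obey FZ16 rather than FZ. A toy model of that separation — names and the denormal threshold are illustrative, not QEMU's softfloat API:

```c
#include <assert.h>
#include <stdbool.h>

/* Each status has its own flush control, so the fp16 status (FZ16)
 * and the fp32/fp64 status (FZ) can be set independently. */
typedef struct {
    bool flush_inputs_to_zero;
} fp_status;

/* Pretend magnitudes below 1e-30f are "denormal" for demonstration. */
static float flush_if_tiny(float x, const fp_status *st)
{
    if (st->flush_inputs_to_zero && x != 0.0f && x < 1e-30f && x > -1e-30f) {
        return 0.0f;
    }
    return x;
}

static float demo_fz(float x, bool flush)
{
    fp_status st = { flush };
    return flush_if_tiny(x, &st);
}
```

With FZ=1 but FZ16=0, an op consulting the fp16 status leaves the tiny input intact while an fp32 op would have flushed it — which is what the sme-fmopa-2 test case below checks on real hardware state.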
[PATCH v2 1/3] target/arm: Use float_status copy in sme_fmopa_s
From: Daniyal Khan We made a copy above because the fp exception flags are not propagated back to the FPST register, but then failed to use the copy. Cc: qemu-sta...@nongnu.org Fixes: 558e956c719 ("target/arm: Implement FMOPA, FMOPS (non-widening)") Signed-off-by: Daniyal Khan [rth: Split from a larger patch] Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Alex Bennée --- target/arm/tcg/sme_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index e2e0575039..5a6dd76489 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -916,7 +916,7 @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, if (pb & 1) { uint32_t *a = vza_row + H1_4(col); uint32_t *m = vzm + H1_4(col); -*a = float32_muladd(n, *m, *a, 0, vst); +*a = float32_muladd(n, *m, *a, 0, &fpst); } col += 4; pb >>= 4; -- 2.43.0
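The bug class fixed here is easy to model standalone: a scratch copy of the status struct is taken precisely so that accumulated exception flags are discarded, but the op must then be given the copy, not the original. A toy sketch (plain C, illustrative names, not QEMU's softfloat types):

```c
#include <assert.h>

/* Stand-in for float_status: just the accumulated exception flags. */
typedef struct {
    int exception_flags;
} fp_status;

/* An op that always "raises" a flag on whatever status it is given. */
static float op_raising_inexact(float a, float b, fp_status *st)
{
    st->exception_flags |= 1;
    return a + b;
}

/* Correct: flags land in the local copy and stay there. */
static int flags_after_fixed_helper(void)
{
    fp_status vst = { 0 };
    fp_status fpst = vst;               /* scratch copy, as in sme_fmopa_s */
    op_raising_inexact(1.0f, 2.0f, &fpst);
    return vst.exception_flags;
}

/* The bug: passing the original leaks flags back to the caller. */
static int flags_after_buggy_helper(void)
{
    fp_status vst = { 0 };
    fp_status fpst = vst;               /* copy made... */
    op_raising_inexact(1.0f, 2.0f, &vst);  /* ...but not used */
    (void)fpst;
    return vst.exception_flags;
}
```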
[PATCH v2 3/3] tests/tcg/aarch64: Add test cases for SME FMOPA (widening)
From: Daniyal Khan Signed-off-by: Daniyal Khan Message-Id: 172090222034.13953.1688870870882292209...@git.sr.ht [rth: Split test from a larger patch, tidy assembly] Signed-off-by: Richard Henderson Reviewed-by: Alex Bennée --- tests/tcg/aarch64/sme-fmopa-1.c | 63 +++ tests/tcg/aarch64/sme-fmopa-2.c | 56 +++ tests/tcg/aarch64/sme-fmopa-3.c | 63 +++ tests/tcg/aarch64/Makefile.target | 5 ++- 4 files changed, 185 insertions(+), 2 deletions(-) create mode 100644 tests/tcg/aarch64/sme-fmopa-1.c create mode 100644 tests/tcg/aarch64/sme-fmopa-2.c create mode 100644 tests/tcg/aarch64/sme-fmopa-3.c diff --git a/tests/tcg/aarch64/sme-fmopa-1.c b/tests/tcg/aarch64/sme-fmopa-1.c new file mode 100644 index 00..652c4ea090 --- /dev/null +++ b/tests/tcg/aarch64/sme-fmopa-1.c @@ -0,0 +1,63 @@ +/* + * SME outer product, 1 x 1. + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include + +static void foo(float *dst) +{ +asm(".arch_extension sme\n\t" +"smstart\n\t" +"ptrue p0.s, vl4\n\t" +"fmov z0.s, #1.0\n\t" +/* + * An outer product of a vector of 1.0 by itself should be a matrix of 1.0. + * Note that we are using tile 1 here (za1.s) rather than tile 0. + */ +"zero {za}\n\t" +"fmopa za1.s, p0/m, p0/m, z0.s, z0.s\n\t" +/* + * Read the first 4x4 sub-matrix of elements from tile 1: + * Note that za1h should be interchangeable here. 
+ */ +"mov w12, #0\n\t" +"mova z0.s, p0/m, za1v.s[w12, #0]\n\t" +"mova z1.s, p0/m, za1v.s[w12, #1]\n\t" +"mova z2.s, p0/m, za1v.s[w12, #2]\n\t" +"mova z3.s, p0/m, za1v.s[w12, #3]\n\t" +/* + * And store them to the input pointer (dst in the C code): + */ +"st1w {z0.s}, p0, [%0]\n\t" +"add x0, x0, #16\n\t" +"st1w {z1.s}, p0, [x0]\n\t" +"add x0, x0, #16\n\t" +"st1w {z2.s}, p0, [x0]\n\t" +"add x0, x0, #16\n\t" +"st1w {z3.s}, p0, [x0]\n\t" +"smstop" +: : "r"(dst) +: "x12", "d0", "d1", "d2", "d3", "memory"); +} + +int main() +{ +float dst[16] = { }; + +foo(dst); + +for (int i = 0; i < 16; i++) { +if (dst[i] != 1.0f) { +goto failure; +} +} +/* success */ +return 0; + + failure: +for (int i = 0; i < 16; i++) { +printf("%f%c", dst[i], i % 4 == 3 ? '\n' : ' '); +} +return 1; +} diff --git a/tests/tcg/aarch64/sme-fmopa-2.c b/tests/tcg/aarch64/sme-fmopa-2.c new file mode 100644 index 00..15f0972d83 --- /dev/null +++ b/tests/tcg/aarch64/sme-fmopa-2.c @@ -0,0 +1,56 @@ +/* + * SME outer product, FZ vs FZ16 + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include +#include + +static void test_fmopa(uint32_t *result) +{ +asm(".arch_extension sme\n\t" +"smstart\n\t" /* Z*, P* and ZArray cleared */ +"ptrue p2.b, vl16\n\t" /* Limit vector length to 16 */ +"ptrue p5.b, vl16\n\t" +"movi d0, #0x00ff\n\t" /* fp16 denormal */ +"movi d16, #0x00ff\n\t" +"mov w15, #0x000100\n\t" /* FZ=1, FZ16=0 */ +"msr fpcr, x15\n\t" +"fmopa za3.s, p2/m, p5/m, z16.h, z0.h\n\t" +"mov w15, #0\n\t" +"st1w {za3h.s[w15, 0]}, p2, [%0]\n\t" +"add %0, %0, #16\n\t" +"st1w {za3h.s[w15, 1]}, p2, [%0]\n\t" +"mov w15, #2\n\t" +"add %0, %0, #16\n\t" +"st1w {za3h.s[w15, 0]}, p2, [%0]\n\t" +"add %0, %0, #16\n\t" +"st1w {za3h.s[w15, 1]}, p2, [%0]\n\t" +"smstop" +: "+r"(result) : +: "x15", "x16", "p2", "p5", "d0", "d16", "memory"); +} + +int main(void) +{ +uint32_t result[4 * 4] = { }; + +test_fmopa(result); + +if (result[0] != 0x2f7e0100) { +printf("Test failed: Incorrect output in first 4 bytes\n" + "Expected: %08x\n" 
+ "Got: %08x\n", + 0x2f7e0100, result[0]); +return 1; +} + +for (int i = 1; i < 16; ++i) { +if (result[i] != 0) { +printf("Test failed: Non-zero word at position %d\n", i); +return 1; +} +} + +return 0; +} diff --git a/tests/tcg/aarch64/sme-fmopa-3.c b/tests/tcg/aarch64/sme-fmopa-3.c new file mode 100644
[PATCH v2 0/3] target/arm: Fixes for SME FMOPA (#2373)
Changes for v2: - Apply r-b. - Add license headers to two test cases. r~ Daniyal Khan (2): target/arm: Use float_status copy in sme_fmopa_s tests/tcg/aarch64: Add test cases for SME FMOPA (widening) Richard Henderson (1): target/arm: Use FPST_F16 for SME FMOPA (widening) target/arm/tcg/sme_helper.c | 2 +- target/arm/tcg/translate-sme.c| 12 -- tests/tcg/aarch64/sme-fmopa-1.c | 63 +++ tests/tcg/aarch64/sme-fmopa-2.c | 56 +++ tests/tcg/aarch64/sme-fmopa-3.c | 63 +++ tests/tcg/aarch64/Makefile.target | 5 ++- 6 files changed, 194 insertions(+), 7 deletions(-) create mode 100644 tests/tcg/aarch64/sme-fmopa-1.c create mode 100644 tests/tcg/aarch64/sme-fmopa-2.c create mode 100644 tests/tcg/aarch64/sme-fmopa-3.c -- 2.43.0
Re: [PULL 00/11] SD/MMC patches for 2024-07-16
On 7/17/24 04:41, Philippe Mathieu-Daudé wrote: The following changes since commit 959269e910944c03bc13f300d65bf08b060d5d0f: Merge tag 'python-pull-request' ofhttps://gitlab.com/jsnow/qemu into staging (2024-07-16 06:45:23 +1000) are available in the Git repository at: https://github.com/philmd/qemu.git tags/sdmmc-20240716 for you to fetch changes up to c8cb19876d3e29bffd7ffd87586ff451f97f5f46: hw/sd/sdcard: Support boot area in emmc image (2024-07-16 20:30:15 +0200) Ignored checkpatch error: WARNING: line over 80 characters #109: FILE: hw/sd/sd.c:500: +sd->ext_csd[EXT_CSD_HC_WP_GRP_SIZE] = 0x01; /* HC write protect group size */ SD/MMC patches queue Addition of eMMC support is a long-term collaborative virtual work by: - Cédric Le Goater - Edgar E. Iglesias - Francisco Iglesias - Joel Stanley - Luc Michel - Philippe Mathieu-Daudé - Sai Pavan Boddu - Vincent Palatin Applied, thanks. r~
Re: [PULL 00/13] Misc HW/UI patches for 2024-07-16
On 7/17/24 04:09, Philippe Mathieu-Daudé wrote: The following changes since commit 959269e910944c03bc13f300d65bf08b060d5d0f: Merge tag 'python-pull-request' ofhttps://gitlab.com/jsnow/qemu into staging (2024-07-16 06:45:23 +1000) are available in the Git repository at: https://github.com/philmd/qemu.git tags/hw-misc-20240716 for you to fetch changes up to 644a52778a90581dbda909f38b9eaf71501fd9cd: system/physmem: use return value of ram_block_discard_require() as errno (2024-07-16 20:04:08 +0200) Ignored checkpatch error: WARNING: line over 80 characters #30: FILE: system/vl.c:1004: +if (!ti->class_names[0] || module_object_class_by_name(ti->class_names[0])) { Ignored CI failures: - bios-tables-test on cross-i686-tci - qtest-sparc on msys2-64bit Misc HW & UI patches queue - Allow loading safely ROMs larger than 4GiB (Gregor) - Convert vt82c686 IRQ as named 'intr' (Bernhard) - Clarify QDev GPIO API (Peter) - Drop unused load_image_gzipped function (Ani) - MakeTCGCPUOps::cpu_exec_interrupt handler mandatory (Peter) - Factor cpu_pause() out (Nicholas) - Remove transfer size check from ESP DMA DATA IN / OUT transfers (Mark) - Add accelerated cursor composition to Cocoa UI (Akihiko) - Fix '-vga help' CLI (Marc-André) - Fix displayed errno in ram_block_add (Zhenzhong) Applied, thanks. r~
Re: [RFC PATCH] gdbstub: Re-factor gdb command extensions
On 7/17/24 02:55, Alex Bennée wrote: Are you expecting the same GdbCmdParseEntry object to be registered multiple times? Can we fix that at a higher level? It's basically a hack to deal with the fact that everything is tied to the CPUObject, so we register everything multiple times. We could do an if (!registered) register() dance, but I guess I'm thinking forward to a heterogeneous future, and I guess we'd need to do more work then anyway. Any chance we could move it all to the CPUClass? r~
Re: [RFC PATCH] gdbstub: Re-factor gdb command extensions
On 7/16/24 21:42, Alex Bennée wrote: void gdb_extend_qsupported_features(char *qsupported_features) { -/* - * We don't support different sets of CPU gdb features on different CPUs yet - * so assert the feature strings are the same on all CPUs, or is set only - * once (1 CPU). - */ -g_assert(extended_qsupported_features == NULL || - g_strcmp0(extended_qsupported_features, qsupported_features) == 0); - -extended_qsupported_features = qsupported_features; +if (!extended_qsupported_features) { +extended_qsupported_features = g_strdup(qsupported_features); +} else if (!g_strrstr(extended_qsupported_features, qsupported_features)) { Did you really need the last instance of the substring? I'll note that g_strrstr is quite simplistic, whereas strstr has a much more scalable algorithm. +char *old = extended_qsupported_features; +extended_qsupported_features = g_strdup_printf("%s%s", old, qsupported_features); Right tool for the right job, please: g_strconcat(). That said, did you *really* want to concatenate now, and have to search through the middle, as opposed to storing N strings separately? You could defer the concat until the actual negotiation with gdb. That would reduce strstr above to a loop over strcmp. +for (int i = 0; i < extensions->len; i++) { +gpointer entry = g_ptr_array_index(extensions, i); +if (!g_ptr_array_find(table, entry, NULL)) { +g_ptr_array_add(table, entry); Are you expecting the same GdbCmdParseEntry object to be registered multiple times? Can we fix that at a higher level? r~
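The "store N strings separately, defer the concat" suggestion above can be sketched concretely. This is an illustrative model in plain C with fixed-size storage; the real code would use GLib (GPtrArray plus g_strjoinv/g_strconcat) and dynamic allocation. Registration dedupes with a plain strcmp loop, and the qSupported reply is only assembled at negotiation time.

```c
#include <assert.h>
#include <string.h>

#define MAX_FEATURES 8

static const char *features[MAX_FEATURES];
static int n_features;

/* Register a qSupported extension; repeat registrations (e.g. from a
 * second vCPU using the same CPU class) are ignored via strcmp. */
static void register_feature(const char *f)
{
    for (int i = 0; i < n_features; i++) {
        if (strcmp(features[i], f) == 0) {
            return;
        }
    }
    features[n_features++] = f;
}

/* Concatenate only when building the reply for gdb. */
static const char *build_qsupported(void)
{
    static char buf[256];
    buf[0] = '\0';
    for (int i = 0; i < n_features; i++) {
        strcat(buf, features[i]);
    }
    return buf;
}

/* Demo: ";a+" registered twice, as two CPUs of one class would do. */
static const char *demo_qsupported(void)
{
    register_feature(";a+");
    register_feature(";b+");
    register_feature(";a+");
    return build_qsupported();
}
```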
Re: [PATCH v3] osdep: add a qemu_close_all_open_fd() helper
On 7/17/24 00:39, Clément Léger wrote: +/* Restrict the range as we found fds matching start/end */ +if (i == skip_start) +skip_start++; +else if (i == skip_end) +skip_end--; Need braces. Otherwise, Reviewed-by: Richard Henderson r~
Re: [PULL 0/6] Python patches
On 7/16/24 03:32, John Snow wrote: The following changes since commit 4469bee2c529832d762af4a2f89468c926f02fe4: Merge tag 'nvme-next-pull-request' ofhttps://gitlab.com/birkelund/qemu into staging (2024-07-11 14:32:51 -0700) are available in the Git repository at: https://gitlab.com/jsnow/qemu.git tags/python-pull-request for you to fetch changes up to dd23f9ec519db9c424223cff8767715de5532718: docs: remove Sphinx 1.x compatibility code (2024-07-12 16:46:21 -0400) Python: 3.13 compat & sphinx minver bump Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
FreeBSD update required for CI?
Hi guys, CI currently failing FreeBSD: https://gitlab.com/qemu-project/qemu/-/jobs/7347517439 pkg: No packages available to install matching 'py39-pillow' have been found in the repositories pkg: No packages available to install matching 'py39-pip' have been found in the repositories pkg: No packages available to install matching 'py39-sphinx' have been found in the repositories pkg: No packages available to install matching 'py39-sphinx_rtd_theme' have been found in the repositories pkg: No packages available to install matching 'py39-yaml' have been found in the repositories Has FreeBSD ports updated to something beyond python 3.9, and we need an update to match? r~
Re: [PATCH] disas: Fix build against Capstone v6
On 7/16/24 07:39, Gustavo Romero wrote: Capstone v6 made major changes, such as renaming for AArch64, which broke programs using the old headers, like QEMU. However, Capstone v6 provides the CAPSTONE_AARCH64_COMPAT_HEADER compatibility definition allowing to build against v6 with the old definitions, so fix the QEMU build using it. We can lift that definition and switch to the new naming once our supported distros have Capstone v6 in place. Signed-off-by: Gustavo Romero Suggested-by: Peter Maydell --- include/disas/capstone.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/disas/capstone.h b/include/disas/capstone.h index e29068dd97..a11985151d 100644 --- a/include/disas/capstone.h +++ b/include/disas/capstone.h @@ -3,6 +3,7 @@ #ifdef CONFIG_CAPSTONE +#define CAPSTONE_AARCH64_COMPAT_HEADER #include #else Reviewed-by: Richard Henderson r~
Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff
On 7/15/24 17:06, Max Chou wrote: +/* Probe nonfault on subsequent elements. */ +flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD, + mmu_index, true, &host, 0); +if (flags) { According to section 7.7 (Unit-stride Fault-Only-First Loads) in the V spec (v1.0): "When the fault-only-first instruction would trigger a debug data-watchpoint trap on an element after the first, implementations should not reduce vl but instead should trigger the debug trap as otherwise the event might be lost." Hmm, ok. Interesting. And I think that there is a potential issue in the original implementation that maybe we can fix in this patch. We need to assign the correct element load size to the probe_access_internal function, which is called by tlb_vaddr_to_host in the original implementation or called directly in this patch. The size parameter will be used by the pmp_hart_has_privs function to do the physical memory protection (PMP) checking. If we set the size parameter to the remaining page size, we may get an unexpected trap caused by PMP rules that cover the regions of masked-off elements. Maybe we can replace the while loop like below. vext_ldff(void *vd, void *v0, target_ulong base, ... { ... uint32_t size = nf << log2_esz; VSTART_CHECK_EARLY_EXIT(env); /* probe every access */ for (i = env->vstart; i < env->vl; i++) { if (!vm && !vext_elem_mask(v0, i)) { continue; } addr = adjust_addr(env, base + i * size); if (i == 0) { probe_pages(env, addr, size, ra, MMU_DATA_LOAD); } else { /* if it triggers an exception, no need to check watchpoint */ void *host; int flags; /* Probe nonfault on subsequent elements. */ flags = probe_access_flags(env, addr, size, MMU_DATA_LOAD, mmu_index, true, &host, 0); if (flags & ~TLB_WATCHPOINT) { /* * Stop any flag bit set: * invalid (unmapped) * mmio (transaction failed) * In all cases, handle as the first load next time. */ vl = i; break; } } } No, I don't think repeated probing is a good idea. You'll lose everything you attempted to gain with the other improvements.
It seems, to handle watchpoints, you need to start by probing the entire length non-fault. That will tell you if any portion of the length has any of the problem cases. The fast path will not, of course. After probing, you have flags for the 1 or two pages, and you can make a choice about the actual load length: - invalid on first page: either the first element faults, or you need to check PMP via some alternate mechanism. Do not be afraid to add something to CPUTLBEntryFull.extra.riscv during tlb_fill in order to accelerate this, if needed. - mmio on first page: just one element, as the second might fault during the transaction. It would be possible to enhance riscv_cpu_do_transaction_failed to suppress the fault and set a flag noting the fault. This would allow multiple elements to be loaded, at the expense of another check after each element within the slow tlb-load path. I don't know if this is desirable, really. Using vector operations on mmio is usually a programming error. :-) - invalid or mmio on second page, continue to the end of the first page. Once we have the actual load length, handle watchpoints by hand. See sve_cont_ldst_watchpoints. Finally, the loop loading the elements, likely in ram via host pointer. r~
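The "probe the whole length nonfault first, then decide how much to load" structure suggested above can be modeled in miniature. This is a toy sketch with simulated page flags, not the real probe_access_flags API or RISC-V code; it only shows how the probe results for the (at most two) pages covered pick the actual number of elements loaded in one pass.

```c
#include <assert.h>

enum { PAGE_OK = 0, PAGE_INVALID = 1, PAGE_MMIO = 2 };

/* Given nonfault-probe results for the first and second page spanned by
 * the access, pick how many elements to actually load this iteration.
 * elems_on_first_page counts the elements that fit before the boundary. */
static int clamp_elements(int first_page_flag, int second_page_flag,
                          int elems_on_first_page, int total_elems)
{
    if (first_page_flag == PAGE_INVALID) {
        /* First element faults (or needs a separate PMP recheck):
         * handle just that one element via the faulting path. */
        return 1;
    }
    if (first_page_flag == PAGE_MMIO) {
        /* One element at a time through the slow path, since a later
         * element's transaction might fail. */
        return 1;
    }
    if (second_page_flag != PAGE_OK) {
        /* Stop at the page boundary; retry the rest next time. */
        return elems_on_first_page;
    }
    return total_elems;
}
```

Watchpoints would then be checked by hand over the clamped length, as sve_cont_ldst_watchpoints does, before the element loop runs against the host pointer.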
Re: [PATCH 2/3] target/arm: Use FPST_F16 for SME FMOPA (widening)
On 7/15/24 22:58, Richard Henderson wrote: This operation has float16 inputs and thus must use the FZ16 control not the FZ control. Cc: qemu-sta...@nongnu.org Reported-by: Daniyal Khan Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2374 Signed-off-by: Richard Henderson --- target/arm/tcg/translate-sme.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)") r~
[PATCH 0/3] target/arm: Fixes for SME FMOPA (#2373)
Hi Daniyal, Your fix for sme_fmopa_s is correct, but not the FZ16 fix. We represent FZ16 with a separate float_status structure, so all that is needed is to use that. Thanks for the test cases. I cleaned them up a little, and wired them into the Makefile. r~ Supercedes: 172090222034.13953.1688870870882292209...@git.sr.ht Daniyal Khan (2): target/arm: Use float_status copy in sme_fmopa_s tests/tcg/aarch64: Add test cases for SME FMOPA (widening) Richard Henderson (1): target/arm: Use FPST_F16 for SME FMOPA (widening) target/arm/tcg/sme_helper.c | 2 +- target/arm/tcg/translate-sme.c| 12 -- tests/tcg/aarch64/sme-fmopa-1.c | 63 +++ tests/tcg/aarch64/sme-fmopa-2.c | 51 + tests/tcg/aarch64/sme-fmopa-3.c | 58 tests/tcg/aarch64/Makefile.target | 5 ++- 6 files changed, 184 insertions(+), 7 deletions(-) create mode 100644 tests/tcg/aarch64/sme-fmopa-1.c create mode 100644 tests/tcg/aarch64/sme-fmopa-2.c create mode 100644 tests/tcg/aarch64/sme-fmopa-3.c -- 2.43.0
[PATCH 2/3] target/arm: Use FPST_F16 for SME FMOPA (widening)
This operation has float16 inputs and thus must use the FZ16 control not the FZ control. Cc: qemu-sta...@nongnu.org Reported-by: Daniyal Khan Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2374 Signed-off-by: Richard Henderson --- target/arm/tcg/translate-sme.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index 46c7fce8b4..185a8a917b 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -304,6 +304,7 @@ static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz, } static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, +ARMFPStatusFlavour e_fpst, gen_helper_gvec_5_ptr *fn) { int svl = streaming_vec_reg_size(s); @@ -319,15 +320,18 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, zm = vec_full_reg_ptr(s, a->zm); pn = pred_full_reg_ptr(s, a->pn); pm = pred_full_reg_ptr(s, a->pm); -fpst = fpstatus_ptr(FPST_FPCR); +fpst = fpstatus_ptr(e_fpst); fn(za, zn, zm, pn, pm, fpst, tcg_constant_i32(desc)); return true; } -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_h) -TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_s) -TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_fmopa_d) +TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, + MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h) +TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, + MO_32, FPST_FPCR, gen_helper_sme_fmopa_s) +TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, + MO_64, FPST_FPCR, gen_helper_sme_fmopa_d) /* TODO: FEAT_EBF16 */ TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa) -- 2.43.0
[PATCH 1/3] target/arm: Use float_status copy in sme_fmopa_s
From: Daniyal Khan We made a copy above because the fp exception flags are not propagated back to the FPST register, but then failed to use the copy. Cc: qemu-sta...@nongnu.org Fixes: 558e956c719 ("target/arm: Implement FMOPA, FMOPS (non-widening)") Signed-off-by: Daniyal Khan [rth: Split from a larger patch] Signed-off-by: Richard Henderson --- target/arm/tcg/sme_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index e2e0575039..5a6dd76489 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -916,7 +916,7 @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn, if (pb & 1) { uint32_t *a = vza_row + H1_4(col); uint32_t *m = vzm + H1_4(col); -*a = float32_muladd(n, *m, *a, 0, vst); +*a = float32_muladd(n, *m, *a, 0, &fpst); } col += 4; pb >>= 4; -- 2.43.0
[PATCH 3/3] tests/tcg/aarch64: Add test cases for SME FMOPA (widening)
From: Daniyal Khan Signed-off-by: Daniyal Khan Message-Id: 172090222034.13953.1688870870882292209...@git.sr.ht [rth: Split test cases to separate patch, tidy assembly.] Signed-off-by: Richard Henderson --- tests/tcg/aarch64/sme-fmopa-1.c | 63 +++ tests/tcg/aarch64/sme-fmopa-2.c | 51 + tests/tcg/aarch64/sme-fmopa-3.c | 58 tests/tcg/aarch64/Makefile.target | 5 ++- 4 files changed, 175 insertions(+), 2 deletions(-) create mode 100644 tests/tcg/aarch64/sme-fmopa-1.c create mode 100644 tests/tcg/aarch64/sme-fmopa-2.c create mode 100644 tests/tcg/aarch64/sme-fmopa-3.c diff --git a/tests/tcg/aarch64/sme-fmopa-1.c b/tests/tcg/aarch64/sme-fmopa-1.c new file mode 100644 index 00..652c4ea090 --- /dev/null +++ b/tests/tcg/aarch64/sme-fmopa-1.c @@ -0,0 +1,63 @@ +/* + * SME outer product, 1 x 1. + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include + +static void foo(float *dst) +{ +asm(".arch_extension sme\n\t" +"smstart\n\t" +"ptrue p0.s, vl4\n\t" +"fmov z0.s, #1.0\n\t" +/* + * An outer product of a vector of 1.0 by itself should be a matrix of 1.0. + * Note that we are using tile 1 here (za1.s) rather than tile 0. + */ +"zero {za}\n\t" +"fmopa za1.s, p0/m, p0/m, z0.s, z0.s\n\t" +/* + * Read the first 4x4 sub-matrix of elements from tile 1: + * Note that za1h should be interchangeable here. 
+ */
+    "mov w12, #0\n\t"
+    "mova z0.s, p0/m, za1v.s[w12, #0]\n\t"
+    "mova z1.s, p0/m, za1v.s[w12, #1]\n\t"
+    "mova z2.s, p0/m, za1v.s[w12, #2]\n\t"
+    "mova z3.s, p0/m, za1v.s[w12, #3]\n\t"
+    /*
+     * And store them to the input pointer (dst in the C code):
+     */
+    "st1w {z0.s}, p0, [%0]\n\t"
+    "add x0, x0, #16\n\t"
+    "st1w {z1.s}, p0, [x0]\n\t"
+    "add x0, x0, #16\n\t"
+    "st1w {z2.s}, p0, [x0]\n\t"
+    "add x0, x0, #16\n\t"
+    "st1w {z3.s}, p0, [x0]\n\t"
+    "smstop"
+    : : "r"(dst)
+    : "x12", "d0", "d1", "d2", "d3", "memory");
+}
+
+int main()
+{
+    float dst[16] = { };
+
+    foo(dst);
+
+    for (int i = 0; i < 16; i++) {
+        if (dst[i] != 1.0f) {
+            goto failure;
+        }
+    }
+    /* success */
+    return 0;
+
+ failure:
+    for (int i = 0; i < 16; i++) {
+        printf("%f%c", dst[i], i % 4 == 3 ? '\n' : ' ');
+    }
+    return 1;
+}
diff --git a/tests/tcg/aarch64/sme-fmopa-2.c b/tests/tcg/aarch64/sme-fmopa-2.c
new file mode 100644
index 00..198cc31528
--- /dev/null
+++ b/tests/tcg/aarch64/sme-fmopa-2.c
@@ -0,0 +1,51 @@
+#include <stdint.h>
+#include <stdio.h>
+
+static void test_fmopa(uint32_t *result)
+{
+    asm(".arch_extension sme\n\t"
+        "smstart\n\t"              /* Z*, P* and ZArray cleared */
+        "ptrue p2.b, vl16\n\t"     /* Limit vector length to 16 */
+        "ptrue p5.b, vl16\n\t"
+        "movi d0, #0x00ff\n\t"     /* fp16 denormal */
+        "movi d16, #0x00ff\n\t"
+        "mov w15, #0x000100\n\t"   /* FZ=1, FZ16=0 */
+        "msr fpcr, x15\n\t"
+        "fmopa za3.s, p2/m, p5/m, z16.h, z0.h\n\t"
+        "mov w15, #0\n\t"
+        "st1w {za3h.s[w15, 0]}, p2, [%0]\n\t"
+        "add %0, %0, #16\n\t"
+        "st1w {za3h.s[w15, 1]}, p2, [%0]\n\t"
+        "mov w15, #2\n\t"
+        "add %0, %0, #16\n\t"
+        "st1w {za3h.s[w15, 0]}, p2, [%0]\n\t"
+        "add %0, %0, #16\n\t"
+        "st1w {za3h.s[w15, 1]}, p2, [%0]\n\t"
+        "smstop"
+        : "+r"(result) :
+        : "x15", "x16", "p2", "p5", "d0", "d16", "memory");
+}
+
+int main(void)
+{
+    uint32_t result[4 * 4] = { };
+
+    test_fmopa(result);
+
+    if (result[0] != 0x2f7e0100) {
+        printf("Test failed: Incorrect output in first 4 bytes\n"
+               "Expected: %08x\n"
+               "Got: %08x\n",
+               0x2f7e0100, result[0]);
+        return 1;
+    }
+
+    for (int i = 1; i < 16; ++i) {
+        if (result[i] != 0) {
+            printf("Test failed: Non-zero word at position %d\n", i);
+            return 1;
+        }
+    }
+
+    return 0;
+}
diff --git a/tests/tcg/aarch64/sme-fmopa-3.c b/tests/tcg/aarch64/sme-fmopa-3.c
new file mode 100644
index 00..6617355c9d
--- /dev/null
+++ b/tests/tcg/aarch64/sme-fmopa-3.c
@@ -0,0 +1,58 @@
+#include
+#incl
Re: [PULL v2 0/1] ufs queue
On 7/14/24 01:24, Jeuk Kim wrote: From: Jeuk Kim The following changes since commit 37fbfda8f4145ba1700f63f0cb7be4c108d545de: Merge tag 'edgar/xen-queue-2024-07-12.for-upstream' of https://gitlab.com/edgar.iglesias/qemu into staging (2024-07-12 09:53:22 -0700) are available in the Git repository at: https://gitlab.com/jeuk20.kim/qemu.git tags/pull-ufs-20240714 for you to fetch changes up to 50475f1511964775ff73c2b07239c3ff571f75cd: hw/ufs: Fix mcq register range check logic (2024-07-14 17:11:21 +0900) hw/ufs: - Fix invalid address access in mcq register check I didn't cc qemu-stable@, as 5c079578d2e4 ("hw/ufs: Add support MCQ of UFSHCI 4.0") is not yet included in any release tag. If I'm wrong, please let me know. Thanks. Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
Re: [PULL 00/13] target/i386 changes for 2024-07-12
On 7/14/24 04:10, Paolo Bonzini wrote: The following changes since commit 23901b2b721c0576007ab7580da8aa855d6042a9: Merge tag 'pull-target-arm-20240711' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-11 12:00:00 -0700) are available in the Git repository at: https://gitlab.com/bonzini/qemu.git tags/for-upstream-i386 for you to fetch changes up to cdcadf9ee9efef96323e0b88fccff589f06fc0ee: i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT (2024-07-12 15:35:54 +0200) * target/i386/tcg: fixes for seg_helper.c * SEV: Don't allow automatic fallback to legacy KVM_SEV_INIT, but also don't use it by default Fails testing: https://gitlab.com/qemu-project/qemu/-/jobs/7338361630 2024-07-14 23:45:07,744 __init__ L0153 DEBUG| EIP: alternative_instructions+0x2b/0xfa 2024-07-14 23:45:07,746 __init__ L0153 DEBUG| Code: 89 e5 83 ec 08 64 a1 c0 06 f4 c3 89 45 fc 31 c0 b8 e4 f7 ef c3 c7 45 f8 00 00 00 00 e8 84 6f 7a ff 85 c0 74 02 0f 0b 8d 45 f8 90 90 90 90 83 7d f8 01 74 02 0f 0b b8 e4 f7 ef c3 e8 04 6e 7a 2024-07-14 23:45:07,747 __init__ L0153 DEBUG| EAX: c3e0bf38 EBX: ECX: EDX: 00200292 2024-07-14 23:45:07,747 __init__ L0153 DEBUG| ESI: c3d54b3f EDI: c3d555e0 EBP: c3e0bf40 ESP: c3e0bf38 2024-07-14 23:45:07,748 __init__ L0153 DEBUG| DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 EFLAGS: 00210246 2024-07-14 23:45:07,748 __init__ L0153 DEBUG| CR0: 80050033 CR2: c3e0bf34 CR3: 03f4c000 CR4: 06d0 2024-07-14 23:45:07,748 __init__ L0153 DEBUG| Call Trace: 2024-07-14 23:45:07,750 __init__ L0153 DEBUG| check_bugs+0x900/0x91e 2024-07-14 23:45:07,750 __init__ L0153 DEBUG| ? __get_locked_pte+0x67/0xb0 2024-07-14 23:45:07,750 __init__ L0153 DEBUG| start_kernel+0x4d3/0x501 2024-07-14 23:45:07,750 __init__ L0153 DEBUG| ? 
set_intr_gate+0x42/0x55 2024-07-14 23:45:07,750 __init__ L0153 DEBUG| i386_start_kernel+0x43/0x45 2024-07-14 23:45:07,751 __init__ L0153 DEBUG| startup_32_smp+0x161/0x164 2024-07-14 23:45:07,751 __init__ L0153 DEBUG| Modules linked in: 2024-07-14 23:45:07,751 __init__ L0153 DEBUG| CR2: c3e0bf34 2024-07-14 23:45:07,752 __init__ L0153 DEBUG| ---[ end trace 7adaac7a13f2a45f ]--- 2024-07-14 23:45:07,752 __init__ L0153 DEBUG| EIP: alternative_instructions+0x2b/0xfa 2024-07-14 23:45:07,753 __init__ L0153 DEBUG| Code: 89 e5 83 ec 08 64 a1 c0 06 f4 c3 89 45 fc 31 c0 b8 e4 f7 ef c3 c7 45 f8 00 00 00 00 e8 84 6f 7a ff 85 c0 74 02 0f 0b 8d 45 f8 90 90 90 90 83 7d f8 01 74 02 0f 0b b8 e4 f7 ef c3 e8 04 6e 7a 2024-07-14 23:45:07,753 __init__ L0153 DEBUG| EAX: c3e0bf38 EBX: ECX: EDX: 00200292 2024-07-14 23:45:07,753 __init__ L0153 DEBUG| ESI: c3d54b3f EDI: c3d555e0 EBP: c3e0bf40 ESP: c3e0bf38 2024-07-14 23:45:07,754 __init__ L0153 DEBUG| DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 EFLAGS: 00210246 2024-07-14 23:45:07,754 __init__ L0153 DEBUG| CR0: 80050033 CR2: c3e0bf34 CR3: 03f4c000 CR4: 06d0 2024-07-14 23:45:07,754 __init__ L0153 DEBUG| Kernel panic - not syncing: Attempted to kill the idle task! r~
Reminder: soft freeze on 23 July
https://wiki.qemu.org/Planning/9.1 Just a friendly reminder that soft freeze is coming up soon. r~
Re: [PULL v1 0/3] Xen queue
On 7/12/24 04:02, Edgar E. Iglesias wrote: From: "Edgar E. Iglesias" The following changes since commit 23901b2b721c0576007ab7580da8aa855d6042a9: Merge tag 'pull-target-arm-20240711' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-11 12:00:00 -0700) are available in the Git repository at: https://gitlab.com/edgar.iglesias/qemu.git tags/edgar/xen-queue-2024-07-12.for-upstream for you to fetch changes up to 872cb9cced796e75d4f719c31d70ed5fd629efca: xen: mapcache: Fix unmapping of first entries in buckets (2024-07-12 00:17:36 +0200) Edgar's Xen queue. Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
Re: [PULL v2 0/8] loongarch-to-apply queue
On 7/11/24 18:36, Song Gao wrote: The following changes since commit 23901b2b721c0576007ab7580da8aa855d6042a9: Merge tag 'pull-target-arm-20240711' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-11 12:00:00 -0700) are available in the Git repository at: https://gitlab.com/gaosong/qemu.git tags/pull-loongarch-20240712 for you to fetch changes up to 3ef4b21a5c767ff0b15047e709762abef490ad07: target/loongarch: Fix cpu_reset set wrong CSR_CRMD (2024-07-12 09:41:18 +0800) pull-loongarch-20240712 v2: drop patch 'hw/loongarch: Modify flash block size to 256K'. Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
Re: [PULL 0/7] hw/nvme patches
On 7/11/24 11:04, Klaus Jensen wrote: From: Klaus Jensen Hi, The following changes since commit 59084feb256c617063e0dbe7e64821ae8852d7cf: Merge tag 'pull-aspeed-20240709' of https://github.com/legoater/qemu into staging (2024-07-09 07:13:55 -0700) are available in the Git repository at: https://gitlab.com/birkelund/qemu.git tags/nvme-next-pull-request for you to fetch changes up to 15ef124c93a4d4ba6b98b55492e3a1b3297248b0: hw/nvme: Expand VI/VQ resource to uint32 (2024-07-11 17:05:37 +0200) Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
Re: [PATCH] accel/tcg: Make cpu_exec_interrupt hook mandatory
On 7/12/24 04:39, Peter Maydell wrote: The TCGCPUOps::cpu_exec_interrupt hook is currently not mandatory; if it is left NULL then we treat it as if it had returned false. However since pretty much every architecture needs to handle interrupts, almost every target we have provides the hook. The one exception is Tricore, which doesn't currently implement the architectural interrupt handling. Add a "do nothing" implementation of cpu_exec_interrupt for Tricore, assert on startup that the CPU does provide the hook, and remove the runtime NULL check before calling it. Signed-off-by: Peter Maydell --- accel/tcg/cpu-exec.c | 4 ++-- target/tricore/cpu.c | 6 ++ 2 files changed, 8 insertions(+), 2 deletions(-) Reviewed-by: Richard Henderson r~
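The pattern Peter describes (assert once at startup that a hook is provided, rather than NULL-checking on every call) can be sketched outside QEMU. The ToyCPUOps names below are invented for illustration; only the shape of the TCGCPUOps::cpu_exec_interrupt change is taken from the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mirror of the TCGCPUOps idea: one function-pointer table per CPU. */
typedef struct ToyCPUOps {
    bool (*cpu_exec_interrupt)(int interrupt_request);
} ToyCPUOps;

/* "Do nothing" implementation, like the one added for Tricore. */
static bool toy_noop_exec_interrupt(int interrupt_request)
{
    (void)interrupt_request;
    return false;   /* the interrupt is never handled */
}

static void toy_cpu_realize(const ToyCPUOps *ops)
{
    assert(ops->cpu_exec_interrupt);   /* hook is mandatory: checked once */
}

static bool toy_handle_interrupt(const ToyCPUOps *ops, int req)
{
    /* No runtime NULL check needed on the hot path any more. */
    return ops->cpu_exec_interrupt(req);
}
```

The trade-off is the usual one: a single startup assertion replaces a per-call branch, at the cost of requiring every target to supply at least a stub.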
Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
On 7/11/24 12:23, Alex Bennée wrote: Richard Henderson writes: On 7/11/24 09:45, Richard Henderson wrote: On 7/11/24 04:55, Thomas Huth wrote: + def fetch_asset(self, url, asset_hash): + cache_dir = os.path.expanduser("~/.cache/qemu/download") + if not os.path.exists(cache_dir): + os.makedirs(cache_dir) + fname = os.path.join(cache_dir, + hashlib.sha1(url.encode("utf-8")).hexdigest()) + if os.path.exists(fname) and self.check_hash(fname, asset_hash): + return fname + logging.debug("Downloading %s to %s...", url, fname) + subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"]) + os.rename(fname + ".download", fname) + return fname Download failure via exception? Check hash on downloaded asset? I would prefer to see assets, particularly downloading, handled in a separate pass from tests. And I assume cachable? The cache is already handled here. But downloading after cache miss is non-optional, may not fail, and is accounted against the meson test timeout. r~
Re: [PULL 00/24] target-arm queue
On 7/11/24 06:17, Peter Maydell wrote: The following changes since commit 59084feb256c617063e0dbe7e64821ae8852d7cf: Merge tag 'pull-aspeed-20240709' of https://github.com/legoater/qemu into staging (2024-07-09 07:13:55 -0700) are available in the Git repository at: https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240711 for you to fetch changes up to 7f49089158a4db644fcbadfa90cd3d30a4868735: target/arm: Convert PMULL to decodetree (2024-07-11 11:41:34 +0100) target-arm queue: * Refactor FPCR/FPSR handling in preparation for FEAT_AFP * More decodetree conversions * target/arm: Use cpu_env in cpu_untagged_addr * target/arm: Set arm_v7m_tcg_ops cpu_exec_halt to arm_cpu_exec_halt() * hw/char/pl011: Avoid division-by-zero in pl011_get_baudrate() * hw/misc/bcm2835_thermal: Fix access size handling in bcm2835_thermal_ops * accel/tcg: Make TCGCPUOps::cpu_exec_halt mandatory * STM32L4x5: Handle USART interrupts correctly Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
Re: [PULL 0/2] Block patches
On 7/11/24 02:17, Stefan Hajnoczi wrote: The following changes since commit 59084feb256c617063e0dbe7e64821ae8852d7cf: Merge tag 'pull-aspeed-20240709' of https://github.com/legoater/qemu into staging (2024-07-09 07:13:55 -0700) are available in the Git repository at: https://gitlab.com/stefanha/qemu.git tags/block-pull-request for you to fetch changes up to d05ae948cc887054495977855b0859d0d4ab2613: Consider discard option when writing zeros (2024-07-11 11:06:36 +0200) Pull request A discard fix from Nir Soffer. Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
Re: [PULL 0/1] Host Memory Backends and Memory devices queue 2024-07-10
On 7/10/24 11:00, David Hildenbrand wrote: The following changes since commit 59084feb256c617063e0dbe7e64821ae8852d7cf: Merge tag 'pull-aspeed-20240709' of https://github.com/legoater/qemu into staging (2024-07-09 07:13:55 -0700) are available in the Git repository at: https://github.com/davidhildenbrand/qemu.git tags/mem-2024-07-10 for you to fetch changes up to 4d13ae45ff93fa825ceb39dfd16b305f4baccd18: virtio-mem: improve error message when unplug of device fails due to plugged memory (2024-07-10 18:06:24 +0200) Hi, "Host Memory Backends" and "Memory devices" queue ("mem"): - Only one error message improvement that causes less confusion when triggered from libvirt Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/9.1 as appropriate. r~
Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
On 7/11/24 09:45, Richard Henderson wrote: On 7/11/24 04:55, Thomas Huth wrote: + def fetch_asset(self, url, asset_hash): + cache_dir = os.path.expanduser("~/.cache/qemu/download") + if not os.path.exists(cache_dir): + os.makedirs(cache_dir) + fname = os.path.join(cache_dir, + hashlib.sha1(url.encode("utf-8")).hexdigest()) + if os.path.exists(fname) and self.check_hash(fname, asset_hash): + return fname + logging.debug("Downloading %s to %s...", url, fname) + subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"]) + os.rename(fname + ".download", fname) + return fname Download failure via exception? Check hash on downloaded asset? I would prefer to see assets, particularly downloading, handled in a separate pass from tests. (1) Asset download should not count against test timeout. (2) Running tests while disconnected should skip unavailable assets. Avocado kinda does this, but still generates errors instead of skips. r~
Re: [PATCH v2] osdep: add a qemu_close_all_open_fd() helper
On 6/18/24 04:17, Clément Léger wrote: Since commit 03e471c41d8b ("qemu_init: increase NOFILE soft limit on POSIX"), the maximum number of file descriptors that can be opened is raised to nofile.rlim_max. On recent Debian distros, this yields a maximum of 1073741816 file descriptors. Now, when forking to start qemu-bridge-helper, this actually calls close() on the full possible file descriptor range (more precisely [3 - sysconf(_SC_OPEN_MAX)]) which takes a considerable amount of time. In order to reduce that time, factorize existing code to close all open file descriptors in a new qemu_close_all_open_fd() function. This function uses various methods to close all the open file descriptors, ranging from the most efficient one to the least efficient one. It also accepts an ordered array of file descriptors that should not be closed, since this is required by the callers that call it after forking. Signed-off-by: Clément Léger v2: - Factorize async_teardown.c close_fds implementation as well as tap.c ones - Apply checkpatch - v1: https://lore.kernel.org/qemu-devel/20240617162520.4045016-1-cle...@rivosinc.com/ --- include/qemu/osdep.h | 8 +++ net/tap.c | 31 ++- system/async-teardown.c | 37 + util/osdep.c | 115 4 files changed, 141 insertions(+), 50 deletions(-) diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index f61edcfdc2..9369a97d3d 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -755,6 +755,14 @@ static inline void qemu_reset_optind(void) int qemu_fdatasync(int fd); +/** + * Close all open file descriptors except the ones supplied in the @skip array + * + * @skip: ordered array of distinct file descriptors that should not be closed + * @nskip: number of entries in the @skip array. + */ +void qemu_close_all_open_fd(const int *skip, unsigned int nskip); + /** * Sync changes made to the memory mapped file back to the backing * storage. 
For POSIX compliant systems this will fallback diff --git a/net/tap.c b/net/tap.c index 51f7aec39d..6fc3939078 100644 --- a/net/tap.c +++ b/net/tap.c @@ -385,6 +385,21 @@ static TAPState *net_tap_fd_init(NetClientState *peer, return s; } +static void close_all_fds_after_fork(int excluded_fd) +{ +const int skip_fd[] = {0, 1, 2, 3, excluded_fd}; 3 should not be included here... +unsigned int nskip = ARRAY_SIZE(skip_fd); + +/* + * skip_fd must be an ordered array of distinct fds, exclude + * excluded_fd if already included in the [0 - 3] range + */ +if (excluded_fd <= 3) { or here -- stdin is 0, stdout is 1, stderr is 2. Perhaps we need reminding of this and use the STD*_FILENO names instead of raw integer constants. @@ -400,13 +415,7 @@ static void launch_script(const char *setup_script, const char *ifname, return; } if (pid == 0) { -int open_max = sysconf(_SC_OPEN_MAX), i; - -for (i = 3; i < open_max; i++) { -if (i != fd) { -close(i); -} -} Note that the original *does* close 3. +#ifdef CONFIG_LINUX +static bool qemu_close_all_open_fd_proc(const int *skip, unsigned int nskip) +{ +struct dirent *de; +int fd, dfd; +bool close_fd; +DIR *dir; +int i; + +dir = opendir("/proc/self/fd"); +if (!dir) { +/* If /proc is not mounted, there is nothing that can be done. */ +return false; +} +/* Avoid closing the directory. */ +dfd = dirfd(dir); + +for (de = readdir(dir); de; de = readdir(dir)) { +fd = atoi(de->d_name); +close_fd = true; +if (fd == dfd) { +close_fd = false; +} else { +for (i = 0; i < nskip; i++) { The skip list is sorted, so you should remember the point of the last search and begin from there, and you should not search past fd < skip[i]. +#else +static bool qemu_close_all_open_fd_proc(const int *skip, unsigned int nskip) +{ +return false; +} +#endif I'm not fond of duplicating the function declaration. I think it's better to move the ifdef inside: static bool foo(...) 
{ #ifdef XYZ impl #else stub #endif } + +#ifdef CONFIG_CLOSE_RANGE +static bool qemu_close_all_open_fd_close_range(const int *skip, + unsigned int nskip) +{ +int max_fd = sysconf(_SC_OPEN_MAX) - 1; +int first = 0, last = max_fd; +int cur_skip = 0, ret; + +do { +if (nskip) { +while (first == skip[cur_skip]) { +cur_skip++; +first++; +} This fails to check cur_skip < nskip in the loop. Mixing signed cur_skip with unsigned nskip is bad. There seems to be no good reason for the separate "if (nskip)" check. A proper check for cur_skip < nskip will work just fine with nskip == 0. +/* Fallback */ +for (i = 0; i < open_max; i++) {
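The review feedback above boils down to three rules for the skip-list walk: skip[] is sorted and distinct, the cursor advances monotonically so the list is never re-scanned from the start, and every access is bounded by cur_skip < nskip (so nskip == 0 needs no special case). A stand-alone sketch of just that selection logic, with hypothetical sketch_* names and no actual close() calls:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether fd should be closed, advancing *cur_skip past skip[]
 * entries below fd.  Every read of skip[] is guarded by *cur_skip < nskip,
 * which is the bound the review found missing.
 */
static bool sketch_should_close(int fd, const int *skip, unsigned int nskip,
                                unsigned int *cur_skip)
{
    while (*cur_skip < nskip && skip[*cur_skip] < fd) {
        (*cur_skip)++;
    }
    return !(*cur_skip < nskip && skip[*cur_skip] == fd);
}

/* Record which fds in [0, max_fd] would be closed into out[]; return count. */
static int sketch_collect_closed(int max_fd, const int *skip,
                                 unsigned int nskip, int *out)
{
    unsigned int cur_skip = 0;   /* one cursor for the whole scan */
    int n = 0;

    for (int fd = 0; fd <= max_fd; fd++) {
        if (sketch_should_close(fd, skip, nskip, &cur_skip)) {
            out[n++] = fd;
        }
    }
    return n;
}
```

Because the cursor is shared across the whole ascending scan, the skip list is traversed once in total rather than once per fd, which is what "remember the point of the last search" buys.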
Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets
On 7/11/24 04:55, Thomas Huth wrote: +def fetch_asset(self, url, asset_hash): +cache_dir = os.path.expanduser("~/.cache/qemu/download") +if not os.path.exists(cache_dir): +os.makedirs(cache_dir) +fname = os.path.join(cache_dir, + hashlib.sha1(url.encode("utf-8")).hexdigest()) +if os.path.exists(fname) and self.check_hash(fname, asset_hash): +return fname +logging.debug("Downloading %s to %s...", url, fname) +subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"]) +os.rename(fname + ".download", fname) +return fname Download failure via exception? Check hash on downloaded asset? r~
Re: [PATCH 09/10] target/i386/tcg: use X86Access for TSS access
On 7/10/24 23:28, Paolo Bonzini wrote: On 7/10/24 20:40, Paolo Bonzini wrote: On Wed, 10 Jul 2024 at 18:47, Richard Henderson wrote: On 7/9/24 23:29, Paolo Bonzini wrote: > This takes care of probing the vaddr range in advance, and is also faster > because it avoids repeated TLB lookups. It also matches the Intel manual > better, as it says "Checks that the current (old) TSS, new TSS, and all > segment descriptors used in the task switch are paged into system memory"; > note however that it's not clear how the processor checks for segment > descriptors, and this check is not included in the AMD manual. > > Signed-off-by: Paolo Bonzini > --- > target/i386/tcg/seg_helper.c | 101 ++- > 1 file changed, 51 insertions(+), 50 deletions(-) > > diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c > index 25af9d4a4ec..77f2c65c3cf 100644 > --- a/target/i386/tcg/seg_helper.c > +++ b/target/i386/tcg/seg_helper.c > @@ -311,35 +313,44 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, > raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); > } > > + /* X86Access avoids memory exceptions during the task switch */ > + access_prepare_mmu(&old, env, env->tr.base, old_tss_limit_max, > + MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr); > + > + if (source == SWITCH_TSS_CALL) { > + /* Probe for future write of parent task */ > + probe_access(env, tss_base, 2, MMU_DATA_STORE, > + cpu_mmu_index_kernel(env), retaddr); > + } > + access_prepare_mmu(&new, env, tss_base, tss_limit, > + MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr); You're computing cpu_mmu_index_kernel 3 times. Squashing this in (easier to review than the whole thing): Excellent, thanks! 
r~ diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 4123ff1245e..4edfd26135f 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -321,7 +321,7 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector, uint32_t new_eflags, new_eip, new_cr3, new_ldt, new_trap; uint32_t old_eflags, eflags_mask; SegmentCache *dt; - int index; + int mmu_index, index; target_ulong ptr; X86Access old, new; @@ -378,16 +378,17 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector, } /* X86Access avoids memory exceptions during the task switch */ + mmu_index = cpu_mmu_index_kernel(env); access_prepare_mmu(&old, env, env->tr.base, old_tss_limit_max, - MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr); + MMU_DATA_STORE, mmu_index, retaddr); if (source == SWITCH_TSS_CALL) { /* Probe for future write of parent task */ probe_access(env, tss_base, 2, MMU_DATA_STORE, - cpu_mmu_index_kernel(env), retaddr); + mmu_index, retaddr); } access_prepare_mmu(&new, env, tss_base, tss_limit, - MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr); + MMU_DATA_LOAD, mmu_index, retaddr); /* read all the registers from the new TSS */ if (type & 8) { @@ -468,7 +469,11 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector, context */ if (source == SWITCH_TSS_CALL) { - cpu_stw_kernel_ra(env, tss_base, env->tr.selector, retaddr); + /* + * Thanks to the probe_access above, we know the first two + * bytes addressed by &new are writable too. + */ + access_stw(&new, tss_base, env->tr.selector); new_eflags |= NT_MASK; } Paolo
Re: Disassembler location
On 7/10/24 14:55, Paolo Bonzini wrote: The others are not hosts, only targets. By putting the file in target/<arch>/, they do not need to add it to the "disassemblers" variable in meson.build---but they add it anyway. :) We should clean that up. :-) r~
Re: Disassembler location
On 7/10/24 11:02, Michael Morrell wrote: I'm working on a port to a new architecture and was noticing a discrepancy in where the disassembler code lives. There is a file "target/<arch>/disas.c" for 4 architectures (avr, loongarch, openrisc, and rx), but a file "disas/<arch>.c" for 14 architectures (if I counted right). It seems the 4 architectures using "target/<arch>/disas.c" are more recently added so I was wondering if that is now the preferred location. I couldn't find information on this, but I wasn't sure where to look. Any advice? The older disas/arch.c files come from binutils, prior to the GPLv3 license change. These are generally very old architectures, or not up to date. The newer target/arch/disas.c are for architectures for which the translator and the disassembler share generated code via decodetree. If you're implementing a new architecture from scratch, this is your best choice. The "best" supported are those with support in system libcapstone. :-) r~
Re: [PATCH 10/10] target/i386/tcg: save current task state before loading new one
On 7/9/24 23:29, Paolo Bonzini wrote: This is how the steps are ordered in the manual. EFLAGS.NT is overwritten after the fact in the saved image. Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 85 +++- 1 file changed, 45 insertions(+), 40 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH 09/10] target/i386/tcg: use X86Access for TSS access
On 7/9/24 23:29, Paolo Bonzini wrote: This takes care of probing the vaddr range in advance, and is also faster because it avoids repeated TLB lookups. It also matches the Intel manual better, as it says "Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task switch are paged into system memory"; note however that it's not clear how the processor checks for segment descriptors, and this check is not included in the AMD manual. Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 101 ++- 1 file changed, 51 insertions(+), 50 deletions(-) diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c index 25af9d4a4ec..77f2c65c3cf 100644 --- a/target/i386/tcg/seg_helper.c +++ b/target/i386/tcg/seg_helper.c @@ -27,6 +27,7 @@ #include "exec/log.h" #include "helper-tcg.h" #include "seg_helper.h" +#include "access.h" int get_pg_mode(CPUX86State *env) { @@ -250,7 +251,7 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, uint32_t e1, uint32_t e2, int source, uint32_t next_eip, uintptr_t retaddr) { -int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, v1, v2, i; +int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, i; target_ulong tss_base; uint32_t new_regs[8], new_segs[6]; uint32_t new_eflags, new_eip, new_cr3, new_ldt, new_trap; @@ -258,6 +259,7 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, SegmentCache *dt; int index; target_ulong ptr; +X86Access old, new; type = (e2 >> DESC_TYPE_SHIFT) & 0xf; LOG_PCALL("switch_tss: sel=0x%04x type=%d src=%d\n", tss_selector, type, @@ -311,35 +313,44 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr); } +/* X86Access avoids memory exceptions during the task switch */ +access_prepare_mmu(&old, env, env->tr.base, old_tss_limit_max, + MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr); + +if (source == SWITCH_TSS_CALL) { +/* Probe for future write of 
parent task */ +probe_access(env, tss_base, 2, MMU_DATA_STORE, +cpu_mmu_index_kernel(env), retaddr); +} +access_prepare_mmu(&new, env, tss_base, tss_limit, + MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr); You're computing cpu_mmu_index_kernel 3 times. This appears to be conservative in that you're requiring only 2 bytes (a minimum) of 0x68 to be writable. Is it legal to place the TSS at offset 0xffe of page 0, with the balance on page 1, with page 0 writable and page 1 read-only? Otherwise I would think you could just check the entire TSS for writability. Anyway, after the MMU_DATA_STORE probe, you have proved that 'X86Access new' contains an address range that may be stored. So you can change the SWITCH_TSS_CALL store below to access_stw() too. @@ -349,16 +360,6 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector, chapters 12.2.5 and 13.2.4 on how to implement TSS Trap bit */ (void)new_trap; -/* NOTE: we must avoid memory exceptions during the task switch, - so we make dummy accesses before */ -/* XXX: it can still fail in some cases, so a bigger hack is - necessary to valid the TLB after having done the accesses */ - -v1 = cpu_ldub_kernel_ra(env, env->tr.base, retaddr); -v2 = cpu_ldub_kernel_ra(env, env->tr.base + old_tss_limit_max, retaddr); -cpu_stb_kernel_ra(env, env->tr.base, v1, retaddr); -cpu_stb_kernel_ra(env, env->tr.base + old_tss_limit_max, v2, retaddr); OMG. Looks like a fantastic cleanup overall. r~
Re: [PATCH 08/10] target/i386/tcg: check for correct busy state before switching to a new task
On 7/9/24 23:29, Paolo Bonzini wrote: This step is listed in the Intel manual: "Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET return)". The AMD manual lists the same operation under the "Preventing recursion" paragraph of "12.3.4 Nesting Tasks", though it is not clear if the processor checks the busy bit in the IRET case. Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 5 + 1 file changed, 5 insertions(+) Reviewed-by: Richard Henderson r~
Re: [PATCH 07/10] target/i386/tcg: Use DPL-level accesses for interrupts and call gates
On 7/9/24 23:29, Paolo Bonzini wrote: This fixes a bug wherein i386/tcg assumed an interrupt return using the CALL or JMP instructions was always going from kernel or user mode to kernel mode, when using a call gate. This assumption is violated if the call gate has a DPL that is greater than 0. In addition, the stack accesses should count as explicit, not implicit ("kernel" in QEMU code), so that SMAP is not applied if DPL=3. Analyzed-by: Robert R. Henry Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249 Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH 06/10] target/i386/tcg: Compute MMU index once
On 7/9/24 23:29, Paolo Bonzini wrote: Add the MMU index to the StackAccess struct, so that it can be cached or (in the next patch) computed from information that is not in CPUX86State. Co-developed-by: Richard Henderson Signed-off-by: Richard Henderson Signed-off-by: Paolo Bonzini Reviewed-by: Richard Henderson r~
Re: [PATCH 03/10] target/i386/tcg: use PUSHL/PUSHW for error code
On 7/9/24 23:29, Paolo Bonzini wrote: Do not pre-decrement esp, let the macros subtract the appropriate operand size. Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH 02/10] target/i386/tcg: Allow IRET from user mode to user mode with SMAP
On 7/9/24 23:29, Paolo Bonzini wrote: This fixes a bug wherein i386/tcg assumed an interrupt return using the IRET instruction was always returning from kernel mode to either kernel mode or user mode. This assumption is violated when IRET is used as a clever way to restore thread state, as for example in the dotnet runtime. There, IRET returns from user mode to user mode. This bug is that stack accesses from IRET and RETF, as well as accesses to the parameters in a call gate, are normal data accesses using the current CPL. This manifested itself as a page fault in the guest Linux kernel due to SMAP preventing the access. This bug appears to have been in QEMU since the beginning. Analyzed-by: Robert R. Henry Co-developed-by: Robert R. Henry Signed-off-by: Robert R. Henry Signed-off-by: Paolo Bonzini --- target/i386/tcg/seg_helper.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH] target/i386/tcg: fix POP to memory in long mode
On 7/10/24 07:13, Paolo Bonzini wrote: In long mode, POP to memory will write a full 64-bit value. However, the call to gen_writeback() in gen_POP will use MO_32 because the decoding table is incorrect. The bug was latent until commit aea49fbb01a ("target/i386: use gen_writeback() within gen_POP()", 2024-06-08), and then became visible because gen_op_st_v now receives op->ot instead of the "ot" returned by gen_pop_T0. Analyzed-by: Clément Chigot Fixes: 5e9e21bcc4d ("target/i386: move 60-BF opcodes to new decoder", 2024-05-07) Tested-by: Clément Chigot Signed-off-by: Paolo Bonzini --- target/i386/tcg/decode-new.c.inc | 2 +- target/i386/tcg/emit.c.inc | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only
On 7/10/24 05:25, BALATON Zoltan wrote: On Tue, 9 Jul 2024, Richard Henderson wrote: Mark the reserve_addr check unlikely. Use tlb_vaddr_to_host instead of probe_write, relying on the memset itself to test for page writability. Use set/clear_helper_retaddr so that we can properly unwind on segfault. With this, a trivial loop around guest memset will spend nearly 50% of runtime within helper_dcbz and host memset. Signed-off-by: Richard Henderson --- target/ppc/mem_helper.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c index 24bae3b80c..fa4c4f9fa9 100644 --- a/target/ppc/mem_helper.c +++ b/target/ppc/mem_helper.c @@ -280,20 +280,26 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr, addr &= mask; /* Check reservation */ - if ((env->reserve_addr & mask) == addr) { + if (unlikely((env->reserve_addr & mask) == addr)) { env->reserve_addr = (target_ulong)-1ULL; } /* Try fast path translate */ +#ifdef CONFIG_USER_ONLY + haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, mmu_idx); +#else haddr = probe_write(env, addr, dcbz_size, mmu_idx, retaddr); - if (haddr) { - memset(haddr, 0, dcbz_size); - } else { + if (unlikely(!haddr)) { /* Slow path */ for (int i = 0; i < dcbz_size; i += 8) { cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr); } Is a return needed here to only get to memset below when haddr != NULL? Oops, yes. r~
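The control flow under discussion (fast path through a direct host pointer, slow path store-by-store, and the early return Zoltan spotted was missing) can be sketched in isolation. The sketch_* helpers below are invented stand-ins for tlb_vaddr_to_host()/cpu_stq_mmuidx_ra(), not QEMU code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for tlb_vaddr_to_host(): returning NULL forces
 * the slow path, as for an address not (yet) resolvable to a host pointer. */
static void *sketch_vaddr_to_host(uint8_t *buf, int force_slow)
{
    return force_slow ? NULL : buf;
}

static int sketch_slow_stores;   /* counts slow-path stores, for testing */

static void sketch_dcbz(uint8_t *buf, int dcbz_size, int force_slow)
{
    void *haddr = sketch_vaddr_to_host(buf, force_slow);

    if (!haddr) {
        /* Slow path: one store at a time (cpu_stq_mmuidx_ra in the patch). */
        for (int i = 0; i < dcbz_size; i++) {
            buf[i] = 0;
            sketch_slow_stores++;
        }
        return;   /* without this, we fall through to memset(NULL, ...) */
    }
    /* Fast path: clear the whole block through the host pointer. */
    memset(haddr, 0, dcbz_size);
}
```

Both paths must produce the same architectural result; the only difference is that the fast path touches memory once through the host pointer, so reaching the memset after the slow-path loop would dereference NULL.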