Re: [PULL 00/15] aspeed queue

2024-07-21 Thread Richard Henderson

On 7/21/24 18:13, Cédric Le Goater wrote:

The following changes since commit a87a7c449e532130d4fa8faa391ff7e1f04ed660:

   Merge tag 'pull-loongarch-20240719' of https://gitlab.com/gaosong/qemu into 
staging (2024-07-19 16:28:28 +1000)

are available in the Git repository at:

   https://github.com/legoater/qemu/ tags/pull-aspeed-20240721

for you to fetch changes up to 4db1c16441923fc152142ae4bcc1cba23064cb8b:

   aspeed: fix coding style (2024-07-21 07:46:38 +0200)


aspeed queue:

* SMC model fix (Coverity)
* AST2600 boot for eMMC support and test
* AST2700 ADC model
* I2C model changes preparing AST2700 I2C support


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



Re: [PULL 0/3] loongarch-to-apply queue

2024-07-19 Thread Richard Henderson

On 7/19/24 12:26, Song Gao wrote:

   Merge tag 'pull-target-arm-20240718' 
of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-19 
07:02:17 +1000)

are available in the Git repository at:

   https://gitlab.com/gaosong/qemu.git tags/pull-loongarch-20240719

for you to fetch changes up to 3ed016f525c8010e66be62d3ca6829eaa9b7cfb5:

   hw/loongarch: Modify flash block size to 256K (2024-07-19 10:40:04 +0800)


pull-loongarch-20240719


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



Re: [PULL 00/26] target-arm queue

2024-07-18 Thread Richard Henderson

On 7/18/24 23:20, Peter Maydell wrote:

Hi; hopefully this is the last arm pullreq before softfreeze.
There's a handful of miscellaneous bug fixes here, but the
bulk of the pullreq is Mostafa's implementation of 2-stage
translation in the SMMUv3.

thanks
-- PMM

The following changes since commit d74ec4d7dda6322bcc51d1b13ccbd993d3574795:

   Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into 
staging (2024-07-18 10:07:23 +1000)

are available in the Git repository at:

   https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20240718

for you to fetch changes up to 30a1690f2402e6c1582d5b3ebcf7940bfe2fad4b:

   hvf: arm: Do not advance PC when raising an exception (2024-07-18 13:49:30 
+0100)


target-arm queue:
  * Fix handling of LDAPR/STLR with negative offset
  * LDAPR should honour SCTLR_ELx.nAA
  * Use float_status copy in sme_fmopa_s
  * hw/display/bcm2835_fb: fix fb_use_offsets condition
  * hw/arm/smmuv3: Support and advertise nesting
  * Use FPST_F16 for SME FMOPA (widening)
  * tests/arm-cpu-features: Do not assume PMU availability
  * hvf: arm: Do not advance PC when raising an exception


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



[PATCH v3 11/12] target/s390x: Use set/clear_helper_retaddr in mem_helper.c

2024-07-18 Thread Richard Henderson
Avoid a race condition with munmap in another thread.
For access_memset and access_memmove, manage the value
within the helper.  For uses of access_{get,set}_byte,
manage the value across the for loops.

Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/mem_helper.c | 43 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 331a35b2e5..0e12dae2aa 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -238,14 +238,14 @@ static void do_access_memset(CPUS390XState *env, vaddr 
vaddr, char *haddr,
 static void access_memset(CPUS390XState *env, S390Access *desta,
   uint8_t byte, uintptr_t ra)
 {
-
+set_helper_retaddr(ra);
 do_access_memset(env, desta->vaddr1, desta->haddr1, byte, desta->size1,
  desta->mmu_idx, ra);
-if (likely(!desta->size2)) {
-return;
+if (unlikely(desta->size2)) {
+do_access_memset(env, desta->vaddr2, desta->haddr2, byte,
+ desta->size2, desta->mmu_idx, ra);
 }
-do_access_memset(env, desta->vaddr2, desta->haddr2, byte, desta->size2,
- desta->mmu_idx, ra);
+clear_helper_retaddr();
 }
 
 static uint8_t access_get_byte(CPUS390XState *env, S390Access *access,
@@ -366,6 +366,8 @@ static uint32_t do_helper_nc(CPUS390XState *env, uint32_t 
l, uint64_t dest,
 access_prepare(&srca1, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
 access_prepare(&srca2, env, dest, l, MMU_DATA_LOAD, mmu_idx, ra);
 access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
+set_helper_retaddr(ra);
+
 for (i = 0; i < l; i++) {
 const uint8_t x = access_get_byte(env, &srca1, i, ra) &
   access_get_byte(env, &srca2, i, ra);
@@ -373,6 +375,8 @@ static uint32_t do_helper_nc(CPUS390XState *env, uint32_t 
l, uint64_t dest,
 c |= x;
 access_set_byte(env, &desta, i, x, ra);
 }
+
+clear_helper_retaddr();
 return c != 0;
 }
 
@@ -407,6 +411,7 @@ static uint32_t do_helper_xc(CPUS390XState *env, uint32_t 
l, uint64_t dest,
 return 0;
 }
 
+set_helper_retaddr(ra);
 for (i = 0; i < l; i++) {
 const uint8_t x = access_get_byte(env, &srca1, i, ra) ^
   access_get_byte(env, &srca2, i, ra);
@@ -414,6 +419,7 @@ static uint32_t do_helper_xc(CPUS390XState *env, uint32_t 
l, uint64_t dest,
 c |= x;
 access_set_byte(env, &desta, i, x, ra);
 }
+clear_helper_retaddr();
 return c != 0;
 }
 
@@ -441,6 +447,8 @@ static uint32_t do_helper_oc(CPUS390XState *env, uint32_t 
l, uint64_t dest,
 access_prepare(&srca1, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
 access_prepare(&srca2, env, dest, l, MMU_DATA_LOAD, mmu_idx, ra);
 access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
+set_helper_retaddr(ra);
+
 for (i = 0; i < l; i++) {
 const uint8_t x = access_get_byte(env, &srca1, i, ra) |
   access_get_byte(env, &srca2, i, ra);
@@ -448,6 +456,8 @@ static uint32_t do_helper_oc(CPUS390XState *env, uint32_t 
l, uint64_t dest,
 c |= x;
 access_set_byte(env, &desta, i, x, ra);
 }
+
+clear_helper_retaddr();
 return c != 0;
 }
 
@@ -484,11 +494,13 @@ static uint32_t do_helper_mvc(CPUS390XState *env, 
uint32_t l, uint64_t dest,
 } else if (!is_destructive_overlap(env, dest, src, l)) {
 access_memmove(env, &desta, &srca, ra);
 } else {
+set_helper_retaddr(ra);
 for (i = 0; i < l; i++) {
 uint8_t byte = access_get_byte(env, &srca, i, ra);
 
 access_set_byte(env, &desta, i, byte, ra);
 }
+clear_helper_retaddr();
 }
 
 return env->cc_op;
@@ -514,10 +526,12 @@ void HELPER(mvcrl)(CPUS390XState *env, uint64_t l, 
uint64_t dest, uint64_t src)
 access_prepare(&srca, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
 access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
 
+set_helper_retaddr(ra);
 for (i = l - 1; i >= 0; i--) {
 uint8_t byte = access_get_byte(env, &srca, i, ra);
 access_set_byte(env, &desta, i, byte, ra);
 }
+clear_helper_retaddr();
 }
 
 /* move inverse  */
@@ -534,11 +548,13 @@ void HELPER(mvcin)(CPUS390XState *env, uint32_t l, 
uint64_t dest, uint64_t src)
 src = wrap_address(env, src - l + 1);
 access_prepare(&srca, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
 access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
+
+set_helper_retaddr(ra);
 for (i = 0; i < l; i++) {
 const uint8_t x = access_get_byte(env, &srca, l - i - 1, ra);
-
 access_set_byte(env, &desta, i, x, ra);
 }
+clear_helper_retaddr();
 }
 
 /* move numerics  */
@@ -555,12 +571,15 @@ void HELPER(mvn)(CPUS390XState *env, uint32_t l, uint64_t 
dest, uint64_t src)
 access_prepare(&srca1, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
 access_prepare(&srca2, env, dest, l, MMU_DATA_LOA

[PATCH v3 08/12] target/ppc: Improve helper_dcbz for user-only

2024-07-18 Thread Richard Henderson
Mark the reserve_addr check unlikely.  Use tlb_vaddr_to_host
instead of probe_write, relying on the memset itself to test
for page writability.  Use set/clear_helper_retaddr so that
we can properly unwind on segfault.

With this, a trivial loop around guest memset will spend
nearly 50% of runtime within helper_dcbz and host memset.

Signed-off-by: Richard Henderson 
---
 target/ppc/mem_helper.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 24bae3b80c..953dd08d5d 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -280,20 +280,27 @@ static void dcbz_common(CPUPPCState *env, target_ulong 
addr,
 addr &= mask;
 
 /* Check reservation */
-if ((env->reserve_addr & mask) == addr)  {
+if (unlikely((env->reserve_addr & mask) == addr))  {
 env->reserve_addr = (target_ulong)-1ULL;
 }
 
 /* Try fast path translate */
+#ifdef CONFIG_USER_ONLY
+haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, mmu_idx);
+#else
 haddr = probe_write(env, addr, dcbz_size, mmu_idx, retaddr);
-if (haddr) {
-memset(haddr, 0, dcbz_size);
-} else {
+if (unlikely(!haddr)) {
 /* Slow path */
 for (int i = 0; i < dcbz_size; i += 8) {
 cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
 }
+return;
 }
+#endif
+
+set_helper_retaddr(retaddr);
+memset(haddr, 0, dcbz_size);
+clear_helper_retaddr();
 }
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, int mmu_idx)
-- 
2.43.0




[PATCH v3 01/12] accel/tcg: Move {set, clear}_helper_retaddr to cpu_ldst.h

2024-07-18 Thread Richard Henderson
Use of these in helpers goes hand-in-hand with tlb_vaddr_to_host
and other probing functions.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 accel/tcg/user-retaddr.h | 28 ----------------------------
 include/exec/cpu_ldst.h  | 34 ++++++++++++++++++++++++++++++++++
 accel/tcg/cpu-exec.c |  3 ---
 accel/tcg/user-exec.c|  1 -
 4 files changed, 34 insertions(+), 32 deletions(-)
 delete mode 100644 accel/tcg/user-retaddr.h

diff --git a/accel/tcg/user-retaddr.h b/accel/tcg/user-retaddr.h
deleted file mode 100644
index e0f57e1994..0000000000
--- a/accel/tcg/user-retaddr.h
+++ /dev/null
@@ -1,28 +0,0 @@
-#ifndef ACCEL_TCG_USER_RETADDR_H
-#define ACCEL_TCG_USER_RETADDR_H
-
-#include "qemu/atomic.h"
-
-extern __thread uintptr_t helper_retaddr;
-
-static inline void set_helper_retaddr(uintptr_t ra)
-{
-helper_retaddr = ra;
-/*
- * Ensure that this write is visible to the SIGSEGV handler that
- * may be invoked due to a subsequent invalid memory operation.
- */
-signal_barrier();
-}
-
-static inline void clear_helper_retaddr(void)
-{
-/*
- * Ensure that previous memory operations have succeeded before
- * removing the data visible to the signal handler.
- */
-signal_barrier();
-helper_retaddr = 0;
-}
-
-#endif
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 71009f84f5..dac12bd8eb 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -379,4 +379,38 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
 MMUAccessType access_type, int mmu_idx);
 #endif
 
+/*
+ * For user-only, helpers that use guest to host address translation
+ * must protect the actual host memory access by recording 'retaddr'
+ * for the signal handler.  This is required for a race condition in
+ * which another thread unmaps the page between a probe and the
+ * actual access.
+ */
+#ifdef CONFIG_USER_ONLY
+extern __thread uintptr_t helper_retaddr;
+
+static inline void set_helper_retaddr(uintptr_t ra)
+{
+helper_retaddr = ra;
+/*
+ * Ensure that this write is visible to the SIGSEGV handler that
+ * may be invoked due to a subsequent invalid memory operation.
+ */
+signal_barrier();
+}
+
+static inline void clear_helper_retaddr(void)
+{
+/*
+ * Ensure that previous memory operations have succeeded before
+ * removing the data visible to the signal handler.
+ */
+signal_barrier();
+helper_retaddr = 0;
+}
+#else
+#define set_helper_retaddr(ra)   do { } while (0)
+#define clear_helper_retaddr()   do { } while (0)
+#endif
+
 #endif /* CPU_LDST_H */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 9010dad073..8163295f34 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -41,9 +41,6 @@
 #include "tb-context.h"
 #include "internal-common.h"
 #include "internal-target.h"
-#if defined(CONFIG_USER_ONLY)
-#include "user-retaddr.h"
-#endif
 
 /* -icount align implementation. */
 
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 80d24540ed..7ddc47b0ba 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -33,7 +33,6 @@
 #include "tcg/tcg-ldst.h"
 #include "internal-common.h"
 #include "internal-target.h"
-#include "user-retaddr.h"
 
 __thread uintptr_t helper_retaddr;
 
-- 
2.43.0




[PATCH v3 04/12] target/ppc/mem_helper.c: Remove a conditional from dcbz_common()

2024-07-18 Thread Richard Henderson
From: BALATON Zoltan 

Instead of passing a bool and selecting a value within dcbz_common(),
let the callers pass in the right value, avoiding this conditional
statement. On PPC, dcbz is often used to zero memory, and some code
uses it a lot. This change improves the run time of a test case that
copies memory with a dcbz call in every iteration from 6.23 to 5.83 seconds.

Signed-off-by: BALATON Zoltan 
Message-Id: <20240622204833.5f7c74e6...@zero.eik.bme.hu>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
Reviewed-by: Nicholas Piggin 
---
 target/ppc/mem_helper.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index f88155ad45..361fd72226 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -271,12 +271,11 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, 
uint32_t nb,
 }
 
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
-uint32_t opcode, bool epid, uintptr_t retaddr)
+uint32_t opcode, int mmu_idx, uintptr_t retaddr)
 {
 target_ulong mask, dcbz_size = env->dcache_line_size;
 uint32_t i;
 void *haddr;
-int mmu_idx = epid ? PPC_TLB_EPID_STORE : ppc_env_mmu_index(env, false);
 
 #if defined(TARGET_PPC64)
 /* Check for dcbz vs dcbzl on 970 */
@@ -309,12 +308,12 @@ static void dcbz_common(CPUPPCState *env, target_ulong 
addr,
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-dcbz_common(env, addr, opcode, false, GETPC());
+dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
 }
 
 void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-dcbz_common(env, addr, opcode, true, GETPC());
+dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC());
 }
 
 void helper_icbi(CPUPPCState *env, target_ulong addr)
-- 
2.43.0




[PATCH v3 02/12] target/arm: Use set/clear_helper_retaddr in helper-a64.c

2024-07-18 Thread Richard Henderson
Use these in helper_dc_dva and the FEAT_MOPS routines to
avoid a race condition with munmap in another thread.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/helper-a64.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index 0ea8668ab4..c60d2a7ec9 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -928,6 +928,8 @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp)
 
 void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
 {
+uintptr_t ra = GETPC();
+
 /*
  * Implement DC ZVA, which zeroes a fixed-length block of memory.
  * Note that we do not implement the (architecturally mandated)
@@ -948,8 +950,6 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
 
 #ifndef CONFIG_USER_ONLY
 if (unlikely(!mem)) {
-uintptr_t ra = GETPC();
-
 /*
  * Trap if accessing an invalid page.  DC_ZVA requires that we supply
  * the original pointer for an invalid page.  But watchpoints require
@@ -971,7 +971,9 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
 }
 #endif
 
+set_helper_retaddr(ra);
 memset(mem, 0, blocklen);
+clear_helper_retaddr();
 }
 
 void HELPER(unaligned_access)(CPUARMState *env, uint64_t addr,
@@ -1120,7 +1122,9 @@ static uint64_t set_step(CPUARMState *env, uint64_t 
toaddr,
 }
 #endif
 /* Easy case: just memset the host memory */
+set_helper_retaddr(ra);
 memset(mem, data, setsize);
+clear_helper_retaddr();
 return setsize;
 }
 
@@ -1163,7 +1167,9 @@ static uint64_t set_step_tags(CPUARMState *env, uint64_t 
toaddr,
 }
 #endif
 /* Easy case: just memset the host memory */
+set_helper_retaddr(ra);
 memset(mem, data, setsize);
+clear_helper_retaddr();
 mte_mops_set_tags(env, toaddr, setsize, *mtedesc);
 return setsize;
 }
@@ -1497,7 +1503,9 @@ static uint64_t copy_step(CPUARMState *env, uint64_t 
toaddr, uint64_t fromaddr,
 }
 #endif
 /* Easy case: just memmove the host memory */
+set_helper_retaddr(ra);
 memmove(wmem, rmem, copysize);
+clear_helper_retaddr();
 return copysize;
 }
 
@@ -1572,7 +1580,9 @@ static uint64_t copy_step_rev(CPUARMState *env, uint64_t 
toaddr,
  * Easy case: just memmove the host memory. Note that wmem and
  * rmem here point to the *last* byte to copy.
  */
+set_helper_retaddr(ra);
 memmove(wmem - (copysize - 1), rmem - (copysize - 1), copysize);
+clear_helper_retaddr();
 return copysize;
 }
 
-- 
2.43.0




[PATCH v3 03/12] target/arm: Use set/clear_helper_retaddr in SVE and SME helpers

2024-07-18 Thread Richard Henderson
Avoid a race condition with munmap in another thread.
Use around blocks that exclusively use "host_fn".
Keep the blocks as small as possible, but without setting
and clearing for every operation on one page.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/sme_helper.c | 16 ++++++++++++++++
 target/arm/tcg/sve_helper.c | 42 +++++++++++++++++++++++++++++++++---------
 2 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index e2e0575039..ab40ced38f 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -517,6 +517,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
 clr_fn(za, 0, reg_off);
 }
 
+set_helper_retaddr(ra);
+
 while (reg_off <= reg_last) {
 uint64_t pg = vg[reg_off >> 6];
 do {
@@ -529,6 +531,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
 } while (reg_off <= reg_last && (reg_off & 63));
 }
 
+clear_helper_retaddr();
+
 /*
  * Use the slow path to manage the cross-page misalignment.
  * But we know this is RAM and cannot trap.
@@ -543,6 +547,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
 reg_last = info.reg_off_last[1];
 host = info.page[1].host;
 
+set_helper_retaddr(ra);
+
 do {
 uint64_t pg = vg[reg_off >> 6];
 do {
@@ -554,6 +560,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
 reg_off += esize;
 } while (reg_off & 63);
 } while (reg_off <= reg_last);
+
+clear_helper_retaddr();
 }
 }
 
@@ -701,6 +709,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
 reg_last = info.reg_off_last[0];
 host = info.page[0].host;
 
+set_helper_retaddr(ra);
+
 while (reg_off <= reg_last) {
 uint64_t pg = vg[reg_off >> 6];
 do {
@@ -711,6 +721,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
 } while (reg_off <= reg_last && (reg_off & 63));
 }
 
+clear_helper_retaddr();
+
 /*
  * Use the slow path to manage the cross-page misalignment.
  * But we know this is RAM and cannot trap.
@@ -725,6 +737,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
 reg_last = info.reg_off_last[1];
 host = info.page[1].host;
 
+set_helper_retaddr(ra);
+
 do {
 uint64_t pg = vg[reg_off >> 6];
 do {
@@ -734,6 +748,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
 reg_off += 1 << esz;
 } while (reg_off & 63);
 } while (reg_off <= reg_last);
+
+clear_helper_retaddr();
 }
 }
 
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index dd49e67d7a..f1ee0e060f 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -5738,6 +5738,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
 reg_last = info.reg_off_last[0];
 host = info.page[0].host;
 
+set_helper_retaddr(retaddr);
+
 while (reg_off <= reg_last) {
 uint64_t pg = vg[reg_off >> 6];
 do {
@@ -5752,6 +5754,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
 } while (reg_off <= reg_last && (reg_off & 63));
 }
 
+clear_helper_retaddr();
+
 /*
  * Use the slow path to manage the cross-page misalignment.
  * But we know this is RAM and cannot trap.
@@ -5771,6 +5775,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
 reg_last = info.reg_off_last[1];
 host = info.page[1].host;
 
+set_helper_retaddr(retaddr);
+
 do {
 uint64_t pg = vg[reg_off >> 6];
 do {
@@ -5784,6 +5790,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const 
target_ulong addr,
 mem_off += N << msz;
 } while (reg_off & 63);
 } while (reg_off <= reg_last);
+
+clear_helper_retaddr();
 }
 }
 
@@ -5934,15 +5942,11 @@ DO_LDN_2(4, dd, MO_64)
 /*
  * Load contiguous data, first-fault and no-fault.
  *
- * For user-only, one could argue that we should hold the mmap_lock during
- * the operation so that there is no race between page_check_range and the
- * load operation.  However, unmapping pages out from under a running thread
- * is extraordinarily unlikely.  This theoretical race condition also affects
- * linux-user/ in its get_user/put_user macros.
- *
- * TODO: Construct some helpers, written in assembly, that interact with
- * host_signal_handler to produce memory ops which can properly report errors
- * without racing.
+ * For user-only, we control the race between page_check_range and
+ * another thread's munmap by using set/clear_helper_retaddr.  Any
+ * SEGV that occurs between those markers is assumed to be b

[PATCH v3 05/12] target/ppc: Hoist dcbz_size out of dcbz_common

2024-07-18 Thread Richard Henderson
The 970 logic does not apply to dcbzep, which is an e500 insn.

Reviewed-by: Nicholas Piggin 
Reviewed-by: BALATON Zoltan 
Signed-off-by: Richard Henderson 
---
 target/ppc/mem_helper.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 361fd72226..5067919ff8 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -271,22 +271,12 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, 
uint32_t nb,
 }
 
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
-uint32_t opcode, int mmu_idx, uintptr_t retaddr)
+int dcbz_size, int mmu_idx, uintptr_t retaddr)
 {
-target_ulong mask, dcbz_size = env->dcache_line_size;
-uint32_t i;
+target_ulong mask = ~(target_ulong)(dcbz_size - 1);
 void *haddr;
 
-#if defined(TARGET_PPC64)
-/* Check for dcbz vs dcbzl on 970 */
-if (env->excp_model == POWERPC_EXCP_970 &&
-!(opcode & 0x0020) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
-dcbz_size = 32;
-}
-#endif
-
 /* Align address */
-mask = ~(dcbz_size - 1);
 addr &= mask;
 
 /* Check reservation */
@@ -300,7 +290,7 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
 memset(haddr, 0, dcbz_size);
 } else {
 /* Slow path */
-for (i = 0; i < dcbz_size; i += 8) {
+for (int i = 0; i < dcbz_size; i += 8) {
 cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
 }
 }
@@ -308,12 +298,22 @@ static void dcbz_common(CPUPPCState *env, target_ulong 
addr,
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
+int dcbz_size = env->dcache_line_size;
+
+#if defined(TARGET_PPC64)
+/* Check for dcbz vs dcbzl on 970 */
+if (env->excp_model == POWERPC_EXCP_970 &&
+!(opcode & 0x0020) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
+dcbz_size = 32;
+}
+#endif
+
+dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
 }
 
 void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC());
+dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
 }
 
 void helper_icbi(CPUPPCState *env, target_ulong addr)
-- 
2.43.0




[PATCH v3 07/12] target/ppc: Merge helper_{dcbz,dcbzep}

2024-07-18 Thread Richard Henderson
Merge the two and pass the mmu_idx directly from translation.
Swap the argument order in dcbz_common to avoid extra swaps.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Richard Henderson 
---
 target/ppc/helper.h |  3 +--
 target/ppc/mem_helper.c | 14 ++++----------
 target/ppc/translate.c  |  4 ++--
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index afc56855ff..4fa089cbf9 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -46,8 +46,7 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
 DEF_HELPER_4(lsw, void, env, tl, i32, i32)
 DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
 DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
-DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl)
-DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl)
+DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, int)
 #ifdef TARGET_PPC64
 DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl)
 #endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index d4957efd6e..24bae3b80c 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -271,7 +271,7 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, 
uint32_t nb,
 }
 
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
-int dcbz_size, int mmu_idx, uintptr_t retaddr)
+int mmu_idx, int dcbz_size, uintptr_t retaddr)
 {
 target_ulong mask = ~(target_ulong)(dcbz_size - 1);
 void *haddr;
@@ -296,15 +296,9 @@ static void dcbz_common(CPUPPCState *env, target_ulong 
addr,
 }
 }
 
-void helper_dcbz(CPUPPCState *env, target_ulong addr)
+void helper_dcbz(CPUPPCState *env, target_ulong addr, int mmu_idx)
 {
-dcbz_common(env, addr, env->dcache_line_size,
-ppc_env_mmu_index(env, false), GETPC());
-}
-
-void helper_dcbzep(CPUPPCState *env, target_ulong addr)
-{
-dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
+dcbz_common(env, addr, mmu_idx, env->dcache_line_size, GETPC());
 }
 
 #ifdef TARGET_PPC64
@@ -320,7 +314,7 @@ void helper_dcbzl(CPUPPCState *env, target_ulong addr)
 dcbz_size = 32;
 }
 
-dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
+dcbz_common(env, addr, ppc_env_mmu_index(env, false), dcbz_size, GETPC());
 }
 #endif
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 9e472ab7ef..cba943a49d 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4458,7 +4458,7 @@ static void gen_dcbz(DisasContext *ctx)
 }
 #endif
 
-gen_helper_dcbz(tcg_env, tcgv_addr);
+gen_helper_dcbz(tcg_env, tcgv_addr, tcg_constant_i32(ctx->mem_idx));
 }
 
 /* dcbzep */
@@ -4468,7 +4468,7 @@ static void gen_dcbzep(DisasContext *ctx)
 
 gen_set_access_type(ctx, ACCESS_CACHE);
 gen_addr_reg_index(ctx, tcgv_addr);
-gen_helper_dcbzep(tcg_env, tcgv_addr);
+gen_helper_dcbz(tcg_env, tcgv_addr, tcg_constant_i32(PPC_TLB_EPID_STORE));
 }
 
 /* dst / dstt */
-- 
2.43.0




[PATCH v3 12/12] target/riscv: Simplify probing in vext_ldff

2024-07-18 Thread Richard Henderson
The current pairing of tlb_vaddr_to_host with extra checks is either
inefficient (user-only, with page_check_range) or incorrect
(system, with probe_pages).

For proper non-fault behaviour, use probe_access_flags with
its nonfault parameter set to true.

Acked-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 target/riscv/vector_helper.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 1b4d5a8e37..10a52ceb5b 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -474,7 +474,6 @@ vext_ldff(void *vd, void *v0, target_ulong base,
   vext_ldst_elem_fn *ldst_elem,
   uint32_t log2_esz, uintptr_t ra)
 {
-void *host;
 uint32_t i, k, vl = 0;
 uint32_t nf = vext_nf(desc);
 uint32_t vm = vext_vm(desc);
@@ -493,27 +492,31 @@ vext_ldff(void *vd, void *v0, target_ulong base,
 }
 addr = adjust_addr(env, base + i * (nf << log2_esz));
 if (i == 0) {
+/* Allow fault on first element. */
 probe_pages(env, addr, nf << log2_esz, ra, MMU_DATA_LOAD);
 } else {
-/* if it triggers an exception, no need to check watchpoint */
 remain = nf << log2_esz;
 while (remain > 0) {
+void *host;
+int flags;
+
 offset = -(addr | TARGET_PAGE_MASK);
-host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_index);
-if (host) {
-#ifdef CONFIG_USER_ONLY
-if (!page_check_range(addr, offset, PAGE_READ)) {
-vl = i;
-goto ProbeSuccess;
-}
-#else
-probe_pages(env, addr, offset, ra, MMU_DATA_LOAD);
-#endif
-} else {
+
+/* Probe nonfault on subsequent elements. */
+flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
+   mmu_index, true, &host, 0);
+
+/*
+ * Stop if invalid (unmapped) or mmio (transaction may fail).
+ * Do not stop if watchpoint, as the spec says that
+ * first-fault should continue to access the same
+ * elements regardless of any watchpoint.
+ */
+if (flags & ~TLB_WATCHPOINT) {
 vl = i;
 goto ProbeSuccess;
 }
-if (remain <=  offset) {
+if (remain <= offset) {
 break;
 }
 remain -= offset;
-- 
2.43.0




[PATCH v3 09/12] target/s390x: Use user_or_likely in do_access_memset

2024-07-18 Thread Richard Henderson
Eliminate the ifdef by using a predicate that is
always true with CONFIG_USER_ONLY.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/mem_helper.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 6cdbc34178..5311a15a09 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -225,10 +225,7 @@ static void do_access_memset(CPUS390XState *env, vaddr 
vaddr, char *haddr,
  uint8_t byte, uint16_t size, int mmu_idx,
  uintptr_t ra)
 {
-#ifdef CONFIG_USER_ONLY
-memset(haddr, byte, size);
-#else
-if (likely(haddr)) {
+if (user_or_likely(haddr)) {
 memset(haddr, byte, size);
 } else {
 MemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
@@ -236,7 +233,6 @@ static void do_access_memset(CPUS390XState *env, vaddr 
vaddr, char *haddr,
 cpu_stb_mmu(env, vaddr + i, byte, oi, ra);
 }
 }
-#endif
 }
 
 static void access_memset(CPUS390XState *env, S390Access *desta,
-- 
2.43.0




[PATCH v3 06/12] target/ppc: Split out helper_dbczl for 970

2024-07-18 Thread Richard Henderson
We can determine at translation time whether the insn is or
is not dcbzl.  We must retain a runtime check against the
HID5 register, but we can move that to a separate function
that never affects other ppc models.

Reviewed-by: Nicholas Piggin 
Reviewed-by: BALATON Zoltan 
Signed-off-by: Richard Henderson 
---
 target/ppc/helper.h |  7 +++++--
 target/ppc/mem_helper.c | 34 +++++++++++++++++++++-------------
 target/ppc/translate.c  | 24 ++++++++++++++----------
 3 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 76b8f25c77..afc56855ff 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -46,8 +46,11 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
 DEF_HELPER_4(lsw, void, env, tl, i32, i32)
 DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
 DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
-DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, i32)
-DEF_HELPER_FLAGS_3(dcbzep, TCG_CALL_NO_WG, void, env, tl, i32)
+DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl)
+DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl)
+#ifdef TARGET_PPC64
+DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl)
+#endif
 DEF_HELPER_FLAGS_2(icbi, TCG_CALL_NO_WG, void, env, tl)
 DEF_HELPER_FLAGS_2(icbiep, TCG_CALL_NO_WG, void, env, tl)
 DEF_HELPER_5(lscbx, tl, env, tl, i32, i32, i32)
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 5067919ff8..d4957efd6e 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -296,26 +296,34 @@ static void dcbz_common(CPUPPCState *env, target_ulong 
addr,
 }
 }
 
-void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
+void helper_dcbz(CPUPPCState *env, target_ulong addr)
 {
-int dcbz_size = env->dcache_line_size;
-
-#if defined(TARGET_PPC64)
-/* Check for dcbz vs dcbzl on 970 */
-if (env->excp_model == POWERPC_EXCP_970 &&
-!(opcode & 0x0020) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
-dcbz_size = 32;
-}
-#endif
-
-dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
+dcbz_common(env, addr, env->dcache_line_size,
+ppc_env_mmu_index(env, false), GETPC());
 }
 
-void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
+void helper_dcbzep(CPUPPCState *env, target_ulong addr)
 {
 dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
 }
 
+#ifdef TARGET_PPC64
+void helper_dcbzl(CPUPPCState *env, target_ulong addr)
+{
+int dcbz_size = env->dcache_line_size;
+
+/*
+ * The translator checked for POWERPC_EXCP_970.
+ * All that's left is to check HID5.
+ */
+if (((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
+dcbz_size = 32;
+}
+
+dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
+}
+#endif
+
 void helper_icbi(CPUPPCState *env, target_ulong addr)
 {
 addr &= ~(env->dcache_line_size - 1);
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 0bc16d7251..9e472ab7ef 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -178,6 +178,7 @@ struct DisasContext {
 /* Translation flags */
 MemOp default_tcg_memop_mask;
 #if defined(TARGET_PPC64)
+powerpc_excp_t excp_model;
 bool sf_mode;
 bool has_cfar;
 bool has_bhrb;
@@ -4445,27 +4446,29 @@ static void gen_dcblc(DisasContext *ctx)
 /* dcbz */
 static void gen_dcbz(DisasContext *ctx)
 {
-TCGv tcgv_addr;
-TCGv_i32 tcgv_op;
+TCGv tcgv_addr = tcg_temp_new();
 
 gen_set_access_type(ctx, ACCESS_CACHE);
-tcgv_addr = tcg_temp_new();
-tcgv_op = tcg_constant_i32(ctx->opcode & 0x03FF000);
 gen_addr_reg_index(ctx, tcgv_addr);
-gen_helper_dcbz(tcg_env, tcgv_addr, tcgv_op);
+
+#ifdef TARGET_PPC64
+if (ctx->excp_model == POWERPC_EXCP_970 && !(ctx->opcode & 0x0020)) {
+gen_helper_dcbzl(tcg_env, tcgv_addr);
+return;
+}
+#endif
+
+gen_helper_dcbz(tcg_env, tcgv_addr);
 }
 
 /* dcbzep */
 static void gen_dcbzep(DisasContext *ctx)
 {
-TCGv tcgv_addr;
-TCGv_i32 tcgv_op;
+TCGv tcgv_addr = tcg_temp_new();
 
 gen_set_access_type(ctx, ACCESS_CACHE);
-tcgv_addr = tcg_temp_new();
-tcgv_op = tcg_constant_i32(ctx->opcode & 0x03FF000);
 gen_addr_reg_index(ctx, tcgv_addr);
-gen_helper_dcbzep(tcg_env, tcgv_addr, tcgv_op);
+gen_helper_dcbzep(tcg_env, tcgv_addr);
 }
 
 /* dst / dstt */
@@ -6486,6 +6489,7 @@ static void ppc_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 ctx->default_tcg_memop_mask = ctx->le_mode ? MO_LE : MO_BE;
 ctx->flags = env->flags;
 #if defined(TARGET_PPC64)
+ctx->excp_model = env->excp_model;
 ctx->sf_mode = (hflags >> HFLAGS_64) & 1;
 ctx->has_cfar = !!(env->flags & POWERPC_FLAG_CFAR);
 ctx->has_bhrb = !!(env->flags & POWERPC_FLAG_BHRB);
-- 
2.43.0
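As background for the dcbz/dcbzl split above: the helper's observable effect is simply to zero the naturally aligned cache line containing the effective address, with the 970's dcbzl/HID5 quirk selecting a 32-byte line instead of the configured dcache_line_size. A minimal model (illustrative Python, not QEMU code):

```python
def dcbz(memory, addr, line_size):
    """Model of PowerPC dcbz: zero the naturally aligned
    cache line (line_size bytes) containing addr."""
    base = addr & ~(line_size - 1)
    memory[base:base + line_size] = bytes(line_size)

mem = bytearray(b"\xff" * 256)
dcbz(mem, 0x47, 128)                 # e.g. a 128-byte dcache line
assert mem[0:128] == bytes(128)      # whole line zeroed
assert mem[128:] == b"\xff" * 128    # rest untouched
```

On a 970 with the HID5 bit set, helper_dcbzl above behaves like dcbz(mem, addr, 32) regardless of the configured line size.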




[PATCH v3 10/12] target/s390x: Use user_or_likely in access_memmove

2024-07-18 Thread Richard Henderson
Invert the conditional, indent the block, and use the macro
that expands to true for user-only.

Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/mem_helper.c | 54 +--
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 5311a15a09..331a35b2e5 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -296,41 +296,39 @@ static void access_memmove(CPUS390XState *env, S390Access *desta,
S390Access *srca, uintptr_t ra)
 {
 int len = desta->size1 + desta->size2;
-int diff;
 
 assert(len == srca->size1 + srca->size2);
 
 /* Fallback to slow access in case we don't have access to all host pages */
-if (unlikely(!desta->haddr1 || (desta->size2 && !desta->haddr2) ||
- !srca->haddr1 || (srca->size2 && !srca->haddr2))) {
-int i;
+if (user_or_likely(desta->haddr1 &&
+   srca->haddr1 &&
+   (!desta->size2 || desta->haddr2) &&
+   (!srca->size2 || srca->haddr2))) {
+int diff = desta->size1 - srca->size1;
 
-for (i = 0; i < len; i++) {
-uint8_t byte = access_get_byte(env, srca, i, ra);
-
-access_set_byte(env, desta, i, byte, ra);
-}
-return;
-}
-
-diff = desta->size1 - srca->size1;
-if (likely(diff == 0)) {
-memmove(desta->haddr1, srca->haddr1, srca->size1);
-if (unlikely(srca->size2)) {
-memmove(desta->haddr2, srca->haddr2, srca->size2);
-}
-} else if (diff > 0) {
-memmove(desta->haddr1, srca->haddr1, srca->size1);
-memmove(desta->haddr1 + srca->size1, srca->haddr2, diff);
-if (likely(desta->size2)) {
-memmove(desta->haddr2, srca->haddr2 + diff, desta->size2);
+if (likely(diff == 0)) {
+memmove(desta->haddr1, srca->haddr1, srca->size1);
+if (unlikely(srca->size2)) {
+memmove(desta->haddr2, srca->haddr2, srca->size2);
+}
+} else if (diff > 0) {
+memmove(desta->haddr1, srca->haddr1, srca->size1);
+memmove(desta->haddr1 + srca->size1, srca->haddr2, diff);
+if (likely(desta->size2)) {
+memmove(desta->haddr2, srca->haddr2 + diff, desta->size2);
+}
+} else {
+diff = -diff;
+memmove(desta->haddr1, srca->haddr1, desta->size1);
+memmove(desta->haddr2, srca->haddr1 + desta->size1, diff);
+if (likely(srca->size2)) {
+memmove(desta->haddr2 + diff, srca->haddr2, srca->size2);
+}
 }
 } else {
-diff = -diff;
-memmove(desta->haddr1, srca->haddr1, desta->size1);
-memmove(desta->haddr2, srca->haddr1 + desta->size1, diff);
-if (likely(srca->size2)) {
-memmove(desta->haddr2 + diff, srca->haddr2, srca->size2);
+for (int i = 0; i < len; i++) {
+uint8_t byte = access_get_byte(env, srca, i, ra);
+access_set_byte(env, desta, i, byte, ra);
 }
 }
 }
-- 
2.43.0
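The three diff cases being re-indented above are easy to get wrong; this small model (illustrative Python, treating each host buffer as a fragment and ignoring the overlap ordering that the real memmove calls handle) checks that the case analysis reassembles the bytes in order:

```python
def frag_memmove(d1, d2, s1, s2):
    """Model of access_memmove's fast path: copy len(s1)+len(s2)
    bytes from a two-fragment source into a two-fragment
    destination whose split point may differ (diff != 0)."""
    diff = len(d1) - len(s1)
    if diff == 0:
        d1[:] = s1
        d2[:] = s2
    elif diff > 0:
        d1[:len(s1)] = s1
        d1[len(s1):] = s2[:diff]
        d2[:] = s2[diff:]
    else:
        diff = -diff
        d1[:] = s1[:len(d1)]
        d2[:diff] = s1[len(d1):]
        d2[diff:] = s2

src = b"abcdefghij"
for cut_s in (3, 5, 8):          # source page split
    for cut_d in (2, 5, 9):      # destination page split
        d1, d2 = bytearray(cut_d), bytearray(len(src) - cut_d)
        frag_memmove(d1, d2, bytearray(src[:cut_s]), bytearray(src[cut_s:]))
        assert bytes(d1) + bytes(d2) == src
```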




[PATCH v3 00/12] Fixes for user-only munmap races

2024-07-18 Thread Richard Henderson
Changes for v3:
  * Fix patch 3 (sve) vs goto do_fault (pmm)
  * Fix patch 12 (rvv) vs watchpoints and goto ProbeSuccess (max chou).
  * Apply r-b.

r~

BALATON Zoltan (1):
  target/ppc/mem_helper.c: Remove a conditional from dcbz_common()

Richard Henderson (11):
  accel/tcg: Move {set,clear}_helper_retaddr to cpu_ldst.h
  target/arm: Use set/clear_helper_retaddr in helper-a64.c
  target/arm: Use set/clear_helper_retaddr in SVE and SME helpers
  target/ppc: Hoist dcbz_size out of dcbz_common
  target/ppc: Split out helper_dcbzl for 970
  target/ppc: Merge helper_{dcbz,dcbzep}
  target/ppc: Improve helper_dcbz for user-only
  target/s390x: Use user_or_likely in do_access_memset
  target/s390x: Use user_or_likely in access_memmove
  target/s390x: Use set/clear_helper_retaddr in mem_helper.c
  target/riscv: Simplify probing in vext_ldff

 accel/tcg/user-retaddr.h  |  28 -
 include/exec/cpu_ldst.h   |  34 +++
 target/ppc/helper.h   |   6 +-
 accel/tcg/cpu-exec.c  |   3 -
 accel/tcg/user-exec.c |   1 -
 target/arm/tcg/helper-a64.c   |  14 -
 target/arm/tcg/sme_helper.c   |  16 ++
 target/arm/tcg/sve_helper.c   |  42 +++---
 target/ppc/mem_helper.c   |  52 +
 target/ppc/translate.c|  24 
 target/riscv/vector_helper.c  |  31 +-
 target/s390x/tcg/mem_helper.c | 103 +-
 12 files changed, 224 insertions(+), 130 deletions(-)
 delete mode 100644 accel/tcg/user-retaddr.h

-- 
2.43.0




[PATCH] tests/tcg/aarch64: Fix test-mte.py

2024-07-18 Thread Richard Henderson
Python 3.12 warns:

  TEST    gdbstub MTE support on aarch64
/home/rth/qemu/src/tests/tcg/aarch64/gdbstub/test-mte.py:21: SyntaxWarning: invalid escape sequence '\('
  PATTERN_0 = "Memory tags for address 0x[0-9a-f]+ match \(0x[0-9a-f]+\)."

Double up the \ to pass one through to the pattern.

Signed-off-by: Richard Henderson 
---
 tests/tcg/aarch64/gdbstub/test-mte.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/aarch64/gdbstub/test-mte.py b/tests/tcg/aarch64/gdbstub/test-mte.py
index 2db0663c1a..66f9c25f8a 100644
--- a/tests/tcg/aarch64/gdbstub/test-mte.py
+++ b/tests/tcg/aarch64/gdbstub/test-mte.py
@@ -18,7 +18,7 @@
 from test_gdbstub import main, report
 
 
-PATTERN_0 = "Memory tags for address 0x[0-9a-f]+ match \(0x[0-9a-f]+\)."
+PATTERN_0 = "Memory tags for address 0x[0-9a-f]+ match \\(0x[0-9a-f]+\\)."
 PATTERN_1 = ".*(0x[0-9a-f]+)"
 
 
-- 
2.43.0




Re: [PATCH v5 00/19] Reconstruct loongson ipi driver

2024-07-18 Thread Richard Henderson

On 7/18/24 23:32, Philippe Mathieu-Daudé wrote:

Since v4:
- Fix build failure due to rebase (Song)
- Loongarch -> LoongArch (Song)
- Added Song's tags

Since v3:
- Use DEFINE_TYPES() macro (unreviewed patch #1)
- Update MAINTAINERS
- Added Bibo's tags


Ho hum, I didn't notice v5 when I just reviewed v4.

For the series:
Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 18/18] hw/intc/loongson_ipi: Remove unused headers

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  hw/intc/loongson_ipi.c | 9 -
  1 file changed, 9 deletions(-)



Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 16/18] hw/loongarch/virt: Replace loongson IPI with loongarch IPI

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

LoongArch IPI inherits from the LoongsonIPICommonClass class, and it
only contains LoongArch 3A5000 virt machine specific interfaces,
rather than mixing different machine implementations together.

Signed-off-by: Bibo Mao
[PMD: Rebased]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  include/hw/loongarch/virt.h | 1 -
  hw/loongarch/virt.c | 4 ++--
  hw/loongarch/Kconfig| 2 +-
  3 files changed, 3 insertions(+), 4 deletions(-)



Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 17/18] hw/intc/loongson_ipi: Restrict to MIPS

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

Now that the LoongArch target can use the TYPE_LOONGARCH_IPI
model, restrict TYPE_LOONGSON_IPI to MIPS.

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  MAINTAINERS|  2 --
  hw/intc/loongson_ipi.c | 14 --
  2 files changed, 16 deletions(-)



Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 14/18] hw/intc/loongson_ipi: Move common code to loongson_ipi_common.c

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

Move the common code from loongson_ipi.c to loongson_ipi_common.c,
call parent_realize() instead of loongson_ipi_common_realize() in
loongson_ipi_realize().

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  hw/intc/loongson_ipi.c| 269 +
  hw/intc/loongson_ipi_common.c | 272 ++
  2 files changed, 274 insertions(+), 267 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 13/18] hw/intc/loongson_ipi: Expose loongson_ipi_core_read/write helpers

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

In order to access the loongson_ipi_core_read/write helpers
from loongson_ipi_common.c in the next commit, make their
prototype declarations public.

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  include/hw/intc/loongson_ipi_common.h |  6 ++
  hw/intc/loongson_ipi.c| 10 --
  2 files changed, 10 insertions(+), 6 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 12/18] hw/intc/loongson_ipi: Add LoongsonIPICommonClass::cpu_by_arch_id handler

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

Allow Loongson IPI implementations to have their own cpu_by_arch_id()
handler.

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  include/hw/intc/loongson_ipi_common.h |  1 +
  hw/intc/loongson_ipi.c| 10 +++---
  2 files changed, 8 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 11/18] hw/intc/loongson_ipi: Add LoongsonIPICommonClass::get_iocsr_as handler

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

Allow Loongson IPI implementations to have their own get_iocsr_as()
handler.

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  include/hw/intc/loongson_ipi_common.h |  2 ++
  hw/intc/loongson_ipi.c| 16 
  2 files changed, 14 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 10/18] hw/intc/loongson_ipi: Pass LoongsonIPICommonState to send_ipi_data()

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

In order to get LoongsonIPICommonClass in send_ipi_data()
in the next commit, propagate LoongsonIPICommonState.

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  hw/intc/loongson_ipi.c | 19 +++
  1 file changed, 11 insertions(+), 8 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 09/18] hw/intc/loongson_ipi: Move IPICore structure to loongson_ipi_common.h

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

Move the IPICore structure and corresponding common fields
of LoongsonIPICommonState to "hw/intc/loongson_ipi_common.h".

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  include/hw/intc/loongson_ipi.h| 17 
  include/hw/intc/loongson_ipi_common.h | 18 +
  hw/intc/loongson_ipi.c| 56 +--
  hw/intc/loongson_ipi_common.c | 50 
  4 files changed, 77 insertions(+), 64 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 08/18] hw/intc/loongson_ipi: Move IPICore::mmio_mem to LoongsonIPIState

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

It is easier to manage one array of MMIO MR rather
than one per vCPU.

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  include/hw/intc/loongson_ipi.h | 2 +-
  hw/intc/loongson_ipi.c | 9 ++---
  2 files changed, 7 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 06/18] hw/intc/loongson_ipi: Add TYPE_LOONGSON_IPI_COMMON stub

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

Introduce LOONGSON_IPI_COMMON stubs, QDev parent of LOONGSON_IPI.

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  MAINTAINERS   |  4 
  include/hw/intc/loongson_ipi.h| 13 +++--
  include/hw/intc/loongson_ipi_common.h | 26 ++
  hw/intc/loongson_ipi.c|  7 ---
  hw/intc/loongson_ipi_common.c | 22 ++
  hw/intc/Kconfig   |  4 
  hw/intc/meson.build   |  1 +
  7 files changed, 72 insertions(+), 5 deletions(-)
  create mode 100644 include/hw/intc/loongson_ipi_common.h
  create mode 100644 hw/intc/loongson_ipi_common.c


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 07/18] hw/intc/loongson_ipi: Move common definitions to loongson_ipi_common.h

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  include/hw/intc/loongson_ipi.h| 18 --
  include/hw/intc/loongson_ipi_common.h | 19 +++
  2 files changed, 19 insertions(+), 18 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 05/18] hw/intc/loongson_ipi: Extract loongson_ipi_common_realize()

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

In preparation for extracting the common IPI code in a few
commits, extract loongson_ipi_common_realize().

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  hw/intc/loongson_ipi.c | 25 ++---
  1 file changed, 18 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 04/18] hw/intc/loongson_ipi: Extract loongson_ipi_common_finalize()

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

In preparation for extracting the common IPI code in a few
commits, extract loongson_ipi_common_finalize().

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  hw/intc/loongson_ipi.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 03/18] hw/intc/loongson_ipi: Rename LoongsonIPI -> LoongsonIPIState

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

From: Bibo Mao

We'll have to add LoongsonIPIClass in a few commits,
so rename LoongsonIPI to LoongsonIPIState for clarity.

Signed-off-by: Bibo Mao
[PMD: Extracted from bigger commit, added commit description]
Co-Developed-by: Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Bibo Mao
Tested-by: Bibo Mao
---
  include/hw/intc/loongson_ipi.h |  6 +++---
  hw/intc/loongson_ipi.c | 16 
  2 files changed, 11 insertions(+), 11 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 01/18] hw/intc/loongson_ipi: Declare QOM types using DEFINE_TYPES() macro

2024-07-18 Thread Richard Henderson

On 7/18/24 18:38, Philippe Mathieu-Daudé wrote:

When multiple QOM types are registered in the same file,
it is simpler to use the DEFINE_TYPES() macro. Replace
the type_init() / type_register_static() combination.

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/intc/loongson_ipi.c | 21 +
  1 file changed, 9 insertions(+), 12 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PULL 00/30] riscv-to-apply queue

2024-07-18 Thread Richard Henderson

On 7/18/24 12:09, Alistair Francis wrote:

The following changes since commit 58ee924b97d1c0898555647a31820c5a20d55a73:

   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-07-17 15:40:28 +1000)

are available in the Git repository at:

   https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20240718-1

for you to fetch changes up to daff9f7f7a457f78ce455e6abf19c2a37dfe7630:

   roms/opensbi: Update to v1.5 (2024-07-18 12:08:45 +1000)


RISC-V PR for 9.1

* Support the zimop, zcmop, zama16b and zabha extensions
* Validate the mode when setting vstvec CSR
* Add decode support for Zawrs extension
* Update the KVM regs to Linux 6.10-rc5
* Add smcntrpmf extension support
* Raise an exception when CSRRS/CSRRC writes a read-only CSR
* Re-insert and deprecate 'riscv,delegate' in virt machine device tree
* roms/opensbi: Update to v1.5


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~



Re: [PULL 00/16] Trivial patches for 2024-07-17

2024-07-18 Thread Richard Henderson

On 7/17/24 21:06, Michael Tokarev wrote:

The following changes since commit e2f346aa98646e84eabe0256f89d08e89b1837cf:

   Merge tag 'sdmmc-20240716' of https://github.com/philmd/qemu into staging (2024-07-17 07:59:31 +1000)

are available in the Git repository at:

   https://gitlab.com/mjt0k/qemu.git tags/pull-trivial-patches

for you to fetch changes up to 66a8de9889ceb929e2abe7fb0e424f45210d9dda:

   meson: Update meson-buildoptions.sh (2024-07-17 14:04:15 +0300)


trivial patches for 2024-07-17


Applied, thanks.

r~



Re: [PULL 00/14] QAPI patches patches for 2024-07-17

2024-07-18 Thread Richard Henderson

On 7/17/24 20:48, Markus Armbruster wrote:

The following changes since commit e2f346aa98646e84eabe0256f89d08e89b1837cf:

   Merge tag 'sdmmc-20240716' of https://github.com/philmd/qemu into staging (2024-07-17 07:59:31 +1000)

are available in the Git repository at:

   https://repo.or.cz/qemu/armbru.git tags/pull-qapi-2024-07-17

for you to fetch changes up to 3c5f6114d9ffc70bc9b1a7cc072a911f966d:

   qapi: remove "Example" doc section (2024-07-17 10:20:54 +0200)


QAPI patches patches for 2024-07-17


Applied, thanks.

r~



Re: [PULL 00/20] i386, bugfix changes for QEMU 9.1 soft freeze

2024-07-17 Thread Richard Henderson

On 7/17/24 15:03, Paolo Bonzini wrote:

The following changes since commit 959269e910944c03bc13f300d65bf08b060d5d0f:

   Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging (2024-07-16 06:45:23 +1000)

are available in the Git repository at:

   https://gitlab.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 6a079f2e68e1832ebca0e7d64bc31ffebde9b2dd:

   target/i386/tcg: save current task state before loading new one (2024-07-16 18:18:25 +0200)


* target/i386/tcg: fixes for seg_helper.c
* SEV: Don't allow automatic fallback to legacy KVM_SEV_INIT,
   but also don't use it by default
* scsi: honor bootindex again for legacy drives
* hpet, utils, scsi, build, cpu: miscellaneous bugfixes


Applied, thanks.


r~



[PATCH 11/17] target/arm: Fix whitespace near gen_srshr64_i64

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/gengvec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 47ac2634ce..b6c0d86bad 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -304,7 +304,7 @@ void gen_srshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh)
 tcg_gen_add_i32(d, d, t);
 }
 
- void gen_srshr64_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
+void gen_srshr64_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
 {
 TCGv_i64 t = tcg_temp_new_i64();
 
-- 
2.43.0




[PATCH 16/17] target/arm: Convert SSHLL, USHLL to decodetree

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 84 --
 target/arm/tcg/a64.decode  |  3 ++
 2 files changed, 43 insertions(+), 44 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 627d4311bb..2a9cb3fbe0 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6972,6 +6972,45 @@ TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri)
 TRANS(SHL_v, do_vec_shift_imm, a, tcg_gen_gvec_shli)
 TRANS(SLI_v, do_vec_shift_imm, a, gen_gvec_sli);
 
+static bool do_vec_shift_imm_wide(DisasContext *s, arg_qrri_e *a, bool is_u)
+{
+TCGv_i64 tcg_rn, tcg_rd;
+int esz = a->esz;
+int esize;
+
+if (esz < 0 || esz >= MO_64) {
+return false;
+}
+if (!fp_access_check(s)) {
+return true;
+}
+
+/*
+ * For the LL variants the store is larger than the load,
+ * so if rd == rn we would overwrite parts of our input.
+ * So load everything right now and use shifts in the main loop.
+ */
+tcg_rd = tcg_temp_new_i64();
+tcg_rn = tcg_temp_new_i64();
+read_vec_element(s, tcg_rn, a->rn, a->q, MO_64);
+
+esize = 8 << esz;
+for (int i = 0, elements = 8 >> esz; i < elements; i++) {
+if (is_u) {
+tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize);
+} else {
+tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize);
+}
+tcg_gen_shli_i64(tcg_rd, tcg_rd, a->imm);
+write_vec_element(s, tcg_rd, a->rd, i, esz + 1);
+}
+clear_vec_high(s, true, a->rd);
+return true;
+}
+
+TRANS(SSHLL_v, do_vec_shift_imm_wide, a, false)
+TRANS(USHLL_v, do_vec_shift_imm_wide, a, true)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -10436,47 +10475,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
 }
 }
 
-/* USHLL/SHLL - Vector shift left with widening */
-static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
- int immh, int immb, int opcode, int rn, int rd)
-{
-int size = 32 - clz32(immh) - 1;
-int immhb = immh << 3 | immb;
-int shift = immhb - (8 << size);
-int dsize = 64;
-int esize = 8 << size;
-int elements = dsize/esize;
-TCGv_i64 tcg_rn = tcg_temp_new_i64();
-TCGv_i64 tcg_rd = tcg_temp_new_i64();
-int i;
-
-if (size >= 3) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-/* For the LL variants the store is larger than the load,
- * so if rd == rn we would overwrite parts of our input.
- * So load everything right now and use shifts in the main loop.
- */
-read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64);
-
-for (i = 0; i < elements; i++) {
-if (is_u) {
-tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize);
-} else {
-tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize);
-}
-tcg_gen_shli_i64(tcg_rd, tcg_rd, shift);
-write_vec_element(s, tcg_rd, rd, i, size + 1);
-}
-clear_vec_high(s, true, rd);
-}
-
 /* SHRN/RSHRN - Shift right with narrowing (and potential rounding) */
 static void handle_vec_simd_shrn(DisasContext *s, bool is_q,
   int immh, int immb, int opcode, int rn, int rd)
@@ -10566,9 +10564,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
 handle_vec_simd_sqshrn(s, false, is_q, is_u, is_u, immh, immb,
opcode, rn, rd);
 break;
-case 0x14: /* SSHLL / USHLL */
-handle_vec_simd_wshli(s, is_q, is_u, immh, immb, opcode, rn, rd);
-break;
 case 0x1c: /* SCVTF / UCVTF */
 handle_simd_shift_intfp_conv(s, false, is_q, is_u, immh, immb,
  opcode, rn, rd);
@@ -10593,6 +10588,7 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
 case 0x06: /* SRSRA / URSRA (accum + rounding) */
 case 0x08: /* SRI */
 case 0x0a: /* SHL / SLI */
+case 0x14: /* SSHLL / USHLL */
 unallocated_encoding(s);
 return;
 }
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 6aa8a18240..d13d680589 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
FMOVI_v_h   0 q:1 00  0 ...  11 . rd:5  %abcdefgh
 
 SHL_v   0.00 0  ... 01010 1 . . @qlshifti
 SLI_v   0.10 0  ... 01010 1 . . @qlshifti
+
+SSHLL_v 0.00 0  ... 10100 1 . . @qlshifti
+USHLL_v 0.10 0  ... 10100 1 . . @qlshifti
   ]
 }
-- 
2.43.0
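A compact model of the SSHLL/USHLL semantics converted above — widen each source element to double width (sign- or zero-extended), then shift left by the immediate — as illustrative Python, not the TCG implementation:

```python
def shll_wide(elements, esz_bits, shift, unsigned=False):
    """Model of SSHLL/USHLL: each element is widened to
    2*esz_bits (sign- or zero-extended), then shifted left."""
    mask = (1 << (2 * esz_bits)) - 1
    out = []
    for e in elements:
        if not unsigned and e & (1 << (esz_bits - 1)):
            e -= 1 << esz_bits          # sign-extend
        out.append((e << shift) & mask)
    return out

assert shll_wide([0x80], 8, 0) == [0xFF80]                 # SSHLL: -128 widened
assert shll_wide([0x80], 8, 0, unsigned=True) == [0x0080]  # USHLL: zero-extended
assert shll_wide([0x01], 8, 4) == [0x0010]                 # widen, then shift
```

This also shows why the translator loads rn fully before storing: the result elements are twice as wide, so rd == rn would clobber unread input.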




[PATCH 17/17] target/arm: Push tcg_rnd into handle_shri_with_rndacc

2024-07-17 Thread Richard Henderson
We always pass the same value for round; compute it
within common code.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2a9cb3fbe0..f4ff698257 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -9197,11 +9197,10 @@ static void disas_data_proc_fp(DisasContext *s, 
uint32_t insn)
  * the vector and scalar code.
  */
 static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
-TCGv_i64 tcg_rnd, bool accumulate,
+bool round, bool accumulate,
 bool is_u, int size, int shift)
 {
 bool extended_result = false;
-bool round = tcg_rnd != NULL;
 int ext_lshift = 0;
 TCGv_i64 tcg_src_hi;
 
@@ -9219,6 +9218,7 @@ static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
 
 /* Deal with the rounding step */
 if (round) {
+TCGv_i64 tcg_rnd = tcg_constant_i64(1ull << (shift - 1));
 if (extended_result) {
 TCGv_i64 tcg_zero = tcg_constant_i64(0);
 if (!is_u) {
@@ -9286,7 +9286,6 @@ static void handle_scalar_simd_shri(DisasContext *s,
 bool insert = false;
 TCGv_i64 tcg_rn;
 TCGv_i64 tcg_rd;
-TCGv_i64 tcg_round;
 
 if (!extract32(immh, 3, 1)) {
 unallocated_encoding(s);
@@ -9312,12 +9311,6 @@ static void handle_scalar_simd_shri(DisasContext *s,
 break;
 }
 
-if (round) {
-tcg_round = tcg_constant_i64(1ULL << (shift - 1));
-} else {
-tcg_round = NULL;
-}
-
 tcg_rn = read_fp_dreg(s, rn);
 tcg_rd = (accumulate || insert) ? read_fp_dreg(s, rd) : tcg_temp_new_i64();
 
@@ -9331,7 +9324,7 @@ static void handle_scalar_simd_shri(DisasContext *s,
 tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, 0, esize - shift);
 }
 } else {
-handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+handle_shri_with_rndacc(tcg_rd, tcg_rn, round,
 accumulate, is_u, size, shift);
 }
 
@@ -9384,7 +9377,7 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
 int elements = is_scalar ? 1 : (64 / esize);
 bool round = extract32(opcode, 0, 1);
 MemOp ldop = (size + 1) | (is_u_shift ? 0 : MO_SIGN);
-TCGv_i64 tcg_rn, tcg_rd, tcg_round;
+TCGv_i64 tcg_rn, tcg_rd;
 TCGv_i32 tcg_rd_narrowed;
 TCGv_i64 tcg_final;
 
@@ -9429,15 +9422,9 @@ static void handle_vec_simd_sqshrn(DisasContext *s, bool is_scalar, bool is_q,
 tcg_rd_narrowed = tcg_temp_new_i32();
 tcg_final = tcg_temp_new_i64();
 
-if (round) {
-tcg_round = tcg_constant_i64(1ULL << (shift - 1));
-} else {
-tcg_round = NULL;
-}
-
 for (i = 0; i < elements; i++) {
 read_vec_element(s, tcg_rn, rn, i, ldop);
-handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+handle_shri_with_rndacc(tcg_rd, tcg_rn, round,
 false, is_u_shift, size+1, shift);
 narrowfn(tcg_rd_narrowed, tcg_env, tcg_rd);
 tcg_gen_extu_i32_i64(tcg_rd, tcg_rd_narrowed);
@@ -10487,7 +10474,6 @@ static void handle_vec_simd_shrn(DisasContext *s, bool is_q,
 int shift = (2 * esize) - immhb;
 bool round = extract32(opcode, 0, 1);
 TCGv_i64 tcg_rn, tcg_rd, tcg_final;
-TCGv_i64 tcg_round;
 int i;
 
 if (extract32(immh, 3, 1)) {
@@ -10504,15 +10490,9 @@ static void handle_vec_simd_shrn(DisasContext *s, bool is_q,
 tcg_final = tcg_temp_new_i64();
 read_vec_element(s, tcg_final, rd, is_q ? 1 : 0, MO_64);
 
-if (round) {
-tcg_round = tcg_constant_i64(1ULL << (shift - 1));
-} else {
-tcg_round = NULL;
-}
-
 for (i = 0; i < elements; i++) {
 read_vec_element(s, tcg_rn, rn, i, size+1);
-handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+handle_shri_with_rndacc(tcg_rd, tcg_rn, round,
 false, true, size+1, shift);
 
 tcg_gen_deposit_i64(tcg_final, tcg_final, tcg_rd, esize * i, esize);
-- 
2.43.0
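The rounding constant being hoisted above implements the usual round-half-up right shift: add 1 << (shift - 1) before shifting. An illustrative model of the unsigned, non-widening case (not the TCG code, which also handles signed and 65-bit extended results):

```python
def shri_rndacc(acc, x, shift, round=True, accumulate=False):
    """Model of handle_shri_with_rndacc for unsigned values:
    optional rounding add of 1 << (shift - 1), then a right
    shift, optionally accumulating into acc."""
    if round:
        x += 1 << (shift - 1)
    res = x >> shift
    return (acc + res) if accumulate else res

assert shri_rndacc(0, 7, 2) == 2                       # URSHR: (7 + 2) >> 2
assert shri_rndacc(0, 7, 2, round=False) == 1          # USHR: plain shift
assert shri_rndacc(10, 7, 2, accumulate=True) == 12    # URSRA: accumulate
```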




[PATCH 08/17] target/arm: Convert FMOVI (scalar, immediate) to decodetree

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 74 --
 target/arm/tcg/a64.decode  |  4 ++
 2 files changed, 30 insertions(+), 48 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2964279c00..6582816e4e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6847,6 +6847,31 @@ TRANS(FMINNMV_s, do_fp_reduction, a, 
gen_helper_vfp_minnums)
 TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
 TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
 
+/*
+ * Floating-point Immediate
+ */
+
+static bool trans_FMOVI_s(DisasContext *s, arg_FMOVI_s *a)
+{
+switch (a->esz) {
+case MO_32:
+case MO_64:
+break;
+case MO_16:
+if (!dc_isar_feature(aa64_fp16, s)) {
+return false;
+}
+break;
+default:
+return false;
+}
+if (fp_access_check(s)) {
+uint64_t imm = vfp_expand_imm(a->esz, a->imm);
+write_fp_dreg(s, a->rd, tcg_constant_i64(imm));
+}
+return true;
+}
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -8584,53 +8609,6 @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
 }
 }
 
-/* Floating point immediate
- *   31  30  29 28   24 23  22  21 2013 12   10 95 40
- * +---+---+---+---+--+---++---+--+--+
- * | M | 0 | S | 1 1 1 1 0 | type | 1 |imm8| 1 0 0 | imm5 |  Rd  |
- * +---+---+---+---+--+---++---+--+--+
- */
-static void disas_fp_imm(DisasContext *s, uint32_t insn)
-{
-int rd = extract32(insn, 0, 5);
-int imm5 = extract32(insn, 5, 5);
-int imm8 = extract32(insn, 13, 8);
-int type = extract32(insn, 22, 2);
-int mos = extract32(insn, 29, 3);
-uint64_t imm;
-MemOp sz;
-
-if (mos || imm5) {
-unallocated_encoding(s);
-return;
-}
-
-switch (type) {
-case 0:
-sz = MO_32;
-break;
-case 1:
-sz = MO_64;
-break;
-case 3:
-sz = MO_16;
-if (dc_isar_feature(aa64_fp16, s)) {
-break;
-}
-/* fallthru */
-default:
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-imm = vfp_expand_imm(sz, imm8);
-write_fp_dreg(s, rd, tcg_constant_i64(imm));
-}
-
 /* Handle floating point <=> fixed point conversions. Note that we can
  * also deal with fp <=> integer conversions as a special case (scale == 64)
  * OPTME: consider handling that special case specially or at least skipping
@@ -9050,7 +9028,7 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
 switch (ctz32(extract32(insn, 12, 4))) {
 case 0: /* [15:12] == xxx1 */
 /* Floating point immediate */
-disas_fp_imm(s, insn);
+unallocated_encoding(s); /* in decodetree */
 break;
 case 1: /* [15:12] == xx10 */
 /* Floating point compare */
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 117269803d..de763d3f12 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1180,3 +1180,7 @@ FMAXV_s 0110 1110 00 11000 0 10 . . @rr_q1e2
 
 FMINV_h 0.00 1110 10 11000 0 10 . . @qrr_h
 FMINV_s 0110 1110 10 11000 0 10 . . @rr_q1e2
+
+# Floating-point Immediate
+
+FMOVI_s 0001 1110 .. 1 imm:8 100 0 rd:5 esz=%esz_hsd
-- 
2.43.0
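As background for the conversion above: vfp_expand_imm turns the 8-bit abcdefgh field into a full IEEE value. Here is a minimal standalone sketch of the single-precision case; the constants are an assumption based on the FPExpandImm pseudocode, not copied from the QEMU tree, and the function names are illustrative only:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the MO_32 case: sign = a, exponent = NOT(b):b:b:b:b:b,
 * fraction = cdefgh:000...  (assumed layout, see lead-in above).
 */
static uint32_t expand_imm_s(uint8_t imm8)
{
    uint32_t imm = ((imm8 >> 7) & 1 ? 0x8000u : 0)       /* sign */
                 | ((imm8 >> 6) & 1 ? 0x3e00u : 0x4000u) /* exponent */
                 | ((uint32_t)(imm8 & 0x3f) << 3);       /* fraction */
    return imm << 16;
}

/* Reinterpret the bit pattern as a float for inspection. */
static float expand_imm_f32(uint8_t imm8)
{
    uint32_t bits = expand_imm_s(imm8);
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}
```

For example, imm8 = 0x70 expands to 0x3f800000, i.e. 1.0f; the format covers values of the form ±(16..31)/16 × 2^n for n in -3..4.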




[PATCH 13/17] target/arm: Convert handle_vec_simd_shli to decodetree

2024-07-17 Thread Richard Henderson
This includes SHL and SLI.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 40 +-
 target/arm/tcg/a64.decode  |  6 +
 2 files changed, 16 insertions(+), 30 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 1e482477c5..fd90752dee 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -84,6 +84,13 @@ static int rcount_immhb(DisasContext *s, int x)
 return (16 << size) - x;
 }
 
+/* For Advanced SIMD shift by immediate, left shift count. */
+static int lcount_immhb(DisasContext *s, int x)
+{
+int size = esz_immh(s, x >> 3);
+return x - (8 << size);
+}
+
 /*
  * Include the generated decoders.
  */
@@ -6962,6 +6969,8 @@ TRANS(URSHR_v, do_vec_shift_imm, a, gen_gvec_urshr)
 TRANS(SRSRA_v, do_vec_shift_imm, a, gen_gvec_srsra)
 TRANS(URSRA_v, do_vec_shift_imm, a, gen_gvec_ursra)
 TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri)
+TRANS(SHL_v, do_vec_shift_imm, a, tcg_gen_gvec_shli)
+TRANS(SLI_v, do_vec_shift_imm, a, gen_gvec_sli);
 
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
@@ -10427,33 +10436,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
 }
 }
 
-/* SHL/SLI - Vector shift left */
-static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert,
- int immh, int immb, int opcode, int rn, int rd)
-{
-int size = 32 - clz32(immh) - 1;
-int immhb = immh << 3 | immb;
-int shift = immhb - (8 << size);
-
-/* Range of size is limited by decode: immh is a non-zero 4 bit field */
-assert(size >= 0 && size <= 3);
-
-if (extract32(immh, 3, 1) && !is_q) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-if (insert) {
-gen_gvec_fn2i(s, is_q, rd, rn, shift, gen_gvec_sli, size);
-} else {
-gen_gvec_fn2i(s, is_q, rd, rn, shift, tcg_gen_gvec_shli, size);
-}
-}
-
 /* USHLL/SHLL - Vector shift left with widening */
 static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
  int immh, int immb, int opcode, int rn, int rd)
@@ -10566,9 +10548,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0x0a: /* SHL / SLI */
-handle_vec_simd_shli(s, is_q, is_u, immh, immb, opcode, rn, rd);
-break;
 case 0x10: /* SHRN */
 case 0x11: /* RSHRN / SQRSHRUN */
 if (is_u) {
@@ -10609,6 +10588,7 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
 case 0x04: /* SRSHR / URSHR (rounding) */
 case 0x06: /* SRSRA / URSRA (accum + rounding) */
 case 0x08: /* SRI */
+case 0x0a: /* SHL / SLI */
 unallocated_encoding(s);
 return;
 }
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index c525f5fc35..6aa8a18240 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1191,9 +1191,12 @@ FMOVI_s 0001 1110 .. 1 imm:8 100 0 rd:5 esz=%esz_hsd
 %abcdefgh   16:3 5:5
 %esz_immh   19:4 !function=esz_immh
 %rcount_immhb   16:7 !function=rcount_immhb
+%lcount_immhb   16:7 !function=lcount_immhb
 
 @qrshifti   . q:1 .. .  ... . . rn:5 rd:5   \
 _e esz=%esz_immh imm=%rcount_immhb
+@qlshifti   . q:1 .. .  ... . . rn:5 rd:5   \
+_e esz=%esz_immh imm=%lcount_immhb
 
 FMOVI_v_h   0 q:1 00  0 ...  11 . rd:5  %abcdefgh
 
FMOVI_v_h   0 q:1 00  0 ...  11 . rd:5  %abcdefgh
 SRSRA_v 0.00 0  ... 00110 1 . . @qrshifti
 URSRA_v 0.10 0  ... 00110 1 . . @qrshifti
 SRI_v   0.10 0  ... 01000 1 . . @qrshifti
+
+SHL_v   0.00 0  ... 01010 1 . . @qlshifti
+SLI_v   0.10 0  ... 01010 1 . . @qlshifti
   ]
 }
-- 
2.43.0
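Since both this patch and the previous one lean on the immh:immb extract helpers, a self-contained model of that arithmetic may help anyone checking the decode by hand. Names mirror the helpers above, but this is a sketch for standalone compilation, not the QEMU code (the clz is open-coded to avoid any dependency):

```c
#include <assert.h>

/* Count leading zeros in a 4-bit immh value. */
static int clz4(int x)
{
    int n = 0;
    for (int bit = 3; bit >= 0 && !(x & (1 << bit)); bit--) {
        n++;
    }
    return n;
}

/* Element size from the position of immh's leading 1:
 * immh = 1xxx -> MO_64, 01xx -> MO_32, 001x -> MO_16, 0001 -> MO_8.
 * Returns a negative value for the invalid immh == 0.
 */
static int esz_of_immh(int immh)
{
    return 3 - clz4(immh);
}

/* x is the 7-bit immh:immb field. */
static int rcount(int x)   /* right shift count, 1..esize */
{
    return (16 << esz_of_immh(x >> 3)) - x;
}

static int lcount(int x)   /* left shift count, 0..esize-1 */
{
    return x - (8 << esz_of_immh(x >> 3));
}
```

E.g. immh:immb = 0001:000 (x = 8) gives MO_8 with a right-shift count of 8 (the full element width) or a left-shift count of 0.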




[PATCH 15/17] target/arm: Use {,s}extract in handle_vec_simd_wshli

2024-07-17 Thread Richard Henderson
Combine the right shift with the extension via
the tcg extract operations.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d0ad6c90bc..627d4311bb 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10466,8 +10466,11 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
 read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64);
 
 for (i = 0; i < elements; i++) {
-tcg_gen_shri_i64(tcg_rd, tcg_rn, i * esize);
-ext_and_shift_reg(tcg_rd, tcg_rd, size | (!is_u << 2), 0);
+if (is_u) {
+tcg_gen_extract_i64(tcg_rd, tcg_rn, i * esize, esize);
+} else {
+tcg_gen_sextract_i64(tcg_rd, tcg_rn, i * esize, esize);
+}
 tcg_gen_shli_i64(tcg_rd, tcg_rd, shift);
 write_vec_element(s, tcg_rd, rd, i, size + 1);
 }
-- 
2.43.0
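The equivalence being used here — shift right by pos, then zero- or sign-extend from esize bits, equals a single (s)extract of esize bits at pos — can be seen with plain C models of the extract semantics. These follow the usual extract64/sextract64 definitions and are illustrations, not the TCG ops themselves:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned bit-field extract: 'len' bits starting at bit 'pos'.
 * Assumes 0 < len and pos + len <= 64.
 */
static uint64_t extract_u64(uint64_t x, int pos, int len)
{
    return (x >> pos) & (~0ull >> (64 - len));
}

/* Signed extract: move the field to the top, then arithmetic-shift
 * back down so the field's top bit becomes the sign.
 */
static int64_t sextract_u64(uint64_t x, int pos, int len)
{
    return (int64_t)(x << (64 - pos - len)) >> (64 - len);
}
```

So for a signed widen of 8-bit elements, element i of the source is sextract(src, i * 8, 8), replacing the shift + extend pair.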




[PATCH 05/17] target/arm: Simplify do_reduction_op

2024-07-17 Thread Richard Henderson
Use simple shift and add instead of ctpop, ctz, shift and mask.
Unlike SVE, there is no predicate to disable elements.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 40 +++---
 1 file changed, 13 insertions(+), 27 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index e0314a1253..6d2e1a2d80 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8986,34 +8986,23 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
  * important for correct NaN propagation that we do these
  * operations in exactly the order specified by the pseudocode.
  *
- * This is a recursive function, TCG temps should be freed by the
- * calling function once it is done with the values.
+ * This is a recursive function.
  */
 static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn,
-int esize, int size, int vmap, TCGv_ptr fpst)
+MemOp esz, int ebase, int ecount, TCGv_ptr fpst)
 {
-if (esize == size) {
-int element;
-MemOp msize = esize == 16 ? MO_16 : MO_32;
-TCGv_i32 tcg_elem;
-
-/* We should have one register left here */
-assert(ctpop8(vmap) == 1);
-element = ctz32(vmap);
-assert(element < 8);
-
-tcg_elem = tcg_temp_new_i32();
-read_vec_element_i32(s, tcg_elem, rn, element, msize);
+if (ecount == 1) {
+TCGv_i32 tcg_elem = tcg_temp_new_i32();
+read_vec_element_i32(s, tcg_elem, rn, ebase, esz);
 return tcg_elem;
 } else {
-int bits = size / 2;
-int shift = ctpop8(vmap) / 2;
-int vmap_lo = (vmap >> shift) & vmap;
-int vmap_hi = (vmap & ~vmap_lo);
+int half = ecount >> 1;
 TCGv_i32 tcg_hi, tcg_lo, tcg_res;
 
-tcg_hi = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_hi, fpst);
-tcg_lo = do_reduction_op(s, fpopcode, rn, esize, bits, vmap_lo, fpst);
+tcg_hi = do_reduction_op(s, fpopcode, rn, esz,
+ ebase + half, half, fpst);
+tcg_lo = do_reduction_op(s, fpopcode, rn, esz,
+ ebase, half, fpst);
 tcg_res = tcg_temp_new_i32();
 
 switch (fpopcode) {
@@ -9064,7 +9053,6 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
 bool is_u = extract32(insn, 29, 1);
 bool is_fp = false;
 bool is_min = false;
-int esize;
 int elements;
 int i;
 TCGv_i64 tcg_res, tcg_elt;
@@ -9111,8 +9099,7 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
 return;
 }
 
-esize = 8 << size;
-elements = (is_q ? 128 : 64) / esize;
+elements = (is_q ? 16 : 8) >> size;
 
 tcg_res = tcg_temp_new_i64();
 tcg_elt = tcg_temp_new_i64();
@@ -9167,9 +9154,8 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
  */
TCGv_ptr fpst = fpstatus_ptr(size == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
 int fpopcode = opcode | is_min << 4 | is_u << 5;
-int vmap = (1 << elements) - 1;
-TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, esize,
- (is_q ? 128 : 64), vmap, fpst);
+TCGv_i32 tcg_res32 = do_reduction_op(s, fpopcode, rn, size,
+ 0, elements, fpst);
 tcg_gen_extu_i32_i64(tcg_res, tcg_res32);
 }
 
-- 
2.43.0
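For anyone comparing against the pseudocode: the rewritten recursion reduces [ebase, ebase+ecount) by splitting into low and high halves and combining, which preserves the pairwise evaluation order the comment insists on for NaN propagation. A scalar model of the same shape, with integer ops standing in for the FP helpers:

```c
#include <assert.h>

/* Pairwise tree reduction over v[ebase .. ebase+ecount), combining
 * low-half and high-half results, mirroring the structure of the
 * reworked do_reduction_op.  ecount must be a power of two.
 */
static int reduce(const int *v, int ebase, int ecount,
                  int (*op)(int, int))
{
    if (ecount == 1) {
        return v[ebase];
    }
    int half = ecount >> 1;
    int hi = reduce(v, ebase + half, half, op);
    int lo = reduce(v, ebase, half, op);
    return op(lo, hi);
}

static int op_add(int a, int b) { return a + b; }
static int op_max(int a, int b) { return a > b ? a : b; }
```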




[PATCH 06/17] target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 140 -
 target/arm/tcg/a64.decode  |  12 +++
 2 files changed, 61 insertions(+), 91 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6d2e1a2d80..055ba4695e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6753,6 +6753,47 @@ TRANS(FNMADD, do_fmadd, a, true, true)
 TRANS(FMSUB, do_fmadd, a, false, true)
 TRANS(FNMSUB, do_fmadd, a, true, false)
 
+/*
+ * Advanced SIMD Across Lanes
+ */
+
+static bool do_int_reduction(DisasContext *s, arg_qrr_e *a, bool widen,
+ MemOp src_sign, NeonGenTwo64OpFn *fn)
+{
+TCGv_i64 tcg_res, tcg_elt;
+MemOp src_mop = a->esz | src_sign;
+int elements = (a->q ? 16 : 8) >> a->esz;
+
+/* Reject MO_64, and MO_32 without Q: a minimum of 4 elements. */
+if (elements < 4) {
+return false;
+}
+if (!fp_access_check(s)) {
+return true;
+}
+
+tcg_res = tcg_temp_new_i64();
+tcg_elt = tcg_temp_new_i64();
+
+read_vec_element(s, tcg_res, a->rn, 0, src_mop);
+for (int i = 1; i < elements; i++) {
+read_vec_element(s, tcg_elt, a->rn, i, src_mop);
+fn(tcg_res, tcg_res, tcg_elt);
+}
+
+tcg_gen_ext_i64(tcg_res, tcg_res, a->esz + widen);
+write_fp_dreg(s, a->rd, tcg_res);
+return true;
+}
+
+TRANS(ADDV, do_int_reduction, a, false, 0, tcg_gen_add_i64)
+TRANS(SADDLV, do_int_reduction, a, true, MO_SIGN, tcg_gen_add_i64)
+TRANS(UADDLV, do_int_reduction, a, true, 0, tcg_gen_add_i64)
+TRANS(SMAXV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smax_i64)
+TRANS(UMAXV, do_int_reduction, a, false, 0, tcg_gen_umax_i64)
+TRANS(SMINV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smin_i64)
+TRANS(UMINV, do_int_reduction, a, false, 0, tcg_gen_umin_i64)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -9051,27 +9092,10 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
 int opcode = extract32(insn, 12, 5);
 bool is_q = extract32(insn, 30, 1);
 bool is_u = extract32(insn, 29, 1);
-bool is_fp = false;
 bool is_min = false;
 int elements;
-int i;
-TCGv_i64 tcg_res, tcg_elt;
 
 switch (opcode) {
-case 0x1b: /* ADDV */
-if (is_u) {
-unallocated_encoding(s);
-return;
-}
-/* fall through */
-case 0x3: /* SADDLV, UADDLV */
-case 0xa: /* SMAXV, UMAXV */
-case 0x1a: /* SMINV, UMINV */
-if (size == 3 || (size == 2 && !is_q)) {
-unallocated_encoding(s);
-return;
-}
-break;
 case 0xc: /* FMAXNMV, FMINNMV */
 case 0xf: /* FMAXV, FMINV */
 /* Bit 1 of size field encodes min vs max and the actual size
@@ -9080,7 +9104,6 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
  * precision.
  */
 is_min = extract32(size, 1, 1);
-is_fp = true;
 if (!is_u && dc_isar_feature(aa64_fp16, s)) {
 size = 1;
 } else if (!is_u || !is_q || extract32(size, 0, 1)) {
@@ -9091,6 +9114,10 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
 }
 break;
 default:
+case 0x3: /* SADDLV, UADDLV */
+case 0xa: /* SMAXV, UMAXV */
+case 0x1a: /* SMINV, UMINV */
+case 0x1b: /* ADDV */
 unallocated_encoding(s);
 return;
 }
@@ -9101,52 +9128,7 @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn)
 
 elements = (is_q ? 16 : 8) >> size;
 
-tcg_res = tcg_temp_new_i64();
-tcg_elt = tcg_temp_new_i64();
-
-/* These instructions operate across all lanes of a vector
- * to produce a single result. We can guarantee that a 64
- * bit intermediate is sufficient:
- *  + for [US]ADDLV the maximum element size is 32 bits, and
- *the result type is 64 bits
- *  + for FMAX*V, FMIN*V, ADDV the intermediate type is the
- *same as the element size, which is 32 bits at most
- * For the integer operations we can choose to work at 64
- * or 32 bits and truncate at the end; for simplicity
- * we use 64 bits always. The floating point
- * ops do require 32 bit intermediates, though.
- */
-if (!is_fp) {
-read_vec_element(s, tcg_res, rn, 0, size | (is_u ? 0 : MO_SIGN));
-
-for (i = 1; i < elements; i++) {
-read_vec_element(s, tcg_elt, rn, i, size | (is_u ? 0 : MO_SIGN));
-
-switch (opcode) {
-case 0x03: /* SADDLV / UADDLV */
-case 0x1b: /* ADDV */
-tcg_gen_add_i64(tcg_res, tcg_res, tcg_elt);
-break;
-case 0x0a: /* SMAXV / UMAXV */
-   

[PATCH 10/17] target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr

2024-07-17 Thread Richard Henderson
Handle the two special cases within these new
functions instead of higher in the call stack.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate.h  |  5 +
 target/arm/tcg/gengvec.c| 19 +++
 target/arm/tcg/translate-a64.c  | 16 +---
 target/arm/tcg/translate-neon.c | 25 ++---
 4 files changed, 27 insertions(+), 38 deletions(-)

diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index a8672c857c..d1a836ca6f 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -514,6 +514,11 @@ void gen_sqsub_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
 void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
+void gen_gvec_sshr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+   int64_t shift, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_ushr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+   int64_t shift, uint32_t opr_sz, uint32_t max_sz);
+
 void gen_gvec_ssra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
int64_t shift, uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_usra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 56a1dc1f75..47ac2634ce 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -88,6 +88,25 @@ GEN_CMP0(gen_gvec_cgt0, TCG_COND_GT)
 
 #undef GEN_CMP0
 
+void gen_gvec_sshr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+   int64_t shift, uint32_t opr_sz, uint32_t max_sz)
+{
+/* Signed shift out of range results in all-sign-bits */
+shift = MIN(shift, (8 << vece) - 1);
+tcg_gen_gvec_sari(vece, rd_ofs, rm_ofs, shift, opr_sz, max_sz);
+}
+
+void gen_gvec_ushr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
+   int64_t shift, uint32_t opr_sz, uint32_t max_sz)
+{
+/* Unsigned shift out of range results in all-zero-bits */
+if (shift >= (8 << vece)) {
+tcg_gen_gvec_dup_imm(vece, rd_ofs, opr_sz, max_sz, 0);
+} else {
+tcg_gen_gvec_shri(vece, rd_ofs, rm_ofs, shift, opr_sz, max_sz);
+}
+}
+
 static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
 {
 tcg_gen_vec_sar8i_i64(a, a, shift);
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 1fa9dc3172..d0a3450d75 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10411,21 +10411,7 @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
 break;
 
 case 0x00: /* SSHR / USHR */
-if (is_u) {
-if (shift == 8 << size) {
-/* Shift count the same size as element size produces zero.  */
-tcg_gen_gvec_dup_imm(size, vec_full_reg_offset(s, rd),
- is_q ? 16 : 8, vec_full_reg_size(s), 0);
-return;
-}
-gvec_fn = tcg_gen_gvec_shri;
-} else {
-/* Shift count the same size as element size produces all sign.  */
-if (shift == 8 << size) {
-shift -= 1;
-}
-gvec_fn = tcg_gen_gvec_sari;
-}
+gvec_fn = is_u ? gen_gvec_ushr : gen_gvec_sshr;
 break;
 
 case 0x04: /* SRSHR / URSHR (rounding) */
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 915c9e56db..05d4016633 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -1068,29 +1068,8 @@ DO_2SH(VRSHR_S, gen_gvec_srshr)
 DO_2SH(VRSHR_U, gen_gvec_urshr)
 DO_2SH(VRSRA_S, gen_gvec_srsra)
 DO_2SH(VRSRA_U, gen_gvec_ursra)
-
-static bool trans_VSHR_S_2sh(DisasContext *s, arg_2reg_shift *a)
-{
-/* Signed shift out of range results in all-sign-bits */
-a->shift = MIN(a->shift, (8 << a->size) - 1);
-return do_vector_2sh(s, a, tcg_gen_gvec_sari);
-}
-
-static void gen_zero_rd_2sh(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
-int64_t shift, uint32_t oprsz, uint32_t maxsz)
-{
-tcg_gen_gvec_dup_imm(vece, rd_ofs, oprsz, maxsz, 0);
-}
-
-static bool trans_VSHR_U_2sh(DisasContext *s, arg_2reg_shift *a)
-{
-/* Shift out of range is architecturally valid and results in zero. */
-if (a->shift >= (8 << a->size)) {
-return do_vector_2sh(s, a, gen_zero_rd_2sh);
-} else {
-return do_vector_2sh(s, a, tcg_gen_gvec_shri);
-}
-}
+DO_2SH(VSHR_S, gen_gvec_sshr)
+DO_2SH(VSHR_U, gen_gvec_ushr)
 
 static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
  NeonGenTwo64OpEnvFn *fn)
-- 
2.43.0
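The two special cases moved into the helpers are the architectural results when the immediate shift equals the element size: all sign bits for SSHR, all zeros for USHR. Per-element, for 8-bit lanes, the behaviour can be modelled as follows — a sketch only, which assumes arithmetic right shift of signed values (true on the compilers QEMU supports):

```c
#include <assert.h>
#include <stdint.h>

/* SSHR: a shift count of 8 on an 8-bit lane is clamped to 7,
 * yielding an all-sign-bits result.
 */
static int8_t sshr8(int8_t x, int shift)
{
    if (shift > 7) {
        shift = 7;
    }
    return x >> shift;   /* arithmetic shift assumed */
}

/* USHR: a shift count of 8 on an 8-bit lane yields zero. */
static uint8_t ushr8(uint8_t x, int shift)
{
    if (shift > 7) {
        return 0;
    }
    return x >> shift;
}
```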




[PATCH 12/17] target/arm: Convert handle_vec_simd_shri to decodetree

2024-07-17 Thread Richard Henderson
This includes SSHR, USHR, SSRA, USRA, SRSHR, URSHR, SRSRA, URSRA, SRI.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 109 +++--
 target/arm/tcg/a64.decode  |  27 +++-
 2 files changed, 74 insertions(+), 62 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d0a3450d75..1e482477c5 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -68,6 +68,22 @@ static int scale_by_log2_tag_granule(DisasContext *s, int x)
 return x << LOG2_TAG_GRANULE;
 }
 
+/*
+ * For Advanced SIMD shift by immediate, extract esz from immh.
+ * The result must be validated by the translator: MO_8 <= x <= MO_64.
+ */
+static int esz_immh(DisasContext *s, int x)
+{
+return 32 - clz32(x) - 1;
+}
+
+/* For Advanced SIMD shift by immediate, right shift count. */
+static int rcount_immhb(DisasContext *s, int x)
+{
+int size = esz_immh(s, x >> 3);
+return (16 << size) - x;
+}
+
 /*
  * Include the generated decoders.
  */
@@ -6918,6 +6934,35 @@ static bool trans_Vimm(DisasContext *s, arg_Vimm *a)
 return true;
 }
 
+/*
+ * Advanced SIMD Shift by Immediate
+ */
+
+static bool do_vec_shift_imm(DisasContext *s, arg_qrri_e *a, GVecGen2iFn *fn)
+{
+/* Validate result of esz_immh, for invalid immh == 0. */
+if (a->esz < 0) {
+return false;
+}
+if (a->esz == MO_64 && !a->q) {
+return false;
+}
+if (fp_access_check(s)) {
+gen_gvec_fn2i(s, a->q, a->rd, a->rn, a->imm, fn, a->esz);
+}
+return true;
+}
+
+TRANS(SSHR_v, do_vec_shift_imm, a, gen_gvec_sshr)
+TRANS(USHR_v, do_vec_shift_imm, a, gen_gvec_ushr)
+TRANS(SSRA_v, do_vec_shift_imm, a, gen_gvec_ssra)
+TRANS(USRA_v, do_vec_shift_imm, a, gen_gvec_usra)
+TRANS(SRSHR_v, do_vec_shift_imm, a, gen_gvec_srshr)
+TRANS(URSHR_v, do_vec_shift_imm, a, gen_gvec_urshr)
+TRANS(SRSRA_v, do_vec_shift_imm, a, gen_gvec_srsra)
+TRANS(URSRA_v, do_vec_shift_imm, a, gen_gvec_ursra)
+TRANS(SRI_v, do_vec_shift_imm, a, gen_gvec_sri)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -10382,53 +10427,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
 }
 }
 
-/* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */
-static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
- int immh, int immb, int opcode, int rn, int rd)
-{
-int size = 32 - clz32(immh) - 1;
-int immhb = immh << 3 | immb;
-int shift = 2 * (8 << size) - immhb;
-GVecGen2iFn *gvec_fn;
-
-if (extract32(immh, 3, 1) && !is_q) {
-unallocated_encoding(s);
-return;
-}
-tcg_debug_assert(size <= 3);
-
-if (!fp_access_check(s)) {
-return;
-}
-
-switch (opcode) {
-case 0x02: /* SSRA / USRA (accumulate) */
-gvec_fn = is_u ? gen_gvec_usra : gen_gvec_ssra;
-break;
-
-case 0x08: /* SRI */
-gvec_fn = gen_gvec_sri;
-break;
-
-case 0x00: /* SSHR / USHR */
-gvec_fn = is_u ? gen_gvec_ushr : gen_gvec_sshr;
-break;
-
-case 0x04: /* SRSHR / URSHR (rounding) */
-gvec_fn = is_u ? gen_gvec_urshr : gen_gvec_srshr;
-break;
-
-case 0x06: /* SRSRA / URSRA (accum + rounding) */
-gvec_fn = is_u ? gen_gvec_ursra : gen_gvec_srsra;
-break;
-
-default:
-g_assert_not_reached();
-}
-
-gen_gvec_fn2i(s, is_q, rd, rn, shift, gvec_fn, size);
-}
-
 /* SHL/SLI - Vector shift left */
 static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert,
  int immh, int immb, int opcode, int rn, int rd)
@@ -10568,18 +10566,6 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0x08: /* SRI */
-if (!is_u) {
-unallocated_encoding(s);
-return;
-}
-/* fall through */
-case 0x00: /* SSHR / USHR */
-case 0x02: /* SSRA / USRA (accumulate) */
-case 0x04: /* SRSHR / URSHR (rounding) */
-case 0x06: /* SRSRA / URSRA (accum + rounding) */
-handle_vec_simd_shri(s, is_q, is_u, immh, immb, opcode, rn, rd);
-break;
 case 0x0a: /* SHL / SLI */
 handle_vec_simd_shli(s, is_q, is_u, immh, immb, opcode, rn, rd);
 break;
@@ -10618,6 +10604,11 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
 handle_simd_shift_fpint_conv(s, false, is_q, is_u, immh, immb, rn, rd);
 return;
 default:
+case 0x00: /* SSHR / USHR */
+case 0x02: /* SSRA / USRA (accumulate) */
+case 0x04: /* SRSHR / URSHR (rounding) */
+case 0x06: /* SRSRA / URSRA (accum + rounding) */
+case 0x08: /* SR

[PATCH 14/17] target/arm: Clear high SVE elements in handle_vec_simd_wshli

2024-07-17 Thread Richard Henderson
AdvSIMD instructions are supposed to zero bits beyond 128.
Affects SSHLL, USHLL, SSHLL2, USHLL2.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index fd90752dee..d0ad6c90bc 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -10471,6 +10471,7 @@ static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
 tcg_gen_shli_i64(tcg_rd, tcg_rd, shift);
 write_vec_element(s, tcg_rd, rd, i, size + 1);
 }
+clear_vec_high(s, true, rd);
 }
 
 /* SHRN/RSHRN - Shift right with narrowing (and potential rounding) */
-- 
2.43.0




[PATCH 00/17] target/arm: AdvSIMD decodetree conversion, part 4

2024-07-17 Thread Richard Henderson
Flush before the queue gets too big.
Also, there's a bug fix in patch 14.

r~

Richard Henderson (17):
  target/arm: Use tcg_gen_extract2_i64 for EXT
  target/arm: Convert EXT to decodetree
  target/arm: Convert TBL, TBX to decodetree
  target/arm: Convert UZP, TRN, ZIP to decodetree
  target/arm: Simplify do_reduction_op
  target/arm: Convert ADDV, *ADDLV, *MAXV, *MINV to decodetree
  target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree
  target/arm: Convert FMOVI (scalar, immediate) to decodetree
  target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to
decodetree
  target/arm: Introduce gen_gvec_sshr, gen_gvec_ushr
  target/arm: Fix whitespace near gen_srshr64_i64
  target/arm: Convert handle_vec_simd_shri to decodetree
  target/arm: Convert handle_vec_simd_shli to decodetree
  target/arm: Clear high SVE elements in handle_vec_simd_wshli
  target/arm: Use {,s}extract in handle_vec_simd_wshli
  target/arm: Convert SSHLL, USHLL to decodetree
  target/arm: Push tcg_rnd into handle_shri_with_rndacc

 target/arm/tcg/translate.h  |5 +
 target/arm/tcg/gengvec.c|   21 +-
 target/arm/tcg/translate-a64.c  | 1123 +++
 target/arm/tcg/translate-neon.c |   25 +-
 target/arm/tcg/a64.decode   |   87 +++
 5 files changed, 520 insertions(+), 741 deletions(-)

-- 
2.43.0




[PATCH 03/17] target/arm: Convert TBL, TBX to decodetree

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 47 ++
 target/arm/tcg/a64.decode  |  4 +++
 2 files changed, 18 insertions(+), 33 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6ca24d9842..7e3bde93fe 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4657,6 +4657,20 @@ static bool trans_EXTR(DisasContext *s, arg_extract *a)
 return true;
 }
 
+static bool trans_TBL_TBX(DisasContext *s, arg_TBL_TBX *a)
+{
+if (fp_access_check(s)) {
+int len = (a->len + 1) * 16;
+
+tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, a->rd),
+   vec_full_reg_offset(s, a->rm), tcg_env,
+   a->q ? 16 : 8, vec_full_reg_size(s),
+   (len << 6) | (a->tbx << 5) | a->rn,
+   gen_helper_simd_tblx);
+}
+return true;
+}
+
 /*
  * Cryptographic AES, SHA, SHA512
  */
@@ -8897,38 +8911,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
 }
 }
 
-/* TBL/TBX
- *   31  30 29 24 23 22  21 20  16 15  14 13  12  11 10 95 40
- * +---+---+-+-+---+--+---+-++-+--+--+
- * | 0 | Q | 0 0 1 1 1 0 | op2 | 0 |  Rm  | 0 | len | op | 0 0 |  Rn  |  Rd  |
- * +---+---+-+-+---+--+---+-++-+--+--+
- */
-static void disas_simd_tb(DisasContext *s, uint32_t insn)
-{
-int op2 = extract32(insn, 22, 2);
-int is_q = extract32(insn, 30, 1);
-int rm = extract32(insn, 16, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-int is_tbx = extract32(insn, 12, 1);
-int len = (extract32(insn, 13, 2) + 1) * 16;
-
-if (op2 != 0) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-tcg_gen_gvec_2_ptr(vec_full_reg_offset(s, rd),
-   vec_full_reg_offset(s, rm), tcg_env,
-   is_q ? 16 : 8, vec_full_reg_size(s),
-   (len << 6) | (is_tbx << 5) | rn,
-   gen_helper_simd_tblx);
-}
-
 /* ZIP/UZP/TRN
  *   31  30 29 24 23  22  21 20   16 15 14 12 11 10 95 40
  * +---+---+-+--+---+--+---+--+--+
@@ -11792,7 +11774,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
 /* simd_mod_imm decode is a subset of simd_shift_imm, so must precede it */
 { 0x0f000400, 0x9ff80400, disas_simd_mod_imm },
 { 0x0f000400, 0x9f800400, disas_simd_shift_imm },
-{ 0x0e00, 0xbf208c00, disas_simd_tb },
 { 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
 { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
 { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 05927fade6..45896902d5 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1141,3 +1141,7 @@ FNMSUB  0001  .. 1 . 1 . . . @_hsd
 
 EXT_d   0010 1110 00 0 rm:5 00 imm:3 0 rn:5 rd:5
 EXT_q   0110 1110 00 0 rm:5 0  imm:4 0 rn:5 rd:5
+
+# Advanced SIMD Table Lookup
+
+TBL_TBX 0 q:1 00 1110 000 rm:5 0 len:2 tbx:1 00 rn:5 rd:5
-- 
2.43.0
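The third-from-last argument to tcg_gen_gvec_2_ptr above packs the table parameters into the helper's simd_data word. Reading the expression in trans_TBL_TBX, the layout is rn in bits [4:0], tbx in bit 5, and the table length in bytes from bit 6 up (len counts 16-byte registers, so a->len + 1 of them). A sketch with illustrative unpack helpers — the real consumer is gen_helper_simd_tblx:

```c
#include <assert.h>

/* Pack as in trans_TBL_TBX: (len_bytes << 6) | (tbx << 5) | rn,
 * where nregs is 1..4 table registers of 16 bytes each.
 */
static unsigned pack_tblx(unsigned nregs, unsigned tbx, unsigned rn)
{
    unsigned len_bytes = nregs * 16;
    return (len_bytes << 6) | (tbx << 5) | rn;
}

/* Illustrative unpacking of the same word. */
static unsigned tblx_rn(unsigned d)     { return d & 0x1f; }
static unsigned tblx_is_tbx(unsigned d) { return (d >> 5) & 1; }
static unsigned tblx_len(unsigned d)    { return d >> 6; }
```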




[PATCH 09/17] target/arm: Convert MOVI, FMOV, ORR, BIC (vector immediate) to decodetree

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 117 ++---
 target/arm/tcg/a64.decode  |   9 +++
 2 files changed, 59 insertions(+), 67 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6582816e4e..1fa9dc3172 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6872,6 +6872,52 @@ static bool trans_FMOVI_s(DisasContext *s, arg_FMOVI_s *a)
 return true;
 }
 
+/*
+ * Advanced SIMD Modified Immediate
+ */
+
+static bool trans_FMOVI_v_h(DisasContext *s, arg_FMOVI_v_h *a)
+{
+if (!dc_isar_feature(aa64_fp16, s)) {
+return false;
+}
+if (fp_access_check(s)) {
+tcg_gen_gvec_dup_imm(MO_16, vec_full_reg_offset(s, a->rd),
+ a->q ? 16 : 8, vec_full_reg_size(s),
+ vfp_expand_imm(MO_16, a->abcdefgh));
+}
+return true;
+}
+
+static void gen_movi(unsigned vece, uint32_t dofs, uint32_t aofs,
+ int64_t c, uint32_t oprsz, uint32_t maxsz)
+{
+tcg_gen_gvec_dup_imm(MO_64, dofs, oprsz, maxsz, c);
+}
+
+static bool trans_Vimm(DisasContext *s, arg_Vimm *a)
+{
+GVecGen2iFn *fn;
+
+/* Handle decode of cmode/op here between ORR/BIC/MOVI */
+if ((a->cmode & 1) && a->cmode < 12) {
+/* For op=1, the imm will be inverted, so BIC becomes AND. */
+fn = a->op ? tcg_gen_gvec_andi : tcg_gen_gvec_ori;
+} else {
+/* There is one unallocated cmode/op combination in this space */
+if (a->cmode == 15 && a->op == 1 && a->q == 0) {
+return false;
+}
+fn = gen_movi;
+}
+
+if (fp_access_check(s)) {
+uint64_t imm = asimd_imm_const(a->abcdefgh, a->cmode, a->op);
+gen_gvec_fn2i(s, a->q, a->rd, a->rd, imm, fn, MO_64);
+}
+return true;
+}
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -9051,69 +9097,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t insn)
 }
 }
 
-/* AdvSIMD modified immediate
- *  31  30   29  28 19 18 16 15   12  11  10  9 5 40
- * +---+---++-+-+---++---+---+--+
- * | 0 | Q | op | 0 1 1 1 1 0 0 0 0 0 | abc | cmode | o2 | 1 | defgh |  Rd  |
- * +---+---++-+-+---++---+---+--+
- *
- * There are a number of operations that can be carried out here:
- *   MOVI - move (shifted) imm into register
- *   MVNI - move inverted (shifted) imm into register
- *   ORR  - bitwise OR of (shifted) imm with register
- *   BIC  - bitwise clear of (shifted) imm with register
- * With ARMv8.2 we also have:
- *   FMOV half-precision
- */
-static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
-{
-int rd = extract32(insn, 0, 5);
-int cmode = extract32(insn, 12, 4);
-int o2 = extract32(insn, 11, 1);
-uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5);
-bool is_neg = extract32(insn, 29, 1);
-bool is_q = extract32(insn, 30, 1);
-uint64_t imm = 0;
-
-if (o2) {
-if (cmode != 0xf || is_neg) {
-unallocated_encoding(s);
-return;
-}
-/* FMOV (vector, immediate) - half-precision */
-if (!dc_isar_feature(aa64_fp16, s)) {
-unallocated_encoding(s);
-return;
-}
-imm = vfp_expand_imm(MO_16, abcdefgh);
-/* now duplicate across the lanes */
-imm = dup_const(MO_16, imm);
-} else {
-if (cmode == 0xf && is_neg && !is_q) {
-unallocated_encoding(s);
-return;
-}
-imm = asimd_imm_const(abcdefgh, cmode, is_neg);
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
-/* MOVI or MVNI, with MVNI negation handled above.  */
-tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), is_q ? 16 : 8,
- vec_full_reg_size(s), imm);
-} else {
-/* ORR or BIC, with BIC negation to AND handled above.  */
-if (is_neg) {
-gen_gvec_fn2i(s, is_q, rd, rd, imm, tcg_gen_gvec_andi, MO_64);
-} else {
-gen_gvec_fn2i(s, is_q, rd, rd, imm, tcg_gen_gvec_ori, MO_64);
-}
-}
-}
-
 /*
  * Common SSHR[RA]/USHR[RA] - Shift right (optional rounding/accumulate)
  *
@@ -10593,8 +10576,10 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
 bool is_u = extract32(insn, 29, 1);
 bool is_q = extract32(insn, 30, 1);
 
-/* data_proc_simd[] has sent immh == 0 to disas_simd_mod_imm. */
-assert(immh != 0);
+if (immh == 0) {
+unallocated_encoding(s);

[PATCH 02/17] target/arm: Convert EXT to decodetree

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 121 +
 target/arm/tcg/a64.decode  |   5 ++
 2 files changed, 53 insertions(+), 73 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index e4c8a20f39..6ca24d9842 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6541,6 +6541,54 @@ static bool trans_FCSEL(DisasContext *s, arg_FCSEL *a)
 return true;
 }
 
+/*
+ * Advanced SIMD Extract
+ */
+
+static bool trans_EXT_d(DisasContext *s, arg_EXT_d *a)
+{
+if (fp_access_check(s)) {
+TCGv_i64 lo = read_fp_dreg(s, a->rn);
+if (a->imm != 0) {
+TCGv_i64 hi = read_fp_dreg(s, a->rm);
+tcg_gen_extract2_i64(lo, lo, hi, a->imm * 8);
+}
+write_fp_dreg(s, a->rd, lo);
+}
+return true;
+}
+
+static bool trans_EXT_q(DisasContext *s, arg_EXT_q *a)
+{
+TCGv_i64 lo, hi;
+int pos = (a->imm & 7) * 8;
+int elt = a->imm >> 3;
+
+if (!fp_access_check(s)) {
+return true;
+}
+
+lo = tcg_temp_new_i64();
+hi = tcg_temp_new_i64();
+
+read_vec_element(s, lo, a->rn, elt, MO_64);
+elt++;
+read_vec_element(s, hi, elt & 2 ? a->rm : a->rn, elt & 1, MO_64);
+elt++;
+
+if (pos != 0) {
+TCGv_i64 hh = tcg_temp_new_i64();
+tcg_gen_extract2_i64(lo, lo, hi, pos);
+read_vec_element(s, hh, a->rm, elt & 1, MO_64);
+tcg_gen_extract2_i64(hi, hi, hh, pos);
+}
+
+write_vec_element(s, lo, a->rd, 0, MO_64);
+write_vec_element(s, hi, a->rd, 1, MO_64);
+clear_vec_high(s, true, a->rd);
+return true;
+}
+
 /*
  * Floating-point data-processing (3 source)
  */
@@ -8849,78 +8897,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t 
insn)
 }
 }
 
-/* EXT
- *   31  30 29 24 23 22  21 20  16 15  14  11 10  9 5 4 0
- * +---+---+-+-+---+--+---+--+---+--+--+
- * | 0 | Q | 1 0 1 1 1 0 | op2 | 0 |  Rm  | 0 | imm4 | 0 |  Rn  |  Rd  |
- * +---+---+-+-+---+--+---+--+---+--+--+
- */
-static void disas_simd_ext(DisasContext *s, uint32_t insn)
-{
-int is_q = extract32(insn, 30, 1);
-int op2 = extract32(insn, 22, 2);
-int imm4 = extract32(insn, 11, 4);
-int rm = extract32(insn, 16, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-int pos = imm4 << 3;
-TCGv_i64 tcg_resl, tcg_resh;
-
-if (op2 != 0 || (!is_q && extract32(imm4, 3, 1))) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-tcg_resh = tcg_temp_new_i64();
-tcg_resl = tcg_temp_new_i64();
-
-/* Vd gets bits starting at pos bits into Vm:Vn. This is
- * either extracting 128 bits from a 128:128 concatenation, or
- * extracting 64 bits from a 64:64 concatenation.
- */
-if (!is_q) {
-read_vec_element(s, tcg_resl, rn, 0, MO_64);
-if (pos != 0) {
-read_vec_element(s, tcg_resh, rm, 0, MO_64);
-tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos);
-}
-} else {
-TCGv_i64 tcg_hh;
-typedef struct {
-int reg;
-int elt;
-} EltPosns;
-EltPosns eltposns[] = { {rn, 0}, {rn, 1}, {rm, 0}, {rm, 1} };
-EltPosns *elt = eltposns;
-
-if (pos >= 64) {
-elt++;
-pos -= 64;
-}
-
-read_vec_element(s, tcg_resl, elt->reg, elt->elt, MO_64);
-elt++;
-read_vec_element(s, tcg_resh, elt->reg, elt->elt, MO_64);
-elt++;
-if (pos != 0) {
-tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos);
-tcg_hh = tcg_temp_new_i64();
-read_vec_element(s, tcg_hh, elt->reg, elt->elt, MO_64);
-tcg_gen_extract2_i64(tcg_resh, tcg_resh, tcg_hh, pos);
-}
-}
-
-write_vec_element(s, tcg_resl, rd, 0, MO_64);
-if (is_q) {
-write_vec_element(s, tcg_resh, rd, 1, MO_64);
-}
-clear_vec_high(s, is_q, rd);
-}
-
 /* TBL/TBX
 *   31  30 29 24 23 22  21 20  16 15  14 13  12  11 10 9 5 4 0
  * +---+---+-+-+---+--+---+-++-+--+--+
@@ -11818,7 +11794,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
 { 0x0f000400, 0x9f800400, disas_simd_shift_imm },
 { 0x0e00, 0xbf208c00, disas_simd_tb },
 { 0x0e000800, 0xbf208c00, disas_simd_zip_trn },
-{ 0x2e00, 0xbf208400, disas_simd_ext },
 { 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
 { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
 { 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 },
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 2922de700c..05927fade6 100644
--- a/target/arm/tcg/a64.

[PATCH 07/17] target/arm: Convert FMAXNMV, FMINNMV, FMAXV, FMINV to decodetree

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 176 ++---
 target/arm/tcg/a64.decode  |  14 +++
 2 files changed, 67 insertions(+), 123 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 055ba4695e..2964279c00 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6794,6 +6794,59 @@ TRANS(UMAXV, do_int_reduction, a, false, 0, 
tcg_gen_umax_i64)
 TRANS(SMINV, do_int_reduction, a, false, MO_SIGN, tcg_gen_smin_i64)
 TRANS(UMINV, do_int_reduction, a, false, 0, tcg_gen_umin_i64)
 
+/*
+ * do_fp_reduction helper
+ *
+ * This mirrors the Reduce() pseudocode in the ARM ARM. It is
+ * important for correct NaN propagation that we do these
+ * operations in exactly the order specified by the pseudocode.
+ *
+ * This is a recursive function.
+ */
+static TCGv_i32 do_reduction_op(DisasContext *s, int rn, MemOp esz,
+int ebase, int ecount, TCGv_ptr fpst,
+NeonGenTwoSingleOpFn *fn)
+{
+if (ecount == 1) {
+TCGv_i32 tcg_elem = tcg_temp_new_i32();
+read_vec_element_i32(s, tcg_elem, rn, ebase, esz);
+return tcg_elem;
+} else {
+int half = ecount >> 1;
+TCGv_i32 tcg_hi, tcg_lo, tcg_res;
+
+tcg_hi = do_reduction_op(s, rn, esz, ebase + half, half, fpst, fn);
+tcg_lo = do_reduction_op(s, rn, esz, ebase, half, fpst, fn);
+tcg_res = tcg_temp_new_i32();
+
+fn(tcg_res, tcg_lo, tcg_hi, fpst);
+return tcg_res;
+}
+}
+
+static bool do_fp_reduction(DisasContext *s, arg_qrr_e *a,
+  NeonGenTwoSingleOpFn *fn)
+{
+if (fp_access_check(s)) {
+MemOp esz = a->esz;
+int elts = (a->q ? 16 : 8) >> esz;
+TCGv_ptr fpst = fpstatus_ptr(esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR);
+TCGv_i32 res = do_reduction_op(s, a->rn, esz, 0, elts, fpst, fn);
+write_fp_sreg(s, a->rd, res);
+}
+return true;
+}
+
+TRANS_FEAT(FMAXNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_maxnumh)
+TRANS_FEAT(FMINNMV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_minnumh)
+TRANS_FEAT(FMAXV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_maxh)
+TRANS_FEAT(FMINV_h, aa64_fp16, do_fp_reduction, a, gen_helper_advsimd_minh)
+
+TRANS(FMAXNMV_s, do_fp_reduction, a, gen_helper_vfp_maxnums)
+TRANS(FMINNMV_s, do_fp_reduction, a, gen_helper_vfp_minnums)
+TRANS(FMAXV_s, do_fp_reduction, a, gen_helper_vfp_maxs)
+TRANS(FMINV_s, do_fp_reduction, a, gen_helper_vfp_mins)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -9020,128 +9073,6 @@ static void disas_data_proc_fp(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/*
- * do_reduction_op helper
- *
- * This mirrors the Reduce() pseudocode in the ARM ARM. It is
- * important for correct NaN propagation that we do these
- * operations in exactly the order specified by the pseudocode.
- *
- * This is a recursive function.
- */
-static TCGv_i32 do_reduction_op(DisasContext *s, int fpopcode, int rn,
-MemOp esz, int ebase, int ecount, TCGv_ptr fpst)
-{
-if (ecount == 1) {
-TCGv_i32 tcg_elem = tcg_temp_new_i32();
-read_vec_element_i32(s, tcg_elem, rn, ebase, esz);
-return tcg_elem;
-} else {
-int half = ecount >> 1;
-TCGv_i32 tcg_hi, tcg_lo, tcg_res;
-
-tcg_hi = do_reduction_op(s, fpopcode, rn, esz,
- ebase + half, half, fpst);
-tcg_lo = do_reduction_op(s, fpopcode, rn, esz,
- ebase, half, fpst);
-tcg_res = tcg_temp_new_i32();
-
-switch (fpopcode) {
-case 0x0c: /* fmaxnmv half-precision */
-gen_helper_advsimd_maxnumh(tcg_res, tcg_lo, tcg_hi, fpst);
-break;
-case 0x0f: /* fmaxv half-precision */
-gen_helper_advsimd_maxh(tcg_res, tcg_lo, tcg_hi, fpst);
-break;
-case 0x1c: /* fminnmv half-precision */
-gen_helper_advsimd_minnumh(tcg_res, tcg_lo, tcg_hi, fpst);
-break;
-case 0x1f: /* fminv half-precision */
-gen_helper_advsimd_minh(tcg_res, tcg_lo, tcg_hi, fpst);
-break;
-case 0x2c: /* fmaxnmv */
-gen_helper_vfp_maxnums(tcg_res, tcg_lo, tcg_hi, fpst);
-break;
-case 0x2f: /* fmaxv */
-gen_helper_vfp_maxs(tcg_res, tcg_lo, tcg_hi, fpst);
-break;
-case 0x3c: /* fminnmv */
-gen_helper_vfp_minnums(tcg_res, tcg_lo, tcg_hi, fpst);
-break;
-case 0x3f: /* fminv */
-gen_helper_vfp_mins(tcg_res, tcg_lo, tcg_hi, fpst);
-break;
-default:
-g_assert_not_reached

[PATCH 04/17] target/arm: Convert UZP, TRN, ZIP to decodetree

2024-07-17 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 158 ++---
 target/arm/tcg/a64.decode  |   9 ++
 2 files changed, 77 insertions(+), 90 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 7e3bde93fe..e0314a1253 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4671,6 +4671,74 @@ static bool trans_TBL_TBX(DisasContext *s, arg_TBL_TBX 
*a)
 return true;
 }
 
+typedef int simd_permute_idx_fn(int i, int part, int elements);
+
+static bool do_simd_permute(DisasContext *s, arg_qrrr_e *a,
+simd_permute_idx_fn *fn, int part)
+{
+MemOp esz = a->esz;
+int datasize = a->q ? 16 : 8;
+int elements = datasize >> esz;
+TCGv_i64 tcg_res[2], tcg_ele;
+
+if (esz == MO_64 && !a->q) {
+return false;
+}
+if (!fp_access_check(s)) {
+return true;
+}
+
+tcg_res[0] = tcg_temp_new_i64();
+tcg_res[1] = a->q ? tcg_temp_new_i64() : NULL;
+tcg_ele = tcg_temp_new_i64();
+
+for (int i = 0; i < elements; i++) {
+int o, w, idx;
+
+idx = fn(i, part, elements);
+read_vec_element(s, tcg_ele, (idx & elements ? a->rm : a->rn),
+ idx & (elements - 1), esz);
+
+w = (i << (esz + 3)) / 64;
+o = (i << (esz + 3)) % 64;
+if (o == 0) {
+tcg_gen_mov_i64(tcg_res[w], tcg_ele);
+} else {
+tcg_gen_deposit_i64(tcg_res[w], tcg_res[w], tcg_ele, o, 8 << esz);
+}
+}
+
+for (int i = a->q; i >= 0; --i) {
+write_vec_element(s, tcg_res[i], a->rd, i, MO_64);
+}
+clear_vec_high(s, a->q, a->rd);
+return true;
+}
+
+static int permute_load_uzp(int i, int part, int elements)
+{
+return 2 * i + part;
+}
+
+TRANS(UZP1, do_simd_permute, a, permute_load_uzp, 0)
+TRANS(UZP2, do_simd_permute, a, permute_load_uzp, 1)
+
+static int permute_load_trn(int i, int part, int elements)
+{
+return (i & 1) * elements + (i & ~1) + part;
+}
+
+TRANS(TRN1, do_simd_permute, a, permute_load_trn, 0)
+TRANS(TRN2, do_simd_permute, a, permute_load_trn, 1)
+
+static int permute_load_zip(int i, int part, int elements)
+{
+return (i & 1) * elements + ((part * elements + i) >> 1);
+}
+
+TRANS(ZIP1, do_simd_permute, a, permute_load_zip, 0)
+TRANS(ZIP2, do_simd_permute, a, permute_load_zip, 1)
+
 /*
  * Cryptographic AES, SHA, SHA512
  */
@@ -8911,95 +8979,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t 
insn)
 }
 }
 
-/* ZIP/UZP/TRN
- *   31  30 29 24 23  22  21 20   16 15 14 12 11 10 9 5 4 0
- * +---+---+-+--+---+--+---+--+--+
- * | 0 | Q | 0 0 1 1 1 0 | size | 0 |  Rm  | 0 | opc | 1 0 |  Rn  |  Rd  |
- * +---+---+-+--+---+--+---+--+--+
- */
-static void disas_simd_zip_trn(DisasContext *s, uint32_t insn)
-{
-int rd = extract32(insn, 0, 5);
-int rn = extract32(insn, 5, 5);
-int rm = extract32(insn, 16, 5);
-int size = extract32(insn, 22, 2);
-/* opc field bits [1:0] indicate ZIP/UZP/TRN;
- * bit 2 indicates 1 vs 2 variant of the insn.
- */
-int opcode = extract32(insn, 12, 2);
-bool part = extract32(insn, 14, 1);
-bool is_q = extract32(insn, 30, 1);
-int esize = 8 << size;
-int i;
-int datasize = is_q ? 128 : 64;
-int elements = datasize / esize;
-TCGv_i64 tcg_res[2], tcg_ele;
-
-if (opcode == 0 || (size == 3 && !is_q)) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-tcg_res[0] = tcg_temp_new_i64();
-tcg_res[1] = is_q ? tcg_temp_new_i64() : NULL;
-tcg_ele = tcg_temp_new_i64();
-
-for (i = 0; i < elements; i++) {
-int o, w;
-
-switch (opcode) {
-case 1: /* UZP1/2 */
-{
-int midpoint = elements / 2;
-if (i < midpoint) {
-read_vec_element(s, tcg_ele, rn, 2 * i + part, size);
-} else {
-read_vec_element(s, tcg_ele, rm,
- 2 * (i - midpoint) + part, size);
-}
-break;
-}
-case 2: /* TRN1/2 */
-if (i & 1) {
-read_vec_element(s, tcg_ele, rm, (i & ~1) + part, size);
-} else {
-read_vec_element(s, tcg_ele, rn, (i & ~1) + part, size);
-}
-break;
-case 3: /* ZIP1/2 */
-{
-int base = part * elements / 2;
-if (i & 1) {
-read_vec_element(s, tcg_ele, rm, base + (i >> 1), size);
-} else {
-read_vec_element(s, tcg_ele, rn, base + (i >> 1), size);
-}
-break;
-}
-  

[PATCH 01/17] target/arm: Use tcg_gen_extract2_i64 for EXT

2024-07-17 Thread Richard Henderson
The extract2 tcg op performs the same operation
as the do_ext64 function.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 23 +++
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 559a6cd799..e4c8a20f39 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8849,23 +8849,6 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t 
insn)
 }
 }
 
-static void do_ext64(DisasContext *s, TCGv_i64 tcg_left, TCGv_i64 tcg_right,
- int pos)
-{
-/* Extract 64 bits from the middle of two concatenated 64 bit
- * vector register slices left:right. The extracted bits start
- * at 'pos' bits into the right (least significant) side.
- * We return the result in tcg_right, and guarantee not to
- * trash tcg_left.
- */
-TCGv_i64 tcg_tmp = tcg_temp_new_i64();
-assert(pos > 0 && pos < 64);
-
-tcg_gen_shri_i64(tcg_right, tcg_right, pos);
-tcg_gen_shli_i64(tcg_tmp, tcg_left, 64 - pos);
-tcg_gen_or_i64(tcg_right, tcg_right, tcg_tmp);
-}
-
 /* EXT
 *   31  30 29 24 23 22  21 20  16 15  14  11 10  9 5 4 0
  * +---+---+-+-+---+--+---+--+---+--+--+
@@ -8903,7 +8886,7 @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
 read_vec_element(s, tcg_resl, rn, 0, MO_64);
 if (pos != 0) {
 read_vec_element(s, tcg_resh, rm, 0, MO_64);
-do_ext64(s, tcg_resh, tcg_resl, pos);
+tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos);
 }
 } else {
 TCGv_i64 tcg_hh;
@@ -8924,10 +8907,10 @@ static void disas_simd_ext(DisasContext *s, uint32_t 
insn)
 read_vec_element(s, tcg_resh, elt->reg, elt->elt, MO_64);
 elt++;
 if (pos != 0) {
-do_ext64(s, tcg_resh, tcg_resl, pos);
+tcg_gen_extract2_i64(tcg_resl, tcg_resl, tcg_resh, pos);
 tcg_hh = tcg_temp_new_i64();
 read_vec_element(s, tcg_hh, elt->reg, elt->elt, MO_64);
-do_ext64(s, tcg_hh, tcg_resh, pos);
+tcg_gen_extract2_i64(tcg_resh, tcg_resh, tcg_hh, pos);
 }
 }
 
-- 
2.43.0




[PATCH v2 2/3] target/arm: Use FPST_F16 for SME FMOPA (widening)

2024-07-17 Thread Richard Henderson
This operation has float16 inputs and thus must use
the FZ16 control not the FZ control.

Cc: qemu-sta...@nongnu.org
Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)")
Reported-by: Daniyal Khan 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2374
Signed-off-by: Richard Henderson 
Reviewed-by: Alex Bennée 
---
 target/arm/tcg/translate-sme.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 46c7fce8b4..185a8a917b 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -304,6 +304,7 @@ static bool do_outprod(DisasContext *s, arg_op *a, MemOp 
esz,
 }
 
 static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
+ARMFPStatusFlavour e_fpst,
 gen_helper_gvec_5_ptr *fn)
 {
 int svl = streaming_vec_reg_size(s);
@@ -319,15 +320,18 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, 
MemOp esz,
 zm = vec_full_reg_ptr(s, a->zm);
 pn = pred_full_reg_ptr(s, a->pn);
 pm = pred_full_reg_ptr(s, a->pm);
-fpst = fpstatus_ptr(FPST_FPCR);
+fpst = fpstatus_ptr(e_fpst);
 
 fn(za, zn, zm, pn, pm, fpst, tcg_constant_i32(desc));
 return true;
 }
 
-TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_h)
-TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_s)
-TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_fmopa_d)
+TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a,
+   MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h)
+TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a,
+   MO_32, FPST_FPCR, gen_helper_sme_fmopa_s)
+TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,
+   MO_64, FPST_FPCR, gen_helper_sme_fmopa_d)
 
 /* TODO: FEAT_EBF16 */
 TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa)
-- 
2.43.0




[PATCH v2 1/3] target/arm: Use float_status copy in sme_fmopa_s

2024-07-17 Thread Richard Henderson
From: Daniyal Khan 

We made a copy above because the fp exception flags
are not propagated back to the FPST register, but
then failed to use the copy.

Cc: qemu-sta...@nongnu.org
Fixes: 558e956c719 ("target/arm: Implement FMOPA, FMOPS (non-widening)")
Signed-off-by: Daniyal Khan 
[rth: Split from a larger patch]
Signed-off-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
---
 target/arm/tcg/sme_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index e2e0575039..5a6dd76489 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -916,7 +916,7 @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, 
void *vpn,
 if (pb & 1) {
 uint32_t *a = vza_row + H1_4(col);
 uint32_t *m = vzm + H1_4(col);
-*a = float32_muladd(n, *m, *a, 0, vst);
+*a = float32_muladd(n, *m, *a, 0, &fpst);
 }
 col += 4;
 pb >>= 4;
-- 
2.43.0




[PATCH v2 3/3] tests/tcg/aarch64: Add test cases for SME FMOPA (widening)

2024-07-17 Thread Richard Henderson
From: Daniyal Khan 

Signed-off-by: Daniyal Khan 
Message-Id: 172090222034.13953.1688870870882292209...@git.sr.ht
[rth: Split test from a larger patch, tidy assembly]
Signed-off-by: Richard Henderson 
Reviewed-by: Alex Bennée 
---
 tests/tcg/aarch64/sme-fmopa-1.c   | 63 +++
 tests/tcg/aarch64/sme-fmopa-2.c   | 56 +++
 tests/tcg/aarch64/sme-fmopa-3.c   | 63 +++
 tests/tcg/aarch64/Makefile.target |  5 ++-
 4 files changed, 185 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-fmopa-1.c
 create mode 100644 tests/tcg/aarch64/sme-fmopa-2.c
 create mode 100644 tests/tcg/aarch64/sme-fmopa-3.c

diff --git a/tests/tcg/aarch64/sme-fmopa-1.c b/tests/tcg/aarch64/sme-fmopa-1.c
new file mode 100644
index 00..652c4ea090
--- /dev/null
+++ b/tests/tcg/aarch64/sme-fmopa-1.c
@@ -0,0 +1,63 @@
+/*
+ * SME outer product, 1 x 1.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+
+static void foo(float *dst)
+{
+asm(".arch_extension sme\n\t"
+"smstart\n\t"
+"ptrue p0.s, vl4\n\t"
+"fmov z0.s, #1.0\n\t"
+/*
+ * An outer product of a vector of 1.0 by itself should be a matrix of 1.0.
+ * Note that we are using tile 1 here (za1.s) rather than tile 0.
+ */
+"zero {za}\n\t"
+"fmopa za1.s, p0/m, p0/m, z0.s, z0.s\n\t"
+/*
+ * Read the first 4x4 sub-matrix of elements from tile 1:
+ * Note that za1h should be interchangeable here.
+ */
+"mov w12, #0\n\t"
+"mova z0.s, p0/m, za1v.s[w12, #0]\n\t"
+"mova z1.s, p0/m, za1v.s[w12, #1]\n\t"
+"mova z2.s, p0/m, za1v.s[w12, #2]\n\t"
+"mova z3.s, p0/m, za1v.s[w12, #3]\n\t"
+/*
+ * And store them to the input pointer (dst in the C code):
+ */
+"st1w {z0.s}, p0, [%0]\n\t"
+"add x0, x0, #16\n\t"
+"st1w {z1.s}, p0, [x0]\n\t"
+"add x0, x0, #16\n\t"
+"st1w {z2.s}, p0, [x0]\n\t"
+"add x0, x0, #16\n\t"
+"st1w {z3.s}, p0, [x0]\n\t"
+"smstop"
+: : "r"(dst)
+: "x12", "d0", "d1", "d2", "d3", "memory");
+}
+
+int main()
+{
+float dst[16] = { };
+
+foo(dst);
+
+for (int i = 0; i < 16; i++) {
+if (dst[i] != 1.0f) {
+goto failure;
+}
+}
+/* success */
+return 0;
+
+ failure:
+for (int i = 0; i < 16; i++) {
+printf("%f%c", dst[i], i % 4 == 3 ? '\n' : ' ');
+}
+return 1;
+}
diff --git a/tests/tcg/aarch64/sme-fmopa-2.c b/tests/tcg/aarch64/sme-fmopa-2.c
new file mode 100644
index 00..15f0972d83
--- /dev/null
+++ b/tests/tcg/aarch64/sme-fmopa-2.c
@@ -0,0 +1,56 @@
+/*
+ * SME outer product, FZ vs FZ16
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+#include 
+
+static void test_fmopa(uint32_t *result)
+{
+asm(".arch_extension sme\n\t"
+"smstart\n\t"   /* Z*, P* and ZArray cleared */
+"ptrue p2.b, vl16\n\t"  /* Limit vector length to 16 */
+"ptrue p5.b, vl16\n\t"
+"movi d0, #0x00ff\n\t"  /* fp16 denormal */
+"movi d16, #0x00ff\n\t"
+"mov w15, #0x000100\n\t" /* FZ=1, FZ16=0 */
+"msr fpcr, x15\n\t"
+"fmopa za3.s, p2/m, p5/m, z16.h, z0.h\n\t"
+"mov w15, #0\n\t"
+"st1w {za3h.s[w15, 0]}, p2, [%0]\n\t"
+"add %0, %0, #16\n\t"
+"st1w {za3h.s[w15, 1]}, p2, [%0]\n\t"
+"mov w15, #2\n\t"
+"add %0, %0, #16\n\t"
+"st1w {za3h.s[w15, 0]}, p2, [%0]\n\t"
+"add %0, %0, #16\n\t"
+"st1w {za3h.s[w15, 1]}, p2, [%0]\n\t"
+"smstop"
+: "+r"(result) :
+: "x15", "x16", "p2", "p5", "d0", "d16", "memory");
+}
+
+int main(void)
+{
+uint32_t result[4 * 4] = { };
+
+test_fmopa(result);
+
+if (result[0] != 0x2f7e0100) {
+printf("Test failed: Incorrect output in first 4 bytes\n"
+   "Expected: %08x\n"
+   "Got:  %08x\n",
+   0x2f7e0100, result[0]);
+return 1;
+}
+
+for (int i = 1; i < 16; ++i) {
+if (result[i] != 0) {
+printf("Test failed: Non-zero word at position %d\n", i);
+return 1;
+}
+}
+
+return 0;
+}
diff --git a/tests/tcg/aarch64/sme-fmopa-3.c b/tests/tcg/aarch64/sme-fmopa-3.c
new file mode 100644

[PATCH v2 0/3] target/arm: Fixes for SME FMOPA (#2373)

2024-07-17 Thread Richard Henderson
Changes for v2:
  - Apply r-b.
  - Add license headers to two test cases.

r~

Daniyal Khan (2):
  target/arm: Use float_status copy in sme_fmopa_s
  tests/tcg/aarch64: Add test cases for SME FMOPA (widening)

Richard Henderson (1):
  target/arm: Use FPST_F16 for SME FMOPA (widening)

 target/arm/tcg/sme_helper.c   |  2 +-
 target/arm/tcg/translate-sme.c| 12 --
 tests/tcg/aarch64/sme-fmopa-1.c   | 63 +++
 tests/tcg/aarch64/sme-fmopa-2.c   | 56 +++
 tests/tcg/aarch64/sme-fmopa-3.c   | 63 +++
 tests/tcg/aarch64/Makefile.target |  5 ++-
 6 files changed, 194 insertions(+), 7 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-fmopa-1.c
 create mode 100644 tests/tcg/aarch64/sme-fmopa-2.c
 create mode 100644 tests/tcg/aarch64/sme-fmopa-3.c

-- 
2.43.0




Re: [PULL 00/11] SD/MMC patches for 2024-07-16

2024-07-16 Thread Richard Henderson

On 7/17/24 04:41, Philippe Mathieu-Daudé wrote:

The following changes since commit 959269e910944c03bc13f300d65bf08b060d5d0f:

   Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging 
(2024-07-16 06:45:23 +1000)

are available in the Git repository at:

   https://github.com/philmd/qemu.git tags/sdmmc-20240716

for you to fetch changes up to c8cb19876d3e29bffd7ffd87586ff451f97f5f46:

   hw/sd/sdcard: Support boot area in emmc image (2024-07-16 20:30:15 +0200)

Ignored checkpatch error:

   WARNING: line over 80 characters
   #109: FILE: hw/sd/sd.c:500:
   +sd->ext_csd[EXT_CSD_HC_WP_GRP_SIZE] = 0x01; /* HC write protect group 
size */


SD/MMC patches queue

Addition of eMMC support is a long-term collaborative virtual work by:

  - Cédric Le Goater
  - Edgar E. Iglesias
  - Francisco Iglesias
  - Joel Stanley
  - Luc Michel
  - Philippe Mathieu-Daudé
  - Sai Pavan Boddu
  - Vincent Palatin


Applied, thanks.

r~



Re: [PULL 00/13] Misc HW/UI patches for 2024-07-16

2024-07-16 Thread Richard Henderson

On 7/17/24 04:09, Philippe Mathieu-Daudé wrote:

The following changes since commit 959269e910944c03bc13f300d65bf08b060d5d0f:

   Merge tag 'python-pull-request' of https://gitlab.com/jsnow/qemu into staging 
(2024-07-16 06:45:23 +1000)

are available in the Git repository at:

   https://github.com/philmd/qemu.git tags/hw-misc-20240716

for you to fetch changes up to 644a52778a90581dbda909f38b9eaf71501fd9cd:

   system/physmem: use return value of ram_block_discard_require() as errno 
(2024-07-16 20:04:08 +0200)

Ignored checkpatch error:

   WARNING: line over 80 characters
   #30: FILE: system/vl.c:1004:
   +if (!ti->class_names[0] || 
module_object_class_by_name(ti->class_names[0])) {

Ignored CI failures:

  - bios-tables-test on cross-i686-tci
  - qtest-sparc on msys2-64bit


Misc HW & UI patches queue

- Allow loading safely ROMs larger than 4GiB (Gregor)
- Convert vt82c686 IRQ as named 'intr' (Bernhard)
- Clarify QDev GPIO API (Peter)
- Drop unused load_image_gzipped function (Ani)
- MakeTCGCPUOps::cpu_exec_interrupt handler mandatory (Peter)
- Factor cpu_pause() out (Nicholas)
- Remove transfer size check from ESP DMA DATA IN / OUT transfers (Mark)
- Add accelerated cursor composition to Cocoa UI (Akihiko)
- Fix '-vga help' CLI (Marc-André)
- Fix displayed errno in ram_block_add (Zhenzhong)


Applied, thanks.

r~



Re: [RFC PATCH] gdbstub: Re-factor gdb command extensions

2024-07-16 Thread Richard Henderson

On 7/17/24 02:55, Alex Bennée wrote:

Are you expecting the same GdbCmdParseEntry object to be registered
multiple times?  Can we fix that at a higher level?


It's basically a hack to deal with the fact that everything is tied to the
CPUObject, so we register everything multiple times. We could do an
if (!registered) register() dance, but I'm thinking forward to a
heterogeneous future, and I guess we'd need to do more work then anyway.


Any chance we could move it all to the CPUClass?


r~



Re: [RFC PATCH] gdbstub: Re-factor gdb command extensions

2024-07-16 Thread Richard Henderson

On 7/16/24 21:42, Alex Bennée wrote:

  void gdb_extend_qsupported_features(char *qsupported_features)
  {
-/*
- * We don't support different sets of CPU gdb features on different CPUs yet
- * so assert the feature strings are the same on all CPUs, or is set only
- * once (1 CPU).
- */
-g_assert(extended_qsupported_features == NULL ||
- g_strcmp0(extended_qsupported_features, qsupported_features) == 0);
-
-extended_qsupported_features = qsupported_features;
+if (!extended_qsupported_features) {
+extended_qsupported_features = g_strdup(qsupported_features);
+} else if (!g_strrstr(extended_qsupported_features, qsupported_features)) {


Did you really need the last instance of the substring?

I'll note that g_strrstr is quite simplistic, whereas strstr has a much more scalable 
algorithm.




+char *old = extended_qsupported_features;
+extended_qsupported_features = g_strdup_printf("%s%s", old, 
qsupported_features);


Right tool for the right job, please: g_strconcat().

That said, did you *really* want to concatenate now, and have to search through the 
middle, as opposed to storing N strings separately?  You could defer the concat until the 
actual negotiation with gdb.  That would reduce strstr above to a loop over strcmp.



+for (int i = 0; i < extensions->len; i++) {
+gpointer entry = g_ptr_array_index(extensions, i);
+if (!g_ptr_array_find(table, entry, NULL)) {
+g_ptr_array_add(table, entry);


Are you expecting the same GdbCmdParseEntry object to be registered multiple times?  Can 
we fix that at a higher level?



r~



Re: [PATCH v3] osdep: add a qemu_close_all_open_fd() helper

2024-07-16 Thread Richard Henderson

On 7/17/24 00:39, Clément Léger wrote:

+/* Restrict the range as we found fds matching start/end */
+if (i == skip_start)
+skip_start++;
+else if (i == skip_end)
+skip_end--;


Need braces.

Otherwise,
Reviewed-by: Richard Henderson 



r~



Re: [PULL 0/6] Python patches

2024-07-15 Thread Richard Henderson

On 7/16/24 03:32, John Snow wrote:

The following changes since commit 4469bee2c529832d762af4a2f89468c926f02fe4:

   Merge tag 'nvme-next-pull-request' of https://gitlab.com/birkelund/qemu into 
staging (2024-07-11 14:32:51 -0700)

are available in the Git repository at:

   https://gitlab.com/jsnow/qemu.git  tags/python-pull-request

for you to fetch changes up to dd23f9ec519db9c424223cff8767715de5532718:

   docs: remove Sphinx 1.x compatibility code (2024-07-12 16:46:21 -0400)


Python: 3.13 compat & sphinx minver bump


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



FreeBSD update required for CI?

2024-07-15 Thread Richard Henderson

Hi guys,

CI currently failing FreeBSD:

https://gitlab.com/qemu-project/qemu/-/jobs/7347517439


pkg: No packages available to install matching 'py39-pillow' have been found in 
the repositories
pkg: No packages available to install matching 'py39-pip' have been found in 
the repositories
pkg: No packages available to install matching 'py39-sphinx' have been found in 
the repositories
pkg: No packages available to install matching 'py39-sphinx_rtd_theme' have 
been found in the repositories
pkg: No packages available to install matching 'py39-yaml' have been found in 
the repositories


Has FreeBSD ports updated to something beyond python 3.9, and we need an update 
to match?


r~



Re: [PATCH] disas: Fix build against Capstone v6

2024-07-15 Thread Richard Henderson

On 7/16/24 07:39, Gustavo Romero wrote:

Capstone v6 made major changes, such as renaming for AArch64, which
broke programs using the old headers, like QEMU. However, Capstone v6
provides the CAPSTONE_AARCH64_COMPAT_HEADER compatibility definition
allowing to build against v6 with the old definitions, so fix the QEMU
build using it.

We can lift that definition and switch to the new naming once our
supported distros have Capstone v6 in place.

Signed-off-by: Gustavo Romero 
Suggested-by: Peter Maydell 
---
  include/disas/capstone.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/disas/capstone.h b/include/disas/capstone.h
index e29068dd97..a11985151d 100644
--- a/include/disas/capstone.h
+++ b/include/disas/capstone.h
@@ -3,6 +3,7 @@
  
  #ifdef CONFIG_CAPSTONE
  
+#define CAPSTONE_AARCH64_COMPAT_HEADER

  #include 
  
  #else


Reviewed-by: Richard Henderson 


r~



Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff

2024-07-15 Thread Richard Henderson

On 7/15/24 17:06, Max Chou wrote:

+/* Probe nonfault on subsequent elements. */
+flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
+   mmu_index, true, &host, 0);
+if (flags) {

According to section 7.7. Unit-stride Fault-Only-First Loads in the v spec
(v1.0):

      When the fault-only-first instruction would trigger a debug
data-watchpoint trap on an element after the first, implementations should
not reduce vl but instead should trigger the debug trap as otherwise the
event might be lost.


Hmm, ok.  Interesting.


And I think that there is a potential issue in the original implementation
that maybe we can fix in this patch.


We need to pass the correct element load size to the probe_access_internal
function, which is called by tlb_vaddr_to_host in the original implementation
and is called directly in this patch.
The size parameter will be used by the pmp_hart_has_privs function to do the
physical memory protection (PMP) checking.
If we set the size parameter to the remaining page size, we may get an
unexpected trap caused by PMP rules that cover the regions of masked-off
elements.


Maybe we can replace the while loop liked below.


vext_ldff(void *vd, void *v0, target_ulong base,
   ...
{
     ...
     uint32_t size = nf << log2_esz;

     VSTART_CHECK_EARLY_EXIT(env);

     /* probe every access */
     for (i = env->vstart; i < env->vl; i++) {
     if (!vm && !vext_elem_mask(v0, i)) {
     continue;
     }
     addr = adjust_addr(env, base + i * size);
     if (i == 0) {
     probe_pages(env, addr, size, ra, MMU_DATA_LOAD);
     } else {
     /* if it triggers an exception, no need to check watchpoint */
     void *host;
     int flags;

     /* Probe nonfault on subsequent elements. */
     flags = probe_access_flags(env, addr, size, MMU_DATA_LOAD,
     mmu_index, true, &host, 0);
     if (flags & ~TLB_WATCHPOINT) {
     /*
  * Stop any flag bit set:
  *   invalid (unmapped)
  *   mmio (transaction failed)
  * In all cases, handle as the first load next time.
  */
     vl = i;
     break;
     }
     }
     }


No, I don't think repeated probing is a good idea.
You'll lose everything you attempted to gain with the other improvements.

It seems, to handle watchpoints, you need to start by probing the entire length non-fault. 
 That will tell you if any portion of the length has any of the problem cases.  The fast 
path will not, of course.


After probing, you have flags for the 1 or two pages, and you can make a choice about the 
actual load length:


  - invalid on first page: either the first element faults,
or you need to check PMP via some alternate mechanism.
Do not be afraid to add something to CPUTLBEntryFull.extra.riscv
during tlb_fill in order to accelerate this, if needed.

  - mmio on first page: just one element, as the second might fault
during the transaction.

It would be possible to enhance riscv_cpu_do_transaction_failed to
suppress the fault and set a flag noting the fault.  This would allow
multiple elements to be loaded, at the expense of another check after
each element within the slow tlb-load path.  I don't know if this is
desirable, really.  Using vector operations on mmio is usually a
programming error.  :-)

  - invalid or mmio on second page, continue to the end of the first page.

Once we have the actual load length, handle watchpoints by hand.
See sve_cont_ldst_watchpoints.

Finally, the loop loading the elements, likely in ram via host pointer.


r~



Re: [PATCH 2/3] target/arm: Use FPST_F16 for SME FMOPA (widening)

2024-07-15 Thread Richard Henderson

On 7/15/24 22:58, Richard Henderson wrote:

This operation has float16 inputs and thus must use
the FZ16 control not the FZ control.

Cc: qemu-sta...@nongnu.org
Reported-by: Daniyal Khan 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2374
Signed-off-by: Richard Henderson 
---
  target/arm/tcg/translate-sme.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)


Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)")


r~



[PATCH 0/3] target/arm: Fixes for SME FMOPA (#2373)

2024-07-14 Thread Richard Henderson
Hi Daniyal,

Your fix for sme_fmopa_s is correct, but not the FZ16 fix.
We represent FZ16 with a separate float_status structure,
so all that is needed is to use that.

Thanks for the test cases.  I cleaned them up a little,
and wired them into the Makefile.


r~

Supersedes: 172090222034.13953.1688870870882292209...@git.sr.ht

Daniyal Khan (2):
  target/arm: Use float_status copy in sme_fmopa_s
  tests/tcg/aarch64: Add test cases for SME FMOPA (widening)

Richard Henderson (1):
  target/arm: Use FPST_F16 for SME FMOPA (widening)

 target/arm/tcg/sme_helper.c   |  2 +-
 target/arm/tcg/translate-sme.c| 12 --
 tests/tcg/aarch64/sme-fmopa-1.c   | 63 +++
 tests/tcg/aarch64/sme-fmopa-2.c   | 51 +
 tests/tcg/aarch64/sme-fmopa-3.c   | 58 
 tests/tcg/aarch64/Makefile.target |  5 ++-
 6 files changed, 184 insertions(+), 7 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-fmopa-1.c
 create mode 100644 tests/tcg/aarch64/sme-fmopa-2.c
 create mode 100644 tests/tcg/aarch64/sme-fmopa-3.c

-- 
2.43.0




[PATCH 2/3] target/arm: Use FPST_F16 for SME FMOPA (widening)

2024-07-14 Thread Richard Henderson
This operation has float16 inputs and thus must use
the FZ16 control not the FZ control.

Cc: qemu-sta...@nongnu.org
Reported-by: Daniyal Khan 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2374
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-sme.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index 46c7fce8b4..185a8a917b 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -304,6 +304,7 @@ static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz,
 }
 
 static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
+ARMFPStatusFlavour e_fpst,
 gen_helper_gvec_5_ptr *fn)
 {
 int svl = streaming_vec_reg_size(s);
@@ -319,15 +320,18 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
 zm = vec_full_reg_ptr(s, a->zm);
 pn = pred_full_reg_ptr(s, a->pn);
 pm = pred_full_reg_ptr(s, a->pm);
-fpst = fpstatus_ptr(FPST_FPCR);
+fpst = fpstatus_ptr(e_fpst);
 
 fn(za, zn, zm, pn, pm, fpst, tcg_constant_i32(desc));
 return true;
 }
 
-TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_h)
-TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, gen_helper_sme_fmopa_s)
-TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, gen_helper_sme_fmopa_d)
+TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a,
+   MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h)
+TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a,
+   MO_32, FPST_FPCR, gen_helper_sme_fmopa_s)
+TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,
+   MO_64, FPST_FPCR, gen_helper_sme_fmopa_d)
 
 /* TODO: FEAT_EBF16 */
 TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa)
-- 
2.43.0




[PATCH 1/3] target/arm: Use float_status copy in sme_fmopa_s

2024-07-14 Thread Richard Henderson
From: Daniyal Khan 

We made a copy above because the fp exception flags
are not propagated back to the FPST register, but
then failed to use the copy.

Cc: qemu-sta...@nongnu.org
Fixes: 558e956c719 ("target/arm: Implement FMOPA, FMOPS (non-widening)")
Signed-off-by: Daniyal Khan 
[rth: Split from a larger patch]
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/sme_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index e2e0575039..5a6dd76489 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -916,7 +916,7 @@ void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn,
 if (pb & 1) {
 uint32_t *a = vza_row + H1_4(col);
 uint32_t *m = vzm + H1_4(col);
-*a = float32_muladd(n, *m, *a, 0, vst);
+*a = float32_muladd(n, *m, *a, 0, );
 }
 col += 4;
 pb >>= 4;
-- 
2.43.0




[PATCH 3/3] tests/tcg/aarch64: Add test cases for SME FMOPA (widening)

2024-07-14 Thread Richard Henderson
From: Daniyal Khan 

Signed-off-by: Daniyal Khan 
Message-Id: 172090222034.13953.1688870870882292209...@git.sr.ht
[rth: Split test cases to separate patch, tidy assembly.]
Signed-off-by: Richard Henderson 
---
 tests/tcg/aarch64/sme-fmopa-1.c   | 63 +++
 tests/tcg/aarch64/sme-fmopa-2.c   | 51 +
 tests/tcg/aarch64/sme-fmopa-3.c   | 58 
 tests/tcg/aarch64/Makefile.target |  5 ++-
 4 files changed, 175 insertions(+), 2 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-fmopa-1.c
 create mode 100644 tests/tcg/aarch64/sme-fmopa-2.c
 create mode 100644 tests/tcg/aarch64/sme-fmopa-3.c

diff --git a/tests/tcg/aarch64/sme-fmopa-1.c b/tests/tcg/aarch64/sme-fmopa-1.c
new file mode 100644
index 00..652c4ea090
--- /dev/null
+++ b/tests/tcg/aarch64/sme-fmopa-1.c
@@ -0,0 +1,63 @@
+/*
+ * SME outer product, 1 x 1.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdio.h>
+
+static void foo(float *dst)
+{
+asm(".arch_extension sme\n\t"
+"smstart\n\t"
+"ptrue p0.s, vl4\n\t"
+"fmov z0.s, #1.0\n\t"
+/*
+ * An outer product of a vector of 1.0 by itself should be a matrix of 1.0.
+ * Note that we are using tile 1 here (za1.s) rather than tile 0.
+ */
+"zero {za}\n\t"
+"fmopa za1.s, p0/m, p0/m, z0.s, z0.s\n\t"
+/*
+ * Read the first 4x4 sub-matrix of elements from tile 1:
+ * Note that za1h should be interchangeable here.
+ */
+"mov w12, #0\n\t"
+"mova z0.s, p0/m, za1v.s[w12, #0]\n\t"
+"mova z1.s, p0/m, za1v.s[w12, #1]\n\t"
+"mova z2.s, p0/m, za1v.s[w12, #2]\n\t"
+"mova z3.s, p0/m, za1v.s[w12, #3]\n\t"
+/*
+ * And store them to the input pointer (dst in the C code):
+ */
+"st1w {z0.s}, p0, [%0]\n\t"
+"add x0, x0, #16\n\t"
+"st1w {z1.s}, p0, [x0]\n\t"
+"add x0, x0, #16\n\t"
+"st1w {z2.s}, p0, [x0]\n\t"
+"add x0, x0, #16\n\t"
+"st1w {z3.s}, p0, [x0]\n\t"
+"smstop"
+: : "r"(dst)
+: "x12", "d0", "d1", "d2", "d3", "memory");
+}
+
+int main()
+{
+float dst[16] = { };
+
+foo(dst);
+
+for (int i = 0; i < 16; i++) {
+if (dst[i] != 1.0f) {
+goto failure;
+}
+}
+/* success */
+return 0;
+
+ failure:
+for (int i = 0; i < 16; i++) {
+printf("%f%c", dst[i], i % 4 == 3 ? '\n' : ' ');
+}
+return 1;
+}
diff --git a/tests/tcg/aarch64/sme-fmopa-2.c b/tests/tcg/aarch64/sme-fmopa-2.c
new file mode 100644
index 00..198cc31528
--- /dev/null
+++ b/tests/tcg/aarch64/sme-fmopa-2.c
@@ -0,0 +1,51 @@
+#include <stdio.h>
+#include <stdint.h>
+
+static void test_fmopa(uint32_t *result)
+{
+asm(".arch_extension sme\n\t"
+"smstart\n\t"   /* Z*, P* and ZArray cleared */
+"ptrue p2.b, vl16\n\t"  /* Limit vector length to 16 */
+"ptrue p5.b, vl16\n\t"
+"movi d0, #0x00ff\n\t"  /* fp16 denormal */
+"movi d16, #0x00ff\n\t"
+"mov w15, #0x000100\n\t" /* FZ=1, FZ16=0 */
+"msr fpcr, x15\n\t"
+"fmopa za3.s, p2/m, p5/m, z16.h, z0.h\n\t"
+"mov w15, #0\n\t"
+"st1w {za3h.s[w15, 0]}, p2, [%0]\n\t"
+"add %0, %0, #16\n\t"
+"st1w {za3h.s[w15, 1]}, p2, [%0]\n\t"
+"mov w15, #2\n\t"
+"add %0, %0, #16\n\t"
+"st1w {za3h.s[w15, 0]}, p2, [%0]\n\t"
+"add %0, %0, #16\n\t"
+"st1w {za3h.s[w15, 1]}, p2, [%0]\n\t"
+"smstop"
+: "+r"(result) :
+: "x15", "x16", "p2", "p5", "d0", "d16", "memory");
+}
+
+int main(void)
+{
+uint32_t result[4 * 4] = { };
+
+test_fmopa(result);
+
+if (result[0] != 0x2f7e0100) {
+printf("Test failed: Incorrect output in first 4 bytes\n"
+   "Expected: %08x\n"
+   "Got:  %08x\n",
+   0x2f7e0100, result[0]);
+return 1;
+}
+
+for (int i = 1; i < 16; ++i) {
+if (result[i] != 0) {
+printf("Test failed: Non-zero word at position %d\n", i);
+return 1;
+}
+}
+
+return 0;
+}
diff --git a/tests/tcg/aarch64/sme-fmopa-3.c b/tests/tcg/aarch64/sme-fmopa-3.c
new file mode 100644
index 00..6617355c9d
--- /dev/null
+++ b/tests/tcg/aarch64/sme-fmopa-3.c
@@ -0,0 +1,58 @@
+#include <stdio.h>
+#incl

Re: [PULL v2 0/1] ufs queue

2024-07-14 Thread Richard Henderson

On 7/14/24 01:24, Jeuk Kim wrote:

From: Jeuk Kim

The following changes since commit 37fbfda8f4145ba1700f63f0cb7be4c108d545de:

   Merge tag 'edgar/xen-queue-2024-07-12.for-upstream' 
of https://gitlab.com/edgar.iglesias/qemu into staging (2024-07-12 09:53:22
-0700)

are available in the Git repository at:

   https://gitlab.com/jeuk20.kim/qemu.git  tags/pull-ufs-20240714

for you to fetch changes up to 50475f1511964775ff73c2b07239c3ff571f75cd:

   hw/ufs: Fix mcq register range check logic (2024-07-14 17:11:21 +0900)


hw/ufs:
  - Fix invalid address access in mcq register check

I didn't cc qemu-stable@, as 5c079578d2e4 ("hw/ufs: Add support MCQ of
UFSHCI 4.0") is not yet included in any release tag. If I'm wrong,
please let me know. Thanks.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



Re: [PULL 00/13] target/i386 changes for 2024-07-12

2024-07-14 Thread Richard Henderson

On 7/14/24 04:10, Paolo Bonzini wrote:

The following changes since commit 23901b2b721c0576007ab7580da8aa855d6042a9:

   Merge tag 'pull-target-arm-20240711' of 
https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-11 
12:00:00 -0700)

are available in the Git repository at:

   https://gitlab.com/bonzini/qemu.git tags/for-upstream-i386

for you to fetch changes up to cdcadf9ee9efef96323e0b88fccff589f06fc0ee:

   i386/sev: Don't allow automatic fallback to legacy KVM_SEV*_INIT (2024-07-12 
15:35:54 +0200)


* target/i386/tcg: fixes for seg_helper.c
* SEV: Don't allow automatic fallback to legacy KVM_SEV_INIT,
   but also don't use it by default


Fails testing:

https://gitlab.com/qemu-project/qemu/-/jobs/7338361630

2024-07-14 23:45:07,744 __init__ L0153 DEBUG| EIP: 
alternative_instructions+0x2b/0xfa
2024-07-14 23:45:07,746 __init__ L0153 DEBUG| Code: 89 e5 83 ec 08 64 a1 c0 06 f4 
c3 89 45 fc 31 c0 b8 e4 f7 ef c3 c7 45 f8
00 00 00 00 e8 84 6f 7a ff 85 c0 74 02 0f 0b 8d 45 f8  90 90 90 90 83 7d f8 01 74 02 
0f 0b b8 e4 f7 ef c3 e8 04 6e 7a
2024-07-14 23:45:07,747 __init__ L0153 DEBUG| EAX: c3e0bf38 EBX:  ECX: 
 EDX: 00200292
2024-07-14 23:45:07,747 __init__ L0153 DEBUG| ESI: c3d54b3f EDI: c3d555e0 EBP: 
c3e0bf40 ESP: c3e0bf38
2024-07-14 23:45:07,748 __init__ L0153 DEBUG| DS: 007b ES: 007b FS: 00d8 GS:  
SS: 0068 EFLAGS: 00210246
2024-07-14 23:45:07,748 __init__ L0153 DEBUG| CR0: 80050033 CR2: c3e0bf34 CR3: 
03f4c000 CR4: 06d0

2024-07-14 23:45:07,748 __init__ L0153 DEBUG| Call Trace:
2024-07-14 23:45:07,750 __init__ L0153 DEBUG| check_bugs+0x900/0x91e
2024-07-14 23:45:07,750 __init__ L0153 DEBUG| ? 
__get_locked_pte+0x67/0xb0
2024-07-14 23:45:07,750 __init__ L0153 DEBUG| start_kernel+0x4d3/0x501
2024-07-14 23:45:07,750 __init__ L0153 DEBUG| ? set_intr_gate+0x42/0x55
2024-07-14 23:45:07,750 __init__ L0153 DEBUG| 
i386_start_kernel+0x43/0x45
2024-07-14 23:45:07,751 __init__ L0153 DEBUG| startup_32_smp+0x161/0x164
2024-07-14 23:45:07,751 __init__ L0153 DEBUG| Modules linked in:
2024-07-14 23:45:07,751 __init__ L0153 DEBUG| CR2: c3e0bf34
2024-07-14 23:45:07,752 __init__ L0153 DEBUG| ---[ end trace 
7adaac7a13f2a45f ]---
2024-07-14 23:45:07,752 __init__ L0153 DEBUG| EIP: 
alternative_instructions+0x2b/0xfa
2024-07-14 23:45:07,753 __init__ L0153 DEBUG| Code: 89 e5 83 ec 08 64 a1 c0 06 f4 
c3 89 45 fc 31 c0 b8 e4 f7 ef c3 c7 45 f8
00 00 00 00 e8 84 6f 7a ff 85 c0 74 02 0f 0b 8d 45 f8  90 90 90 90 83 7d f8 01 74 02 
0f 0b b8 e4 f7 ef c3 e8 04 6e 7a
2024-07-14 23:45:07,753 __init__ L0153 DEBUG| EAX: c3e0bf38 EBX:  ECX: 
 EDX: 00200292
2024-07-14 23:45:07,753 __init__ L0153 DEBUG| ESI: c3d54b3f EDI: c3d555e0 EBP: 
c3e0bf40 ESP: c3e0bf38
2024-07-14 23:45:07,754 __init__ L0153 DEBUG| DS: 007b ES: 007b FS: 00d8 GS:  
SS: 0068 EFLAGS: 00210246
2024-07-14 23:45:07,754 __init__ L0153 DEBUG| CR0: 80050033 CR2: c3e0bf34 CR3: 
03f4c000 CR4: 06d0
2024-07-14 23:45:07,754 __init__ L0153 DEBUG| Kernel panic - not syncing: 
Attempted to kill the idle task!




r~



Reminder: soft freeze on 23 July

2024-07-14 Thread Richard Henderson

https://wiki.qemu.org/Planning/9.1

Just a friendly reminder that soft freeze is coming up soon.


r~



Re: [PULL v1 0/3] Xen queue

2024-07-12 Thread Richard Henderson

On 7/12/24 04:02, Edgar E. Iglesias wrote:

From: "Edgar E. Iglesias"

The following changes since commit 23901b2b721c0576007ab7580da8aa855d6042a9:

   Merge tag 'pull-target-arm-20240711' 
of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-11
12:00:00 -0700)

are available in the Git repository at:

   https://gitlab.com/edgar.iglesias/qemu.git  
tags/edgar/xen-queue-2024-07-12.for-upstream

for you to fetch changes up to 872cb9cced796e75d4f719c31d70ed5fd629efca:

   xen: mapcache: Fix unmapping of first entries in buckets (2024-07-12 
00:17:36 +0200)


Edgars Xen queue.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



Re: [PULL v2 0/8] loongarch-to-apply queue

2024-07-12 Thread Richard Henderson

On 7/11/24 18:36, Song Gao wrote:

The following changes since commit 23901b2b721c0576007ab7580da8aa855d6042a9:

   Merge tag 'pull-target-arm-20240711' 
of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2024-07-11
12:00:00 -0700)

are available in the Git repository at:

   https://gitlab.com/gaosong/qemu.git  tags/pull-loongarch-20240712

for you to fetch changes up to 3ef4b21a5c767ff0b15047e709762abef490ad07:

   target/loongarch: Fix cpu_reset set wrong CSR_CRMD (2024-07-12 09:41:18 
+0800)


pull-loongarch-20240712

v2: drop patch 'hw/loongarch: Modify flash block size to 256K'.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



Re: [PULL 0/7] hw/nvme patches

2024-07-12 Thread Richard Henderson

On 7/11/24 11:04, Klaus Jensen wrote:

From: Klaus Jensen

Hi,

The following changes since commit 59084feb256c617063e0dbe7e64821ae8852d7cf:

   Merge tag 'pull-aspeed-20240709' of https://github.com/legoater/qemu into
staging (2024-07-09 07:13:55 -0700)

are available in the Git repository at:

   https://gitlab.com/birkelund/qemu.git  tags/nvme-next-pull-request

for you to fetch changes up to 15ef124c93a4d4ba6b98b55492e3a1b3297248b0:

   hw/nvme: Expand VI/VQ resource to uint32 (2024-07-11 17:05:37 +0200)


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



Re: [PATCH] accel/tcg: Make cpu_exec_interrupt hook mandatory

2024-07-12 Thread Richard Henderson

On 7/12/24 04:39, Peter Maydell wrote:

The TCGCPUOps::cpu_exec_interrupt hook is currently not mandatory; if
it is left NULL then we treat it as if it had returned false. However
since pretty much every architecture needs to handle interrupts,
almost every target we have provides the hook. The one exception is
Tricore, which doesn't currently implement the architectural
interrupt handling.

Add a "do nothing" implementation of cpu_exec_hook for Tricore,
assert on startup that the CPU does provide the hook, and remove
the runtime NULL check before calling it.

Signed-off-by: Peter Maydell
---
  accel/tcg/cpu-exec.c | 4 ++--
  target/tricore/cpu.c | 6 ++
  2 files changed, 8 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets

2024-07-11 Thread Richard Henderson

On 7/11/24 12:23, Alex Bennée wrote:

Richard Henderson  writes:


On 7/11/24 09:45, Richard Henderson wrote:

On 7/11/24 04:55, Thomas Huth wrote:

+    def fetch_asset(self, url, asset_hash):
+    cache_dir = os.path.expanduser("~/.cache/qemu/download")
+    if not os.path.exists(cache_dir):
+    os.makedirs(cache_dir)
+    fname = os.path.join(cache_dir,
+ hashlib.sha1(url.encode("utf-8")).hexdigest())
+    if os.path.exists(fname) and self.check_hash(fname, asset_hash):
+    return fname
+    logging.debug("Downloading %s to %s...", url, fname)
+    subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
+    os.rename(fname + ".download", fname)
+    return fname

Download failure via exception?
Check hash on downloaded asset?


I would prefer to see assets, particularly downloading, handled in a
separate pass from tests.


And I assume cachable?


The cache is already handled here.  But downloading after cache miss is non-optional, may 
not fail, and is accounted against the meson test timeout.



r~



Re: [PULL 00/24] target-arm queue

2024-07-11 Thread Richard Henderson

On 7/11/24 06:17, Peter Maydell wrote:

The following changes since commit 59084feb256c617063e0dbe7e64821ae8852d7cf:

   Merge tag 'pull-aspeed-20240709' of https://github.com/legoater/qemu into
staging (2024-07-09 07:13:55 -0700)

are available in the Git repository at:

   https://git.linaro.org/people/pmaydell/qemu-arm.git  
tags/pull-target-arm-20240711

for you to fetch changes up to 7f49089158a4db644fcbadfa90cd3d30a4868735:

   target/arm: Convert PMULL to decodetree (2024-07-11 11:41:34 +0100)


target-arm queue:
  * Refactor FPCR/FPSR handling in preparation for FEAT_AFP
  * More decodetree conversions
  * target/arm: Use cpu_env in cpu_untagged_addr
  * target/arm: Set arm_v7m_tcg_ops cpu_exec_halt to arm_cpu_exec_halt()
  * hw/char/pl011: Avoid division-by-zero in pl011_get_baudrate()
  * hw/misc/bcm2835_thermal: Fix access size handling in bcm2835_thermal_ops
  * accel/tcg: Make TCGCPUOps::cpu_exec_halt mandatory
  * STM32L4x5: Handle USART interrupts correctly


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



Re: [PULL 0/2] Block patches

2024-07-11 Thread Richard Henderson

On 7/11/24 02:17, Stefan Hajnoczi wrote:

The following changes since commit 59084feb256c617063e0dbe7e64821ae8852d7cf:

   Merge tag 'pull-aspeed-20240709' of https://github.com/legoater/qemu into
staging (2024-07-09 07:13:55 -0700)

are available in the Git repository at:

   https://gitlab.com/stefanha/qemu.git  tags/block-pull-request

for you to fetch changes up to d05ae948cc887054495977855b0859d0d4ab2613:

   Consider discard option when writing zeros (2024-07-11 11:06:36 +0200)


Pull request

A discard fix from Nir Soffer.



Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~



Re: [PULL 0/1] Host Memory Backends and Memory devices queue 2024-07-10

2024-07-11 Thread Richard Henderson

On 7/10/24 11:00, David Hildenbrand wrote:

The following changes since commit 59084feb256c617063e0dbe7e64821ae8852d7cf:

   Merge tag 'pull-aspeed-20240709' of https://github.com/legoater/qemu into
staging (2024-07-09 07:13:55 -0700)

are available in the Git repository at:

   https://github.com/davidhildenbrand/qemu.git  tags/mem-2024-07-10

for you to fetch changes up to 4d13ae45ff93fa825ceb39dfd16b305f4baccd18:

   virtio-mem: improve error message when unplug of device fails due to plugged 
memory (2024-07-10 18:06:24 +0200)


Hi,

"Host Memory Backends" and "Memory devices" queue ("mem"):
- Only one error message improvement that causes less confusion when
   triggered from libvirt


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.

r~




Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets

2024-07-11 Thread Richard Henderson

On 7/11/24 09:45, Richard Henderson wrote:

On 7/11/24 04:55, Thomas Huth wrote:

+    def fetch_asset(self, url, asset_hash):
+    cache_dir = os.path.expanduser("~/.cache/qemu/download")
+    if not os.path.exists(cache_dir):
+    os.makedirs(cache_dir)
+    fname = os.path.join(cache_dir,
+ hashlib.sha1(url.encode("utf-8")).hexdigest())
+    if os.path.exists(fname) and self.check_hash(fname, asset_hash):
+    return fname
+    logging.debug("Downloading %s to %s...", url, fname)
+    subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
+    os.rename(fname + ".download", fname)
+    return fname


Download failure via exception?
Check hash on downloaded asset?


I would prefer to see assets, particularly downloading, handled in a separate 
pass from tests.

(1) Asset download should not count against test timeout.
(2) Running tests while disconnected should skip unavailable assets.

Avocado kinda does this, but still generates errors instead of skips.


r~




Re: [PATCH v2] osdep: add a qemu_close_all_open_fd() helper

2024-07-11 Thread Richard Henderson

On 6/18/24 04:17, Clément Léger wrote:

Since commit 03e471c41d8b ("qemu_init: increase NOFILE soft limit on
POSIX"), the maximum number of file descriptors that can be opened is
raised to nofile.rlim_max. On a recent Debian distro, this yields a
maximum of 1073741816 file descriptors. Now, when forking to start
qemu-bridge-helper, this actually calls close() on the full possible file
descriptor range (more precisely [3 - sysconf(_SC_OPEN_MAX)]), which
takes a considerable amount of time. In order to reduce that time,
factorize the existing code to close all open file descriptors into a new
qemu_close_all_open_fd() function. This function uses various methods
to close all the open file descriptors, ranging from the most efficient
one to the least efficient one. It also accepts an ordered array of file
descriptors that should not be closed, since this is required by the
callers that call it after forking.

Signed-off-by: Clément Léger 



v2:
  - Factorize async_teardown.c close_fds implementation as well as tap.c ones
  - Apply checkpatch
  - v1: 
https://lore.kernel.org/qemu-devel/20240617162520.4045016-1-cle...@rivosinc.com/

---
  include/qemu/osdep.h|   8 +++
  net/tap.c   |  31 ++-
  system/async-teardown.c |  37 +
  util/osdep.c| 115 
  4 files changed, 141 insertions(+), 50 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index f61edcfdc2..9369a97d3d 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -755,6 +755,14 @@ static inline void qemu_reset_optind(void)
  
  int qemu_fdatasync(int fd);
  
+/**

+ * Close all open file descriptors except the ones supplied in the @skip array
+ *
+ * @skip: ordered array of distinct file descriptors that should not be closed
+ * @nskip: number of entries in the @skip array.
+ */
+void qemu_close_all_open_fd(const int *skip, unsigned int nskip);
+
  /**
   * Sync changes made to the memory mapped file back to the backing
   * storage. For POSIX compliant systems this will fallback
diff --git a/net/tap.c b/net/tap.c
index 51f7aec39d..6fc3939078 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -385,6 +385,21 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
  return s;
  }
  
+static void close_all_fds_after_fork(int excluded_fd)

+{
+const int skip_fd[] = {0, 1, 2, 3, excluded_fd};


3 should not be included here...


+unsigned int nskip = ARRAY_SIZE(skip_fd);
+
+/*
+ * skip_fd must be an ordered array of distinct fds, exclude
+ * excluded_fd if already included in the [0 - 3] range
+ */
+if (excluded_fd <= 3) {


or here -- stdin is 0, stdout is 1, stderr is 2.

Perhaps we need reminding of this and use the STD*_FILENO names instead of raw integer 
constants.




@@ -400,13 +415,7 @@ static void launch_script(const char *setup_script, const 
char *ifname,
  return;
  }
  if (pid == 0) {
-int open_max = sysconf(_SC_OPEN_MAX), i;
-
-for (i = 3; i < open_max; i++) {
-if (i != fd) {
-close(i);
-}
-}


Note that the original *does* close 3.


+#ifdef CONFIG_LINUX
+static bool qemu_close_all_open_fd_proc(const int *skip, unsigned int nskip)
+{
+struct dirent *de;
+int fd, dfd;
+bool close_fd;
+DIR *dir;
+int i;
+
+dir = opendir("/proc/self/fd");
+if (!dir) {
+/* If /proc is not mounted, there is nothing that can be done. */
+return false;
+}
+/* Avoid closing the directory. */
+dfd = dirfd(dir);
+
+for (de = readdir(dir); de; de = readdir(dir)) {
+fd = atoi(de->d_name);
+close_fd = true;
+if (fd == dfd) {
+close_fd = false;
+} else {
+for (i = 0; i < nskip; i++) {


The skip list is sorted, so you should remember where the last search ended and begin
from there, and you should stop searching once skip[i] exceeds fd.




+#else
+static bool qemu_close_all_open_fd_proc(const int *skip, unsigned int nskip)
+{
+return false;
+}
+#endif


I'm not fond of duplicating the function declaration.
I think it's better to move the ifdef inside:

static bool foo(...)
{
#ifdef XYZ
  impl
#else
  stub
#endif
}


+
+#ifdef CONFIG_CLOSE_RANGE
+static bool qemu_close_all_open_fd_close_range(const int *skip,
+   unsigned int nskip)
+{
+int max_fd = sysconf(_SC_OPEN_MAX) - 1;
+int first = 0, last = max_fd;
+int cur_skip = 0, ret;
+
+do {
+if (nskip) {
+while (first == skip[cur_skip]) {
+cur_skip++;
+first++;
+}


This fails to check cur_skip < nskip in the loop.
Mixing signed cur_skip with unsigned nskip is bad.

There seems to be no good reason for the separate "if (nskip)" check.
A proper check for cur_skip < nskip will work just fine with nskip == 0.


+/* Fallback */
+for (i = 0; i < open_max; i++) {

Re: [RFC PATCH 5/8] tests_pytest: Implement fetch_asset() method for downloading assets

2024-07-11 Thread Richard Henderson

On 7/11/24 04:55, Thomas Huth wrote:

+def fetch_asset(self, url, asset_hash):
+cache_dir = os.path.expanduser("~/.cache/qemu/download")
+if not os.path.exists(cache_dir):
+os.makedirs(cache_dir)
+fname = os.path.join(cache_dir,
+ hashlib.sha1(url.encode("utf-8")).hexdigest())
+if os.path.exists(fname) and self.check_hash(fname, asset_hash):
+return fname
+logging.debug("Downloading %s to %s...", url, fname)
+subprocess.check_call(["wget", "-c", url, "-O", fname + ".download"])
+os.rename(fname + ".download", fname)
+return fname


Download failure via exception?
Check hash on downloaded asset?


r~



Re: [PATCH 09/10] target/i386/tcg: use X86Access for TSS access

2024-07-11 Thread Richard Henderson

On 7/10/24 23:28, Paolo Bonzini wrote:

On 7/10/24 20:40, Paolo Bonzini wrote:



Il mer 10 lug 2024, 18:47 Richard Henderson <mailto:richard.hender...@linaro.org>> ha scritto:


    On 7/9/24 23:29, Paolo Bonzini wrote:
 > This takes care of probing the vaddr range in advance, and is
    also faster
 > because it avoids repeated TLB lookups.  It also matches the
    Intel manual
 > better, as it says "Checks that the current (old) TSS, new TSS,
    and all
 > segment descriptors used in the task switch are paged into system
    memory";
 > note however that it's not clear how the processor checks for segment
 > descriptors, and this check is not included in the AMD manual.
 >
 > Signed-off-by: Paolo Bonzini mailto:pbonz...@redhat.com>>
 > ---
 >   target/i386/tcg/seg_helper.c | 101
    ++-
 >   1 file changed, 51 insertions(+), 50 deletions(-)
 >
 > diff --git a/target/i386/tcg/seg_helper.c
    b/target/i386/tcg/seg_helper.c
 > index 25af9d4a4ec..77f2c65c3cf 100644
 > --- a/target/i386/tcg/seg_helper.c
 > +++ b/target/i386/tcg/seg_helper.c
 > @@ -311,35 +313,44 @@ static int switch_tss_ra(CPUX86State *env,
    int tss_selector,
 >           raise_exception_err_ra(env, EXCP0A_TSS, tss_selector &
    0xfffc, retaddr);
 >       }
 >
 > +    /* X86Access avoids memory exceptions during the task switch */
 > +    access_prepare_mmu(&old, env, env->tr.base, old_tss_limit_max,
 > +                    MMU_DATA_STORE, cpu_mmu_index_kernel(env),
    retaddr);
 > +
 > +    if (source == SWITCH_TSS_CALL) {
 > +        /* Probe for future write of parent task */
 > +        probe_access(env, tss_base, 2, MMU_DATA_STORE,
 > +                  cpu_mmu_index_kernel(env), retaddr);
 > +    }
 > +    access_prepare_mmu(&new, env, tss_base, tss_limit,
 > +                    MMU_DATA_LOAD, cpu_mmu_index_kernel(env),
    retaddr);

    You're computing cpu_mmu_index_kernel 3 times.


Squashing this in (easier to review than the whole thing):


Excellent, thanks!


r~



diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index 4123ff1245e..4edfd26135f 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -321,7 +321,7 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector,
  uint32_t new_eflags, new_eip, new_cr3, new_ldt, new_trap;
  uint32_t old_eflags, eflags_mask;
  SegmentCache *dt;
-    int index;
+    int mmu_index, index;
  target_ulong ptr;
  X86Access old, new;

@@ -378,16 +378,17 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector,
  }

  /* X86Access avoids memory exceptions during the task switch */
+    mmu_index = cpu_mmu_index_kernel(env);
   access_prepare_mmu(&old, env, env->tr.base, old_tss_limit_max,
-   MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr);
+   MMU_DATA_STORE, mmu_index, retaddr);

  if (source == SWITCH_TSS_CALL) {
  /* Probe for future write of parent task */
  probe_access(env, tss_base, 2, MMU_DATA_STORE,
- cpu_mmu_index_kernel(env), retaddr);
+ mmu_index, retaddr);
  }
   access_prepare_mmu(&new, env, tss_base, tss_limit,
-   MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr);
+   MMU_DATA_LOAD, mmu_index, retaddr);

  /* read all the registers from the new TSS */
  if (type & 8) {
@@ -468,7 +469,11 @@ static void switch_tss_ra(CPUX86State *env, int tss_selector,
     context */

  if (source == SWITCH_TSS_CALL) {
-    cpu_stw_kernel_ra(env, tss_base, env->tr.selector, retaddr);
+    /*
+ * Thanks to the probe_access above, we know the first two
+ * bytes addressed by &new are writable too.
+ */
+    access_stw(&new, tss_base, env->tr.selector);
  new_eflags |= NT_MASK;
  }

Paolo






Re: Disassembler location

2024-07-10 Thread Richard Henderson

On 7/10/24 14:55, Paolo Bonzini wrote:
The others are not hosts, only targets.  By putting the file in target/<arch>/, they do 
not need to add it to the "disassemblers" variable in meson.build---but they add it 
anyway. :)


We should clean that up.  :-)

r~



Re: Disassembler location

2024-07-10 Thread Richard Henderson

On 7/10/24 11:02, Michael Morrell wrote:

I'm working on a port to a new architecture and was noticing a discrepancy in where the disassembler code lives.  There is a 
file "target/<arch>/disas.c" for 4 architectures (avr, loongarch, openrisc, and rx), but a file 
"disas/<arch>.c" for 14 architectures (if I counted right).  It seems the 4 architectures using 
"target/<arch>/disas.c" are more recently added, so I was wondering if that is now the preferred location.  I 
couldn't find information on this, and I wasn't sure where to look.

Any advice?


The older disas/arch.c files come from binutils, prior to the GPLv3 license change.  These 
are generally very old architectures, or not up to date.


The newer target/arch/disas.c are for architectures for which the translator and the 
disassembler share generated code via decodetree.  If you're implementing a new 
architecture from scratch, this is your best choice.


The "best" supported are those with support in system libcapstone.  :-)


r~



Re: [PATCH 10/10] target/i386/tcg: save current task state before loading new one

2024-07-10 Thread Richard Henderson

On 7/9/24 23:29, Paolo Bonzini wrote:

This is how the steps are ordered in the manual.  EFLAGS.NT is
overwritten after the fact in the saved image.

Signed-off-by: Paolo Bonzini
---
  target/i386/tcg/seg_helper.c | 85 +++-
  1 file changed, 45 insertions(+), 40 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 09/10] target/i386/tcg: use X86Access for TSS access

2024-07-10 Thread Richard Henderson

On 7/9/24 23:29, Paolo Bonzini wrote:

This takes care of probing the vaddr range in advance, and is also faster
because it avoids repeated TLB lookups.  It also matches the Intel manual
better, as it says "Checks that the current (old) TSS, new TSS, and all
segment descriptors used in the task switch are paged into system memory";
note however that it's not clear how the processor checks for segment
descriptors, and this check is not included in the AMD manual.

Signed-off-by: Paolo Bonzini 
---
  target/i386/tcg/seg_helper.c | 101 ++-
  1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index 25af9d4a4ec..77f2c65c3cf 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -27,6 +27,7 @@
  #include "exec/log.h"
  #include "helper-tcg.h"
  #include "seg_helper.h"
+#include "access.h"
  
  int get_pg_mode(CPUX86State *env)

  {
@@ -250,7 +251,7 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector,
   uint32_t e1, uint32_t e2, int source,
   uint32_t next_eip, uintptr_t retaddr)
  {
-int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, v1, v2, i;
+int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, i;
  target_ulong tss_base;
  uint32_t new_regs[8], new_segs[6];
  uint32_t new_eflags, new_eip, new_cr3, new_ldt, new_trap;
@@ -258,6 +259,7 @@ static int switch_tss_ra(CPUX86State *env, int tss_selector,
  SegmentCache *dt;
  int index;
  target_ulong ptr;
+X86Access old, new;
  
  type = (e2 >> DESC_TYPE_SHIFT) & 0xf;

  LOG_PCALL("switch_tss: sel=0x%04x type=%d src=%d\n", tss_selector, type,
@@ -311,35 +313,44 @@ static int switch_tss_ra(CPUX86State *env, int 
tss_selector,
  raise_exception_err_ra(env, EXCP0A_TSS, tss_selector & 0xfffc, 
retaddr);
  }
  
+/* X86Access avoids memory exceptions during the task switch */

+access_prepare_mmu(&old, env, env->tr.base, old_tss_limit_max,
+  MMU_DATA_STORE, cpu_mmu_index_kernel(env), retaddr);
+
+if (source == SWITCH_TSS_CALL) {
+/* Probe for future write of parent task */
+probe_access(env, tss_base, 2, MMU_DATA_STORE,
+cpu_mmu_index_kernel(env), retaddr);
+}
+access_prepare_mmu(&new, env, tss_base, tss_limit,
+  MMU_DATA_LOAD, cpu_mmu_index_kernel(env), retaddr);


You're computing cpu_mmu_index_kernel 3 times.

This appears to be conservative in that you're requiring only 2 bytes (a minimum) of 0x68 
to be writable.  Is it legal to place the TSS at offset 0xffe of page 0, with the balance 
on page 1, with page 0 writable and page 1 read-only?  Otherwise I would think you could 
just check the entire TSS for writability.


Anyway, after the MMU_DATA_STORE probe, you have proved that 'X86Access new' contains an 
address range that may be stored.  So you can change the SWITCH_TSS_CALL store below to 
access_stw() too.



@@ -349,16 +360,6 @@ static int switch_tss_ra(CPUX86State *env, int 
tss_selector,
   chapters 12.2.5 and 13.2.4 on how to implement TSS Trap bit */
  (void)new_trap;
  
-/* NOTE: we must avoid memory exceptions during the task switch,

-   so we make dummy accesses before */
-/* XXX: it can still fail in some cases, so a bigger hack is
-   necessary to valid the TLB after having done the accesses */
-
-v1 = cpu_ldub_kernel_ra(env, env->tr.base, retaddr);
-v2 = cpu_ldub_kernel_ra(env, env->tr.base + old_tss_limit_max, retaddr);
-cpu_stb_kernel_ra(env, env->tr.base, v1, retaddr);
-cpu_stb_kernel_ra(env, env->tr.base + old_tss_limit_max, v2, retaddr);


OMG.

Looks like a fantastic cleanup overall.


r~



Re: [PATCH 08/10] target/i386/tcg: check for correct busy state before switching to a new task

2024-07-10 Thread Richard Henderson

On 7/9/24 23:29, Paolo Bonzini wrote:

This step is listed in the Intel manual: "Checks that the new task is available
(call, jump, exception, or interrupt) or busy (IRET return)".

The AMD manual lists the same operation under the "Preventing recursion"
paragraph of "12.3.4 Nesting Tasks", though it is not clear if the processor
checks the busy bit in the IRET case.

Signed-off-by: Paolo Bonzini
---
  target/i386/tcg/seg_helper.c | 5 +
  1 file changed, 5 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 07/10] target/i386/tcg: Use DPL-level accesses for interrupts and call gates

2024-07-10 Thread Richard Henderson

On 7/9/24 23:29, Paolo Bonzini wrote:

This fixes a bug wherein i386/tcg assumed an interrupt return using
the CALL or JMP instructions were always going from kernel or user mode to
kernel mode, when using a call gate. This assumption is violated if
the call gate has a DPL that is greater than 0.

In addition, the stack accesses should count as explicit, not implicit
("kernel" in QEMU code), so that SMAP is not applied if DPL=3.

Analyzed-by: Robert R. Henry
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
Signed-off-by: Paolo Bonzini
---
  target/i386/tcg/seg_helper.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH 06/10] target/i386/tcg: Compute MMU index once

2024-07-10 Thread Richard Henderson

On 7/9/24 23:29, Paolo Bonzini wrote:

Add the MMU index to the StackAccess struct, so that it can be cached
or (in the next patch) computed from information that is not in
CPUX86State.

Co-developed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 


Reviewed-by: Richard Henderson 


r~



Re: [PATCH 03/10] target/i386/tcg: use PUSHL/PUSHW for error code

2024-07-10 Thread Richard Henderson

On 7/9/24 23:29, Paolo Bonzini wrote:

Do not pre-decrement esp, let the macros subtract the appropriate
operand size.

Signed-off-by: Paolo Bonzini
---
  target/i386/tcg/seg_helper.c | 16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 02/10] target/i386/tcg: Allow IRET from user mode to user mode with SMAP

2024-07-10 Thread Richard Henderson

On 7/9/24 23:29, Paolo Bonzini wrote:

This fixes a bug wherein i386/tcg assumed an interrupt return using
the IRET instruction was always returning from kernel mode to either
kernel mode or user mode. This assumption is violated when IRET is used
as a clever way to restore thread state, as for example in the dotnet
runtime. There, IRET returns from user mode to user mode.

The fix is to treat stack accesses from IRET and RETF, as well as accesses
to the parameters in a call gate, as normal data accesses using the
current CPL.  The bug manifested itself as a page fault in the guest Linux
kernel due to SMAP preventing the access.

This bug appears to have been in QEMU since the beginning.

Analyzed-by: Robert R. Henry
Co-developed-by: Robert R. Henry
Signed-off-by: Robert R. Henry
Signed-off-by: Paolo Bonzini
---
  target/i386/tcg/seg_helper.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH] target/i386/tcg: fix POP to memory in long mode

2024-07-10 Thread Richard Henderson

On 7/10/24 07:13, Paolo Bonzini wrote:

In long mode, POP to memory will write a full 64-bit value.  However,
the call to gen_writeback() in gen_POP will use MO_32 because the
decoding table is incorrect.

The bug was latent until commit aea49fbb01a ("target/i386: use gen_writeback()
within gen_POP()", 2024-06-08), and then became visible because gen_op_st_v
now receives op->ot instead of the "ot" returned by gen_pop_T0.

Analyzed-by: Clément Chigot
Fixes: 5e9e21bcc4d ("target/i386: move 60-BF opcodes to new decoder", 
2024-05-07)
Tested-by: Clément Chigot
Signed-off-by: Paolo Bonzini
---
  target/i386/tcg/decode-new.c.inc | 2 +-
  target/i386/tcg/emit.c.inc   | 2 ++
  2 files changed, 3 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only

2024-07-10 Thread Richard Henderson

On 7/10/24 05:25, BALATON Zoltan wrote:

On Tue, 9 Jul 2024, Richard Henderson wrote:

Mark the reserve_addr check unlikely.  Use tlb_vaddr_to_host
instead of probe_write, relying on the memset itself to test
for page writability.  Use set/clear_helper_retaddr so that
we can properly unwind on segfault.

With this, a trivial loop around guest memset will spend
nearly 50% of runtime within helper_dcbz and host memset.

Signed-off-by: Richard Henderson 
---
target/ppc/mem_helper.c | 14 ++
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 24bae3b80c..fa4c4f9fa9 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -280,20 +280,26 @@ static void dcbz_common(CPUPPCState *env, target_ulong 
addr,
    addr &= mask;

    /* Check reservation */
-    if ((env->reserve_addr & mask) == addr)  {
+    if (unlikely((env->reserve_addr & mask) == addr))  {
    env->reserve_addr = (target_ulong)-1ULL;
    }

    /* Try fast path translate */
+#ifdef CONFIG_USER_ONLY
+    haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, mmu_idx);
+#else
    haddr = probe_write(env, addr, dcbz_size, mmu_idx, retaddr);
-    if (haddr) {
-    memset(haddr, 0, dcbz_size);
-    } else {
+    if (unlikely(!haddr)) {
    /* Slow path */
    for (int i = 0; i < dcbz_size; i += 8) {
    cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
    }


Is a return needed here to only get to memset below when haddr != NULL?


Oops, yes.


r~


