Re: [PATCH 3/3] docs: update hw/nvme documentation for protection information

2023-08-07 Thread Klaus Jensen
On Aug  8 02:57, Ankit Kumar wrote:
> Add missing entry for pif ("protection information format").
> Protection information size can be 8 or 16 bytes; update the pil entry
> as per the NVM command set specification.
> 
> Signed-off-by: Ankit Kumar 
> ---
>  docs/system/devices/nvme.rst | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
> index 2a3af268f7..30d46d9338 100644
> --- a/docs/system/devices/nvme.rst
> +++ b/docs/system/devices/nvme.rst
> @@ -271,9 +271,13 @@ The virtual namespace device supports DIF- and DIX-based 
> protection information
>  
>  ``pil=UINT8`` (default: ``0``)
>Controls the location of the protection information within the metadata. 
> Set
> -  to ``1`` to transfer protection information as the first eight bytes of
> -  metadata. Otherwise, the protection information is transferred as the last
> -  eight bytes.
> +  to ``1`` to transfer protection information as the first bytes of metadata.
> +  Otherwise, the protection information is transferred as the last bytes of
> +  metadata.
> +
> +``pif=UINT8`` (default: ``0``)
> +  By default, the namespace device uses 16 bit guard protection information
> +  format. Set to ``2`` to enable 64 bit guard protection information format.
>  

I'll add a small note that pif=1 (32b guard) is not supported.
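
For anyone wanting to try it, an illustrative invocation could look like
this (a sketch only: parameter names as documented above, metadata size
chosen to fit the 16-byte 64b-guard tuple, drive setup and other options
elided):

  -drive id=ns1,file=ns1.img,format=raw,if=none
  -device nvme-ns,drive=ns1,ms=16,pi=1,pil=0,pif=2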

Thanks,

Reviewed-by: Klaus Jensen 




Re: [PATCH 1/3] hw/nvme: fix CRC64 for guard tag

2023-08-07 Thread Klaus Jensen
On Aug  8 02:57, Ankit Kumar wrote:
> The nvme CRC64 generator expects the caller to pass an inverted seed
> value. Pass the inverted crc value for the metadata buffer.
> 
> Signed-off-by: Ankit Kumar 
> ---
>  hw/nvme/dif.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
> index 63c44c86ab..01b19c3373 100644
> --- a/hw/nvme/dif.c
> +++ b/hw/nvme/dif.c
> @@ -115,7 +115,7 @@ static void 
> nvme_dif_pract_generate_dif_crc64(NvmeNamespace *ns, uint8_t *buf,
>  uint64_t crc = crc64_nvme(~0ULL, buf, ns->lbasz);
>  
>  if (pil) {
> -crc = crc64_nvme(crc, mbuf, pil);
> +crc = crc64_nvme(~crc, mbuf, pil);
>  }
>  
>  dif->g64.guard = cpu_to_be64(crc);
> @@ -246,7 +246,7 @@ static uint16_t nvme_dif_prchk_crc64(NvmeNamespace *ns, 
> NvmeDifTuple *dif,
>  uint64_t crc = crc64_nvme(~0ULL, buf, ns->lbasz);
>  
>  if (pil) {
> -crc = crc64_nvme(crc, mbuf, pil);
> +crc = crc64_nvme(~crc, mbuf, pil);
>  }
>  
>  trace_pci_nvme_dif_prchk_guard_crc64(be64_to_cpu(dif->g64.guard), 
> crc);
> -- 
> 2.25.1
> 

Good catch, thanks!
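
For anyone following along: per the commit message, crc64_nvme() takes an
inverted seed and returns a finalized (re-inverted) value, so continuing
the CRC over a second buffer means handing the inverted intermediate
result back in. A minimal sketch of the convention as used in the hunks
above:

    uint64_t crc = crc64_nvme(~0ULL, buf, ns->lbasz);   /* data buffer */
    if (pil) {
        /* re-invert the finalized value to continue over the metadata */
        crc = crc64_nvme(~crc, mbuf, pil);
    }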

Reviewed-by: Klaus Jensen 




Re: [PATCH 2/3] hw/nvme: fix disable pi checks for Type 3 protection

2023-08-07 Thread Klaus Jensen
On Aug  8 02:57, Ankit Kumar wrote:
> As per the NVM command set specification, the protection information
> checks for Type 3 protection are disabled only when both the application
> and reference tags have all bits set to 1.
> 
> Signed-off-by: Ankit Kumar 
> ---
>  hw/nvme/dif.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
> index 01b19c3373..f9bd29a2a6 100644
> --- a/hw/nvme/dif.c
> +++ b/hw/nvme/dif.c
> @@ -157,7 +157,8 @@ static uint16_t nvme_dif_prchk_crc16(NvmeNamespace *ns, 
> NvmeDifTuple *dif,
>  {
>  switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
>  case NVME_ID_NS_DPS_TYPE_3:
> -if (be32_to_cpu(dif->g16.reftag) != 0xffffffff) {
> +if ((be32_to_cpu(dif->g16.reftag) != 0xffffffff) ||
> +(be16_to_cpu(dif->g16.apptag) != 0xffff)) {
>  break;
>  }

For type 3, if reftag is 0xffffffff the NVME_ID_NS_DPS_TYPE_3 case will
fall through to the next cases (_TYPE_1 and _TYPE_2), checking if apptag
is 0xffff, and disables checking if so.

>  
> @@ -225,7 +226,7 @@ static uint16_t nvme_dif_prchk_crc64(NvmeNamespace *ns, 
> NvmeDifTuple *dif,
>  
>  switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
>  case NVME_ID_NS_DPS_TYPE_3:
> -if (r != 0xffffffffffff) {
> +if (r != 0xffffffffffff || (be16_to_cpu(dif->g64.apptag) != 0xffff)) {
>  break;
>  }

Same here.
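
To illustrate the resulting control flow for the 16b-guard format, a
simplified sketch (condensed from the code above; trace calls and the
remaining guard/reftag checks are elided):

    switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
    case NVME_ID_NS_DPS_TYPE_3:
        if (be32_to_cpu(dif->g16.reftag) != 0xffffffff) {
            break;           /* Type 3: reftag not all-ones; keep checking */
        }
        /* fall through */
    case NVME_ID_NS_DPS_TYPE_1:
    case NVME_ID_NS_DPS_TYPE_2:
        if (be16_to_cpu(dif->g16.apptag) != 0xffff) {
            break;           /* apptag not all-ones; keep checking */
        }
        return NVME_SUCCESS; /* all tags all-ones: checks disabled */
    }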




[PATCH] target/loongarch: Split fcc register to fcc0-7 in gdbstub

2023-08-07 Thread Jiajie Chen
Since GDB 13.1 (GDB commit ea3352172), GDB LoongArch changed to use
fcc0-7 instead of a single fcc register. This commit partially reverts
commit 2f149c759 (`target/loongarch: Update gdb_set_fpu() and
gdb_get_fpu()`) to match the behavior of GDB.

Note that it is a breaking change for GDB 13.0 or earlier, but it is
also required for GDB 13.1 or later to work.

Signed-off-by: Jiajie Chen 
---
 gdb-xml/loongarch-fpu.xml  |  9 -
 target/loongarch/gdbstub.c | 16 +++-
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gdb-xml/loongarch-fpu.xml b/gdb-xml/loongarch-fpu.xml
index 78e42cf5dd..e81e3382e7 100644
--- a/gdb-xml/loongarch-fpu.xml
+++ b/gdb-xml/loongarch-fpu.xml
@@ -45,6 +45,13 @@
   <reg name="f29" bitsize="64" type="fputype" group="float"/>
   <reg name="f30" bitsize="64" type="fputype" group="float"/>
   <reg name="f31" bitsize="64" type="fputype" group="float"/>
-  <reg name="fcc" bitsize="64" type="uint64" group="float"/>
+  <reg name="fcc0" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc1" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc2" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc3" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc4" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc5" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc6" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc7" bitsize="8" type="uint8" group="float"/>
   <reg name="fcsr" bitsize="32" type="uint32" group="float"/>
 </feature>
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..15ad6778f1 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -70,10 +70,9 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
 {
 if (0 <= n && n < 32) {
 return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
-} else if (n == 32) {
-uint64_t val = read_fcc(env);
-return gdb_get_reg64(mem_buf, val);
-} else if (n == 33) {
+} else if (32 <= n && n < 40) {
+return gdb_get_reg8(mem_buf, env->cf[n - 32]);
+} else if (n == 40) {
 return gdb_get_reg32(mem_buf, env->fcsr0);
 }
 return 0;
@@ -87,11 +86,10 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
 if (0 <= n && n < 32) {
 env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
 length = 8;
-} else if (n == 32) {
-uint64_t val = ldq_p(mem_buf);
-write_fcc(env, val);
-length = 8;
-} else if (n == 33) {
+} else if (32 <= n && n < 40) {
+env->cf[n - 32] = ldub_p(mem_buf);
+length = 1;
+} else if (n == 40) {
 env->fcsr0 = ldl_p(mem_buf);
 length = 4;
 }
-- 
2.41.0




[PATCH v2 15/19] spapr: Fix machine reset deadlock from replay-record

2023-08-07 Thread Nicholas Piggin
When the machine is reset to load a new snapshot while being debugged
with replay-record, it is done from another thread, so the CPU does
not run the register setting operations. Set CPU registers directly in
machine reset.

Cc: Pavel Dovgalyuk 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr.c | 20 ++--
 include/hw/ppc/spapr.h |  1 +
 target/ppc/compat.c| 19 +++
 target/ppc/cpu.h   |  1 +
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1c8b8d57a7..7d84244f03 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1322,6 +1322,22 @@ void spapr_set_all_lpcrs(target_ulong value, 
target_ulong mask)
 }
 }
 
+/* May be used when the machine is not running */
+void spapr_init_all_lpcrs(target_ulong value, target_ulong mask)
+{
+CPUState *cs;
+CPU_FOREACH(cs) {
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+CPUPPCState *env = &cpu->env;
+target_ulong lpcr;
+
+lpcr = env->spr[SPR_LPCR];
+lpcr &= ~(LPCR_HR | LPCR_UPRT);
+ppc_store_lpcr(cpu, lpcr);
+}
+}
+
+
 static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
target_ulong lpid, ppc_v3_pate_t *entry)
 {
@@ -1583,7 +1599,7 @@ int spapr_reallocate_hpt(SpaprMachineState *spapr, int 
shift, Error **errp)
 }
 /* We're setting up a hash table, so that means we're not radix */
 spapr->patb_entry = 0;
-spapr_set_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
+spapr_init_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
 return 0;
 }
 
@@ -1661,7 +1677,7 @@ static void spapr_machine_reset(MachineState *machine, 
ShutdownCause reason)
 spapr_ovec_cleanup(spapr->ov5_cas);
 spapr->ov5_cas = spapr_ovec_new();
 
-ppc_set_compat_all(spapr->max_compat_pvr, &error_fatal);
+ppc_init_compat_all(spapr->max_compat_pvr, &error_fatal);
 
 /*
  * This is fixing some of the default configuration of the XIVE
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 538b2dfb89..f47e8419a5 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -1012,6 +1012,7 @@ bool spapr_check_pagesize(SpaprMachineState *spapr, 
hwaddr pagesize,
 #define SPAPR_OV5_XIVE_BOTH 0x80 /* Only to advertise on the platform */
 
 void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
+void spapr_init_all_lpcrs(target_ulong value, target_ulong mask);
 hwaddr spapr_get_rtas_addr(void);
 bool spapr_memory_hot_unplug_supported(SpaprMachineState *spapr);
 
diff --git a/target/ppc/compat.c b/target/ppc/compat.c
index 7949a24f5a..ebef2cccec 100644
--- a/target/ppc/compat.c
+++ b/target/ppc/compat.c
@@ -229,6 +229,25 @@ int ppc_set_compat_all(uint32_t compat_pvr, Error **errp)
 return 0;
 }
 
+/* To be used when the machine is not running */
+int ppc_init_compat_all(uint32_t compat_pvr, Error **errp)
+{
+CPUState *cs;
+
+CPU_FOREACH(cs) {
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+int ret;
+
+ret = ppc_set_compat(cpu, compat_pvr, errp);
+
+if (ret < 0) {
+return ret;
+}
+}
+
+return 0;
+}
+
 int ppc_compat_max_vthreads(PowerPCCPU *cpu)
 {
 const CompatInfo *compat = compat_by_pvr(cpu->compat_pvr);
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 9e491e05eb..f8fe0db5cd 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1504,6 +1504,7 @@ int ppc_set_compat(PowerPCCPU *cpu, uint32_t compat_pvr, 
Error **errp);
 
 #if !defined(CONFIG_USER_ONLY)
 int ppc_set_compat_all(uint32_t compat_pvr, Error **errp);
+int ppc_init_compat_all(uint32_t compat_pvr, Error **errp);
 #endif
 int ppc_compat_max_vthreads(PowerPCCPU *cpu);
 void ppc_compat_add_property(Object *obj, const char *name,
-- 
2.40.1




[PATCH v2 05/19] host-utils: Add muldiv64_round_up

2023-08-07 Thread Nicholas Piggin
This will be used for converting time intervals in different base units
to host units, for the purpose of scheduling timers to emulate target
timers. Timers typically must not fire before their requested expiry
time but may fire some time afterward, so rounding up is the right way
to implement these.
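
A quick sanity check of the difference (illustrative numbers only; a
16 MHz clock has a 62.5 ns period, so 3 ticks span 187.5 ns):

    muldiv64(3, NANOSECONDS_PER_SECOND, 16000000);          /* 187 ns */
    muldiv64_round_up(3, NANOSECONDS_PER_SECOND, 16000000); /* 188 ns */

A timer armed with the truncated value could fire just before the third
tick elapses; the rounded-up value cannot.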

Signed-off-by: Nicholas Piggin 
---
 include/qemu/host-utils.h | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 011618373e..e2a50a567f 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -56,6 +56,11 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, 
uint32_t c)
 return (__int128_t)a * b / c;
 }
 
+static inline uint64_t muldiv64_round_up(uint64_t a, uint32_t b, uint32_t c)
+{
+return ((__int128_t)a * b + c - 1) / c;
+}
+
 static inline uint64_t divu128(uint64_t *plow, uint64_t *phigh,
uint64_t divisor)
 {
@@ -83,7 +88,8 @@ void mulu64(uint64_t *plow, uint64_t *phigh, uint64_t a, 
uint64_t b);
 uint64_t divu128(uint64_t *plow, uint64_t *phigh, uint64_t divisor);
 int64_t divs128(uint64_t *plow, int64_t *phigh, int64_t divisor);
 
-static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
+static inline uint64_t __muldiv64(uint64_t a, uint32_t b, uint32_t c,
+  bool round_up)
 {
 union {
 uint64_t ll;
@@ -99,12 +105,25 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, 
uint32_t c)
 
 u.ll = a;
 rl = (uint64_t)u.l.low * (uint64_t)b;
+if (round_up) {
+rl += c - 1;
+}
 rh = (uint64_t)u.l.high * (uint64_t)b;
 rh += (rl >> 32);
 res.l.high = rh / c;
 res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
 return res.ll;
 }
+
+static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
+{
+return __muldiv64(a, b, c, false);
+}
+
+static inline uint64_t muldiv64_round_up(uint64_t a, uint32_t b, uint32_t c)
+{
+return __muldiv64(a, b, c, true);
+}
 #endif
 
 /**
-- 
2.40.1




[PATCH v2 10/19] target/ppc: Migrate DECR SPR

2023-08-07 Thread Nicholas Piggin
TCG does not maintain the DEC register in the SPR array, so it does not
get migrated. TCG also needs to re-start the decrementer timer on the
destination machine.

Load and store the decrementer into the SPR when migrating. This works
for the level-triggered (book3s) decrementer, and should be compatible
with existing KVM machines that do keep the DEC value there.

This fixes lost decrementer interrupt on migration that can cause
hangs, as well as other problems including record-replay bugs.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/machine.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 8234e35d69..8a190c4853 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -209,6 +209,14 @@ static int cpu_pre_save(void *opaque)
 /* Used to retain migration compatibility for pre 6.0 for 601 machines. */
 env->hflags_compat_nmsr = 0;
 
+if (tcg_enabled()) {
+/*
+ * TCG does not maintain the DECR spr (unlike KVM) so have to save
+ * it here.
+ */
+env->spr[SPR_DECR] = cpu_ppc_load_decr(env);
+}
+
 return 0;
 }
 
@@ -319,6 +327,12 @@ static int cpu_post_load(void *opaque, int version_id)
 ppc_update_ciabr(env);
 ppc_update_daw0(env);
 #endif
+/*
+ * TCG needs to re-start the decrementer timer and/or raise the
+ * interrupt. This works for level-triggered decrementer. Edge
+ * triggered types (including HDEC) would need to carry more state.
+ */
+cpu_ppc_store_decr(env, env->spr[SPR_DECR]);
 pmu_mmcr01_updated(env);
 }
 
-- 
2.40.1




[PATCH v2 17/19] tests/avocado: boot ppc64 pseries replay-record test to Linux VFS mount

2023-08-07 Thread Nicholas Piggin
The ppc64 record-replay test is able to replay the full kernel boot,
so try enabling it.

Acked-by: Pavel Dovgalyuk 
Signed-off-by: Nicholas Piggin 
---
 tests/avocado/replay_kernel.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/avocado/replay_kernel.py b/tests/avocado/replay_kernel.py
index 79c607b0e7..a18610542e 100644
--- a/tests/avocado/replay_kernel.py
+++ b/tests/avocado/replay_kernel.py
@@ -255,8 +255,7 @@ def test_ppc64_pseries(self):
 kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
 
 kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
-# icount is not good enough for PPC64 for complete boot yet
-console_pattern = 'Kernel command line: %s' % kernel_command_line
+console_pattern = 'VFS: Cannot open root device'
 self.run_rr(kernel_path, kernel_command_line, console_pattern)
 
 def test_ppc64_powernv(self):
-- 
2.40.1




[PATCH v2 11/19] hw/ppc: Reset timebase facilities on machine reset

2023-08-07 Thread Nicholas Piggin
Lower interrupts, delete timers, and set time facility registers
back to initial state on machine reset.

This is not so important for record-replay since timebase and
decrementer are migrated, but it gives a cleaner reset state.

Cc: Mark Cave-Ayland 
Cc: BALATON Zoltan 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/mac_oldworld.c   |  1 +
 hw/ppc/pegasos2.c   |  1 +
 hw/ppc/pnv_core.c   |  2 ++
 hw/ppc/ppc.c| 46 +++--
 hw/ppc/prep.c   |  1 +
 hw/ppc/spapr_cpu_core.c |  2 ++
 include/hw/ppc/ppc.h|  3 ++-
 7 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 510ff0eaaf..9acc7adfc9 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -81,6 +81,7 @@ static void ppc_heathrow_reset(void *opaque)
 {
 PowerPCCPU *cpu = opaque;
 
+cpu_ppc_tb_reset(&cpu->env);
 cpu_reset(CPU(cpu));
 }
 
diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 075367d94d..bd397cf2b5 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -99,6 +99,7 @@ static void pegasos2_cpu_reset(void *opaque)
 cpu->env.gpr[1] = 2 * VOF_STACK_SIZE - 0x20;
 cpu->env.nip = 0x100;
 }
+cpu_ppc_tb_reset(&cpu->env);
 }
 
 static void pegasos2_pci_irq(void *opaque, int n, int level)
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index 9b39d527de..8c7afe037f 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -61,6 +61,8 @@ static void pnv_core_cpu_reset(PnvCore *pc, PowerPCCPU *cpu)
 hreg_compute_hflags(env);
 ppc_maybe_interrupt(env);
 
+cpu_ppc_tb_reset(env);
+
 pcc->intc_reset(pc->chip, cpu);
 }
 
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index d9a1cfbf91..f391acc39e 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -944,23 +944,6 @@ void cpu_ppc_store_purr(CPUPPCState *env, uint64_t value)
  _env->purr_offset, value);
 }
 
-static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
-{
-CPUPPCState *env = opaque;
-PowerPCCPU *cpu = env_archcpu(env);
-ppc_tb_t *tb_env = env->tb_env;
-
-tb_env->tb_freq = freq;
-tb_env->decr_freq = freq;
-/* There is a bug in Linux 2.4 kernels:
- * if a decrementer exception is pending when it enables msr_ee at startup,
- * it's not ready to handle it...
- */
-_cpu_ppc_store_decr(cpu, 0xFFFFFFFF, 0xFFFFFFFF, 32);
-_cpu_ppc_store_hdecr(cpu, 0xFFFFFFFF, 0xFFFFFFFF, 32);
-cpu_ppc_store_purr(env, 0x0000000000000000ULL);
-}
-
 static void timebase_save(PPCTimebase *tb)
 {
 uint64_t ticks = cpu_get_host_ticks();
@@ -1062,7 +1045,7 @@ const VMStateDescription vmstate_ppc_timebase = {
 };
 
 /* Set up (once) timebase frequency (in Hz) */
-clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq)
+void cpu_ppc_tb_init(CPUPPCState *env, uint32_t freq)
 {
 PowerPCCPU *cpu = env_archcpu(env);
 ppc_tb_t *tb_env;
@@ -1083,9 +1066,32 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t 
freq)
 } else {
 tb_env->hdecr_timer = NULL;
 }
-cpu_ppc_set_tb_clk(env, freq);
 
-return &cpu_ppc_set_tb_clk;
+tb_env->tb_freq = freq;
+tb_env->decr_freq = freq;
+}
+
+void cpu_ppc_tb_reset(CPUPPCState *env)
+{
+PowerPCCPU *cpu = env_archcpu(env);
+ppc_tb_t *tb_env = env->tb_env;
+
+timer_del(tb_env->decr_timer);
+ppc_set_irq(cpu, PPC_INTERRUPT_DECR, 0);
+tb_env->decr_next = 0;
+if (tb_env->hdecr_timer != NULL) {
+timer_del(tb_env->hdecr_timer);
+ppc_set_irq(cpu, PPC_INTERRUPT_HDECR, 0);
+tb_env->hdecr_next = 0;
+}
+
+/* There is a bug in Linux 2.4 kernels:
+ * if a decrementer exception is pending when it enables msr_ee at startup,
+ * it's not ready to handle it...
+ */
+cpu_ppc_store_decr(env, -1);
+cpu_ppc_store_hdecr(env, -1);
+cpu_ppc_store_purr(env, 0x0000000000000000ULL);
 }
 
 void cpu_ppc_tb_free(CPUPPCState *env)
diff --git a/hw/ppc/prep.c b/hw/ppc/prep.c
index d9231c7317..f6fd35fcb9 100644
--- a/hw/ppc/prep.c
+++ b/hw/ppc/prep.c
@@ -67,6 +67,7 @@ static void ppc_prep_reset(void *opaque)
 PowerPCCPU *cpu = opaque;
 
 cpu_reset(CPU(cpu));
+cpu_ppc_tb_reset(&cpu->env);
 }
 
 
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index b482d9754a..91fae56573 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -74,6 +74,8 @@ static void spapr_reset_vcpu(PowerPCCPU *cpu)
 
 kvm_check_mmu(cpu, &error_fatal);
 
+cpu_ppc_tb_reset(env);
+
 spapr_irq_cpu_intc_reset(spapr, cpu);
 }
 
diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index e095c002dc..17a8dfc107 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -54,7 +54,8 @@ struct ppc_tb_t {
*/
 
 uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset);
-clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, uint32_t freq);
+void cpu_ppc_tb_init(CPUPPCState *env, uint32_t freq);
+void cpu_ppc_tb_reset(CPUPPCState *env);

[PATCH v2 19/19] tests/avocado: ppc64 reverse debugging tests for pseries and powernv

2023-08-07 Thread Nicholas Piggin
These machines run reverse-debugging well enough to pass basic tests.
Wire them up.

Reviewed-by: Pavel Dovgalyuk 
Signed-off-by: Nicholas Piggin 
---
 tests/avocado/reverse_debugging.py | 29 +
 1 file changed, 29 insertions(+)

diff --git a/tests/avocado/reverse_debugging.py 
b/tests/avocado/reverse_debugging.py
index 7d1a478df1..fc47874eda 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -233,3 +233,32 @@ def test_aarch64_virt(self):
 
 self.reverse_debugging(
 args=('-kernel', kernel_path))
+
+class ReverseDebugging_ppc64(ReverseDebugging):
+"""
+:avocado: tags=accel:tcg
+"""
+
+REG_PC = 0x40
+
+# unidentified gitlab timeout problem
+@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
+def test_ppc64_pseries(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:pseries
+"""
+# SLOF branches back to its entry point, which causes this test
+# to take the 'hit a breakpoint again' path. That's not a problem,
+# just slightly different than the other machines.
+self.endian_is_le = False
+self.reverse_debugging()
+
+@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
+def test_ppc64_powernv(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:powernv
+"""
+self.endian_is_le = False
+self.reverse_debugging()
-- 
2.40.1




[PATCH v2 16/19] spapr: Fix record-replay machine reset consuming too many events

2023-08-07 Thread Nicholas Piggin
spapr_machine_reset gets a random number to populate the device-tree
rng seed with. When loading a snapshot for record-replay, the machine
is reset again, and that tries to consume the random event record
again, crashing due to an inconsistent record.

Fix this by saving the seed to populate the device tree with, and
skipping the rng on snapshot load.

Acked-by: Pavel Dovgalyuk 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr.c | 12 +---
 include/hw/ppc/spapr.h |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7d84244f03..ecfbdb0030 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1022,7 +1022,6 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, 
void *fdt, bool reset)
 {
 MachineState *machine = MACHINE(spapr);
 SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
-uint8_t rng_seed[32];
 int chosen;
 
 _FDT(chosen = fdt_add_subnode(fdt, 0, "chosen"));
@@ -1100,8 +1099,7 @@ static void spapr_dt_chosen(SpaprMachineState *spapr, 
void *fdt, bool reset)
 spapr_dt_ov5_platform_support(spapr, fdt, chosen);
 }
 
-qemu_guest_getrandom_nofail(rng_seed, sizeof(rng_seed));
-_FDT(fdt_setprop(fdt, chosen, "rng-seed", rng_seed, sizeof(rng_seed)));
+_FDT(fdt_setprop(fdt, chosen, "rng-seed", spapr->fdt_rng_seed, 32));
 
 _FDT(spapr_dt_ovec(fdt, chosen, spapr->ov5_cas, "ibm,architecture-vec-5"));
 }
@@ -1654,6 +1652,14 @@ static void spapr_machine_reset(MachineState *machine, 
ShutdownCause reason)
 void *fdt;
 int rc;
 
+if (reason != SHUTDOWN_CAUSE_SNAPSHOT_LOAD) {
+/*
+ * Record-replay snapshot load must not consume random, this was
+ * already replayed from initial machine reset.
+ */
+qemu_guest_getrandom_nofail(spapr->fdt_rng_seed, 32);
+}
+
 pef_kvm_reset(machine->cgs, &error_fatal);
 spapr_caps_apply(spapr);
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index f47e8419a5..f4bd204d86 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -204,6 +204,7 @@ struct SpaprMachineState {
 uint32_t fdt_size;
 uint32_t fdt_initial_size;
 void *fdt_blob;
+uint8_t fdt_rng_seed[32];
 long kernel_size;
 bool kernel_le;
 uint64_t kernel_addr;
-- 
2.40.1




[PATCH v2 14/19] target/ppc: Fix timebase reset with record-replay

2023-08-07 Thread Nicholas Piggin
Timebase save uses a random number for a legacy vmstate field, which
makes rr snapshot loading unbalanced. The easiest way to deal with this
is just to skip the rng if record-replay is active.

Reviewed-by: Pavel Dovgalyuk 
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index a0ee064b1d..87df914600 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -32,6 +32,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/error-report.h"
 #include "sysemu/kvm.h"
+#include "sysemu/replay.h"
 #include "sysemu/runstate.h"
 #include "kvm_ppc.h"
 #include "migration/vmstate.h"
@@ -976,8 +977,14 @@ static void timebase_save(PPCTimebase *tb)
 return;
 }
 
-/* not used anymore, we keep it for compatibility */
-tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
+if (replay_mode == REPLAY_MODE_NONE) {
+/* not used anymore, we keep it for compatibility */
+tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
+} else {
+/* simpler for record-replay to avoid this event, compat not needed */
+tb->time_of_the_day_ns = 0;
+}
+
 /*
  * tb_offset is only expected to be changed by QEMU so
  * there is no need to update it from KVM here
-- 
2.40.1




[PATCH v2 13/19] target/ppc: Fix CPU reservation migration for record-replay

2023-08-07 Thread Nicholas Piggin
ppc only migrates reserve_addr, so the destination machine can get a
valid reservation with an incorrect reservation value of 0. Prior to
commit 392d328abe753 ("target/ppc: Ensure stcx size matches larx"),
this could permit a stcx. to incorrectly succeed. That commit
inadvertently fixed that bug because the target machine starts with an
impossible reservation size of 0, so any stcx. will fail.

This behaviour is permitted by the ISA because reservation loss may
have implementation-dependent causes. What's more, with KVM machines it
is impossible to save or reasonably restore reservation state. However, if
the vmstate is being used for record-replay, the reservation must be
saved and restored exactly in order for execution from snapshot to
match the record.

This patch deprecates the existing incomplete reserve_addr vmstate,
and adds a new vmstate subsection with complete reservation state.
The new vmstate is needed only when record-replay mode is active.

Acked-by: Pavel Dovgalyuk 
Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h   |  2 ++
 target/ppc/machine.c   | 26 --
 target/ppc/translate.c |  4 
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 2777ea3110..9e491e05eb 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1121,7 +1121,9 @@ struct CPUArchState {
 target_ulong reserve_addr;   /* Reservation address */
 target_ulong reserve_length; /* Reservation larx op size (bytes) */
 target_ulong reserve_val;/* Reservation value */
+#if defined(TARGET_PPC64)
 target_ulong reserve_val2;
+#endif
 
 /* These are used in supervisor mode only */
 target_ulong msr;  /* machine state register */
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 8a190c4853..ad7b4f6338 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -10,6 +10,7 @@
 #include "qemu/main-loop.h"
 #include "kvm_ppc.h"
 #include "power8-pmu.h"
+#include "sysemu/replay.h"
 
 static void post_load_update_msr(CPUPPCState *env)
 {
@@ -690,6 +691,27 @@ static const VMStateDescription vmstate_compat = {
 }
 };
 
+static bool reservation_needed(void *opaque)
+{
+return (replay_mode != REPLAY_MODE_NONE);
+}
+
+static const VMStateDescription vmstate_reservation = {
+.name = "cpu/reservation",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = reservation_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
+VMSTATE_UINTTL(env.reserve_length, PowerPCCPU),
+VMSTATE_UINTTL(env.reserve_val, PowerPCCPU),
+#if defined(TARGET_PPC64)
+VMSTATE_UINTTL(env.reserve_val2, PowerPCCPU),
+#endif
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_ppc_cpu = {
 .name = "cpu",
 .version_id = 5,
@@ -711,8 +733,7 @@ const VMStateDescription vmstate_ppc_cpu = {
 VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
 VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
 
-/* Reservation */
-VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
+VMSTATE_UNUSED(sizeof(target_ulong)), /* was env.reserve_addr */
 
 /* Supervisor mode architected state */
 VMSTATE_UINTTL(env.msr, PowerPCCPU),
@@ -741,6 +762,7 @@ const VMStateDescription vmstate_ppc_cpu = {
 &vmstate_tlbemb,
 &vmstate_tlbmas,
 &vmstate_compat,
+&vmstate_reservation,
 NULL
 }
 };
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b8c7f38ccd..4a60aefd8f 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -77,7 +77,9 @@ static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, 
cpu_ca32;
 static TCGv cpu_reserve;
 static TCGv cpu_reserve_length;
 static TCGv cpu_reserve_val;
+#if defined(TARGET_PPC64)
 static TCGv cpu_reserve_val2;
+#endif
 static TCGv cpu_fpscr;
 static TCGv_i32 cpu_access_type;
 
@@ -151,9 +153,11 @@ void ppc_translate_init(void)
 cpu_reserve_val = tcg_global_mem_new(cpu_env,
  offsetof(CPUPPCState, reserve_val),
  "reserve_val");
+#if defined(TARGET_PPC64)
 cpu_reserve_val2 = tcg_global_mem_new(cpu_env,
   offsetof(CPUPPCState, reserve_val2),
   "reserve_val2");
+#endif
 
 cpu_fpscr = tcg_global_mem_new(cpu_env,
offsetof(CPUPPCState, fpscr), "fpscr");
-- 
2.40.1




[PATCH v2 06/19] hw/ppc: Round up the decrementer interval when converting to ns

2023-08-07 Thread Nicholas Piggin
The rule of timers is typically that they should never expire before the
timeout, but may expire some time afterward. Rounding timer intervals up
conversion is the right thing to do.

Under most circumstances it is impossible to observe the decrementer
interrupt before the dec register has triggered. However with icount
timing, problems can arise. For example setting DEC to 0 can schedule
the timer for now, causing it to fire before any more instructions
have been executed and DEC is still 0.

Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 423a3a117a..13eb45f4b7 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -482,14 +482,26 @@ void ppce500_set_mpic_proxy(bool enabled)
 /*/
 /* PowerPC time base and decrementer emulation */
 
+/*
+ * Conversion between QEMU_CLOCK_VIRTUAL ns and timebase (TB) ticks:
+ * TB ticks are arrived at by multiplying tb_freq then dividing by
+ * ns per second, and rounding down. TB ticks drive all clocks and
+ * timers in the target machine.
+ *
+ * Converting TB intervals to ns for the purpose of setting a
+ * QEMU_CLOCK_VIRTUAL timer should go the other way, but rounding
+ * up. Rounding down could cause the timer to fire before the TB
+ * value has been reached.
+ */
 static uint64_t ns_to_tb(uint32_t freq, int64_t clock)
 {
 return muldiv64(clock, freq, NANOSECONDS_PER_SECOND);
 }
 
-static int64_t tb_to_ns(uint32_t freq, uint64_t tb)
+/* virtual clock in TB ticks, not adjusted by TB offset */
+static int64_t tb_to_ns_round_up(uint32_t freq, uint64_t tb)
 {
-return muldiv64(tb, NANOSECONDS_PER_SECOND, freq);
+return muldiv64_round_up(tb, NANOSECONDS_PER_SECOND, freq);
 }
 
 uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset)
@@ -847,7 +859,7 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t 
*nextp,
 
 /* Calculate the next timer event */
 now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-next = now + tb_to_ns(tb_env->decr_freq, value);
+next = now + tb_to_ns_round_up(tb_env->decr_freq, value);
 *nextp = next;
 
 /* Adjust timer */
@@ -1139,9 +1151,7 @@ static void cpu_4xx_fit_cb (void *opaque)
 /* Cannot occur, but makes gcc happy */
 return;
 }
-next = now + tb_to_ns(tb_env->tb_freq, next);
-if (next == now)
-next++;
+next = now + tb_to_ns_round_up(tb_env->tb_freq, next);
 timer_mod(ppc40x_timer->fit_timer, next);
 env->spr[SPR_40x_TSR] |= 1 << 26;
 if ((env->spr[SPR_40x_TCR] >> 23) & 0x1) {
@@ -1167,11 +1177,10 @@ static void start_stop_pit (CPUPPCState *env, ppc_tb_t 
*tb_env, int is_excp)
 } else {
 trace_ppc4xx_pit_start(ppc40x_timer->pit_reload);
 now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-next = now + tb_to_ns(tb_env->decr_freq, ppc40x_timer->pit_reload);
+next = now + tb_to_ns_round_up(tb_env->decr_freq,
+   ppc40x_timer->pit_reload);
 if (is_excp)
 next += tb_env->decr_next - now;
-if (next == now)
-next++;
 timer_mod(tb_env->decr_timer, next);
 tb_env->decr_next = next;
 }
@@ -1226,9 +1235,7 @@ static void cpu_4xx_wdt_cb (void *opaque)
 /* Cannot occur, but makes gcc happy */
 return;
 }
-next = now + tb_to_ns(tb_env->decr_freq, next);
-if (next == now)
-next++;
+next = now + tb_to_ns_round_up(tb_env->decr_freq, next);
 trace_ppc4xx_wdt(env->spr[SPR_40x_TCR], env->spr[SPR_40x_TSR]);
 switch ((env->spr[SPR_40x_TSR] >> 30) & 0x3) {
 case 0x0:
-- 
2.40.1




[PATCH v2 18/19] tests/avocado: reverse-debugging cope with re-executing breakpoints

2023-08-07 Thread Nicholas Piggin
The reverse-debugging test creates a trace, then replays it and:

1. Steps the first 10 instructions and records their addresses.
2. Steps backward and verifies their addresses match.
3. Runs to (near) the end of the trace.
4. Sets breakpoints on the first 10 instructions.
5. Continues backward and verifies execution stops at the last
   breakpoint.

Step 5 breaks if any of the other 9 breakpoints are re-executed in the
trace after the 10th instruction is run, because those will be
unexpectedly hit when reverse continuing. This situation does arise
with the ppc pseries machine: the SLOF BIOS branches to its own entry
point.

Deal with this by switching steps 3 and 4, so the trace will be run to
the end *or* until one of the breakpoints is re-executed. Step 5 then
reverses from there to the 10th instruction and, by definition, will not
hit a breakpoint in between.

Another step is added between steps 2 and 3, which steps forward over
the first 10 instructions and verifies their addresses, to support this.

Reviewed-by: Pavel Dovgalyuk 
Signed-off-by: Nicholas Piggin 
---
 tests/avocado/reverse_debugging.py | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/tests/avocado/reverse_debugging.py 
b/tests/avocado/reverse_debugging.py
index 680c314cfc..7d1a478df1 100644
--- a/tests/avocado/reverse_debugging.py
+++ b/tests/avocado/reverse_debugging.py
@@ -150,16 +150,33 @@ def reverse_debugging(self, shift=7, args=None):
 self.check_pc(g, addr)
 logger.info('found position %x' % addr)
 
-logger.info('seeking to the end (icount %s)' % (last_icount - 1))
-vm.qmp('replay-break', icount=last_icount - 1)
-# continue - will return after pausing
-g.cmd(b'c', b'T02thread:01;')
+# visit the recorded instructions in forward order
+logger.info('stepping forward')
+for addr in steps:
+self.check_pc(g, addr)
+self.gdb_step(g)
+logger.info('found position %x' % addr)
 
+# set breakpoints for the instructions just stepped over
 logger.info('setting breakpoints')
 for addr in steps:
 # hardware breakpoint at addr with len=1
 g.cmd(b'Z1,%x,1' % addr, b'OK')
 
+# this may hit a breakpoint if first instructions are executed
+# again
+logger.info('continuing execution')
+vm.qmp('replay-break', icount=last_icount - 1)
+# continue - will return after pausing
+# This could stop at the end and get a T02 return, or by
+# re-executing one of the breakpoints and get a T05 return.
+g.cmd(b'c')
+if self.vm_get_icount(vm) == last_icount - 1:
+logger.info('reached the end (icount %s)' % (last_icount - 1))
+else:
+logger.info('hit a breakpoint again at %x (icount %s)' %
+(self.get_pc(g), self.vm_get_icount(vm)))
+
 logger.info('running reverse continue to reach %x' % steps[-1])
 # reverse continue - will return after stopping at the breakpoint
 g.cmd(b'bc', b'T05thread:01;')
-- 
2.40.1




[PATCH v2 12/19] hw/ppc: Read time only once to perform decrementer write

2023-08-07 Thread Nicholas Piggin
Reading the time more than once to perform an operation always increases
complexity and fragility due to introduced deltas. Simplify the
decrementer write by reading the clock once for the operation.

Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 84 +---
 1 file changed, 53 insertions(+), 31 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index f391acc39e..a0ee064b1d 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -708,13 +708,13 @@ bool ppc_decr_clear_on_delivery(CPUPPCState *env)
 return ((tb_env->flags & flags) == PPC_DECR_UNDERFLOW_TRIGGERED);
 }
 
-static inline int64_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
+static inline int64_t __cpu_ppc_load_decr(CPUPPCState *env, int64_t now,
+  uint64_t next)
 {
 ppc_tb_t *tb_env = env->tb_env;
-uint64_t now, n;
+uint64_t n;
 int64_t decr;
 
-now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 n = ns_to_tb(tb_env->decr_freq, now);
 if (next > n && tb_env->flags & PPC_TIMER_BOOKE) {
 decr = 0;
@@ -727,16 +727,12 @@ static inline int64_t _cpu_ppc_load_decr(CPUPPCState 
*env, uint64_t next)
 return decr;
 }
 
-target_ulong cpu_ppc_load_decr(CPUPPCState *env)
+static target_ulong _cpu_ppc_load_decr(CPUPPCState *env, int64_t now)
 {
 ppc_tb_t *tb_env = env->tb_env;
 uint64_t decr;
 
-if (kvm_enabled()) {
-return env->spr[SPR_DECR];
-}
-
-decr = _cpu_ppc_load_decr(env, tb_env->decr_next);
+decr = __cpu_ppc_load_decr(env, now, tb_env->decr_next);
 
 /*
  * If large decrementer is enabled then the decrementer is signed extened
@@ -750,14 +746,23 @@ target_ulong cpu_ppc_load_decr(CPUPPCState *env)
 return (uint32_t) decr;
 }
 
-target_ulong cpu_ppc_load_hdecr(CPUPPCState *env)
+target_ulong cpu_ppc_load_decr(CPUPPCState *env)
+{
+if (kvm_enabled()) {
+return env->spr[SPR_DECR];
+} else {
+return _cpu_ppc_load_decr(env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
+}
+}
+
+static target_ulong _cpu_ppc_load_hdecr(CPUPPCState *env, int64_t now)
 {
 PowerPCCPU *cpu = env_archcpu(env);
 PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
 ppc_tb_t *tb_env = env->tb_env;
 uint64_t hdecr;
 
-hdecr =  _cpu_ppc_load_decr(env, tb_env->hdecr_next);
+hdecr =  __cpu_ppc_load_decr(env, now, tb_env->hdecr_next);
 
 /*
  * If we have a large decrementer (POWER9 or later) then hdecr is sign
@@ -771,6 +776,11 @@ target_ulong cpu_ppc_load_hdecr(CPUPPCState *env)
 return (uint32_t) hdecr;
 }
 
+target_ulong cpu_ppc_load_hdecr(CPUPPCState *env)
+{
+return _cpu_ppc_load_hdecr(env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
+}
+
 uint64_t cpu_ppc_load_purr (CPUPPCState *env)
 {
 ppc_tb_t *tb_env = env->tb_env;
@@ -815,7 +825,7 @@ static inline void cpu_ppc_hdecr_lower(PowerPCCPU *cpu)
 ppc_set_irq(cpu, PPC_INTERRUPT_HDECR, 0);
 }
 
-static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
+static void __cpu_ppc_store_decr(PowerPCCPU *cpu, int64_t now, uint64_t *nextp,
  QEMUTimer *timer,
  void (*raise_excp)(void *),
  void (*lower_excp)(PowerPCCPU *),
@@ -824,7 +834,7 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t 
*nextp,
 {
 CPUPPCState *env = &cpu->env;
 ppc_tb_t *tb_env = env->tb_env;
-uint64_t now, next;
+uint64_t next;
 int64_t signed_value;
 int64_t signed_decr;
 
@@ -836,18 +846,12 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, 
uint64_t *nextp,
 
 trace_ppc_decr_store(nr_bits, decr, value);
 
-if (kvm_enabled()) {
-/* KVM handles decrementer exceptions, we don't need our own timer */
-return;
-}
-
 /*
  * Calculate the next decrementer event and set a timer.
  * decr_next is in timebase units to keep rounding simple. Note it is
  * not adjusted by tb_offset because if TB changes via tb_offset changing,
  * decrementer does not change, so not directly comparable with TB.
  */
-now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 next = ns_to_tb(tb_env->decr_freq, now) + value;
 *nextp = next; /* nextp is in timebase units */
 
@@ -876,12 +880,13 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, 
uint64_t *nextp,
 timer_mod(timer, tb_to_ns_round_up(tb_env->decr_freq, next));
 }
 
-static inline void _cpu_ppc_store_decr(PowerPCCPU *cpu, target_ulong decr,
-   target_ulong value, int nr_bits)
+static inline void _cpu_ppc_store_decr(PowerPCCPU *cpu, int64_t now,
+   target_ulong decr, target_ulong value,
+   int nr_bits)
 {
 ppc_tb_t *tb_env = cpu->env.tb_env;
 
-__cpu_ppc_store_decr(cpu, _env->decr_next, tb_env->decr_timer,
+__cpu_ppc_store_decr(cpu, now, _env->decr_next, tb_env->decr_timer,
  

[PATCH v2 02/19] ppc/vof: Fix missed fields in VOF cleanup

2023-08-07 Thread Nicholas Piggin
Failing to reset the of_instance_last makes ihandle allocation continue
to increase, which causes record-replay to fail to match the
recorded trace.

Not resetting claimed_base makes VOF eventually run out of memory after
some resets.

Cc: Alexey Kardashevskiy 
Fixes: fc8c745d501 ("spapr: Implement Open Firmware client interface")
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/vof.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
index 18c3f92317..e3b430a81f 100644
--- a/hw/ppc/vof.c
+++ b/hw/ppc/vof.c
@@ -1024,6 +1024,8 @@ void vof_cleanup(Vof *vof)
 }
 vof->claimed = NULL;
 vof->of_instances = NULL;
+vof->of_instance_last = 0;
+vof->claimed_base = 0;
 }
 
 void vof_build_dt(void *fdt, Vof *vof)
-- 
2.40.1




[PATCH v2 09/19] hw/ppc: Always store the decrementer value

2023-08-07 Thread Nicholas Piggin
When writing a value to the decrementer that raises an exception, the
irq is raised, but the value is not stored so the store doesn't appear
to have changed the register when it is read again.

Always store the write value to the register.

Fixes: e81a982aa53 ("PPC: Clean up DECR implementation")
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index fb4784793c..d9a1cfbf91 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -841,6 +841,16 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t 
*nextp,
 return;
 }
 
+/*
+ * Calculate the next decrementer event and set a timer.
+ * decr_next is in timebase units to keep rounding simple. Note it is
+ * not adjusted by tb_offset because if TB changes via tb_offset changing,
+ * decrementer does not change, so not directly comparable with TB.
+ */
+now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+next = ns_to_tb(tb_env->decr_freq, now) + value;
+*nextp = next; /* nextp is in timebase units */
+
 /*
  * Going from 1 -> 0 or 0 -> -1 is the event to generate a DEC interrupt.
  *
@@ -862,16 +872,6 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t 
*nextp,
 (*lower_excp)(cpu);
 }
 
-/*
- * Calculate the next decrementer event and set a timer.
- * decr_next is in timebase units to keep rounding simple. Note it is
- * not adjusted by tb_offset because if TB changes via tb_offset changing,
- * decrementer does not change, so not directly comparable with TB.
- */
-now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-next = ns_to_tb(tb_env->decr_freq, now) + value;
-*nextp = next;
-
 /* Adjust timer */
 timer_mod(timer, tb_to_ns_round_up(tb_env->decr_freq, next));
 }
-- 
2.40.1




[PATCH v2 for-8.2 00/19] ppc: record-replay enablement and fixes

2023-08-07 Thread Nicholas Piggin
The patches in this series have been seen a few times in various
iterations.

There are two main pieces, some assorted small fixes and tests for
record-replay, plus a large set of decrementer fixes. I merged these
into one series rather than send decrementer fixes alone first, because
record-replay has been very good at uncovering timer problems, so it's
good to have those test cases in at the same time IMO.

Some of the fixes we might take to stable, but unclear which.
Decrementer fixes were a bit of a tangle so maybe we just leave those
alone since they work okay.

The decrementer is not emulated perfectly still. Underflow from -ve
to +ve is not implemented, for one. I started doing that but it's not
trivial so better stop here for now.

For record-replay, pseries is now quite solid with rr. Surely some
issues to iron out but it is becoming usable.

powernv record-replay has some known problems migrating edge-triggered
decrementer, and edge triggered msgsnd. Also it seems to get stuck in
xive init somewhere when replaying from checkpoint, so there is probably
some state in xive not being reset. But at least it runs the avocado
test and seems close to working, so I've added that test case so we
don't go backwards (ha!).

Other machine types might not be too far off if there is interest. I
found it quite difficult to find these problems though, reverse
debugging will sometimes just lock up, stop at wrong location, or abort
with wrong event. Difficult understand what went wrong. Worst case I had
to basically bisect the replay of the trace, and find the minimum length
of replay that hit the problem -- that sometimes would land near a
mtDEC or timer interrupt or similar.

Thanks,
Nick

Nicholas Piggin (19):
  ppc/vhyp: reset exception state when handling vhyp hcall
  ppc/vof: Fix missed fields in VOF cleanup
  hw/ppc/ppc.c: Tidy over-long lines
  hw/ppc: Introduce functions for conversion between timebase and
nanoseconds
  host-utils: Add muldiv64_round_up
  hw/ppc: Round up the decrementer interval when converting to ns
  hw/ppc: Avoid decrementer rounding errors
  target/ppc: Sign-extend large decrementer to 64-bits
  hw/ppc: Always store the decrementer value
  target/ppc: Migrate DECR SPR
  hw/ppc: Reset timebase facilities on machine reset
  hw/ppc: Read time only once to perform decrementer write
  target/ppc: Fix CPU reservation migration for record-replay
  target/ppc: Fix timebase reset with record-replay
  spapr: Fix machine reset deadlock from replay-record
  spapr: Fix record-replay machine reset consuming too many events
  tests/avocado: boot ppc64 pseries replay-record test to Linux VFS
mount
  tests/avocado: reverse-debugging cope with re-executing breakpoints
  tests/avocado: ppc64 reverse debugging tests for pseries and powernv

 hw/ppc/mac_oldworld.c  |   1 +
 hw/ppc/pegasos2.c  |   1 +
 hw/ppc/pnv_core.c  |   2 +
 hw/ppc/ppc.c   | 236 +++--
 hw/ppc/prep.c  |   1 +
 hw/ppc/spapr.c |  32 +++-
 hw/ppc/spapr_cpu_core.c|   2 +
 hw/ppc/vof.c   |   2 +
 include/hw/ppc/ppc.h   |   3 +-
 include/hw/ppc/spapr.h |   2 +
 include/qemu/host-utils.h  |  21 ++-
 target/ppc/compat.c|  19 +++
 target/ppc/cpu.h   |   3 +
 target/ppc/excp_helper.c   |   3 +
 target/ppc/machine.c   |  40 -
 target/ppc/translate.c |   4 +
 tests/avocado/replay_kernel.py |   3 +-
 tests/avocado/reverse_debugging.py |  54 ++-
 18 files changed, 330 insertions(+), 99 deletions(-)

-- 
2.40.1




[PATCH v2 07/19] hw/ppc: Avoid decrementer rounding errors

2023-08-07 Thread Nicholas Piggin
The decrementer register contains a relative time in timebase units.
When writing to DECR this is converted and stored as an absolute value
in nanosecond units; reading DECR converts back to relative timebase.

The tb<->ns conversion of the relative part can cause rounding such that
a value written to the decrementer can read back as a different value,
with time held constant. This is a particular problem for a deterministic
icount and record-replay trace.

Fix this by storing the absolute value in timebase units rather than
nanoseconds. The math before:
  store:       decr_next = now_ns + decr * ns_per_sec / tb_per_sec
  load:        decr = (decr_next - now_ns) * tb_per_sec / ns_per_sec
  load(store): decr = decr * ns_per_sec / tb_per_sec * tb_per_sec / ns_per_sec

After:
  store:       decr_next = now_ns * tb_per_sec / ns_per_sec + decr
  load:        decr = decr_next - now_ns * tb_per_sec / ns_per_sec
  load(store): decr = decr
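
For example (illustrative numbers): with tb_per_sec = 512 MHz and
now_ns = 0, storing decr = 1 gave decr_next = 1 * 10^9 / (512 * 10^6)
= 1 ns (truncated from ~1.95), which loaded back as
1 * (512 * 10^6) / 10^9 = 0 -- the written value was lost. Keeping
decr_next in timebase units makes the load(store) round trip exact.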

Fixes: 9fddaa0c0cab ("PowerPC merge: real time TB and decrementer - faster and 
simpler exception handling (Jocelyn Mayer)")
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 39 ---
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 13eb45f4b7..a397820d9c 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -711,16 +711,17 @@ bool ppc_decr_clear_on_delivery(CPUPPCState *env)
 static inline int64_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
 {
 ppc_tb_t *tb_env = env->tb_env;
-int64_t decr, diff;
+uint64_t now, n;
+int64_t decr;
 
-diff = next - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-if (diff >= 0) {
-decr = ns_to_tb(tb_env->decr_freq, diff);
-} else if (tb_env->flags & PPC_TIMER_BOOKE) {
+now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+n = ns_to_tb(tb_env->decr_freq, now);
+if (next > n && tb_env->flags & PPC_TIMER_BOOKE) {
 decr = 0;
-}  else {
-decr = -ns_to_tb(tb_env->decr_freq, -diff);
+} else {
+decr = next - n;
 }
+
 trace_ppc_decr_load(decr);
 
 return decr;
@@ -857,13 +858,18 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, 
uint64_t *nextp,
 (*lower_excp)(cpu);
 }
 
-/* Calculate the next timer event */
+/*
+ * Calculate the next decrementer event and set a timer.
+ * decr_next is in timebase units to keep rounding simple. Note it is
+ * not adjusted by tb_offset because if TB changes via tb_offset changing,
+ * decrementer does not change, so not directly comparable with TB.
+ */
 now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-next = now + tb_to_ns_round_up(tb_env->decr_freq, value);
+next = ns_to_tb(tb_env->decr_freq, now) + value;
 *nextp = next;
 
 /* Adjust timer */
-timer_mod(timer, next);
+timer_mod(timer, tb_to_ns_round_up(tb_env->decr_freq, next));
 }
 
 static inline void _cpu_ppc_store_decr(PowerPCCPU *cpu, target_ulong decr,
@@ -1177,12 +1183,15 @@ static void start_stop_pit (CPUPPCState *env, ppc_tb_t 
*tb_env, int is_excp)
 } else {
 trace_ppc4xx_pit_start(ppc40x_timer->pit_reload);
 now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-next = now + tb_to_ns_round_up(tb_env->decr_freq,
-   ppc40x_timer->pit_reload);
-if (is_excp)
-next += tb_env->decr_next - now;
+
+if (is_excp) {
+tb_env->decr_next += ppc40x_timer->pit_reload;
+} else {
+tb_env->decr_next = ns_to_tb(tb_env->decr_freq, now)
++ ppc40x_timer->pit_reload;
+}
+next = tb_to_ns_round_up(tb_env->decr_freq, tb_env->decr_next);
 timer_mod(tb_env->decr_timer, next);
-tb_env->decr_next = next;
 }
 }
 
-- 
2.40.1




[PATCH v2 08/19] target/ppc: Sign-extend large decrementer to 64-bits

2023-08-07 Thread Nicholas Piggin
When storing a large decrementer value with the most significant
implemented bit set, it is to be treated as negative and
sign-extended.

This isn't hit for book3s DEC because of another bug; fixing it
in the next patch exposes this one and can cause additional
problems, so fix this first. It can be hit with HDECR and other
edge triggered types.

Fixes: a8dafa52518 ("target/ppc: Implement large decrementer support for TCG")
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index a397820d9c..fb4784793c 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -743,7 +743,9 @@ target_ulong cpu_ppc_load_decr(CPUPPCState *env)
  * to 64 bits, otherwise it is a 32 bit value.
  */
 if (env->spr[SPR_LPCR] & LPCR_LD) {
-return decr;
+PowerPCCPU *cpu = env_archcpu(env);
+PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+return sextract64(decr, 0, pcc->lrg_decr_bits);
 }
 return (uint32_t) decr;
 }
@@ -762,7 +764,9 @@ target_ulong cpu_ppc_load_hdecr(CPUPPCState *env)
  * extended to 64 bits, otherwise it is 32 bits.
  */
 if (pcc->lrg_decr_bits > 32) {
-return hdecr;
+PowerPCCPU *cpu = env_archcpu(env);
+PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
+return sextract64(hdecr, 0, pcc->lrg_decr_bits);
 }
 return (uint32_t) hdecr;
 }
-- 
2.40.1




[PATCH v2 04/19] hw/ppc: Introduce functions for conversion between timebase and nanoseconds

2023-08-07 Thread Nicholas Piggin
These calculations are repeated several times, and they will become
a little more complicated with subsequent changes.

Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 09b82f68a8..423a3a117a 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -482,10 +482,20 @@ void ppce500_set_mpic_proxy(bool enabled)
 /*/
 /* PowerPC time base and decrementer emulation */
 
+static uint64_t ns_to_tb(uint32_t freq, int64_t clock)
+{
+return muldiv64(clock, freq, NANOSECONDS_PER_SECOND);
+}
+
+static int64_t tb_to_ns(uint32_t freq, uint64_t tb)
+{
+return muldiv64(tb, NANOSECONDS_PER_SECOND, freq);
+}
+
 uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset)
 {
 /* TB time in tb periods */
-return muldiv64(vmclk, tb_env->tb_freq, NANOSECONDS_PER_SECOND) + 
tb_offset;
+return ns_to_tb(tb_env->tb_freq, vmclk) + tb_offset;
 }
 
 uint64_t cpu_ppc_load_tbl (CPUPPCState *env)
@@ -528,8 +538,7 @@ uint32_t cpu_ppc_load_tbu (CPUPPCState *env)
 static inline void cpu_ppc_store_tb(ppc_tb_t *tb_env, uint64_t vmclk,
 int64_t *tb_offsetp, uint64_t value)
 {
-*tb_offsetp = value -
-muldiv64(vmclk, tb_env->tb_freq, NANOSECONDS_PER_SECOND);
+*tb_offsetp = value - ns_to_tb(tb_env->tb_freq, vmclk);
 
 trace_ppc_tb_store(value, *tb_offsetp);
 }
@@ -694,11 +703,11 @@ static inline int64_t _cpu_ppc_load_decr(CPUPPCState 
*env, uint64_t next)
 
 diff = next - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 if (diff >= 0) {
-decr = muldiv64(diff, tb_env->decr_freq, NANOSECONDS_PER_SECOND);
+decr = ns_to_tb(tb_env->decr_freq, diff);
 } else if (tb_env->flags & PPC_TIMER_BOOKE) {
 decr = 0;
 }  else {
-decr = -muldiv64(-diff, tb_env->decr_freq, NANOSECONDS_PER_SECOND);
+decr = -ns_to_tb(tb_env->decr_freq, -diff);
 }
 trace_ppc_decr_load(decr);
 
@@ -838,7 +847,7 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t 
*nextp,
 
 /* Calculate the next timer event */
 now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-next = now + muldiv64(value, NANOSECONDS_PER_SECOND, tb_env->decr_freq);
+next = now + tb_to_ns(tb_env->decr_freq, value);
 *nextp = next;
 
 /* Adjust timer */
@@ -1130,7 +1139,7 @@ static void cpu_4xx_fit_cb (void *opaque)
 /* Cannot occur, but makes gcc happy */
 return;
 }
-next = now + muldiv64(next, NANOSECONDS_PER_SECOND, tb_env->tb_freq);
+next = now + tb_to_ns(tb_env->tb_freq, next);
 if (next == now)
 next++;
 timer_mod(ppc40x_timer->fit_timer, next);
@@ -1158,8 +1167,7 @@ static void start_stop_pit (CPUPPCState *env, ppc_tb_t 
*tb_env, int is_excp)
 } else {
 trace_ppc4xx_pit_start(ppc40x_timer->pit_reload);
 now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-next = now + muldiv64(ppc40x_timer->pit_reload,
-  NANOSECONDS_PER_SECOND, tb_env->decr_freq);
+next = now + tb_to_ns(tb_env->decr_freq, ppc40x_timer->pit_reload);
 if (is_excp)
 next += tb_env->decr_next - now;
 if (next == now)
@@ -1218,7 +1226,7 @@ static void cpu_4xx_wdt_cb (void *opaque)
 /* Cannot occur, but makes gcc happy */
 return;
 }
-next = now + muldiv64(next, NANOSECONDS_PER_SECOND, tb_env->decr_freq);
+next = now + tb_to_ns(tb_env->decr_freq, next);
 if (next == now)
 next++;
 trace_ppc4xx_wdt(env->spr[SPR_40x_TCR], env->spr[SPR_40x_TSR]);
-- 
2.40.1




[PATCH v2 03/19] hw/ppc/ppc.c: Tidy over-long lines

2023-08-07 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 hw/ppc/ppc.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 0e0a3d93c3..09b82f68a8 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -497,7 +497,8 @@ uint64_t cpu_ppc_load_tbl (CPUPPCState *env)
 return env->spr[SPR_TBL];
 }
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), tb_env->tb_offset);
+tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
+tb_env->tb_offset);
 trace_ppc_tb_load(tb);
 
 return tb;
@@ -508,7 +509,8 @@ static inline uint32_t _cpu_ppc_load_tbu(CPUPPCState *env)
 ppc_tb_t *tb_env = env->tb_env;
 uint64_t tb;
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), tb_env->tb_offset);
+tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
+tb_env->tb_offset);
 trace_ppc_tb_load(tb);
 
 return tb >> 32;
@@ -565,7 +567,8 @@ uint64_t cpu_ppc_load_atbl (CPUPPCState *env)
 ppc_tb_t *tb_env = env->tb_env;
 uint64_t tb;
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), tb_env->atb_offset);
+tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
+tb_env->atb_offset);
 trace_ppc_tb_load(tb);
 
 return tb;
@@ -576,7 +579,8 @@ uint32_t cpu_ppc_load_atbu (CPUPPCState *env)
 ppc_tb_t *tb_env = env->tb_env;
 uint64_t tb;
 
-tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL), tb_env->atb_offset);
+tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
+tb_env->atb_offset);
 trace_ppc_tb_load(tb);
 
 return tb >> 32;
@@ -1040,10 +1044,11 @@ clk_setup_cb cpu_ppc_tb_init (CPUPPCState *env, 
uint32_t freq)
 tb_env->flags |= PPC_DECR_UNDERFLOW_LEVEL;
 }
 /* Create new timer */
-tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);
+tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  &cpu_ppc_decr_cb, cpu);
 if (env->has_hv_mode && !cpu->vhyp) {
-tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_hdecr_cb,
-cpu);
+tb_env->hdecr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+   &cpu_ppc_hdecr_cb, cpu);
 } else {
 tb_env->hdecr_timer = NULL;
 }
-- 
2.40.1




[PATCH v2 01/19] ppc/vhyp: reset exception state when handling vhyp hcall

2023-08-07 Thread Nicholas Piggin
Convention is to reset the exception_index and error_code after handling
an interrupt. The vhyp hcall handler fails to do this. This does not
appear to have ill effects because cpu_handle_exception() clears
exception_index later, but it is fragile and inconsistent. Reset the
exception state after handling vhyp hcall like other handlers.
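
For reference, a sketch of what the helper does (condensed from the
powerpc_reset_excp_state() definition in target/ppc/excp_helper.c):

    /* Reset exception state */
    cs->exception_index = POWERPC_EXCP_NONE;
    env->error_code = 0;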

Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 32e46e56b3..72ec2be92e 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -843,6 +843,7 @@ static void powerpc_excp_7xx(PowerPCCPU *cpu, int excp)
 PPCVirtualHypervisorClass *vhc =
 PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
 vhc->hypercall(cpu->vhyp, cpu);
+powerpc_reset_excp_state(cpu);
 return;
 }
 
@@ -1014,6 +1015,7 @@ static void powerpc_excp_74xx(PowerPCCPU *cpu, int excp)
 PPCVirtualHypervisorClass *vhc =
 PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
 vhc->hypercall(cpu->vhyp, cpu);
+powerpc_reset_excp_state(cpu);
 return;
 }
 
@@ -1526,6 +1528,7 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
 PPCVirtualHypervisorClass *vhc =
 PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
 vhc->hypercall(cpu->vhyp, cpu);
+powerpc_reset_excp_state(cpu);
 return;
 }
 if (env->insns_flags2 & PPC2_ISA310) {
-- 
2.40.1




Re: [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events

2023-08-07 Thread Pavel Dovgalyuk

On 08.08.2023 06:09, Nicholas Piggin wrote:
> On Sun Aug 6, 2023 at 9:46 PM AEST, Nicholas Piggin wrote:
>> On Fri Aug 4, 2023 at 6:50 PM AEST, Pavel Dovgalyuk wrote:
>>> BTW, there is a function qemu_register_reset_nosnapshotload that can be
>>> used in similar cases.
>>> Can you just use it without changing the code of the reset handler?
>>
>> I didn't know that, thanks for pointing it out. I'll take a closer look
>> at it before reposting.
>
> Seems a bit tricky because the device tree has to be rebuilt at reset
> time (including snapshot load), but it uses the random number. So

It seems strange to me that loading the existing configuration has to
randomize the device tree.

> having a second nosnapshotload reset function might not be called in
> the correct order, I think?  For now I will keep it as is.

Ok, let's wait for other reviewers.


Pavel Dovgalyuk




Re: [PATCH 4/8] Introduce the CPU address space destruction function

2023-08-07 Thread lixianglai

Hi Igor Mammedov:

The first four patches are written with reference to the patches in the
common code section of Arm's CPU hotplug series, and the Arm CPU
hotplug-related patches will be merged into the community in the near
future, so the first four patches will be discarded and rebased on the
latest code.


Thanks,

xianglai.


On 7/28/23 8:13 PM, Igor Mammedov wrote:

On Thu, 20 Jul 2023 15:15:09 +0800
xianglai li  wrote:


Introduce new functions to destroy CPU address space resources

s/functions/function/


for cpu hot-(un)plug.

Cc: Xiaojuan Yang 
Cc: Song Gao 
Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Ani Sinha 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: "Philippe Mathieu-Daudé" 
Cc: Yanan Wang 
Cc: "Daniel P. Berrangé" 
Cc: Peter Xu 
Cc: David Hildenbrand 
Signed-off-by: xianglai li 
---
  include/exec/cpu-common.h |  8 
  include/hw/core/cpu.h |  1 +
  softmmu/physmem.c | 24 
  3 files changed, 33 insertions(+)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 87dc9a752c..27cd4d32b1 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -120,6 +120,14 @@ size_t qemu_ram_pagesize_largest(void);
   */
  void cpu_address_space_init(CPUState *cpu, int asidx,
  const char *prefix, MemoryRegion *mr);
+/**
+ * cpu_address_space_destroy:
+ * @cpu: CPU for which address space needs to be destroyed
+ * @asidx: integer index of this address space
+ *
+ * Note that with KVM only one address space is supported.
+ */
+void cpu_address_space_destroy(CPUState *cpu, int asidx);
  
  void cpu_physical_memory_rw(hwaddr addr, void *buf,

  hwaddr len, bool is_write);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fdcbe87352..d6d68dac12 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -366,6 +366,7 @@ struct CPUState {
  QSIMPLEQ_HEAD(, qemu_work_item) work_list;
  
  CPUAddressSpace *cpu_ases;

+int cpu_ases_ref_count;

perhaps renaming it to num_ases would be better


  int num_ases;

and this one can be named num_ases_[total|max]



  AddressSpace *as;
  MemoryRegion *memory;
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..f4545b4508 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -762,6 +762,7 @@ void cpu_address_space_init(CPUState *cpu, int asidx,
  
  if (!cpu->cpu_ases) {

  cpu->cpu_ases = g_new0(CPUAddressSpace, cpu->num_ases);
+cpu->cpu_ases_ref_count = cpu->num_ases;
  }
  
  newas = &cpu->cpu_ases[asidx];

@@ -775,6 +776,29 @@ void cpu_address_space_init(CPUState *cpu, int asidx,
  }
  }
  
+void cpu_address_space_destroy(CPUState *cpu, int asidx)

  ^^^ should it be uintX_t ?


+{
+CPUAddressSpace *cpuas;
+
+assert(asidx < cpu->num_ases);
+assert(asidx == 0 || !kvm_enabled());
+assert(cpu->cpu_ases);
+
+cpuas = &cpu->cpu_ases[asidx];
+if (tcg_enabled()) {
+memory_listener_unregister(>tcg_as_listener);
+}
+
+address_space_destroy(cpuas->as);
+
+cpu->cpu_ases_ref_count--;
+if (cpu->cpu_ases_ref_count == 0) {
+g_free(cpu->cpu_ases);
+cpu->cpu_ases = NULL;
+}
+
+}
+
  AddressSpace *cpu_get_address_space(CPUState *cpu, int asidx)
  {
  /* Return the AddressSpace corresponding to the specified index */





[PATCH 22/24] tcg/i386: Clear dest first in tcg_out_setcond if possible

2023-08-07 Thread Richard Henderson
Using XOR first is both smaller and more efficient,
though it cannot be applied if it would clobber an input.
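
As a rough host-side model of why the two orders agree (the comments name the
x86 instructions; the C itself is a sanity sketch, not generated code):

#include <assert.h>
#include <stdint.h>

static uint32_t setcond_clear_first(uint32_t a1, uint32_t a2)
{
    uint32_t dest = 0;            /* XOR dest,dest: breaks the false dep */
    dest |= (uint8_t)(a1 == a2);  /* CMP a1,a2; SETE writes the low byte */
    return dest;                  /* upper bits already clear: no MOVZBL */
}

static uint32_t setcond_extend_after(uint32_t a1, uint32_t a2)
{
    uint8_t low = (a1 == a2);     /* CMP; SETE          */
    return (uint32_t)low;         /* MOVZBL zero-extend */
}

int main(void)
{
    assert(setcond_clear_first(1, 1) == setcond_extend_after(1, 1));
    assert(setcond_clear_first(1, 2) == setcond_extend_after(1, 2));
    return 0;
}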

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index e06ac638b0..cca49fe63a 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1529,6 +1529,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 int const_arg2)
 {
 bool inv = false;
+bool cleared;
 
 switch (cond) {
 case TCG_COND_NE:
@@ -1578,9 +1579,23 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 break;
 }
 
+/*
+ * If dest does not overlap the inputs, clearing it first is preferred.
+ * The XOR breaks any false dependency for the low-byte write to dest,
+ * and is also one byte smaller than MOVZBL.
+ */
+cleared = false;
+if (dest != arg1 && (const_arg2 || dest != arg2)) {
+tgen_arithr(s, ARITH_XOR, dest, dest);
+cleared = true;
+}
+
 tcg_out_cmp(s, rexw, arg1, arg2, const_arg2, false);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
-tcg_out_ext8u(s, dest, dest);
+
+if (!cleared) {
+tcg_out_ext8u(s, dest, dest);
+}
 }
 
 #if TCG_TARGET_REG_BITS == 32
-- 
2.34.1




[PATCH 18/24] tcg/i386: Merge tcg_out_setcond{32,64}

2023-08-07 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index b9673b55bd..ec3c7012d4 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1524,23 +1524,16 @@ static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
 }
 #endif
 
-static void tcg_out_setcond32(TCGContext *s, TCGCond cond, TCGArg dest,
-  TCGArg arg1, TCGArg arg2, int const_arg2)
+static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
+TCGArg dest, TCGArg arg1, TCGArg arg2,
+int const_arg2)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, 0);
+tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
 tcg_out_ext8u(s, dest, dest);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_setcond64(TCGContext *s, TCGCond cond, TCGArg dest,
-  TCGArg arg1, TCGArg arg2, int const_arg2)
-{
-tcg_out_cmp(s, arg1, arg2, const_arg2, P_REXW);
-tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
-tcg_out_ext8u(s, dest, dest);
-}
-#else
+#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_setcond2(TCGContext *s, const TCGArg *args,
  const int *const_args)
 {
@@ -2565,8 +2558,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond(s, rexw, a2, a0, a1, const_args[1],
arg_label(args[3]), 0);
 break;
-case INDEX_op_setcond_i32:
-tcg_out_setcond32(s, args[3], a0, a1, a2, const_a2);
+OP_32_64(setcond):
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
 break;
 case INDEX_op_movcond_i32:
 tcg_out_movcond32(s, args[5], a0, a1, a2, const_a2, args[3]);
@@ -2718,9 +2711,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
-case INDEX_op_setcond_i64:
-tcg_out_setcond64(s, args[3], a0, a1, a2, const_a2);
-break;
 case INDEX_op_movcond_i64:
 tcg_out_movcond64(s, args[5], a0, a1, a2, const_a2, args[3]);
 break;
-- 
2.34.1




[PATCH 5/6] target/ppc: Implement watchpoint debug facility for v2.07S

2023-08-07 Thread Nicholas Piggin
ISA v2.07S introduced the watchpoint facility based on the DAWR0
and DAWRX0 SPRs. Implement this in TCG.
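
The DAWRX0 decode below leans on PowerPC's MSB-first bit numbering; the
following sketch shows the assumed macro shape and one field extraction
(PPC_BIT_NR and extract32 are reimplemented here only for illustration):

#include <stdint.h>

/* PowerPC numbers bit 0 as the MSB of a 64-bit register; extract32()
 * counts from the LSB, so the position must be flipped. */
#define PPC_BIT_NR(bit)  (63 - (bit))

static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & ((1u << length) - 1);
}

/* PR is big-endian bit 63 of DAWRX0, i.e. LSB-based bit 0. */
static int dawrx_pr(uint32_t dawrx)
{
    return extract32(dawrx, PPC_BIT_NR(63), 1);
}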

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.c | 59 
 target/ppc/cpu.h |  4 +++
 target/ppc/cpu_init.c|  6 ++--
 target/ppc/excp_helper.c | 52 ++-
 target/ppc/helper.h  |  2 ++
 target/ppc/internal.h|  1 +
 target/ppc/machine.c |  1 +
 target/ppc/misc_helper.c | 10 +++
 target/ppc/spr_common.h  |  2 ++
 target/ppc/translate.c   | 13 +
 10 files changed, 147 insertions(+), 3 deletions(-)

diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index d9c665ce18..62e1c15e3d 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -128,6 +128,65 @@ void ppc_store_ciabr(CPUPPCState *env, target_ulong val)
 env->spr[SPR_CIABR] = val;
 ppc_update_ciabr(env);
 }
+
+void ppc_update_daw0(CPUPPCState *env)
+{
+CPUState *cs = env_cpu(env);
+target_ulong deaw = env->spr[SPR_DAWR0] & PPC_BITMASK(0, 60);
+uint32_t dawrx = env->spr[SPR_DAWRX0];
+int mrd = extract32(dawrx, PPC_BIT_NR(48), 54 - 48);
+bool dw = extract32(dawrx, PPC_BIT_NR(57), 1);
+bool dr = extract32(dawrx, PPC_BIT_NR(58), 1);
+bool hv = extract32(dawrx, PPC_BIT_NR(61), 1);
+bool sv = extract32(dawrx, PPC_BIT_NR(62), 1);
+bool pr = extract32(dawrx, PPC_BIT_NR(63), 1);
+vaddr len;
+int flags;
+
+if (env->dawr0_watchpoint) {
+cpu_watchpoint_remove_by_ref(cs, env->dawr0_watchpoint);
+env->dawr0_watchpoint = NULL;
+}
+
+if (!dr && !dw) {
+return;
+}
+
+if (!hv && !sv && !pr) {
+return;
+}
+
+len = (mrd + 1) * 8;
+flags = BP_CPU | BP_STOP_BEFORE_ACCESS;
+if (dr) {
+flags |= BP_MEM_READ;
+}
+if (dw) {
+flags |= BP_MEM_WRITE;
+}
+
+cpu_watchpoint_insert(cs, deaw, len, flags, &env->dawr0_watchpoint);
+}
+
+void ppc_store_dawr0(CPUPPCState *env, target_ulong val)
+{
+env->spr[SPR_DAWR0] = val;
+ppc_update_daw0(env);
+}
+
+void ppc_store_dawrx0(CPUPPCState *env, uint32_t val)
+{
+int hrammc = extract32(val, PPC_BIT_NR(56), 1);
+
+if (hrammc) {
+/* This might be done with a second watchpoint at the xor of DEAW[0] */
+qemu_log_mask(LOG_UNIMP, "%s: DAWRX0[HRAMMC] is unimplemented\n",
+  __func__);
+}
+
+env->spr[SPR_DAWRX0] = val;
+ppc_update_daw0(env);
+}
 #endif
 #endif
 
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index d97fabd8f6..2777ea3110 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1138,6 +1138,7 @@ struct CPUArchState {
 #if defined(TARGET_PPC64)
 ppc_slb_t slb[MAX_SLB_ENTRIES]; /* PowerPC 64 SLB area */
 struct CPUBreakpoint *ciabr_breakpoint;
+struct CPUWatchpoint *dawr0_watchpoint;
 #endif
 target_ulong sr[32];   /* segment registers */
 uint32_t nb_BATs;  /* number of BATs */
@@ -1406,6 +1407,9 @@ void ppc_store_sdr1(CPUPPCState *env, target_ulong value);
 void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val);
 void ppc_update_ciabr(CPUPPCState *env);
 void ppc_store_ciabr(CPUPPCState *env, target_ulong value);
+void ppc_update_daw0(CPUPPCState *env);
+void ppc_store_dawr0(CPUPPCState *env, target_ulong value);
+void ppc_store_dawrx0(CPUPPCState *env, uint32_t value);
 #endif /* !defined(CONFIG_USER_ONLY) */
 void ppc_store_msr(CPUPPCState *env, target_ulong value);
 
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index a2820839b3..9c1c045d1b 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5117,12 +5117,12 @@ static void register_book3s_207_dbg_sprs(CPUPPCState *env)
 spr_register_kvm_hv(env, SPR_DAWR0, "DAWR0",
 SPR_NOACCESS, SPR_NOACCESS,
 SPR_NOACCESS, SPR_NOACCESS,
-_read_generic, _write_generic,
+_read_generic, _write_dawr0,
 KVM_REG_PPC_DAWR, 0x);
 spr_register_kvm_hv(env, SPR_DAWRX0, "DAWRX0",
 SPR_NOACCESS, SPR_NOACCESS,
 SPR_NOACCESS, SPR_NOACCESS,
-_read_generic, _write_generic32,
+_read_generic, _write_dawrx0,
 KVM_REG_PPC_DAWRX, 0x);
 spr_register_kvm_hv(env, SPR_CIABR, "CIABR",
 SPR_NOACCESS, SPR_NOACCESS,
@@ -7150,6 +7150,7 @@ static void ppc_cpu_reset_hold(Object *obj)
 
 if (tcg_enabled()) {
 cpu_breakpoint_remove_all(s, BP_CPU);
+cpu_watchpoint_remove_all(s, BP_CPU);
 if (env->mmu_model != POWERPC_MMU_REAL) {
 ppc_tlb_invalidate_all(env);
 }
@@ -7339,6 +7340,7 @@ static const struct TCGCPUOps ppc_tcg_ops = {
   .do_transaction_failed = ppc_cpu_do_transaction_failed,
   .debug_excp_handler = ppc_cpu_debug_excp_handler,
   .debug_check_breakpoint = ppc_cpu_debug_check_breakpoint,
+  .debug_check_watchpoint = 

[PATCH 02/24] tcg: Use tcg_gen_negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.c | 6 ++
 tcg/tcg-op.c  | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index a062239804..e260a07c61 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3692,8 +3692,7 @@ static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 for (i = 0; i < oprsz; i += 4) {
 tcg_gen_ld_i32(t0, cpu_env, aofs + i);
 tcg_gen_ld_i32(t1, cpu_env, bofs + i);
-tcg_gen_setcond_i32(cond, t0, t0, t1);
-tcg_gen_neg_i32(t0, t0);
+tcg_gen_negsetcond_i32(cond, t0, t0, t1);
 tcg_gen_st_i32(t0, cpu_env, dofs + i);
 }
 tcg_temp_free_i32(t1);
@@ -3710,8 +3709,7 @@ static void expand_cmp_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 for (i = 0; i < oprsz; i += 8) {
 tcg_gen_ld_i64(t0, cpu_env, aofs + i);
 tcg_gen_ld_i64(t1, cpu_env, bofs + i);
-tcg_gen_setcond_i64(cond, t0, t0, t1);
-tcg_gen_neg_i64(t0, t0);
+tcg_gen_negsetcond_i64(cond, t0, t0, t1);
 tcg_gen_st_i64(t0, cpu_env, dofs + i);
 }
 tcg_temp_free_i64(t1);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 76d2377669..b4f1f24cab 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -863,8 +863,7 @@ void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
 } else {
 TCGv_i32 t0 = tcg_temp_ebb_new_i32();
 TCGv_i32 t1 = tcg_temp_ebb_new_i32();
-tcg_gen_setcond_i32(cond, t0, c1, c2);
-tcg_gen_neg_i32(t0, t0);
+tcg_gen_negsetcond_i32(cond, t0, c1, c2);
 tcg_gen_and_i32(t1, v1, t0);
 tcg_gen_andc_i32(ret, v2, t0);
 tcg_gen_or_i32(ret, ret, t1);
@@ -2563,8 +2562,7 @@ void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
 } else {
 TCGv_i64 t0 = tcg_temp_ebb_new_i64();
 TCGv_i64 t1 = tcg_temp_ebb_new_i64();
-tcg_gen_setcond_i64(cond, t0, c1, c2);
-tcg_gen_neg_i64(t0, t0);
+tcg_gen_negsetcond_i64(cond, t0, c1, c2);
 tcg_gen_and_i64(t1, v1, t0);
 tcg_gen_andc_i64(ret, v2, t0);
 tcg_gen_or_i64(ret, ret, t1);
-- 
2.34.1
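
For reference, the fallback expansion above is the classic mask select; a
minimal host-side sketch of the same dataflow:

#include <assert.h>
#include <stdint.h>

/* ret = cond ? v1 : v2, with m = negsetcond(cond) as an all-ones mask */
static uint64_t movcond_via_mask(int cond, uint64_t v1, uint64_t v2)
{
    uint64_t m = -(uint64_t)(cond != 0);  /* negsetcond: -1 or 0 */
    return (v1 & m) | (v2 & ~m);          /* and / andc / or     */
}

int main(void)
{
    assert(movcond_via_mask(1, 0x1234, 0x5678) == 0x1234);
    assert(movcond_via_mask(0, 0x1234, 0x5678) == 0x5678);
    return 0;
}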




[PATCH 05/24] target/m68k: Use tcg_gen_negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/m68k/translate.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index e07161d76f..37954d11a6 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -1357,8 +1357,7 @@ static void gen_cc_cond(DisasCompare *c, DisasContext *s, int cond)
 case 14: /* GT (!(Z || (N ^ V))) */
 case 15: /* LE (Z || (N ^ V)) */
 c->v1 = tmp = tcg_temp_new();
-tcg_gen_setcond_i32(TCG_COND_EQ, tmp, QREG_CC_Z, c->v2);
-tcg_gen_neg_i32(tmp, tmp);
+tcg_gen_negsetcond_i32(TCG_COND_EQ, tmp, QREG_CC_Z, c->v2);
 tmp2 = tcg_temp_new();
 tcg_gen_xor_i32(tmp2, QREG_CC_N, QREG_CC_V);
 tcg_gen_or_i32(tmp, tmp, tmp2);
@@ -1437,9 +1436,8 @@ DISAS_INSN(scc)
gen_cc_cond(&c, s, cond);
 
 tmp = tcg_temp_new();
-tcg_gen_setcond_i32(c.tcond, tmp, c.v1, c.v2);
+tcg_gen_negsetcond_i32(c.tcond, tmp, c.v1, c.v2);
 
-tcg_gen_neg_i32(tmp, tmp);
 DEST_EA(env, insn, OS_BYTE, tmp, NULL);
 }
 
@@ -2771,13 +2769,14 @@ DISAS_INSN(mull)
 tcg_gen_muls2_i32(QREG_CC_N, QREG_CC_V, src1, DREG(ext, 12));
 /* QREG_CC_V is -(QREG_CC_V != (QREG_CC_N >> 31)) */
 tcg_gen_sari_i32(QREG_CC_Z, QREG_CC_N, 31);
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, QREG_CC_Z);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V,
+   QREG_CC_V, QREG_CC_Z);
 } else {
 tcg_gen_mulu2_i32(QREG_CC_N, QREG_CC_V, src1, DREG(ext, 12));
 /* QREG_CC_V is -(QREG_CC_V != 0), use QREG_CC_C as 0 */
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, QREG_CC_C);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V,
+   QREG_CC_V, QREG_CC_C);
 }
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 tcg_gen_mov_i32(DREG(ext, 12), QREG_CC_N);
 
 tcg_gen_mov_i32(QREG_CC_Z, QREG_CC_N);
@@ -3346,14 +3345,13 @@ static inline void shift_im(DisasContext *s, uint16_t insn, int opsize)
 if (!logical && m68k_feature(s->env, M68K_FEATURE_M68K)) {
 /* if shift count >= bits, V is (reg != 0) */
 if (count >= bits) {
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, reg, QREG_CC_V);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V, reg, QREG_CC_V);
 } else {
 TCGv t0 = tcg_temp_new();
 tcg_gen_sari_i32(QREG_CC_V, reg, bits - 1);
 tcg_gen_sari_i32(t0, reg, bits - count - 1);
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, t0);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, t0);
 }
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 }
 } else {
 tcg_gen_shri_i32(QREG_CC_C, reg, count - 1);
@@ -3437,9 +3435,8 @@ static inline void shift_reg(DisasContext *s, uint16_t insn, int opsize)
 /* Ignore the bits below the sign bit.  */
 tcg_gen_andi_i64(t64, t64, -1ULL << (bits - 1));
 /* If any bits remain set, we have overflow.  */
-tcg_gen_setcondi_i64(TCG_COND_NE, t64, t64, 0);
+tcg_gen_negsetcond_i64(TCG_COND_NE, t64, t64, tcg_constant_i64(0));
 tcg_gen_extrl_i64_i32(QREG_CC_V, t64);
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 }
 } else {
 tcg_gen_shli_i64(t64, t64, 32);
@@ -5318,9 +5315,8 @@ DISAS_INSN(fscc)
gen_fcc_cond(&c, s, cond);
 
 tmp = tcg_temp_new();
-tcg_gen_setcond_i32(c.tcond, tmp, c.v1, c.v2);
+tcg_gen_negsetcond_i32(c.tcond, tmp, c.v1, c.v2);
 
-tcg_gen_neg_i32(tmp, tmp);
 DEST_EA(env, insn, OS_BYTE, tmp, NULL);
 }
 
-- 
2.34.1




[PATCH 04/24] target/arm: Use tcg_gen_negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 22 +-
 target/arm/tcg/translate.c | 12 
 2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5fa1257d32..ac16593699 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4935,9 +4935,12 @@ static void disas_cond_select(DisasContext *s, uint32_t insn)
 
 if (rn == 31 && rm == 31 && (else_inc ^ else_inv)) {
 /* CSET & CSETM.  */
-tcg_gen_setcond_i64(tcg_invert_cond(c.cond), tcg_rd, c.value, zero);
 if (else_inv) {
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+tcg_gen_negsetcond_i64(tcg_invert_cond(c.cond),
+   tcg_rd, c.value, zero);
+} else {
+tcg_gen_setcond_i64(tcg_invert_cond(c.cond),
+tcg_rd, c.value, zero);
 }
 } else {
 TCGv_i64 t_true = cpu_reg(s, rn);
@@ -8670,13 +8673,10 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 }
 break;
 case 0x6: /* CMGT, CMHI */
-/* 64 bit integer comparison, result = test ? (2^64 - 1) : 0.
- * We implement this using setcond (test) and then negating.
- */
 cond = u ? TCG_COND_GTU : TCG_COND_GT;
 do_cmop:
-tcg_gen_setcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+/* 64 bit integer comparison, result = test ? -1 : 0. */
+tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
 break;
 case 0x7: /* CMGE, CMHS */
 cond = u ? TCG_COND_GEU : TCG_COND_GE;
@@ -9265,14 +9265,10 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
 }
 break;
 case 0xa: /* CMLT */
-/* 64 bit integer comparison against zero, result is
- * test ? (2^64 - 1) : 0. We implement via setcond(!test) and
- * subtracting 1.
- */
+/* 64 bit integer comparison against zero, result is test ? -1 : 0. */
 cond = TCG_COND_LT;
 do_cmop:
-tcg_gen_setcondi_i64(cond, tcg_rd, tcg_rn, 0);
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_constant_i64(0));
 break;
 case 0x8: /* CMGT, CMGE */
 cond = u ? TCG_COND_GE : TCG_COND_GT;
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index b71ac2d0d5..31d3130e4c 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -2946,13 +2946,11 @@ void gen_gvec_sqrdmlsh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 #define GEN_CMP0(NAME, COND)\
 static void gen_##NAME##0_i32(TCGv_i32 d, TCGv_i32 a)   \
 {   \
-tcg_gen_setcondi_i32(COND, d, a, 0);\
-tcg_gen_neg_i32(d, d);  \
+tcg_gen_negsetcond_i32(COND, d, a, tcg_constant_i32(0));\
 }   \
 static void gen_##NAME##0_i64(TCGv_i64 d, TCGv_i64 a)   \
 {   \
-tcg_gen_setcondi_i64(COND, d, a, 0);\
-tcg_gen_neg_i64(d, d);  \
+tcg_gen_negsetcond_i64(COND, d, a, tcg_constant_i64(0));\
 }   \
 static void gen_##NAME##0_vec(unsigned vece, TCGv_vec d, TCGv_vec a) \
 {   \
@@ -3863,15 +3861,13 @@ void gen_gvec_mls(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 {
 tcg_gen_and_i32(d, a, b);
-tcg_gen_setcondi_i32(TCG_COND_NE, d, d, 0);
-tcg_gen_neg_i32(d, d);
+tcg_gen_negsetcond_i32(TCG_COND_NE, d, d, tcg_constant_i32(0));
 }
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
 tcg_gen_and_i64(d, a, b);
-tcg_gen_setcondi_i64(TCG_COND_NE, d, d, 0);
-tcg_gen_neg_i64(d, d);
+tcg_gen_negsetcond_i64(TCG_COND_NE, d, d, tcg_constant_i64(0));
 }
 
 static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
-- 
2.34.1




[PATCH 15/24] tcg/s390x: Implement negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target.h |  4 +-
 tcg/s390x/tcg-target.c.inc | 78 +-
 2 files changed, 54 insertions(+), 28 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 24e207c2d4..cd3d245be0 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -104,7 +104,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_mulsh_i32  0
 #define TCG_TARGET_HAS_extrl_i64_i32  0
 #define TCG_TARGET_HAS_extrh_i64_i32  0
-#define TCG_TARGET_HAS_negsetcond_i32 0
+#define TCG_TARGET_HAS_negsetcond_i32 1
 #define TCG_TARGET_HAS_qemu_st8_i32   0
 
 #define TCG_TARGET_HAS_div2_i64   1
@@ -139,7 +139,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_muls2_i64  HAVE_FACILITY(MISC_INSN_EXT2)
 #define TCG_TARGET_HAS_muluh_i64  0
 #define TCG_TARGET_HAS_mulsh_i64  0
-#define TCG_TARGET_HAS_negsetcond_i64 0
+#define TCG_TARGET_HAS_negsetcond_i64 1
 
 #define TCG_TARGET_HAS_qemu_ldst_i128 1
 
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index a94f7908d6..ecd8aaf2a1 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1266,7 +1266,8 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond c, TCGReg r1,
 }
 
 static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
- TCGReg dest, TCGReg c1, TCGArg c2, int c2const)
+ TCGReg dest, TCGReg c1, TCGArg c2,
+ bool c2const, bool neg)
 {
 int cc;
 
@@ -1275,11 +1276,27 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
 /* Emit: d = 0, d = (cc ? 1 : d).  */
 cc = tgen_cmp(s, type, cond, c1, c2, c2const, false);
 tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
-tcg_out_insn(s, RIEg, LOCGHI, dest, 1, cc);
+tcg_out_insn(s, RIEg, LOCGHI, dest, neg ? -1 : 1, cc);
 return;
 }
 
- restart:
+switch (cond) {
+case TCG_COND_GEU:
+case TCG_COND_LTU:
+case TCG_COND_LT:
+case TCG_COND_GE:
+/* Swap operands so that we can use LEU/GTU/GT/LE.  */
+if (!c2const) {
+TCGReg t = c1;
+c1 = c2;
+c2 = t;
+cond = tcg_swap_cond(cond);
+}
+break;
+default:
+break;
+}
+
 switch (cond) {
 case TCG_COND_NE:
 /* X != 0 is X > 0.  */
@@ -1292,11 +1309,20 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
 
 case TCG_COND_GTU:
 case TCG_COND_GT:
-/* The result of a compare has CC=2 for GT and CC=3 unused.
-   ADD LOGICAL WITH CARRY considers (CC & 2) the carry bit.  */
+/*
+ * The result of a compare has CC=2 for GT and CC=3 unused.
+ * ADD LOGICAL WITH CARRY considers (CC & 2) the carry bit.
+ */
 tgen_cmp(s, type, cond, c1, c2, c2const, true);
 tcg_out_movi(s, type, dest, 0);
 tcg_out_insn(s, RRE, ALCGR, dest, dest);
+if (neg) {
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RR, LCR, dest, dest);
+} else {
+tcg_out_insn(s, RRE, LCGR, dest, dest);
+}
+}
 return;
 
 case TCG_COND_EQ:
@@ -1310,27 +1336,17 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
 
 case TCG_COND_LEU:
 case TCG_COND_LE:
-/* As above, but we're looking for borrow, or !carry.
-   The second insn computes d - d - borrow, or -1 for true
-   and 0 for false.  So we must mask to 1 bit afterward.  */
+/*
+ * As above, but we're looking for borrow, or !carry.
+ * The second insn computes d - d - borrow, or -1 for true
+ * and 0 for false.  So we must mask to 1 bit afterward.
+ */
 tgen_cmp(s, type, cond, c1, c2, c2const, true);
 tcg_out_insn(s, RRE, SLBGR, dest, dest);
-tgen_andi(s, type, dest, 1);
-return;
-
-case TCG_COND_GEU:
-case TCG_COND_LTU:
-case TCG_COND_LT:
-case TCG_COND_GE:
-/* Swap operands so that we can use LEU/GTU/GT/LE.  */
-if (!c2const) {
-TCGReg t = c1;
-c1 = c2;
-c2 = t;
-cond = tcg_swap_cond(cond);
-goto restart;
+if (!neg) {
+tgen_andi(s, type, dest, 1);
 }
-break;
+return;
 
 default:
 g_assert_not_reached();
@@ -1339,7 +1355,7 @@ static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
 cc = tgen_cmp(s, type, cond, c1, c2, c2const, false);
 /* Emit: d = 0, t = 1, d = (cc ? t : d).  */
 tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
-tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 1);
+tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, neg ? -1 : 1);
 tcg_out_insn(s, RRFc, LOCGR, dest, TCG_TMP0, cc);
 }
 
@@ -2288,7 +2304,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 

[PATCH for-8.2 00/24] tcg: Introduce negsetcond opcodes

2023-08-07 Thread Richard Henderson
Introduce two new setcond opcode variants which produce -1 instead
of 1 when the condition is true.  For most of our hosts, producing -1 is
just as easy as 1, and avoids requiring a separate negate instruction.

Use the new opcode in tcg/tcg-op-gvec.c for integral expansion of
generic vector operations.  I looked through target/ for obvious
pairings of setcond and neg.
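
In C terms the new opcodes compute an all-ones mask instead of a 0/1 flag;
a minimal reference sketch of the intended semantics (illustration only):

#include <stdint.h>

/* setcond:    dest = (a cond b) ? 1 : 0
 * negsetcond: dest = (a cond b) ? -1 : 0, i.e. an all-ones mask */
static uint64_t negsetcond_ltu_i64(uint64_t a, uint64_t b)
{
    return -(uint64_t)(a < b);
}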


r~


Richard Henderson (24):
  tcg: Introduce negsetcond opcodes
  tcg: Use tcg_gen_negsetcond_*
  target/alpha: Use tcg_gen_movcond_i64 in gen_fold_mzero
  target/arm: Use tcg_gen_negsetcond_*
  target/m68k: Use tcg_gen_negsetcond_*
  target/openrisc: Use tcg_gen_negsetcond_*
  target/ppc: Use tcg_gen_negsetcond_*
  target/sparc: Use tcg_gen_movcond_i64 in gen_edge
  target/tricore: Replace gen_cond_w with tcg_gen_negsetcond_tl
  tcg/ppc: Implement negsetcond_*
  tcg/ppc: Use the Set Boolean Extension
  tcg/aarch64: Implement negsetcond_*
  tcg/arm: Implement negsetcond_i32
  tcg/riscv: Implement negsetcond_*
  tcg/s390x: Implement negsetcond_*
  tcg/sparc64: Implement negsetcond_*
  tcg/i386: Merge tcg_out_brcond{32,64}
  tcg/i386: Merge tcg_out_setcond{32,64}
  tcg/i386: Merge tcg_out_movcond{32,64}
  tcg/i386: Add cf parameter to tcg_out_cmp
  tcg/i386: Use CMP+SBB in tcg_out_setcond
  tcg/i386: Clear dest first in tcg_out_setcond if possible
  tcg/i386: Use shift in tcg_out_setcond
  tcg/i386: Implement negsetcond_*

 docs/devel/tcg-ops.rst |   6 +
 include/tcg/tcg-op-common.h|   4 +
 include/tcg/tcg-op.h   |   2 +
 include/tcg/tcg-opc.h  |   2 +
 include/tcg/tcg.h  |   1 +
 tcg/aarch64/tcg-target.h   |   2 +
 tcg/arm/tcg-target.h   |   1 +
 tcg/i386/tcg-target.h  |   2 +
 tcg/loongarch64/tcg-target.h   |   3 +
 tcg/mips/tcg-target.h  |   2 +
 tcg/ppc/tcg-target.h   |   2 +
 tcg/riscv/tcg-target.h |   2 +
 tcg/s390x/tcg-target.h |   2 +
 tcg/sparc64/tcg-target.h   |   2 +
 tcg/tci/tcg-target.h   |   2 +
 target/alpha/translate.c   |   7 +-
 target/arm/tcg/translate-a64.c |  22 +-
 target/arm/tcg/translate.c |  12 +-
 target/m68k/translate.c|  24 +-
 target/openrisc/translate.c|   6 +-
 target/sparc/translate.c   |  17 +-
 target/tricore/translate.c |  16 +-
 tcg/optimize.c |  41 +++-
 tcg/tcg-op-gvec.c  |   6 +-
 tcg/tcg-op.c   |  42 +++-
 tcg/tcg.c  |   6 +
 target/ppc/translate/fixedpoint-impl.c.inc |   6 +-
 target/ppc/translate/vmx-impl.c.inc|   8 +-
 tcg/aarch64/tcg-target.c.inc   |  12 +
 tcg/arm/tcg-target.c.inc   |   9 +
 tcg/i386/tcg-target.c.inc  | 265 +
 tcg/ppc/tcg-target.c.inc   | 149 
 tcg/riscv/tcg-target.c.inc |  45 
 tcg/s390x/tcg-target.c.inc |  78 --
 tcg/sparc64/tcg-target.c.inc   |  36 ++-
 35 files changed, 572 insertions(+), 270 deletions(-)

-- 
2.34.1




[PATCH 03/24] target/alpha: Use tcg_gen_movcond_i64 in gen_fold_mzero

2023-08-07 Thread Richard Henderson
The setcond + neg + and sequence is a complex method of
performing a conditional move.
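
Both sequences compute the same value; a quick sketch of the equivalence,
treating the float as its raw bit pattern just as the TCG code does:

#include <assert.h>
#include <stdint.h>

static const uint64_t mzero = 0x8000000000000000ull;  /* -0.0 pattern */

static uint64_t old_seq(uint64_t src)   /* setcond + neg + and */
{
    uint64_t mask = -(uint64_t)(src != mzero);
    return src & mask;
}

static uint64_t new_seq(uint64_t src)   /* movcond */
{
    return (src != mzero) ? src : 0;
}

int main(void)
{
    assert(old_seq(mzero) == 0);        /* -0.0 maps to +0.0 */
    assert(old_seq(42) == new_seq(42));
    return 0;
}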

Signed-off-by: Richard Henderson 
---
 target/alpha/translate.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 846f3d8091..0839182a1f 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -517,10 +517,9 @@ static void gen_fold_mzero(TCGCond cond, TCGv dest, TCGv src)
 
 case TCG_COND_GE:
 case TCG_COND_LT:
-/* For >= or <, map -0.0 to +0.0 via comparison and mask.  */
-tcg_gen_setcondi_i64(TCG_COND_NE, dest, src, mzero);
-tcg_gen_neg_i64(dest, dest);
-tcg_gen_and_i64(dest, dest, src);
+/* For >= or <, map -0.0 to +0.0. */
+tcg_gen_movcond_i64(TCG_COND_NE, dest, src, tcg_constant_i64(mzero),
+src, tcg_constant_i64(0));
 break;
 
 default:
-- 
2.34.1




[PATCH 20/24] tcg/i386: Add cf parameter to tcg_out_cmp

2023-08-07 Thread Richard Henderson
Add the parameter to avoid TEST and pass along to tgen_arithi.
All current users pass false.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index b88fc14afd..56549ff2a0 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1418,15 +1418,15 @@ static void tcg_out_jxx(TCGContext *s, int opc, TCGLabel *l, bool small)
 }
 }
 
-static void tcg_out_cmp(TCGContext *s, TCGArg arg1, TCGArg arg2,
-int const_arg2, int rexw)
+static void tcg_out_cmp(TCGContext *s, int rexw, TCGArg arg1, TCGArg arg2,
+int const_arg2, bool cf)
 {
 if (const_arg2) {
-if (arg2 == 0) {
+if (arg2 == 0 && !cf) {
 /* test r, r */
 tcg_out_modrm(s, OPC_TESTL + rexw, arg1, arg1);
 } else {
-tgen_arithi(s, ARITH_CMP + rexw, arg1, arg2, 0);
+tgen_arithi(s, ARITH_CMP + rexw, arg1, arg2, cf);
 }
 } else {
 tgen_arithr(s, ARITH_CMP + rexw, arg1, arg2);
@@ -1437,7 +1437,7 @@ static void tcg_out_brcond(TCGContext *s, int rexw, TCGCond cond,
TCGArg arg1, TCGArg arg2, int const_arg2,
TCGLabel *label, bool small)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
+tcg_out_cmp(s, rexw, arg1, arg2, const_arg2, false);
 tcg_out_jxx(s, tcg_cond_to_jcc[cond], label, small);
 }
 
@@ -1528,7 +1528,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 TCGArg dest, TCGArg arg1, TCGArg arg2,
 int const_arg2)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
+tcg_out_cmp(s, rexw, arg1, arg2, const_arg2, false);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
 tcg_out_ext8u(s, dest, dest);
 }
@@ -1594,7 +1594,7 @@ static void tcg_out_movcond(TCGContext *s, int rexw, TCGCond cond,
 TCGReg dest, TCGReg c1, TCGArg c2, int const_c2,
 TCGReg v1)
 {
-tcg_out_cmp(s, c1, c2, const_c2, rexw);
+tcg_out_cmp(s, rexw, c1, c2, const_c2, false);
 tcg_out_cmov(s, cond, rexw, dest, v1);
 }
 
@@ -1637,7 +1637,7 @@ static void tcg_out_clz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
 tgen_arithi(s, ARITH_XOR + rexw, dest, rexw ? 63 : 31, 0);
 
 /* Since we have destroyed the flags from BSR, we have to re-test.  */
-tcg_out_cmp(s, arg1, 0, 1, rexw);
+tcg_out_cmp(s, rexw, arg1, 0, 1, false);
 tcg_out_cmov(s, TCG_COND_EQ, rexw, dest, arg2);
 }
 }
-- 
2.34.1




[PATCH 21/24] tcg/i386: Use CMP+SBB in tcg_out_setcond

2023-08-07 Thread Richard Henderson
Use the carry bit to optimize some forms of setcond.
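
A rough C model of the idiom (x86 details in the comments; the EQ/NE-vs-0
and register GTU/LEU cases below all funnel into this LTU path):

#include <assert.h>
#include <stdint.h>

/* After CMP a,b the carry flag is the unsigned borrow, i.e. a < b;
 * SBB dest,dest replicates it: -1 for LTU, 0 for GEU. */
static uint64_t sbb_mask(uint64_t a, uint64_t b)
{
    return -(uint64_t)(a < b);
}

int main(void)
{
    uint64_t x = 0;
    assert((x == 0) == (x < 1));      /* EQ vs 0 becomes LTU vs 1 */
    assert(-sbb_mask(1, 2) == 1);     /* LTU: NEG yields 1 or 0   */
    assert(sbb_mask(2, 1) + 1 == 1);  /* GEU: INC yields 1 or 0   */
    return 0;
}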

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 56549ff2a0..e06ac638b0 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1528,6 +1528,56 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 TCGArg dest, TCGArg arg1, TCGArg arg2,
 int const_arg2)
 {
+bool inv = false;
+
+switch (cond) {
+case TCG_COND_NE:
+inv = true;
+/* fall through */
+case TCG_COND_EQ:
+/* If arg2 is 0, convert to LTU/GEU vs 1. */
+if (const_arg2 && arg2 == 0) {
+arg2 = 1;
+goto do_ltu;
+}
+break;
+
+case TCG_COND_LEU:
+inv = true;
+/* fall through */
+case TCG_COND_GTU:
+/* If arg2 is a register, swap for LTU/GEU. */
+if (!const_arg2) {
+TCGReg t = arg1;
+arg1 = arg2;
+arg2 = t;
+goto do_ltu;
+}
+break;
+
+case TCG_COND_GEU:
+inv = true;
+/* fall through */
+case TCG_COND_LTU:
+do_ltu:
+/*
+ * Relying on the carry bit, use SBB to produce -1 if LTU, 0 if GEU.
+ * We can then use NEG or INC to produce the desired result.
+ * This is always smaller than the SETCC expansion.
+ */
+tcg_out_cmp(s, rexw, arg1, arg2, const_arg2, true);
+tgen_arithr(s, ARITH_SBB, dest, dest);  /* T:-1 F:0 */
+if (inv) {
+tgen_arithi(s, ARITH_ADD, dest, 1, 0);  /* T:0  F:1 */
+} else {
+tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);  /* T:1  F:0 */
+}
+return;
+
+default:
+break;
+}
+
 tcg_out_cmp(s, rexw, arg1, arg2, const_arg2, false);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
 tcg_out_ext8u(s, dest, dest);
-- 
2.34.1




[PATCH 19/24] tcg/i386: Merge tcg_out_movcond{32,64}

2023-08-07 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 28 +++-
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index ec3c7012d4..b88fc14afd 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1590,24 +1590,14 @@ static void tcg_out_cmov(TCGContext *s, TCGCond cond, int rexw,
 }
 }
 
-static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGReg dest,
-  TCGReg c1, TCGArg c2, int const_c2,
-  TCGReg v1)
+static void tcg_out_movcond(TCGContext *s, int rexw, TCGCond cond,
+TCGReg dest, TCGReg c1, TCGArg c2, int const_c2,
+TCGReg v1)
 {
-tcg_out_cmp(s, c1, c2, const_c2, 0);
-tcg_out_cmov(s, cond, 0, dest, v1);
+tcg_out_cmp(s, c1, c2, const_c2, rexw);
+tcg_out_cmov(s, cond, rexw, dest, v1);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGReg dest,
-  TCGReg c1, TCGArg c2, int const_c2,
-  TCGReg v1)
-{
-tcg_out_cmp(s, c1, c2, const_c2, P_REXW);
-tcg_out_cmov(s, cond, P_REXW, dest, v1);
-}
-#endif
-
 static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
 TCGArg arg2, bool const_a2)
 {
@@ -2561,8 +2551,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 OP_32_64(setcond):
 tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
 break;
-case INDEX_op_movcond_i32:
-tcg_out_movcond32(s, args[5], a0, a1, a2, const_a2, args[3]);
+OP_32_64(movcond):
+tcg_out_movcond(s, rexw, args[5], a0, a1, a2, const_a2, args[3]);
 break;
 
 OP_32_64(bswap16):
@@ -2711,10 +2701,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
-case INDEX_op_movcond_i64:
-tcg_out_movcond64(s, args[5], a0, a1, a2, const_a2, args[3]);
-break;
-
 case INDEX_op_bswap64_i64:
 tcg_out_bswap64(s, a0);
 break;
-- 
2.34.1




[PATCH 10/24] tcg/ppc: Implement negsetcond_*

2023-08-07 Thread Richard Henderson
In the general case we simply negate.  However with isel we
may load -1 instead of 1 with no extra effort.

Consolidate EQ0 and NE0 logic.  Replace the NE0 zero-extension
with inversion+negation of EQ0, which is never worse and may
eliminate one insn.  Provide a special case for -EQ0.
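
The carry trick can be modelled directly in C; a sketch of the ADDIC/SUBFE
pair for the negated EQ0 case (illustration, not the emitted code):

#include <assert.h>
#include <stdint.h>

/* ADDIC r0,x,-1 sets CA iff x != 0 (x + 0xffff... carries unless x == 0);
 * SUBFE d,x,x then computes ~x + x + CA = -1 + CA. */
static uint64_t neg_eq0(uint64_t x)
{
    uint64_t ca = (x + UINT64_MAX) < x;  /* carry out of the add */
    return ~x + x + ca;                  /* -1 + CA: -1 or 0     */
}

int main(void)
{
    assert(neg_eq0(0) == UINT64_MAX);    /* x == 0  ->  -1 */
    assert(neg_eq0(5) == 0);             /* x != 0  ->   0 */
    return 0;
}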

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |   4 +-
 tcg/ppc/tcg-target.c.inc | 127 ---
 2 files changed, 82 insertions(+), 49 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index ba4fd3eb3a..a143b8f1e0 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -101,7 +101,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i320
 #define TCG_TARGET_HAS_muluh_i321
 #define TCG_TARGET_HAS_mulsh_i321
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #if TCG_TARGET_REG_BITS == 64
@@ -142,7 +142,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #endif
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   \
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 511e14b180..10448aa0e6 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1548,8 +1548,20 @@ static void tcg_out_cmp(TCGContext *s, int cond, TCGArg arg1, TCGArg arg2,
 }
 
 static void tcg_out_setcond_eq0(TCGContext *s, TCGType type,
-TCGReg dst, TCGReg src)
+TCGReg dst, TCGReg src, bool neg)
 {
+if (neg && (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I64)) {
+/*
+ * X != 0 implies X + -1 generates a carry.
+ * RT = (~X + X) + CA
+ *    = -1 + CA
+ *    = CA ? 0 : -1
+ */
+tcg_out32(s, ADDIC | TAI(TCG_REG_R0, src, -1));
+tcg_out32(s, SUBFE | TAB(dst, src, src));
+return;
+}
+
 if (type == TCG_TYPE_I32) {
 tcg_out32(s, CNTLZW | RS(src) | RA(dst));
 tcg_out_shri32(s, dst, dst, 5);
@@ -1557,18 +1569,28 @@ static void tcg_out_setcond_eq0(TCGContext *s, TCGType type,
 tcg_out32(s, CNTLZD | RS(src) | RA(dst));
 tcg_out_shri64(s, dst, dst, 6);
 }
+if (neg) {
+tcg_out32(s, NEG | RT(dst) | RA(dst));
+}
 }
 
-static void tcg_out_setcond_ne0(TCGContext *s, TCGReg dst, TCGReg src)
+static void tcg_out_setcond_ne0(TCGContext *s, TCGType type,
+TCGReg dst, TCGReg src, bool neg)
 {
-/* X != 0 implies X + -1 generates a carry.  Extra addition
-   trickery means: R = X-1 + ~X + C = X-1 + (-X+1) + C = C.  */
-if (dst != src) {
-tcg_out32(s, ADDIC | TAI(dst, src, -1));
-tcg_out32(s, SUBFE | TAB(dst, dst, src));
-} else {
+if (!neg && (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I64)) {
+/*
+ * X != 0 implies X + -1 generates a carry.  Extra addition
+ * trickery means: R = X-1 + ~X + C = X-1 + (-X+1) + C = C.
+ */
 tcg_out32(s, ADDIC | TAI(TCG_REG_R0, src, -1));
 tcg_out32(s, SUBFE | TAB(dst, TCG_REG_R0, src));
+return;
+}
+tcg_out_setcond_eq0(s, type, dst, src, false);
+if (neg) {
+tcg_out32(s, ADDI | TAI(dst, dst, -1));
+} else {
+tcg_out_xori32(s, dst, dst, 1);
 }
 }
 
@@ -1590,9 +1612,10 @@ static TCGReg tcg_gen_setcond_xor(TCGContext *s, TCGReg arg1, TCGArg arg2,
 
 static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 TCGArg arg0, TCGArg arg1, TCGArg arg2,
-int const_arg2)
+int const_arg2, bool neg)
 {
-int crop, sh;
+int sh;
+bool inv;
 
 tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
 
@@ -1605,14 +1628,10 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 if (arg2 == 0) {
 switch (cond) {
 case TCG_COND_EQ:
-tcg_out_setcond_eq0(s, type, arg0, arg1);
+tcg_out_setcond_eq0(s, type, arg0, arg1, neg);
 return;
 case TCG_COND_NE:
-if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
-tcg_out_ext32u(s, TCG_REG_R0, arg1);
-arg1 = TCG_REG_R0;
-}
-tcg_out_setcond_ne0(s, arg0, arg1);
+tcg_out_setcond_ne0(s, type, arg0, arg1, neg);
 return;
 case TCG_COND_GE:
 tcg_out32(s, NOR | SAB(arg1, arg0, arg1));
@@ -1621,9 +1640,17 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 case TCG_COND_LT:
 /* Extract the sign bit.  */
 if (type == TCG_TYPE_I32) {
-tcg_out_shri32(s, arg0, arg1, 31);
+if (neg) {
+  

[PATCH 24/24] tcg/i386: Implement negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  4 ++--
 tcg/i386/tcg-target.c.inc | 27 +++
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 41df0e5ae1..1a9025d786 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -156,7 +156,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i321
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 
 #if TCG_TARGET_REG_BITS == 64
 /* Keep 32-bit values zero-extended in a register.  */
@@ -194,7 +194,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i641
 #define TCG_TARGET_HAS_muluh_i640
 #define TCG_TARGET_HAS_mulsh_i640
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 #else
 #define TCG_TARGET_HAS_qemu_st8_i32 1
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index f68722b8a5..cc75653bb8 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1526,7 +1526,7 @@ static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
 
 static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 TCGArg dest, TCGArg arg1, TCGArg arg2,
-int const_arg2)
+int const_arg2, bool neg)
 {
 bool inv = false;
 bool cleared;
@@ -1567,11 +1567,13 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
  * This is always smaller than the SETCC expansion.
  */
 tcg_out_cmp(s, rexw, arg1, arg2, const_arg2, true);
-tgen_arithr(s, ARITH_SBB, dest, dest);  /* T:-1 F:0 */
-if (inv) {
-tgen_arithi(s, ARITH_ADD, dest, 1, 0);  /* T:0  F:1 */
-} else {
-tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);  /* T:1  F:0 */
+tgen_arithr(s, ARITH_SBB + (neg ? rexw : 0), dest, dest); /* T:-1 F:0 */
+if (inv && neg) {
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest); /* T:0 F:-1 */
+} else if (inv) {
+tgen_arithi(s, ARITH_ADD, dest, 1, 0);/* T:0  F:1 */
+} else if (!neg) {
+tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);/* T:1  F:0 */
 }
 return;
 
@@ -1585,7 +1587,8 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 if (inv) {
 tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest);
 }
-tcg_out_shifti(s, SHIFT_SHR + rexw, dest, rexw ? 63 : 31);
+tcg_out_shifti(s, (neg ? SHIFT_SAR : SHIFT_SHR) + rexw,
+   dest, rexw ? 63 : 31);
 return;
 }
 break;
@@ -1611,6 +1614,9 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 if (!cleared) {
 tcg_out_ext8u(s, dest, dest);
 }
+if (neg) {
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NEG, dest);
+}
 }
 
 #if TCG_TARGET_REG_BITS == 32
@@ -2629,7 +2635,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
arg_label(args[3]), 0);
 break;
 OP_32_64(setcond):
-tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2, false);
+break;
+OP_32_64(negsetcond):
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2, true);
 break;
 OP_32_64(movcond):
 tcg_out_movcond(s, rexw, args[5], a0, a1, a2, const_a2, args[3]);
@@ -3357,6 +3366,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(q, r, re);
 
 case INDEX_op_movcond_i32:
-- 
2.34.1




[PATCH 06/24] target/openrisc: Use tcg_gen_negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/openrisc/translate.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/openrisc/translate.c b/target/openrisc/translate.c
index a86360d4f5..7c6f80daf1 100644
--- a/target/openrisc/translate.c
+++ b/target/openrisc/translate.c
@@ -253,9 +253,8 @@ static void gen_mul(DisasContext *dc, TCGv dest, TCGv srca, TCGv srcb)
 
 tcg_gen_muls2_tl(dest, cpu_sr_ov, srca, srcb);
 tcg_gen_sari_tl(t0, dest, TARGET_LONG_BITS - 1);
-tcg_gen_setcond_tl(TCG_COND_NE, cpu_sr_ov, cpu_sr_ov, t0);
+tcg_gen_negsetcond_tl(TCG_COND_NE, cpu_sr_ov, cpu_sr_ov, t0);
 
-tcg_gen_neg_tl(cpu_sr_ov, cpu_sr_ov);
 gen_ove_ov(dc);
 }
 
@@ -309,9 +308,8 @@ static void gen_muld(DisasContext *dc, TCGv srca, TCGv srcb)
 
 tcg_gen_muls2_i64(cpu_mac, high, t1, t2);
 tcg_gen_sari_i64(t1, cpu_mac, 63);
-tcg_gen_setcond_i64(TCG_COND_NE, t1, t1, high);
+tcg_gen_negsetcond_i64(TCG_COND_NE, t1, t1, high);
 tcg_gen_trunc_i64_tl(cpu_sr_ov, t1);
-tcg_gen_neg_tl(cpu_sr_ov, cpu_sr_ov);
 
 gen_ove_ov(dc);
 }
-- 
2.34.1
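
The overflow computation in gen_mul above can be checked with a small
host-side sketch (sign bit of the low half vs. the high half of the
double-width product):

#include <assert.h>
#include <stdint.h>

/* Signed 32x32 multiply overflows iff the high word of the 64-bit product
 * differs from the sign extension of the low word; SR[OV] keeps the 0/-1
 * mask that negsetcond produces. */
static int64_t mul_ov_mask(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;
    int32_t lo = (int32_t)p;
    int32_t hi = (int32_t)(p >> 32);
    return -(int64_t)(hi != (lo >> 31));  /* negsetcond(NE) */
}

int main(void)
{
    assert(mul_ov_mask(0x10000, 0x10000) == -1);  /* 2^32 overflows */
    assert(mul_ov_mask(3, 4) == 0);
    return 0;
}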




[PATCH 07/24] target/ppc: Use tcg_gen_negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/ppc/translate/fixedpoint-impl.c.inc | 6 --
 target/ppc/translate/vmx-impl.c.inc| 8 +++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc
index f47f1a50e8..4ce02fd3a4 100644
--- a/target/ppc/translate/fixedpoint-impl.c.inc
+++ b/target/ppc/translate/fixedpoint-impl.c.inc
@@ -342,12 +342,14 @@ static bool do_set_bool_cond(DisasContext *ctx, arg_X_bi *a, bool neg, bool rev)
 uint32_t mask = 0x08 >> (a->bi & 0x03);
 TCGCond cond = rev ? TCG_COND_EQ : TCG_COND_NE;
 TCGv temp = tcg_temp_new();
+TCGv zero = tcg_constant_tl(0);
 
 tcg_gen_extu_i32_tl(temp, cpu_crf[a->bi >> 2]);
 tcg_gen_andi_tl(temp, temp, mask);
-tcg_gen_setcondi_tl(cond, cpu_gpr[a->rt], temp, 0);
 if (neg) {
-tcg_gen_neg_tl(cpu_gpr[a->rt], cpu_gpr[a->rt]);
+tcg_gen_negsetcond_tl(cond, cpu_gpr[a->rt], temp, zero);
+} else {
+tcg_gen_setcond_tl(cond, cpu_gpr[a->rt], temp, zero);
 }
 return true;
 }
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index c8712dd7d8..6d7669aabd 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1341,8 +1341,7 @@ static bool trans_VCMPEQUQ(DisasContext *ctx, arg_VC *a)
 tcg_gen_xor_i64(t1, t0, t1);
 
 tcg_gen_or_i64(t1, t1, t2);
-tcg_gen_setcondi_i64(TCG_COND_EQ, t1, t1, 0);
-tcg_gen_neg_i64(t1, t1);
+tcg_gen_negsetcond_i64(TCG_COND_EQ, t1, t1, tcg_constant_i64(0));
 
 set_avr64(a->vrt, t1, true);
 set_avr64(a->vrt, t1, false);
@@ -1365,15 +1364,14 @@ static bool do_vcmpgtq(DisasContext *ctx, arg_VC *a, bool sign)
 
 get_avr64(t0, a->vra, false);
 get_avr64(t1, a->vrb, false);
-tcg_gen_setcond_i64(TCG_COND_GTU, t2, t0, t1);
+tcg_gen_negsetcond_i64(TCG_COND_GTU, t2, t0, t1);
 
 get_avr64(t0, a->vra, true);
 get_avr64(t1, a->vrb, true);
 tcg_gen_movcond_i64(TCG_COND_EQ, t2, t0, t1, t2, tcg_constant_i64(0));
-tcg_gen_setcond_i64(sign ? TCG_COND_GT : TCG_COND_GTU, t1, t0, t1);
+tcg_gen_negsetcond_i64(sign ? TCG_COND_GT : TCG_COND_GTU, t1, t0, t1);
 
 tcg_gen_or_i64(t1, t1, t2);
-tcg_gen_neg_i64(t1, t1);
 
 set_avr64(a->vrt, t1, true);
 set_avr64(a->vrt, t1, false);
-- 
2.34.1
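
do_vcmpgtq above composes a 128-bit compare from 64-bit halves; a host-side
sketch of the same structure for the unsigned case:

#include <assert.h>
#include <stdint.h>

/* 128-bit a > b (unsigned) as a 0/-1 mask: the high halves decide, with a
 * tie falling through to the low halves. */
static uint64_t gtu128_mask(uint64_t ah, uint64_t al,
                            uint64_t bh, uint64_t bl)
{
    uint64_t lo_gt = -(uint64_t)(al > bl);      /* negsetcond GTU */
    uint64_t lo_part = (ah == bh) ? lo_gt : 0;  /* movcond EQ     */
    uint64_t hi_gt = -(uint64_t)(ah > bh);      /* negsetcond GTU */
    return hi_gt | lo_part;
}

int main(void)
{
    assert(gtu128_mask(1, 0, 0, ~0ull) == ~0ull);  /* high word wins */
    assert(gtu128_mask(7, 2, 7, 1) == ~0ull);      /* tie on high    */
    assert(gtu128_mask(7, 1, 7, 2) == 0);
    return 0;
}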




[PATCH 08/24] target/sparc: Use tcg_gen_movcond_i64 in gen_edge

2023-08-07 Thread Richard Henderson
The setcond + neg + or sequence is a complex method of
performing a conditional move.
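
The two formulations compute the same value; a quick equivalence sketch:

#include <assert.h>
#include <stdint.h>

static uint64_t old_seq(uint64_t lo1, uint64_t lo2,
                        uint64_t s1, uint64_t s2)
{
    uint64_t m = -(uint64_t)(s1 == s2);  /* setcond + neg */
    return lo1 & (lo2 | m);              /* or, then and  */
}

static uint64_t new_seq(uint64_t lo1, uint64_t lo2,
                        uint64_t s1, uint64_t s2)
{
    return (s1 == s2) ? lo1 : (lo1 & lo2);  /* movcond */
}

int main(void)
{
    assert(old_seq(0xf0, 0x3c, 1, 1) == new_seq(0xf0, 0x3c, 1, 1));
    assert(old_seq(0xf0, 0x3c, 1, 2) == new_seq(0xf0, 0x3c, 1, 2));
    return 0;
}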

Signed-off-by: Richard Henderson 
---
 target/sparc/translate.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index bd877a5e4a..fa80a91161 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -2916,7 +2916,7 @@ static void gen_edge(DisasContext *dc, TCGv dst, TCGv s1, TCGv s2,
 
 tcg_gen_shr_tl(lo1, tcg_constant_tl(tabl), lo1);
 tcg_gen_shr_tl(lo2, tcg_constant_tl(tabr), lo2);
-tcg_gen_andi_tl(dst, lo1, omask);
+tcg_gen_andi_tl(lo1, lo1, omask);
 tcg_gen_andi_tl(lo2, lo2, omask);
 
 amask = -8;
@@ -2926,18 +2926,9 @@ static void gen_edge(DisasContext *dc, TCGv dst, TCGv s1, TCGv s2,
 tcg_gen_andi_tl(s1, s1, amask);
 tcg_gen_andi_tl(s2, s2, amask);
 
-/* We want to compute
-dst = (s1 == s2 ? lo1 : lo1 & lo2).
-   We've already done dst = lo1, so this reduces to
-dst &= (s1 == s2 ? -1 : lo2)
-   Which we perform by
-lo2 |= -(s1 == s2)
-dst &= lo2
-*/
-tcg_gen_setcond_tl(TCG_COND_EQ, lo1, s1, s2);
-tcg_gen_neg_tl(lo1, lo1);
-tcg_gen_or_tl(lo2, lo2, lo1);
-tcg_gen_and_tl(dst, dst, lo2);
+/* Compute dst = (s1 == s2 ? lo1 : lo1 & lo2). */
+tcg_gen_and_tl(lo2, lo2, lo1);
+tcg_gen_movcond_tl(TCG_COND_EQ, dst, s1, s2, lo1, lo2);
 }
 
 static void gen_alignaddr(TCGv dst, TCGv s1, TCGv s2, bool left)
-- 
2.34.1




[PATCH 11/24] tcg/ppc: Use the Set Boolean Extension

2023-08-07 Thread Richard Henderson
The SETBC family of instructions requires exactly two insns for
all comparisons, saving 0-3 insns per (neg)setcond.
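
For reference, the semantics of the four variants, as assumed from the ISA
v3.1 description (c is the tested CR bit; sketch only, not generated code):

static long setbc(int c)   { return c ?  1 : 0; }
static long setbcr(int c)  { return c ?  0 : 1; }
static long setnbc(int c)  { return c ? -1 : 0; }
static long setnbcr(int c) { return c ?  0 : -1; }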

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 10448aa0e6..090f11e71c 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -447,6 +447,11 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 #define TW XO31( 4)
 #define TRAP   (TW | TO(31))
 
+#define SETBCXO31(384)  /* v3.10 */
+#define SETBCR   XO31(416)  /* v3.10 */
+#define SETNBC   XO31(448)  /* v3.10 */
+#define SETNBCR  XO31(480)  /* v3.10 */
+
 #define NOPORI  /* ori 0,0,0 */
 
 #define LVXXO31(103)
@@ -1624,6 +1629,23 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 arg2 = (uint32_t)arg2;
 }
 
+/* With SETBC/SETBCR, we can always implement with 2 insns. */
+if (have_isa_3_10) {
+tcg_insn_unit bi, opc;
+
+tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 7, type);
+
+/* Re-use tcg_to_bc for BI and BO_COND_{TRUE,FALSE}. */
+bi = tcg_to_bc[cond] & (0x1f << 16);
+if (tcg_to_bc[cond] & BO(8)) {
+opc = neg ? SETNBC : SETBC;
+} else {
+opc = neg ? SETNBCR : SETBCR;
+}
+tcg_out32(s, opc | RT(arg0) | bi);
+return;
+}
+
 /* Handle common and trivial cases before handling anything else.  */
 if (arg2 == 0) {
 switch (cond) {
-- 
2.34.1




[PATCH 6/6] spapr: implement H_SET_MODE debug facilities

2023-08-07 Thread Nicholas Piggin
Wire up the H_SET_MODE debug resources to the CIABR and DAWR0 debug
facilities in TCG.

Signed-off-by: Nicholas Piggin 
---
 hw/ppc/spapr_hcall.c | 57 
 1 file changed, 57 insertions(+)

diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 9b1f225d4a..b7dc388f2f 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -3,6 +3,7 @@
 #include "qapi/error.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/runstate.h"
+#include "sysemu/tcg.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -789,6 +790,54 @@ static target_ulong h_logical_dcbf(PowerPCCPU *cpu, SpaprMachineState *spapr,
 return H_SUCCESS;
 }
 
+static target_ulong h_set_mode_resource_set_ciabr(PowerPCCPU *cpu,
+  SpaprMachineState *spapr,
+  target_ulong mflags,
+  target_ulong value1,
+  target_ulong value2)
+{
+CPUPPCState *env = &cpu->env;
+
+assert(tcg_enabled()); /* KVM will have handled this */
+
+if (mflags) {
+return H_UNSUPPORTED_FLAG;
+}
+if (value2) {
+return H_P4;
+}
+if ((value1 & PPC_BITMASK(62, 63)) == 0x3) {
+return H_P3;
+}
+
+ppc_store_ciabr(env, value1);
+
+return H_SUCCESS;
+}
+
+static target_ulong h_set_mode_resource_set_dawr0(PowerPCCPU *cpu,
+  SpaprMachineState *spapr,
+  target_ulong mflags,
+  target_ulong value1,
+  target_ulong value2)
+{
+CPUPPCState *env = &cpu->env;
+
+assert(tcg_enabled()); /* KVM will have handled this */
+
+if (mflags) {
+return H_UNSUPPORTED_FLAG;
+}
+if (value2 & PPC_BIT(61)) {
+return H_P4;
+}
+
+ppc_store_dawr0(env, value1);
+ppc_store_dawrx0(env, value2);
+
+return H_SUCCESS;
+}
+
 static target_ulong h_set_mode_resource_le(PowerPCCPU *cpu,
SpaprMachineState *spapr,
target_ulong mflags,
@@ -858,6 +907,14 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, SpaprMachineState *spapr,
 target_ulong ret = H_P2;
 
 switch (resource) {
+case H_SET_MODE_RESOURCE_SET_CIABR:
+ret = h_set_mode_resource_set_ciabr(cpu, spapr, args[0], args[2],
+args[3]);
+break;
+case H_SET_MODE_RESOURCE_SET_DAWR0:
+ret = h_set_mode_resource_set_dawr0(cpu, spapr, args[0], args[2],
+args[3]);
+break;
 case H_SET_MODE_RESOURCE_LE:
 ret = h_set_mode_resource_le(cpu, spapr, args[0], args[2], args[3]);
 break;
-- 
2.40.1




[PATCH 2/6] target/ppc: Improve book3s branch trace interrupt for v2.07S

2023-08-07 Thread Nicholas Piggin
Improve the emulation accuracy of the single step and branch trace
interrupts for v2.07S. Set SRR1[33]=1, and set SIAR to completed
instruction address.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 16 +++-
 target/ppc/helper.h  |  1 +
 target/ppc/translate.c   | 21 +++--
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 9aa8e46566..2d6aef5e66 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1571,9 +1571,11 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
 }
 }
 break;
+case POWERPC_EXCP_TRACE: /* Trace exception  */
+msr |= env->error_code;
+/* fall through */
 case POWERPC_EXCP_DSEG:  /* Data segment exception   */
 case POWERPC_EXCP_ISEG:  /* Instruction segment exception*/
-case POWERPC_EXCP_TRACE: /* Trace exception  */
 case POWERPC_EXCP_SDOOR: /* Doorbell interrupt   */
 case POWERPC_EXCP_PERFM: /* Performance monitor interrupt*/
 break;
@@ -3168,6 +3170,18 @@ void helper_book3s_msgsndp(CPUPPCState *env, target_ulong rb)
 }
 #endif /* TARGET_PPC64 */
 
+/* Single-step tracing */
+void helper_book3s_trace(CPUPPCState *env, target_ulong prev_ip)
+{
+uint32_t error_code = 0;
+if (env->insns_flags2 & PPC2_ISA207S) {
+/* Load/store reporting, SRR1[35, 36] and SDAR, are not implemented. */
+env->spr[SPR_POWER_SIAR] = prev_ip;
+error_code = PPC_BIT(33);
+}
+raise_exception_err(env, POWERPC_EXCP_TRACE, error_code);
+}
+
 void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
  MMUAccessType access_type,
  int mmu_idx, uintptr_t retaddr)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index abec6fe341..f4db32ee1a 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -32,6 +32,7 @@ DEF_HELPER_2(read_pmc, tl, env, i32)
 DEF_HELPER_2(insns_inc, void, env, i32)
 DEF_HELPER_1(handle_pmc5_overflow, void, env)
 #endif
+DEF_HELPER_2(book3s_trace, void, env, tl)
 DEF_HELPER_1(check_tlb_flush_local, void, env)
 DEF_HELPER_1(check_tlb_flush_global, void, env)
 #endif
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 06530dd782..5051596670 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -338,8 +338,9 @@ static void gen_ppc_maybe_interrupt(DisasContext *ctx)
  * The exception can be either POWERPC_EXCP_TRACE (on most PowerPCs) or
  * POWERPC_EXCP_DEBUG (on BookE).
  */
-static uint32_t gen_prep_dbgex(DisasContext *ctx)
+static void gen_debug_exception(DisasContext *ctx, bool rfi_type)
 {
+#if !defined(CONFIG_USER_ONLY)
 if (ctx->flags & POWERPC_FLAG_DE) {
 target_ulong dbsr = 0;
 if (ctx->singlestep_enabled & CPU_SINGLE_STEP) {
@@ -352,16 +353,16 @@ static uint32_t gen_prep_dbgex(DisasContext *ctx)
 gen_load_spr(t0, SPR_BOOKE_DBSR);
 tcg_gen_ori_tl(t0, t0, dbsr);
 gen_store_spr(SPR_BOOKE_DBSR, t0);
-return POWERPC_EXCP_DEBUG;
+gen_helper_raise_exception(cpu_env,
+   tcg_constant_i32(POWERPC_EXCP_DEBUG));
+ctx->base.is_jmp = DISAS_NORETURN;
 } else {
-return POWERPC_EXCP_TRACE;
+TCGv t0 = tcg_temp_new();
+tcg_gen_movi_tl(t0, ctx->cia);
+gen_helper_book3s_trace(cpu_env, t0);
+ctx->base.is_jmp = DISAS_NORETURN;
 }
-}
-
-static void gen_debug_exception(DisasContext *ctx)
-{
-gen_helper_raise_exception(cpu_env, tcg_constant_i32(gen_prep_dbgex(ctx)));
-ctx->base.is_jmp = DISAS_NORETURN;
+#endif
 }
 
 static inline void gen_inval_exception(DisasContext *ctx, uint32_t error)
@@ -4184,7 +4185,7 @@ static inline bool use_goto_tb(DisasContext *ctx, target_ulong dest)
 static void gen_lookup_and_goto_ptr(DisasContext *ctx)
 {
 if (unlikely(ctx->singlestep_enabled)) {
-gen_debug_exception(ctx);
+gen_debug_exception(ctx, false);
 } else {
 /*
  * tcg_gen_lookup_and_goto_ptr will exit the TB if
-- 
2.40.1




[PATCH 23/24] tcg/i386: Use shift in tcg_out_setcond

2023-08-07 Thread Richard Henderson
For LT/GE vs zero, shift down the sign bit.
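
The trick in C, for the 64-bit case (assuming arithmetic right shift on
signed types, as on the hosts this backend targets):

#include <assert.h>
#include <stdint.h>

static uint64_t lt0(int64_t x)     { return (uint64_t)x >> 63; }   /* SHR       */
static uint64_t ge0(int64_t x)     { return (uint64_t)~x >> 63; }  /* NOT; SHR  */
static uint64_t neg_lt0(int64_t x) { return (uint64_t)(x >> 63); } /* SAR: mask */

int main(void)
{
    assert(lt0(-5) == 1 && lt0(5) == 0);
    assert(ge0(-5) == 0 && ge0(5) == 1);
    assert(neg_lt0(-5) == UINT64_MAX && neg_lt0(5) == 0);
    return 0;
}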

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index cca49fe63a..f68722b8a5 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1575,6 +1575,21 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 }
 return;
 
+case TCG_COND_GE:
+inv = true;
+/* fall through */
+case TCG_COND_LT:
+/* If arg2 is 0, extract the sign bit. */
+if (const_arg2 && arg2 == 0) {
+tcg_out_mov(s, rexw ? TCG_TYPE_I64 : TCG_TYPE_I32, dest, arg1);
+if (inv) {
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest);
+}
+tcg_out_shifti(s, SHIFT_SHR + rexw, dest, rexw ? 63 : 31);
+return;
+}
+break;
+
 default:
 break;
 }
-- 
2.34.1




[PATCH 14/24] tcg/riscv: Implement negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target.h |  4 ++--
 tcg/riscv/tcg-target.c.inc | 45 ++
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index b2961fec8e..7e8ac48a7d 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -120,7 +120,7 @@ extern bool have_zbb;
 #define TCG_TARGET_HAS_ctpop_i32have_zbb
 #define TCG_TARGET_HAS_brcond2  1
 #define TCG_TARGET_HAS_setcond2 1
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_movcond_i64  1
@@ -159,7 +159,7 @@ extern bool have_zbb;
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index eeaeb6b6e3..232b616af3 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -936,6 +936,44 @@ static void tcg_out_setcond(TCGContext *s, TCGCond cond, 
TCGReg ret,
 }
 }
 
+static void tcg_out_negsetcond(TCGContext *s, TCGCond cond, TCGReg ret,
+   TCGReg arg1, tcg_target_long arg2, bool c2)
+{
+int tmpflags;
+TCGReg tmp;
+
+/* For LT/GE comparison against 0, replicate the sign bit. */
+if (c2 && arg2 == 0) {
+switch (cond) {
+case TCG_COND_GE:
+tcg_out_opc_imm(s, OPC_XORI, ret, arg1, -1);
+arg1 = ret;
+/* fall through */
+case TCG_COND_LT:
+tcg_out_opc_imm(s, OPC_SRAI, ret, arg1, TCG_TARGET_REG_BITS - 1);
+return;
+default:
+break;
+}
+}
+
+tmpflags = tcg_out_setcond_int(s, cond, ret, arg1, arg2, c2);
+tmp = tmpflags & ~SETCOND_FLAGS;
+
+/* If intermediate result is zero/non-zero: test != 0. */
+if (tmpflags & SETCOND_NEZ) {
+tcg_out_opc_reg(s, OPC_SLTU, ret, TCG_REG_ZERO, tmp);
+tmp = ret;
+}
+
+/* Produce the 0/-1 result. */
+if (tmpflags & SETCOND_INV) {
+tcg_out_opc_imm(s, OPC_ADDI, ret, tmp, -1);
+} else {
+tcg_out_opc_reg(s, OPC_SUB, ret, TCG_REG_ZERO, tmp);
+}
+}
+
 static void tcg_out_movcond_zicond(TCGContext *s, TCGReg ret, TCGReg test_ne,
int val1, bool c_val1,
int val2, bool c_val2)
@@ -1782,6 +1820,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_setcond(s, args[3], a0, a1, a2, c2);
 break;
 
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
+tcg_out_negsetcond(s, args[3], a0, a1, a2, c2);
+break;
+
 case INDEX_op_movcond_i32:
 case INDEX_op_movcond_i64:
 tcg_out_movcond(s, args[5], a0, a1, a2, c2,
@@ -1910,6 +1953,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_xor_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, r, rI);
 
 case INDEX_op_andc_i32:
-- 
2.34.1




[PATCH 12/24] tcg/aarch64: Implement negsetcond_*

2023-08-07 Thread Richard Henderson
Trivial, as aarch64 has an instruction for this: CSETM.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h |  4 ++--
 tcg/aarch64/tcg-target.c.inc | 12 
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 6080fddf73..e3faa9cff4 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -94,7 +94,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i320
 #define TCG_TARGET_HAS_extrl_i64_i320
 #define TCG_TARGET_HAS_extrh_i64_i320
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_div_i64  1
@@ -130,7 +130,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 
 /*
  * Without FEAT_LSE2, we must use LDXP+STXP to implement atomic 128-bit load,
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 35ca80cd56..7d8d114c9e 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2262,6 +2262,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
  TCG_REG_XZR, tcg_invert_cond(args[3]));
 break;
 
+case INDEX_op_negsetcond_i32:
+a2 = (int32_t)a2;
+/* FALLTHRU */
+case INDEX_op_negsetcond_i64:
+tcg_out_cmp(s, ext, a1, a2, c2);
+/* Use CSETM alias of CSINV Wd, WZR, WZR, invert(cond).  */
+tcg_out_insn(s, 3506, CSINV, ext, a0, TCG_REG_XZR,
+ TCG_REG_XZR, tcg_invert_cond(args[3]));
+break;
+
 case INDEX_op_movcond_i32:
 a2 = (int32_t)a2;
 /* FALLTHRU */
@@ -2868,6 +2878,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_sub_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, r, rA);
 
 case INDEX_op_mul_i32:
-- 
2.34.1




[PATCH 16/24] tcg/sparc64: Implement negsetcond_*

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/sparc64/tcg-target.h |  4 ++--
 tcg/sparc64/tcg-target.c.inc | 36 ++--
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index 1faadc704b..4bbd825bd8 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -112,7 +112,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muls2_i321
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_extrl_i64_i321
@@ -150,7 +150,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i64use_vis3_instructions
 #define TCG_TARGET_HAS_mulsh_i640
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index ffcb879211..37839f9a21 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -720,7 +720,7 @@ static void tcg_out_movcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
 }
 
 static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGReg ret,
-TCGReg c1, int32_t c2, int c2const)
+TCGReg c1, int32_t c2, int c2const, bool neg)
 {
 /* For 32-bit comparisons, we can play games with ADDC/SUBC.  */
 switch (cond) {
@@ -760,22 +760,30 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond 
cond, TCGReg ret,
 default:
 tcg_out_cmp(s, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movcc(s, cond, MOVCC_ICC, ret, 1, 1);
+tcg_out_movcc(s, cond, MOVCC_ICC, ret, neg ? -1 : 1, 1);
 return;
 }
 
 tcg_out_cmp(s, c1, c2, c2const);
 if (cond == TCG_COND_LTU) {
-tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_ADDC);
+if (neg) {
+tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_SUBC);
+} else {
+tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_ADDC);
+}
 } else {
-tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_SUBC);
+if (neg) {
+tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_ADDC);
+} else {
+tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_SUBC);
+}
 }
 }
 
 static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGReg ret,
-TCGReg c1, int32_t c2, int c2const)
+TCGReg c1, int32_t c2, int c2const, bool neg)
 {
-if (use_vis3_instructions) {
+if (use_vis3_instructions && !neg) {
 switch (cond) {
 case TCG_COND_NE:
 if (c2 != 0) {
@@ -796,11 +804,11 @@ static void tcg_out_setcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
if the input does not overlap the output.  */
 if (c2 == 0 && !is_unsigned_cond(cond) && c1 != ret) {
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movr(s, cond, ret, c1, 1, 1);
+tcg_out_movr(s, cond, ret, c1, neg ? -1 : 1, 1);
 } else {
 tcg_out_cmp(s, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movcc(s, cond, MOVCC_XCC, ret, 1, 1);
+tcg_out_movcc(s, cond, MOVCC_XCC, ret, neg ? -1 : 1, 1);
 }
 }
 
@@ -1355,7 +1363,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond_i32(s, a2, a0, a1, const_args[1], arg_label(args[3]));
 break;
 case INDEX_op_setcond_i32:
-tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2);
+tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2, false);
+break;
+case INDEX_op_negsetcond_i32:
+tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2, true);
 break;
 case INDEX_op_movcond_i32:
 tcg_out_movcond_i32(s, args[5], a0, a1, a2, c2, args[3], 
const_args[3]);
@@ -1437,7 +1448,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond_i64(s, a2, a0, a1, const_args[1], arg_label(args[3]));
 break;
 case INDEX_op_setcond_i64:
-tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2);
+tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2, false);
+break;
+case INDEX_op_negsetcond_i64:
+tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2, true);
 break;
 case INDEX_op_movcond_i64:
 tcg_out_movcond_i64(s, args[5], a0, a1, a2, c2, args[3], 
const_args[3]);
@@ -1564,6 +1578,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_sar_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, rZ, rJ);
 
 case INDEX_op_brcond_i32:
-- 
2.34.1



[PATCH 4/6] target/ppc: Implement breakpoint debug facility for v2.07S

2023-08-07 Thread Nicholas Piggin
ISA v2.07S introduced the breakpoint facility based on the CIABR SPR.
Implement this in TCG.
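
For readers without the ISA handy: CIABR holds the breakpoint effective
address in bits 0:61 and a privilege selector in bits 62:63 (0 disabled,
1 problem, 2 supervisor, 3 hypervisor). A rough stand-alone sketch of that
split, mirroring ppc_update_ciabr() below:

    #include <stdint.h>
    #include <stdio.h>

    /* Big-endian bits 0:61 are the address (low two bits clear),
     * big-endian bits 62:63 are the privilege selector. */
    static uint64_t ciabr_ea(uint64_t v)   { return v & ~(uint64_t)3; }
    static unsigned ciabr_priv(uint64_t v) { return v & 3; }

    int main(void)
    {
        uint64_t ciabr = 0xc000000000001000ULL | 2; /* supervisor breakpoint */
        printf("break at 0x%016llx, priv %u\n",
               (unsigned long long)ciabr_ea(ciabr), ciabr_priv(ciabr));
        return 0;
    }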

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.c | 27 ++
 target/ppc/cpu.h |  3 +++
 target/ppc/cpu_init.c|  5 -
 target/ppc/excp_helper.c | 42 
 target/ppc/helper.h  |  1 +
 target/ppc/internal.h|  2 ++
 target/ppc/machine.c |  4 
 target/ppc/misc_helper.c |  5 +
 target/ppc/spr_common.h  |  1 +
 target/ppc/translate.c   | 10 +-
 10 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/target/ppc/cpu.c b/target/ppc/cpu.c
index 424f2e1741..d9c665ce18 100644
--- a/target/ppc/cpu.c
+++ b/target/ppc/cpu.c
@@ -102,6 +102,33 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
 
 ppc_maybe_interrupt(env);
 }
+
+#if defined(TARGET_PPC64)
+void ppc_update_ciabr(CPUPPCState *env)
+{
+CPUState *cs = env_cpu(env);
+target_ulong ciabr = env->spr[SPR_CIABR];
+target_ulong ciea, priv;
+
+ciea = ciabr & PPC_BITMASK(0, 61);
+priv = ciabr & PPC_BITMASK(62, 63);
+
+if (env->ciabr_breakpoint) {
+cpu_breakpoint_remove_by_ref(cs, env->ciabr_breakpoint);
+env->ciabr_breakpoint = NULL;
+}
+
+if (priv) {
+cpu_breakpoint_insert(cs, ciea, BP_CPU, &env->ciabr_breakpoint);
+}
+}
+
+void ppc_store_ciabr(CPUPPCState *env, target_ulong val)
+{
+env->spr[SPR_CIABR] = val;
+ppc_update_ciabr(env);
+}
+#endif
 #endif
 
 static inline void fpscr_set_rounding_mode(CPUPPCState *env)
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 25fac9577a..d97fabd8f6 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1137,6 +1137,7 @@ struct CPUArchState {
 /* MMU context, only relevant for full system emulation */
 #if defined(TARGET_PPC64)
 ppc_slb_t slb[MAX_SLB_ENTRIES]; /* PowerPC 64 SLB area */
+struct CPUBreakpoint *ciabr_breakpoint;
 #endif
 target_ulong sr[32];   /* segment registers */
 uint32_t nb_BATs;  /* number of BATs */
@@ -1403,6 +1404,8 @@ void ppc_translate_init(void);
 #if !defined(CONFIG_USER_ONLY)
 void ppc_store_sdr1(CPUPPCState *env, target_ulong value);
 void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val);
+void ppc_update_ciabr(CPUPPCState *env);
+void ppc_store_ciabr(CPUPPCState *env, target_ulong value);
 #endif /* !defined(CONFIG_USER_ONLY) */
 void ppc_store_msr(CPUPPCState *env, target_ulong value);
 
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 02b7aad9b0..a2820839b3 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5127,7 +5127,7 @@ static void register_book3s_207_dbg_sprs(CPUPPCState *env)
 spr_register_kvm_hv(env, SPR_CIABR, "CIABR",
 SPR_NOACCESS, SPR_NOACCESS,
 SPR_NOACCESS, SPR_NOACCESS,
-&spr_read_generic, &spr_write_generic,
+&spr_read_generic, &spr_write_ciabr,
KVM_REG_PPC_CIABR, 0x00000000);
 }
 
@@ -7149,6 +7149,7 @@ static void ppc_cpu_reset_hold(Object *obj)
 env->nip = env->hreset_vector | env->excp_prefix;
 
 if (tcg_enabled()) {
+cpu_breakpoint_remove_all(s, BP_CPU);
 if (env->mmu_model != POWERPC_MMU_REAL) {
 ppc_tlb_invalidate_all(env);
 }
@@ -7336,6 +7337,8 @@ static const struct TCGCPUOps ppc_tcg_ops = {
   .cpu_exec_exit = ppc_cpu_exec_exit,
   .do_unaligned_access = ppc_cpu_do_unaligned_access,
   .do_transaction_failed = ppc_cpu_do_transaction_failed,
+  .debug_excp_handler = ppc_cpu_debug_excp_handler,
+  .debug_check_breakpoint = ppc_cpu_debug_check_breakpoint,
 #endif /* !CONFIG_USER_ONLY */
 };
 #endif /* CONFIG_TCG */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 2d6aef5e66..9c9881ae19 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -3257,5 +3257,47 @@ void ppc_cpu_do_transaction_failed(CPUState *cs, hwaddr 
physaddr,
 cs->exception_index = POWERPC_EXCP_MCHECK;
 cpu_loop_exit_restore(cs, retaddr);
 }
+
+void ppc_cpu_debug_excp_handler(CPUState *cs)
+{
+#if defined(TARGET_PPC64)
+CPUPPCState *env = cs->env_ptr;
+
+if (env->insns_flags2 & PPC2_ISA207S) {
+if (cpu_breakpoint_test(cs, env->nip, BP_CPU)) {
+raise_exception_err(env, POWERPC_EXCP_TRACE,
+PPC_BIT(33) | PPC_BIT(43));
+}
+}
+#endif
+}
+
+bool ppc_cpu_debug_check_breakpoint(CPUState *cs)
+{
+#if defined(TARGET_PPC64)
+CPUPPCState *env = cs->env_ptr;
+
+if (env->insns_flags2 & PPC2_ISA207S) {
+target_ulong priv;
+
+priv = env->spr[SPR_CIABR] & PPC_BITMASK(62, 63);
+switch (priv) {
+case 0x1: /* problem */
+return env->msr & ((target_ulong)1 << MSR_PR);
+case 0x2: /* supervisor */
+return (!(env->msr & ((target_ulong)1 << MSR_PR)) &&
+!(env->msr & ((target_ulong)1 << MSR_HV)));
+case 0x3: /* hypervisor */

[PATCH 13/24] tcg/arm: Implement negsetcond_i32

2023-08-07 Thread Richard Henderson
Trivial, as we simply need to load a different constant
in the conditional move.

Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.h | 2 +-
 tcg/arm/tcg-target.c.inc | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index b076d033a9..b064bbda9f 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -122,7 +122,7 @@ extern bool use_neon_instructions;
 #define TCG_TARGET_HAS_mulsh_i320
 #define TCG_TARGET_HAS_div_i32  use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32  0
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 83e286088f..162df38c73 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1975,6 +1975,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
 ARITH_MOV, args[0], 0, 0);
 break;
+case INDEX_op_negsetcond_i32:
+tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
+args[1], args[2], const_args[2]);
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[3]],
+ARITH_MVN, args[0], 0, 0);
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
+ARITH_MOV, args[0], 0, 0);
+break;
 
 case INDEX_op_brcond2_i32:
 c = tcg_out_cmp2(s, args, const_args);
@@ -2112,6 +2120,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_add_i32:
 case INDEX_op_sub_i32:
 case INDEX_op_setcond_i32:
+case INDEX_op_negsetcond_i32:
 return C_O1_I2(r, r, rIN);
 
 case INDEX_op_and_i32:
-- 
2.34.1




[PATCH 09/24] target/tricore: Replace gen_cond_w with tcg_gen_negsetcond_tl

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/tricore/translate.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/target/tricore/translate.c b/target/tricore/translate.c
index 1947733870..6ae5ccbf72 100644
--- a/target/tricore/translate.c
+++ b/target/tricore/translate.c
@@ -2680,13 +2680,6 @@ gen_accumulating_condi(int cond, TCGv ret, TCGv r1, 
int32_t con,
 gen_accumulating_cond(cond, ret, r1, temp, op);
 }
 
-/* ret = (r1 cond r2) ? 0xffffffff : 0x00000000; */
-static inline void gen_cond_w(TCGCond cond, TCGv ret, TCGv r1, TCGv r2)
-{
-tcg_gen_setcond_tl(cond, ret, r1, r2);
-tcg_gen_neg_tl(ret, ret);
-}
-
 static inline void gen_eqany_bi(TCGv ret, TCGv r1, int32_t con)
 {
 TCGv b0 = tcg_temp_new();
@@ -5692,7 +5685,8 @@ static void decode_rr_accumulator(DisasContext *ctx)
 gen_helper_eq_h(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_EQ_W:
-gen_cond_w(TCG_COND_EQ, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_EQ, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_EQANY_B:
 gen_helper_eqany_b(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
@@ -5729,10 +5723,12 @@ static void decode_rr_accumulator(DisasContext *ctx)
 gen_helper_lt_hu(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_LT_W:
-gen_cond_w(TCG_COND_LT, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_LT, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_LT_WU:
-gen_cond_w(TCG_COND_LTU, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_LTU, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_MAX:
 tcg_gen_movcond_tl(TCG_COND_GT, cpu_gpr_d[r3], cpu_gpr_d[r1],
-- 
2.34.1




[PATCH 17/24] tcg/i386: Merge tcg_out_brcond{32,64}

2023-08-07 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 110 +-
 1 file changed, 49 insertions(+), 61 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 77482da070..b9673b55bd 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1433,99 +1433,89 @@ static void tcg_out_cmp(TCGContext *s, TCGArg arg1, 
TCGArg arg2,
 }
 }
 
-static void tcg_out_brcond32(TCGContext *s, TCGCond cond,
- TCGArg arg1, TCGArg arg2, int const_arg2,
- TCGLabel *label, int small)
+static void tcg_out_brcond(TCGContext *s, int rexw, TCGCond cond,
+   TCGArg arg1, TCGArg arg2, int const_arg2,
+   TCGLabel *label, bool small)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, 0);
+tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_jxx(s, tcg_cond_to_jcc[cond], label, small);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_brcond64(TCGContext *s, TCGCond cond,
- TCGArg arg1, TCGArg arg2, int const_arg2,
- TCGLabel *label, int small)
-{
-tcg_out_cmp(s, arg1, arg2, const_arg2, P_REXW);
-tcg_out_jxx(s, tcg_cond_to_jcc[cond], label, small);
-}
-#else
-/* XXX: we implement it at the target level to avoid having to
-   handle cross basic blocks temporaries */
+#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
-const int *const_args, int small)
+const int *const_args, bool small)
 {
 TCGLabel *label_next = gen_new_label();
 TCGLabel *label_this = arg_label(args[5]);
 
 switch(args[4]) {
 case TCG_COND_EQ:
-tcg_out_brcond32(s, TCG_COND_NE, args[0], args[2], const_args[2],
- label_next, 1);
-tcg_out_brcond32(s, TCG_COND_EQ, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[0], args[2], const_args[2],
+   label_next, 1);
+tcg_out_brcond(s, 0, TCG_COND_EQ, args[1], args[3], const_args[3],
+   label_this, small);
 break;
 case TCG_COND_NE:
-tcg_out_brcond32(s, TCG_COND_NE, args[0], args[2], const_args[2],
- label_this, small);
-tcg_out_brcond32(s, TCG_COND_NE, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[0], args[2], const_args[2],
+   label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[1], args[3], const_args[3],
+   label_this, small);
 break;
 case TCG_COND_LT:
-tcg_out_brcond32(s, TCG_COND_LT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_LTU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LTU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_LE:
-tcg_out_brcond32(s, TCG_COND_LT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_LEU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LEU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_GT:
-tcg_out_brcond32(s, TCG_COND_GT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_GTU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GTU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_GE:
-tcg_out_brcond32(s, TCG_COND_GT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_GEU, args[0], args[2], const_args[2],
-   

[PATCH 01/24] tcg: Introduce negsetcond opcodes

2023-08-07 Thread Richard Henderson
Introduce a new opcode for negative setcond.
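
In C terms, the new opcode produces an all-ones/all-zeros mask directly
instead of a 0/1 value; a minimal sketch of the semantics described by the
docs hunk below:

    #include <assert.h>
    #include <stdint.h>

    /* negsetcond: dest = -(t1 cond t2), i.e. -1 when true, 0 when false. */
    static int64_t negsetcond_lt(int64_t t1, int64_t t2)
    {
        return -(int64_t)(t1 < t2);
    }

    int main(void)
    {
        assert(negsetcond_lt(1, 2) == -1);  /* true  -> all bits set */
        assert(negsetcond_lt(2, 1) == 0);   /* false -> all bits clear */
        return 0;
    }

Such masks show up constantly in front-end expansions (see the tricore
patch in this series), which is why fusing the setcond+neg pair into one
opcode pays off.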

Signed-off-by: Richard Henderson 
---
 docs/devel/tcg-ops.rst   |  6 ++
 include/tcg/tcg-op-common.h  |  4 
 include/tcg/tcg-op.h |  2 ++
 include/tcg/tcg-opc.h|  2 ++
 include/tcg/tcg.h|  1 +
 tcg/aarch64/tcg-target.h |  2 ++
 tcg/arm/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  2 ++
 tcg/loongarch64/tcg-target.h |  3 +++
 tcg/mips/tcg-target.h|  2 ++
 tcg/ppc/tcg-target.h |  2 ++
 tcg/riscv/tcg-target.h   |  2 ++
 tcg/s390x/tcg-target.h   |  2 ++
 tcg/sparc64/tcg-target.h |  2 ++
 tcg/tci/tcg-target.h |  2 ++
 tcg/optimize.c   | 41 +++-
 tcg/tcg-op.c | 36 +++
 tcg/tcg.c|  6 ++
 18 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/docs/devel/tcg-ops.rst b/docs/devel/tcg-ops.rst
index 6a166c5665..fbde8040d7 100644
--- a/docs/devel/tcg-ops.rst
+++ b/docs/devel/tcg-ops.rst
@@ -498,6 +498,12 @@ Conditional moves
|
| Set *dest* to 1 if (*t1* *cond* *t2*) is true, otherwise set to 0.
 
+   * - negsetcond_i32/i64 *dest*, *t1*, *t2*, *cond*
+
+ - | *dest* = -(*t1* *cond* *t2*)
+   |
+   | Set *dest* to -1 if (*t1* *cond* *t2*) is true, otherwise set to 0.
+
* - movcond_i32/i64 *dest*, *c1*, *c2*, *v1*, *v2*, *cond*
 
  - | *dest* = (*c1* *cond* *c2* ? *v1* : *v2*)
diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
index be382bbf77..a53b15933b 100644
--- a/include/tcg/tcg-op-common.h
+++ b/include/tcg/tcg-op-common.h
@@ -344,6 +344,8 @@ void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
  TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_setcondi_i32(TCGCond cond, TCGv_i32 ret,
   TCGv_i32 arg1, int32_t arg2);
+void tcg_gen_negsetcond_i32(TCGCond cond, TCGv_i32 ret,
+TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
  TCGv_i32 c2, TCGv_i32 v1, TCGv_i32 v2);
 void tcg_gen_add2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 al,
@@ -540,6 +542,8 @@ void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
  TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_setcondi_i64(TCGCond cond, TCGv_i64 ret,
   TCGv_i64 arg1, int64_t arg2);
+void tcg_gen_negsetcond_i64(TCGCond cond, TCGv_i64 ret,
+TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
  TCGv_i64 c2, TCGv_i64 v1, TCGv_i64 v2);
 void tcg_gen_add2_i64(TCGv_i64 rl, TCGv_i64 rh, TCGv_i64 al,
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index d63683c47b..80cfcf8104 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -200,6 +200,7 @@ DEF_ATOMIC2(tcg_gen_atomic_umax_fetch, i64)
 #define tcg_gen_brcondi_tl tcg_gen_brcondi_i64
 #define tcg_gen_setcond_tl tcg_gen_setcond_i64
 #define tcg_gen_setcondi_tl tcg_gen_setcondi_i64
+#define tcg_gen_negsetcond_tl tcg_gen_negsetcond_i64
 #define tcg_gen_mul_tl tcg_gen_mul_i64
 #define tcg_gen_muli_tl tcg_gen_muli_i64
 #define tcg_gen_div_tl tcg_gen_div_i64
@@ -317,6 +318,7 @@ DEF_ATOMIC2(tcg_gen_atomic_umax_fetch, i64)
 #define tcg_gen_brcondi_tl tcg_gen_brcondi_i32
 #define tcg_gen_setcond_tl tcg_gen_setcond_i32
 #define tcg_gen_setcondi_tl tcg_gen_setcondi_i32
+#define tcg_gen_negsetcond_tl tcg_gen_negsetcond_i32
 #define tcg_gen_mul_tl tcg_gen_mul_i32
 #define tcg_gen_muli_tl tcg_gen_muli_i32
 #define tcg_gen_div_tl tcg_gen_div_i32
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index acfa5ba753..5044814d15 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -46,6 +46,7 @@ DEF(mb, 0, 0, 1, 0)
 
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
+DEF(negsetcond_i32, 1, 2, 1, IMPL(TCG_TARGET_HAS_negsetcond_i32))
 DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
 /* load/store */
 DEF(ld8u_i32, 1, 1, 1, 0)
@@ -111,6 +112,7 @@ DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(setcond_i64, 1, 2, 1, IMPL64)
+DEF(negsetcond_i64, 1, 2, 1, IMPL64 | IMPL(TCG_TARGET_HAS_negsetcond_i64))
 DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
 /* load/store */
 DEF(ld8u_i64, 1, 1, 1, IMPL64)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0875971719..f00bff9c85 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -104,6 +104,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i640
 #define TCG_TARGET_HAS_mulsh_i640
+#define TCG_TARGET_HAS_negsetcond_i64   0
 /* Turn some undef macros into true macros.  */
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 1
diff --git 

[PATCH 3/6] target/ppc: Suppress single step interrupts on rfi-type instructions

2023-08-07 Thread Nicholas Piggin
BookS does not take single step interrupts on completion of rfi and
similar (rfid, hrfid, rfscv). This is not a completely clean way to
do it, but in general non-branch instructions that change NIP on
completion are excluded.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/translate.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 5051596670..6e8f1797ac 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -338,7 +338,7 @@ static void gen_ppc_maybe_interrupt(DisasContext *ctx)
  * The exception can be either POWERPC_EXCP_TRACE (on most PowerPCs) or
  * POWERPC_EXCP_DEBUG (on BookE).
  */
-static void gen_debug_exception(DisasContext *ctx)
+static void gen_debug_exception(DisasContext *ctx, bool rfi_type)
 {
 #if !defined(CONFIG_USER_ONLY)
 if (ctx->flags & POWERPC_FLAG_DE) {
@@ -357,10 +357,12 @@ static void gen_debug_exception(DisasContext *ctx)
tcg_constant_i32(POWERPC_EXCP_DEBUG));
 ctx->base.is_jmp = DISAS_NORETURN;
 } else {
-TCGv t0 = tcg_temp_new();
-tcg_gen_movi_tl(t0, ctx->cia);
-gen_helper_book3s_trace(cpu_env, t0);
-ctx->base.is_jmp = DISAS_NORETURN;
+if (!rfi_type) { /* BookS does not single step rfi type instructions */
+TCGv t0 = tcg_temp_new();
+tcg_gen_movi_tl(t0, ctx->cia);
+gen_helper_book3s_trace(cpu_env, t0);
+ctx->base.is_jmp = DISAS_NORETURN;
+}
 }
 #endif
 }
@@ -7412,6 +7414,8 @@ static void ppc_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cs)
 
 /* Honor single stepping. */
 if (unlikely(ctx->singlestep_enabled & CPU_SINGLE_STEP)) {
+bool rfi_type = false;
+
 switch (is_jmp) {
 case DISAS_TOO_MANY:
 case DISAS_EXIT_UPDATE:
@@ -7420,12 +7424,19 @@ static void ppc_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cs)
 break;
 case DISAS_EXIT:
 case DISAS_CHAIN:
+/*
+ * This is a heuristic, to put it kindly. The rfi class of
+ * instructions are among the few outside branches that change
+ * NIP without taking an interrupt. Single step trace interrupts
+ * do not fire on completion of these instructions.
+ */
+rfi_type = true;
 break;
 default:
 g_assert_not_reached();
 }
 
-gen_debug_exception(ctx);
+gen_debug_exception(ctx, rfi_type);
 return;
 }
 
-- 
2.40.1




[PATCH 1/6] target/ppc: Remove single-step suppression inside 0x100-0xf00

2023-08-07 Thread Nicholas Piggin
Single-step interrupts are suppressed if the nip is between 0x100 and
0xf00. This has been the case for a long time and it's not clear what
the intention is. Likely either an attempt to suppress trace interrupts
for instructions that cause an interrupt on completion, or a workaround
to prevent software tripping over itself single stepping its interrupt
handlers.

BookE interrupt vectors are set by IVOR registers, and BookS has AIL
modes and new interrupt types, so there are many interrupts, including
the debug interrupt, which can be outside this range. So any effect it
might have had does not cover most cases (including Linux on recent
BookS CPUs).

Remove this special case.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 74796ec7ba..06530dd782 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7410,8 +7410,7 @@ static void ppc_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cs)
 }
 
 /* Honor single stepping. */
-if (unlikely(ctx->singlestep_enabled & CPU_SINGLE_STEP)
-&& (nip <= 0x100 || nip > 0xf00)) {
+if (unlikely(ctx->singlestep_enabled & CPU_SINGLE_STEP)) {
 switch (is_jmp) {
 case DISAS_TOO_MANY:
 case DISAS_EXIT_UPDATE:
-- 
2.40.1




[PATCH for-8.2 0/6] ppc: debug facility improvements

2023-08-07 Thread Nicholas Piggin
I started out looking at this to reduce divergence of TCG and KVM
machines with 2nd DAWR. The divergence already exists with first
DAWR, so I don't want to tie the KVM 2nd DAWR enablement to this,
but it would be nice to ensure the caps and such for the 2nd DAWR
will also work for TCG.

I don't know that we have great test cases for this. It does work
with some of the Linux selftests ptrace debug tests (although those
tests seem to have a few issues in upstream kernels) and some basic
Linux xmon and gdb tests by hand, and I've started working on some
kvm unit tests.

Thanks,
Nick 

Nicholas Piggin (6):
  target/ppc: Remove single-step suppression inside 0x100-0xf00
  target/ppc: Improve book3s branch trace interrupt for v2.07S
  target/ppc: Suppress single step interrupts on rfi-type instructions
  target/ppc: Implement breakpoint debug facility for v2.07S
  target/ppc: Implement watchpoint debug facility for v2.07S
  spapr: implement H_SET_MODE debug facilities

 hw/ppc/spapr_hcall.c |  57 +
 target/ppc/cpu.c |  86 +++
 target/ppc/cpu.h |   7 +++
 target/ppc/cpu_init.c|  11 ++--
 target/ppc/excp_helper.c | 108 ++-
 target/ppc/helper.h  |   4 ++
 target/ppc/internal.h|   3 ++
 target/ppc/machine.c |   5 ++
 target/ppc/misc_helper.c |  15 ++
 target/ppc/spr_common.h  |   3 ++
 target/ppc/translate.c   |  60 +-
 11 files changed, 341 insertions(+), 18 deletions(-)

-- 
2.40.1




Re: [PATCH 4/7] spapr: Fix record-replay machine reset consuming too many events

2023-08-07 Thread Nicholas Piggin
On Sun Aug 6, 2023 at 9:46 PM AEST, Nicholas Piggin wrote:
> On Fri Aug 4, 2023 at 6:50 PM AEST, Pavel Dovgalyuk wrote:
> > BTW, there is a function qemu_register_reset_nosnapshotload that can be 
> > used in similar cases.
> > Can you just use it without changing the code of the reset handler?
>
> I didn't know that, thanks for pointing it out. I'll take a closer look
> at it before reposting.

Seems a bit tricky because the device tree has to be rebuilt at reset
time (including on snapshot load), but it uses the random number. So a
second nosnapshotload reset function might not be called in the correct
order, I think? For now I will keep it as is.

Thanks,
Nick



[PATCH v2 2/7] tcg/ppc: Use PADDI in tcg_out_movi

2023-08-07 Thread Richard Henderson
PADDI can load 34-bit immediates and 34-bit pc-relative addresses.
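
As a quick aside on the fit test used below: sextract64(arg, 0, 34)
sign-extends from bit 33, so comparing with the original value checks
whether arg is representable as a signed 34-bit immediate. A minimal C
equivalent:

    #include <assert.h>
    #include <stdint.h>

    /* True if v fits the signed 34-bit immediate of PADDI/PLI.
     * Relies on arithmetic right shift, as QEMU itself does. */
    static int fits_s34(int64_t v)
    {
        return v == ((int64_t)((uint64_t)v << 30) >> 30);
    }

    int main(void)
    {
        assert(fits_s34(0x1ffffffffLL));   /*  2^33 - 1: largest positive */
        assert(fits_s34(-0x200000000LL));  /* -2^33    : smallest negative */
        assert(!fits_s34(0x200000000LL));  /*  2^33    : one past the range */
        return 0;
    }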

Reviewed-by: Jordan Niethe 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 51 
 1 file changed, 51 insertions(+)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 642d0fd128..2141c0bc78 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -707,6 +707,38 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 return true;
 }
 
+/* Ensure that the prefixed instruction does not cross a 64-byte boundary. */
+static bool tcg_out_need_prefix_align(TCGContext *s)
+{
+return ((uintptr_t)s->code_ptr & 0x3f) == 0x3c;
+}
+
+static void tcg_out_prefix_align(TCGContext *s)
+{
+if (tcg_out_need_prefix_align(s)) {
+tcg_out32(s, NOP);
+}
+}
+
+static ptrdiff_t tcg_pcrel_diff_for_prefix(TCGContext *s, const void *target)
+{
+return tcg_pcrel_diff(s, target) - (tcg_out_need_prefix_align(s) ? 4 : 0);
+}
+
+/* Output Type 10 Prefix - Modified Load/Store Form (MLS:D) */
+static void tcg_out_mls_d(TCGContext *s, tcg_insn_unit opc, unsigned rt,
+  unsigned ra, tcg_target_long imm, bool r)
+{
+tcg_insn_unit p, i;
+
+p = OPCD(1) | (2 << 24) | (r << 20) | ((imm >> 16) & 0x3ffff);
+i = opc | TAI(rt, ra, imm);
+
+tcg_out_prefix_align(s);
+tcg_out32(s, p);
+tcg_out32(s, i);
+}
+
 static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
  TCGReg base, tcg_target_long offset);
 
@@ -992,6 +1024,25 @@ static void tcg_out_movi_int(TCGContext *s, TCGType type, 
TCGReg ret,
 return;
 }
 
+/*
+ * Load values up to 34 bits, and pc-relative addresses,
+ * with one prefixed insn.
+ */
+if (have_isa_3_10) {
+if (arg == sextract64(arg, 0, 34)) {
+/* pli ret,value = paddi ret,0,value,0 */
+tcg_out_mls_d(s, ADDI, ret, 0, arg, 0);
+return;
+}
+
+tmp = tcg_pcrel_diff_for_prefix(s, (void *)arg);
+if (tmp == sextract64(tmp, 0, 34)) {
+/* pla ret,value = paddi ret,0,value,1 */
+tcg_out_mls_d(s, ADDI, ret, 0, tmp, 1);
+return;
+}
+}
+
 /* Load 32-bit immediates with two insns.  Note that we've already
eliminated bare ADDIS, so we know both insns are required.  */
 if (TCG_TARGET_REG_BITS == 32 || arg == (int32_t)arg) {
-- 
2.34.1




[PATCH v2 7/7] tcg/ppc: Use prefixed instructions for tcg_out_goto_tb

2023-08-07 Thread Richard Henderson
When a direct branch is out of range, we can load the destination for
the indirect branch using PLA (for 16GB worth of buffer) and PLD from
the TranslationBlock for everything larger.

This means the patch affects exactly one instruction: B (plus filler),
PLA or PLD, which means we can update and execute the patch atomically.
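
A sketch of why the single-instruction property matters (my reading, with
a generic helper name of my own): retargeting the TB then reduces to one
aligned 32-bit store plus instruction-cache maintenance, with no need to
stop other vCPUs.

    #include <stdatomic.h>
    #include <stdint.h>

    /* Rewrite the lone branch word of a translation-block jump slot. */
    static void patch_branch_word(uint32_t *jmp_slot, uint32_t insn)
    {
        atomic_store_explicit((_Atomic uint32_t *)jmp_slot, insn,
                              memory_order_relaxed);
        /* Make the new instruction visible to execution. */
        __builtin___clear_cache((char *)jmp_slot, (char *)(jmp_slot + 1));
    }

    int main(void)
    {
        uint32_t slot = 0x60000000;               /* ppc NOP (ori 0,0,0) */
        patch_branch_word(&slot, 0x48000000 | 8); /* B +8, illustrative */
        return slot == 0x48000008 ? 0 : 1;
    }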

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 63fe4ef995..b686a68247 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2646,31 +2646,38 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
 uintptr_t ptr = get_jmp_target_addr(s, which);
 
 if (USE_REG_TB) {
+/*
+ * With REG_TB, we must always use indirect branching,
+ * so that the branch destination and TCG_REG_TB match.
+ */
 ptrdiff_t offset = tcg_tbrel_diff(s, (void *)ptr);
 tcg_out_mem_long(s, LD, LDX, TCG_REG_TB, TCG_REG_TB, offset);
-
-/* TODO: Use direct branches when possible. */
-set_jmp_insn_offset(s, which);
 tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR);
-
 tcg_out32(s, BCCTR | BO_ALWAYS);
 
 /* For the unlinked case, need to reset TCG_REG_TB.  */
 set_jmp_reset_offset(s, which);
 tcg_out_mem_long(s, ADDI, ADD, TCG_REG_TB, TCG_REG_TB,
  -tcg_current_code_size(s));
-} else {
-/* Direct branch will be patched by tb_target_set_jmp_target. */
-set_jmp_insn_offset(s, which);
-tcg_out32(s, NOP);
+return;
+}
 
-/* When branch is out of range, fall through to indirect. */
+/* Direct branch will be patched by tb_target_set_jmp_target. */
+set_jmp_insn_offset(s, which);
+tcg_out32(s, NOP);
+
+/* When branch is out of range, fall through to indirect. */
+if (have_isa_3_10) {
+ptrdiff_t offset = tcg_pcrel_diff_for_prefix(s, (void *)ptr);
+tcg_out_8ls_d(s, PLD, TCG_REG_TMP1, 0, offset, 1);
+} else {
 tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP1, ptr - (int16_t)ptr);
 tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_TMP1, TCG_REG_TMP1, (int16_t)ptr);
-tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | CTR);
-tcg_out32(s, BCCTR | BO_ALWAYS);
-set_jmp_reset_offset(s, which);
 }
+
+tcg_out32(s, MTSPR | RS(TCG_REG_TMP1) | CTR);
+tcg_out32(s, BCCTR | BO_ALWAYS);
+set_jmp_reset_offset(s, which);
 }
 
 void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
-- 
2.34.1




[PATCH v2 5/7] tcg/ppc: Use prefixed instructions in tcg_out_dupi_vec

2023-08-07 Thread Richard Henderson
The prefixed instructions have a pc-relative form to use here.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 12 
 1 file changed, 12 insertions(+)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index b3b2e9874d..01ca5c9f39 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1195,6 +1195,18 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType 
type, unsigned vece,
 /*
  * Otherwise we must load the value from the constant pool.
  */
+
+if (have_isa_3_10) {
+if (type == TCG_TYPE_V64) {
+tcg_out_8ls_d(s, PLXSD, ret & 31, 0, 0, 1);
+new_pool_label(s, val, R_PPC64_PCREL34, s->code_ptr - 2, 0);
+} else {
+tcg_out_8ls_d(s, PLXV, ret & 31, 0, 0, 1);
+new_pool_l2(s, R_PPC64_PCREL34, s->code_ptr - 2, 0, val, val);
+}
+return;
+}
+
 if (USE_REG_TB) {
 rel = R_PPC_ADDR16;
 add = tcg_tbrel_diff(s, NULL);
-- 
2.34.1




[PATCH v2 1/7] tcg/ppc: Untabify tcg-target.c.inc

2023-08-07 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 511e14b180..642d0fd128 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -221,7 +221,7 @@ static inline bool in_range_b(tcg_target_long target)
 }
 
 static uint32_t reloc_pc24_val(const tcg_insn_unit *pc,
-  const tcg_insn_unit *target)
+   const tcg_insn_unit *target)
 {
 ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
 tcg_debug_assert(in_range_b(disp));
@@ -241,7 +241,7 @@ static bool reloc_pc24(tcg_insn_unit *src_rw, const 
tcg_insn_unit *target)
 }
 
 static uint16_t reloc_pc14_val(const tcg_insn_unit *pc,
-  const tcg_insn_unit *target)
+   const tcg_insn_unit *target)
 {
 ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
 tcg_debug_assert(disp == (int16_t) disp);
@@ -3587,7 +3587,7 @@ static void expand_vec_mul(TCGType type, unsigned vece, 
TCGv_vec v0,
   tcgv_vec_arg(t1), tcgv_vec_arg(t2));
 vec_gen_3(INDEX_op_ppc_pkum_vec, type, vece, tcgv_vec_arg(v0),
   tcgv_vec_arg(v0), tcgv_vec_arg(t1));
-   break;
+break;
 
 case MO_32:
 tcg_debug_assert(!have_isa_2_07);
-- 
2.34.1




[PATCH v2 3/7] tcg/ppc: Use prefixed instructions in tcg_out_mem_long

2023-08-07 Thread Richard Henderson
When the offset is out of range of the non-prefixed insn, but
fits the 34-bit immediate of the prefixed insn, use that.
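
The resulting selection order for a load/store offset is roughly: the
classic 16-bit D-form when the offset fits and satisfies the insn's
alignment, the prefixed 34-bit form next, and the indexed
register+register form as a last resort. A hedged sketch of that dispatch
(simplified from tcg_out_mem_long; the names are mine):

    #include <stdint.h>

    enum form { D_FORM, PREFIXED_8LS_D, INDEXED };

    static enum form pick_form(int64_t off, int64_t align, int have_isa_3_10)
    {
        if (off == (int16_t)off && !(off & align)) {
            return D_FORM;          /* fits the 16-bit displacement */
        }
        if (have_isa_3_10 && off == ((int64_t)((uint64_t)off << 30) >> 30)) {
            return PREFIXED_8LS_D;  /* fits the 34-bit displacement */
        }
        return INDEXED;             /* materialize the offset in a register */
    }

    int main(void)
    {
        return pick_form(0x12345678, 3, 1) == PREFIXED_8LS_D ? 0 : 1;
    }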

Reviewed-by: Jordan Niethe 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 66 
 1 file changed, 66 insertions(+)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 2141c0bc78..61ae9d8ab7 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -323,6 +323,15 @@ static bool tcg_target_const_match(int64_t val, TCGType 
type, int ct)
 #define STDX   XO31(149)
 #define STQXO62(  2)
 
+#define PLWA   OPCD( 41)
+#define PLDOPCD( 57)
+#define PLXSD  OPCD( 42)
+#define PLXV   OPCD(25 * 2 + 1)  /* force tx=1 */
+
+#define PSTD   OPCD( 61)
+#define PSTXSD OPCD( 46)
+#define PSTXV  OPCD(27 * 2 + 1)  /* force sx=1 */
+
 #define ADDIC  OPCD( 12)
 #define ADDI   OPCD( 14)
 #define ADDIS  OPCD( 15)
@@ -725,6 +734,20 @@ static ptrdiff_t tcg_pcrel_diff_for_prefix(TCGContext *s, 
const void *target)
 return tcg_pcrel_diff(s, target) - (tcg_out_need_prefix_align(s) ? 4 : 0);
 }
 
+/* Output Type 00 Prefix - 8-Byte Load/Store Form (8LS:D) */
+static void tcg_out_8ls_d(TCGContext *s, tcg_insn_unit opc, unsigned rt,
+  unsigned ra, tcg_target_long imm, bool r)
+{
+tcg_insn_unit p, i;
+
+p = OPCD(1) | (r << 20) | ((imm >> 16) & 0x3ffff);
+i = opc | TAI(rt, ra, imm);
+
+tcg_out_prefix_align(s);
+tcg_out32(s, p);
+tcg_out32(s, i);
+}
+
 /* Output Type 10 Prefix - Modified Load/Store Form (MLS:D) */
 static void tcg_out_mls_d(TCGContext *s, tcg_insn_unit opc, unsigned rt,
   unsigned ra, tcg_target_long imm, bool r)
@@ -1368,6 +1391,49 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int 
opx, TCGReg rt,
 break;
 }
 
+/* For unaligned or large offsets, use the prefixed form. */
+if (have_isa_3_10
+&& (offset != (int16_t)offset || (offset & align))
+&& offset == sextract64(offset, 0, 34)) {
+/*
+ * Note that the MLS:D insns retain their un-prefixed opcode,
+ * while the 8LS:D insns use a different opcode space.
+ */
+switch (opi) {
+case LBZ:
+case LHZ:
+case LHA:
+case LWZ:
+case STB:
+case STH:
+case STW:
+case ADDI:
+tcg_out_mls_d(s, opi, rt, base, offset, 0);
+return;
+case LWA:
+tcg_out_8ls_d(s, PLWA, rt, base, offset, 0);
+return;
+case LD:
+tcg_out_8ls_d(s, PLD, rt, base, offset, 0);
+return;
+case STD:
+tcg_out_8ls_d(s, PSTD, rt, base, offset, 0);
+return;
+case LXSD:
+tcg_out_8ls_d(s, PLXSD, rt & 31, base, offset, 0);
+return;
+case STXSD:
+tcg_out_8ls_d(s, PSTXSD, rt & 31, base, offset, 0);
+return;
+case LXV:
+tcg_out_8ls_d(s, PLXV, rt & 31, base, offset, 0);
+return;
+case STXV:
+tcg_out_8ls_d(s, PSTXV, rt & 31, base, offset, 0);
+return;
+}
+}
+
 /* For unaligned, or very large offsets, use the indexed form.  */
 if (offset & align || offset != (int32_t)offset || opi == 0) {
 if (rs == base) {
-- 
2.34.1




[PATCH v2 6/7] tcg/ppc: Disable USE_REG_TB for Power v3.1

2023-08-07 Thread Richard Henderson
With Power v3.1, we have pc-relative addressing and so
do not require a register holding the current TB.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 01ca5c9f39..63fe4ef995 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -83,7 +83,7 @@
 #define TCG_VEC_TMP2TCG_REG_V1
 
 #define TCG_REG_TB TCG_REG_R31
-#define USE_REG_TB (TCG_TARGET_REG_BITS == 64)
+#define USE_REG_TB (TCG_TARGET_REG_BITS == 64 && !have_isa_3_10)
 
 /* Shorthand for size of a pointer.  Avoid promotion to unsigned.  */
 #define SZP  ((int)sizeof(void *))
-- 
2.34.1




[PATCH for-8.2 v2 0/7] tcg/ppc: Support power10 prefixed instructions

2023-08-07 Thread Richard Henderson
Emit one 64-bit instruction for large constants and pc-relatives.
With pc-relative addressing, we don't need REG_TB, which means we
can re-enable direct branching for goto_tb.

Changes for v2:
  * Merged Nick's adjustments for goto_tb.  Only patch B/NOP,
falling through to PLD for indirect branch; drop PLA option.
  * Fix sx typo in patch 3 (jordan).


r~


Richard Henderson (7):
  tcg/ppc: Untabify tcg-target.c.inc
  tcg/ppc: Use PADDI in tcg_out_movi
  tcg/ppc: Use prefixed instructions in tcg_out_mem_long
  tcg/ppc: Use PLD in tcg_out_movi for constant pool
  tcg/ppc: Use prefixed instructions in tcg_out_dupi_vec
  tcg/ppc: Disable USE_REG_TB for Power v3.1
  tcg/ppc: Use prefixed instructions for tcg_out_goto_tb

 tcg/ppc/tcg-target.c.inc | 192 +++
 1 file changed, 176 insertions(+), 16 deletions(-)

-- 
2.34.1




[PATCH v2 4/7] tcg/ppc: Use PLD in tcg_out_movi for constant pool

2023-08-07 Thread Richard Henderson
The prefixed instruction has a pc-relative form to use here.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 24 
 1 file changed, 24 insertions(+)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 61ae9d8ab7..b3b2e9874d 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -101,6 +101,10 @@
#define ALL_GENERAL_REGS  0xffffffffu
#define ALL_VECTOR_REGS   0xffffffff00000000ull
 
+#ifndef R_PPC64_PCREL34
+#define R_PPC64_PCREL34  132
+#endif
+
 #define have_isel  (cpuinfo & CPUINFO_ISEL)
 
 #ifndef CONFIG_SOFTMMU
@@ -260,6 +264,19 @@ static bool reloc_pc14(tcg_insn_unit *src_rw, const 
tcg_insn_unit *target)
 return false;
 }
 
+static bool reloc_pc34(tcg_insn_unit *src_rw, const tcg_insn_unit *target)
+{
+const tcg_insn_unit *src_rx = tcg_splitwx_to_rx(src_rw);
+ptrdiff_t disp = tcg_ptr_byte_diff(target, src_rx);
+
+if (disp == sextract64(disp, 0, 34)) {
+src_rw[0] = (src_rw[0] & ~0x3ffff) | ((disp >> 16) & 0x3ffff);
+src_rw[1] = (src_rw[1] & ~0xffff) | (disp & 0xffff);
+return true;
+}
+return false;
+}
+
 /* test if a constant matches the constraint */
 static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 {
@@ -684,6 +701,8 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 return reloc_pc14(code_ptr, target);
 case R_PPC_REL24:
 return reloc_pc24(code_ptr, target);
+case R_PPC64_PCREL34:
+return reloc_pc34(code_ptr, target);
 case R_PPC_ADDR16:
 /*
  * We are (slightly) abusing this relocation type.  In particular,
@@ -1111,6 +1130,11 @@ static void tcg_out_movi_int(TCGContext *s, TCGType
type, TCGReg ret,
 }
 
 /* Use the constant pool, if possible.  */
+if (have_isa_3_10) {
+tcg_out_8ls_d(s, PLD, ret, 0, 0, 1);
+new_pool_label(s, arg, R_PPC64_PCREL34, s->code_ptr - 2, 0);
+return;
+}
 if (!in_prologue && USE_REG_TB) {
 new_pool_label(s, arg, R_PPC_ADDR16, s->code_ptr,
tcg_tbrel_diff(s, NULL));
-- 
2.34.1




[PATCH 1/2] linux-user: Split out do_mmap

2023-08-07 Thread Richard Henderson
New function that rejects unsupported map types and flags.
In 4b840f96 we should not have accepted MAP_SHARED_VALIDATE
without actually validating the rest of the flags.
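
The host behaviour being mirrored is the one documented in mmap(2): with
MAP_SHARED_VALIDATE the kernel rejects flag bits it does not know (or
cannot honor) with EOPNOTSUPP, while plain MAP_SHARED silently ignores
them. A small host-side probe, assuming a Linux host with glibc recent
enough to define MAP_SHARED_VALIDATE and MAP_SYNC:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = memfd_create("probe", 0);
        ftruncate(fd, 4096);
        /* MAP_SYNC is only valid with MAP_SHARED_VALIDATE and only on DAX
         * files; anywhere else the kernel reports EOPNOTSUPP instead of
         * silently dropping the flag, which is what do_mmap now emulates
         * for the guest. */
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");   /* expected: Operation not supported */
        }
        close(fd);
        return 0;
    }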

Fixes: 4b840f96 ("linux-user: Populate more bits in mmap_flags_tbl")
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 90 +++-
 1 file changed, 73 insertions(+), 17 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a15bce2be2..34deff0723 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5985,10 +5985,6 @@ static const StructEntry struct_termios_def = {
 #endif
 
 static const bitmask_transtbl mmap_flags_tbl[] = {
-{ TARGET_MAP_TYPE, TARGET_MAP_SHARED, MAP_TYPE, MAP_SHARED },
-{ TARGET_MAP_TYPE, TARGET_MAP_PRIVATE, MAP_TYPE, MAP_PRIVATE },
-{ TARGET_MAP_TYPE, TARGET_MAP_SHARED_VALIDATE,
-  MAP_TYPE, MAP_SHARED_VALIDATE },
 { TARGET_MAP_FIXED, TARGET_MAP_FIXED, MAP_FIXED, MAP_FIXED },
 { TARGET_MAP_ANONYMOUS, TARGET_MAP_ANONYMOUS,
   MAP_ANONYMOUS, MAP_ANONYMOUS },
@@ -6006,7 +6002,6 @@ static const bitmask_transtbl mmap_flags_tbl[] = {
Recognize it for the target insofar as we do not want to pass
it through to the host.  */
 { TARGET_MAP_STACK, TARGET_MAP_STACK, 0, 0 },
-{ TARGET_MAP_SYNC, TARGET_MAP_SYNC, MAP_SYNC, MAP_SYNC },
 { TARGET_MAP_NONBLOCK, TARGET_MAP_NONBLOCK, MAP_NONBLOCK, MAP_NONBLOCK },
 { TARGET_MAP_POPULATE, TARGET_MAP_POPULATE, MAP_POPULATE, MAP_POPULATE },
 { TARGET_MAP_FIXED_NOREPLACE, TARGET_MAP_FIXED_NOREPLACE,
@@ -6016,6 +6011,75 @@ static const bitmask_transtbl mmap_flags_tbl[] = {
 { 0, 0, 0, 0 }
 };
 
+/*
+ * Arrange for legacy / undefined architecture specific flags to be
+ * ignored by mmap handling code.
+ */
+#ifndef TARGET_MAP_32BIT
+#define TARGET_MAP_32BIT 0
+#endif
+#ifndef TARGET_MAP_HUGE_2MB
+#define TARGET_MAP_HUGE_2MB 0
+#endif
+#ifndef TARGET_MAP_HUGE_1GB
+#define TARGET_MAP_HUGE_1GB 0
+#endif
+
+static abi_long do_mmap(abi_ulong addr, abi_ulong len, int prot,
+int target_flags, int fd, off_t offset)
+{
+/*
+ * The historical set of flags that all mmap types implicitly support.
+ */
+enum {
+TARGET_LEGACY_MAP_MASK = TARGET_MAP_SHARED
+   | TARGET_MAP_PRIVATE
+   | TARGET_MAP_FIXED
+   | TARGET_MAP_ANONYMOUS
+   | TARGET_MAP_DENYWRITE
+   | TARGET_MAP_EXECUTABLE
+   | TARGET_MAP_UNINITIALIZED
+   | TARGET_MAP_GROWSDOWN
+   | TARGET_MAP_LOCKED
+   | TARGET_MAP_NORESERVE
+   | TARGET_MAP_POPULATE
+   | TARGET_MAP_NONBLOCK
+   | TARGET_MAP_STACK
+   | TARGET_MAP_HUGETLB
+   | TARGET_MAP_32BIT
+   | TARGET_MAP_HUGE_2MB
+   | TARGET_MAP_HUGE_1GB
+};
+int host_flags;
+
+switch (target_flags & TARGET_MAP_TYPE) {
+case TARGET_MAP_PRIVATE:
+host_flags = MAP_PRIVATE;
+break;
+case TARGET_MAP_SHARED:
+host_flags = MAP_SHARED;
+break;
+case TARGET_MAP_SHARED_VALIDATE:
+/*
+ * MAP_SYNC is only supported for MAP_SHARED_VALIDATE, and is
+ * therefore omitted from mmap_flags_tbl and TARGET_LEGACY_MAP_MASK.
+ */
+if (target_flags & ~(TARGET_LEGACY_MAP_MASK | TARGET_MAP_SYNC)) {
+return -TARGET_EOPNOTSUPP;
+}
+host_flags = MAP_SHARED_VALIDATE;
+if (target_flags & TARGET_MAP_SYNC) {
+host_flags |= MAP_SYNC;
+}
+break;
+default:
+return -TARGET_EINVAL;
+}
+host_flags |= target_to_host_bitmask(target_flags, mmap_flags_tbl);
+
+return get_errno(target_mmap(addr, len, prot, host_flags, fd, offset));
+}
+
 /*
  * NOTE: TARGET_ABI32 is defined for TARGET_I386 (but not for TARGET_X86_64)
  *   TARGET_I386 is defined if TARGET_X86_64 is defined
@@ -10536,28 +10600,20 @@ static abi_long do_syscall1(CPUArchState *cpu_env, 
int num, abi_long arg1,
 v5 = tswapal(v[4]);
 v6 = tswapal(v[5]);
 unlock_user(v, arg1, 0);
-ret = get_errno(target_mmap(v1, v2, v3,
-target_to_host_bitmask(v4, 
mmap_flags_tbl),
-v5, v6));
+return do_mmap(v1, v2, v3, v4, v5, v6);
 }
 #else
 /* mmap pointers are always untagged */
-ret = get_errno(target_mmap(arg1, arg2, arg3,
-target_to_host_bitmask(arg4, 
mmap_flags_tbl),
-arg5,
-arg6));
+return 

[PATCH for-8.1 0/2] linux-user: Fix MAP_SHARED_VALIDATE, MAP_FIXED_NOREPLACE

2023-08-07 Thread Richard Henderson
Fixes LTP mmap17 (MAP_FIXED_NOREPLACE) and mmap20 (MAP_SHARED_VALIDATE),
both of which were added to linux-user during the 8.1 cycle, so it would
be nice to fix them right away.

Does not fix mmap18, which will fail depending on the guest memory map.
The real kernel avoids placing new maps immediately prior to GROWSDOWN
regions (leaving them no room into which to expand) and qemu does not.
This is a long-standing problem and will not be fixable for 8.1.

Reported-by: Michael Tokarev 


r~


Richard Henderson (2):
  linux-user: Split out do_mmap
  linux-user: Use ARRAY_SIZE with bitmask_transtbl

 bsd-user/syscall_defs.h   |  2 +
 include/exec/user/thunk.h | 15 --
 linux-user/syscall.c  | 96 +--
 linux-user/thunk.c| 24 +-
 4 files changed, 98 insertions(+), 39 deletions(-)

-- 
2.34.1




[PATCH 2/2] linux-user: Use ARRAY_SIZE with bitmask_transtbl

2023-08-07 Thread Richard Henderson
Rather than using a zero tuple to end the table, use a macro
to apply ARRAY_SIZE and pass that on to the convert functions.

This fixes two bugs in which the conversion functions required
that both the target and host masks be non-zero in order to
continue, rather than requiring that both target and host masks
be zero in order to terminate.

This affected mmap_flags_tbl when the host does not support
all of the flags we wish to convert (e.g. MAP_UNINITIALIZED).
Mapping these flags to zero is good enough, and matches how
the kernel ignores bits that are unknown.
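
The termination bug is easiest to see side by side; a reduced sketch with
shortened, hypothetical field names:

    #include <stddef.h>

    struct tbl { unsigned t_mask, t_bits, h_mask, h_bits; };

    /* Before: the loop stopped as soon as either mask was zero, so an
     * entry whose host flag is unsupported (host mask 0) silently
     * truncated the table. */
    static unsigned old_convert(unsigned v, const struct tbl *t)
    {
        unsigned r = 0;
        for (; t->t_mask && t->h_mask; t++) {
            if ((v & t->t_mask) == t->t_bits) {
                r |= t->h_bits;
            }
        }
        return r;
    }

    /* After: the length comes from ARRAY_SIZE, so a zero host mask simply
     * maps to nothing, matching how the kernel ignores unknown bits. */
    static unsigned new_convert(unsigned v, const struct tbl *t, size_t n)
    {
        unsigned r = 0;
        for (size_t i = 0; i < n; i++) {
            if ((v & t[i].t_mask) == t[i].t_bits) {
                r |= t[i].h_bits;
            }
        }
        return r;
    }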

Fixes: 4b840f96 ("linux-user: Populate more bits in mmap_flags_tbl")
Signed-off-by: Richard Henderson 
---
 bsd-user/syscall_defs.h   |  2 ++
 include/exec/user/thunk.h | 15 +++
 linux-user/syscall.c  |  6 --
 linux-user/thunk.c| 24 
 4 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/bsd-user/syscall_defs.h b/bsd-user/syscall_defs.h
index b6d113d24a..aedfbf2d7d 100644
--- a/bsd-user/syscall_defs.h
+++ b/bsd-user/syscall_defs.h
@@ -227,7 +227,9 @@ type safe_##name(type1 arg1, type2 arg2, type3 arg3, type4 
arg4, \
 }
 
 /* So far all target and host bitmasks are the same */
+#undef  target_to_host_bitmask
 #define target_to_host_bitmask(x, tbl) (x)
+#undef  host_to_target_bitmask
 #define host_to_target_bitmask(x, tbl) (x)
 
 #endif /* SYSCALL_DEFS_H */
diff --git a/include/exec/user/thunk.h b/include/exec/user/thunk.h
index 300a840d58..6eedef48d8 100644
--- a/include/exec/user/thunk.h
+++ b/include/exec/user/thunk.h
@@ -193,10 +193,17 @@ static inline int thunk_type_align(const argtype 
*type_ptr, int is_host)
 }
 }
 
-unsigned int target_to_host_bitmask(unsigned int target_mask,
-const bitmask_transtbl * trans_tbl);
-unsigned int host_to_target_bitmask(unsigned int host_mask,
-const bitmask_transtbl * trans_tbl);
+unsigned int target_to_host_bitmask_len(unsigned int target_mask,
+const bitmask_transtbl *trans_tbl,
+size_t trans_len);
+unsigned int host_to_target_bitmask_len(unsigned int host_mask,
+const bitmask_transtbl * trans_tbl,
+size_t trans_len);
+
+#define target_to_host_bitmask(M, T) \
+target_to_host_bitmask_len(M, T, ARRAY_SIZE(T))
+#define host_to_target_bitmask(M, T) \
+host_to_target_bitmask_len(M, T, ARRAY_SIZE(T))
 
 void thunk_init(unsigned int max_structs);
 
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 34deff0723..12ebc70df5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -455,7 +455,6 @@ static const bitmask_transtbl fcntl_flags_tbl[] = {
 #if TARGET_O_LARGEFILE != 0 || O_LARGEFILE != 0
   { TARGET_O_LARGEFILE, TARGET_O_LARGEFILE, O_LARGEFILE, O_LARGEFILE, },
 #endif
-  { 0, 0, 0, 0 }
 };
 
 _syscall2(int, sys_getcwd1, char *, buf, size_t, size)
@@ -5813,7 +5812,6 @@ static const bitmask_transtbl iflag_tbl[] = {
 { TARGET_IXOFF, TARGET_IXOFF, IXOFF, IXOFF },
 { TARGET_IMAXBEL, TARGET_IMAXBEL, IMAXBEL, IMAXBEL },
 { TARGET_IUTF8, TARGET_IUTF8, IUTF8, IUTF8},
-{ 0, 0, 0, 0 }
 };
 
 static const bitmask_transtbl oflag_tbl[] = {
@@ -5841,7 +5839,6 @@ static const bitmask_transtbl oflag_tbl[] = {
{ TARGET_VTDLY, TARGET_VT1, VTDLY, VT1 },
{ TARGET_FFDLY, TARGET_FF0, FFDLY, FF0 },
{ TARGET_FFDLY, TARGET_FF1, FFDLY, FF1 },
-   { 0, 0, 0, 0 }
 };
 
 static const bitmask_transtbl cflag_tbl[] = {
@@ -5876,7 +5873,6 @@ static const bitmask_transtbl cflag_tbl[] = {
{ TARGET_HUPCL, TARGET_HUPCL, HUPCL, HUPCL },
{ TARGET_CLOCAL, TARGET_CLOCAL, CLOCAL, CLOCAL },
{ TARGET_CRTSCTS, TARGET_CRTSCTS, CRTSCTS, CRTSCTS },
-   { 0, 0, 0, 0 }
 };
 
 static const bitmask_transtbl lflag_tbl[] = {
@@ -5896,7 +5892,6 @@ static const bitmask_transtbl lflag_tbl[] = {
   { TARGET_PENDIN, TARGET_PENDIN, PENDIN, PENDIN },
   { TARGET_IEXTEN, TARGET_IEXTEN, IEXTEN, IEXTEN },
   { TARGET_EXTPROC, TARGET_EXTPROC, EXTPROC, EXTPROC},
-  { 0, 0, 0, 0 }
 };
 
 static void target_to_host_termios (void *dst, const void *src)
@@ -6008,7 +6003,6 @@ static const bitmask_transtbl mmap_flags_tbl[] = {
   MAP_FIXED_NOREPLACE, MAP_FIXED_NOREPLACE },
 { TARGET_MAP_UNINITIALIZED, TARGET_MAP_UNINITIALIZED,
   MAP_UNINITIALIZED, MAP_UNINITIALIZED },
-{ 0, 0, 0, 0 }
 };
 
 /*
diff --git a/linux-user/thunk.c b/linux-user/thunk.c
index dac4bf11c6..071aad4b5f 100644
--- a/linux-user/thunk.c
+++ b/linux-user/thunk.c
@@ -436,29 +436,29 @@ const argtype *thunk_print(void *arg, const argtype 
*type_ptr)
 /* Utility function: Table-driven functions to translate bitmasks
  * between host and target formats
  */
-unsigned int target_to_host_bitmask(unsigned int target_mask,
-const bitmask_transtbl * trans_tbl)
+unsigned 

[RFC PATCH v2] target/i386: add support for VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE

2023-08-07 Thread Ake Koomsin
Current QEMU can expose waitpkg to guests when it is available. However,
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE is still not recognized and is
masked by QEMU. This can lead to an unexpected situation when an L1
hypervisor wants to expose waitpkg to an L2 guest. The L1 hypervisor can
assume that VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE exists because
waitpkg is available, and can then accidentally expose waitpkg to the L2
guest. This will cause an invalid opcode exception in the L2 guest when
it executes waitpkg-related instructions.

This patch adds VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE support and
sets up a dependency between the bit and CPUID_7_0_ECX_WAITPKG. QEMU
should not expose the waitpkg feature if
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE is not available, to avoid
unexpected invalid opcode exceptions in L2 guests.
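
In effect, the new FeatureDep entry makes the two bits travel together;
conceptually something like this (a sketch of the masking, not the actual
QEMU control flow):

    #include <stdint.h>

    #define VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE 0x04000000
    #define CPUID_7_0_ECX_WAITPKG                     (1u << 5)

    /* Drop WAITPKG whenever the matching VMX secondary control is absent,
     * so an L1 hypervisor can never leak the feature to an L2 guest. */
    static uint32_t apply_waitpkg_dep(uint32_t cpuid_7_0_ecx,
                                      uint32_t vmx_secondary_ctls)
    {
        if (!(vmx_secondary_ctls & VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE)) {
            cpuid_7_0_ecx &= ~CPUID_7_0_ECX_WAITPKG;
        }
        return cpuid_7_0_ecx;
    }

    int main(void)
    {
        return apply_waitpkg_dep(CPUID_7_0_ECX_WAITPKG, 0) == 0 ? 0 : 1;
    }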

Signed-off-by: Ake Koomsin 
---

v2:
- Fix typo in the patch header (targer -> target)

v1:
https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg01048.html

 target/i386/cpu.c | 6 +-
 target/i386/cpu.h | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 97ad229d8b..00f913b638 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1228,7 +1228,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "vmx-invpcid-exit", "vmx-vmfunc", "vmx-shadow-vmcs", 
"vmx-encls-exit",
 "vmx-rdseed-exit", "vmx-pml", NULL, NULL,
 "vmx-xsaves", NULL, NULL, NULL,
-NULL, "vmx-tsc-scaling", NULL, NULL,
+NULL, "vmx-tsc-scaling", "vmx-enable-user-wait-pause", NULL,
 NULL, NULL, NULL, NULL,
 },
 .msr = {
@@ -1545,6 +1545,10 @@ static FeatureDep feature_dependencies[] = {
 .from = { FEAT_8000_0001_ECX,   CPUID_EXT3_SVM },
 .to = { FEAT_SVM,   ~0ull },
 },
+{
+.from = { FEAT_VMX_SECONDARY_CTLS,  
VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE },
+.to = { FEAT_7_0_ECX,   CPUID_7_0_ECX_WAITPKG },
+},
 };
 
 typedef struct X86RegisterInfo32 {
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e0771a1043..a6000e93bd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -,6 +,7 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 #define VMX_SECONDARY_EXEC_ENABLE_PML               0x00020000
 #define VMX_SECONDARY_EXEC_XSAVES                   0x00100000
 #define VMX_SECONDARY_EXEC_TSC_SCALING              0x02000000
+#define VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE   0x04000000

 #define VMX_PIN_BASED_EXT_INTR_MASK                 0x00000001
 #define VMX_PIN_BASED_NMI_EXITING                   0x00000008
-- 
2.41.0




Re: [PATCH v5 1/5] ebpf: Added eBPF map update through mmap.

2023-08-07 Thread Jason Wang
On Thu, Aug 3, 2023 at 5:01 AM Andrew Melnychenko  wrote:
>
> Changed eBPF map updates through mmaped array.
> Mmaped arrays provide direct access to map data.
> It should omit using bpf_map_update_elem() call,
> which may require capabilities that are not present.
>
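For mmap() on a map fd to work, the array has to be created with the
BPF_F_MMAPABLE flag. A sketch of such a BTF-style map definition (the
names and value type are illustrative; the actual definitions in the
rss bpf program may differ):

    /* BPF-program-side sketch; __uint/__type come from <bpf/bpf_helpers.h>.
     * BPF_F_MMAPABLE is what makes the mmap() calls in this patch possible. */
    struct rss_config { __u32 redirect; __u32 populate; };   /* illustrative */

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(map_flags, BPF_F_MMAPABLE);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, struct rss_config);
    } tap_rss_map_configurations SEC(".maps");
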
> Signed-off-by: Andrew Melnychenko 
> ---
>  ebpf/ebpf_rss.c | 117 ++--
>  ebpf/ebpf_rss.h |   5 +++
>  2 files changed, 99 insertions(+), 23 deletions(-)
>
> diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
> index cee658c158b..247f5eee1b6 100644
> --- a/ebpf/ebpf_rss.c
> +++ b/ebpf/ebpf_rss.c
> @@ -27,19 +27,83 @@ void ebpf_rss_init(struct EBPFRSSContext *ctx)
>  {
>  if (ctx != NULL) {
>  ctx->obj = NULL;
> +ctx->program_fd = -1;
> +ctx->map_configuration = -1;
> +ctx->map_toeplitz_key = -1;
> +ctx->map_indirections_table = -1;
> +
> +ctx->mmap_configuration = NULL;
> +ctx->mmap_toeplitz_key = NULL;
> +ctx->mmap_indirections_table = NULL;
>  }
>  }
>
>  bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
>  {
> -return ctx != NULL && ctx->obj != NULL;
> +return ctx != NULL && (ctx->obj != NULL || ctx->program_fd != -1);
> +}
> +
> +static bool ebpf_rss_mmap(struct EBPFRSSContext *ctx)
> +{
> +if (!ebpf_rss_is_loaded(ctx)) {
> +return false;
> +}
> +
> +ctx->mmap_configuration = mmap(NULL, qemu_real_host_page_size(),
> +   PROT_READ | PROT_WRITE, MAP_SHARED,
> +   ctx->map_configuration, 0);
> +if (ctx->mmap_configuration == MAP_FAILED) {
> +trace_ebpf_error("eBPF RSS", "can not mmap eBPF configuration 
> array");
> +return false;
> +}
> +ctx->mmap_toeplitz_key = mmap(NULL, qemu_real_host_page_size(),
> +   PROT_READ | PROT_WRITE, MAP_SHARED,
> +   ctx->map_toeplitz_key, 0);
> +if (ctx->mmap_toeplitz_key == MAP_FAILED) {
> +trace_ebpf_error("eBPF RSS", "can not mmap eBPF toeplitz key");
> +goto toeplitz_fail;
> +}
> +ctx->mmap_indirections_table = mmap(NULL, qemu_real_host_page_size(),
> +   PROT_READ | PROT_WRITE, MAP_SHARED,
> +   ctx->map_indirections_table, 0);
> +if (ctx->mmap_indirections_table == MAP_FAILED) {
> +trace_ebpf_error("eBPF RSS", "can not mmap eBPF indirection table");
> +goto indirection_fail;
> +}
> +
> +return true;
> +
> +indirection_fail:
> +munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
> +toeplitz_fail:
> +munmap(ctx->mmap_configuration, qemu_real_host_page_size());
> +
> +ctx->mmap_configuration = NULL;
> +ctx->mmap_toeplitz_key = NULL;
> +ctx->mmap_indirections_table = NULL;
> +return false;
> +}
> +
> +static void ebpf_rss_munmap(struct EBPFRSSContext *ctx)
> +{
> +if (!ebpf_rss_is_loaded(ctx)) {
> +return;
> +}
> +
> +munmap(ctx->mmap_indirections_table, qemu_real_host_page_size());
> +munmap(ctx->mmap_toeplitz_key, qemu_real_host_page_size());
> +munmap(ctx->mmap_configuration, qemu_real_host_page_size());
> +
> +ctx->mmap_configuration = NULL;
> +ctx->mmap_toeplitz_key = NULL;
> +ctx->mmap_indirections_table = NULL;
>  }
>
>  bool ebpf_rss_load(struct EBPFRSSContext *ctx)
>  {
>  struct rss_bpf *rss_bpf_ctx;
>
> -if (ctx == NULL) {
> +if (ctx == NULL || ebpf_rss_is_loaded(ctx)) {
>  return false;
>  }
>
> @@ -66,10 +130,18 @@ bool ebpf_rss_load(struct EBPFRSSContext *ctx)
>  ctx->map_toeplitz_key = bpf_map__fd(
>  rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
>
> +if (!ebpf_rss_mmap(ctx)) {
> +goto error;
> +}
> +
>  return true;
>  error:
>  rss_bpf__destroy(rss_bpf_ctx);
>  ctx->obj = NULL;
> +ctx->program_fd = -1;
> +ctx->map_configuration = -1;
> +ctx->map_toeplitz_key = -1;
> +ctx->map_indirections_table = -1;
>
>  return false;
>  }
> @@ -77,15 +149,11 @@ error:
>  static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
>  struct EBPFRSSConfig *config)
>  {
> -uint32_t map_key = 0;
> -
>  if (!ebpf_rss_is_loaded(ctx)) {
>  return false;
>  }
> -if (bpf_map_update_elem(ctx->map_configuration,
> -&map_key, config, 0) < 0) {
> -return false;
> -}
> +
> +memcpy(ctx->mmap_configuration, config, sizeof(*config));
>  return true;
>  }
>
> @@ -93,27 +161,19 @@ static bool ebpf_rss_set_indirections_table(struct 
> EBPFRSSContext *ctx,
>  uint16_t *indirections_table,
>  size_t len)
>  {
> -uint32_t i = 0;
> -
>  if (!ebpf_rss_is_loaded(ctx) || indirections_table == NULL ||
> len > VIRTIO_NET_RSS_MAX_TABLE_LEN) {
>  return false;

Re: [PATCH v3] net: add initial support for AF_XDP network backend

2023-08-07 Thread Jason Wang
On Sat, Aug 5, 2023 at 2:20 AM Ilya Maximets  wrote:
>
> AF_XDP is a network socket family that allows communication directly
> with the network device driver in the kernel, bypassing most or all
> of the kernel networking stack.  In essence, the technology is
> pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
> and works with any network interfaces without driver modifications.
> Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
> require access to character devices or unix sockets.  Only access to
> the network interface itself is necessary.
>
> This patch implements a network backend that communicates with the
> kernel by creating an AF_XDP socket.  A chunk of userspace memory
> is shared between QEMU and the host kernel.  4 ring buffers (Tx, Rx,
> Fill and Completion) are placed in that memory along with a pool of
> memory buffers for the packet data.  Data transmission is done by
> allocating one of the buffers, copying packet data into it and
> placing the pointer into Tx ring.  After transmission, device will
> return the buffer via Completion ring.  On Rx, device will take
> a buffer from a pre-populated Fill ring, write the packet data into
> it and place the buffer into Rx ring.
>
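(For reference, a minimal sketch of that Tx flow using the libxdp xsk
API from <xdp/xsk.h>; the umem layout and free-frame bookkeeping are
assumed to exist elsewhere, and error handling is trimmed:)

    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <xdp/xsk.h>

    /* Send one packet: copy it into a free umem frame, publish the
     * descriptor on the Tx ring, then kick the kernel. */
    static void af_xdp_tx_one(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
                              void *umem_area, uint64_t frame_addr,
                              const void *pkt, uint32_t len)
    {
        uint32_t idx;

        if (xsk_ring_prod__reserve(tx, 1, &idx) != 1) {
            return; /* Tx ring full; the frame stays available */
        }
        memcpy(xsk_umem__get_data(umem_area, frame_addr), pkt, len);
        xsk_ring_prod__tx_desc(tx, idx)->addr = frame_addr;
        xsk_ring_prod__tx_desc(tx, idx)->len = len;
        xsk_ring_prod__submit(tx, 1);
        sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
    }
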
> AF_XDP network backend takes on the communication with the host
> kernel and the network interface and forwards packets to/from the
> peer device in QEMU.
>
> Usage example:
>
>   -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
>   -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1
>
> XDP program bridges the socket with a network interface.  It can be
> attached to the interface in 2 different modes:
>
> 1. skb - this mode should work for any interface and doesn't require
>  driver support.  With a caveat of lower performance.
>
> 2. native - this does require support from the driver and allows
>  bypassing skb allocation in the kernel and potentially using
>  zero-copy while getting packets in/out userspace.
>
> By default, QEMU will try to use native mode and fall back to skb.
> Mode can be forced via 'mode' option.  To force 'copy' even in native
> mode, use 'force-copy=on' option.  This might be useful if there is
> some issue with the driver.
>
> Option 'queues=N' allows specifying how many device queues should
> be open.  Note that all the queues that are not open are still
> functional and can receive traffic, but it will not be delivered to
> QEMU.  So, the number of device queues should generally match the
> QEMU configuration, unless the device is shared with something
> else and the traffic re-direction to appropriate queues is correctly
> configured on a device level (e.g. with ethtool -N).
> 'start-queue=M' option can be used to specify from which queue id
> QEMU should start configuring 'N' queues.  It might also be necessary
> to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
> for examples.
>
> In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
> or CAP_BPF capabilities in order to load default XSK/XDP programs to
> the network interface and configure BPF maps.  It is possible, however,
> to run with no capabilities.  For that to work, an external process
> with enough capabilities will need to pre-load default XSK program,
> create AF_XDP sockets and pass their file descriptors to QEMU process
> on startup via 'sock-fds' option.  Network backend will need to be
> configured with 'inhibit=on' to avoid loading of the program.
> QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
> or CAP_IPC_LOCK.
>
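(A hypothetical unprivileged invocation under those assumptions - the
exact sock-fds value syntax is whatever the option parser accepts,
shown here as colon-separated descriptors 15..18 for queues=4:)

  -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=4,inhibit=on,sock-fds=15:16:17:18
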
> There are a few performance challenges with the current network backends.
>
> First is that they do not support IO threads.  This means that data
> path is handled by the main thread in QEMU and may slow down other
> work or may be slowed down by some other work.  This also means that
> taking advantage of multi-queue is generally not possible today.
>
> Another thing is that data path is going through the device emulation
> code, which is not really optimized for performance.  The fastest
> "frontend" device is virtio-net.  But it's not optimized for heavy
> traffic either, because it expects such use-cases to be handled via
> some implementation of vhost (user, kernel, vdpa).  In practice, we
> have virtio notifications and rcu lock/unlock on a per-packet basis
> and not very efficient accesses to the guest memory.  Communication
> channels between backend and frontend devices do not allow passing
> more than one packet at a time as well.
>
> Some of these challenges can be avoided in the future by adding better
> batching into device emulation or by implementing vhost-af-xdp variant.
>
> There are also a few kernel limitations.  AF_XDP sockets do not
> support any kinds of checksum or segmentation offloading.  Buffers
> are limited to a page size (4K), i.e. MTU is limited.  Multi-buffer
> support implementation for AF_XDP is in progress, but not ready yet.
> Also, 

[PATCH QEMU v3 3/3] tests/migration: Introduce dirty-limit into guestperf

2023-08-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Currently, guestperf does not cover dirty-limit
migration; add support for this feature.

Note that dirty-limit requires 'dirty-ring-size' to be set.

To enable dirty-limit, set x-vcpu-dirty-limit-period
to 500ms and vcpu-dirty-limit to 10MB/s:
$ ./tests/migration/guestperf.py \
--dirty-ring-size 4096 \
--dirty-limit --x-vcpu-dirty-limit-period 500 \
--vcpu-dirty-limit 10 --output output.json

To run the entire standardized set of dirty-limit-enabled
comparisons, with unix migration:
$ ./tests/migration/guestperf-batch.py \
--dirty-ring-size 4096 \
--dst-host localhost --transport unix \
--filter compr-dirty-limit* --output outputdir

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: <169073391195.19893.61067537833811032...@git.sr.ht>
---
 tests/migration/guestperf/comparison.py | 23 +++
 tests/migration/guestperf/engine.py | 17 +
 tests/migration/guestperf/progress.py   | 16 ++--
 tests/migration/guestperf/scenario.py   | 11 ++-
 tests/migration/guestperf/shell.py  | 18 +-
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/comparison.py 
b/tests/migration/guestperf/comparison.py
index c03b3f6d7e..42cc0372d1 100644
--- a/tests/migration/guestperf/comparison.py
+++ b/tests/migration/guestperf/comparison.py
@@ -135,4 +135,27 @@ COMPARISONS = [
 Scenario("compr-multifd-channels-64",
  multifd=True, multifd_channels=64),
 ]),
+
+# Looking at effect of dirty-limit with
+# varying x_vcpu_dirty_limit_period
+Comparison("compr-dirty-limit-period", scenarios = [
+Scenario("compr-dirty-limit-period-500",
+ dirty_limit=True, x_vcpu_dirty_limit_period=500),
+Scenario("compr-dirty-limit-period-800",
+ dirty_limit=True, x_vcpu_dirty_limit_period=800),
+Scenario("compr-dirty-limit-period-1000",
+ dirty_limit=True, x_vcpu_dirty_limit_period=1000),
+]),
+
+
+# Looking at effect of dirty-limit with
+# varying vcpu_dirty_limit
+Comparison("compr-dirty-limit", scenarios = [
+Scenario("compr-dirty-limit-10MB",
+ dirty_limit=True, vcpu_dirty_limit=10),
+Scenario("compr-dirty-limit-20MB",
+ dirty_limit=True, vcpu_dirty_limit=20),
+Scenario("compr-dirty-limit-50MB",
+ dirty_limit=True, vcpu_dirty_limit=50),
+]),
 ]
diff --git a/tests/migration/guestperf/engine.py 
b/tests/migration/guestperf/engine.py
index 29ebb5011b..93a6f78e46 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -102,6 +102,8 @@ class Engine(object):
 info.get("expected-downtime", 0),
 info.get("setup-time", 0),
 info.get("cpu-throttle-percentage", 0),
+info.get("dirty-limit-throttle-time-per-round", 0),
+info.get("dirty-limit-ring-full-time", 0),
 )
 
 def _migrate(self, hardware, scenario, src, dst, connect_uri):
@@ -203,6 +205,21 @@ class Engine(object):
 resp = dst.command("migrate-set-parameters",
multifd_channels=scenario._multifd_channels)
 
+if scenario._dirty_limit:
+if not hardware._dirty_ring_size:
+raise Exception("dirty ring size must be configured when "
+"testing dirty limit migration")
+
+resp = src.command("migrate-set-capabilities",
+   capabilities = [
+   { "capability": "dirty-limit",
+ "state": True }
+   ])
+resp = src.command("migrate-set-parameters",
+x_vcpu_dirty_limit_period=scenario._x_vcpu_dirty_limit_period)
+resp = src.command("migrate-set-parameters",
+   vcpu_dirty_limit=scenario._vcpu_dirty_limit)
+
 resp = src.command("migrate", uri=connect_uri)
 
 post_copy = False
diff --git a/tests/migration/guestperf/progress.py 
b/tests/migration/guestperf/progress.py
index ab1ee57273..d490584217 100644
--- a/tests/migration/guestperf/progress.py
+++ b/tests/migration/guestperf/progress.py
@@ -81,7 +81,9 @@ class Progress(object):
  downtime,
  downtime_expected,
  setup_time,
- throttle_pcent):
+ throttle_pcent,
+ dirty_limit_throttle_time_per_round,
+ dirty_limit_ring_full_time):
 
 self._status = status
 self._ram = ram
@@ -91,6 +93,10 @@ class Progress(object):
 self._downtime_expected = downtime_expected
 self._setup_time = setup_time
 self._throttle_pcent = throttle_pcent
+self._dirty_limit_throttle_time_per_round = \
+dirty_limit_throttle_time_per_round
+

[PATCH QEMU v3 0/3] migration: enrich the dirty-limit test case

2023-08-07 Thread ~hyman
Ping

This version is a copy of version 2 and is rebased
on the master. No functional changes.

The dirty-limit migration test involves many passes
and takes about 1 minute on average, so put it in
the slow mode of migration-test. Inspired by Peter.

V2:
- put the dirty-limit migration test in slow mode and
  enrich the test case comment

Dirty-limit feature was introduced in 8.1, and the test
case could be enriched to make sure the behavior and
the performance of dirty-limit is exactly what we want.

This series adds two test cases: the first commit provides
a functional test and the others provide performance tests.

Please review, thanks.

Yong.

Hyman Huang(黄勇) (3):
  tests: Add migration dirty-limit capability test
  tests/migration: Introduce dirty-ring-size option into guestperf
  tests/migration: Introduce dirty-limit into guestperf

 tests/migration/guestperf/comparison.py |  23 
 tests/migration/guestperf/engine.py |  23 +++-
 tests/migration/guestperf/hardware.py   |   8 +-
 tests/migration/guestperf/progress.py   |  16 ++-
 tests/migration/guestperf/scenario.py   |  11 +-
 tests/migration/guestperf/shell.py  |  24 +++-
 tests/qtest/migration-test.c| 164 
 7 files changed, 261 insertions(+), 8 deletions(-)

-- 
2.38.5



[PATCH QEMU v3 2/3] tests/migration: Introduce dirty-ring-size option into guestperf

2023-08-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Dirty ring size configuration is not supported by the guestperf tool.

Introduce a dirty-ring-size option (range [1024, 65536]) so
developers can experiment with the dirty-ring and dirty-limit
features more easily.

To set dirty ring size with 4096 during migration test:
$ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: <169073391195.19893.61067537833811032...@git.sr.ht>
---
 tests/migration/guestperf/engine.py   | 6 +-
 tests/migration/guestperf/hardware.py | 8 ++--
 tests/migration/guestperf/shell.py| 6 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/engine.py 
b/tests/migration/guestperf/engine.py
index e69d16a62c..29ebb5011b 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -325,7 +325,6 @@ class Engine(object):
 cmdline = "'" + cmdline + "'"
 
 argv = [
-"-accel", "kvm",
 "-cpu", "host",
 "-kernel", self._kernel,
 "-initrd", self._initrd,
@@ -333,6 +332,11 @@ class Engine(object):
 "-m", str((hardware._mem * 1024) + 512),
 "-smp", str(hardware._cpus),
 ]
+if hardware._dirty_ring_size:
+argv.extend(["-accel", "kvm,dirty-ring-size=%s" %
+ hardware._dirty_ring_size])
+else:
+argv.extend(["-accel", "kvm"])
 
 argv.extend(self._get_qemu_serial_args())
 
diff --git a/tests/migration/guestperf/hardware.py 
b/tests/migration/guestperf/hardware.py
index 3145785ffd..f779cc050b 100644
--- a/tests/migration/guestperf/hardware.py
+++ b/tests/migration/guestperf/hardware.py
@@ -23,7 +23,8 @@ class Hardware(object):
  src_cpu_bind=None, src_mem_bind=None,
  dst_cpu_bind=None, dst_mem_bind=None,
  prealloc_pages = False,
- huge_pages=False, locked_pages=False):
+ huge_pages=False, locked_pages=False,
+ dirty_ring_size=0):
 self._cpus = cpus
 self._mem = mem # GiB
 self._src_mem_bind = src_mem_bind # List of NUMA nodes
@@ -33,6 +34,7 @@ class Hardware(object):
 self._prealloc_pages = prealloc_pages
 self._huge_pages = huge_pages
 self._locked_pages = locked_pages
+self._dirty_ring_size = dirty_ring_size
 
 
 def serialize(self):
@@ -46,6 +48,7 @@ class Hardware(object):
 "prealloc_pages": self._prealloc_pages,
 "huge_pages": self._huge_pages,
 "locked_pages": self._locked_pages,
+"dirty_ring_size": self._dirty_ring_size,
 }
 
 @classmethod
@@ -59,4 +62,5 @@ class Hardware(object):
 data["dst_mem_bind"],
 data["prealloc_pages"],
 data["huge_pages"],
-data["locked_pages"])
+data["locked_pages"],
+data["dirty_ring_size"])
diff --git a/tests/migration/guestperf/shell.py 
b/tests/migration/guestperf/shell.py
index 8a809e3dda..7d6b8cd7cf 100644
--- a/tests/migration/guestperf/shell.py
+++ b/tests/migration/guestperf/shell.py
@@ -60,6 +60,8 @@ class BaseShell(object):
 parser.add_argument("--prealloc-pages", dest="prealloc_pages", 
default=False)
 parser.add_argument("--huge-pages", dest="huge_pages", default=False)
 parser.add_argument("--locked-pages", dest="locked_pages", 
default=False)
+parser.add_argument("--dirty-ring-size", dest="dirty_ring_size",
+default=0, type=int)
 
 self._parser = parser
 
@@ -89,7 +91,9 @@ class BaseShell(object):
 
 locked_pages=args.locked_pages,
 huge_pages=args.huge_pages,
-prealloc_pages=args.prealloc_pages)
+prealloc_pages=args.prealloc_pages,
+
+dirty_ring_size=args.dirty_ring_size)
 
 
 class Shell(BaseShell):
-- 
2.38.5




[PATCH QEMU v3 1/3] tests: Add migration dirty-limit capability test

2023-08-07 Thread ~hyman
From: Hyman Huang(黄勇) 

Add a migration dirty-limit capability test, run only when the
kernel supports the dirty ring.

The dirty-limit capability comes with two parameters,
x-vcpu-dirty-limit-period and vcpu-dirty-limit, which together
implement live migration with a dirty-page rate limit.

The test case does the following things:
1. start src, dst vm and enable dirty-limit capability
2. start migration, then cancel it to check that dirty limit
   stops working.
3. restart dst vm
4. start migration and enable dirty-limit capability
5. check that migration satisfies the convergence condition
   during the pre-switchover phase.

Note that this test case involves many passes, so it runs
in slow mode only.
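
At the QMP level, steps 1 and 4 map to commands like the following
(a sketch assembled from the test code below; parameter spellings and
values match what the test sets):

{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
    { "capability": "dirty-limit", "state": true } ] } }
{ "execute": "migrate-set-parameters",
  "arguments": { "x-vcpu-dirty-limit-period": 1000 } }
{ "execute": "migrate-set-parameters",
  "arguments": { "vcpu-dirty-limit": 50 } }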

Signed-off-by: Hyman Huang(黄勇) 
Message-Id: <169073391195.19893.61067537833811032...@git.sr.ht>
---
 tests/qtest/migration-test.c | 164 +++
 1 file changed, 164 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 62d3f37021..0be2d17c42 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2739,6 +2739,166 @@ static void test_vcpu_dirty_limit(void)
 dirtylimit_stop_vm(vm);
 }
 
+static void migrate_dirty_limit_wait_showup(QTestState *from,
+const int64_t period,
+const int64_t value)
+{
+/* Enable dirty limit capability */
+migrate_set_capability(from, "dirty-limit", true);
+
+/* Set dirty limit parameters */
+migrate_set_parameter_int(from, "x-vcpu-dirty-limit-period", period);
+migrate_set_parameter_int(from, "vcpu-dirty-limit", value);
+
+/* Make sure migrate can't converge */
+migrate_ensure_non_converge(from);
+
+/* To check limit rate after precopy */
+migrate_set_capability(from, "pause-before-switchover", true);
+
+/* Wait for the serial output from the source */
+wait_for_serial("src_serial");
+}
+
+/*
+ * This test does:
+ *  source  destination
+ *  start vm
+ *  start incoming vm
+ *  migrate
+ *  wait dirty limit to begin
+ *  cancel migrate
+ *  cancellation check
+ *  restart incoming vm
+ *  migrate
+ *  wait dirty limit to begin
+ *  wait pre-switchover event
+ *  convergence condition check
+ *
+ * And see if dirty limit migration works correctly.
+ * This test case involves many passes, so it runs in slow mode only.
+ */
+static void test_migrate_dirty_limit(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+QTestState *from, *to;
+int64_t remaining;
+uint64_t throttle_us_per_full;
+/*
+ * We want the test to be stable and as fast as possible.
+ * E.g., with 1Gb/s bandwidth, migration may pass without dirty limit,
+ * so we need to decrease the bandwidth.
+ */
+const int64_t dirtylimit_period = 1000, dirtylimit_value = 50;
+const int64_t max_bandwidth = 400000000; /* ~400Mb/s */
+const int64_t downtime_limit = 250; /* 250ms */
+/*
+ * We migrate through unix-socket (> 500Mb/s).
+ * Thus, expected migration speed ~= bandwidth limit (< 500Mb/s).
+ * So, we can predict expected_threshold
+ */
+const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000;
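+/*
+ * Worked example with the values above:
+ * 400000000 B/s * 250 ms / 1000 = 100000000 bytes, so the migration
+ * is considered convergent once remaining RAM drops below ~100 MB.
+ */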
+int max_try_count = 10;
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+.use_dirty_ring = true,
+},
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+/* Start src, dst vm */
+if (test_migrate_start(&from, &to, args.listen_uri, &args)) {
+return;
+}
+
+/* Prepare for dirty limit migration and wait src vm show up */
+migrate_dirty_limit_wait_showup(from, dirtylimit_period, dirtylimit_value);
+
+/* Start migrate */
+migrate_qmp(from, uri, "{}");
+
+/* Wait for dirty limit throttle begin */
+throttle_us_per_full = 0;
+while (throttle_us_per_full == 0) {
+throttle_us_per_full =
+read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+usleep(100);
+g_assert_false(got_src_stop);
+}
+
+/* Now cancel migrate and wait for dirty limit throttle switch off */
+migrate_cancel(from);
+wait_for_migration_status(from, "cancelled", NULL);
+
+/* Check if dirty limit throttle switched off, set timeout 1ms */
+do {
+throttle_us_per_full =
+read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+usleep(100);
+g_assert_false(got_src_stop);
+} while (throttle_us_per_full != 0 && --max_try_count);
+
+/* Assert dirty limit is not in service */
+g_assert_cmpint(throttle_us_per_full, ==, 0);
+
+args = (MigrateCommon) {
+.start = {
+.only_target = true,
+.use_dirty_ring = true,
+},
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+/* Restart dst vm, 

Re: [PATCH v10 0/7] igb: packet-split descriptors support

2023-08-07 Thread Jason Wang
On Mon, Aug 7, 2023 at 10:52 PM Tomasz Dzieciol/VIM Integration (NC)
/SRPOL/Engineer/Samsung Electronics 
wrote:
>
> Hi,
>
> It's been a while since review was done and nothing happened with those 
> patches since then.
>
> As I understand from guide: 
> https://www.qemu.org/docs/master/devel/submitting-a-patch.html#is-my-patch-in 
> I should wait. Is that correct?

I've queued this for 8.2.

Thanks

>
> -Original Message-
> From: Akihiko Odaki 
> Sent: wtorek, 30 maja 2023 04:49
> To: Tomasz Dzieciol ; qemu-devel@nongnu.org
> Cc: sriram.yagnara...@est.tech; jasow...@redhat.com; k.kwiec...@samsung.com; 
> m.socha...@samsung.com
> Subject: Re: [PATCH v10 0/7] igb: packet-split descriptors support
>
> On 2023/05/29 23:01, Tomasz Dzieciol wrote:
> > Purposes of this series of patches:
> > * introduce packet-split RX descriptors support. This feature is used by 
> > Linux
> >VF driver for MTU values from 2048.
> > * refactor RX descriptor handling for introduction of packet-split RX
> >descriptors support
> > * fix descriptors flags handling
> >
> > Tomasz Dzieciol (7):
> >igb: remove TCP ACK detection
> >igb: rename E1000E_RingInfo_st
> >igb: RX descriptors guest writting refactoring
> >igb: RX payload guest writting refactoring
> >igb: add IPv6 extended headers traffic detection
> >igb: packet-split descriptors support
> >e1000e: rename e1000e_ba_state and e1000e_write_hdr_to_rx_buffers
> >
> >   hw/net/e1000e_core.c |  78 +++--
> >   hw/net/igb_core.c| 730 ---
> >   hw/net/igb_regs.h|  20 +-
> >   hw/net/trace-events  |   6 +-
> >   tests/qtest/libqos/igb.c |   5 +
> >   5 files changed, 592 insertions(+), 247 deletions(-)
> >
>
> Thanks for keeping working on this. For the entire series:
>
> Reviewed-by: Akihiko Odaki 
> Tested-by: Akihiko Odaki 
>




Re: [PATCH v4 11/11] target/loongarch: Add loongarch32 cpu la132

2023-08-07 Thread Jiajie Chen



On 2023/8/8 09:54, Jiajie Chen wrote:

Add la132 as a loongarch32 cpu type and allow virt machine to be used
with la132 instead of la464.

Refactor common init logic out as loongarch_cpu_initfn_common.

Signed-off-by: Jiajie Chen 
---
  hw/loongarch/virt.c|  5 
  target/loongarch/cpu.c | 54 --
  2 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index e19b042ce8..af15bf5aaa 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -798,11 +798,6 @@ static void loongarch_init(MachineState *machine)
  cpu_model = LOONGARCH_CPU_TYPE_NAME("la464");
  }
  
-if (!strstr(cpu_model, "la464")) {

-error_report("LoongArch/TCG needs cpu type la464");
-exit(1);
-}
-
  if (ram_size < 1 * GiB) {
  error_report("ram_size must be greater than 1G.");
  exit(1);
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 13d4fccbd3..341176817e 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -356,30 +356,18 @@ static bool loongarch_cpu_has_work(CPUState *cs)
  #endif
  }
  
-static void loongarch_la464_initfn(Object *obj)

+static void loongarch_cpu_initfn_common(CPULoongArchState *env)
  {
-LoongArchCPU *cpu = LOONGARCH_CPU(obj);
-CPULoongArchState *env = &cpu->env;
  int i;
  
  for (i = 0; i < 21; i++) {

  env->cpucfg[i] = 0x0;
  }
  
-cpu->dtb_compatible = "loongarch,Loongson-3A5000";

-env->cpucfg[0] = 0x14c010;  /* PRID */
-
  uint32_t data = 0;
-data = FIELD_DP32(data, CPUCFG1, ARCH, 2);
  data = FIELD_DP32(data, CPUCFG1, PGMMU, 1);
  data = FIELD_DP32(data, CPUCFG1, IOCSR, 1);
-data = FIELD_DP32(data, CPUCFG1, PALEN, 0x2f);
-data = FIELD_DP32(data, CPUCFG1, VALEN, 0x2f);
  data = FIELD_DP32(data, CPUCFG1, UAL, 1);
-data = FIELD_DP32(data, CPUCFG1, RI, 1);
-data = FIELD_DP32(data, CPUCFG1, EP, 1);
-data = FIELD_DP32(data, CPUCFG1, RPLV, 1);
-data = FIELD_DP32(data, CPUCFG1, HP, 1);

Sorry, this line should not be removed.

  data = FIELD_DP32(data, CPUCFG1, IOCSR_BRD, 1);
  env->cpucfg[1] = data;
  
@@ -439,6 +427,45 @@ static void loongarch_la464_initfn(Object *obj)

  env->CSR_ASID = FIELD_DP64(0, CSR_ASID, ASIDBITS, 0xa);
  }
  
+static void loongarch_la464_initfn(Object *obj)

+{
+LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+CPULoongArchState *env = &cpu->env;
+
+loongarch_cpu_initfn_common(env);
+
+cpu->dtb_compatible = "loongarch,Loongson-3A5000";
+env->cpucfg[0] = 0x14c010;  /* PRID */
+
+uint32_t data = env->cpucfg[1];
+data = FIELD_DP32(data, CPUCFG1, ARCH, 2); /* LA64 */
+data = FIELD_DP32(data, CPUCFG1, PALEN, 0x2f); /* 48 bits */
+data = FIELD_DP32(data, CPUCFG1, VALEN, 0x2f); /* 48 bits */
+data = FIELD_DP32(data, CPUCFG1, RI, 1);
+data = FIELD_DP32(data, CPUCFG1, EP, 1);
+data = FIELD_DP32(data, CPUCFG1, RPLV, 1);
+env->cpucfg[1] = data;
+}
+
+static void loongarch_la132_initfn(Object *obj)
+{
+LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+CPULoongArchState *env = &cpu->env;
+
+loongarch_cpu_initfn_common(env);
+
+cpu->dtb_compatible = "loongarch,Loongson-1C103";
+
+uint32_t data = env->cpucfg[1];
+data = FIELD_DP32(data, CPUCFG1, ARCH, 1); /* LA32 */
+data = FIELD_DP32(data, CPUCFG1, PALEN, 0x1f); /* 32 bits */
+data = FIELD_DP32(data, CPUCFG1, VALEN, 0x1f); /* 32 bits */
+data = FIELD_DP32(data, CPUCFG1, RI, 0);
+data = FIELD_DP32(data, CPUCFG1, EP, 0);
+data = FIELD_DP32(data, CPUCFG1, RPLV, 0);
+env->cpucfg[1] = data;
+}
+
  static void loongarch_cpu_list_entry(gpointer data, gpointer user_data)
  {
  const char *typename = object_class_get_name(OBJECT_CLASS(data));
@@ -784,5 +811,6 @@ static const TypeInfo loongarch32_cpu_type_infos[] = {
  .class_size = sizeof(LoongArchCPUClass),
  .class_init = loongarch32_cpu_class_init,
  },
+DEFINE_LOONGARCH32_CPU_TYPE("la132", loongarch_la132_initfn),
  };
  DEFINE_TYPES(loongarch32_cpu_type_infos)




[PATCH v4 06/11] target/loongarch: Support LoongArch32 VPPN

2023-08-07 Thread Jiajie Chen
VPPN of TLBEHI/TLBREHI is limited to 19 bits in LA32: with a 32-bit
virtual address and a 13-bit minimum page shift, 32 - 13 = 19 VPPN bits
remain, versus 48 - 13 = 35 bits on LA64.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu-csr.h|  6 --
 target/loongarch/tlb_helper.c | 23 ++-
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index b93f99a9ef..c59d7a9fcb 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -57,7 +57,8 @@ FIELD(CSR_TLBIDX, PS, 24, 6)
 FIELD(CSR_TLBIDX, NE, 31, 1)
 
 #define LOONGARCH_CSR_TLBEHI 0x11 /* TLB EntryHi */
-FIELD(CSR_TLBEHI, VPPN, 13, 35)
+FIELD(CSR_TLBEHI_32, VPPN, 13, 19)
+FIELD(CSR_TLBEHI_64, VPPN, 13, 35)
 
 #define LOONGARCH_CSR_TLBELO0    0x12 /* TLB EntryLo0 */
 #define LOONGARCH_CSR_TLBELO1    0x13 /* TLB EntryLo1 */
@@ -164,7 +165,8 @@ FIELD(CSR_TLBRERA, PC, 2, 62)
 #define LOONGARCH_CSR_TLBRELO1   0x8d /* TLB refill entrylo1 */
 #define LOONGARCH_CSR_TLBREHI    0x8e /* TLB refill entryhi */
 FIELD(CSR_TLBREHI, PS, 0, 6)
-FIELD(CSR_TLBREHI, VPPN, 13, 35)
+FIELD(CSR_TLBREHI_32, VPPN, 13, 19)
+FIELD(CSR_TLBREHI_64, VPPN, 13, 35)
 #define LOONGARCH_CSR_TLBRPRMD   0x8f /* TLB refill mode info */
 FIELD(CSR_TLBRPRMD, PPLV, 0, 2)
 FIELD(CSR_TLBRPRMD, PIE, 2, 1)
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index 7e26d1c67b..ed4495a301 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -300,8 +300,13 @@ static void raise_mmu_exception(CPULoongArchState *env, 
target_ulong address,
 
 if (tlb_error == TLBRET_NOMATCH) {
 env->CSR_TLBRBADV = address;
-env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI, VPPN,
-  extract64(address, 13, 35));
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI_64,
+VPPN, extract64(address, 13, 35));
+} else {
+env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI_32,
+VPPN, extract64(address, 13, 19));
+}
 } else {
 if (!FIELD_EX64(env->CSR_DBG, CSR_DBG, DST)) {
 env->CSR_BADV = address;
@@ -366,12 +371,20 @@ static void fill_tlb_entry(CPULoongArchState *env, int 
index)
 
 if (FIELD_EX64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR)) {
 csr_ps = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI, PS);
-csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI, VPPN);
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI_64, VPPN);
+} else {
+csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI_32, VPPN);
+}
 lo0 = env->CSR_TLBRELO0;
 lo1 = env->CSR_TLBRELO1;
 } else {
 csr_ps = FIELD_EX64(env->CSR_TLBIDX, CSR_TLBIDX, PS);
-csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI, VPPN);
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI_64, VPPN);
+} else {
+csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI_32, VPPN);
+}
 lo0 = env->CSR_TLBELO0;
 lo1 = env->CSR_TLBELO1;
 }
@@ -491,7 +504,7 @@ void helper_tlbfill(CPULoongArchState *env)
 
 if (pagesize == stlb_ps) {
 /* Only write into STLB bits [47:13] */
-address = entryhi & ~MAKE_64BIT_MASK(0, R_CSR_TLBEHI_VPPN_SHIFT);
+address = entryhi & ~MAKE_64BIT_MASK(0, R_CSR_TLBEHI_64_VPPN_SHIFT);
 
 /* Choose one set ramdomly */
 set = get_random_tlb(0, 7);
-- 
2.41.0




[PATCH v4 08/11] target/loongarch: Reject la64-only instructions in la32 mode

2023-08-07 Thread Jiajie Chen
LoongArch64-only instructions are marked according to Table 2 of the
instruction manual. LSX instructions are not marked for now, for lack
of a public manual.
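
The actual TRANS_64 wrapper lives in the translate.h hunk (not visible
in this excerpt); a sketch of what such a wrapper could look like,
assuming the ctx->la32 flag added in the earlier DisasContext patch:

    #define TRANS_64(NAME, FUNC, ...) \
        static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
        { \
            if (ctx->la32) { \
                return false; /* reject doubleword insns on LA32 */ \
            } \
            return FUNC(ctx, a, ##__VA_ARGS__); \
        }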

Signed-off-by: Jiajie Chen 
---
 target/loongarch/insn_trans/trans_arith.c.inc | 32 
 .../loongarch/insn_trans/trans_atomic.c.inc   | 76 +--
 target/loongarch/insn_trans/trans_bit.c.inc   | 28 +++
 .../loongarch/insn_trans/trans_branch.c.inc   |  4 +-
 target/loongarch/insn_trans/trans_extra.c.inc | 16 ++--
 target/loongarch/insn_trans/trans_fmov.c.inc  |  4 +-
 .../loongarch/insn_trans/trans_memory.c.inc   | 68 -
 target/loongarch/insn_trans/trans_shift.c.inc | 14 ++--
 target/loongarch/translate.h  | 10 +++
 9 files changed, 131 insertions(+), 121 deletions(-)

diff --git a/target/loongarch/insn_trans/trans_arith.c.inc 
b/target/loongarch/insn_trans/trans_arith.c.inc
index 43d6cf261d..e6d218e84a 100644
--- a/target/loongarch/insn_trans/trans_arith.c.inc
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -249,9 +249,9 @@ static bool trans_addu16i_d(DisasContext *ctx, 
arg_addu16i_d *a)
 }
 
 TRANS(add_w, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_add_tl)
-TRANS(add_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl)
+TRANS_64(add_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl)
 TRANS(sub_w, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_sub_tl)
-TRANS(sub_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl)
+TRANS_64(sub_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl)
 TRANS(and, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_and_tl)
 TRANS(or, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_or_tl)
 TRANS(xor, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_xor_tl)
@@ -261,32 +261,32 @@ TRANS(orn, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, 
tcg_gen_orc_tl)
 TRANS(slt, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_slt)
 TRANS(sltu, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_sltu)
 TRANS(mul_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, tcg_gen_mul_tl)
-TRANS(mul_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl)
+TRANS_64(mul_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl)
 TRANS(mulh_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, gen_mulh_w)
 TRANS(mulh_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, gen_mulh_w)
-TRANS(mulh_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_d)
-TRANS(mulh_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_du)
-TRANS(mulw_d_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, tcg_gen_mul_tl)
-TRANS(mulw_d_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, tcg_gen_mul_tl)
+TRANS_64(mulh_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_d)
+TRANS_64(mulh_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_du)
+TRANS_64(mulw_d_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, tcg_gen_mul_tl)
+TRANS_64(mulw_d_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, tcg_gen_mul_tl)
 TRANS(div_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, gen_div_w)
 TRANS(mod_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, gen_rem_w)
 TRANS(div_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_SIGN, gen_div_du)
 TRANS(mod_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_SIGN, gen_rem_du)
-TRANS(div_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_d)
-TRANS(mod_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_d)
-TRANS(div_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_du)
-TRANS(mod_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_du)
+TRANS_64(div_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_d)
+TRANS_64(mod_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_d)
+TRANS_64(div_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_du)
+TRANS_64(mod_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_du)
 TRANS(slti, gen_rri_v, EXT_NONE, EXT_NONE, gen_slt)
 TRANS(sltui, gen_rri_v, EXT_NONE, EXT_NONE, gen_sltu)
 TRANS(addi_w, gen_rri_c, EXT_NONE, EXT_SIGN, tcg_gen_addi_tl)
-TRANS(addi_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_addi_tl)
-TRANS(alsl_w, gen_rrr_sa, EXT_NONE, EXT_SIGN, gen_alsl)
-TRANS(alsl_wu, gen_rrr_sa, EXT_NONE, EXT_ZERO, gen_alsl)
-TRANS(alsl_d, gen_rrr_sa, EXT_NONE, EXT_NONE, gen_alsl)
+TRANS_64(addi_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_addi_tl)
+TRANS_64(alsl_w, gen_rrr_sa, EXT_NONE, EXT_SIGN, gen_alsl)
+TRANS_64(alsl_wu, gen_rrr_sa, EXT_NONE, EXT_ZERO, gen_alsl)
+TRANS_64(alsl_d, gen_rrr_sa, EXT_NONE, EXT_NONE, gen_alsl)
 TRANS(pcaddi, gen_pc, gen_pcaddi)
 TRANS(pcalau12i, gen_pc, gen_pcalau12i)
 TRANS(pcaddu12i, gen_pc, gen_pcaddu12i)
-TRANS(pcaddu18i, gen_pc, gen_pcaddu18i)
+TRANS_64(pcaddu18i, gen_pc, gen_pcaddu18i)
 TRANS(andi, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_andi_tl)
 TRANS(ori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_ori_tl)
 TRANS(xori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_xori_tl)
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc 
b/target/loongarch/insn_trans/trans_atomic.c.inc
index 612709f2a7..c69f31bc78 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -70,41 +70,41 @@ static bool 

[PATCH v4 05/11] target/loongarch: Support LoongArch32 DMW

2023-08-07 Thread Jiajie Chen
LA32 uses a different encoding for CSR.DMW and a new direct mapping
mechanism.
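
For illustration (values picked arbitrarily): with a window programmed
as VSEG = 0b101 and PSEG = 0b000, a virtual address 0xA0001234 has
va[31:29] == 0b101, hits the window, and maps to physical address
0x00001234: the low 29 bits are kept and PSEG replaces va[31:29],
which is exactly what dmw_va2pa() below computes:

    uint64_t va = 0xA0001234;                    /* va[31:29] == 0b101: hit */
    uint64_t pa = (va & MAKE_64BIT_MASK(0, 29))  /* keep low 29 bits        */
                | ((uint64_t)0 << 29);           /* PSEG (0b000) replaces them */
    /* pa == 0x00001234 */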

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu-csr.h|  7 +++
 target/loongarch/tlb_helper.c | 26 +++---
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index 48ed2e0632..b93f99a9ef 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -188,10 +188,9 @@ FIELD(CSR_DMW, PLV1, 1, 1)
 FIELD(CSR_DMW, PLV2, 2, 1)
 FIELD(CSR_DMW, PLV3, 3, 1)
 FIELD(CSR_DMW, MAT, 4, 2)
-FIELD(CSR_DMW, VSEG, 60, 4)
-
-#define dmw_va2pa(va) \
-(va & MAKE_64BIT_MASK(0, TARGET_VIRT_ADDR_SPACE_BITS))
+FIELD(CSR_DMW_32, PSEG, 25, 3)
+FIELD(CSR_DMW_32, VSEG, 29, 3)
+FIELD(CSR_DMW_64, VSEG, 60, 4)
 
 /* Debug CSRs */
 #define LOONGARCH_CSR_DBG0x500 /* debug config */
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index f74940ea9f..7e26d1c67b 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -173,6 +173,18 @@ static int loongarch_map_address(CPULoongArchState *env, 
hwaddr *physical,
 return TLBRET_NOMATCH;
 }
 
+static hwaddr dmw_va2pa(CPULoongArchState *env, target_ulong va,
+target_ulong dmw)
+{
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+return va & TARGET_VIRT_MASK;
+} else {
+uint32_t pseg = FIELD_EX32(dmw, CSR_DMW_32, PSEG);
+return (va & MAKE_64BIT_MASK(0, R_CSR_DMW_32_VSEG_SHIFT)) | \
+(pseg << R_CSR_DMW_32_VSEG_SHIFT);
+}
+}
+
 static int get_physical_address(CPULoongArchState *env, hwaddr *physical,
 int *prot, target_ulong address,
 MMUAccessType access_type, int mmu_idx)
@@ -192,12 +204,20 @@ static int get_physical_address(CPULoongArchState *env, 
hwaddr *physical,
 }
 
 plv = kernel_mode | (user_mode << R_CSR_DMW_PLV3_SHIFT);
-base_v = address >> R_CSR_DMW_VSEG_SHIFT;
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+base_v = address >> R_CSR_DMW_64_VSEG_SHIFT;
+} else {
+base_v = address >> R_CSR_DMW_32_VSEG_SHIFT;
+}
 /* Check direct map window */
 for (int i = 0; i < 4; i++) {
-base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW, VSEG);
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW_64, VSEG);
+} else {
+base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW_32, VSEG);
+}
 if ((plv & env->CSR_DMW[i]) && (base_c == base_v)) {
-*physical = dmw_va2pa(address);
+*physical = dmw_va2pa(env, address, env->CSR_DMW[i]);
 *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 return TLBRET_MATCH;
 }
-- 
2.41.0




[PATCH v4 10/11] target/loongarch: Sign extend results in VA32 mode

2023-08-07 Thread Jiajie Chen
In VA32 mode, BL, JIRL and PC* instructions should sign-extend the low
32-bit result to 64 bits.
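
For instance, the new sign_extend32() helper maps 0x80001000 to
0xFFFFFFFF80001000 while leaving small addresses untouched:

    /* sign_extend32(0x80001000):
     *   (0x80001000 & 0x7FFFFFFF) - (0x80001000 & 0x80000000)
     *     = 0x00001000 - 0x80000000
     *     = 0xFFFFFFFF80001000  (as a 64-bit unsigned value)
     */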

Signed-off-by: Jiajie Chen 
---
 target/loongarch/insn_trans/trans_arith.c.inc  |  2 +-
 target/loongarch/insn_trans/trans_branch.c.inc |  5 +++--
 target/loongarch/translate.c   | 13 +
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/target/loongarch/insn_trans/trans_arith.c.inc 
b/target/loongarch/insn_trans/trans_arith.c.inc
index e6d218e84a..39915f228d 100644
--- a/target/loongarch/insn_trans/trans_arith.c.inc
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -72,7 +72,7 @@ static bool gen_pc(DisasContext *ctx, arg_r_i *a,
target_ulong (*func)(target_ulong, int))
 {
 TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
-target_ulong addr = func(ctx->base.pc_next, a->imm);
+target_ulong addr = va32_result(ctx, func(ctx->base.pc_next, a->imm));
 
 tcg_gen_movi_tl(dest, addr);
 gen_set_gpr(a->rd, dest, EXT_NONE);
diff --git a/target/loongarch/insn_trans/trans_branch.c.inc 
b/target/loongarch/insn_trans/trans_branch.c.inc
index 29b81a9843..41f0bfd489 100644
--- a/target/loongarch/insn_trans/trans_branch.c.inc
+++ b/target/loongarch/insn_trans/trans_branch.c.inc
@@ -12,7 +12,7 @@ static bool trans_b(DisasContext *ctx, arg_b *a)
 
 static bool trans_bl(DisasContext *ctx, arg_bl *a)
 {
-tcg_gen_movi_tl(cpu_gpr[1], ctx->base.pc_next + 4);
+tcg_gen_movi_tl(cpu_gpr[1], va32_result(ctx, ctx->base.pc_next + 4));
 gen_goto_tb(ctx, 0, ctx->base.pc_next + a->offs);
 ctx->base.is_jmp = DISAS_NORETURN;
 return true;
@@ -24,7 +24,8 @@ static bool trans_jirl(DisasContext *ctx, arg_jirl *a)
 TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
 
 tcg_gen_addi_tl(cpu_pc, src1, a->imm);
-tcg_gen_movi_tl(dest, ctx->base.pc_next + 4);
+tcg_gen_movi_tl(dest, va32_result(ctx, ctx->base.pc_next + 4));
+
 gen_set_gpr(a->rd, dest, EXT_NONE);
 tcg_gen_lookup_and_goto_ptr();
 ctx->base.is_jmp = DISAS_NORETURN;
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 9cd2f13778..9703fc46a6 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -218,6 +218,19 @@ static TCGv va32_address(DisasContext *ctx, TCGv addr)
 return addr;
 }
 
+static uint64_t sign_extend32(uint64_t data)
+{
+return (data & 0x7FFFFFFF) - (data & 0x80000000);
+}
+
+static uint64_t va32_result(DisasContext *ctx, uint64_t addr)
+{
+if (ctx->va32) {
+addr = sign_extend32(addr);
+}
+return addr;
+}
+
 #include "decode-insns.c.inc"
 #include "insn_trans/trans_arith.c.inc"
 #include "insn_trans/trans_shift.c.inc"
-- 
2.41.0




[PATCH v4 11/11] target/loongarch: Add loongarch32 cpu la132

2023-08-07 Thread Jiajie Chen
Add la132 as a loongarch32 cpu type and allow virt machine to be used
with la132 instead of la464.

Refactor common init logic out as loongarch_cpu_initfn_common.

Signed-off-by: Jiajie Chen 
---
 hw/loongarch/virt.c|  5 
 target/loongarch/cpu.c | 54 --
 2 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index e19b042ce8..af15bf5aaa 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -798,11 +798,6 @@ static void loongarch_init(MachineState *machine)
 cpu_model = LOONGARCH_CPU_TYPE_NAME("la464");
 }
 
-if (!strstr(cpu_model, "la464")) {
-error_report("LoongArch/TCG needs cpu type la464");
-exit(1);
-}
-
 if (ram_size < 1 * GiB) {
 error_report("ram_size must be greater than 1G.");
 exit(1);
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 13d4fccbd3..341176817e 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -356,30 +356,18 @@ static bool loongarch_cpu_has_work(CPUState *cs)
 #endif
 }
 
-static void loongarch_la464_initfn(Object *obj)
+static void loongarch_cpu_initfn_common(CPULoongArchState *env)
 {
-LoongArchCPU *cpu = LOONGARCH_CPU(obj);
-CPULoongArchState *env = &cpu->env;
 int i;
 
 for (i = 0; i < 21; i++) {
 env->cpucfg[i] = 0x0;
 }
 
-cpu->dtb_compatible = "loongarch,Loongson-3A5000";
-env->cpucfg[0] = 0x14c010;  /* PRID */
-
 uint32_t data = 0;
-data = FIELD_DP32(data, CPUCFG1, ARCH, 2);
 data = FIELD_DP32(data, CPUCFG1, PGMMU, 1);
 data = FIELD_DP32(data, CPUCFG1, IOCSR, 1);
-data = FIELD_DP32(data, CPUCFG1, PALEN, 0x2f);
-data = FIELD_DP32(data, CPUCFG1, VALEN, 0x2f);
 data = FIELD_DP32(data, CPUCFG1, UAL, 1);
-data = FIELD_DP32(data, CPUCFG1, RI, 1);
-data = FIELD_DP32(data, CPUCFG1, EP, 1);
-data = FIELD_DP32(data, CPUCFG1, RPLV, 1);
-data = FIELD_DP32(data, CPUCFG1, HP, 1);
 data = FIELD_DP32(data, CPUCFG1, IOCSR_BRD, 1);
 env->cpucfg[1] = data;
 
@@ -439,6 +427,45 @@ static void loongarch_la464_initfn(Object *obj)
 env->CSR_ASID = FIELD_DP64(0, CSR_ASID, ASIDBITS, 0xa);
 }
 
+static void loongarch_la464_initfn(Object *obj)
+{
+LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+CPULoongArchState *env = &cpu->env;
+
+loongarch_cpu_initfn_common(env);
+
+cpu->dtb_compatible = "loongarch,Loongson-3A5000";
+env->cpucfg[0] = 0x14c010;  /* PRID */
+
+uint32_t data = env->cpucfg[1];
+data = FIELD_DP32(data, CPUCFG1, ARCH, 2); /* LA64 */
+data = FIELD_DP32(data, CPUCFG1, PALEN, 0x2f); /* 48 bits */
+data = FIELD_DP32(data, CPUCFG1, VALEN, 0x2f); /* 48 bits */
+data = FIELD_DP32(data, CPUCFG1, RI, 1);
+data = FIELD_DP32(data, CPUCFG1, EP, 1);
+data = FIELD_DP32(data, CPUCFG1, RPLV, 1);
+env->cpucfg[1] = data;
+}
+
+static void loongarch_la132_initfn(Object *obj)
+{
+LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+CPULoongArchState *env = &cpu->env;
+
+loongarch_cpu_initfn_common(env);
+
+cpu->dtb_compatible = "loongarch,Loongson-1C103";
+
+uint32_t data = env->cpucfg[1];
+data = FIELD_DP32(data, CPUCFG1, ARCH, 1); /* LA32 */
+data = FIELD_DP32(data, CPUCFG1, PALEN, 0x1f); /* 32 bits */
+data = FIELD_DP32(data, CPUCFG1, VALEN, 0x1f); /* 32 bits */
+data = FIELD_DP32(data, CPUCFG1, RI, 0);
+data = FIELD_DP32(data, CPUCFG1, EP, 0);
+data = FIELD_DP32(data, CPUCFG1, RPLV, 0);
+env->cpucfg[1] = data;
+}
+
 static void loongarch_cpu_list_entry(gpointer data, gpointer user_data)
 {
 const char *typename = object_class_get_name(OBJECT_CLASS(data));
@@ -784,5 +811,6 @@ static const TypeInfo loongarch32_cpu_type_infos[] = {
 .class_size = sizeof(LoongArchCPUClass),
 .class_init = loongarch32_cpu_class_init,
 },
+DEFINE_LOONGARCH32_CPU_TYPE("la132", loongarch_la132_initfn),
 };
 DEFINE_TYPES(loongarch32_cpu_type_infos)
-- 
2.41.0




[PATCH v4 04/11] target/loongarch: Support LoongArch32 TLB entry

2023-08-07 Thread Jiajie Chen
The LA32 TLB entry lacks the NR, NX and RPLV bits; they are hardwired
to zero in LoongArch32.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu-csr.h|  9 +
 target/loongarch/tlb_helper.c | 17 -
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index f8f24032cb..48ed2e0632 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -66,10 +66,11 @@ FIELD(TLBENTRY, D, 1, 1)
 FIELD(TLBENTRY, PLV, 2, 2)
 FIELD(TLBENTRY, MAT, 4, 2)
 FIELD(TLBENTRY, G, 6, 1)
-FIELD(TLBENTRY, PPN, 12, 36)
-FIELD(TLBENTRY, NR, 61, 1)
-FIELD(TLBENTRY, NX, 62, 1)
-FIELD(TLBENTRY, RPLV, 63, 1)
+FIELD(TLBENTRY_32, PPN, 8, 24)
+FIELD(TLBENTRY_64, PPN, 12, 36)
+FIELD(TLBENTRY_64, NR, 61, 1)
+FIELD(TLBENTRY_64, NX, 62, 1)
+FIELD(TLBENTRY_64, RPLV, 63, 1)
 
 #define LOONGARCH_CSR_ASID   0x18 /* Address space identifier */
 FIELD(CSR_ASID, ASID, 0, 10)
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index 6e00190547..f74940ea9f 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -48,10 +48,17 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, 
hwaddr *physical,
 tlb_v = FIELD_EX64(tlb_entry, TLBENTRY, V);
 tlb_d = FIELD_EX64(tlb_entry, TLBENTRY, D);
 tlb_plv = FIELD_EX64(tlb_entry, TLBENTRY, PLV);
-tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY, PPN);
-tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY, NX);
-tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY, NR);
-tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY, RPLV);
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_64, PPN);
+tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY_64, NX);
+tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY_64, NR);
+tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY_64, RPLV);
+} else {
+tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_32, PPN);
+tlb_nx = 0;
+tlb_nr = 0;
+tlb_rplv = 0;
+}
 
 /* Check access rights */
 if (!tlb_v) {
@@ -79,7 +86,7 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, 
hwaddr *physical,
  * tlb_entry contains ppn[47:12] while 16KiB ppn is [47:15]
  * need adjust.
  */
-*physical = (tlb_ppn << R_TLBENTRY_PPN_SHIFT) |
+*physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) |
 (address & MAKE_64BIT_MASK(0, tlb_ps));
 *prot = PAGE_READ;
 if (tlb_d) {
-- 
2.41.0




[PATCH v4 07/11] target/loongarch: Add LA32 & VA32 to DisasContext

2023-08-07 Thread Jiajie Chen
Add LA32 and VA32 (32-bit virtual address) flags to DisasContext so
that the translator can, for example, reject doubleword instructions
in LA32 mode.
Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.h   | 9 +
 target/loongarch/translate.c | 3 +++
 target/loongarch/translate.h | 2 ++
 3 files changed, 14 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 396869c3b6..69589f0aef 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -445,15 +445,24 @@ static inline int cpu_mmu_index(CPULoongArchState *env, 
bool ifetch)
 #define HW_FLAGS_CRMD_PG    R_CSR_CRMD_PG_MASK   /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
 #define HW_FLAGS_EUEN_SXE   0x08
+#define HW_FLAGS_VA32   0x20
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
 uint64_t *cs_base, uint32_t *flags)
 {
+/* VA32 if LA32 or VA32L[1-3] */
+uint32_t va32 = LOONGARCH_CPUCFG_ARCH(env, LA32);
+uint64_t plv = FIELD_EX64(env->CSR_CRMD, CSR_CRMD, PLV);
+if (plv >= 1 && (FIELD_EX64(env->CSR_MISC, CSR_MISC, VA32) & (1 << plv))) {
+va32 = 1;
+}
+
 *pc = env->pc;
 *cs_base = 0;
 *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
+*flags |= va32 * HW_FLAGS_VA32;
 }
 
 void loongarch_cpu_list(void);
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 3146a2d4ac..f1e5fe4cf8 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -119,6 +119,9 @@ static void 
loongarch_tr_init_disas_context(DisasContextBase *dcbase,
 ctx->vl = LSX_LEN;
 }
 
+ctx->la32 = LOONGARCH_CPUCFG_ARCH(env, LA32);
+ctx->va32 = (ctx->base.tb->flags & HW_FLAGS_VA32) != 0;
+
 ctx->zero = tcg_constant_tl(0);
 }
 
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 7f60090580..828f1185d2 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -33,6 +33,8 @@ typedef struct DisasContext {
 uint16_t plv;
 int vl;   /* Vector length */
 TCGv zero;
+bool la32; /* LoongArch32 mode */
+bool va32; /* 32-bit virtual address */
 } DisasContext;
 
 void generate_exception(DisasContext *ctx, int excp);
-- 
2.41.0




[PATCH v4 01/11] target/loongarch: Add macro to check current arch

2023-08-07 Thread Jiajie Chen
Add a macro to check whether the current cpucfg[1].arch equals
1 (LA32) or 2 (LA64).

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index fa371ca8ba..bf0da8d5b4 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -132,6 +132,13 @@ FIELD(CPUCFG1, HP, 24, 1)
 FIELD(CPUCFG1, IOCSR_BRD, 25, 1)
 FIELD(CPUCFG1, MSG_INT, 26, 1)
 
+/* cpucfg[1].arch */
+#define CPUCFG1_ARCH_LA32    1
+#define CPUCFG1_ARCH_LA64    2
+
+#define LOONGARCH_CPUCFG_ARCH(env, mode) \
+  (FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_##mode)
+
 /* cpucfg[2] bits */
 FIELD(CPUCFG2, FP, 0, 1)
 FIELD(CPUCFG2, FP_SP, 1, 1)
-- 
2.41.0




[PATCH v4 02/11] target/loongarch: Add new object class for loongarch32 cpus

2023-08-07 Thread Jiajie Chen
Add object class for future loongarch32 cpus. It is derived from the
loongarch64 object class.

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.c | 24 
 target/loongarch/cpu.h | 11 +++
 2 files changed, 35 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index ad93ecac92..3bd293d00a 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -732,6 +732,10 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 #endif
 }
 
+static void loongarch32_cpu_class_init(ObjectClass *c, void *data)
+{
+}
+
 #define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \
 { \
 .parent = TYPE_LOONGARCH_CPU, \
@@ -754,3 +758,23 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 };
 
 DEFINE_TYPES(loongarch_cpu_type_infos)
+
+#define DEFINE_LOONGARCH32_CPU_TYPE(model, initfn) \
+{ \
+.parent = TYPE_LOONGARCH32_CPU, \
+.instance_init = initfn, \
+.name = LOONGARCH_CPU_TYPE_NAME(model), \
+}
+
+static const TypeInfo loongarch32_cpu_type_infos[] = {
+{
+.name = TYPE_LOONGARCH32_CPU,
+.parent = TYPE_LOONGARCH_CPU,
+.instance_size = sizeof(LoongArchCPU),
+
+.abstract = true,
+.class_size = sizeof(LoongArchCPUClass),
+.class_init = loongarch32_cpu_class_init,
+},
+};
+DEFINE_TYPES(loongarch32_cpu_type_infos)
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index bf0da8d5b4..396869c3b6 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -404,6 +404,17 @@ struct LoongArchCPUClass {
 ResettablePhases parent_phases;
 };
 
+#define TYPE_LOONGARCH32_CPU "loongarch32-cpu"
+typedef struct LoongArch32CPUClass LoongArch32CPUClass;
+DECLARE_CLASS_CHECKERS(LoongArch32CPUClass, LOONGARCH32_CPU,
+   TYPE_LOONGARCH32_CPU)
+
+struct LoongArch32CPUClass {
+/*< private >*/
+LoongArchCPUClass parent_class;
+/*< public >*/
+};
+
 /*
  * LoongArch CPUs has 4 privilege levels.
  * 0 for kernel mode, 3 for user mode.
-- 
2.41.0




[PATCH v4 03/11] target/loongarch: Add GDB support for loongarch32 mode

2023-08-07 Thread Jiajie Chen
GPRs and PC are 32-bit wide in loongarch32 mode.

Signed-off-by: Jiajie Chen 
---
 configs/targets/loongarch64-softmmu.mak |  2 +-
 gdb-xml/loongarch-base32.xml| 45 +
 target/loongarch/cpu.c  | 10 +-
 target/loongarch/gdbstub.c  | 32 ++
 4 files changed, 80 insertions(+), 9 deletions(-)
 create mode 100644 gdb-xml/loongarch-base32.xml

diff --git a/configs/targets/loongarch64-softmmu.mak 
b/configs/targets/loongarch64-softmmu.mak
index 9abc99056f..f23780fdd8 100644
--- a/configs/targets/loongarch64-softmmu.mak
+++ b/configs/targets/loongarch64-softmmu.mak
@@ -1,5 +1,5 @@
 TARGET_ARCH=loongarch64
 TARGET_BASE_ARCH=loongarch
 TARGET_SUPPORTS_MTTCG=y
-TARGET_XML_FILES= gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml
+TARGET_XML_FILES= gdb-xml/loongarch-base32.xml gdb-xml/loongarch-base64.xml 
gdb-xml/loongarch-fpu.xml
 TARGET_NEED_FDT=y
diff --git a/gdb-xml/loongarch-base32.xml b/gdb-xml/loongarch-base32.xml
new file mode 100644
index 0000000000..af47bbd3da
--- /dev/null
+++ b/gdb-xml/loongarch-base32.xml
@@ -0,0 +1,45 @@
+<?xml version="1.0"?>
+<!-- Copyright (C) 2023 Free Software Foundation, Inc.
+
+     Copying and distribution of this file, with or without modification,
+     are permitted in any medium without royalty provided the copyright
+     notice and this notice are preserved.  -->
+
+<!DOCTYPE feature SYSTEM "gdb-target.dtd">
+<feature name="org.gnu.gdb.loongarch.base">
+  <reg name="r0" bitsize="32" type="uint32" group="general"/>
+  <reg name="r1" bitsize="32" type="uint32" group="general"/>
+  <reg name="r2" bitsize="32" type="uint32" group="general"/>
+  <reg name="r3" bitsize="32" type="uint32" group="general"/>
+  <reg name="r4" bitsize="32" type="uint32" group="general"/>
+  <reg name="r5" bitsize="32" type="uint32" group="general"/>
+  <reg name="r6" bitsize="32" type="uint32" group="general"/>
+  <reg name="r7" bitsize="32" type="uint32" group="general"/>
+  <reg name="r8" bitsize="32" type="uint32" group="general"/>
+  <reg name="r9" bitsize="32" type="uint32" group="general"/>
+  <reg name="r10" bitsize="32" type="uint32" group="general"/>
+  <reg name="r11" bitsize="32" type="uint32" group="general"/>
+  <reg name="r12" bitsize="32" type="uint32" group="general"/>
+  <reg name="r13" bitsize="32" type="uint32" group="general"/>
+  <reg name="r14" bitsize="32" type="uint32" group="general"/>
+  <reg name="r15" bitsize="32" type="uint32" group="general"/>
+  <reg name="r16" bitsize="32" type="uint32" group="general"/>
+  <reg name="r17" bitsize="32" type="uint32" group="general"/>
+  <reg name="r18" bitsize="32" type="uint32" group="general"/>
+  <reg name="r19" bitsize="32" type="uint32" group="general"/>
+  <reg name="r20" bitsize="32" type="uint32" group="general"/>
+  <reg name="r21" bitsize="32" type="uint32" group="general"/>
+  <reg name="r22" bitsize="32" type="uint32" group="general"/>
+  <reg name="r23" bitsize="32" type="uint32" group="general"/>
+  <reg name="r24" bitsize="32" type="uint32" group="general"/>
+  <reg name="r25" bitsize="32" type="uint32" group="general"/>
+  <reg name="r26" bitsize="32" type="uint32" group="general"/>
+  <reg name="r27" bitsize="32" type="uint32" group="general"/>
+  <reg name="r28" bitsize="32" type="uint32" group="general"/>
+  <reg name="r29" bitsize="32" type="uint32" group="general"/>
+  <reg name="r30" bitsize="32" type="uint32" group="general"/>
+  <reg name="r31" bitsize="32" type="uint32" group="general"/>
+  <reg name="orig_a0" bitsize="32" type="uint32" group="general"/>
+  <reg name="pc" bitsize="32" type="code_ptr" group="general"/>
+  <reg name="badv" bitsize="32" type="code_ptr" group="general"/>
+</feature>
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 3bd293d00a..13d4fccbd3 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -694,7 +694,13 @@ static const struct SysemuCPUOps loongarch_sysemu_ops = {
 
 static gchar *loongarch_gdb_arch_name(CPUState *cs)
 {
-return g_strdup("loongarch64");
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+CPULoongArchState *env = &cpu->env;
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+return g_strdup("loongarch64");
+} else {
+return g_strdup("loongarch32");
+}
 }
 
 static void loongarch_cpu_class_init(ObjectClass *c, void *data)
@@ -734,6 +740,8 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 
 static void loongarch32_cpu_class_init(ObjectClass *c, void *data)
 {
+CPUClass *cc = CPU_CLASS(c);
+cc->gdb_core_xml_file = "loongarch-base32.xml";
 }
 
 #define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..0dfd1c8bb9 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -34,16 +34,25 @@ int loongarch_cpu_gdb_read_register(CPUState *cs, 
GByteArray *mem_buf, int n)
 {
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
 CPULoongArchState *env = &cpu->env;
+uint64_t val;
 
 if (0 <= n && n < 32) {
-return gdb_get_regl(mem_buf, env->gpr[n]);
+val = env->gpr[n];
 } else if (n == 32) {
 /* orig_a0 */
-return gdb_get_regl(mem_buf, 0);
+val = 0;
 } else if (n == 33) {
-return gdb_get_regl(mem_buf, env->pc);
+val = env->pc;
 } else if (n == 34) {
-return gdb_get_regl(mem_buf, env->CSR_BADV);
+val = env->CSR_BADV;
+}
+
+if (0 <= n && n <= 34) {
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+return gdb_get_reg64(mem_buf, val);
+} else {
+return gdb_get_reg32(mem_buf, val);
+}
 }
 return 0;
 }
@@ -52,15 +61,24 @@ int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 {
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
 CPULoongArchState *env = &cpu->env;
-target_ulong tmp = ldtul_p(mem_buf);
+target_ulong tmp;
+int read_length;
 int length = 0;
 
+if (LOONGARCH_CPUCFG_ARCH(env, LA64)) {
+tmp = ldq_p(mem_buf);
+read_length = 8;
+} else {
+tmp = ldl_p(mem_buf);
+read_length = 4;
+}
+
 if (0 <= n && n < 32) {
 env->gpr[n] = tmp;
-length = sizeof(target_ulong);
+length = read_length;
 } else if (n == 33) {
 env->pc = tmp;
-length = sizeof(target_ulong);
+length = read_length;
 }
 return length;
 }
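
A quick way to exercise the 32-bit register view (assuming the la132 model added later in this series, plus QEMU's standard -s/-S gdbstub options; the kernel image name is illustrative):

    qemu-system-loongarch64 -M virt -cpu la132 -s -S -kernel test.elf
    (gdb) target remote :1234    # gdb should now report the loongarch32 arch
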
-- 
2.41.0




[PATCH v4 09/11] target/loongarch: Truncate high 32 bits of address in VA32 mode

2023-08-07 Thread Jiajie Chen
When running in VA32 mode (LA32, or VA32L[1-3] matching the current PLV),
the virtual address is truncated to 32 bits before address mapping.
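
The hunks below rely on a va32_address() helper defined in translate.c, outside the quoted context; a plausible sketch of its behavior, assuming ctx->va32 is the DisasContext flag added earlier in this series:

    static TCGv va32_address(DisasContext *ctx, TCGv addr)
    {
        if (ctx->va32) {
            TCGv temp = tcg_temp_new();
            tcg_gen_ext32u_tl(temp, addr);  /* drop the upper 32 bits */
            addr = temp;
        }
        return addr;
    }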

Signed-off-by: Jiajie Chen 
---
 target/loongarch/cpu.h  |  6 +-
 target/loongarch/insn_trans/trans_atomic.c.inc  |  1 +
 target/loongarch/insn_trans/trans_fmemory.c.inc |  8 
 target/loongarch/insn_trans/trans_lsx.c.inc |  6 ++
 target/loongarch/insn_trans/trans_memory.c.inc  | 10 ++
 target/loongarch/translate.c| 10 ++
 6 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 69589f0aef..9ad5fcc494 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -457,7 +457,11 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState 
*env, vaddr *pc,
 va32 = 1;
 }
 
-*pc = env->pc;
+if (va32) {
+*pc = (uint32_t)env->pc;
+} else {
+*pc = env->pc;
+}
 *cs_base = 0;
 *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc 
b/target/loongarch/insn_trans/trans_atomic.c.inc
index c69f31bc78..d9d950d642 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -10,6 +10,7 @@ static bool gen_ll(DisasContext *ctx, arg_rr_i *a, MemOp mop)
 TCGv t0 = tcg_temp_new();
 
 tcg_gen_addi_tl(t0, src1, a->imm);
+t0 = va32_address(ctx, t0);
 tcg_gen_qemu_ld_i64(dest, t0, ctx->mem_idx, mop);
 tcg_gen_st_tl(t0, cpu_env, offsetof(CPULoongArchState, lladdr));
 tcg_gen_st_tl(dest, cpu_env, offsetof(CPULoongArchState, llval));
diff --git a/target/loongarch/insn_trans/trans_fmemory.c.inc 
b/target/loongarch/insn_trans/trans_fmemory.c.inc
index 91c09fb6d9..391af356d0 100644
--- a/target/loongarch/insn_trans/trans_fmemory.c.inc
+++ b/target/loongarch/insn_trans/trans_fmemory.c.inc
@@ -22,6 +22,7 @@ static bool gen_fload_i(DisasContext *ctx, arg_fr_i *a, MemOp 
mop)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = va32_address(ctx, addr);
 
 tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);
 maybe_nanbox_load(dest, mop);
@@ -42,6 +43,7 @@ static bool gen_fstore_i(DisasContext *ctx, arg_fr_i *a, 
MemOp mop)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = va32_address(ctx, addr);
 
 tcg_gen_qemu_st_tl(src, addr, ctx->mem_idx, mop);
 
@@ -59,6 +61,7 @@ static bool gen_floadx(DisasContext *ctx, arg_frr *a, MemOp 
mop)
 
 addr = tcg_temp_new();
 tcg_gen_add_tl(addr, src1, src2);
+addr = va32_address(ctx, addr);
 tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);
 maybe_nanbox_load(dest, mop);
 set_fpr(a->fd, dest);
@@ -77,6 +80,7 @@ static bool gen_fstorex(DisasContext *ctx, arg_frr *a, MemOp 
mop)
 
 addr = tcg_temp_new();
 tcg_gen_add_tl(addr, src1, src2);
+addr = va32_address(ctx, addr);
 tcg_gen_qemu_st_tl(src3, addr, ctx->mem_idx, mop);
 
 return true;
@@ -94,6 +98,7 @@ static bool gen_fload_gt(DisasContext *ctx, arg_frr *a, MemOp 
mop)
 addr = tcg_temp_new();
 gen_helper_asrtgt_d(cpu_env, src1, src2);
 tcg_gen_add_tl(addr, src1, src2);
+addr = va32_address(ctx, addr);
 tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);
 maybe_nanbox_load(dest, mop);
 set_fpr(a->fd, dest);
@@ -113,6 +118,7 @@ static bool gen_fstore_gt(DisasContext *ctx, arg_frr *a, 
MemOp mop)
 addr = tcg_temp_new();
 gen_helper_asrtgt_d(cpu_env, src1, src2);
 tcg_gen_add_tl(addr, src1, src2);
+addr = va32_address(ctx, addr);
 tcg_gen_qemu_st_tl(src3, addr, ctx->mem_idx, mop);
 
 return true;
@@ -130,6 +136,7 @@ static bool gen_fload_le(DisasContext *ctx, arg_frr *a, 
MemOp mop)
 addr = tcg_temp_new();
 gen_helper_asrtle_d(cpu_env, src1, src2);
 tcg_gen_add_tl(addr, src1, src2);
+addr = va32_address(ctx, addr);
 tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, mop);
 maybe_nanbox_load(dest, mop);
 set_fpr(a->fd, dest);
@@ -149,6 +156,7 @@ static bool gen_fstore_le(DisasContext *ctx, arg_frr *a, 
MemOp mop)
 addr = tcg_temp_new();
 gen_helper_asrtle_d(cpu_env, src1, src2);
 tcg_gen_add_tl(addr, src1, src2);
+addr = va32_address(ctx, addr);
 tcg_gen_qemu_st_tl(src3, addr, ctx->mem_idx, mop);
 
 return true;
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc 
b/target/loongarch/insn_trans/trans_lsx.c.inc
index 68779daff6..b7325cfd8a 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -4271,6 +4271,7 @@ static bool trans_vld(DisasContext *ctx, arg_vr_i *a)
 tcg_gen_addi_tl(temp, addr, a->imm);
 addr = temp;
 }
+addr = va32_address(ctx, addr);
 
 tcg_gen_qemu_ld_i128(val, addr, ctx->mem_idx, MO_128 | 

[PATCH v4 00/11] Add la32 & va32 mode for loongarch64-softmmu

2023-08-07 Thread Jiajie Chen
This patch series allows qemu-system-loongarch64 to emulate a LoongArch32
machine. A new CPU model is added for loongarch32, along with initial
GDB support.

At the same time, VA32 (32-bit virtual address) support is introduced for
LoongArch64.

LA32 support is tested using a small supervisor program at
https://github.com/jiegec/supervisor-la32. VA32 mode under LA64 is not
tested yet.

Changes since v3:

- Support VA32 mode for LoongArch64
- Check the current arch from CPUCFG.ARCH
- Reject la64-only instructions in la32 mode

Changes since v2:

- Fix typo in previous commit
- Fix VPPN width in TLBEHI/TLBREHI

Changes since v1:

- No longer create a separate qemu-system-loongarch32 executable, but
  allow users to run loongarch32 emulation using qemu-system-loongarch64
- Add loongarch32 cpu support for virt machine

Full changes:

Jiajie Chen (11):
  target/loongarch: Add macro to check current arch
  target/loongarch: Add new object class for loongarch32 cpus
  target/loongarch: Add GDB support for loongarch32 mode
  target/loongarch: Support LoongArch32 TLB entry
  target/loongarch: Support LoongArch32 DMW
  target/loongarch: Support LoongArch32 VPPN
  target/loongarch: Add LA32 & VA32 to DisasContext
  target/loongarch: Reject la64-only instructions in la32 mode
  target/loongarch: Truncate high 32 bits of address in VA32 mode
  target/loongarch: Sign extend results in VA32 mode
  target/loongarch: Add loongarch32 cpu la132

 configs/targets/loongarch64-softmmu.mak   |  2 +-
 gdb-xml/loongarch-base32.xml  | 45 ++
 hw/loongarch/virt.c   |  5 --
 target/loongarch/cpu-csr.h| 22 ++---
 target/loongarch/cpu.c| 88 ---
 target/loongarch/cpu.h| 33 ++-
 target/loongarch/gdbstub.c| 32 +--
 target/loongarch/insn_trans/trans_arith.c.inc | 34 +++
 .../loongarch/insn_trans/trans_atomic.c.inc   | 77 
 target/loongarch/insn_trans/trans_bit.c.inc   | 28 +++---
 .../loongarch/insn_trans/trans_branch.c.inc   |  9 +-
 target/loongarch/insn_trans/trans_extra.c.inc | 16 ++--
 .../loongarch/insn_trans/trans_fmemory.c.inc  |  8 ++
 target/loongarch/insn_trans/trans_fmov.c.inc  |  4 +-
 target/loongarch/insn_trans/trans_lsx.c.inc   |  6 ++
 .../loongarch/insn_trans/trans_memory.c.inc   | 78 +---
 target/loongarch/insn_trans/trans_shift.c.inc | 14 +--
 target/loongarch/tlb_helper.c | 66 +++---
 target/loongarch/translate.c  | 26 ++
 target/loongarch/translate.h  | 12 +++
 20 files changed, 430 insertions(+), 175 deletions(-)
 create mode 100644 gdb-xml/loongarch-base32.xml

-- 
2.41.0




[PATCH 1/3] hw/nvme: fix CRC64 for guard tag

2023-08-07 Thread Ankit Kumar
The nvme CRC64 generator expects the caller to pass an inverted seed
value. Pass the inverted CRC value when continuing the computation over
the metadata buffer.
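
A minimal sketch of the chaining convention at play here, assuming crc64_nvme() applies a final XOR with ~0ULL before returning (the Rocksoft CRC-64 model used by NVMe):

    uint64_t crc = crc64_nvme(~0ULL, buf, len);  /* fresh start: register = all ones */
    if (pil) {
        /* undo the final inversion so the second call continues the first */
        crc = crc64_nvme(~crc, mbuf, pil);
    }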

Signed-off-by: Ankit Kumar 
---
 hw/nvme/dif.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
index 63c44c86ab..01b19c3373 100644
--- a/hw/nvme/dif.c
+++ b/hw/nvme/dif.c
@@ -115,7 +115,7 @@ static void nvme_dif_pract_generate_dif_crc64(NvmeNamespace 
*ns, uint8_t *buf,
 uint64_t crc = crc64_nvme(~0ULL, buf, ns->lbasz);
 
 if (pil) {
-crc = crc64_nvme(crc, mbuf, pil);
+crc = crc64_nvme(~crc, mbuf, pil);
 }
 
 dif->g64.guard = cpu_to_be64(crc);
@@ -246,7 +246,7 @@ static uint16_t nvme_dif_prchk_crc64(NvmeNamespace *ns, 
NvmeDifTuple *dif,
 uint64_t crc = crc64_nvme(~0ULL, buf, ns->lbasz);
 
 if (pil) {
-crc = crc64_nvme(crc, mbuf, pil);
+crc = crc64_nvme(~crc, mbuf, pil);
 }
 
 trace_pci_nvme_dif_prchk_guard_crc64(be64_to_cpu(dif->g64.guard), crc);
-- 
2.25.1




[PATCH 2/3] hw/nvme: fix disable pi checks for Type 3 protection

2023-08-07 Thread Ankit Kumar
As per the NVM command set specification, the protection information
checks for Type 3 protection are disabled only when both the application
and reference tags have all bits set to 1.
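
For reference, the field widths behind the all-ones comparisons in the hunks below, written out as a sketch (the macro names are illustrative, not QEMU's):

    #define REFTAG16_ALL_ONES 0xffffffffULL     /* 16b guard PI: 32-bit reference tag */
    #define REFTAG64_ALL_ONES 0xffffffffffffULL /* 64b guard PI: 48-bit reference tag */
    #define APPTAG_ALL_ONES   0xffff            /* both formats: 16-bit application tag */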

Signed-off-by: Ankit Kumar 
---
 hw/nvme/dif.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/dif.c b/hw/nvme/dif.c
index 01b19c3373..f9bd29a2a6 100644
--- a/hw/nvme/dif.c
+++ b/hw/nvme/dif.c
@@ -157,7 +157,8 @@ static uint16_t nvme_dif_prchk_crc16(NvmeNamespace *ns, 
NvmeDifTuple *dif,
 {
 switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
 case NVME_ID_NS_DPS_TYPE_3:
-if (be32_to_cpu(dif->g16.reftag) != 0xffffffff) {
+if ((be32_to_cpu(dif->g16.reftag) != 0xffffffff) ||
+(be16_to_cpu(dif->g16.apptag) != 0xffff)) {
 break;
 }
 
@@ -225,7 +226,7 @@ static uint16_t nvme_dif_prchk_crc64(NvmeNamespace *ns, 
NvmeDifTuple *dif,
 
 switch (NVME_ID_NS_DPS_TYPE(ns->id_ns.dps)) {
 case NVME_ID_NS_DPS_TYPE_3:
-if (r != 0xffffffffffff) {
+if (r != 0xffffffffffff || (be16_to_cpu(dif->g64.apptag) != 0xffff)) {
 break;
 }
 
-- 
2.25.1




[PATCH 3/3] docs: update hw/nvme documentation for protection information

2023-08-07 Thread Ankit Kumar
Add the missing entry for pif ("protection information format").
The protection information size can be 8 or 16 bytes; update the pil
entry accordingly, as per the NVM command set specification.
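
A hypothetical invocation combining the documented parameters (the drive name, serial, and metadata size are made up for illustration):

    -device nvme,serial=deadbeef
    -device nvme-ns,drive=nvm,ms=16,pi=1,pif=2,pil=1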

Signed-off-by: Ankit Kumar 
---
 docs/system/devices/nvme.rst | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index 2a3af268f7..30d46d9338 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -271,9 +271,13 @@ The virtual namespace device supports DIF- and DIX-based 
protection information
 
 ``pil=UINT8`` (default: ``0``)
   Controls the location of the protection information within the metadata. Set
-  to ``1`` to transfer protection information as the first eight bytes of
-  metadata. Otherwise, the protection information is transferred as the last
-  eight bytes.
+  to ``1`` to transfer protection information as the first bytes of metadata.
+  Otherwise, the protection information is transferred as the last bytes of
+  metadata.
+
+``pif=UINT8`` (default: ``0``)
+  By default, the namespace device uses 16 bit guard protection information
+  format. Set to ``2`` to enable 64 bit guard protection information format.
 
 Virtualization Enhancements and SR-IOV (Experimental Support)
 -------------------------------------------------------------
-- 
2.25.1




[PATCH 0/3] hw/nvme: bug fixes and doc update

2023-08-07 Thread Ankit Kumar
This series fixes two bugs:
1. CRC64 generation when the metadata buffer is used.
2. The protection information disable check for Type 3 protection.

This series also updates the documentation for pi (protection information)
and adds the missing pif (protection information format) entry.

Ankit Kumar (3):
  hw/nvme: fix CRC64 for guard tag
  hw/nvme: fix disable pi checks for Type 3 protection
  docs: update hw/nvme documentation for protection information

 docs/system/devices/nvme.rst | 10 +++---
 hw/nvme/dif.c|  9 +
 2 files changed, 12 insertions(+), 7 deletions(-)

-- 
2.25.1




Re: [PULL 0/6] Fixes patches

2023-08-07 Thread Richard Henderson

On 8/7/23 13:47, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau

The following changes since commit 9400601a689a128c25fa9c21e932562e0eeb7a26:

   Merge tag 'pull-tcg-20230806-3' of https://gitlab.com/rth7680/qemu into 
staging (2023-08-06 16:47:48 -0700)

are available in the Git repository at:

   https://gitlab.com/marcandre.lureau/qemu.git  tags/fixes-pull-request

for you to fetch changes up to 58ea90f8032912b41e753a95089ba764fcc6446a:

   ui/gtk: set scanout mode in gd_egl/gd_gl_area_scanout_texture (2023-08-07 
17:13:42 +0400)


Fixes for 8.1

Hi,

Here is a collection of ui, dump and chardev fixes that are worth having for 8.1.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PATCH trivial for-8.1 0/3] trivial-patches for 2023-08-07

2023-08-07 Thread Richard Henderson

On 8/7/23 03:56, Michael Tokarev wrote:

The following changes since commit 9400601a689a128c25fa9c21e932562e0eeb7a26:

   Merge tag 'pull-tcg-20230806-3' of https://gitlab.com/rth7680/qemu into 
staging (2023-08-06 16:47:48 -0700)

are available in the Git repository at:

   https://gitlab.com/mjt0k/qemu.git tags/trivial-patches-pull

for you to fetch changes up to 6ee960823da8fd780ae9912c4327b7e85e80d846:

   Fixed incorrect LLONG alignment for openrisc and cris (2023-08-07 13:52:59 
+0300)


trivial-patches for 2023-08-07

there are 3 trivial bugfixes in there, for 8.1


Be more careful to use PULL in the subject.
This nearly got lost.

Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~




Re: [PATCH] disas/riscv: Further correction to LUI disassembly

2023-08-07 Thread Richard Henderson

On 8/7/23 15:01, Richard Bagley wrote:
I do apologize, but I do not understand your remark at all. Could I trouble you to spell 
this out.


In:
+                snprintf(tmp, sizeof(tmp), "%d", dec->imm >> 12 & 0xfffff);
0xfffff is a mask which recovers the 20-bit field used to represent the immediate in the 
instruction encoding.


You seem to be responding to the syntax, which is unrelated to my change.
But I did notice it is the case that both GCC and LLVM disassemblers do not accept signed 
integer arguments to LUI:

lui r1, -1
but instead require
lui r1, 0xfffff


Your language is confusing.  Disassembler or assembler?  A disassembler would not "accept" 
but "output" arguments for LUI.


If the assembler rejects "lui r1, -1", that would be a bug, because the field 
*is* signed.

For the disassembler, the field *is* signed, therefore outputting a signed value is 
correct.  Outputting an unsigned hex value hides the fact that bit 31 is the sign.
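
To make the two readings concrete, a small standalone illustration (not from the patch) of decoding the field from the sign-extended immediate:

    /* dec->imm holds the sign-extended value imm20 << 12,
     * e.g. 0xfffff000 for "lui r1, -1" */
    int32_t imm = (int32_t)0xfffff000;
    int32_t  as_signed   = imm >> 12;             /* -1: arithmetic shift keeps the sign */
    uint32_t as_unsigned = (imm >> 12) & 0xfffff; /* 0xfffff: masked, the sign is hidden */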


To my mind this is exactly the same as emitting a signed value for the 
immediate in ADDI.


r~



[PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-07 Thread Ilya Maximets
Lots of virtio functions that are on a hot path in data transmission
initialize the indirect descriptor cache at the point of stack
allocation.  It's a 112-byte structure that gets zeroed out on each
call, adding unnecessary overhead.  It will be correctly initialized
later via a special init function.  The only reason to initialize it
right away is to be able to destroy it safely afterwards.  However, we
only need to destroy it when it was actually used, i.e. when desc_cache
points to it.

Removing these unnecessary stack initializations improves throughput of
virtio-net devices in terms of 64B packets per second by 6-14%,
depending on the case.  Tested with a proposed af-xdp network backend
and a dpdk testpmd application in the guest, but this should be
beneficial for other virtio devices as well.
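
Reduced to a sketch, the resulting pattern (names as in the diff below):

    MemoryRegionCache indirect_desc_cache;   /* was: = MEMORY_REGION_CACHE_INVALID */
    MemoryRegionCache *desc_cache = NULL;

    /* ... desc_cache is pointed at &caches->desc, or at &indirect_desc_cache
     * after address_space_cache_init() for indirect descriptors ... */

    if (desc_cache == &indirect_desc_cache) {
        /* destroy only when the indirect cache was actually initialized */
        address_space_cache_destroy(&indirect_desc_cache);
    }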

Signed-off-by: Ilya Maximets 
---
 hw/virtio/virtio.c | 42 +++---
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 309038fd46..a65396e616 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1071,7 +1071,8 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
 VirtIODevice *vdev = vq->vdev;
 unsigned int idx;
 unsigned int total_bufs, in_total, out_total;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+MemoryRegionCache indirect_desc_cache;
+MemoryRegionCache *desc_cache = NULL;
 int64_t len = 0;
 int rc;
 
@@ -1079,7 +1080,6 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
 total_bufs = in_total = out_total = 0;
 
 while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
-MemoryRegionCache *desc_cache = &caches->desc;
 unsigned int num_bufs;
 VRingDesc desc;
 unsigned int i;
@@ -1091,6 +1091,8 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
 goto err;
 }
 
+desc_cache = &caches->desc;
+
 vring_split_desc_read(vdev, &desc, desc_cache, i);
 
 if (desc.flags & VRING_DESC_F_INDIRECT) {
@@ -1156,7 +1158,9 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
 }
 
 done:
-address_space_cache_destroy(&indirect_desc_cache);
+if (desc_cache == &indirect_desc_cache) {
+address_space_cache_destroy(&indirect_desc_cache);
+}
 if (in_bytes) {
 *in_bytes = in_total;
 }
@@ -1207,8 +1211,8 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue 
*vq,
 VirtIODevice *vdev = vq->vdev;
 unsigned int idx;
 unsigned int total_bufs, in_total, out_total;
-MemoryRegionCache *desc_cache;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+MemoryRegionCache indirect_desc_cache;
+MemoryRegionCache *desc_cache = NULL;
 int64_t len = 0;
 VRingPackedDesc desc;
 bool wrap_counter;
@@ -1297,7 +1301,9 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue 
*vq,
 vq->shadow_avail_idx = idx;
 vq->shadow_avail_wrap_counter = wrap_counter;
 done:
-address_space_cache_destroy(&indirect_desc_cache);
+if (desc_cache == &indirect_desc_cache) {
+address_space_cache_destroy(&indirect_desc_cache);
+}
 if (in_bytes) {
 *in_bytes = in_total;
 }
@@ -1487,8 +1493,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
 unsigned int i, head, max;
 VRingMemoryRegionCaches *caches;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
-MemoryRegionCache *desc_cache;
+MemoryRegionCache indirect_desc_cache;
+MemoryRegionCache *desc_cache = NULL;
 int64_t len;
 VirtIODevice *vdev = vq->vdev;
 VirtQueueElement *elem = NULL;
@@ -1611,7 +1617,9 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
-address_space_cache_destroy(&indirect_desc_cache);
+if (desc_cache == &indirect_desc_cache) {
+address_space_cache_destroy(&indirect_desc_cache);
+}
 
 return elem;
 
@@ -1624,8 +1632,8 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
 {
 unsigned int i, max;
 VRingMemoryRegionCaches *caches;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
-MemoryRegionCache *desc_cache;
+MemoryRegionCache indirect_desc_cache;
+MemoryRegionCache *desc_cache = NULL;
 int64_t len;
 VirtIODevice *vdev = vq->vdev;
 VirtQueueElement *elem = NULL;
@@ -1746,7 +1754,9 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
-address_space_cache_destroy(&indirect_desc_cache);
+if (desc_cache == &indirect_desc_cache) {
+address_space_cache_destroy(&indirect_desc_cache);
+}
 
 return elem;
 
@@ -3935,8 +3945,8 @@ VirtioQueueElement 
*qmp_x_query_virtio_queue_element(const char *path,
 } else {
 unsigned int head, i, max;
 VRingMemoryRegionCaches *caches;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
-

Re: [PATCH] disas/riscv: Further correction to LUI disassembly

2023-08-07 Thread Richard Bagley
I do apologize, but I do not understand your remark at all. Could I trouble
you to spell this out.

In:
+snprintf(tmp, sizeof(tmp), "%d", dec->imm >> 12 & 0xfffff);
0xfffff is a mask which recovers the 20-bit field used to represent the
immediate in the instruction encoding.

You seem to be responding to the syntax, which is unrelated to my change.
But I did notice it is the case that both GCC and LLVM disassemblers do not
accept signed integer arguments to LUI:
lui r1, -1
but instead require
lui r1, 0xfffff
I don't see why the former is more accurate, but it would be an aid to the
assembly programmer.

I have recommended internally that if the current format cannot support
both, then it might be worthwhile to propose a pseudo instruction for RISC-V
for precisely this syntax variant:
lui.s r1, -1

Richard

On Mon, Jul 31, 2023 at 1:37 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 7/31/23 11:33, Richard Bagley wrote:
> > The recent commit 36df75a0a9 corrected one aspect of LUI disassembly
> > by recovering the immediate argument from the result of LUI with a
> > shift right by 12. However, the shift right will left-fill with the
> > sign. By applying a mask we recover an unsigned representation of the
> > 20-bit field (which includes a sign bit).
>
> Why would you want that?  Surely
>
>  lui r1, -1
>
> is more accurate than
>
>  lui r1, 0xfffff
>
>
> r~
>

