date:20240630

[PATCH] target/riscv: Validate the mode in write_vstvec

2024-06-30 Thread Jiayi Li

Base on the riscv-privileged spec, vstvec substitutes for the usual stvec.
Therefore, the encoding of the MODE should also be restricted to 0 and 1.

Signed-off-by: Jiayi Li 
---
 target/riscv/csr.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 432c59dc66..f9229d92ab 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3791,7 +3791,12 @@ static RISCVException read_vstvec(CPURISCVState *env, 
int csrno,
 static RISCVException write_vstvec(CPURISCVState *env, int csrno,
target_ulong val)
 {
-env->vstvec = val;
+/* bits [1:0] encode mode; 0 = direct, 1 = vectored, 2 >= reserved */
+if ((val & 3) < 2) {
+env->vstvec = val;
+} else {
+qemu_log_mask(LOG_UNIMP, "CSR_VSTVEC: reserved mode not supported\n");
+}
 return RISCV_EXCP_NONE;
 }
 
-- 
2.25.1

Re: [PATCH v2 1/1] memory tier: consolidate the initialization of memory tiers

2024-06-30 Thread Huang, Ying

Hi, Jack,

"Ho-Ren (Jack) Chuang"  writes:

I suggest you to merge the [0/1] with the change log here.  [0/1]
describes why do we need the patch.  The below text describes some
details.  Just don't use "---" to separate them.  We need both parts in
the final commit message.

> If we simply move the set_node_memory_tier() from memory_tier_init()
> to late_initcall(), it will result in HMAT not registering
> the mt_adistance_algorithm callback function, because
> set_node_memory_tier() is not performed during the memory tiering
> initialization phase, leading to a lack of correct default_dram
> information.
>
> Therefore, we introduced a nodemask to pass the information of the
> default DRAM nodes. The reason for not choosing to reuse
> default_dram_type->nodes is that it is not clean enough. So in the end,
> we use a __initdata variable, which is a variable that is released once
> initialization is complete, including both CPU and memory nodes for HMAT
> to iterate through.
>
> Besides, since default_dram_type may be checked/used during the
> initialization process of HMAT and drivers, it is better to keep the
> allocation of default_dram_type in memory_tier_init().

Why do we need it?  IIRC, we have deleted its usage in hmat.c.

> Signed-off-by: Ho-Ren (Jack) Chuang 
> Suggested-by: Jonathan Cameron 
> ---
>  drivers/acpi/numa/hmat.c |  5 +--
>  include/linux/memory-tiers.h |  2 ++
>  mm/memory-tiers.c| 59 +++-
>  3 files changed, 28 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 2c8ccc91ebe6..a2f9e7a4b479 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -940,10 +940,7 @@ static int hmat_set_default_dram_perf(void)
>   struct memory_target *target;
>   struct access_coordinate *attrs;
>  
> - if (!default_dram_type)
> - return -EIO;
> -
> - for_each_node_mask(nid, default_dram_type->nodes) {
> + for_each_node_mask(nid, default_dram_nodes) {
>   pxm = node_to_pxm(nid);
>   target = find_mem_target(pxm);
>   if (!target)
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index 0d70788558f4..fa61ad9c4d75 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -38,6 +38,7 @@ struct access_coordinate;
>  #ifdef CONFIG_NUMA
>  extern bool numa_demotion_enabled;
>  extern struct memory_dev_type *default_dram_type;

Can we remove the above line?

> +extern nodemask_t default_dram_nodes __initdata;

We don't need to use __initdata in variable declaration.

>  struct memory_dev_type *alloc_memory_type(int adistance);
>  void put_memory_type(struct memory_dev_type *memtype);
>  void init_node_memory_type(int node, struct memory_dev_type *default_type);
> @@ -76,6 +77,7 @@ static inline bool node_is_toptier(int node)
>  
>  #define numa_demotion_enabledfalse
>  #define default_dram_typeNULL
> +#define default_dram_nodes NODE_MASK_NONE

Should we use  after "default_dram_nodes"?

>  /*
>   * CONFIG_NUMA implementation returns non NULL error.
>   */
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index 6632102bd5c9..a19a90c3ad36 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -43,6 +43,7 @@ static LIST_HEAD(memory_tiers);
>  static LIST_HEAD(default_memory_types);
>  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
>  struct memory_dev_type *default_dram_type;
> +nodemask_t default_dram_nodes __initdata = NODE_MASK_NONE;
>  
>  static const struct bus_type memory_tier_subsys = {
>   .name = "memory_tiering",
> @@ -671,28 +672,38 @@ EXPORT_SYMBOL_GPL(mt_put_memory_types);
>  
>  /*
>   * This is invoked via `late_initcall()` to initialize memory tiers for
> - * CPU-less memory nodes after driver initialization, which is
> - * expected to provide `adistance` algorithms.
> + * memory nodes, both with and without CPUs. After the initialization of
> + * firmware and devices, adistance algorithms are expected to be provided.
>   */
>  static int __init memory_tier_late_init(void)
>  {
>   int nid;
> + struct memory_tier *memtier;
>  
> + get_online_mems();
>   guard(mutex)(_tier_lock);
> + /*
> +  * Look at all the existing and uninitialized N_MEMORY nodes and
> +  * add them to default memory tier or to a tier if we already have
> +  * memory types assigned.
> +  */

If the memory type of the node has been assigned, we will skip it in the
following code.  So, I think that we need to revise the comments.

>   for_each_node_state(nid, N_MEMORY) {
>   /*
> -  * Some device drivers may have initialized memory tiers
> -  * between `memory_tier_init()` and `memory_tier_late_init()`,
> -  * potentially bringing online memory nodes and
> -  * configuring memory tiers. Exclude them here.
> +  * Some device

Re: [PATCH 2/2] target/i386: drop AMD machine check bits from Intel CPUID

2024-06-30 Thread Zhao Liu

On Fri, Jun 28, 2024 at 03:23:11PM +0200, Paolo Bonzini wrote:
> Date: Fri, 28 Jun 2024 15:23:11 +0200
> From: Paolo Bonzini 
> Subject: Re: [PATCH 2/2] target/i386: drop AMD machine check bits from
>  Intel CPUID
> 
> Il ven 28 giu 2024, 10:32 Xiaoyao Li  ha scritto:
> 
> > On 6/27/2024 10:06 PM, Paolo Bonzini wrote:
> > > The recent addition of the SUCCOR bit to kvm_arch_get_supported_cpuid()
> > > causes the bit to be visible when "-cpu host" VMs are started on Intel
> > > processors.
> > >
> > > While this should in principle be harmless, it's not tidy and we don't
> > > even know for sure that it doesn't cause any guest OS to take unexpected
> > > paths.  Since x86_cpu_get_supported_feature_word() can return different
> > > different values depending on the guest, adjust it to hide the SUCCOR
> >
> > superfluous different
> >
> > > bit if the guest has non-AMD vendor.
> >
> > It seems to adjust it based on vendor in kvm_arch_get_supported_cpuid()
> > is better than in x86_cpu_get_supported_feature_word(). Otherwise
> > kvm_arch_get_supported_cpuid() still returns "risky" value for Intel VMs.
> >
> 
> But the cpuid bit is only invalid for Intel *guest* vendor, not host. It is
> not a problem to have it if you run on Intel host but have a guest model
> with AMD vendor.
> 
> I will check if there are other callers of kvm_arch_get_supported_cpuid(),
> or callers of x86_cpu_get_supported_feature_word() with NULL cpu, that
> might care about the difference.

Another example is CPUID_EXT3_TOPOEXT, though it's a no_autoenable_flags,
it can be set by "-cpu host,+topoext" on Intel platforms.

For this case, we have recognized that that the host/max CPU should only
contain vender specific features, and I think it would be hard to expand
such a rule afterwards, especially since there's other x86 vender like
zhaoxin who implement a subset of Intel/AMD:

https://lore.kernel.org/qemu-devel/d4c0dae5-b9d5-4deb-b300-78492ab11...@zhaoxin.com/#t

What about a new flag "host_bare_metal_check" in FeatureWordInfo? Then
if a feature is marked as "host_bare_metal_check", in addition to the
current checks in x86_cpu_get_supported_feature_word(), bare-metal CPUID
check is also needed (by host_cpuid()) for "host" CPU.

-Zhao

[PATCH 6/6] target/riscv: Enable RV32 CPU support in RV64 QEMU

2024-06-30 Thread LIU Zhiwei

From: TANG Tiancheng 

Add gdb XML files and adjust CPU initialization to allow running RV32 CPUs
in RV64 QEMU.

Signed-off-by: TANG Tiancheng 
Reviewed-by: Liu Zhiwei 
---
 configs/targets/riscv64-softmmu.mak |  2 +-
 target/riscv/cpu.c  | 17 +
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/configs/targets/riscv64-softmmu.mak 
b/configs/targets/riscv64-softmmu.mak
index f688ffa7bc..5c1abb4b51 100644
--- a/configs/targets/riscv64-softmmu.mak
+++ b/configs/targets/riscv64-softmmu.mak
@@ -1,6 +1,6 @@
 TARGET_ARCH=riscv64
 TARGET_BASE_ARCH=riscv
 TARGET_SUPPORTS_MTTCG=y
-TARGET_XML_FILES= gdb-xml/riscv-64bit-cpu.xml gdb-xml/riscv-32bit-fpu.xml 
gdb-xml/riscv-64bit-fpu.xml gdb-xml/riscv-64bit-virtual.xml
+TARGET_XML_FILES= gdb-xml/riscv-64bit-cpu.xml gdb-xml/riscv-32bit-fpu.xml 
gdb-xml/riscv-64bit-fpu.xml gdb-xml/riscv-64bit-virtual.xml 
gdb-xml/riscv-32bit-cpu.xml gdb-xml/riscv-32bit-virtual.xml
 # needed by boot.c
 TARGET_NEED_FDT=y
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 69a08e8c2c..58165901a2 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -630,8 +630,10 @@ static void rv64e_bare_cpu_init(Object *obj)
 riscv_cpu_set_misa_ext(env, RVE);
 }
 
-#else /* !TARGET_RISCV64 */
+#endif /* !TARGET_RISCV64 */
 
+#if defined(TARGET_RISCV32) || \
+(defined(TARGET_RISCV64) && !defined(CONFIG_USER_ONLY))
 static void rv32_base_cpu_init(Object *obj)
 {
 RISCVCPU *cpu = RISCV_CPU(obj);
@@ -2544,6 +2546,13 @@ static const TypeInfo riscv_cpu_type_infos[] = {
 #if defined(TARGET_RISCV32)
 DEFINE_DYNAMIC_CPU(TYPE_RISCV_CPU_ANY,   MXL_RV32,  
riscv_any_cpu_init),
 DEFINE_DYNAMIC_CPU(TYPE_RISCV_CPU_MAX,   MXL_RV32,  
riscv_max_cpu_init),
+#elif defined(TARGET_RISCV64)
+DEFINE_DYNAMIC_CPU(TYPE_RISCV_CPU_ANY,   MXL_RV64,  
riscv_any_cpu_init),
+DEFINE_DYNAMIC_CPU(TYPE_RISCV_CPU_MAX,   MXL_RV64,  
riscv_max_cpu_init),
+#endif
+
+#if defined(TARGET_RISCV32) || \
+(defined(TARGET_RISCV64) && !defined(CONFIG_USER_ONLY))
 DEFINE_DYNAMIC_CPU(TYPE_RISCV_CPU_BASE32,MXL_RV32,  
rv32_base_cpu_init),
 DEFINE_VENDOR_CPU(TYPE_RISCV_CPU_IBEX,   MXL_RV32,  
rv32_ibex_cpu_init),
 DEFINE_VENDOR_CPU(TYPE_RISCV_CPU_SIFIVE_E31, MXL_RV32,  
rv32_sifive_e_cpu_init),
@@ -2551,9 +2560,9 @@ static const TypeInfo riscv_cpu_type_infos[] = {
 DEFINE_VENDOR_CPU(TYPE_RISCV_CPU_SIFIVE_U34, MXL_RV32,  
rv32_sifive_u_cpu_init),
 DEFINE_BARE_CPU(TYPE_RISCV_CPU_RV32I,MXL_RV32,  
rv32i_bare_cpu_init),
 DEFINE_BARE_CPU(TYPE_RISCV_CPU_RV32E,MXL_RV32,  
rv32e_bare_cpu_init),
-#elif defined(TARGET_RISCV64)
-DEFINE_DYNAMIC_CPU(TYPE_RISCV_CPU_ANY,   MXL_RV64,  
riscv_any_cpu_init),
-DEFINE_DYNAMIC_CPU(TYPE_RISCV_CPU_MAX,   MXL_RV64,  
riscv_max_cpu_init),
+#endif
+
+#if defined(TARGET_RISCV64)
 DEFINE_DYNAMIC_CPU(TYPE_RISCV_CPU_BASE64,MXL_RV64,  
rv64_base_cpu_init),
 DEFINE_VENDOR_CPU(TYPE_RISCV_CPU_SIFIVE_E51, MXL_RV64,  
rv64_sifive_e_cpu_init),
 DEFINE_VENDOR_CPU(TYPE_RISCV_CPU_SIFIVE_U54, MXL_RV64,  
rv64_sifive_u_cpu_init),
-- 
2.43.0

[PATCH 5/6] target/riscv: Correct mcause/scause bit width for RV32 in RV64 QEMU

2024-06-30 Thread LIU Zhiwei

From: TANG Tiancheng 

Ensure mcause high bit is correctly set by using 32-bit width for RV32
mode and 64-bit width for RV64 mode.

Signed-off-by: TANG Tiancheng 
Reviewed-by: Liu Zhiwei 
---
 target/riscv/cpu_helper.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 1af83a0a36..88a187c10a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -1673,6 +1673,8 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 target_ulong tinst = 0;
 target_ulong htval = 0;
 target_ulong mtval2 = 0;
+int sxlen = 0;
+int mxlen = 0;
 
 if (!async) {
 /* set tval to badaddr for traps with address information */
@@ -1799,7 +1801,8 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 s = set_field(s, MSTATUS_SPP, env->priv);
 s = set_field(s, MSTATUS_SIE, 0);
 env->mstatus = s;
-env->scause = cause | ((target_ulong)async << (TARGET_LONG_BITS - 1));
+sxlen = 16UL << riscv_cpu_sxl(env);
+env->scause = cause | ((target_ulong)async << (sxlen - 1));
 env->sepc = env->pc;
 env->stval = tval;
 env->htval = htval;
@@ -1830,7 +1833,8 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 s = set_field(s, MSTATUS_MPP, env->priv);
 s = set_field(s, MSTATUS_MIE, 0);
 env->mstatus = s;
-env->mcause = cause | ~(((target_ulong)-1) >> async);
+mxlen = 16UL << riscv_cpu_mxl(env);
+env->mcause = cause | ((target_ulong)async << (mxlen - 1));
 env->mepc = env->pc;
 env->mtval = tval;
 env->mtval2 = mtval2;
-- 
2.43.0

[PATCH 4/6] target/riscv: Detect sxl to set bit width for RV32 in RV64

2024-06-30 Thread LIU Zhiwei

From: TANG Tiancheng 

Ensure correct bit width based on sxl when running RV32 on RV64 QEMU.
This is required as MMU address translations run in S-mode.

Signed-off-by: TANG Tiancheng 
Reviewed-by: Liu Zhiwei 
---
 target/riscv/cpu_helper.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 6709622dd3..1af83a0a36 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -887,12 +887,14 @@ static int get_physical_address(CPURISCVState *env, 
hwaddr *physical,
 
 CPUState *cs = env_cpu(env);
 int va_bits = PGSHIFT + levels * ptidxbits + widened;
+int sxlen = 16UL << riscv_cpu_sxl(env);
+int sxlen_bytes = sxlen / 8;
 
 if (first_stage == true) {
 target_ulong mask, masked_msbs;
 
-if (TARGET_LONG_BITS > (va_bits - 1)) {
-mask = (1L << (TARGET_LONG_BITS - (va_bits - 1))) - 1;
+if (sxlen > (va_bits - 1)) {
+mask = (1L << (sxlen - (va_bits - 1))) - 1;
 } else {
 mask = 0;
 }
@@ -961,7 +963,7 @@ restart:
 
 int pmp_prot;
 int pmp_ret = get_physical_address_pmp(env, _prot, pte_addr,
-   sizeof(target_ulong),
+   sxlen_bytes,
MMU_DATA_LOAD, PRV_S);
 if (pmp_ret != TRANSLATE_SUCCESS) {
 return TRANSLATE_PMP_FAIL;
@@ -1113,7 +1115,7 @@ restart:
  *   it is no longer valid and we must re-walk the page table.
  */
 MemoryRegion *mr;
-hwaddr l = sizeof(target_ulong), addr1;
+hwaddr l = sxlen_bytes, addr1;
 mr = address_space_translate(cs->as, pte_addr, , ,
  false, MEMTXATTRS_UNSPECIFIED);
 if (memory_region_is_ram(mr)) {
@@ -1126,6 +1128,11 @@ restart:
 *pte_pa = pte = updated_pte;
 #else
 target_ulong old_pte = qatomic_cmpxchg(pte_pa, pte, updated_pte);
+if (riscv_cpu_sxl(env) == MXL_RV32) {
+old_pte = qatomic_cmpxchg((uint32_t *)pte_pa, pte, 
updated_pte);
+} else {
+old_pte = qatomic_cmpxchg(pte_pa, pte, updated_pte);
+}
 if (old_pte != pte) {
 goto restart;
 }
-- 
2.43.0

[PATCH 3/6] target/riscv: Correct SXL return value for RV32 in RV64 QEMU

2024-06-30 Thread LIU Zhiwei

From: TANG Tiancheng 

Ensure that riscv_cpu_sxl returns MXL_RV32 when runningRV32 in an
RV64 QEMU.

Signed-off-by: TANG Tiancheng 
Fixes: 05e6ca5e156 ("target/riscv: Ignore reserved bits in PTE for RV64")
Reviewed-by: Liu Zhiwei 
---
 target/riscv/cpu.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 6fe0d712b4..36a712044a 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -668,8 +668,11 @@ static inline RISCVMXL riscv_cpu_sxl(CPURISCVState *env)
 #ifdef CONFIG_USER_ONLY
 return env->misa_mxl;
 #else
-return get_field(env->mstatus, MSTATUS64_SXL);
+if (env->misa_mxl != MXL_RV32) {
+return get_field(env->mstatus, MSTATUS64_SXL);
+}
 #endif
+return MXL_RV32;
 }
 #endif
 
-- 
2.43.0

[PATCH 2/6] target/riscv: Adjust PMP size for no-MMU RV64 QEMU running RV32

2024-06-30 Thread LIU Zhiwei

From: TANG Tiancheng 

Ensure pmp_size is correctly determined using mxl for RV32
in RV64 QEMU.

Signed-off-by: TANG Tiancheng 
Reviewed-by: Liu Zhiwei 
---
 target/riscv/pmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 9eea397e72..f65aa3dba7 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -326,7 +326,7 @@ bool pmp_hart_has_privs(CPURISCVState *env, hwaddr addr,
  */
 pmp_size = -(addr | TARGET_PAGE_MASK);
 } else {
-pmp_size = sizeof(target_ulong);
+pmp_size = 2UL << riscv_cpu_mxl(env);
 }
 } else {
 pmp_size = size;
-- 
2.43.0

[PATCH 1/6] target/riscv: Add fw_dynamic_info32 for booting RV32 OpenSBI

2024-06-30 Thread LIU Zhiwei

From: TANG Tiancheng 

RV32 OpenSBI need a fw_dynamic_info parameter with 32-bit fields instead
of target_ulong.

In RV64 QEMU, target_ulong is 64. So it is not right for booting RV32 OpenSBI.
We create a fw_dynmaic_info32 struct for this purpose.

Signed-off-by: TANG Tiancheng 
Reviewed-by: Liu Zhiwei 
---
 hw/riscv/boot.c | 35 ++---
 hw/riscv/sifive_u.c |  3 ++-
 include/hw/riscv/boot.h |  4 +++-
 include/hw/riscv/boot_opensbi.h | 29 +++
 4 files changed, 57 insertions(+), 14 deletions(-)

diff --git a/hw/riscv/boot.c b/hw/riscv/boot.c
index 47281ca853..1a2c1ff9e0 100644
--- a/hw/riscv/boot.c
+++ b/hw/riscv/boot.c
@@ -342,27 +342,33 @@ void riscv_load_fdt(hwaddr fdt_addr, void *fdt)
 rom_ptr_for_as(_space_memory, fdt_addr, 
fdtsize));
 }
 
-void riscv_rom_copy_firmware_info(MachineState *machine, hwaddr rom_base,
-  hwaddr rom_size, uint32_t reset_vec_size,
+void riscv_rom_copy_firmware_info(MachineState *machine,
+  RISCVHartArrayState *harts,
+  hwaddr rom_base, hwaddr rom_size,
+  uint32_t reset_vec_size,
   uint64_t kernel_entry)
 {
+struct fw_dynamic_info32 dinfo32;
 struct fw_dynamic_info dinfo;
 size_t dinfo_len;
 
-if (sizeof(dinfo.magic) == 4) {
-dinfo.magic = cpu_to_le32(FW_DYNAMIC_INFO_MAGIC_VALUE);
-dinfo.version = cpu_to_le32(FW_DYNAMIC_INFO_VERSION);
-dinfo.next_mode = cpu_to_le32(FW_DYNAMIC_INFO_NEXT_MODE_S);
-dinfo.next_addr = cpu_to_le32(kernel_entry);
+if (riscv_is_32bit(harts)) {
+dinfo32.magic = cpu_to_le32(FW_DYNAMIC_INFO_MAGIC_VALUE);
+dinfo32.version = cpu_to_le32(FW_DYNAMIC_INFO_VERSION);
+dinfo32.next_mode = cpu_to_le32(FW_DYNAMIC_INFO_NEXT_MODE_S);
+dinfo32.next_addr = cpu_to_le32(kernel_entry);
+dinfo32.options = 0;
+dinfo32.boot_hart = 0;
+dinfo_len = sizeof(dinfo32);
 } else {
 dinfo.magic = cpu_to_le64(FW_DYNAMIC_INFO_MAGIC_VALUE);
 dinfo.version = cpu_to_le64(FW_DYNAMIC_INFO_VERSION);
 dinfo.next_mode = cpu_to_le64(FW_DYNAMIC_INFO_NEXT_MODE_S);
 dinfo.next_addr = cpu_to_le64(kernel_entry);
+dinfo.options = 0;
+dinfo.boot_hart = 0;
+dinfo_len = sizeof(dinfo);
 }
-dinfo.options = 0;
-dinfo.boot_hart = 0;
-dinfo_len = sizeof(dinfo);
 
 /**
  * copy the dynamic firmware info. This information is specific to
@@ -374,7 +380,10 @@ void riscv_rom_copy_firmware_info(MachineState *machine, 
hwaddr rom_base,
 exit(1);
 }
 
-rom_add_blob_fixed_as("mrom.finfo", , dinfo_len,
+rom_add_blob_fixed_as("mrom.finfo",
+   riscv_is_32bit(harts) ?
+   (void *) : (void *),
+   dinfo_len,
rom_base + reset_vec_size,
_space_memory);
 }
@@ -430,7 +439,9 @@ void riscv_setup_rom_reset_vec(MachineState *machine, 
RISCVHartArrayState *harts
 }
 rom_add_blob_fixed_as("mrom.reset", reset_vec, sizeof(reset_vec),
   rom_base, _space_memory);
-riscv_rom_copy_firmware_info(machine, rom_base, rom_size, 
sizeof(reset_vec),
+riscv_rom_copy_firmware_info(machine, harts,
+ rom_base, rom_size,
+ sizeof(reset_vec),
  kernel_entry);
 }
 
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index af5f923f54..5010c3eadb 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -646,7 +646,8 @@ static void sifive_u_machine_init(MachineState *machine)
 rom_add_blob_fixed_as("mrom.reset", reset_vec, sizeof(reset_vec),
   memmap[SIFIVE_U_DEV_MROM].base, 
_space_memory);
 
-riscv_rom_copy_firmware_info(machine, memmap[SIFIVE_U_DEV_MROM].base,
+riscv_rom_copy_firmware_info(machine, >soc.u_cpus,
+ memmap[SIFIVE_U_DEV_MROM].base,
  memmap[SIFIVE_U_DEV_MROM].size,
  sizeof(reset_vec), kernel_entry);
 
diff --git a/include/hw/riscv/boot.h b/include/hw/riscv/boot.h
index a2e4ae9cb0..806256d23f 100644
--- a/include/hw/riscv/boot.h
+++ b/include/hw/riscv/boot.h
@@ -56,7 +56,9 @@ void riscv_setup_rom_reset_vec(MachineState *machine, 
RISCVHartArrayState *harts
hwaddr rom_base, hwaddr rom_size,
uint64_t kernel_entry,
uint64_t fdt_load_addr);
-void riscv_rom_copy_firmware_info(MachineState *machine, hwaddr rom_base,
+void riscv_rom_copy_firmware_info(MachineState *machine,
+  RISCVHartArrayState *harts,
+

[PATCH 0/6] target/riscv: Expose RV32 cpu to RV64 QEMU

2024-06-30 Thread LIU Zhiwei

From: TANG Tiancheng 

This patch set aims to expose 32-bit RISC-V cpu to RV64 QEMU. Thus
qemu-system-riscv64 can directly boot a RV32 Linux.

This patch set has been tested with 6.9.0 Linux Image.

- Run RV64 QEMU with RV32 CPU
qemu-system-riscv64 -cpu rv32 -M virt -nographic \
-kernel Image \
-append "root=/dev/vda ro console=ttyS0" \
-drive file=rootfs.ext2,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0 -netdev user,id=net0 \
-device virtio-net-device,netdev=net0

OpenSBI v1.4
QEMU emulator version 9.0.50 (v9.0.0-1132-g7799dc2e3b)
[0.00] Linux version 6.9.0 (developer@11109ca35736) 
(riscv32-unknown-linux-gnu-gcc (gc891d8dc23e-dirty) 13.2.0, GNU ld (GNU 
Binutils) 2.42) #3 SMP Fri May 31 08:42:15 UTC 2024
[0.00] random: crng init done
[0.00] OF: fdt: Ignoring memory range 0x8000 - 0x8040
[0.00] Machine model: riscv-virtio,qemu
[0.00] SBI specification v2.0 detected
[0.00] SBI implementation ID=0x1 Version=0x10004
[0.00] SBI TIME extension detected
[0.00] SBI IPI extension detected
[0.00] SBI RFENCE extension detected
[0.00] SBI SRST extension detected
[0.00] SBI DBCN extension detected
[0.00] efi: UEFI not found.
[0.00] OF: reserved mem: 0x8000..0x8003 (256 KiB) nomap 
non-reusable mmode_resv1@8000
[0.00] OF: reserved mem: 0x8004..0x8004 (64 KiB) nomap 
non-reusable mmode_resv0@8004
[0.00] Zone ranges:
[0.00]   Normal   [mem 0x8040-0x87ff]
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x8040-0x87ff]
[0.00] Initmem setup node 0 [mem 0x8040-0x87ff]
[0.00] On node 0, zone Normal: 1024 pages in unavailable ranges
[0.00] SBI HSM extension detected
[0.00] riscv: base ISA extensions acdfhim
[0.00] riscv: ELF capabilities acdfim
[0.00] percpu: Embedded 17 pages/cpu s37728 r8192 d23712 u69632
[0.00] Kernel command line: root=/dev/vda ro console=ttyS0
[0.00] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, 
linear)
[0.00] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, 
linear)
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 31465
[0.00] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[0.00] Virtual kernel memory layout:
[0.00]   fixmap : 0x9c80 - 0x9d00   (8192 kB)
[0.00]   pci io : 0x9d00 - 0x9e00   (  16 MB)
[0.00]  vmemmap : 0x9e00 - 0xa000   (  32 MB)
[0.00]  vmalloc : 0xa000 - 0xc000   ( 512 MB)
[0.00]   lowmem : 0xc000 - 0xc7c0   ( 124 MB)
[0.00] Memory: 95700K/126976K available (9090K kernel code, 8845K 
rwdata, 4096K rodata, 4231K init, 341K bss, 31276K reserved, 0K cma-reserved)
...
Welcome to Buildroot
buildroot login: root
# cat /proc/cpuinfo
processor   : 0
hart: 0
isa : 
rv32imafdch_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zfa_zba_zbb_zbc_zbs_sstc
mmu : sv32

TANG Tiancheng (6):
  target/riscv: Add fw_dynamic_info32 for booting RV32 OpenSBI
  target/riscv: Adjust PMP size for no-MMU RV64 QEMU running RV32
  target/riscv: Correct SXL return value for RV32 in RV64 QEMU
  target/riscv: Detect sxl to set bit width for RV32 in RV64
  target/riscv: Correct mcause/scause bit width for RV32 in RV64 QEMU
  target/riscv: Enable RV32 CPU support in RV64 QEMU

 configs/targets/riscv64-softmmu.mak |  2 +-
 hw/riscv/boot.c | 35 +++--
 hw/riscv/sifive_u.c |  3 ++-
 include/hw/riscv/boot.h |  4 +++-
 include/hw/riscv/boot_opensbi.h | 29 
 target/riscv/cpu.c  | 17 ++
 target/riscv/cpu.h  |  5 -
 target/riscv/cpu_helper.c   | 23 ++-
 target/riscv/pmp.c  |  2 +-
 9 files changed, 93 insertions(+), 27 deletions(-)

-- 
2.43.0

RE: [PATCH] hw/misc/bcm2835_thermal: Handle invalid address accesses gracefully

2024-06-30 Thread Xingtao Yao (Fujitsu)

Hi, zheyu

> -Original Message-
> From: qemu-devel-bounces+yaoxt.fnst=fujitsu@nongnu.org
>  On Behalf Of Zheyu
> Ma
> Sent: Sunday, June 30, 2024 11:14 PM
> To: Peter Maydell ; Philippe Mathieu-Daudé
> 
> Cc: Zheyu Ma ; qemu-...@nongnu.org;
> qemu-devel@nongnu.org
> Subject: [PATCH] hw/misc/bcm2835_thermal: Handle invalid address accesses
> gracefully
> 
> This commit handles invalid address accesses gracefully in both read and write
> functions. Instead of asserting and aborting, it logs an error message and 
> returns
> a neutral value for read operations and does nothing for write operations.
> 
> Error log:
> ERROR:hw/misc/bcm2835_thermal.c:55:bcm2835_thermal_read: code should not
> be reached
> Bail out! ERROR:hw/misc/bcm2835_thermal.c:55:bcm2835_thermal_read: code
> should not be reached
> Aborted
> 
> Reproducer:
> cat << EOF | qemu-system-aarch64 -display \
> none -machine accel=qtest, -m 512M -machine raspi3b -m 1G -qtest stdio
> readw 0x3f212003
> EOF
> 
> Signed-off-by: Zheyu Ma 
> ---
>  hw/misc/bcm2835_thermal.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/misc/bcm2835_thermal.c b/hw/misc/bcm2835_thermal.c
> index ee7816b8a5..5c2a429d58 100644
> --- a/hw/misc/bcm2835_thermal.c
> +++ b/hw/misc/bcm2835_thermal.c
> @@ -51,8 +51,10 @@ static uint64_t bcm2835_thermal_read(void *opaque,
> hwaddr addr, unsigned size)
>  val = FIELD_DP32(bcm2835_thermal_temp2adc(25), STAT, VALID, true);
>  break;
>  default:
> -/* MemoryRegionOps are aligned, so this can not happen. */
> -g_assert_not_reached();
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "bcm2835_thermal_read: invalid address 0x%"
> +  HWADDR_PRIx "\n", addr);
> +val = 0;
>  }
>  return val;
>  }
> @@ -72,8 +74,10 @@ static void bcm2835_thermal_write(void *opaque, hwaddr
> addr,
> __func__, value, addr);
>  break;
>  default:
> -/* MemoryRegionOps are aligned, so this can not happen. */
> -g_assert_not_reached();
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "bcm2835_thermal_write: invalid address 0x%"
> +  HWADDR_PRIx "\n", addr);
> +break;
>  }
>  }

the default branch will never be reached in normal access, so I think this 
modification is not needed.

> 
> --
> 2.34.1
>

RE: [PATCH] hw/display/tcx: Fix out-of-bounds access in tcx_blit_writel

2024-06-30 Thread Xingtao Yao (Fujitsu)

Hi, zheyu

> -Original Message-
> From: qemu-devel-bounces+yaoxt.fnst=fujitsu@nongnu.org
>  On Behalf Of Zheyu
> Ma
> Sent: Sunday, June 30, 2024 9:04 PM
> To: Mark Cave-Ayland 
> Cc: Zheyu Ma ; qemu-devel@nongnu.org
> Subject: [PATCH] hw/display/tcx: Fix out-of-bounds access in tcx_blit_writel
> 
> This patch addresses a potential out-of-bounds memory access issue in the
> tcx_blit_writel function. It adds bounds checking to ensure that memory
> accesses do not exceed the allocated VRAM size. If an out-of-bounds access
> is detected, an error is logged using qemu_log_mask.
> 
> ASAN log:
> ==2960379==ERROR: AddressSanitizer: SEGV on unknown address
> 0x7f524752fd01 (pc 0x7f525c2c4881 bp 0x7ffdaf87bfd0 sp 0x7ffdaf87b788 T0)
> ==2960379==The signal is caused by a READ memory access.
> #0 0x7f525c2c4881 in memcpy
> string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:222
> #1 0x55aa782bd5b1 in __asan_memcpy
> llvm/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3
> #2 0x55aa7854dedd in tcx_blit_writel hw/display/tcx.c:590:13
> 
> Reproducer:
> cat << EOF | qemu-system-sparc -display none \
> -machine accel=qtest, -m 512M -machine LX -m 256 -qtest stdio
> writel 0x562e98c4 0x3d92fd01
> EOF
> 
> Signed-off-by: Zheyu Ma 
> ---
>  hw/display/tcx.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/hw/display/tcx.c b/hw/display/tcx.c
> index 99507e7638..af43bea7f2 100644
> --- a/hw/display/tcx.c
> +++ b/hw/display/tcx.c
> @@ -33,6 +33,7 @@
>  #include "migration/vmstate.h"
>  #include "qemu/error-report.h"
>  #include "qemu/module.h"
> +#include "qemu/log.h"
>  #include "qom/object.h"
> 
>  #define TCX_ROM_FILE "QEMU,tcx.bin"
> @@ -577,6 +578,14 @@ static void tcx_blit_writel(void *opaque, hwaddr addr,
>  addr = (addr >> 3) & 0xf;
>  adsr = val & 0xff;
>  len = ((val >> 24) & 0x1f) + 1;
> +
> +if (addr + len > s->vram_size || adsr + len > s->vram_size) {
if adsr == 0xff, this condition may be always true, thus the branch “if 
(adsr == 0xff) {” will be never
reached.

> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: VRAM access out of bounds. addr: 0x%lx, adsr:
> 0x%x, len: %u\n",
> +  __func__, addr, adsr, len);
> +return;
> +}
> +
>  if (adsr == 0xff) {
>  memset(>vram[addr], s->tmpblit, len);
>  if (s->depth == 24) {
> --
> 2.34.1
>

[PATCH 2/5] target/i386: Convert cc_op_live to a function

2024-06-30 Thread Richard Henderson

Assert that op is known.

Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c  | 56 +++-
 target/i386/tcg/decode-new.c.inc |  2 +-
 target/i386/tcg/emit.c.inc   |  6 ++--
 3 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 95bad55bf4..e5a8aaf793 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -290,26 +290,38 @@ enum {
 };
 
 /* Bit set if the global variable is live after setting CC_OP to X.  */
-static const uint8_t cc_op_live[CC_OP_NB] = {
-[CC_OP_DYNAMIC] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
-[CC_OP_EFLAGS] = USES_CC_SRC,
-[CC_OP_MULB ... CC_OP_MULQ] = USES_CC_DST | USES_CC_SRC,
-[CC_OP_ADDB ... CC_OP_ADDQ] = USES_CC_DST | USES_CC_SRC,
-[CC_OP_ADCB ... CC_OP_ADCQ] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
-[CC_OP_SUBB ... CC_OP_SUBQ] = USES_CC_DST | USES_CC_SRC | USES_CC_SRCT,
-[CC_OP_SBBB ... CC_OP_SBBQ] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
-[CC_OP_LOGICB ... CC_OP_LOGICQ] = USES_CC_DST,
-[CC_OP_INCB ... CC_OP_INCQ] = USES_CC_DST | USES_CC_SRC,
-[CC_OP_DECB ... CC_OP_DECQ] = USES_CC_DST | USES_CC_SRC,
-[CC_OP_SHLB ... CC_OP_SHLQ] = USES_CC_DST | USES_CC_SRC,
-[CC_OP_SARB ... CC_OP_SARQ] = USES_CC_DST | USES_CC_SRC,
-[CC_OP_BMILGB ... CC_OP_BMILGQ] = USES_CC_DST | USES_CC_SRC,
-[CC_OP_ADCX] = USES_CC_DST | USES_CC_SRC,
-[CC_OP_ADOX] = USES_CC_SRC | USES_CC_SRC2,
-[CC_OP_ADCOX] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
-[CC_OP_CLR] = 0,
-[CC_OP_POPCNT] = USES_CC_DST,
-};
+static uint8_t cc_op_live(CCOp op)
+{
+switch (op) {
+case CC_OP_CLR:
+return 0;
+case CC_OP_EFLAGS:
+return USES_CC_SRC;
+case CC_OP_POPCNT:
+case CC_OP_LOGICB ... CC_OP_LOGICQ:
+return USES_CC_DST;
+case CC_OP_MULB ... CC_OP_MULQ:
+case CC_OP_ADDB ... CC_OP_ADDQ:
+case CC_OP_INCB ... CC_OP_INCQ:
+case CC_OP_DECB ... CC_OP_DECQ:
+case CC_OP_SHLB ... CC_OP_SHLQ:
+case CC_OP_SARB ... CC_OP_SARQ:
+case CC_OP_BMILGB ... CC_OP_BMILGQ:
+case CC_OP_ADCX:
+return USES_CC_DST | USES_CC_SRC;
+case CC_OP_ADOX:
+return USES_CC_SRC | USES_CC_SRC2;
+case CC_OP_SUBB ... CC_OP_SUBQ:
+return USES_CC_DST | USES_CC_SRC | USES_CC_SRCT;
+case CC_OP_DYNAMIC:
+case CC_OP_ADCB ... CC_OP_ADCQ:
+case CC_OP_SBBB ... CC_OP_SBBQ:
+case CC_OP_ADCOX:
+return USES_CC_DST | USES_CC_SRC | USES_CC_SRC2;
+default:
+g_assert_not_reached();
+}
+}
 
 static void set_cc_op_1(DisasContext *s, CCOp op, bool dirty)
 {
@@ -320,7 +332,7 @@ static void set_cc_op_1(DisasContext *s, CCOp op, bool 
dirty)
 }
 
 /* Discard CC computation that will no longer be used.  */
-dead = cc_op_live[s->cc_op] & ~cc_op_live[op];
+dead = cc_op_live(s->cc_op) & ~cc_op_live(op);
 if (dead & USES_CC_DST) {
 tcg_gen_discard_tl(cpu_cc_dst);
 }
@@ -816,7 +828,7 @@ static void gen_mov_eflags(DisasContext *s, TCGv reg)
 src2 = cpu_cc_src2;
 
 /* Take care to not read values that are not live.  */
-live = cc_op_live[s->cc_op] & ~USES_CC_SRCT;
+live = cc_op_live(s->cc_op) & ~USES_CC_SRCT;
 dead = live ^ (USES_CC_DST | USES_CC_SRC | USES_CC_SRC2);
 if (dead) {
 TCGv zero = tcg_constant_tl(0);
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 0d846c32c2..ba4b8d16f9 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -2792,7 +2792,7 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
 tcg_gen_mov_i32(cpu_cc_op, decode.cc_op_dynamic);
 }
 set_cc_op(s, decode.cc_op);
-cc_live = cc_op_live[decode.cc_op];
+cc_live = cc_op_live(decode.cc_op);
 } else {
 cc_live = 0;
 }
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index fc7477833b..38b399783e 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -3531,18 +3531,20 @@ static void gen_shift_dynamic_flags(DisasContext *s, 
X86DecodedInsn *decode, TCG
 {
 TCGv_i32 count32 = tcg_temp_new_i32();
 TCGv_i32 old_cc_op;
+uint8_t live;
 
 decode->cc_op = CC_OP_DYNAMIC;
 decode->cc_op_dynamic = tcg_temp_new_i32();
 
 assert(decode->cc_dst == s->T0);
-if (cc_op_live[s->cc_op] & USES_CC_DST) {
+live = cc_op_live(s->cc_op);
+if (live & USES_CC_DST) {
 decode->cc_dst = tcg_temp_new();
 tcg_gen_movcond_tl(TCG_COND_EQ, decode->cc_dst, count, 
tcg_constant_tl(0),
cpu_cc_dst, s->T0);
 }
 
-if (cc_op_live[s->cc_op] & USES_CC_SRC) {
+if (live & USES_CC_SRC) {
 tcg_gen_movcond_tl(TCG_COND_EQ, decode->cc_src, count, 
tcg_constant_tl(0),
cpu_cc_src, decode->cc_src);
 }
-- 
2.34.1

[PATCH 3/5] target/i386: Rearrange CCOp

2024-06-30 Thread Richard Henderson

Define CC_OP_{FIRST,LAST}_BWLQ.  Remove CC_OP_NB.
Give the first few enumerators explicit integer constants.
Move CC_OP_POPCNT up in the enumeration; remove unused
CC_OP_POPCNT*__ placeholders.  Align the BWLQ enumerators.

This will be used to simplify ((op - CC_OP_*B) & 3).

Signed-off-by: Richard Henderson 
---
 target/i386/cpu.h | 44 
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 29daf37048..df4272fdae 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1270,14 +1270,27 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord 
w,
  * are only needed for conditional branches.
  */
 typedef enum {
-CC_OP_DYNAMIC, /* must use dynamic code to get cc_op */
-CC_OP_EFLAGS,  /* all cc are explicitly computed, CC_SRC = flags */
-CC_OP_ADCX, /* CC_DST = C, CC_SRC = rest.  */
-CC_OP_ADOX, /* CC_SRC2 = O, CC_SRC = rest.  */
-CC_OP_ADCOX, /* CC_DST = C, CC_SRC2 = O, CC_SRC = rest.  */
-CC_OP_CLR, /* Z and P set, all other flags clear.  */
+CC_OP_DYNAMIC = 0, /* must use dynamic code to get cc_op */
+CC_OP_EFLAGS = 1,  /* all cc are explicitly computed, CC_SRC = flags */
+CC_OP_ADCX = 2,/* CC_DST = C, CC_SRC = rest.  */
+CC_OP_ADOX = 3,/* CC_SRC2 = O, CC_SRC = rest.  */
+CC_OP_ADCOX = 4,   /* CC_DST = C, CC_SRC2 = O, CC_SRC = rest.  */
+CC_OP_CLR = 5, /* Z and P set, all other flags clear.  */
 
-CC_OP_MULB, /* modify all flags, C, O = (CC_SRC != 0) */
+/*
+ * Z via CC_DST, all other flags clear.
+ * Treat CC_OP_POPCNT like the other BWLQ ops in making the low bits
+ * equal MO_TL; this gives a value of either 6 or 7.
+ */
+#ifdef TARGET_X86_64
+CC_OP_POPCNT = 7,
+#else
+CC_OP_POPCNT = 6,
+#endif
+
+#define CC_OP_FIRST_BWLQ  CC_OP_POPCNT
+
+CC_OP_MULB = 8, /* modify all flags, C, O = (CC_SRC != 0) */
 CC_OP_MULW,
 CC_OP_MULL,
 CC_OP_MULQ,
@@ -1332,20 +1345,11 @@ typedef enum {
 CC_OP_BMILGL,
 CC_OP_BMILGQ,
 
-/*
- * Note that only CC_OP_POPCNT (i.e. the one with MO_TL size)
- * is used or implemented, because the translation needs
- * to zero-extend CC_DST anyway.
- */
-CC_OP_POPCNTB__, /* Z via CC_DST, all other flags clear.  */
-CC_OP_POPCNTW__,
-CC_OP_POPCNTL__,
-CC_OP_POPCNTQ__,
-CC_OP_POPCNT = sizeof(target_ulong) == 8 ? CC_OP_POPCNTQ__ : 
CC_OP_POPCNTL__,
-
-CC_OP_NB,
+#define CC_OP_LAST_BWLQ CC_OP_BMILGQ
 } CCOp;
-QEMU_BUILD_BUG_ON(CC_OP_NB >= 128);
+
+/* See X86DecodedInsn.cc_op, using int8_t. */
+QEMU_BUILD_BUG_ON(CC_OP_LAST_BWLQ > INT8_MAX);
 
 typedef struct SegmentCache {
 uint32_t selector;
-- 
2.34.1

[PATCH 0/5] target/i386: CCOp cleanups

2024-06-30 Thread Richard Henderson

While debugging #2413, I spent quite a bit of time trying to work
out if the CCOp value was incorrect.  I think the following is a
worthwhile cleanup, isolating potential problems to asserts.


r~


Richard Henderson (5):
  target/i386: Tidy cc_op_str usage
  target/i386: Convert cc_op_live to a function
  target/i386: Rearrange CCOp
  target/i386: Remove default in cc_op_live
  target/i386: Introduce cc_op_size

 target/i386/cpu.h| 44 +
 target/i386/cpu-dump.c   | 17 ---
 target/i386/tcg/translate.c  | 84 +++-
 target/i386/tcg/decode-new.c.inc |  2 +-
 target/i386/tcg/emit.c.inc   |  9 ++--
 5 files changed, 93 insertions(+), 63 deletions(-)

-- 
2.34.1

[PATCH 4/5] target/i386: Remove default in cc_op_live

2024-06-30 Thread Richard Henderson

Now that CC_OP_NB is gone, push the assert after the switch.
This will allow -Wswitch to diagnose missing entries.

Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index e5a8aaf793..e675afca47 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -318,9 +318,8 @@ static uint8_t cc_op_live(CCOp op)
 case CC_OP_SBBB ... CC_OP_SBBQ:
 case CC_OP_ADCOX:
 return USES_CC_DST | USES_CC_SRC | USES_CC_SRC2;
-default:
-g_assert_not_reached();
 }
+g_assert_not_reached();
 }
 
 static void set_cc_op_1(DisasContext *s, CCOp op, bool dirty)
-- 
2.34.1

[PATCH 5/5] target/i386: Introduce cc_op_size

2024-06-30 Thread Richard Henderson

Replace arithmetic on cc_op with a helper function.
Assert that the op has a size and that it is valid
for the configuration.

Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 29 ++---
 target/i386/tcg/emit.c.inc  |  3 ++-
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index e675afca47..e98bed0805 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -322,6 +322,16 @@ static uint8_t cc_op_live(CCOp op)
 g_assert_not_reached();
 }
 
+static MemOp cc_op_size(CCOp op)
+{
+MemOp size = op & 3;
+
+assert(op >= CC_OP_FIRST_BWLQ && op <= CC_OP_LAST_BWLQ);
+assert(size <= MO_TL);
+
+return size;
+}
+
 static void set_cc_op_1(DisasContext *s, CCOp op, bool dirty)
 {
 int dead;
@@ -884,7 +894,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv 
reg)
 switch (s->cc_op) {
 case CC_OP_SUBB ... CC_OP_SUBQ:
 /* (DATA_TYPE)CC_SRCT < (DATA_TYPE)CC_SRC */
-size = s->cc_op - CC_OP_SUBB;
+size = cc_op_size(s->cc_op);
 gen_ext_tl(s->cc_srcT, s->cc_srcT, size, false);
 gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
 return (CCPrepare) { .cond = TCG_COND_LTU, .reg = s->cc_srcT,
@@ -892,7 +902,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv 
reg)
 
 case CC_OP_ADDB ... CC_OP_ADDQ:
 /* (DATA_TYPE)CC_DST < (DATA_TYPE)CC_SRC */
-size = s->cc_op - CC_OP_ADDB;
+size = cc_op_size(s->cc_op);
 gen_ext_tl(cpu_cc_dst, cpu_cc_dst, size, false);
 gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
 return (CCPrepare) { .cond = TCG_COND_LTU, .reg = cpu_cc_dst,
@@ -910,7 +920,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv 
reg)
 
 case CC_OP_SHLB ... CC_OP_SHLQ:
 /* (CC_SRC >> (DATA_BITS - 1)) & 1 */
-size = s->cc_op - CC_OP_SHLB;
+size = cc_op_size(s->cc_op);
 return gen_prepare_sign_nz(cpu_cc_src, size);
 
 case CC_OP_MULB ... CC_OP_MULQ:
@@ -918,7 +928,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv 
reg)
  .reg = cpu_cc_src };
 
 case CC_OP_BMILGB ... CC_OP_BMILGQ:
-size = s->cc_op - CC_OP_BMILGB;
+size = cc_op_size(s->cc_op);
 gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
 return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_src };
 
@@ -972,10 +982,7 @@ static CCPrepare gen_prepare_eflags_s(DisasContext *s, 
TCGv reg)
 case CC_OP_POPCNT:
 return (CCPrepare) { .cond = TCG_COND_NEVER };
 default:
-{
-MemOp size = (s->cc_op - CC_OP_ADDB) & 3;
-return gen_prepare_sign_nz(cpu_cc_dst, size);
-}
+return gen_prepare_sign_nz(cpu_cc_dst, cc_op_size(s->cc_op));
 }
 }
 
@@ -1016,7 +1023,7 @@ static CCPrepare gen_prepare_eflags_z(DisasContext *s, 
TCGv reg)
 return (CCPrepare) { .cond = TCG_COND_ALWAYS };
 default:
 {
-MemOp size = (s->cc_op - CC_OP_ADDB) & 3;
+MemOp size = cc_op_size(s->cc_op);
 if (size == MO_TL) {
 return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_dst };
 } else {
@@ -1042,7 +1049,7 @@ static CCPrepare gen_prepare_cc(DisasContext *s, int b, 
TCGv reg)
 switch (s->cc_op) {
 case CC_OP_SUBB ... CC_OP_SUBQ:
 /* We optimize relational operators for the cmp/jcc case.  */
-size = s->cc_op - CC_OP_SUBB;
+size = cc_op_size(s->cc_op);
 switch (jcc_op) {
 case JCC_BE:
 gen_ext_tl(s->cc_srcT, s->cc_srcT, size, false);
@@ -3176,7 +3183,7 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
CC_DST alone, setting CC_SRC, and using a CC_OP_SAR of the
same width.  */
 tcg_gen_mov_tl(cpu_cc_src, s->tmp4);
-set_cc_op(s, ((s->cc_op - CC_OP_MULB) & 3) + CC_OP_SARB);
+set_cc_op(s, CC_OP_SARB + cc_op_size(s->cc_op));
 break;
 default:
 /* Otherwise, generate EFLAGS and replace the C bit.  */
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 38b399783e..e9d5d196ce 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -3106,7 +3106,8 @@ static bool gen_eflags_adcox(DisasContext *s, 
X86DecodedInsn *decode, bool want_
  * bit, we might as well fish CF out of EFLAGS and save a shift.
  */
 if (want_carry && (!need_flags || s->cc_op == CC_OP_SHLB + MO_TL)) {
-tcg_gen_shri_tl(decode->cc_dst, cpu_cc_src, (8 << (s->cc_op - 
CC_OP_SHLB)) - 1);
+MemOp size = cc_op_size(s->cc_op);
+tcg_gen_shri_tl(decode->cc_dst, cpu_cc_src, (8 << size) - 1);
 got_cf = true;
 }
 gen_mov_eflags(s, decode->cc_src);
-- 
2.34.1

[PATCH 1/5] target/i386: Tidy cc_op_str usage

2024-06-30 Thread Richard Henderson

Make const.  Use the read-only strings directly; do not copy
them into an on-stack buffer with snprintf.  Allow for holes
in the cc_op_str array, now present with CC_OP_POPCNT.

Fixes: 460231ad369 ("target/i386: give CC_OP_POPCNT low bits corresponding to 
MO_TL")
Signed-off-by: Richard Henderson 
---
 target/i386/cpu-dump.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu-dump.c b/target/i386/cpu-dump.c
index 3bb8e44091..dc6723aede 100644
--- a/target/i386/cpu-dump.c
+++ b/target/i386/cpu-dump.c
@@ -27,7 +27,7 @@
 /***/
 /* x86 debug */
 
-static const char *cc_op_str[CC_OP_NB] = {
+static const char * const cc_op_str[] = {
 [CC_OP_DYNAMIC] = "DYNAMIC",
 
 [CC_OP_EFLAGS] = "EFLAGS",
@@ -347,7 +347,6 @@ void x86_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = >env;
 int eflags, i, nb;
-char cc_op_name[32];
 static const char *seg_name[6] = { "ES", "CS", "SS", "DS", "FS", "GS" };
 
 eflags = cpu_compute_eflags(env);
@@ -456,10 +455,16 @@ void x86_cpu_dump_state(CPUState *cs, FILE *f, int flags)
  env->dr[6], env->dr[7]);
 }
 if (flags & CPU_DUMP_CCOP) {
-if ((unsigned)env->cc_op < CC_OP_NB)
-snprintf(cc_op_name, sizeof(cc_op_name), "%s", 
cc_op_str[env->cc_op]);
-else
-snprintf(cc_op_name, sizeof(cc_op_name), "[%d]", env->cc_op);
+const char *cc_op_name = NULL;
+char cc_op_buf[32];
+
+if ((unsigned)env->cc_op < ARRAY_SIZE(cc_op_str)) {
+cc_op_name = cc_op_str[env->cc_op];
+}
+if (cc_op_name == NULL) {
+snprintf(cc_op_buf, sizeof(cc_op_buf), "[%d]", env->cc_op);
+cc_op_name = cc_op_buf;
+}
 #ifdef TARGET_X86_64
 if (env->hflags & HF_CS64_MASK) {
 qemu_fprintf(f, "CCS=%016" PRIx64 " CCD=%016" PRIx64 " CCO=%s\n",
-- 
2.34.1

[PATCH] tcg/optimize: Fix TCG_COND_TST* simplification of setcond2

2024-06-30 Thread Richard Henderson

Fix a typo in the argument movement.

Cc: qemu-sta...@nongnu.org
Fixes: ceb9ee06b71 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2413
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c   |  2 +-
 tests/tcg/x86_64/test-2413.c | 30 ++
 2 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/x86_64/test-2413.c

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 8886f7037a..ba16ec27e2 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -2384,7 +2384,7 @@ static bool fold_setcond2(OptContext *ctx, TCGOp *op)
 
 case TCG_COND_TSTEQ:
 case TCG_COND_TSTNE:
-if (arg_is_const_val(op->args[2], 0)) {
+if (arg_is_const_val(op->args[3], 0)) {
 goto do_setcond_high;
 }
 if (arg_is_const_val(op->args[4], 0)) {
diff --git a/tests/tcg/x86_64/test-2413.c b/tests/tcg/x86_64/test-2413.c
new file mode 100644
index 00..a0e4d25093
--- /dev/null
+++ b/tests/tcg/x86_64/test-2413.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright 2024 Linaro, Ltd. */
+/* See https://gitlab.com/qemu-project/qemu/-/issues/2413 */
+
+#include 
+
+void test(unsigned long *a, unsigned long *d, unsigned long c)
+{
+asm("xorl %%eax, %%eax\n\t"
+"xorl %%edx, %%edx\n\t"
+"testb $0x20, %%cl\n\t"
+"sete %%al\n\t"
+"setne %%dl\n\t"
+"shll %%cl, %%eax\n\t"
+"shll %%cl, %%edx\n\t"
+: "=a"(*a), "=d"(*d)
+: "c"(c));
+}
+
+int main(void)
+{
+long a, c, d;
+
+for (c = 0; c < 64; c++) {
+test(, , c);
+assert(a == (c & 0x20 ? 0 : 1u << (c & 0x1f)));
+assert(d == (c & 0x20 ? 1u << (c & 0x1f) : 0));
+}
+return 0;
+}
-- 
2.34.1

[PATCH v2 1/2] vfio/display: Fix potential memleak of edid info

2024-06-30 Thread Zhenzhong Duan

EDID related device region info is leaked in vfio_display_edid_init()
error path and VFIODisplay destroying path.

Fixes: 08479114b0de ("vfio/display: add edid support.")
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/display.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 661e921616..9c57fd3888 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -171,7 +171,9 @@ static void vfio_display_edid_init(VFIOPCIDevice *vdev)
 
 err:
 trace_vfio_display_edid_write_error();
+g_free(dpy->edid_info);
 g_free(dpy->edid_regs);
+dpy->edid_info = NULL;
 dpy->edid_regs = NULL;
 return;
 }
@@ -182,6 +184,7 @@ static void vfio_display_edid_exit(VFIODisplay *dpy)
 return;
 }
 
+g_free(dpy->edid_info);
 g_free(dpy->edid_regs);
 g_free(dpy->edid_blob);
 timer_free(dpy->edid_link_timer);
-- 
2.34.1

[PATCH v2 0/2] Misc fixes on vfio display

2024-06-30 Thread Zhenzhong Duan

Hi,

This is trying to address an issue Cédric found.
See https://www.mail-archive.com/qemu-devel@nongnu.org/msg1043142.html
While looking into it, also found a potential memory leak.

I'm sorry that I didn't find how to test this fix, because it looks
a GFX card is needed. Any idea on how to test or help test are quite
appreciated.

Thanks
Zhenzhong

v2:
- set dpy->edid_info to NULL in vfio_display_edid_init() err path (Marc-André)
- remove a wrongly added g_free(*info) in vfio_get_dev_region_info() 
(Marc-André)
- add R-B on patch2


Zhenzhong Duan (2):
  vfio/display: Fix potential memleak of edid info
  vfio/display: Fix vfio_display_edid_init() error path

 hw/vfio/display.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

-- 
2.34.1

[PATCH v2 2/2] vfio/display: Fix vfio_display_edid_init() error path

2024-06-30 Thread Zhenzhong Duan

vfio_display_edid_init() can fail for many reasons and return silently.
It would be good to report the error.

Old mdev driver may not support vfio edid region and we allow to go
through in this case.

vfio_display_edid_update() isn't changed because it can be called at
runtime when UI changes (i.e. window resize).

Fixes: 08479114b0de ("vfio/display: add edid support.")
Suggested-by: Cédric Le Goater 
Signed-off-by: Zhenzhong Duan 
Reviewed-by: Marc-André Lureau 
---
 hw/vfio/display.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 9c57fd3888..ea87830fe0 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -124,7 +124,7 @@ static void vfio_display_edid_ui_info(void *opaque, 
uint32_t idx,
 }
 }
 
-static void vfio_display_edid_init(VFIOPCIDevice *vdev)
+static bool vfio_display_edid_init(VFIOPCIDevice *vdev, Error **errp)
 {
 VFIODisplay *dpy = vdev->dpy;
 int fd = vdev->vbasedev.fd;
@@ -135,7 +135,8 @@ static void vfio_display_edid_init(VFIOPCIDevice *vdev)
VFIO_REGION_SUBTYPE_GFX_EDID,
>edid_info);
 if (ret) {
-return;
+/* Failed to get GFX edid info, allow to go through without edid. */
+return true;
 }
 
 trace_vfio_display_edid_available();
@@ -167,15 +168,16 @@ static void vfio_display_edid_init(VFIOPCIDevice *vdev)
 vfio_display_edid_link_up, vdev);
 
 vfio_display_edid_update(vdev, true, 0, 0);
-return;
+return true;
 
 err:
+error_setg(errp, "vfio: failed to read GFX edid field");
 trace_vfio_display_edid_write_error();
 g_free(dpy->edid_info);
 g_free(dpy->edid_regs);
 dpy->edid_info = NULL;
 dpy->edid_regs = NULL;
-return;
+return false;
 }
 
 static void vfio_display_edid_exit(VFIODisplay *dpy)
@@ -368,8 +370,7 @@ static bool vfio_display_dmabuf_init(VFIOPCIDevice *vdev, 
Error **errp)
 return false;
 }
 }
-vfio_display_edid_init(vdev);
-return true;
+return vfio_display_edid_init(vdev, errp);
 }
 
 static void vfio_display_dmabuf_exit(VFIODisplay *dpy)
-- 
2.34.1

Re: [PULL 00/16] Trivial patches for 2024-06-30

2024-06-30 Thread Richard Henderson


On 6/30/24 09:53, Michael Tokarev wrote:

The following changes since commit 3665dd6bb9043bef181c91e2dce9e1efff47ed51:

   Merge tag 'for-upstream' ofhttps://gitlab.com/bonzini/qemu  into staging 
(2024-06-28 16:09:38 -0700)

are available in the Git repository at:

   https://gitlab.com/mjt0k/qemu.git  tags/pull-trivial-patches

for you to fetch changes up to f22855dffdbc2906f744b5bcfea869cbb66b8fb2:

   hw/core/loader: gunzip(): fix memory leak on error path (2024-06-30 19:51:44 
+0300)


trivial patches for 2024-06-30


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

Re: [PATCH v3 1/4] hw/intc: Remove loongarch_ipi.c

2024-06-30 Thread maobibo


Hi Philippe,

On 2024/6/27 下午9:02, Philippe Mathieu-Daudé wrote:

On 27/6/24 04:44, gaosong wrote:

在 2024/6/26 下午8:10, Philippe Mathieu-Daudé 写道:

Hi Bibo,

On 26/6/24 06:11, maobibo wrote:



On 2024/6/5 上午10:15, Jiaxun Yang wrote:

It was missed out in previous commit.

Fixes: b4a12dfc2132 ("hw/intc/loongarch_ipi: Rename as loongson_ipi")
Signed-off-by: Jiaxun Yang 
---
  hw/intc/loongarch_ipi.c | 347 


  1 file changed, 347 deletions(-)




-static void loongarch_ipi_realize(DeviceState *dev, Error **errp)
-{
-    LoongArchIPI *s = LOONGARCH_IPI(dev);
-    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
-    int i;
-
-    if (s->num_cpu == 0) {
-    error_setg(errp, "num-cpu must be at least 1");
-    return;
-    }
-
-    memory_region_init_io(>ipi_iocsr_mem, OBJECT(dev), 
_ipi_ops,

-  s, "loongarch_ipi_iocsr", 0x48);
-
-    /* loongarch_ipi_iocsr performs re-entrant IO through ipi_send */
-    s->ipi_iocsr_mem.disable_reentrancy_guard = true;
-
-    sysbus_init_mmio(sbd, >ipi_iocsr_mem);
-
-    memory_region_init_io(>ipi64_iocsr_mem, OBJECT(dev),
-  _ipi64_ops,
-  s, "loongarch_ipi64_iocsr", 0x118);
-    sysbus_init_mmio(sbd, >ipi64_iocsr_mem);

It is different with existing implementation.

With hw/intc/loongson_ipi.c, every vcpu has one ipi_mmio_mem, 
however on loongarch ipi machine, there is no ipi_mmio_mem memory 
region.


So if machine has 256 vcpus, there will be 256 ipi_mmio_mem memory 
regions. In function sysbus_init_mmio(), memory region can not exceed

QDEV_MAX_MMIO (32).  With so many memory regions, it slows down memory
region search speed also.

void sysbus_init_mmio(SysBusDevice *dev, MemoryRegion *memory)
{
 int n;

 assert(dev->num_mmio < QDEV_MAX_MMIO);
 n = dev->num_mmio++;
 dev->mmio[n].addr = -1;
 dev->mmio[n].memory = memory;
}

Can we revert this patch? We want to do production usable by real 
users rather than show pure technology.


Since commit b4a12dfc2132 this file is not built/tested anymore:

-specific_ss.add(when: 'CONFIG_LOONGARCH_IPI', if_true: 
files('loongarch_ipi.c'))
+specific_ss.add(when: 'CONFIG_LOONGSON_IPI', if_true: 
files('loongson_ipi.c'))


We don't want to maintain dead code.


Hi,  Philippe

It is commmit 49eba52a5 that causes Loongarch to fail to start.

What bibao means is that LoongArch and mips do not share 
"lloongson_ipi.c".

This avoids mutual influence.


My understanding of the next sentence is as follows.

Nowadays, most of the open source operating systems in China use the 
latest QEMU.
e.g. OpenEuler/OpenAnolis/OpenCloudOS, etc. These operating systems 
have a large
  number of real users. so we need to maintain the stability of the 
LoongArch architecture
of the QEMU community as much as possible. This will reduce 
maintenance costs.


I'm glad there is a such large number of users :)

Therefore, we would like to restore the 'loongarch_ipi.c' file. what 
do you think?


My preference on "reducing maintenance cost" is code reuse instead of
duplication.

Before reverting, lets try to fix the issue. I suggested a v2:
https://lore.kernel.org/qemu-devel/20240627125819.62779-2-phi...@linaro.org

Sorry for late reply.

How about split loongson_ipi.c into 
loongson_ipi_base.c/loongson_ipi_loongson.c/loongson_ipi_loongarch.c,


File loongson_ipi_base.c contains the common code, loongson_ipi_xxx.c 
contains arch specific. Soon we will submit irqchip in kernel function,
it will be much different for architectures, since ioctl command is 
different for two architectures to save/restore ipi registers.


Regards
Bibo Mao



That said, both current patch and the suggested fix pass our
Avocado CI test suite (running tests/avocado/machine_loongarch.py).

Is your use case not covered? Could you expand the CI tests so
we don't hit this problem again? (Also we could reproduce and
fix more easily).

Thanks,

Phil.

Re: [PATCH 1/2] vfio/display: Fix potential memleak of edid info

2024-06-30 Thread Duan, Zhenzhong


Hi,

On 6/29/2024 8:15 PM, Marc-André Lureau wrote:

Hi

On Fri, Jun 28, 2024 at 1:32 PM Zhenzhong Duan 
 wrote:


EDID related device region info is leaked in three paths:
1. In vfio_get_dev_region_info(), when edid info isn't find, the last
device region info is leaked.
2. In vfio_display_edid_init() error path, edid info is leaked.
3. In VFIODisplay destroying path, edid info is leaked.

Fixes: 08479114b0de ("vfio/display: add edid support.")
Signed-off-by: Zhenzhong Duan 
---
 hw/vfio/display.c | 2 ++
 hw/vfio/helpers.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index 661e921616..5926bd6628 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -171,6 +171,7 @@ static void
vfio_display_edid_init(VFIOPCIDevice *vdev)

 err:
     trace_vfio_display_edid_write_error();
+    g_free(dpy->edid_info);


It would be better to set it to NULL.

Will do.


     g_free(dpy->edid_regs);
     dpy->edid_regs = NULL;
     return;
@@ -182,6 +183,7 @@ static void vfio_display_edid_exit(VFIODisplay
*dpy)
         return;
     }

+    g_free(dpy->edid_info);
     g_free(dpy->edid_regs);
     g_free(dpy->edid_blob);
     timer_free(dpy->edid_link_timer);
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index b14edd46ed..3dd32b26a4 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -586,6 +586,7 @@ int vfio_get_dev_region_info(VFIODevice
*vbasedev, uint32_t type,
         g_free(*info);
     }

+    g_free(*info);


This seems incorrect, it is freed at the end of the loop above if it 
didn't retun.


Good catch! Will remove it.

Thanks

Zhenzhong



     *info = NULL;
     return -ENODEV;
 }
-- 
2.34.1





--
Marc-André Lureau

[RFC PATCH] target/ppc: Inline most of dcbz helper

2024-06-30 Thread BALATON Zoltan

This is an RFC patch, not finished, just to show the idea and test
this approach. I'm not sure it's correct but I'm sure it can be
improved so comments are requested.

The test case I've used came out of a discussion about very slow
access to VRAM of a graphics card passed through with vfio the reason
for which is still not clear but it was already known that dcbz is
often used by MacOS and AmigaOS for clearing memory and to avoid
reading values about to be overwritten which is faster on real CPU but
was found to be slower on QEMU. The optimised copy routines were
posted here:
https://www.amigans.net/modules/newbb/viewtopic.php?post_id=149123#forumpost149123
and the rest of it I've written to make it a test case is here:
http://zero.eik.bme.hu/~balaton/qemu/vramcopy.tar.xz
Replace the body of has_altivec() with just "return false". Sorry for
only giving pieces but the code posted above has a copyright that does
not allow me to include it in the test. This is not measuring VRAM
access now just memory copy but shows the effect of dcbz. I've got
these results with this patch:

Linux user master:  Linux user patch:
byte loop: 2.2 sec  byte loop: 2.2 sec
memcpy: 2.19 secmemcpy: 2.19 sec
copyToVRAMNoAltivec: 1.7 seccopyToVRAMNoAltivec: 1.71 sec
copyToVRAMAltivec: 2.13 sec copyToVRAMAltivec: 2.12 sec
copyFromVRAMNoAltivec: 5.11 sec copyFromVRAMNoAltivec: 2.79 sec
copyFromVRAMAltivec: 5.87 sec   copyFromVRAMAltivec: 3.26 sec

Linux system master:Linux system patch:
byte loop: 5.86 sec byte loop: 5.9 sec
memcpy: 5.45 secmemcpy: 5.47 sec
copyToVRAMNoAltivec: 2.51 sec   copyToVRAMNoAltivec: 2.53 sec
copyToVRAMAltivec: 3.84 sec copyToVRAMAltivec: 3.85 sec
copyFromVRAMNoAltivec: 6.11 sec copyFromVRAMNoAltivec: 3.92 sec
copyFromVRAMAltivec: 7.22 sec   copyFromVRAMAltivec: 5.51 sec

It could probably be further optimised with using vector instuctions
(dcbz_size is between 32 and 128) or by eliminating the check left in
the helper for 970 but I don't know how to do those. (Also the series
that convert AltiVec to use 128 bit access may help but I haven't
tested that, only trying to optimise dcbz here,)

Signed-off-by: BALATON Zoltan 
---
 target/ppc/helper.h |  1 +
 target/ppc/mem_helper.c | 14 ++
 target/ppc/translate.c  | 34 --
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 76b8f25c77..e49681c25b 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -46,6 +46,7 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
 DEF_HELPER_4(lsw, void, env, tl, i32, i32)
 DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
 DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
+DEF_HELPER_FLAGS_2(dcbz_size, TCG_CALL_NO_WG_SE, tl, env, i32)
 DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, i32)
 DEF_HELPER_FLAGS_3(dcbzep, TCG_CALL_NO_WG, void, env, tl, i32)
 DEF_HELPER_FLAGS_2(icbi, TCG_CALL_NO_WG, void, env, tl)
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index f88155ad45..b06cb2d00e 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -270,6 +270,20 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, 
uint32_t nb,
 }
 }
 
+target_ulong helper_dcbz_size(CPUPPCState *env, uint32_t opcode)
+{
+target_ulong dcbz_size = env->dcache_line_size;
+
+#if defined(TARGET_PPC64)
+/* Check for dcbz vs dcbzl on 970 */
+if (env->excp_model == POWERPC_EXCP_970 &&
+!(opcode & 0x0020) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
+dcbz_size = 32;
+}
+#endif
+return dcbz_size;
+}
+
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
 uint32_t opcode, bool epid, uintptr_t retaddr)
 {
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 0bc16d7251..49221b8303 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4445,14 +4445,36 @@ static void gen_dcblc(DisasContext *ctx)
 /* dcbz */
 static void gen_dcbz(DisasContext *ctx)
 {
-TCGv tcgv_addr;
-TCGv_i32 tcgv_op;
+TCGv addr, mask, dcbz_size, t0;
+TCGv_i32 op = tcg_constant_i32(ctx->opcode & 0x03FF000);
+TCGv_i64 z64 = tcg_constant_i64(0);
+TCGv_i128 z128 = tcg_temp_new_i128();
+TCGLabel *l;
+
+addr = tcg_temp_new();
+mask = tcg_temp_new();
+dcbz_size = tcg_temp_new();
+t0 = tcg_temp_new();
+l = gen_new_label();
 
 gen_set_access_type(ctx, ACCESS_CACHE);
-tcgv_addr = tcg_temp_new();
-tcgv_op = tcg_constant_i32(ctx->opcode & 0x03FF000);
-gen_addr_reg_index(ctx, tcgv_addr);
-gen_helper_dcbz(tcg_env, tcgv_addr, tcgv_op);
+gen_helper_dcbz_size(dcbz_size, tcg_env, op);
+tcg_gen_mov_tl(mask, dcbz_size);
+tcg_gen_subi_tl(mask, mask, 1);
+tcg_gen_not_tl(mask, mask);
+

RE: [PATCH] hw/usb: Fix memory leak in musb_reset()

2024-06-30 Thread Xingtao Yao (Fujitsu)




> -Original Message-
> From: qemu-devel-bounces+yaoxt.fnst=fujitsu@nongnu.org
>  On Behalf Of Zheyu
> Ma
> Sent: Monday, July 1, 2024 12:32 AM
> Cc: Zheyu Ma ; qemu-devel@nongnu.org
> Subject: [PATCH] hw/usb: Fix memory leak in musb_reset()
> 
> The musb_reset function was causing a memory leak by not properly freeing
> the memory associated with USBPacket instances before reinitializing them.
> This commit addresses the memory leak by adding calls to usb_packet_cleanup
> for each USBPacket instance before reinitializing them with usb_packet_init.
> 
> Asan log:
> 
> =2970623==ERROR: LeakSanitizer: detected memory leaks
> Direct leak of 256 byte(s) in 16 object(s) allocated from:
> #0 0x561e20629c3d in malloc
> llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
> #1 0x7fee91885738 in g_malloc
> (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5e738)
> #2 0x561e21b4d0e1 in usb_packet_init hw/usb/core.c:531:5
> #3 0x561e21c5016b in musb_reset hw/usb/hcd-musb.c:372:9
> #4 0x561e21c502a9 in musb_init hw/usb/hcd-musb.c:385:5
> #5 0x561e21c893ef in tusb6010_realize hw/usb/tusb6010.c:827:15
> #6 0x561e23443355 in device_set_realized hw/core/qdev.c:510:13
> #7 0x561e2346ac1b in property_set_bool qom/object.c:2354:5
> #8 0x561e23463895 in object_property_set qom/object.c:1463:5
> #9 0x561e23477909 in object_property_set_qobject qom/qom-qobject.c:28:10
> #10 0x561e234645ed in object_property_set_bool qom/object.c:1533:15
> #11 0x561e2343c830 in qdev_realize hw/core/qdev.c:291:12
> #12 0x561e2343c874 in qdev_realize_and_unref hw/core/qdev.c:298:11
> #13 0x561e20ad5091 in sysbus_realize_and_unref hw/core/sysbus.c:261:12
> #14 0x561e22553283 in n8x0_usb_setup hw/arm/nseries.c:800:5
> #15 0x561e2254e99b in n8x0_init hw/arm/nseries.c:1356:5
> #16 0x561e22561170 in n810_init hw/arm/nseries.c:1418:5
> 
> Signed-off-by: Zheyu Ma 
> ---
>  hw/usb/hcd-musb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/usb/hcd-musb.c b/hw/usb/hcd-musb.c
> index 6dca373cb1..0300aeaec6 100644
> --- a/hw/usb/hcd-musb.c
> +++ b/hw/usb/hcd-musb.c
> @@ -368,6 +368,8 @@ void musb_reset(MUSBState *s)
>  s->ep[i].maxp[1] = 0x40;
>  s->ep[i].musb = s;
>  s->ep[i].epnum = i;
> +usb_packet_cleanup(>ep[i].packey[0].p);
> +usb_packet_cleanup(>ep[i].packey[1].p);
>  usb_packet_init(>ep[i].packey[0].p);
>  usb_packet_init(>ep[i].packey[1].p);
>  }
> --
> 2.34.1
> 
Reviewed-by: Xingtao Yao

Re: [PULL 0/1] ufs queue

2024-06-30 Thread Richard Henderson


On 6/29/24 20:52, Jeuk Kim wrote:

From: Jeuk Kim

The following changes since commit 3665dd6bb9043bef181c91e2dce9e1efff47ed51:

   Merge tag 'for-upstream' ofhttps://gitlab.com/bonzini/qemu  into staging 
(2024-06-28 16:09:38 -0700)

are available in the Git repository at:

   https://gitlab.com/jeuk20.kim/qemu.git  tags/pull-ufs-20240630

for you to fetch changes up to e12b11f6f29272ee31ccde6b0db1a10139e87083:

   hw/ufs: Fix potential bugs in MMIO read|write (2024-06-30 12:44:32 +0900)


hw/ufs: fix coverity issue


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

[RFC V1 2/6] migration: VMSTATE_FD

2024-06-30 Thread Steve Sistare

Define VMSTATE_FD for declaring a file descriptor field in a
VMStateDescription.

Signed-off-by: Steve Sistare 
---
 include/migration/vmstate.h |  9 +
 migration/vmstate-types.c   | 32 
 2 files changed, 41 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index f313f2f..a1dfab4 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -230,6 +230,7 @@ extern const VMStateInfo vmstate_info_uint8;
 extern const VMStateInfo vmstate_info_uint16;
 extern const VMStateInfo vmstate_info_uint32;
 extern const VMStateInfo vmstate_info_uint64;
+extern const VMStateInfo vmstate_info_fd;
 
 /** Put this in the stream when migrating a null pointer.*/
 #define VMS_NULLPTR_MARKER (0x30U) /* '0' */
@@ -902,6 +903,9 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_UINT64_V(_f, _s, _v)  \
 VMSTATE_SINGLE(_f, _s, _v, vmstate_info_uint64, uint64_t)
 
+#define VMSTATE_FD_V(_f, _s, _v)  \
+VMSTATE_SINGLE(_f, _s, _v, vmstate_info_fd, int32_t)
+
 #ifdef CONFIG_LINUX
 
 #define VMSTATE_U8_V(_f, _s, _v)   \
@@ -936,6 +940,9 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_UINT64(_f, _s)\
 VMSTATE_UINT64_V(_f, _s, 0)
 
+#define VMSTATE_FD(_f, _s)\
+VMSTATE_FD_V(_f, _s, 0)
+
 #ifdef CONFIG_LINUX
 
 #define VMSTATE_U8(_f, _s) \
@@ -1009,6 +1016,8 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_UINT64_TEST(_f, _s, _t)  \
 VMSTATE_SINGLE_TEST(_f, _s, _t, 0, vmstate_info_uint64, uint64_t)
 
+#define VMSTATE_FD_TEST(_f, _s, _t)
\
+VMSTATE_SINGLE_TEST(_f, _s, _t, 0, vmstate_info_fd, int32_t)
 
 #define VMSTATE_TIMER_PTR_TEST(_f, _s, _test) \
 VMSTATE_POINTER_TEST(_f, _s, _test, vmstate_info_timer, QEMUTimer *)
diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
index e83bfcc..6e45a4a 100644
--- a/migration/vmstate-types.c
+++ b/migration/vmstate-types.c
@@ -314,6 +314,38 @@ const VMStateInfo vmstate_info_uint64 = {
 .put  = put_uint64,
 };
 
+/* File descriptor communicated via SCM_RIGHTS */
+
+static int get_fd(QEMUFile *f, void *pv, size_t size,
+  const VMStateField *field)
+{
+int32_t *v = pv;
+qemu_get_sbe32s(f, v);
+if (*v < 0) {
+return 0;
+}
+*v = qemu_file_get_fd(f);
+return 0;
+}
+
+static int put_fd(QEMUFile *f, void *pv, size_t size,
+  const VMStateField *field, JSONWriter *vmdesc)
+{
+int32_t *v = pv;
+
+qemu_put_sbe32s(f, v);
+if (*v < 0) {
+return 0;
+}
+return qemu_file_put_fd(f, *v);
+}
+
+const VMStateInfo vmstate_info_fd = {
+.name = "fd",
+.get  = get_fd,
+.put  = put_fd,
+};
+
 static int get_nullptr(QEMUFile *f, void *pv, size_t size,
const VMStateField *field)
 
-- 
1.8.3.1

[RFC V1 6/6] migration: cpr-transfer mode

2024-06-30 Thread Steve Sistare

Add the cpr-transfer migration mode.  Usage:
  qemu-system-$arch -machine anon-alloc=memfd ...

  start new QEMU with "-incoming  -cpr-uri "

  Issue commands to old QEMU:
  migrate_set_parameter mode cpr-transfer
  migrate_set_parameter cpr-uri 
  migrate -d 

The migrate command stops the VM, saves CPR state to uri-2, saves
normal migration state to uri-1, and old QEMU enters the postmigrate
state.  The user starts new QEMU on the same host as old QEMU, with the
same arguments as old QEMU, plus the -incoming option.  Guest RAM is
preserved in place, albeit with new virtual addresses in new QEMU.

This mode requires a second migration channel, specified by the
cpr-uri migration property on the outgoing side, and by the cpr-uri
QEMU command-line option on the incoming side.  The channel must
be a type, such as unix socket, that supports SCM_RIGHTS.

Memory-backend objects must have the share=on attribute, but
memory-backend-epc is not supported.  The VM must be started with
the '-machine anon-alloc=memfd' option, which allows anonymous
memory to be transferred in place to the new process.  The memfds
are kept open by sending the descriptors to new QEMU via the
cpr-uri, which must support SCM_RIGHTS, and they are mmap'd
in new QEMU.

Signed-off-by: Steve Sistare 
---
 migration/cpr.c   |  9 -
 migration/migration.c | 37 +
 migration/ram.c   |  1 +
 migration/vmstate-types.c |  5 +++--
 qapi/migration.json   | 26 +-
 stubs/vmstate.c   |  7 +++
 6 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/migration/cpr.c b/migration/cpr.c
index 50c130c..7ac01a9 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -58,7 +58,7 @@ static const VMStateDescription vmstate_cpr_fd = {
 VMSTATE_UINT32(namelen, CprFd),
 VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
 VMSTATE_INT32(id, CprFd),
-VMSTATE_INT32(fd, CprFd),
+VMSTATE_FD(fd, CprFd),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -172,6 +172,8 @@ int cpr_state_save(Error **errp)
 
 if (mode == MIG_MODE_CPR_EXEC) {
 f = cpr_exec_output(errp);
+} else if (mode == MIG_MODE_CPR_TRANSFER) {
+f = cpr_transfer_output(migrate_cpr_uri(), errp);
 } else {
 return 0;
 }
@@ -209,6 +211,11 @@ int cpr_state_load(Error **errp)
  */
 if (cpr_exec_has_state()) {
 f = cpr_exec_input(errp);
+if (cpr_uri) {
+warn_report("ignoring cpr-uri option for migration mode cpr-exec");
+}
+} else if (cpr_uri) {
+f = cpr_transfer_input(cpr_uri, errp);
 } else {
 return 0;
 }
diff --git a/migration/migration.c b/migration/migration.c
index a4a020e..65a36a6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -77,6 +77,7 @@ static NotifierWithReturnList 
migration_state_notifiers[MIG_MODE__MAX] = {
 NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_NORMAL),
 NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_REBOOT),
 NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_EXEC),
+NOTIFIER_ELEM_INIT(migration_state_notifiers, MIG_MODE_CPR_TRANSFER),
 };
 
 /* Messages sent on the return path from destination to source */
@@ -205,6 +206,12 @@ 
migration_channels_and_transport_compatible(MigrationAddress *addr,
 return false;
 }
 
+if (migrate_mode() == MIG_MODE_CPR_TRANSFER &&
+addr->transport == MIGRATION_ADDRESS_TYPE_FILE) {
+error_setg(errp, "Migration requires streamable transport (eg unix)");
+return false;
+}
+
 return true;
 }
 
@@ -1697,6 +1704,7 @@ bool migrate_mode_is_cpr(MigrationState *s)
 {
 MigMode mode = s->parameters.mode;
 return mode == MIG_MODE_CPR_REBOOT ||
+   mode == MIG_MODE_CPR_TRANSFER ||
mode == MIG_MODE_CPR_EXEC;
 }
 
@@ -2038,6 +2046,12 @@ static bool migrate_prepare(MigrationState *s, bool 
resume, Error **errp)
 return false;
 }
 
+if (migrate_mode() == MIG_MODE_CPR_TRANSFER &&
+!s->parameters.cpr_uri) {
+error_setg(errp, "cpr-transfer mode requires setting cpr-uri");
+return false;
+}
+
 if (migration_is_blocked(errp)) {
 return false;
 }
@@ -2144,6 +2158,29 @@ void qmp_migrate(const char *uri, bool has_channels,
 goto out;
 }
 
+/*
+ * For cpr-transfer mode, the target first reads CPR state, which cannot
+ * complete until cpr_state_save above finishes, then the target creates
+ * the migration channel and listens.  We must wait for the channel to
+ * be created before connecting to it.
+ *
+ * This implementation of waiting is a hack.  It restricts the channel
+ * type, and will loop forever if the target dies.  It should be defined
+ * as a main-loop event that calls connect on the back end.
+ */
+if (s->parameters.mode == MIG_MODE_CPR_TRANSFER) {
+

[RFC V1 3/6] migration: cpr-transfer save and load

2024-06-30 Thread Steve Sistare

Add functions to create a QEMUFile based on a unix URI, for saving or
loading, for use by cpr-transfer mode to preserve CPR state.

Signed-off-by: Steve Sistare 
---
 include/migration/cpr.h  |  3 ++
 migration/cpr-transfer.c | 81 
 migration/meson.build|  1 +
 3 files changed, 85 insertions(+)
 create mode 100644 migration/cpr-transfer.c

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index c6c60f8..9aae94c 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -32,4 +32,7 @@ bool cpr_exec_has_state(void);
 void cpr_exec_unpersist_state(void);
 void cpr_mig_init(void);
 void cpr_unpreserve_fds(void);
+
+QEMUFile *cpr_transfer_output(const char *uri, Error **errp);
+QEMUFile *cpr_transfer_input(const char *uri, Error **errp);
 #endif
diff --git a/migration/cpr-transfer.c b/migration/cpr-transfer.c
new file mode 100644
index 000..fb9ecd8
--- /dev/null
+++ b/migration/cpr-transfer.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2022, 2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "io/channel-file.h"
+#include "io/channel-socket.h"
+#include "io/net-listener.h"
+#include "migration/cpr.h"
+#include "migration/migration.h"
+#include "migration/savevm.h"
+#include "migration/qemu-file.h"
+#include "migration/vmstate.h"
+
+QEMUFile *cpr_transfer_output(const char *uri, Error **errp)
+{
+g_autoptr(MigrationChannel) channel = NULL;
+QIOChannel *ioc;
+
+if (!migrate_uri_parse(uri, , errp)) {
+return NULL;
+}
+
+if (channel->addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+channel->addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX) {
+
+QIOChannelSocket *sioc = qio_channel_socket_new();
+SocketAddress *saddr = >addr->u.socket;
+
+if (qio_channel_socket_connect_sync(sioc, saddr, errp)) {
+object_unref(OBJECT(sioc));
+return NULL;
+}
+ioc = QIO_CHANNEL(sioc);
+
+} else {
+error_setg(errp, "bad cpr-uri %s; must be unix:", uri);
+return NULL;
+}
+
+qio_channel_set_name(ioc, "cpr-out");
+return qemu_file_new_output(ioc);
+}
+
+QEMUFile *cpr_transfer_input(const char *uri, Error **errp)
+{
+g_autoptr(MigrationChannel) channel = NULL;
+QIOChannel *ioc;
+
+if (!migrate_uri_parse(uri, , errp)) {
+return NULL;
+}
+
+if (channel->addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+channel->addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX) {
+
+QIOChannelSocket *sioc;
+SocketAddress *saddr = >addr->u.socket;
+QIONetListener *listener = qio_net_listener_new();
+
+qio_net_listener_set_name(listener, "cpr-socket-listener");
+if (qio_net_listener_open_sync(listener, saddr, 1, errp) < 0) {
+object_unref(OBJECT(listener));
+return NULL;
+}
+
+sioc = qio_net_listener_wait_client(listener);
+ioc = QIO_CHANNEL(sioc);
+
+} else {
+error_setg(errp, "bad cpr-uri %s; must be unix:", uri);
+return NULL;
+}
+
+qio_channel_set_name(ioc, "cpr-in");
+return qemu_file_new_input(ioc);
+}
diff --git a/migration/meson.build b/migration/meson.build
index dd1d315..f722980 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -15,6 +15,7 @@ system_ss.add(files(
   'channel-block.c',
   'cpr.c',
   'cpr-exec.c',
+  'cpr-transfer.c',
   'dirtyrate.c',
   'exec.c',
   'fd.c',
-- 
1.8.3.1

[RFC V1 1/6] migration: SCM_RIGHTS for QEMUFile

2024-06-30 Thread Steve Sistare

Define functions to put/get file descriptors to/from a QEMUFile, for qio
channels that support SCM_RIGHTS.  Maintain ordering such that
  put(A), put(fd), put(B)
followed by
  get(A), get(fd), get(B)
always succeeds.  Other get orderings may succeed but are not guaranteed.

Signed-off-by: Steve Sistare 
---
 migration/qemu-file.c  | 83 +++---
 migration/qemu-file.h  |  2 ++
 migration/trace-events |  2 ++
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b6d2f58..424c27d 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -37,6 +37,11 @@
 #define IO_BUF_SIZE 32768
 #define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
 
+typedef struct FdEntry {
+QTAILQ_ENTRY(FdEntry) entry;
+int fd;
+} FdEntry;
+
 struct QEMUFile {
 QIOChannel *ioc;
 bool is_writable;
@@ -51,6 +56,9 @@ struct QEMUFile {
 
 int last_error;
 Error *last_error_obj;
+
+bool fd_pass;
+QTAILQ_HEAD(, FdEntry) fds;
 };
 
 /*
@@ -109,6 +117,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool 
is_writable)
 object_ref(ioc);
 f->ioc = ioc;
 f->is_writable = is_writable;
+f->fd_pass = qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_FD_PASS);
+QTAILQ_INIT(>fds);
 
 return f;
 }
@@ -310,6 +320,10 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
 int len;
 int pending;
 Error *local_error = NULL;
+g_autofree int *fds = NULL;
+size_t nfd = 0;
+int **pfds = f->fd_pass ?  : NULL;
+size_t *pnfd = f->fd_pass ?  : NULL;
 
 assert(!qemu_file_is_writable(f));
 
@@ -325,10 +339,9 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
 }
 
 do {
-len = qio_channel_read(f->ioc,
-   (char *)f->buf + pending,
-   IO_BUF_SIZE - pending,
-   _error);
+struct iovec iov = { f->buf + pending, IO_BUF_SIZE - pending };
+len = qio_channel_readv_full(f->ioc, , 1, pfds, pnfd, 0,
+ _error);
 if (len == QIO_CHANNEL_ERR_BLOCK) {
 if (qemu_in_coroutine()) {
 qio_channel_yield(f->ioc, G_IO_IN);
@@ -348,9 +361,65 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
 qemu_file_set_error_obj(f, len, local_error);
 }
 
+for (int i = 0; i < nfd; i++) {
+FdEntry *fde = g_new0(FdEntry, 1);
+fde->fd = fds[i];
+QTAILQ_INSERT_TAIL(>fds, fde, entry);
+}
+
 return len;
 }
 
+int qemu_file_put_fd(QEMUFile *f, int fd)
+{
+int ret = 0;
+QIOChannel *ioc = qemu_file_get_ioc(f);
+Error *err = NULL;
+struct iovec iov = { (void *)" ", 1 };
+
+/*
+ * Send a dummy byte so qemu_fill_buffer on the receiving side does not
+ * fail with a len=0 error.  Flush first to maintain ordering wrt other
+ * data.
+ */
+
+qemu_fflush(f);
+if (qio_channel_writev_full(ioc, , 1, , 1, 0, ) < 1) {
+error_report_err(error_copy(err));
+qemu_file_set_error_obj(f, -EIO, err);
+ret = -1;
+}
+trace_qemu_file_put_fd(f->ioc->name, fd, ret);
+return 0;
+}
+
+int qemu_file_get_fd(QEMUFile *f)
+{
+int fd = -1;
+FdEntry *fde;
+
+if (!f->fd_pass) {
+Error *err = NULL;
+error_setg(, "%s does not support fd passing", f->ioc->name);
+error_report_err(error_copy(err));
+qemu_file_set_error_obj(f, -EIO, err);
+goto out;
+}
+
+/* Force the dummy byte and its fd passenger to appear. */
+qemu_peek_byte(f, 0);
+
+fde = QTAILQ_FIRST(>fds);
+if (fde) {
+qemu_get_byte(f);   /* Drop the dummy byte */
+fd = fde->fd;
+QTAILQ_REMOVE(>fds, fde, entry);
+}
+out:
+trace_qemu_file_get_fd(f->ioc->name, fd);
+return fd;
+}
+
 /** Closes the file
  *
  * Returns negative error value if any error happened on previous operations or
@@ -361,11 +430,17 @@ static ssize_t coroutine_mixed_fn 
qemu_fill_buffer(QEMUFile *f)
  */
 int qemu_fclose(QEMUFile *f)
 {
+FdEntry *fde, *next;
 int ret = qemu_fflush(f);
 int ret2 = qio_channel_close(f->ioc, NULL);
 if (ret >= 0) {
 ret = ret2;
 }
+QTAILQ_FOREACH_SAFE(fde, >fds, entry, next) {
+warn_report("qemu_fclose: received fd %d was never claimed", fde->fd);
+close(fde->fd);
+g_free(fde);
+}
 g_clear_pointer(>ioc, object_unref);
 error_free(f->last_error_obj);
 g_free(f);
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 11c2120..3e47a20 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -79,5 +79,7 @@ size_t qemu_get_buffer_at(QEMUFile *f, const uint8_t *buf, 
size_t buflen,
   off_t pos);
 
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
+int qemu_file_put_fd(QEMUFile *f, int fd);
+int qemu_file_get_fd(QEMUFile *f);

[RFC V1 5/6] migration: cpr-uri option

2024-06-30 Thread Steve Sistare

Define the cpr-uri QEMU command-line option to specify the URI from
which CPR vmstate is loaded for cpr-transfer mode.

Signed-off-by: Steve Sistare 
---
 include/migration/cpr.h | 1 +
 migration/cpr.c | 7 +++
 qemu-options.hx | 8 
 system/vl.c | 3 +++
 4 files changed, 19 insertions(+)

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 9aae94c..bc62493 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -22,6 +22,7 @@ int cpr_find_fd(const char *name, int id);
 int cpr_walk_fd(cpr_walk_fd_cb cb);
 void cpr_resave_fd(const char *name, int id, int fd);
 
+void cpr_set_cpr_uri(const char *uri);
 int cpr_state_save(Error **errp);
 int cpr_state_load(Error **errp);
 
diff --git a/migration/cpr.c b/migration/cpr.c
index f756c15..50c130c 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -157,6 +157,13 @@ static const VMStateDescription vmstate_cpr_state = {
 };
 /*/
 
+static char *cpr_uri;
+
+void cpr_set_cpr_uri(const char *uri)
+{
+cpr_uri = g_strdup(uri);
+}
+
 int cpr_state_save(Error **errp)
 {
 int ret;
diff --git a/qemu-options.hx b/qemu-options.hx
index 595b693..a6b5253 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4776,6 +4776,14 @@ SRST
 
 ERST
 
+DEF("cpr-uri", HAS_ARG, QEMU_OPTION_cpr_uri, \
+"-cpr-uri unix:socketpath\n",
+QEMU_ARCH_ALL)
+SRST
+``-cpr-uri unix:socketpath``
+URI for incoming CPR state, for the cpr-transfer migration mode.
+ERST
+
 DEF("incoming", HAS_ARG, QEMU_OPTION_incoming, \
 "-incoming tcp:[host]:port[,to=maxport][,ipv4=on|off][,ipv6=on|off]\n" \
 "-incoming rdma:host:port[,ipv4=on|off][,ipv6=on|off]\n" \
diff --git a/system/vl.c b/system/vl.c
index 6521ee3..32015ac 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -3483,6 +3483,9 @@ void qemu_init(int argc, char **argv)
 exit(1);
 }
 break;
+case QEMU_OPTION_cpr_uri:
+cpr_set_cpr_uri(optarg);
+break;
 case QEMU_OPTION_incoming:
 if (!incoming) {
 runstate_set(RUN_STATE_INMIGRATE);
-- 
1.8.3.1

[RFC V1 0/6] Live update: cpr-transfer

2024-06-30 Thread Steve Sistare

What?

This patch series adds the live migration cpr-transfer mode, which
allows the user to transfer a guest to a new QEMU instance on the same
host.  It is identical to cpr-exec in most respects, except as described
below. 

The new user-visible interfaces are:
  * cpr-transfer (MigMode migration parameter)
  * cpr-uri (migration parameter)
  * cpr-uri (command-line argument)

In this mode, the user starts new QEMU on the same host as old QEMU, with
the same arguments as old QEMU, plus the -incoming and the -cpr-uri options.
The user issues the migrate command to old QEMU, which stops the VM, saves
state to the migration channels, and enters the postmigrate state.  Execution
resumes in new QEMU.

This mode requires a second migration channel, specified by the cpr-uri
migration property on the outgoing side, and by the cpr-uri QEMU command-line
option on the incoming side.  The channel must be a type, such as unix socket,
that supports SCM_RIGHTS.

This series depends on the series "Live update: cpr-exec mode".

Why?

cpr-transfer offers the same benefits as cpr-exec mode, but with a model
for launching new QEMU that may be more natural for some management packages.

How?

The file descriptors are kept open by sending them to new QEMU via the
cpr-uri, which must support SCM_RIGHTS.

Example:

In this example, we simply restart the same version of QEMU, but in
a real scenario one would use a new QEMU binary path in terminal 2.

  Terminal 1: start old QEMU
  # qemu-kvm -monitor stdio -object
  memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on
  -m 4G -machine anon-alloc=memfd ...

  Terminal 2: start new QEMU
  # qemu-kvm ... -incoming unix:vm.sock -cpr-uri unix:cpr.sock

  Terminal 1:
  QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running
  (qemu) migrate_set_parameter mode cpr-transfer
  (qemu) migrate_set_parameter cpr-uri unix:cpr.sock
  (qemu) migrate -d unix:vm.sock
  (qemu) info status
  VM status: paused (postmigrate)

  Terminal 2:
  QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running

Steve Sistare (6):
  migration: SCM_RIGHTS for QEMUFile
  migration: VMSTATE_FD
  migration: cpr-transfer save and load
  migration: cpr-uri parameter
  migration: cpr-uri option
  migration: cpr-transfer mode

 include/migration/cpr.h|  4 ++
 include/migration/vmstate.h|  9 +
 migration/cpr-transfer.c   | 81 +
 migration/cpr.c| 16 +++-
 migration/meson.build  |  1 +
 migration/migration-hmp-cmds.c | 10 +
 migration/migration.c  | 37 +++
 migration/options.c| 29 +++
 migration/options.h|  1 +
 migration/qemu-file.c  | 83 --
 migration/qemu-file.h  |  2 +
 migration/ram.c|  1 +
 migration/trace-events |  2 +
 migration/vmstate-types.c  | 33 +
 qapi/migration.json| 38 ++-
 qemu-options.hx|  8 
 stubs/vmstate.c|  7 
 system/vl.c|  3 ++
 18 files changed, 359 insertions(+), 6 deletions(-)
 create mode 100644 migration/cpr-transfer.c

-- 
1.8.3.1

[RFC V1 4/6] migration: cpr-uri parameter

2024-06-30 Thread Steve Sistare

Define the cpr-uri migration parameter to specify the URI to which
CPR vmstate is saved for cpr-transfer mode.

Signed-off-by: Steve Sistare 
---
 migration/migration-hmp-cmds.c | 10 ++
 migration/options.c| 29 +
 migration/options.h|  1 +
 qapi/migration.json| 12 
 4 files changed, 52 insertions(+)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 16a4b00..4ede831 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -371,6 +371,11 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
params->direct_io ? "on" : "off");
 }
 
+assert(params->cpr_uri);
+monitor_printf(mon, "%s: '%s'\n",
+MigrationParameter_str(MIGRATION_PARAMETER_CPR_URI),
+params->cpr_uri);
+
 assert(params->has_cpr_exec_command);
 monitor_print_cpr_exec_command(mon, params->cpr_exec_command);
 }
@@ -650,6 +655,11 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_direct_io = true;
 visit_type_bool(v, param, >direct_io, );
 break;
+case MIGRATION_PARAMETER_CPR_URI:
+p->cpr_uri = g_new0(StrOrNull, 1);
+p->cpr_uri->type = QTYPE_QSTRING;
+visit_type_str(v, param, >cpr_uri->u.s, );
+break;
 case MIGRATION_PARAMETER_CPR_EXEC_COMMAND: {
 g_autofree char **strv = g_strsplit(valuestr ?: "", " ", -1);
 strList **tail = >cpr_exec_command;
diff --git a/migration/options.c b/migration/options.c
index b8d5f72..7526f9f 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -163,6 +163,8 @@ Property migration_properties[] = {
 DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
parameters.zero_page_detection,
ZERO_PAGE_DETECTION_MULTIFD),
+DEFINE_PROP_STRING("cpr-uri", MigrationState,
+   parameters.cpr_uri),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -848,6 +850,13 @@ ZeroPageDetection migrate_zero_page_detection(void)
 return s->parameters.zero_page_detection;
 }
 
+const char *migrate_cpr_uri(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.cpr_uri;
+}
+
 /* parameters helpers */
 
 AnnounceParameters *migrate_announce_params(void)
@@ -931,6 +940,7 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->zero_page_detection = s->parameters.zero_page_detection;
 params->has_direct_io = true;
 params->direct_io = s->parameters.direct_io;
+params->cpr_uri = g_strdup(s->parameters.cpr_uri);
 params->has_cpr_exec_command = true;
 params->cpr_exec_command = QAPI_CLONE(strList,
   s->parameters.cpr_exec_command);
@@ -967,6 +977,7 @@ void migrate_params_init(MigrationParameters *params)
 params->has_mode = true;
 params->has_zero_page_detection = true;
 params->has_direct_io = true;
+params->cpr_uri = g_strdup("");
 params->has_cpr_exec_command = true;
 }
 
@@ -1257,9 +1268,15 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 dest->direct_io = params->direct_io;
 }
 
+if (params->cpr_uri) {
+assert(params->cpr_uri->type == QTYPE_QSTRING);
+dest->cpr_uri = params->cpr_uri->u.s;
+}
+
 if (params->has_cpr_exec_command) {
 dest->cpr_exec_command = params->cpr_exec_command;
 }
+
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1390,6 +1407,12 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 s->parameters.direct_io = params->direct_io;
 }
 
+if (params->cpr_uri) {
+g_free(s->parameters.cpr_uri);
+assert(params->cpr_uri->type == QTYPE_QSTRING);
+s->parameters.cpr_uri = g_strdup(params->cpr_uri->u.s);
+}
+
 if (params->has_cpr_exec_command) {
 qapi_free_strList(s->parameters.cpr_exec_command);
 s->parameters.cpr_exec_command =
@@ -1421,6 +1444,12 @@ void qmp_migrate_set_parameters(MigrateSetParameters 
*params, Error **errp)
 params->tls_authz->u.s = strdup("");
 }
 
+if (params->cpr_uri && params->cpr_uri->type == QTYPE_QNULL) {
+qobject_unref(params->cpr_uri->u.n);
+params->cpr_uri->type = QTYPE_QSTRING;
+params->cpr_uri->u.s = strdup("");
+}
+
 migrate_params_test_apply(params, );
 
 if (!migrate_params_check(, errp)) {
diff --git a/migration/options.h b/migration/options.h
index a239702..7492fcf 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -85,6 +85,7 @@ const char *migrate_tls_creds(void);
 const char *migrate_tls_hostname(void);
 uint64_t migrate_xbzrle_cache_size(void);
 ZeroPageDetection migrate_zero_page_detection(void);

[PATCH V2 05/11] physmem: preserve ram blocks for cpr

2024-06-30 Thread Steve Sistare

Save the memfd for anonymous ramblocks in CPR state, along with a name
that uniquely identifies it.  The block's idstr is not yet set, so it
cannot be used for this purpose.  Find the saved memfd in new QEMU when
creating a block.  QEMU hard-codes the length of some internally-created
blocks, so to guard against that length changing, use lseek to get the
actual length of an incoming memfd.

Signed-off-by: Steve Sistare 
---
 system/physmem.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/system/physmem.c b/system/physmem.c
index efe95ff..e37352e 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -73,6 +73,7 @@
 
 #include "qapi/qapi-types-migration.h"
 #include "migration/options.h"
+#include "migration/cpr.h"
 #include "migration/vmstate.h"
 
 #include "qemu/range.h"
@@ -1641,6 +1642,19 @@ void qemu_ram_unset_idstr(RAMBlock *block)
 }
 }
 
+static char *cpr_name(RAMBlock *block)
+{
+MemoryRegion *mr = block->mr;
+const char *mr_name = memory_region_name(mr);
+g_autofree char *id = mr->dev ? qdev_get_dev_path(mr->dev) : NULL;
+
+if (id) {
+return g_strdup_printf("%s/%s", id, mr_name);
+} else {
+return g_strdup(mr_name);
+}
+}
+
 size_t qemu_ram_pagesize(RAMBlock *rb)
 {
 return rb->page_size;
@@ -1836,13 +1850,17 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp)
 } else if (new_block->flags & RAM_SHARED) {
 size_t max_length = new_block->max_length;
 MemoryRegion *mr = new_block->mr;
-const char *name = memory_region_name(mr);
+g_autofree char *name = cpr_name(new_block);
 
 new_block->mr->align = QEMU_VMALLOC_ALIGN;
+new_block->fd = cpr_find_fd(name, 0);
 
 if (new_block->fd == -1) {
 new_block->fd = qemu_memfd_create(name, max_length + mr->align,
   0, 0, 0, errp);
+cpr_save_fd(name, 0, new_block->fd);
+} else {
+new_block->max_length = lseek(new_block->fd, 0, SEEK_END);
 }
 
 if (new_block->fd >= 0) {
@@ -1852,6 +1870,7 @@ static void ram_block_add(RAMBlock *new_block, Error 
**errp)
  false, 0, errp);
 }
 if (!new_block->host) {
+cpr_delete_fd(name, 0);
 qemu_mutex_unlock_ramlist();
 return;
 }
@@ -2162,6 +2181,8 @@ static void reclaim_ramblock(RAMBlock *block)
 
 void qemu_ram_free(RAMBlock *block)
 {
+g_autofree char *name = NULL;
+
 if (!block) {
 return;
 }
@@ -2172,6 +2193,8 @@ void qemu_ram_free(RAMBlock *block)
 }
 
 qemu_mutex_lock_ramlist();
+name = cpr_name(block);
+cpr_delete_fd(name, 0);
 QLIST_REMOVE_RCU(block, next);
 ram_list.mru_block = NULL;
 /* Write list before version */
-- 
1.8.3.1

[PATCH V2 10/11] migration: cpr-exec save and load

2024-06-30 Thread Steve Sistare

To preserve CPR state across exec, create a QEMUFile based on a memfd, and
keep the memfd open across exec.  Save the value of the memfd in an
environment variable so post-exec QEMU can find it.

These new functions are called in a subsequent patch.

Signed-off-by: Steve Sistare 
---
 include/migration/cpr.h |  5 +++
 migration/cpr-exec.c| 95 +
 migration/meson.build   |  1 +
 3 files changed, 101 insertions(+)
 create mode 100644 migration/cpr-exec.c

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 42b4019..76d6ccb 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -25,4 +25,9 @@ void cpr_resave_fd(const char *name, int id, int fd);
 int cpr_state_save(Error **errp);
 int cpr_state_load(Error **errp);
 
+QEMUFile *cpr_exec_output(Error **errp);
+QEMUFile *cpr_exec_input(Error **errp);
+void cpr_exec_persist_state(QEMUFile *f);
+bool cpr_exec_has_state(void);
+void cpr_exec_unpersist_state(void);
 #endif
diff --git a/migration/cpr-exec.c b/migration/cpr-exec.c
new file mode 100644
index 000..5c40457
--- /dev/null
+++ b/migration/cpr-exec.c
@@ -0,0 +1,95 @@
+/*
+ * Copyright (c) 2021-2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/memfd.h"
+#include "qapi/error.h"
+#include "io/channel-file.h"
+#include "io/channel-socket.h"
+#include "migration/cpr.h"
+#include "migration/qemu-file.h"
+#include "migration/misc.h"
+#include "migration/vmstate.h"
+#include "sysemu/runstate.h"
+
+#define CPR_EXEC_STATE_NAME "QEMU_CPR_EXEC_STATE"
+
+static QEMUFile *qemu_file_new_fd_input(int fd, const char *name)
+{
+g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
+QIOChannel *ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, name);
+return qemu_file_new_input(ioc);
+}
+
+static QEMUFile *qemu_file_new_fd_output(int fd, const char *name)
+{
+g_autoptr(QIOChannelFile) fioc = qio_channel_file_new_fd(fd);
+QIOChannel *ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, name);
+return qemu_file_new_output(ioc);
+}
+
+void cpr_exec_persist_state(QEMUFile *f)
+{
+QIOChannelFile *fioc = QIO_CHANNEL_FILE(qemu_file_get_ioc(f));
+int mfd = dup(fioc->fd);
+char val[16];
+
+/* Remember mfd in environment for post-exec load */
+qemu_clear_cloexec(mfd);
+snprintf(val, sizeof(val), "%d", mfd);
+g_setenv(CPR_EXEC_STATE_NAME, val, 1);
+}
+
+static int cpr_exec_find_state(void)
+{
+const char *val = g_getenv(CPR_EXEC_STATE_NAME);
+int mfd;
+
+assert(val);
+g_unsetenv(CPR_EXEC_STATE_NAME);
+assert(!qemu_strtoi(val, NULL, 10, ));
+return mfd;
+}
+
+bool cpr_exec_has_state(void)
+{
+return g_getenv(CPR_EXEC_STATE_NAME) != NULL;
+}
+
+void cpr_exec_unpersist_state(void)
+{
+int mfd;
+const char *val = g_getenv(CPR_EXEC_STATE_NAME);
+
+g_unsetenv(CPR_EXEC_STATE_NAME);
+assert(val);
+assert(!qemu_strtoi(val, NULL, 10, ));
+close(mfd);
+}
+
+QEMUFile *cpr_exec_output(Error **errp)
+{
+int mfd = memfd_create(CPR_EXEC_STATE_NAME, 0);
+
+if (mfd < 0) {
+error_setg_errno(errp, errno, "memfd_create failed");
+return NULL;
+}
+
+return qemu_file_new_fd_output(mfd, CPR_EXEC_STATE_NAME);
+}
+
+QEMUFile *cpr_exec_input(Error **errp)
+{
+int mfd = cpr_exec_find_state();
+
+lseek(mfd, 0, SEEK_SET);
+return qemu_file_new_fd_input(mfd, CPR_EXEC_STATE_NAME);
+}
diff --git a/migration/meson.build b/migration/meson.build
index 87feb4c..dd1d315 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -14,6 +14,7 @@ system_ss.add(files(
   'channel.c',
   'channel-block.c',
   'cpr.c',
+  'cpr-exec.c',
   'dirtyrate.c',
   'exec.c',
   'fd.c',
-- 
1.8.3.1

[PATCH V2 11/11] migration: cpr-exec mode

2024-06-30 Thread Steve Sistare

Add the cpr-exec migration mode.  Usage:
  qemu-system-$arch -machine anon-alloc=memfd ...
  migrate_set_parameter mode cpr-exec
  migrate_set_parameter cpr-exec-command \
  ... -incoming  \
  migrate -d 

The migrate command stops the VM, saves state to uri-1,
directly exec's a new version of QEMU on the same host,
replacing the original process while retaining its PID, and
loads state from uri-1.  Guest RAM is preserved in place,
albeit with new virtual addresses.

The new QEMU process is started by exec'ing the command
specified by the @cpr-exec-command parameter.  The first word of
the command is the binary, and the remaining words are its
arguments.  The command may be a direct invocation of new QEMU,
or may be a non-QEMU command that exec's the new QEMU binary.

This mode creates a second migration channel that is not visible
to the user.  At the start of migration, old QEMU saves CPR state
to the second channel, and at the end of migration, it tells the
main loop to call cpr_exec.  New QEMU loads CPR state early, before
objects are created.

Because old QEMU terminates when new QEMU starts, one cannot
stream data between the two, so uri-1 must be a type,
such as a file, that accepts all data before old QEMU exits.
Otherwise, old QEMU may quietly block writing to the channel.

Memory-backend objects must have the share=on attribute, but
memory-backend-epc is not supported.  The VM must be started with
the '-machine anon-alloc=memfd' option, which allows anonymous
memory to be transferred in place to the new process.  The memfds
are kept open across exec by clearing the close-on-exec flag, their
values are saved in CPR state, and they are mmap'd in new qemu.

Note that the anon-alloc option is not related to memory-backend-memfd.
Later patches add support for memory-backend-memfd.

Signed-off-by: Steve Sistare 
---
 include/migration/cpr.h |  2 ++
 migration/cpr-exec.c| 85 +
 migration/cpr.c | 37 ++---
 migration/migration.c   | 14 ++--
 migration/ram.c |  2 ++
 qapi/migration.json | 24 +-
 6 files changed, 157 insertions(+), 7 deletions(-)

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 76d6ccb..c6c60f8 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -30,4 +30,6 @@ QEMUFile *cpr_exec_input(Error **errp);
 void cpr_exec_persist_state(QEMUFile *f);
 bool cpr_exec_has_state(void);
 void cpr_exec_unpersist_state(void);
+void cpr_mig_init(void);
+void cpr_unpreserve_fds(void);
 #endif
diff --git a/migration/cpr-exec.c b/migration/cpr-exec.c
index 5c40457..fd65435 100644
--- a/migration/cpr-exec.c
+++ b/migration/cpr-exec.c
@@ -11,8 +11,11 @@
 #include "qapi/error.h"
 #include "io/channel-file.h"
 #include "io/channel-socket.h"
+#include "block/block-global-state.h"
+#include "qemu/main-loop.h"
 #include "migration/cpr.h"
 #include "migration/qemu-file.h"
+#include "migration/migration.h"
 #include "migration/misc.h"
 #include "migration/vmstate.h"
 #include "sysemu/runstate.h"
@@ -93,3 +96,85 @@ QEMUFile *cpr_exec_input(Error **errp)
 lseek(mfd, 0, SEEK_SET);
 return qemu_file_new_fd_input(mfd, CPR_EXEC_STATE_NAME);
 }
+
+static int preserve_fd(int fd)
+{
+qemu_clear_cloexec(fd);
+return 0;
+}
+
+static int unpreserve_fd(int fd)
+{
+qemu_set_cloexec(fd);
+return 0;
+}
+
+static void cpr_preserve_fds(void)
+{
+cpr_walk_fd(preserve_fd);
+}
+
+void cpr_unpreserve_fds(void)
+{
+cpr_walk_fd(unpreserve_fd);
+}
+
+static void cpr_exec(char **argv)
+{
+MigrationState *s = migrate_get_current();
+Error *err = NULL;
+
+/*
+ * Clear the close-on-exec flag for all preserved fd's.  We cannot do so
+ * earlier because they should not persist across miscellaneous fork and
+ * exec calls that are performed during normal operation.
+ */
+cpr_preserve_fds();
+
+execvp(argv[0], argv);
+
+cpr_unpreserve_fds();
+
+error_setg_errno(, errno, "execvp %s failed", argv[0]);
+error_report_err(error_copy(err));
+migrate_set_state(>state, s->state, MIGRATION_STATUS_FAILED);
+migrate_set_error(s, err);
+
+migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
+
+if (s->block_inactive) {
+Error *local_err = NULL;
+bdrv_activate_all(_err);
+if (local_err) {
+error_report_err(local_err);
+return;
+} else {
+s->block_inactive = false;
+}
+}
+
+if (runstate_is_live(s->vm_old_state)) {
+vm_start();
+}
+}
+
+static int cpr_exec_notifier(NotifierWithReturn *notifier, MigrationEvent *e,
+ Error **errp)
+{
+MigrationState *s = migrate_get_current();
+
+if (e->type == MIG_EVENT_PRECOPY_DONE) {
+assert(s->state == MIGRATION_STATUS_COMPLETED);
+qemu_system_exec_request(cpr_exec, s->parameters.cpr_exec_command);
+} else if (e->type ==

[PATCH V2 03/11] migration: save cpr mode

2024-06-30 Thread Steve Sistare

Save the mode in CPR state, so the user does not need to explicitly specify
it for the target.  Modify migrate_mode() so it returns the incoming mode on
the target.

Signed-off-by: Steve Sistare 
---
 include/migration/cpr.h |  7 +++
 migration/cpr.c | 23 ++-
 migration/migration.c   |  1 +
 migration/options.c |  9 +++--
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 8e7e705..42b4019 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -8,6 +8,13 @@
 #ifndef MIGRATION_CPR_H
 #define MIGRATION_CPR_H
 
+#include "qapi/qapi-types-migration.h"
+
+#define MIG_MODE_NONE MIG_MODE__MAX
+
+MigMode cpr_get_incoming_mode(void);
+void cpr_set_incoming_mode(MigMode mode);
+
 typedef int (*cpr_walk_fd_cb)(int fd);
 void cpr_save_fd(const char *name, int id, int fd);
 void cpr_delete_fd(const char *name, int id);
diff --git a/migration/cpr.c b/migration/cpr.c
index 313e74e..1c296c6 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -21,10 +21,23 @@
 typedef QLIST_HEAD(CprFdList, CprFd) CprFdList;
 
 typedef struct CprState {
+MigMode mode;
 CprFdList fds;
 } CprState;
 
-static CprState cpr_state;
+static CprState cpr_state = {
+.mode = MIG_MODE_NONE,
+};
+
+MigMode cpr_get_incoming_mode(void)
+{
+return cpr_state.mode;
+}
+
+void cpr_set_incoming_mode(MigMode mode)
+{
+cpr_state.mode = mode;
+}
 
 //
 
@@ -124,11 +137,19 @@ void cpr_resave_fd(const char *name, int id, int fd)
 /*/
 #define CPR_STATE "CprState"
 
+static int cpr_state_presave(void *opaque)
+{
+cpr_state.mode = migrate_mode();
+return 0;
+}
+
 static const VMStateDescription vmstate_cpr_state = {
 .name = CPR_STATE,
 .version_id = 1,
 .minimum_version_id = 1,
+.pre_save = cpr_state_presave,
 .fields = (VMStateField[]) {
+VMSTATE_UINT32(mode, CprState),
 VMSTATE_QLIST_V(fds, CprState, 1, vmstate_cpr_fd, CprFd, next),
 VMSTATE_END_OF_LIST()
 }
diff --git a/migration/migration.c b/migration/migration.c
index e394ad7..0f47765 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -411,6 +411,7 @@ void migration_incoming_state_destroy(void)
 mis->postcopy_qemufile_dst = NULL;
 }
 
+cpr_set_incoming_mode(MIG_MODE_NONE);
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
diff --git a/migration/options.c b/migration/options.c
index 645f550..305397a 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -22,6 +22,7 @@
 #include "qapi/qmp/qnull.h"
 #include "sysemu/runstate.h"
 #include "migration/colo.h"
+#include "migration/cpr.h"
 #include "migration/misc.h"
 #include "migration.h"
 #include "migration-stats.h"
@@ -758,8 +759,12 @@ uint64_t migrate_max_postcopy_bandwidth(void)
 
 MigMode migrate_mode(void)
 {
-MigrationState *s = migrate_get_current();
-MigMode mode = s->parameters.mode;
+MigMode mode = cpr_get_incoming_mode();
+
+if (mode == MIG_MODE_NONE) {
+MigrationState *s = migrate_get_current();
+mode = s->parameters.mode;
+}
 
 assert(mode >= 0 && mode < MIG_MODE__MAX);
 return mode;
-- 
1.8.3.1

[PATCH V2 04/11] migration: stop vm earlier for cpr

2024-06-30 Thread Steve Sistare

Stop the vm earlier for cpr, to guarantee consistent device state when
CPR state is saved.

Signed-off-by: Steve Sistare 
---
 migration/migration.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 0f47765..8a8e927 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2077,6 +2077,7 @@ void qmp_migrate(const char *uri, bool has_channels,
 MigrationState *s = migrate_get_current();
 g_autoptr(MigrationChannel) channel = NULL;
 MigrationAddress *addr = NULL;
+bool stopped = false;
 
 /*
  * Having preliminary checks for uri and channel
@@ -2120,6 +2121,15 @@ void qmp_migrate(const char *uri, bool has_channels,
 }
 }
 
+if (migrate_mode_is_cpr(s)) {
+int ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+if (ret < 0) {
+error_setg(_err, "migration_stop_vm failed, error %d", -ret);
+goto out;
+}
+stopped = true;
+}
+
 if (cpr_state_save(_err)) {
 goto out;
 }
@@ -2155,6 +2165,9 @@ out:
 }
 migrate_fd_error(s, local_err);
 error_propagate(errp, local_err);
+if (stopped && runstate_is_live(s->vm_old_state)) {
+vm_start();
+}
 return;
 }
 }
@@ -3738,7 +3751,6 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 Error *local_err = NULL;
 uint64_t rate_limit;
 bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
-int ret;
 
 /*
  * If there's a previous error, free it and prepare for another one.
@@ -3810,14 +3822,6 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 return;
 }
 
-if (migrate_mode_is_cpr(s)) {
-ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
-if (ret < 0) {
-error_setg(_err, "migration_stop_vm failed, error %d", -ret);
-goto fail;
-}
-}
-
 if (migrate_background_snapshot()) {
 qemu_thread_create(>thread, "mig/snapshot",
 bg_migration_thread, s, QEMU_THREAD_JOINABLE);
-- 
1.8.3.1

[PATCH V2 08/11] vl: helper to request exec

2024-06-30 Thread Steve Sistare

Add a qemu_system_exec_request() hook that causes the main loop to exit and
exec a command using the specified arguments.  This will be used during CPR
to exec a new version of QEMU.

Signed-off-by: Steve Sistare 
---
 include/sysemu/runstate.h |  3 +++
 system/runstate.c | 29 +
 2 files changed, 32 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 0117d24..cb669cf 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -80,6 +80,8 @@ typedef enum WakeupReason {
 QEMU_WAKEUP_REASON_OTHER,
 } WakeupReason;
 
+typedef void (*qemu_exec_func)(char **exec_argv);
+
 void qemu_system_reset_request(ShutdownCause reason);
 void qemu_system_suspend_request(void);
 void qemu_register_suspend_notifier(Notifier *notifier);
@@ -91,6 +93,7 @@ void qemu_register_wakeup_support(void);
 void qemu_system_shutdown_request_with_code(ShutdownCause reason,
 int exit_code);
 void qemu_system_shutdown_request(ShutdownCause reason);
+void qemu_system_exec_request(qemu_exec_func func, const strList *args);
 void qemu_system_powerdown_request(void);
 void qemu_register_powerdown_notifier(Notifier *notifier);
 void qemu_register_shutdown_notifier(Notifier *notifier);
diff --git a/system/runstate.c b/system/runstate.c
index ec32e27..afc56e4 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -40,6 +40,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-run-state.h"
 #include "qapi/qapi-events-run-state.h"
+#include "qapi/type-helpers.h"
 #include "qemu/accel.h"
 #include "qemu/error-report.h"
 #include "qemu/job.h"
@@ -400,6 +401,8 @@ static NotifierList wakeup_notifiers =
 static NotifierList shutdown_notifiers =
 NOTIFIER_LIST_INITIALIZER(shutdown_notifiers);
 static uint32_t wakeup_reason_mask = ~(1 << QEMU_WAKEUP_REASON_NONE);
+qemu_exec_func exec_func;
+static char **exec_argv;
 
 ShutdownCause qemu_shutdown_requested_get(void)
 {
@@ -416,6 +419,11 @@ static int qemu_shutdown_requested(void)
 return qatomic_xchg(_requested, SHUTDOWN_CAUSE_NONE);
 }
 
+static int qemu_exec_requested(void)
+{
+return exec_argv != NULL;
+}
+
 static void qemu_kill_report(void)
 {
 if (!qtest_driver() && shutdown_signal) {
@@ -693,6 +701,23 @@ void qemu_system_shutdown_request(ShutdownCause reason)
 qemu_notify_event();
 }
 
+static void qemu_system_exec(void)
+{
+exec_func(exec_argv);
+
+/* exec failed */
+g_strfreev(exec_argv);
+exec_argv = NULL;
+exec_func = NULL;
+}
+
+void qemu_system_exec_request(qemu_exec_func func, const strList *args)
+{
+exec_func = func;
+exec_argv = strv_from_str_list(args);
+qemu_notify_event();
+}
+
 static void qemu_system_powerdown(void)
 {
 qapi_event_send_powerdown();
@@ -739,6 +764,10 @@ static bool main_loop_should_exit(int *status)
 if (qemu_suspend_requested()) {
 qemu_system_suspend();
 }
+if (qemu_exec_requested()) {
+qemu_system_exec();
+return false;
+}
 request = qemu_shutdown_requested();
 if (request) {
 qemu_kill_report();
-- 
1.8.3.1

[PATCH V2 01/11] machine: alloc-anon option

2024-06-30 Thread Steve Sistare

Allocate anonymous memory using mmap MAP_ANON or memfd_create depending
on the value of the anon-alloc machine property.  This affects
memory-backend-ram objects, guest RAM created with the global -m option
but without an associated memory-backend object and without the -mem-path
option, and various memory regions such as ROMs that are allocated when
devices are created.  This option does not affect memory-backend-file,
memory-backend-memfd, or memory-backend-epc objects.

The memfd option is intended to support new migration modes, in which the
memory region can be transferred in place to a new QEMU process, by sending
the memfd file descriptor to the process.  Memory contents are preserved,
and if the mode also transfers device descriptors, then pages that are
locked in memory for DMA remain locked.  This behavior is a pre-requisite
for supporting vfio, vdpa, and iommufd devices with the new modes.

To access the same memory in the old and new QEMU processes, the memory
must be mapped shared.  Therefore, the implementation always sets
RAM_SHARED if alloc-anon=memfd, except for memory-backend-ram, where the
user must explicitly specify the share option.  In lieu of defining a new
RAM flag, at the lowest level the implementation uses RAM_SHARED with fd=-1
as the condition for calling memfd_create.

Signed-off-by: Steve Sistare 
---
 hw/core/machine.c   | 24 
 include/hw/boards.h |  1 +
 qapi/machine.json   | 14 ++
 qemu-options.hx | 13 +
 system/memory.c | 12 +---
 system/physmem.c| 38 +-
 system/trace-events |  3 +++
 7 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 655d75c..7ca2ad0 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -454,6 +454,20 @@ static void machine_set_mem_merge(Object *obj, bool value, 
Error **errp)
 ms->mem_merge = value;
 }
 
+static int machine_get_anon_alloc(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->anon_alloc;
+}
+
+static void machine_set_anon_alloc(Object *obj, int value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->anon_alloc = value;
+}
+
 static bool machine_get_usb(Object *obj, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
@@ -1066,6 +1080,11 @@ static void machine_class_init(ObjectClass *oc, void 
*data)
 object_class_property_set_description(oc, "mem-merge",
 "Enable/disable memory merge support");
 
+object_class_property_add_enum(oc, "anon-alloc", "AnonAllocOption",
+   _lookup,
+   machine_get_anon_alloc,
+   machine_set_anon_alloc);
+
 object_class_property_add_bool(oc, "usb",
 machine_get_usb, machine_set_usb);
 object_class_property_set_description(oc, "usb",
@@ -1416,6 +1435,11 @@ static bool create_default_memdev(MachineState *ms, 
const char *path, Error **er
 if (!object_property_set_int(obj, "size", ms->ram_size, errp)) {
 goto out;
 }
+if (!object_property_set_bool(obj, "share",
+  ms->anon_alloc == ANON_ALLOC_OPTION_MEMFD,
+  errp)) {
+goto out;
+}
 object_property_add_child(object_get_objects_root(), mc->default_ram_id,
   obj);
 /* Ensure backend's memory region name is equal to mc->default_ram_id */
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 73ad319..77f16ad 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -383,6 +383,7 @@ struct MachineState {
 bool enable_graphics;
 ConfidentialGuestSupport *cgs;
 HostMemoryBackend *memdev;
+AnonAllocOption anon_alloc;
 /*
  * convenience alias to ram_memdev_id backend memory region
  * or to numa container memory region
diff --git a/qapi/machine.json b/qapi/machine.json
index 2fd3e9c..9173953 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1881,3 +1881,17 @@
 { 'command': 'x-query-interrupt-controllers',
   'returns': 'HumanReadableText',
   'features': [ 'unstable' ]}
+
+##
+# @AnonAllocOption:
+#
+# An enumeration of the options for allocating anonymous guest memory.
+#
+# @mmap: allocate using mmap MAP_ANON
+#
+# @memfd: allocate using memfd_create
+#
+# Since: 9.1
+##
+{ 'enum': 'AnonAllocOption',
+  'data': [ 'mmap', 'memfd' ] }
diff --git a/qemu-options.hx b/qemu-options.hx
index 8ca7f34..595b693 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "nvdimm=on|off controls NVDIMM support (default=off)\n"
 "memory-encryption=@var{} memory encryption object to use 
(default=none)\n"
 "hmat=on|off controls ACPI HMAT support (default=off)\n"
+"anon-alloc=mmap|memfd allocate anonymous guest RAM using 
mmap

[PATCH V2 07/11] oslib: qemu_clear_cloexec

2024-06-30 Thread Steve Sistare

Define qemu_clear_cloexec, analogous to qemu_set_cloexec.

Signed-off-by: Steve Sistare 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Marc-André Lureau 
Reviewed-by: Fabiano Rosas 
---
 include/qemu/osdep.h | 9 +
 util/oslib-posix.c   | 9 +
 util/oslib-win32.c   | 4 
 3 files changed, 22 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 191916f..6ebf192 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -662,6 +662,15 @@ ssize_t qemu_write_full(int fd, const void *buf, size_t 
count)
 
 void qemu_set_cloexec(int fd);
 
+/*
+ * Clear FD_CLOEXEC for a descriptor.
+ *
+ * The caller must guarantee that no other fork+exec's occur before the
+ * exec that is intended to inherit this descriptor, eg by suspending CPUs
+ * and blocking monitor commands.
+ */
+void qemu_clear_cloexec(int fd);
+
 /* Return a dynamically allocated directory path that is appropriate for 
storing
  * local state.
  *
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e764416..614c3e5 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -272,6 +272,15 @@ int qemu_socketpair(int domain, int type, int protocol, 
int sv[2])
 return ret;
 }
 
+void qemu_clear_cloexec(int fd)
+{
+int f;
+f = fcntl(fd, F_GETFD);
+assert(f != -1);
+f = fcntl(fd, F_SETFD, f & ~FD_CLOEXEC);
+assert(f != -1);
+}
+
 char *
 qemu_get_local_state_dir(void)
 {
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index b623830..c3e969a 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -222,6 +222,10 @@ void qemu_set_cloexec(int fd)
 {
 }
 
+void qemu_clear_cloexec(int fd)
+{
+}
+
 int qemu_get_thread_id(void)
 {
 return GetCurrentThreadId();
-- 
1.8.3.1

[PATCH V2 09/11] migration: cpr-exec-command parameter

2024-06-30 Thread Steve Sistare

Create the cpr-exec-command migration parameter, defined as a list of
strings.  It will be used for cpr-exec migration mode in a subsequent
patch, and contains forward references to cpr-exec mode in the qapi
doc.

No functional change, except that cpr-exec-command is shown by the
'info migrate' command.

Signed-off-by: Steve Sistare 
---
 hmp-commands.hx|  2 +-
 migration/migration-hmp-cmds.c | 25 +
 migration/options.c| 14 ++
 qapi/migration.json| 21 ++---
 4 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 06746f0..0e04eac 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1009,7 +1009,7 @@ ERST
 
 {
 .name   = "migrate_set_parameter",
-.args_type  = "parameter:s,value:s",
+.args_type  = "parameter:s,value:S",
 .params = "parameter value",
 .help   = "Set the parameter for migration",
 .cmd= hmp_migrate_set_parameter,
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7d608d2..16a4b00 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -229,6 +229,18 @@ void hmp_info_migrate_capabilities(Monitor *mon, const 
QDict *qdict)
 qapi_free_MigrationCapabilityStatusList(caps);
 }
 
+static void monitor_print_cpr_exec_command(Monitor *mon, strList *args)
+{
+monitor_printf(mon, "%s:",
+MigrationParameter_str(MIGRATION_PARAMETER_CPR_EXEC_COMMAND));
+
+while (args) {
+monitor_printf(mon, " %s", args->value);
+args = args->next;
+}
+monitor_printf(mon, "\n");
+}
+
 void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
 {
 MigrationParameters *params;
@@ -358,6 +370,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
MIGRATION_PARAMETER_DIRECT_IO),
params->direct_io ? "on" : "off");
 }
+
+assert(params->has_cpr_exec_command);
+monitor_print_cpr_exec_command(mon, params->cpr_exec_command);
 }
 
 qapi_free_MigrationParameters(params);
@@ -635,6 +650,16 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_direct_io = true;
 visit_type_bool(v, param, >direct_io, );
 break;
+case MIGRATION_PARAMETER_CPR_EXEC_COMMAND: {
+g_autofree char **strv = g_strsplit(valuestr ?: "", " ", -1);
+strList **tail = >cpr_exec_command;
+
+for (int i = 0; strv[i]; i++) {
+QAPI_LIST_APPEND(tail, strv[i]);
+}
+p->has_cpr_exec_command = true;
+break;
+}
 default:
 assert(0);
 }
diff --git a/migration/options.c b/migration/options.c
index 305397a..b8d5f72 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -931,6 +931,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->zero_page_detection = s->parameters.zero_page_detection;
 params->has_direct_io = true;
 params->direct_io = s->parameters.direct_io;
+params->has_cpr_exec_command = true;
+params->cpr_exec_command = QAPI_CLONE(strList,
+  s->parameters.cpr_exec_command);
 
 return params;
 }
@@ -964,6 +967,7 @@ void migrate_params_init(MigrationParameters *params)
 params->has_mode = true;
 params->has_zero_page_detection = true;
 params->has_direct_io = true;
+params->has_cpr_exec_command = true;
 }
 
 /*
@@ -1252,6 +1256,10 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 if (params->has_direct_io) {
 dest->direct_io = params->direct_io;
 }
+
+if (params->has_cpr_exec_command) {
+dest->cpr_exec_command = params->cpr_exec_command;
+}
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1381,6 +1389,12 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 if (params->has_direct_io) {
 s->parameters.direct_io = params->direct_io;
 }
+
+if (params->has_cpr_exec_command) {
+qapi_free_strList(s->parameters.cpr_exec_command);
+s->parameters.cpr_exec_command =
+QAPI_CLONE(strList, params->cpr_exec_command);
+}
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/qapi/migration.json b/qapi/migration.json
index 0f24206..20092d2 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -829,6 +829,10 @@
 # only has effect if the @mapped-ram capability is enabled.
 # (Since 9.1)
 #
+# @cpr-exec-command: Command to start the new QEMU process when @mode
+# is @cpr-exec.  The first list element is the program's filename,
+# the remainder its arguments. (Since 9.1)
+#
 # Features:
 #
 # @unstable: Members @x-checkpoint-delay and
@@ -854,7 +858,8 @@
'vcpu-dirty-limit',

[PATCH V2 00/11] Live update: cpr-exec

2024-06-30 Thread Steve Sistare

What?

This patch series adds the live migration cpr-exec mode, which allows
the user to update QEMU with minimal guest pause time, by preserving
guest RAM in place, albeit with new virtual addresses in new QEMU, and
by preserving device file descriptors.

The new user-visible interfaces are:
  * cpr-exec (MigMode migration parameter)
  * cpr-exec-command (migration parameter)
  * anon-alloc (command-line option for -machine)

The user sets the mode parameter before invoking the migrate command.
In this mode, the user issues the migrate command to old QEMU, which
stops the VM and saves state to the migration channels.  Old QEMU then
exec's new QEMU, replacing the original process while retaining its PID.
The user specifies the command to exec new QEMU in the migration parameter
cpr-exec-command.  The command must pass all old QEMU arguments to new
QEMU, plus the -incoming option.  Execution resumes in new QEMU.

Memory-backend objects must have the share=on attribute, but
memory-backend-epc is not supported.  The VM must be started
with the '-machine anon-alloc=memfd' option, which allows anonymous
memory to be transferred in place to the new process.

Why?

This mode has less impact on the guest than any other method of updating
in place.  The pause time is much lower, because devices need not be torn
down and recreated, DMA does not need to be drained and quiesced, and minimal
state is copied to new QEMU.  Further, there are no constraints on the guest.
By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
and suspending plus resuming vfio devices adds multiple seconds to the
guest pause time.  Lastly, there is no loss of connectivity to the guest,
because chardev descriptors remain open and connected.

These benefits all derive from the core design principle of this mode,
which is preserving open descriptors.  This approach is very general and
can be used to support a wide variety of devices that do not have hardware
support for live migration, including but not limited to: vfio, chardev,
vhost, vdpa, and iommufd.  Some devices need new kernel software interfaces
to allow a descriptor to be used in a process that did not originally open it.

In a containerized QEMU environment, cpr-exec reuses an existing QEMU
container and its assigned resources.  By contrast, consider a design in
which a new container is created on the same host as the target of the
CPR operation.  Resources must be reserved for the new container, while
the old container still reserves resources until the operation completes.
Avoiding over commitment requires extra work in the management layer.
This is one reason why a cloud provider may prefer cpr-exec.  A second reason
is that the container may include agents with their own connections to the
outside world, and such connections remain intact if the container is reused.

How?

All memory that is mapped by the guest is preserved in place.  Indeed,
it must be, because it may be the target of DMA requests, which are not
quiesced during cpr-exec.  All such memory must be mmap'able in new QEMU.
This is easy for named memory-backend objects, as long as they are mapped
shared, because they are visible in the file system in both old and new QEMU.
Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
so the memfd's can be sent to new QEMU.  Pages that were locked in memory
for DMA in old QEMU remain locked in new QEMU, because the descriptor of
the device that locked them remains open.

cpr-exec preserves descriptors across exec by clearing the CLOEXEC flag,
and by sending the unique name and value of each descriptor to new QEMU
via CPR state.

For device descriptors, new QEMU reuses the descriptor when creating the
device, rather than opening it again.  The same holds for chardevs.  For
memfd descriptors, new QEMU mmap's the preserved memfd when a ramblock
is created.

CPR state cannot be sent over the normal migration channel, because devices
and backends are created prior to reading the channel, so this mode sends
CPR state over a second migration channel that is not visible to the user.
New QEMU reads the second channel prior to creating devices or backends.

The exec itself is trivial.  After writing to the migration channels, the
migration code calls a new main-loop hook to perform the exec.

Example:

In this example, we simply restart the same version of QEMU, but in
a real scenario one would use a new QEMU binary path in cpr-exec-command.

  # qemu-kvm -monitor stdio -object
  memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on
  -m 4G -machine anon-alloc=memfd ...

  QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running
  (qemu) migrate_set_parameter mode cpr-exec
  (qemu) migrate_set_parameter cpr-exec-command qemu-kvm ... -incoming 
file:vm.state
  (qemu) migrate -d file:vm.state
  (qemu) QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status:

[PATCH V2 06/11] migration: fix mismatched GPAs during cpr

2024-06-30 Thread Steve Sistare

For new cpr modes, ramblock_is_ignored will always be true, because the
memory is preserved in place rather than copied.  However, for an ignored
block, parse_ramblock currently requires that the received address of the
block must match the address of the statically initialized region on the
target.  This fails for a PCI rom block, because the memory region address
is set when the guest writes to a BAR on the source, which does not occur
on the target, causing a "Mismatched GPAs" error during cpr migration.

To fix, unconditionally set the target's address to the source's address
if the target region does not have an address yet.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
---
 include/exec/memory.h | 12 
 migration/ram.c   | 15 +--
 system/memory.c   | 10 --
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index c26ede3..227169e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -811,6 +811,7 @@ struct MemoryRegion {
 bool ram_device;
 bool enabled;
 bool warning_printed; /* For reservations */
+bool has_addr;
 uint8_t vga_logging_count;
 MemoryRegion *alias;
 hwaddr alias_offset;
@@ -2408,6 +2409,17 @@ void memory_region_set_enabled(MemoryRegion *mr, bool 
enabled);
 void memory_region_set_address(MemoryRegion *mr, hwaddr addr);
 
 /*
+ * memory_region_set_address_only: set the address of a region.
+ *
+ * Same as memory_region_set_address, but without causing transaction side
+ * effects.
+ *
+ * @mr: the region to be updated
+ * @addr: new address, relative to container region
+ */
+void memory_region_set_address_only(MemoryRegion *mr, hwaddr addr);
+
+/*
  * memory_region_set_size: dynamically update the size of a region.
  *
  * Dynamically updates the size of a region.
diff --git a/migration/ram.c b/migration/ram.c
index edec1a2..eaf3151 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4059,12 +4059,15 @@ static int parse_ramblock(QEMUFile *f, RAMBlock *block, 
ram_addr_t length)
 }
 if (migrate_ignore_shared()) {
 hwaddr addr = qemu_get_be64(f);
-if (migrate_ram_is_ignored(block) &&
-block->mr->addr != addr) {
-error_report("Mismatched GPAs for block %s "
- "%" PRId64 "!= %" PRId64, block->idstr,
- (uint64_t)addr, (uint64_t)block->mr->addr);
-return -EINVAL;
+if (migrate_ram_is_ignored(block)) {
+if (!block->mr->has_addr) {
+memory_region_set_address_only(block->mr, addr);
+} else if (block->mr->addr != addr) {
+error_report("Mismatched GPAs for block %s "
+ "%" PRId64 "!= %" PRId64, block->idstr,
+ (uint64_t)addr, (uint64_t)block->mr->addr);
+return -EINVAL;
+}
 }
 }
 ret = rdma_block_notification_handle(f, block->idstr);
diff --git a/system/memory.c b/system/memory.c
index 28a837d..b7548bf 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2655,7 +2655,7 @@ static void 
memory_region_add_subregion_common(MemoryRegion *mr,
 for (alias = subregion->alias; alias; alias = alias->alias) {
 alias->mapped_via_alias++;
 }
-subregion->addr = offset;
+memory_region_set_address_only(subregion, offset);
 memory_region_update_container_subregions(subregion);
 }
 
@@ -2735,10 +2735,16 @@ static void memory_region_readd_subregion(MemoryRegion 
*mr)
 }
 }
 
+void memory_region_set_address_only(MemoryRegion *mr, hwaddr addr)
+{
+mr->addr = addr;
+mr->has_addr = true;
+}
+
 void memory_region_set_address(MemoryRegion *mr, hwaddr addr)
 {
 if (addr != mr->addr) {
-mr->addr = addr;
+memory_region_set_address_only(mr, addr);
 memory_region_readd_subregion(mr);
 }
 }
-- 
1.8.3.1

[PATCH V2 02/11] migration: cpr-state

2024-06-30 Thread Steve Sistare

CPR must save state that is needed after QEMU is restarted, when devices
are realized.  Thus the extra state cannot be saved in the migration stream,
as objects must already exist before that stream can be loaded.  Instead,
define auxilliary state structures and vmstate descriptions, not associated
with any registered object, and serialize the aux state to a cpr-specific
stream in cpr_state_save.  Deserialize in cpr_state_load after QEMU
restarts, before devices are realized.

Provide accessors for clients to register file descriptors for saving.
The mechanism for passing the fd's to the new process will be specific
to each migration mode, and added in subsequent patches.

Signed-off-by: Steve Sistare 
---
 include/migration/cpr.h |  21 ++
 migration/cpr.c | 188 
 migration/meson.build   |   1 +
 migration/migration.c   |   6 ++
 migration/trace-events  |   5 ++
 system/vl.c |   3 +
 6 files changed, 224 insertions(+)
 create mode 100644 include/migration/cpr.h
 create mode 100644 migration/cpr.c

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
new file mode 100644
index 000..8e7e705
--- /dev/null
+++ b/include/migration/cpr.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (c) 2021, 2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_CPR_H
+#define MIGRATION_CPR_H
+
+typedef int (*cpr_walk_fd_cb)(int fd);
+void cpr_save_fd(const char *name, int id, int fd);
+void cpr_delete_fd(const char *name, int id);
+int cpr_find_fd(const char *name, int id);
+int cpr_walk_fd(cpr_walk_fd_cb cb);
+void cpr_resave_fd(const char *name, int id, int fd);
+
+int cpr_state_save(Error **errp);
+int cpr_state_load(Error **errp);
+
+#endif
diff --git a/migration/cpr.c b/migration/cpr.c
new file mode 100644
index 000..313e74e
--- /dev/null
+++ b/migration/cpr.c
@@ -0,0 +1,188 @@
+/*
+ * Copyright (c) 2021-2024 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "migration/cpr.h"
+#include "migration/misc.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/vmstate.h"
+#include "sysemu/runstate.h"
+#include "trace.h"
+
+/*/
+/* cpr state container for all information to be saved. */
+
+typedef QLIST_HEAD(CprFdList, CprFd) CprFdList;
+
+typedef struct CprState {
+CprFdList fds;
+} CprState;
+
+static CprState cpr_state;
+
+//
+
+typedef struct CprFd {
+char *name;
+unsigned int namelen;
+int id;
+int fd;
+QLIST_ENTRY(CprFd) next;
+} CprFd;
+
+static const VMStateDescription vmstate_cpr_fd = {
+.name = "cpr fd",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(namelen, CprFd),
+VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
+VMSTATE_INT32(id, CprFd),
+VMSTATE_INT32(fd, CprFd),
+VMSTATE_END_OF_LIST()
+}
+};
+
+void cpr_save_fd(const char *name, int id, int fd)
+{
+CprFd *elem = g_new0(CprFd, 1);
+
+trace_cpr_save_fd(name, id, fd);
+elem->name = g_strdup(name);
+elem->namelen = strlen(name) + 1;
+elem->id = id;
+elem->fd = fd;
+QLIST_INSERT_HEAD(_state.fds, elem, next);
+}
+
+static CprFd *find_fd(CprFdList *head, const char *name, int id)
+{
+CprFd *elem;
+
+QLIST_FOREACH(elem, head, next) {
+if (!strcmp(elem->name, name) && elem->id == id) {
+return elem;
+}
+}
+return NULL;
+}
+
+void cpr_delete_fd(const char *name, int id)
+{
+CprFd *elem = find_fd(_state.fds, name, id);
+
+if (elem) {
+QLIST_REMOVE(elem, next);
+g_free(elem->name);
+g_free(elem);
+}
+
+trace_cpr_delete_fd(name, id);
+}
+
+int cpr_find_fd(const char *name, int id)
+{
+CprFd *elem = find_fd(_state.fds, name, id);
+int fd = elem ? elem->fd : -1;
+
+trace_cpr_find_fd(name, id, fd);
+return fd;
+}
+
+int cpr_walk_fd(cpr_walk_fd_cb cb)
+{
+CprFd *elem;
+
+QLIST_FOREACH(elem, _state.fds, next) {
+if (elem->fd >= 0 && cb(elem->fd)) {
+return 1;
+}
+}
+return 0;
+}
+
+void cpr_resave_fd(const char *name, int id, int fd)
+{
+CprFd *elem = find_fd(_state.fds, name, id);
+int old_fd = elem ? elem->fd : -1;
+
+if (old_fd < 0) {
+cpr_save_fd(name, id, fd);
+} else if (old_fd != fd) {
+error_setg(_fatal,
+   "internal error: cpr fd '%s' id %d value %d "
+   "already saved with a different value %d",
+   name, id,

[PATCH v4 03/14] tests/tcg/aarch64: Drop -fno-tree-loop-distribute-patterns

2024-06-30 Thread Richard Henderson

This option is not supported by clang, and is not required
in order to get sve code generation with gcc 12.

Signed-off-by: Richard Henderson 
---
 tests/tcg/aarch64/Makefile.softmmu-target | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/aarch64/Makefile.softmmu-target 
b/tests/tcg/aarch64/Makefile.softmmu-target
index 39d3f961c5..dd6d595830 100644
--- a/tests/tcg/aarch64/Makefile.softmmu-target
+++ b/tests/tcg/aarch64/Makefile.softmmu-target
@@ -39,7 +39,7 @@ memory: CFLAGS+=-DCHECK_UNALIGNED=1
 memory-sve: memory.c $(LINK_SCRIPT) $(CRT_OBJS) $(MINILIB_OBJS)
$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
 
-memory-sve: CFLAGS+=-DCHECK_UNALIGNED=1 -march=armv8.1-a+sve -O3 
-fno-tree-loop-distribute-patterns
+memory-sve: CFLAGS+=-DCHECK_UNALIGNED=1 -march=armv8.1-a+sve -O3
 
 TESTS+=memory-sve
 
-- 
2.34.1

[PATCH v4 13/14] tests/tcg/arm: Use vmrs/vmsr instead of mcr/mrc

2024-06-30 Thread Richard Henderson

Clang 14 generates

/home/rth/qemu/src/tests/tcg/arm/fcvt.c:431:9: error: invalid operand for 
instruction
asm("mrc p10, 7, r1, cr1, cr0, 0\n\t"
^
:1:6: note: instantiated into assembly here
mrc p10, 7, r1, cr1, cr0, 0
^
/home/rth/qemu/src/tests/tcg/arm/fcvt.c:432:32: error: invalid operand for 
instruction
"orr r1, r1, %[flags]\n\t"
   ^
:3:6: note: instantiated into assembly here
mcr p10, 7, r1, cr1, cr0, 0
^

This is perhaps a clang bug, but using the neon mnemonic is clearer.

Signed-off-by: Richard Henderson 
---
 tests/tcg/arm/fcvt.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/tcg/arm/fcvt.c b/tests/tcg/arm/fcvt.c
index d8c61cd29f..ecebbb0247 100644
--- a/tests/tcg/arm/fcvt.c
+++ b/tests/tcg/arm/fcvt.c
@@ -427,10 +427,9 @@ int main(int argc, char *argv[argc])
 
 /* And now with ARM alternative FP16 */
 #if defined(__arm__)
-/* See glibc sysdeps/arm/fpu_control.h */
-asm("mrc p10, 7, r1, cr1, cr0, 0\n\t"
+asm("vmrs r1, fpscr\n\t"
 "orr r1, r1, %[flags]\n\t"
-"mcr p10, 7, r1, cr1, cr0, 0\n\t"
+"vmsr fpscr, r1"
 : /* no output */ : [flags] "n" (1 << 26) : "r1" );
 #else
 asm("mrs x1, fpcr\n\t"
-- 
2.34.1

[PATCH v4 04/14] tests/tcg/aarch64: Explicitly specify register width

2024-06-30 Thread Richard Henderson

From: Akihiko Odaki 

clang version 18.1.6 assumes a register is 64-bit by default and
complains if a 32-bit value is given. Explicitly specify register width
when passing a 32-bit value.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20240627-tcg-v2-3-1690a8133...@daynix.com>
---
 tests/tcg/aarch64/bti-1.c | 6 +++---
 tests/tcg/aarch64/bti-3.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/tcg/aarch64/bti-1.c b/tests/tcg/aarch64/bti-1.c
index 99a879af23..1fada8108d 100644
--- a/tests/tcg/aarch64/bti-1.c
+++ b/tests/tcg/aarch64/bti-1.c
@@ -17,15 +17,15 @@ static void skip2_sigill(int sig, siginfo_t *info, 
ucontext_t *uc)
 #define BTI_JC"hint #38"
 
 #define BTYPE_1(DEST) \
-asm("mov %0,#1; adr x16, 1f; br x16; 1: " DEST "; mov %0,#0" \
+asm("mov %w0,#1; adr x16, 1f; br x16; 1: " DEST "; mov %w0,#0" \
 : "=r"(skipped) : : "x16")
 
 #define BTYPE_2(DEST) \
-asm("mov %0,#1; adr x16, 1f; blr x16; 1: " DEST "; mov %0,#0" \
+asm("mov %w0,#1; adr x16, 1f; blr x16; 1: " DEST "; mov %w0,#0" \
 : "=r"(skipped) : : "x16", "x30")
 
 #define BTYPE_3(DEST) \
-asm("mov %0,#1; adr x15, 1f; br x15; 1: " DEST "; mov %0,#0" \
+asm("mov %w0,#1; adr x15, 1f; br x15; 1: " DEST "; mov %w0,#0" \
 : "=r"(skipped) : : "x15")
 
 #define TEST(WHICH, DEST, EXPECT) \
diff --git a/tests/tcg/aarch64/bti-3.c b/tests/tcg/aarch64/bti-3.c
index 8c534c09d7..6a3bd037bc 100644
--- a/tests/tcg/aarch64/bti-3.c
+++ b/tests/tcg/aarch64/bti-3.c
@@ -11,15 +11,15 @@ static void skip2_sigill(int sig, siginfo_t *info, 
ucontext_t *uc)
 }
 
 #define BTYPE_1() \
-asm("mov %0,#1; adr x16, 1f; br x16; 1: hint #25; mov %0,#0" \
+asm("mov %w0,#1; adr x16, 1f; br x16; 1: hint #25; mov %w0,#0" \
 : "=r"(skipped) : : "x16", "x30")
 
 #define BTYPE_2() \
-asm("mov %0,#1; adr x16, 1f; blr x16; 1: hint #25; mov %0,#0" \
+asm("mov %w0,#1; adr x16, 1f; blr x16; 1: hint #25; mov %w0,#0" \
 : "=r"(skipped) : : "x16", "x30")
 
 #define BTYPE_3() \
-asm("mov %0,#1; adr x15, 1f; br x15; 1: hint #25; mov %0,#0" \
+asm("mov %w0,#1; adr x15, 1f; br x15; 1: hint #25; mov %w0,#0" \
 : "=r"(skipped) : : "x15", "x30")
 
 #define TEST(WHICH, EXPECT) \
-- 
2.34.1

[PATCH v4 10/14] tests/tcg/arm: Use -fno-integrated-as for test-arm-iwmmxt

2024-06-30 Thread Richard Henderson

Clang does not support IWMXT instructions.
Fall back to the external assembler.

Signed-off-by: Richard Henderson 
---
 tests/tcg/arm/Makefile.target | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/tcg/arm/Makefile.target b/tests/tcg/arm/Makefile.target
index 0a1965fce7..95f891bf8c 100644
--- a/tests/tcg/arm/Makefile.target
+++ b/tests/tcg/arm/Makefile.target
@@ -8,6 +8,11 @@ ARM_SRC=$(SRC_PATH)/tests/tcg/arm
 # Set search path for all sources
 VPATH  += $(ARM_SRC)
 
+config-cc.mak: Makefile
+   $(quiet-@)( \
+   $(call cc-option,-fno-integrated-as, CROSS_CC_HAS_FNIA)) 3> 
config-cc.mak
+-include config-cc.mak
+
 float_madds: CFLAGS+=-mfpu=neon-vfpv4
 
 # Basic Hello World
@@ -17,7 +22,8 @@ hello-arm: LDFLAGS+=-nostdlib
 
 # IWMXT floating point extensions
 ARM_TESTS += test-arm-iwmmxt
-test-arm-iwmmxt: CFLAGS+=-marm -march=iwmmxt -mabi=aapcs -mfpu=fpv4-sp-d16
+# Clang assembler does not support IWMXT, so use the external assembler.
+test-arm-iwmmxt: CFLAGS += -marm -march=iwmmxt -mabi=aapcs -mfpu=fpv4-sp-d16 
$(CROSS_CC_HAS_FNIA)
 test-arm-iwmmxt: test-arm-iwmmxt.S
$(CC) $(CFLAGS) $< -o $@ $(LDFLAGS)
 
-- 
2.34.1

[PATCH v4 12/14] tests/tcg/arm: Use -march and -mfpu for fcvt

2024-06-30 Thread Richard Henderson

Clang requires the architecture to be set properly
in order to assemble the half-precision instructions.

Signed-off-by: Richard Henderson 
---
 tests/tcg/arm/Makefile.target | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/tcg/arm/Makefile.target b/tests/tcg/arm/Makefile.target
index 95f891bf8c..8e287191af 100644
--- a/tests/tcg/arm/Makefile.target
+++ b/tests/tcg/arm/Makefile.target
@@ -29,8 +29,8 @@ test-arm-iwmmxt: test-arm-iwmmxt.S
 
 # Float-convert Tests
 ARM_TESTS += fcvt
-fcvt: LDFLAGS+=-lm
-# fcvt: CFLAGS+=-march=armv8.2-a+fp16 -mfpu=neon-fp-armv8
+fcvt: LDFLAGS += -lm
+fcvt: CFLAGS += -march=armv8.2-a+fp16 -mfpu=neon-fp-armv8
 run-fcvt: fcvt
$(call run-test,fcvt,$(QEMU) $<)
$(call diff-out,fcvt,$(ARM_SRC)/fcvt.ref)
-- 
2.34.1

[PATCH v4 05/14] tests/tcg/aarch64: Fix irg operand type

2024-06-30 Thread Richard Henderson

From: Akihiko Odaki 

irg expects 64-bit integers. Passing a 32-bit integer results in
compilation failure with clang version 18.1.6.

Signed-off-by: Akihiko Odaki 
Message-Id: <20240627-tcg-v2-4-1690a8133...@daynix.com>
---
 tests/tcg/aarch64/mte-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/aarch64/mte-1.c b/tests/tcg/aarch64/mte-1.c
index 88dcd617ad..146cad4a04 100644
--- a/tests/tcg/aarch64/mte-1.c
+++ b/tests/tcg/aarch64/mte-1.c
@@ -15,7 +15,7 @@ int main(int ac, char **av)
 enable_mte(PR_MTE_TCF_NONE);
 p0 = alloc_mte_mem(sizeof(*p0));
 
-asm("irg %0,%1,%2" : "=r"(p1) : "r"(p0), "r"(1));
+asm("irg %0,%1,%2" : "=r"(p1) : "r"(p0), "r"(1l));
 assert(p1 != p0);
 asm("subp %0,%1,%2" : "=r"(c) : "r"(p0), "r"(p1));
 assert(c == 0);
-- 
2.34.1

[PATCH v4 01/14] tests/tcg/minilib: Constify digits in print_num

2024-06-30 Thread Richard Henderson

This avoids a memcpy to the stack when compiled with clang.
Since we don't enable optimization, nor provide memcpy,
this results in an undefined symbol error at link time.

Signed-off-by: Richard Henderson 
---
 tests/tcg/minilib/printf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/tcg/minilib/printf.c b/tests/tcg/minilib/printf.c
index 10472b4f58..fb0189c2bb 100644
--- a/tests/tcg/minilib/printf.c
+++ b/tests/tcg/minilib/printf.c
@@ -27,7 +27,7 @@ static void print_str(char *s)
 
 static void print_num(unsigned long long value, int base)
 {
-char digits[] = "0123456789abcdef";
+static const char digits[] = "0123456789abcdef";
 char buf[32];
 int i = sizeof(buf) - 2, j;
 
-- 
2.34.1

[PATCH v4 06/14] tests/tcg/aarch64: Do not use x constraint

2024-06-30 Thread Richard Henderson

From: Akihiko Odaki 

clang version 18.1.6 does not support x constraint for AArch64.
Use w instead.

Signed-off-by: Akihiko Odaki 
Message-Id: <20240627-tcg-v2-5-1690a8133...@daynix.com>
---
 tests/tcg/arm/fcvt.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/tcg/arm/fcvt.c b/tests/tcg/arm/fcvt.c
index 7ac47b564e..f631197287 100644
--- a/tests/tcg/arm/fcvt.c
+++ b/tests/tcg/arm/fcvt.c
@@ -126,7 +126,7 @@ static void convert_single_to_half(void)
 asm("vcvtb.f16.f32 %0, %1" : "=t" (output) : "x" (input));
 #else
 uint16_t output;
-asm("fcvt %h0, %s1" : "=w" (output) : "x" (input));
+asm("fcvt %h0, %s1" : "=w" (output) : "w" (input));
 #endif
 print_half_number(i, output);
 }
@@ -149,7 +149,7 @@ static void convert_single_to_double(void)
 #if defined(__arm__)
 asm("vcvt.f64.f32 %P0, %1" : "=w" (output) : "t" (input));
 #else
-asm("fcvt %d0, %s1" : "=w" (output) : "x" (input));
+asm("fcvt %d0, %s1" : "=w" (output) : "w" (input));
 #endif
 print_double_number(i, output);
 }
@@ -244,7 +244,7 @@ static void convert_double_to_half(void)
 /* asm("vcvtb.f16.f64 %0, %P1" : "=t" (output) : "x" (input)); */
 output = input;
 #else
-asm("fcvt %h0, %d1" : "=w" (output) : "x" (input));
+asm("fcvt %h0, %d1" : "=w" (output) : "w" (input));
 #endif
 print_half_number(i, output);
 }
@@ -267,7 +267,7 @@ static void convert_double_to_single(void)
 #if defined(__arm__)
 asm("vcvt.f32.f64 %0, %P1" : "=w" (output) : "x" (input));
 #else
-asm("fcvt %s0, %d1" : "=w" (output) : "x" (input));
+asm("fcvt %s0, %d1" : "=w" (output) : "w" (input));
 #endif
 
 print_single_number(i, output);
@@ -335,7 +335,7 @@ static void convert_half_to_double(void)
 /* asm("vcvtb.f64.f16 %P0, %1" : "=w" (output) : "t" (input)); */
 output = input;
 #else
-asm("fcvt %d0, %h1" : "=w" (output) : "x" (input));
+asm("fcvt %d0, %h1" : "=w" (output) : "w" (input));
 #endif
 print_double_number(i, output);
 }
@@ -357,7 +357,7 @@ static void convert_half_to_single(void)
 #if defined(__arm__)
 asm("vcvtb.f32.f16 %0, %1" : "=w" (output) : "x" ((uint32_t)input));
 #else
-asm("fcvt %s0, %h1" : "=w" (output) : "x" (input));
+asm("fcvt %s0, %h1" : "=w" (output) : "w" (input));
 #endif
 print_single_number(i, output);
 }
@@ -380,7 +380,7 @@ static void convert_half_to_integer(void)
 /* asm("vcvt.s32.f16 %0, %1" : "=t" (output) : "t" (input)); v8.2*/
 output = input;
 #else
-asm("fcvt %s0, %h1" : "=w" (output) : "x" (input));
+asm("fcvt %s0, %h1" : "=w" (output) : "w" (input));
 #endif
 print_int64(i, output);
 }
-- 
2.34.1

[PATCH v4 00/14] test/tcg: Clang build fixes for arm/aarch64

2024-06-30 Thread Richard Henderson

Supercedes: 20240629-tcg-v3-0-fa57918bd...@daynix.com
("[PATCH v3 0/7] tests/tcg/aarch64: Fix inline assemblies for clang")

On top of Akihiko's patches for aarch64, additional changes are
required for arm, both as a host and as a guest.


r~


Akihiko Odaki (5):
  tests/tcg/aarch64: Explicitly specify register width
  tests/tcg/aarch64: Fix irg operand type
  tests/tcg/aarch64: Do not use x constraint
  tests/tcg/arm: Fix fcvt result messages
  tests/tcg/arm: Manually register allocate half-precision numbers

Richard Henderson (9):
  tests/tcg/minilib: Constify digits in print_num
  tests/tcg: Adjust variable defintion from cc-option
  tests/tcg/aarch64: Drop -fno-tree-loop-distribute-patterns
  tests/tcg/aarch64: Add -fno-integrated-as for sme
  tests/tcg/arm: Drop -N from LDFLAGS
  tests/tcg/arm: Use -fno-integrated-as for test-arm-iwmmxt
  tests/tcg/arm: Use -march and -mfpu for fcvt
  tests/tcg/arm: Use vmrs/vmsr instead of mcr/mrc
  linux-user/main: Suppress out-of-range comparison warning for clang

 linux-user/main.c |   1 +
 tests/tcg/aarch64/bti-1.c |   6 +-
 tests/tcg/aarch64/bti-3.c |   6 +-
 tests/tcg/aarch64/mte-1.c |   2 +-
 tests/tcg/arm/fcvt.c  |  28 +-
 tests/tcg/minilib/printf.c|   2 +-
 tests/tcg/Makefile.target |   2 +-
 tests/tcg/aarch64/Makefile.softmmu-target |   4 +-
 tests/tcg/aarch64/Makefile.target |  18 +-
 tests/tcg/aarch64/fcvt.ref| 604 +++---
 tests/tcg/arm/Makefile.softmmu-target |   4 +-
 tests/tcg/arm/Makefile.target |  12 +-
 tests/tcg/arm/fcvt.ref| 604 +++---
 13 files changed, 653 insertions(+), 640 deletions(-)

-- 
2.34.1

[PATCH v4 02/14] tests/tcg: Adjust variable defintion from cc-option

2024-06-30 Thread Richard Henderson

Define the variable to the compiler flag used, not "y".
This avoids replication of the compiler flag itself.

Signed-off-by: Richard Henderson 
---
 tests/tcg/Makefile.target |  2 +-
 tests/tcg/aarch64/Makefile.softmmu-target |  2 +-
 tests/tcg/aarch64/Makefile.target | 15 ---
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/tests/tcg/Makefile.target b/tests/tcg/Makefile.target
index f21be50d3b..cb8cfeb6da 100644
--- a/tests/tcg/Makefile.target
+++ b/tests/tcg/Makefile.target
@@ -49,7 +49,7 @@ quiet-command = $(call quiet-@,$2,$3)$1
 
 cc-test = $(CC) -Werror $1 -c -o /dev/null -xc /dev/null >/dev/null 2>&1
 cc-option = if $(call cc-test, $1); then \
-echo "$(TARGET_PREFIX)$1 detected" && echo "$(strip $2)=y" >&3; else \
+echo "$(TARGET_PREFIX)$1 detected" && echo "$(strip $2)=$(strip $1)" >&3; 
else \
 echo "$(TARGET_PREFIX)$1 not detected"; fi
 
 # $1 = test name, $2 = cmd, $3 = desc
diff --git a/tests/tcg/aarch64/Makefile.softmmu-target 
b/tests/tcg/aarch64/Makefile.softmmu-target
index 4b03ef602e..39d3f961c5 100644
--- a/tests/tcg/aarch64/Makefile.softmmu-target
+++ b/tests/tcg/aarch64/Makefile.softmmu-target
@@ -81,7 +81,7 @@ run-memory-replay: memory-replay run-memory-record
 EXTRA_RUNS+=run-memory-replay
 
 ifneq ($(CROSS_CC_HAS_ARMV8_3),)
-pauth-3: CFLAGS += -march=armv8.3-a
+pauth-3: CFLAGS += $(CROSS_CC_HAS_ARMV8_3)
 else
 pauth-3:
$(call skip-test, "BUILD of $@", "missing compiler support")
diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index 70d728ae9a..e6d5e008a8 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -32,17 +32,17 @@ config-cc.mak: Makefile
 
 ifneq ($(CROSS_CC_HAS_ARMV8_2),)
 AARCH64_TESTS += dcpop
-dcpop: CFLAGS += -march=armv8.2-a
+dcpop: CFLAGS += $(CROSS_CC_HAS_ARMV8_2)
 endif
 ifneq ($(CROSS_CC_HAS_ARMV8_5),)
 AARCH64_TESTS += dcpodp
-dcpodp: CFLAGS += -march=armv8.5-a
+dcpodp: CFLAGS += $(CROSS_CC_HAS_ARMV8_5)
 endif
 
 # Pauth Tests
 ifneq ($(CROSS_CC_HAS_ARMV8_3),)
 AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5
-pauth-%: CFLAGS += -march=armv8.3-a
+pauth-%: CFLAGS += $(CROSS_CC_HAS_ARMV8_3)
 run-pauth-1: QEMU_OPTS += -cpu max
 run-pauth-2: QEMU_OPTS += -cpu max
 # Choose a cpu with FEAT_Pauth but without FEAT_FPAC for pauth-[45].
@@ -54,7 +54,7 @@ endif
 # bti-1 tests the elf notes, so we require special compiler support.
 ifneq ($(CROSS_CC_HAS_ARMV8_BTI),)
 AARCH64_TESTS += bti-1 bti-3
-bti-1 bti-3: CFLAGS += -fno-stack-protector -mbranch-protection=standard
+bti-1 bti-3: CFLAGS += -fno-stack-protector $(CROSS_CC_HAS_ARMV8_BTI)
 bti-1 bti-3: LDFLAGS += -nostdlib
 endif
 # bti-2 tests PROT_BTI, so no special compiler support required.
@@ -63,12 +63,13 @@ AARCH64_TESTS += bti-2
 # MTE Tests
 ifneq ($(CROSS_CC_HAS_ARMV8_MTE),)
 AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-5 mte-6 mte-7
-mte-%: CFLAGS += -march=armv8.5-a+memtag
+mte-%: CFLAGS += $(CROSS_CC_HAS_ARMV8_MTE)
 endif
 
 # SME Tests
 ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
 AARCH64_TESTS += sme-outprod1 sme-smopa-1 sme-smopa-2
+sme-outprod1 sme-smopa-1 sme-smopa-2: CFLAGS += $(CROSS_AS_HAS_ARMV9_SME)
 endif
 
 # System Registers Tests
@@ -98,7 +99,7 @@ TESTS += sha512-vector
 ifneq ($(CROSS_CC_HAS_SVE),)
 # SVE ioctl test
 AARCH64_TESTS += sve-ioctls
-sve-ioctls: CFLAGS+=-march=armv8.1-a+sve
+sve-ioctls: CFLAGS += $(CROSS_CC_HAS_SVE)
 
 sha512-sve: CFLAGS=-O3 -march=armv8.1-a+sve
 sha512-sve: sha512.c
@@ -133,7 +134,7 @@ endif
 
 ifneq ($(CROSS_CC_HAS_SVE2),)
 AARCH64_TESTS += test-826
-test-826: CFLAGS+=-march=armv8.1-a+sve2
+test-826: CFLAGS += $(CROSS_CC_HAS_SVE2)
 endif
 
 TESTS += $(AARCH64_TESTS)
-- 
2.34.1

[PATCH v4 14/14] linux-user/main: Suppress out-of-range comparison warning for clang

2024-06-30 Thread Richard Henderson

For arm32 host and arm64 guest we get

.../main.c:851:32: error: result of comparison of constant 70368744177664 with 
expression of type 'unsigned long' is always false 
[-Werror,-Wtautological-constant-out-of-range-compare]
if (TASK_UNMAPPED_BASE < reserved_va) {
~~ ^ ~~~

We already disable -Wtype-limits here, for this exact comparison, but
that is not enough for clang.  Disable -Wtautological-compare as well,
which is a superset.  GCC ignores the unknown warning flag.

Signed-off-by: Richard Henderson 
---
 linux-user/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/main.c b/linux-user/main.c
index 94c99a1366..7d3cf45fa9 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -843,6 +843,7 @@ int main(int argc, char **argv, char **envp)
  */
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wtype-limits"
+#pragma GCC diagnostic ignored "-Wtautological-compare"
 
 /*
  * Select an initial value for task_unmapped_base that is in range.
-- 
2.34.1

[PATCH v4 11/14] tests/tcg/arm: Manually register allocate half-precision numbers

2024-06-30 Thread Richard Henderson

From: Akihiko Odaki 

Clang does not allow specifying an integer as the value of a single
precision register.  Explicitly move value from a general register.

Signed-off-by: Akihiko Odaki 
[rth: Use one single inline asm block.]
Signed-off-by: Richard Henderson 
---
 tests/tcg/arm/fcvt.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/tcg/arm/fcvt.c b/tests/tcg/arm/fcvt.c
index 157790e679..d8c61cd29f 100644
--- a/tests/tcg/arm/fcvt.c
+++ b/tests/tcg/arm/fcvt.c
@@ -355,7 +355,12 @@ static void convert_half_to_single(void)
 
 print_half_number(i, input);
 #if defined(__arm__)
-asm("vcvtb.f32.f16 %0, %1" : "=w" (output) : "x" ((uint32_t)input));
+/*
+ * Clang refuses to allocate an integer to a fp register.
+ * Perform the move from a general register by hand.
+ */
+asm("vmov %0, %1\n\t"
+"vcvtb.f32.f16 %0, %0" : "=w" (output) : "r" (input));
 #else
 asm("fcvt %s0, %h1" : "=w" (output) : "w" (input));
 #endif
-- 
2.34.1

[PATCH v4 09/14] tests/tcg/arm: Drop -N from LDFLAGS

2024-06-30 Thread Richard Henderson

This is redudant with a linker script, and is not
supported by clang.

Signed-off-by: Richard Henderson 
---
 tests/tcg/arm/Makefile.softmmu-target | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/tcg/arm/Makefile.softmmu-target 
b/tests/tcg/arm/Makefile.softmmu-target
index 39e01ce49d..547063c08c 100644
--- a/tests/tcg/arm/Makefile.softmmu-target
+++ b/tests/tcg/arm/Makefile.softmmu-target
@@ -13,7 +13,7 @@ VPATH += $(ARM_SRC)
 test-armv6m-undef: test-armv6m-undef.S
$(CC) -mcpu=cortex-m0 -mfloat-abi=soft \
-Wl,--build-id=none -x assembler-with-cpp \
-   $< -o $@ -nostdlib -N -static \
+   $< -o $@ -nostdlib -static \
-T $(ARM_SRC)/$@.ld
 
 run-test-armv6m-undef: QEMU_OPTS=-semihosting-config 
enable=on,target=native,chardev=output -M microbit -kernel
@@ -30,7 +30,7 @@ CRT_PATH=$(ARM_SRC)
 LINK_SCRIPT=$(ARM_SRC)/kernel.ld
 LDFLAGS=-Wl,-T$(LINK_SCRIPT)
 CFLAGS+=-nostdlib -ggdb -O0 $(MINILIB_INC)
-LDFLAGS+=-static -nostdlib -N $(CRT_OBJS) $(MINILIB_OBJS) -lgcc
+LDFLAGS+=-static -nostdlib $(CRT_OBJS) $(MINILIB_OBJS) -lgcc
 
 # building head blobs
 .PRECIOUS: $(CRT_OBJS)
-- 
2.34.1

[PATCH v4 08/14] tests/tcg/arm: Fix fcvt result messages

2024-06-30 Thread Richard Henderson

From: Akihiko Odaki 

The test cases for "converting double-precision to single-precision"
emits float but the result variable was typed as uint32_t and corrupted
the printed values. Propertly type it as float.

Signed-off-by: Akihiko Odaki 
Fixes: 8ec8a55e3fc9 ("tests/tcg/arm: add fcvt test cases for AArch32/64")
Message-Id: <20240627-tcg-v2-1-1690a8133...@daynix.com>
[rth: Update arm ref file as well]
Signed-off-by: Richard Henderson 
---
 tests/tcg/arm/fcvt.c   |   2 +-
 tests/tcg/aarch64/fcvt.ref | 604 ++---
 tests/tcg/arm/fcvt.ref | 604 ++---
 3 files changed, 605 insertions(+), 605 deletions(-)

diff --git a/tests/tcg/arm/fcvt.c b/tests/tcg/arm/fcvt.c
index f631197287..157790e679 100644
--- a/tests/tcg/arm/fcvt.c
+++ b/tests/tcg/arm/fcvt.c
@@ -258,7 +258,7 @@ static void convert_double_to_single(void)
 
 for (i = 0; i < ARRAY_SIZE(double_numbers); ++i) {
 double input = double_numbers[i].d;
-uint32_t output;
+float output;
 
 feclearexcept(FE_ALL_EXCEPT);
 
diff --git a/tests/tcg/aarch64/fcvt.ref b/tests/tcg/aarch64/fcvt.ref
index e7af24dc58..2726b41063 100644
--- a/tests/tcg/aarch64/fcvt.ref
+++ b/tests/tcg/aarch64/fcvt.ref
@@ -211,45 +211,45 @@ Converting double-precision to half-precision
 40   HALF: 0x7f00  (0x1 => INVALID)
 Converting double-precision to single-precision
 00 DOUBLE: nan / 0x007ff4 (0 => OK)
-00 SINGLE: 2.145386496000e+09 / 0x4effc000  (0x1 => INVALID)
+00 SINGLE: nan / 0x7fe0  (0x1 => INVALID)
 01 DOUBLE: -nan / 0x00fff8 (0 => OK)
-01 SINGLE: 4.290772992000e+09 / 0x4f7fc000  (0 => OK)
+01 SINGLE: -nan / 0xffc0  (0 => OK)
 02 DOUBLE: -inf / 0x00fff0 (0 => OK)
-02 SINGLE: 4.286578688000e+09 / 0x4f7f8000  (0 => OK)
+02 SINGLE: -inf / 0xff80  (0 => OK)
 03 DOUBLE: -1.79769313486231570815e+308 / 0x00ffef (0 => OK)
-03 SINGLE: 4.286578688000e+09 / 0x4f7f8000  (0x14 => OVERFLOW   
INEXACT )
+03 SINGLE: -inf / 0xff80  (0x14 => OVERFLOW   INEXACT )
 04 DOUBLE: -3.40282346638528859812e+38 / 0x00c7efe000 (0 => OK)
-04 SINGLE: 4.286578688000e+09 / 0x4f7f8000  (0x10 =>INEXACT )
+04 SINGLE: -3.40282346638528859812e+38 / 0xff7f  (0 => OK)
 05 DOUBLE: -3.40282346638528859812e+38 / 0x00c7efe000 (0 => OK)
-05 SINGLE: 4.286578688000e+09 / 0x4f7f8000  (0x10 =>INEXACT )
+05 SINGLE: -3.40282346638528859812e+38 / 0xff7f  (0 => OK)
 06 DOUBLE: -1.11107529e+31 / 0x00c661874b135ff654 (0 => OK)
-06 SINGLE: 4.077664768000e+09 / 0x4f730c3a  (0x10 =>INEXACT )
+06 SINGLE: -1.1114769645909791e+31 / 0xf30c3a59  (0x10 =>INEXACT )
 07 DOUBLE: -1.11099085e+30 / 0x00c62c0bab523323b9 (0 => OK)
-07 SINGLE: 4.04962432e+09 / 0x4f71605d  (0x10 =>INEXACT )
+07 SINGLE: -1.1113258488635273e+30 / 0xf1605d5b  (0x10 =>INEXACT )
 08 DOUBLE: -2.e+00 / 0x00c000 (0 => OK)
-08 SINGLE: 3.221225472000e+09 / 0x4f40  (0 => OK)
+08 SINGLE: -2.e+00 / 0xc000  (0 => OK)
 09 DOUBLE: -1.e+00 / 0x00bff0 (0 => OK)
-09 SINGLE: 3.212836864000e+09 / 0x4f3f8000  (0 => OK)
+09 SINGLE: -1.e+00 / 0xbf80  (0 => OK)
 10 DOUBLE: -2.22507385850720138309e-308 / 0x008010 (0 => OK)
-10 SINGLE: 2.147483648000e+09 / 0x4f00  (0x18 =>  UNDERFLOW  
INEXACT )
+10 SINGLE: -0.e+00 / 0x8000  (0x18 =>  UNDERFLOW  
INEXACT )
 11 DOUBLE: -1.17549435082228750797e-38 / 0x00b810 (0 => OK)
-11 SINGLE: 2.155872256000e+09 / 0x4f008000  (0 => OK)
+11 SINGLE: -1.17549435082228750797e-38 / 0x8080  (0 => OK)
 12 DOUBLE: 0.e+00 /  (0 => OK)
 12 SINGLE: 0.e+00 / 00  (0 => OK)
 13 DOUBLE: 1.17549435082228750797e-38 / 0x003810 (0 => OK)
-13 SINGLE: 8.38860800e+06 / 0x4b00  (0 => OK)
+13 SINGLE: 1.17549435082228750797e-38 / 0x0080  (0 => OK)
 14 DOUBLE: 2.9802322400013061e-08 / 0x003e60001c5f68 (0 => OK)
-14 SINGLE: 8.55638016e+08 / 0x4e4c  (0x10 =>INEXACT )
+14 SINGLE: 2.98023223876953125000e-08 / 0x3300  (0x10 =>INEXACT )
 15 DOUBLE: 5.960460015661e-08 / 0x003e6e6cb2fa82 (0 => OK)
-15 SINGLE: 8.64026624e+08 / 0x4e4e  (0x10 =>INEXACT )
+15 SINGLE: 5.96045985901128005935e-08 / 0x3373  (0x10 =>INEXACT )
 16 DOUBLE: 6.097559994299e-05 / 0x003f0ff801a9af58a1 (0 => OK)
-16 SINGLE: 9.47896320e+08 / 0x4e61ff00  (0x10 =>INEXACT )
+16 SINGLE: 6.09755988989491015673e-05 / 0x387fc00d  (0x10 =>INEXACT )
 17 DOUBLE: 6.103520013665e-05 / 0x003f10c06a1ef5 (0 => OK)
-17 SINGLE: 9.47912704e+08 / 0x4e62  (0x10 =>INEXACT )
+17 SINGLE:

[PATCH v4 07/14] tests/tcg/aarch64: Add -fno-integrated-as for sme

2024-06-30 Thread Richard Henderson

The only use of SME is inline assembly.  Both gcc and clang only
support SME with very recent releases; by deferring detection to
the assembler we get better test coverage.

Signed-off-by: Richard Henderson 
---
 tests/tcg/aarch64/Makefile.target | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index e6d5e008a8..8817ac262f 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -20,6 +20,7 @@ run-fcvt: fcvt
 
 config-cc.mak: Makefile
$(quiet-@)( \
+   fnia=`$(call cc-test,-fno-integrated-as) && echo 
-fno-integrated-as`; \
$(call cc-option,-march=armv8.1-a+sve,  CROSS_CC_HAS_SVE); \
$(call cc-option,-march=armv8.1-a+sve2, CROSS_CC_HAS_SVE2); 
\
$(call cc-option,-march=armv8.2-a,  
CROSS_CC_HAS_ARMV8_2); \
@@ -27,7 +28,7 @@ config-cc.mak: Makefile
$(call cc-option,-march=armv8.5-a,  
CROSS_CC_HAS_ARMV8_5); \
$(call cc-option,-mbranch-protection=standard,  
CROSS_CC_HAS_ARMV8_BTI); \
$(call cc-option,-march=armv8.5-a+memtag,   
CROSS_CC_HAS_ARMV8_MTE); \
-   $(call cc-option,-Wa$(COMMA)-march=armv9-a+sme, 
CROSS_AS_HAS_ARMV9_SME)) 3> config-cc.mak
+   $(call cc-option,-Wa$(COMMA)-march=armv9-a+sme $$fnia, 
CROSS_AS_HAS_ARMV9_SME)) 3> config-cc.mak
 -include config-cc.mak
 
 ifneq ($(CROSS_CC_HAS_ARMV8_2),)
-- 
2.34.1

Re: [PATCH v2 3/3] target/ppc : Update VSX storage access insns to use tcg_gen_qemu _ld/st_i128.

2024-06-30 Thread Richard Henderson


On 6/30/24 05:01, Chinmay Rath wrote:

@@ -2175,13 +2179,13 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv 
displ,
   int rt, bool store, bool paired)
  {
  TCGv ea;
-TCGv_i64 xt;
+TCGv_i128 data;
  MemOp mop;
  int rt1, rt2;
  
-xt = tcg_temp_new_i64();

+data = tcg_temp_new_i128();
  
-mop = DEF_MEMOP(MO_UQ);

+mop = DEF_MEMOP(MO_128 | MO_ATOM_IFALIGN_PAIR);
  
  gen_set_access_type(ctx, ACCESS_INT);

  ea = do_ea_calc(ctx, ra, displ);
@@ -2195,32 +2199,20 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv 
displ,
  }
  
  if (store) {

-get_cpu_vsr(xt, rt1, !ctx->le_mode);
-tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
-gen_addr_add(ctx, ea, ea, 8);
-get_cpu_vsr(xt, rt1, ctx->le_mode);
-tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
+get_vsr_full(data, rt1);
+tcg_gen_qemu_st_i128(data, ea, ctx->mem_idx, mop);
  if (paired) {
  gen_addr_add(ctx, ea, ea, 8);


The increment needs updating to 16.


-get_cpu_vsr(xt, rt2, !ctx->le_mode);
-tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
-gen_addr_add(ctx, ea, ea, 8);
-get_cpu_vsr(xt, rt2, ctx->le_mode);
-tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
+get_vsr_full(data, rt2);
+tcg_gen_qemu_st_i128(data, ea, ctx->mem_idx, mop);
  }
  } else {
-tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-set_cpu_vsr(rt1, xt, !ctx->le_mode);
-gen_addr_add(ctx, ea, ea, 8);
-tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-set_cpu_vsr(rt1, xt, ctx->le_mode);
+tcg_gen_qemu_ld_i128(data, ea, ctx->mem_idx, mop);
+set_vsr_full(rt1, data);
  if (paired) {
  gen_addr_add(ctx, ea, ea, 8);


Likewise.

With those fixed,
Reviewed-by: Richard Henderson 


r~

[PATCH] hw/char/pl011: ensure UARTIBRD register is 16-bit

2024-06-30 Thread Zheyu Ma

The PL011 TRM says that "The 16-bit integer is written to the Integer Baud Rate
Register, UARTIBRD". Updated the handling of the UARTIBRD register to ensure
only 16-bit values are written to it.

ASAN log:
==2973125==ERROR: AddressSanitizer: FPE on unknown address 0x55f72629b348 (pc 
0x55f72629b348 bp 0x7fffa24d0e00 sp 0x7fffa24d0d60 T0)
#0 0x55f72629b348 in pl011_get_baudrate hw/char/pl011.c:255:17
#1 0x55f726298d94 in pl011_trace_baudrate_change hw/char/pl011.c:260:33
#2 0x55f726296fc8 in pl011_write hw/char/pl011.c:378:9

Reproducer:
cat << EOF | qemu-system-aarch64 -display \
none -machine accel=qtest, -m 512M -machine realview-pb-a8 -qtest stdio
writeq 0x1000b024 0xf800
EOF

Signed-off-by: Zheyu Ma 
---
 hw/char/pl011.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 8753b84a84..f962786e2a 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -374,7 +374,7 @@ static void pl011_write(void *opaque, hwaddr offset,
 s->ilpr = value;
 break;
 case 9: /* UARTIBRD */
-s->ibrd = value;
+s->ibrd = value & 0x;
 pl011_trace_baudrate_change(s);
 break;
 case 10: /* UARTFBRD */
-- 
2.34.1

Re: [PATCH v2 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128.

2024-06-30 Thread Richard Henderson


On 6/30/24 05:01, Chinmay Rath wrote:

Updated instructions {l, st}vx to use tcg_gen_qemu_ld/st_i128,
instead of using 64 bits loads/stores in succession.
Introduced functions {get, set}_avr_full in vmx-impl.c.inc to
facilitate the above, and potential future usage.

Suggested-by: Richard Henderson
Signed-off-by: Chinmay Rath
---
  target/ppc/translate/vmx-impl.c.inc | 42 ++---
  1 file changed, 20 insertions(+), 22 deletions(-)


Reviewed-by: Richard Henderson 


r~

Re: [PATCH v2 1/3] target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc.

2024-06-30 Thread Richard Henderson


On 6/30/24 05:01, Chinmay Rath wrote:

Those functions are used to ld/st data to and from Altivec registers,
in 64 bits chunks, and are only used in vmx-impl.c.inc file,
hence the clean-up movement.

Signed-off-by: Chinmay Rath
---
  target/ppc/translate.c  | 10 --
  target/ppc/translate/vmx-impl.c.inc | 10 ++
  2 files changed, 10 insertions(+), 10 deletions(-)


Reviewed-by: Richard Henderson 

r~

[PULL 15/16] vl.c: select_machine(): add selected machine type to error message

2024-06-30 Thread Michael Tokarev

From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 system/vl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/system/vl.c b/system/vl.c
index 92fc29c193..bdd2f6ecf6 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1674,7 +1674,7 @@ static MachineClass *select_machine(QDict *qdict, Error 
**errp)
 machine_class = find_machine(machine_type, machines);
 qdict_del(qdict, "type");
 if (!machine_class) {
-error_setg(errp, "unsupported machine type");
+error_setg(errp, "unsupported machine type: \"%s\"", optarg);
 }
 } else {
 machine_class = find_default_machine(machines);
-- 
2.39.2

[PULL 05/16] monitor: Remove obsolete stubs

2024-06-30 Thread Michael Tokarev

From: Philippe Mathieu-Daudé 

hmp_info_roms() was removed in commit dd98234c05 ("qapi:
introduce x-query-roms QMP command"),

hmp_info_numa() in commit 1b8ae799d8 ("qapi: introduce
x-query-numa QMP command"),

hmp_info_ramblock() in commit ca411b7c8a ("qapi: introduce
x-query-ramblock QMP command")

and hmp_info_irq() in commit 91f2fa7045 ("qapi: introduce
x-query-irq QMP command").

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 include/hw/loader.h   | 1 -
 include/monitor/hmp.h | 3 ---
 2 files changed, 4 deletions(-)

diff --git a/include/hw/loader.h b/include/hw/loader.h
index 8685e27334..9844c5e3cf 100644
--- a/include/hw/loader.h
+++ b/include/hw/loader.h
@@ -338,7 +338,6 @@ void *rom_ptr(hwaddr addr, size_t size);
  * rom_ptr().
  */
 void *rom_ptr_for_as(AddressSpace *as, hwaddr addr, size_t size);
-void hmp_info_roms(Monitor *mon, const QDict *qdict);
 
 #define rom_add_file_fixed(_f, _a, _i)  \
 rom_add_file(_f, NULL, _a, _i, false, NULL, NULL)
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 954f3c83ad..ae116d9804 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -35,7 +35,6 @@ void hmp_info_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_vnc(Monitor *mon, const QDict *qdict);
 void hmp_info_spice(Monitor *mon, const QDict *qdict);
 void hmp_info_balloon(Monitor *mon, const QDict *qdict);
-void hmp_info_irq(Monitor *mon, const QDict *qdict);
 void hmp_info_pic(Monitor *mon, const QDict *qdict);
 void hmp_info_pci(Monitor *mon, const QDict *qdict);
 void hmp_info_tpm(Monitor *mon, const QDict *qdict);
@@ -102,7 +101,6 @@ void hmp_chardev_send_break(Monitor *mon, const QDict 
*qdict);
 void hmp_object_add(Monitor *mon, const QDict *qdict);
 void hmp_object_del(Monitor *mon, const QDict *qdict);
 void hmp_info_memdev(Monitor *mon, const QDict *qdict);
-void hmp_info_numa(Monitor *mon, const QDict *qdict);
 void hmp_info_memory_devices(Monitor *mon, const QDict *qdict);
 void hmp_qom_list(Monitor *mon, const QDict *qdict);
 void hmp_qom_get(Monitor *mon, const QDict *qdict);
@@ -141,7 +139,6 @@ void hmp_rocker_ports(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict *qdict);
 void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
 void hmp_info_dump(Monitor *mon, const QDict *qdict);
-void hmp_info_ramblock(Monitor *mon, const QDict *qdict);
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
 void hmp_info_memory_size_summary(Monitor *mon, const QDict *qdict);
-- 
2.39.2

[PULL 00/16] Trivial patches for 2024-06-30

2024-06-30 Thread Michael Tokarev

The following changes since commit 3665dd6bb9043bef181c91e2dce9e1efff47ed51:

  Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging 
(2024-06-28 16:09:38 -0700)

are available in the Git repository at:

  https://gitlab.com/mjt0k/qemu.git tags/pull-trivial-patches

for you to fetch changes up to f22855dffdbc2906f744b5bcfea869cbb66b8fb2:

  hw/core/loader: gunzip(): fix memory leak on error path (2024-06-30 19:51:44 
+0300)


trivial patches for 2024-06-30

It's been some time since the previous trivial-patches pullreq,
and the queue grew a bit


Dr. David Alan Gilbert (4):
  linux-user: cris: Remove unused struct 'rt_signal_frame'
  linux-user: sparc: Remove unused struct 'target_mc_fq'
  hw/arm/bcm2836: Remove unusued struct 'BCM283XClass'
  net/can: Remove unused struct 'CanBusState'

Hyeongtak Ji (1):
  docs/cxl: fix some typos

Martin Joerg (1):
  hmp-commands-info.hx: Add missing info command for stats subcommand

Matheus Tavares Bernardino (1):
  cpu: fix memleak of 'halt_cond' and 'thread'

Philippe Mathieu-Daudé (1):
  monitor: Remove obsolete stubs

Thomas Huth (1):
  docs/system/devices/usb: Replace the non-existing "qemu" binary

Trent Huber (1):
  os-posix: Expand setrlimit() syscall compatibility

Vladimir Sementsov-Ogievskiy (4):
  vl.c: select_machine(): use ERRP_GUARD instead of error propagation
  vl.c: select_machine(): use g_autoptr
  vl.c: select_machine(): add selected machine type to error message
  hw/core/loader: gunzip(): fix memory leak on error path

Zide Chen (2):
  vl: Allow multiple -overcommit commands
  target/i386: Advertise MWAIT iff host supports

 accel/tcg/tcg-accel-ops-rr.c |  1 +
 docs/system/devices/cxl.rst  |  6 +++---
 docs/system/devices/usb.rst  |  2 +-
 hmp-commands-info.hx |  2 +-
 hw/arm/bcm2836.c | 12 
 hw/core/cpu-common.c |  3 +++
 hw/core/loader.c |  1 +
 include/hw/loader.h  |  1 -
 include/monitor/hmp.h|  3 ---
 linux-user/cris/signal.c |  8 
 linux-user/sparc/signal.c|  5 -
 net/can/can_host.c   |  6 --
 os-posix.c   |  4 
 system/vl.c  | 21 ++---
 target/i386/host-cpu.c   | 12 
 target/i386/kvm/kvm-cpu.c| 11 +--
 16 files changed, 33 insertions(+), 65 deletions(-)

[PULL 13/16] vl.c: select_machine(): use ERRP_GUARD instead of error propagation

2024-06-30 Thread Michael Tokarev

From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 system/vl.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index 4dc862652f..fda93d150c 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1665,28 +1665,28 @@ static const QEMUOption *lookup_opt(int argc, char 
**argv,
 
 static MachineClass *select_machine(QDict *qdict, Error **errp)
 {
+ERRP_GUARD();
 const char *machine_type = qdict_get_try_str(qdict, "type");
 GSList *machines = object_class_get_list(TYPE_MACHINE, false);
-MachineClass *machine_class;
-Error *local_err = NULL;
+MachineClass *machine_class = NULL;
 
 if (machine_type) {
 machine_class = find_machine(machine_type, machines);
 qdict_del(qdict, "type");
 if (!machine_class) {
-error_setg(_err, "unsupported machine type");
+error_setg(errp, "unsupported machine type");
 }
 } else {
 machine_class = find_default_machine(machines);
 if (!machine_class) {
-error_setg(_err, "No machine specified, and there is no 
default");
+error_setg(errp, "No machine specified, and there is no default");
 }
 }
 
 g_slist_free(machines);
-if (local_err) {
-error_append_hint(_err, "Use -machine help to list supported 
machines\n");
-error_propagate(errp, local_err);
+if (!machine_class) {
+error_append_hint(errp,
+  "Use -machine help to list supported machines\n");
 }
 return machine_class;
 }
-- 
2.39.2

[PULL 03/16] vl: Allow multiple -overcommit commands

2024-06-30 Thread Michael Tokarev

From: Zide Chen 

Both cpu-pm and mem-lock are related to system resource overcommit, but
they are separate from each other, in terms of how they are realized,
and of course, they are applied to different system resources.

It's tempting to use separate command lines to specify their behavior.
e.g., in the following example, the cpu-pm command is quietly
overwritten, and it's not easy to notice it without careful inspection.

  --overcommit mem-lock=on
  --overcommit cpu-pm=on

Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
Suggested-by: Thomas Huth 
Signed-off-by: Zide Chen 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 system/vl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index cfcb674425..4dc862652f 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -3546,8 +3546,8 @@ void qemu_init(int argc, char **argv)
 if (!opts) {
 exit(1);
 }
-enable_mlock = qemu_opt_get_bool(opts, "mem-lock", false);
-enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
+enable_mlock = qemu_opt_get_bool(opts, "mem-lock", 
enable_mlock);
+enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", 
enable_cpu_pm);
 break;
 case QEMU_OPTION_compat:
 {
-- 
2.39.2

[PULL 10/16] os-posix: Expand setrlimit() syscall compatibility

2024-06-30 Thread Michael Tokarev

From: Trent Huber 

Darwin uses a subtly different version of the setrlimit() syscall as
described in the COMPATIBILITY section of the macOS man page. The value
of the rlim_cur member has been adjusted accordingly for Darwin-based
systems.

Signed-off-by: Trent Huber 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 os-posix.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/os-posix.c b/os-posix.c
index a4284e2c07..43f9a43f3f 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -270,7 +270,11 @@ void os_setup_limits(void)
 return;
 }
 
+#ifdef CONFIG_DARWIN
+nofile.rlim_cur = OPEN_MAX < nofile.rlim_max ? OPEN_MAX : nofile.rlim_max;
+#else
 nofile.rlim_cur = nofile.rlim_max;
+#endif
 
 if (setrlimit(RLIMIT_NOFILE, ) < 0) {
 warn_report("unable to set NOFILE limit: %s", strerror(errno));
-- 
2.39.2

[PULL 08/16] hw/arm/bcm2836: Remove unusued struct 'BCM283XClass'

2024-06-30 Thread Michael Tokarev

From: "Dr. David Alan Gilbert" 

This struct has been unused since
Commit f932093ae165 ("hw/arm/bcm2836: Split out common part of BCM283X
classes")

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 hw/arm/bcm2836.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index db191661f2..40a379bc36 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -18,18 +18,6 @@
 #include "target/arm/cpu-qom.h"
 #include "target/arm/gtimer.h"
 
-struct BCM283XClass {
-/*< private >*/
-DeviceClass parent_class;
-/*< public >*/
-const char *name;
-const char *cpu_type;
-unsigned core_count;
-hwaddr peri_base; /* Peripheral base address seen by the CPU */
-hwaddr ctrl_base; /* Interrupt controller and mailboxes etc. */
-int clusterid;
-};
-
 static Property bcm2836_enabled_cores_property =
 DEFINE_PROP_UINT32("enabled-cpus", BCM283XBaseState, enabled_cpus, 0);
 
-- 
2.39.2

[PULL 04/16] target/i386: Advertise MWAIT iff host supports

2024-06-30 Thread Michael Tokarev

From: Zide Chen 

host_cpu_realizefn() sets CPUID_EXT_MONITOR without consulting host/KVM
capabilities. This may cause problems:

- If MWAIT/MONITOR is not available on the host, advertising this
  feature to the guest and executing MWAIT/MONITOR from the guest
  triggers #UD and the guest doesn't boot.  This is because typically
  #UD takes priority over VM-Exit interception checks and KVM doesn't
  emulate MONITOR/MWAIT on #UD.

- If KVM doesn't support KVM_X86_DISABLE_EXITS_MWAIT, MWAIT/MONITOR
  from the guest are intercepted by KVM, which is not what cpu-pm=on
  intends to do.

In these cases, MWAIT/MONITOR should not be exposed to the guest.

The logic in kvm_arch_get_supported_cpuid() to handle CPUID_EXT_MONITOR
is correct and sufficient, and we can't set CPUID_EXT_MONITOR after
x86_cpu_filter_features().

This was not an issue before commit 662175b91ff ("i386: reorder call to
cpu_exec_realizefn") because the feature added in the accel-specific
realizefn could be checked against host availability and filtered out.

Additionally, it seems not a good idea to handle guest CPUID leaves in
host_cpu_realizefn(), and this patch merges host_cpu_enable_cpu_pm()
into kvm_cpu_realizefn().

Fixes: f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using 
AccelCPUClass")
Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
Signed-off-by: Zide Chen 
Reviewed-by: Zhao Liu 
Reviewed-by: Xiaoyao Li 
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 target/i386/host-cpu.c| 12 
 target/i386/kvm/kvm-cpu.c | 11 +--
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index 280e427c01..8b8bf5afec 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -42,15 +42,6 @@ static uint32_t host_cpu_phys_bits(void)
 return host_phys_bits;
 }
 
-static void host_cpu_enable_cpu_pm(X86CPU *cpu)
-{
-CPUX86State *env = >env;
-
-host_cpuid(5, 0, >mwait.eax, >mwait.ebx,
-   >mwait.ecx, >mwait.edx);
-env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
-}
-
 static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
 {
 uint32_t host_phys_bits = host_cpu_phys_bits();
@@ -83,9 +74,6 @@ bool host_cpu_realizefn(CPUState *cs, Error **errp)
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = >env;
 
-if (cpu->max_features && enable_cpu_pm) {
-host_cpu_enable_cpu_pm(cpu);
-}
 if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
 uint32_t phys_bits = host_cpu_adjust_phys_bits(cpu);
 
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index f9b99b5f50..d57a68a301 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -64,8 +64,15 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
  *   cpu_common_realizefn() (via xcc->parent_realize)
  */
 if (cpu->max_features) {
-if (enable_cpu_pm && kvm_has_waitpkg()) {
-env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
+if (enable_cpu_pm) {
+if (kvm_has_waitpkg()) {
+env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
+}
+
+if (env->features[FEAT_1_ECX] & CPUID_EXT_MONITOR) {
+host_cpuid(5, 0, >mwait.eax, >mwait.ebx,
+   >mwait.ecx, >mwait.edx);
+   }
 }
 if (cpu->ucode_rev == 0) {
 cpu->ucode_rev =
-- 
2.39.2

[PULL 11/16] docs/cxl: fix some typos

2024-06-30 Thread Michael Tokarev

From: Hyeongtak Ji 

This patch corrects minor typographical errors to ensure the ASCII art
aligns with the explanations provided.  Specifically, it fixes an
incorrect root port reference and removes redundant words.

Signed-off-by: Hyeongtak Ji 
Signed-off-by: Michael Tokarev 
---
 docs/system/devices/cxl.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index 10a0e9bc9f..882b036f5e 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -218,17 +218,17 @@ Notes:
 A complex configuration here, might be to use the following HDM
 decoders in HB0. HDM0 routes CFMW0 requests to RP0 and hence
 part of CXL Type3 0. HDM1 routes CFMW0 requests from a
-different region of the CFMW0 PA range to RP2 and hence part
+different region of the CFMW0 PA range to RP1 and hence part
 of CXL Type 3 1.  HDM2 routes yet another PA range from within
 CFMW0 to be interleaved across RP0 and RP1, providing 2 way
 interleave of part of the memory provided by CXL Type3 0 and
 CXL Type 3 1. HDM3 routes those interleaved accesses from
 CFMW1 that target HB0 to RP 0 and another part of the memory of
 CXL Type 3 0 (as part of a 2 way interleave at the system level
-across for example CXL Type3 0 and CXL Type3 2.
+across for example CXL Type3 0 and CXL Type3 2).
 HDM4 is used to enable system wide 4 way interleave across all
 the present CXL type3 devices, by interleaving those (interleaved)
-requests that HB0 receives from from CFMW1 across RP 0 and
+requests that HB0 receives from CFMW1 across RP 0 and
 RP 1 and hence to yet more regions of the memory of the
 attached Type3 devices.  Note this is a representative subset
 of the full range of possible HDM decoder configurations in this
-- 
2.39.2

[PULL 16/16] hw/core/loader: gunzip(): fix memory leak on error path

2024-06-30 Thread Michael Tokarev

From: Vladimir Sementsov-Ogievskiy 

We should call inflateEnd() like on success path to cleanup state in s
variable.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 hw/core/loader.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index 2f8105d7de..a3bea1e718 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -610,6 +610,7 @@ ssize_t gunzip(void *dst, size_t dstlen, uint8_t *src, 
size_t srclen)
 r = inflate(, Z_FINISH);
 if (r != Z_OK && r != Z_STREAM_END) {
 printf ("Error: inflate() returned %d\n", r);
+inflateEnd();
 return -1;
 }
 dstbytes = s.next_out - (unsigned char *) dst;
-- 
2.39.2

[PULL 12/16] docs/system/devices/usb: Replace the non-existing "qemu" binary

2024-06-30 Thread Michael Tokarev

From: Thomas Huth 

We don't ship a binary that is simply called "qemu", so we should
avoid this in the documentation. Use the configurable binary name
via "|qemu_system|" instead.

Signed-off-by: Thomas Huth 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 docs/system/devices/usb.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
index a6ca7b0c37..dc694d23c2 100644
--- a/docs/system/devices/usb.rst
+++ b/docs/system/devices/usb.rst
@@ -18,7 +18,7 @@ emulation uses less resources (especially CPU).  So if your 
guest
 supports XHCI (which should be the case for any operating system
 released around 2010 or later) we recommend using it:
 
-qemu -device qemu-xhci
+|qemu_system| -device qemu-xhci
 
 XHCI supports USB 1.1, USB 2.0 and USB 3.0 devices, so this is the
 only controller you need.  With only a single USB controller (and
-- 
2.39.2

[PULL 07/16] linux-user: sparc: Remove unused struct 'target_mc_fq'

2024-06-30 Thread Michael Tokarev

From: "Dr. David Alan Gilbert" 

This struct is unused since Peter's
Commit b8ae597f0e6d ("linux-user/sparc: Fix errors in target_ucontext
structures")

However, hmm, I'm a bit confused since that commit modifies the
structure and then removes it, was that intentional?

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 linux-user/sparc/signal.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
index f164b74032..8181b8b92c 100644
--- a/linux-user/sparc/signal.c
+++ b/linux-user/sparc/signal.c
@@ -546,11 +546,6 @@ void setup_sigtramp(abi_ulong sigtramp_page)
 typedef abi_ulong target_mc_greg_t;
 typedef target_mc_greg_t target_mc_gregset_t[SPARC_MC_NGREG];
 
-struct target_mc_fq {
-abi_ulong mcfq_addr;
-uint32_t mcfq_insn;
-};
-
 /*
  * Note the manual 16-alignment; the kernel gets this because it
  * includes a "long double qregs[16]" in the mcpu_fregs union,
-- 
2.39.2

[PULL 14/16] vl.c: select_machine(): use g_autoptr

2024-06-30 Thread Michael Tokarev

From: Vladimir Sementsov-Ogievskiy 

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 system/vl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index fda93d150c..92fc29c193 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1667,7 +1667,7 @@ static MachineClass *select_machine(QDict *qdict, Error 
**errp)
 {
 ERRP_GUARD();
 const char *machine_type = qdict_get_try_str(qdict, "type");
-GSList *machines = object_class_get_list(TYPE_MACHINE, false);
+g_autoptr(GSList) machines = object_class_get_list(TYPE_MACHINE, false);
 MachineClass *machine_class = NULL;
 
 if (machine_type) {
@@ -1683,7 +1683,6 @@ static MachineClass *select_machine(QDict *qdict, Error 
**errp)
 }
 }
 
-g_slist_free(machines);
 if (!machine_class) {
 error_append_hint(errp,
   "Use -machine help to list supported machines\n");
-- 
2.39.2

[PULL 01/16] hmp-commands-info.hx: Add missing info command for stats subcommand

2024-06-30 Thread Michael Tokarev

From: Martin Joerg 

Signed-off-by: Martin Joerg 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 hmp-commands-info.hx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index cfd4ad5651..c59cd6637b 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -892,7 +892,7 @@ ERST
 },
 
 SRST
-  ``stats``
+  ``info stats``
 Show runtime-collected statistics
 ERST
 
-- 
2.39.2

[PULL 06/16] linux-user: cris: Remove unused struct 'rt_signal_frame'

2024-06-30 Thread Michael Tokarev

From: "Dr. David Alan Gilbert" 

Since 'setup_rt_frame' has never been implemented, this struct
is unused.

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Richard Henderson 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 linux-user/cris/signal.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/linux-user/cris/signal.c b/linux-user/cris/signal.c
index 4f532b2903..10948bcf30 100644
--- a/linux-user/cris/signal.c
+++ b/linux-user/cris/signal.c
@@ -35,14 +35,6 @@ struct target_signal_frame {
 uint16_t retcode[4];  /* Trampoline code. */
 };
 
-struct rt_signal_frame {
-siginfo_t *pinfo;
-void *puc;
-siginfo_t info;
-ucontext_t uc;
-uint16_t retcode[4];  /* Trampoline code. */
-};
-
 static void setup_sigcontext(struct target_sigcontext *sc, CPUCRISState *env)
 {
 __put_user(env->regs[0], >regs.r0);
-- 
2.39.2

[PULL 02/16] cpu: fix memleak of 'halt_cond' and 'thread'

2024-06-30 Thread Michael Tokarev

From: Matheus Tavares Bernardino 

Since a4c2735f35 (cpu: move Qemu[Thread|Cond] setup into common code,
2024-05-30) these fields are now allocated at cpu_common_initfn(). So
let's make sure we also free them at cpu_common_finalize().

Furthermore, the code also frees these on round robin, but we missed
'halt_cond'.

Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Pierrick Bouvier 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 accel/tcg/tcg-accel-ops-rr.c | 1 +
 hw/core/cpu-common.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 84c36c1450..48c38714bd 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -329,6 +329,7 @@ void rr_start_vcpu_thread(CPUState *cpu)
 /* we share the thread, dump spare data */
 g_free(cpu->thread);
 qemu_cond_destroy(cpu->halt_cond);
+g_free(cpu->halt_cond);
 cpu->thread = single_tcg_cpu_thread;
 cpu->halt_cond = single_tcg_halt_cond;
 
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index bf1a7b8892..f131cde2c0 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -286,6 +286,9 @@ static void cpu_common_finalize(Object *obj)
 g_array_free(cpu->gdb_regs, TRUE);
 qemu_lockcnt_destroy(>in_ioctl_lock);
 qemu_mutex_destroy(>work_mutex);
+qemu_cond_destroy(cpu->halt_cond);
+g_free(cpu->halt_cond);
+g_free(cpu->thread);
 }
 
 static int64_t cpu_common_get_arch_id(CPUState *cpu)
-- 
2.39.2

[PULL 09/16] net/can: Remove unused struct 'CanBusState'

2024-06-30 Thread Michael Tokarev

From: "Dr. David Alan Gilbert" 

As far as I can tell this struct has never been used in this
file (it is used in can_core.c).

Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Michael Tokarev 
Signed-off-by: Michael Tokarev 
---
 net/can/can_host.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/net/can/can_host.c b/net/can/can_host.c
index a3c84028c6..b2fe553f91 100644
--- a/net/can/can_host.c
+++ b/net/can/can_host.c
@@ -34,12 +34,6 @@
 #include "net/can_emu.h"
 #include "net/can_host.h"
 
-struct CanBusState {
-Object object;
-
-QTAILQ_HEAD(, CanBusClientState) clients;
-};
-
 static void can_host_disconnect(CanHostState *ch)
 {
 CanHostClass *chc = CAN_HOST_GET_CLASS(ch);
-- 
2.39.2

Re: [PATCH v6] virtio-net: Fix network stall at the host side waiting for kick

2024-06-30 Thread Michael S. Tsirkin

Thanks for the patch!
Yes something to improve:


On Sun, Jun 30, 2024 at 02:36:15PM +0800, Wencheng Yang wrote:
> From: thomas 
> 
> Patch 06b12970174 ("virtio-net: fix network stall under load")
> added double-check to test whether the available buffer size
> can satisfy the request or not, in case the guest has added
> some buffers to the avail ring simultaneously after the first
> check. It will be lucky if the available buffer size becomes
> okay after the double-check, then the host can send the packet
> to the guest. If the buffer size still can't satisfy the request,
> even if the guest has added some buffers, viritio-net would
> stall at the host side forever.
> 
> The patch checks whether the guest has added some buffers
> after last check of avail idx when the available buffers are
> not sufficient, if so then recheck the available buffers
> in the loop.
> 
> The patch also reverts patch "06b12970174".
> 
> The case below can reproduce the stall.
> 
>Guest 0
>  ++
>  | iperf  |
> ---> | server |
>  Host   |++
>++   |...
>| iperf  |
>| client |  Guest n
>++   |++
> || iperf  |
> ---> | server |
>  ++
> 
> Boot many guests from qemu with virtio network:
>  qemu ... -netdev tap,id=net_x \
> -device virtio-net-pci-non-transitional,\
> iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x
> 
> Each guest acts as iperf server with commands below:
>  iperf3 -s -D -i 10 -p 8001
>  iperf3 -s -D -i 10 -p 8002
> 
> The host as iperf client:
>  iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 4
>  iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 4
> 
> After some time, the host loses connection to the guest,
> the guest can send packet to the host, but can't receive
> packet from host.
> 
> It's more likely to happen if SWIOTLB is enabled in the guest,
> allocating and freeing bounce buffer takes some CPU ticks,
> copying from/to bounce buffer takes more CPU ticks, compared
> with that there is no bounce buffer in the guest.
> Once the rate of producing packets from the host approximates
> the rate of receiveing packets in the guest, the guest would
> loop in NAPI.
> 
>  receive packets---
>| |
>v |
>free buf  virtnet_poll
>| |
>v |
>  add buf to avail ring  ---
>|
>|  need kick the host?
>|  NAPI continues
>v
>  receive packets---
>| |
>v |
>free buf  virtnet_poll
>| |
>v |
>  add buf to avail ring  ---
>|
>v
>   ...   ...
> 
> On the other hand, the host fetches free buf from avail
> ring, if the buf in the avail ring is not enough, the
> host notifies the guest the event by writing the avail
> idx read from avail ring to the event idx of used ring,
> then the host goes to sleep, waiting for the kick signal
> from the guest.
> 
> Once the guest finds the host is waiting for kick singal
> (in virtqueue_kick_prepare_split()), it kicks the host.
> 
> The host may stall forever at the sequences below:
> 
>  HostGuest
>   ---
>  fetch buf, send packet   receive packet ---
>  ...  ... |
>  fetch buf, send packet add buf   |
>  ...add buf   virtnet_poll
> buf not enough  avail idx-> add buf   |
> read avail idx  add buf   |
> add buf  ---
>   receive packet ---
> write event idx   ... |
> waiting for kick add buf   virtnet_poll
>   ... |
>  ---
>  no more packet, exit NAPI
> 
> In the first loop of NAPI above, indicated in the range of
> virtnet_poll above, the host is sending packets while the
> guest is receiving packets and adding buf.
>  step 1: The buf is not enough, for example, a big packet
>  needs 5 buf, but the available buf count is 3.
>  The host read current avail idx.
>  step 2: The guest adds some buf, then checks whether the
>  host is waiting for kick signal, not at this time.
>  The used ring is not

[PATCH] hw/usb: Fix memory leak in musb_reset()

2024-06-30 Thread Zheyu Ma

The musb_reset function was causing a memory leak by not properly freeing
the memory associated with USBPacket instances before reinitializing them.
This commit addresses the memory leak by adding calls to usb_packet_cleanup
for each USBPacket instance before reinitializing them with usb_packet_init.

Asan log:

=2970623==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 256 byte(s) in 16 object(s) allocated from:
#0 0x561e20629c3d in malloc 
llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
#1 0x7fee91885738 in g_malloc 
(/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5e738)
#2 0x561e21b4d0e1 in usb_packet_init hw/usb/core.c:531:5
#3 0x561e21c5016b in musb_reset hw/usb/hcd-musb.c:372:9
#4 0x561e21c502a9 in musb_init hw/usb/hcd-musb.c:385:5
#5 0x561e21c893ef in tusb6010_realize hw/usb/tusb6010.c:827:15
#6 0x561e23443355 in device_set_realized hw/core/qdev.c:510:13
#7 0x561e2346ac1b in property_set_bool qom/object.c:2354:5
#8 0x561e23463895 in object_property_set qom/object.c:1463:5
#9 0x561e23477909 in object_property_set_qobject qom/qom-qobject.c:28:10
#10 0x561e234645ed in object_property_set_bool qom/object.c:1533:15
#11 0x561e2343c830 in qdev_realize hw/core/qdev.c:291:12
#12 0x561e2343c874 in qdev_realize_and_unref hw/core/qdev.c:298:11
#13 0x561e20ad5091 in sysbus_realize_and_unref hw/core/sysbus.c:261:12
#14 0x561e22553283 in n8x0_usb_setup hw/arm/nseries.c:800:5
#15 0x561e2254e99b in n8x0_init hw/arm/nseries.c:1356:5
#16 0x561e22561170 in n810_init hw/arm/nseries.c:1418:5

Signed-off-by: Zheyu Ma 
---
 hw/usb/hcd-musb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/usb/hcd-musb.c b/hw/usb/hcd-musb.c
index 6dca373cb1..0300aeaec6 100644
--- a/hw/usb/hcd-musb.c
+++ b/hw/usb/hcd-musb.c
@@ -368,6 +368,8 @@ void musb_reset(MUSBState *s)
 s->ep[i].maxp[1] = 0x40;
 s->ep[i].musb = s;
 s->ep[i].epnum = i;
+usb_packet_cleanup(>ep[i].packey[0].p);
+usb_packet_cleanup(>ep[i].packey[1].p);
 usb_packet_init(>ep[i].packey[0].p);
 usb_packet_init(>ep[i].packey[1].p);
 }
-- 
2.34.1

[PATCH] hw/misc/bcm2835_thermal: Handle invalid address accesses gracefully

2024-06-30 Thread Zheyu Ma

This commit handles invalid address accesses gracefully in both read and write
functions. Instead of asserting and aborting, it logs an error message and 
returns
a neutral value for read operations and does nothing for write operations.

Error log:
ERROR:hw/misc/bcm2835_thermal.c:55:bcm2835_thermal_read: code should not be 
reached
Bail out! ERROR:hw/misc/bcm2835_thermal.c:55:bcm2835_thermal_read: code should 
not be reached
Aborted

Reproducer:
cat << EOF | qemu-system-aarch64 -display \
none -machine accel=qtest, -m 512M -machine raspi3b -m 1G -qtest stdio
readw 0x3f212003
EOF

Signed-off-by: Zheyu Ma 
---
 hw/misc/bcm2835_thermal.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/misc/bcm2835_thermal.c b/hw/misc/bcm2835_thermal.c
index ee7816b8a5..5c2a429d58 100644
--- a/hw/misc/bcm2835_thermal.c
+++ b/hw/misc/bcm2835_thermal.c
@@ -51,8 +51,10 @@ static uint64_t bcm2835_thermal_read(void *opaque, hwaddr 
addr, unsigned size)
 val = FIELD_DP32(bcm2835_thermal_temp2adc(25), STAT, VALID, true);
 break;
 default:
-/* MemoryRegionOps are aligned, so this can not happen. */
-g_assert_not_reached();
+qemu_log_mask(LOG_GUEST_ERROR,
+  "bcm2835_thermal_read: invalid address 0x%"
+  HWADDR_PRIx "\n", addr);
+val = 0;
 }
 return val;
 }
@@ -72,8 +74,10 @@ static void bcm2835_thermal_write(void *opaque, hwaddr addr,
__func__, value, addr);
 break;
 default:
-/* MemoryRegionOps are aligned, so this can not happen. */
-g_assert_not_reached();
+qemu_log_mask(LOG_GUEST_ERROR,
+  "bcm2835_thermal_write: invalid address 0x%"
+  HWADDR_PRIx "\n", addr);
+break;
 }
 }
 
-- 
2.34.1

Re: [PATCH v3 1/2] tests/avocado: update firmware for sbsa-ref

2024-06-30 Thread Ard Biesheuvel

On Thu, 20 Jun 2024 at 12:20, Marcin Juszkiewicz
 wrote:
>
> Update firmware to have graphics card memory fix from EDK2 commit
> c1d1910be6e04a8b1a73090cf2881fb698947a6e:
>
> OvmfPkg/QemuVideoDxe: add feature PCD to remap framebuffer W/C
>
> Some platforms (such as SBSA-QEMU on recent builds of the emulator) only
> tolerate misaligned accesses to normal memory, and raise alignment
> faults on such accesses to device memory, which is the default for PCIe
> MMIO BARs.
>
> When emulating a PCIe graphics controller, the framebuffer is typically
> exposed via a MMIO BAR, while the disposition of the region is closer to
> memory (no side effects on reads or writes, except for the changing
> picture on the screen; direct random access to any pixel in the image).
>
> In order to permit the use of such controllers on platforms that only
> tolerate these types of accesses for normal memory, it is necessary to
> remap the memory. Use the DXE services to set the desired capabilities
> and attributes.
>
> Hide this behavior under a feature PCD so only platforms that really
> need it can enable it. (OVMF on x86 has no need for this)
>
> With this fix enabled we can boot sbsa-ref with more than one cpu core.
>

This requires an explanation: what does the number of CPU cores have
to do with the memory attributes used for the framebuffer?


> Signed-off-by: Marcin Juszkiewicz 
> ---
>  tests/avocado/machine_aarch64_sbsaref.py | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tests/avocado/machine_aarch64_sbsaref.py 
> b/tests/avocado/machine_aarch64_sbsaref.py
> index 6bb82f2a03..e854ec6a1a 100644
> --- a/tests/avocado/machine_aarch64_sbsaref.py
> +++ b/tests/avocado/machine_aarch64_sbsaref.py
> @@ -37,18 +37,18 @@ def fetch_firmware(self):
>
>  Used components:
>
> -- Trusted Firmware 2.11.0
> -- Tianocore EDK2 stable202405
> -- Tianocore EDK2-platforms commit 4bbd0ed
> +- Trusted Firmware v2.11.0
> +- Tianocore EDK2   4d4f569924
> +- Tianocore EDK2-platforms 3f08401
>
>  """
>
>  # Secure BootRom (TF-A code)
>  fs0_xz_url = (
>  
> "https://artifacts.codelinaro.org/artifactory/linaro-419-sbsa-ref/;
> -"20240528-140808/edk2/SBSA_FLASH0.fd.xz"
> +"20240619-148232/edk2/SBSA_FLASH0.fd.xz"
>  )
> -fs0_xz_hash = 
> "fa6004900b67172914c908b78557fec4d36a5f784f4c3dd08f49adb75e1892a9"
> +fs0_xz_hash = 
> "0c954842a590988f526984de22e21ae0ab9cb351a0c99a8a58e928f0c7359cf7"
>  tar_xz_path = self.fetch_asset(fs0_xz_url, asset_hash=fs0_xz_hash,
>algorithm='sha256')
>  archive.extract(tar_xz_path, self.workdir)
> @@ -57,9 +57,9 @@ def fetch_firmware(self):
>  # Non-secure rom (UEFI and EFI variables)
>  fs1_xz_url = (
>  
> "https://artifacts.codelinaro.org/artifactory/linaro-419-sbsa-ref/;
> -"20240528-140808/edk2/SBSA_FLASH1.fd.xz"
> +"20240619-148232/edk2/SBSA_FLASH1.fd.xz"
>  )
> -fs1_xz_hash = 
> "5f3747d4000bc416d9641e33ff4ac60c3cc8cb74ca51b6e932e58531c62eb6f7"
> +fs1_xz_hash = 
> "c6ec39374c4d79bb9e9cdeeb6db44732d90bb4a334cec92002b3f4b9cac4b5ee"
>  tar_xz_path = self.fetch_asset(fs1_xz_url, asset_hash=fs1_xz_hash,
>algorithm='sha256')
>  archive.extract(tar_xz_path, self.workdir)
>
> --
> 2.45.1
>

[PATCH] virtio: Implement Virtio Backend for SD/MMC in QEMU

2024-06-30 Thread Mikhail Krasheninnikov

Add a Virtio backend for SD/MMC devices. Confirmed interoperability with
Linux.

Signed-off-by: Mikhail Krasheninnikov 
CC: Matwey Kornilov 
CC: qemu-bl...@nongnu.org
CC: Michael S. Tsirkin 
---
 hw/virtio/Kconfig   |   5 +
 hw/virtio/meson.build   |   2 +
 hw/virtio/virtio-mmc-pci.c  |  85 ++
 hw/virtio/virtio-mmc.c  | 165 
 hw/virtio/virtio.c  |   3 +-
 include/hw/virtio/virtio-mmc.h  |  20 +++
 include/standard-headers/linux/virtio_ids.h |   1 +
 7 files changed, 280 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/virtio-mmc-pci.c
 create mode 100644 hw/virtio/virtio-mmc.c
 create mode 100644 include/hw/virtio/virtio-mmc.h

diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
index 92c9cf6c96..e5fa7607c4 100644
--- a/hw/virtio/Kconfig
+++ b/hw/virtio/Kconfig
@@ -105,3 +105,8 @@ config VHOST_USER_SCMI
 bool
 default y
 depends on VIRTIO && VHOST_USER
+
+config VIRTIO_MMC
+bool
+default y
+depends on VIRTIO
\ No newline at end of file
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 47baf00366..1ff095c5bc 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -41,6 +41,7 @@ specific_virtio_ss.add(when: 'CONFIG_VHOST_USER_GPIO', 
if_true: files('vhost-use
 specific_virtio_ss.add(when: ['CONFIG_VIRTIO_PCI', 'CONFIG_VHOST_USER_GPIO'], 
if_true: files('vhost-user-gpio-pci.c'))
 specific_virtio_ss.add(when: 'CONFIG_VHOST_USER_SCMI', if_true: 
files('vhost-user-scmi.c'))
 specific_virtio_ss.add(when: ['CONFIG_VIRTIO_PCI', 'CONFIG_VHOST_USER_SCMI'], 
if_true: files('vhost-user-scmi-pci.c'))
+specific_virtio_ss.add(when: 'CONFIG_VIRTIO_MMC', if_true: 
files('virtio-mmc.c'))
 
 virtio_pci_ss = ss.source_set()
 virtio_pci_ss.add(when: 'CONFIG_VHOST_VSOCK', if_true: 
files('vhost-vsock-pci.c'))
@@ -68,6 +69,7 @@ virtio_pci_ss.add(when: 'CONFIG_VIRTIO_IOMMU', if_true: 
files('virtio-iommu-pci.
 virtio_pci_ss.add(when: 'CONFIG_VIRTIO_MEM', if_true: 
files('virtio-mem-pci.c'))
 virtio_pci_ss.add(when: 'CONFIG_VHOST_VDPA_DEV', if_true: 
files('vdpa-dev-pci.c'))
 virtio_pci_ss.add(when: 'CONFIG_VIRTIO_MD', if_true: files('virtio-md-pci.c'))
+virtio_pci_ss.add(when: 'CONFIG_VIRTIO_MMC', if_true: 
files('virtio-mmc-pci.c'))
 
 specific_virtio_ss.add_all(when: 'CONFIG_VIRTIO_PCI', if_true: virtio_pci_ss)
 
diff --git a/hw/virtio/virtio-mmc-pci.c b/hw/virtio/virtio-mmc-pci.c
new file mode 100644
index 00..f0ed17d03b
--- /dev/null
+++ b/hw/virtio/virtio-mmc-pci.c
@@ -0,0 +1,85 @@
+#include "qemu/osdep.h"
+
+#include "hw/virtio/virtio-pci.h"
+#include "hw/virtio/virtio-mmc.h"
+#include "hw/qdev-properties-system.h"
+#include "qemu/typedefs.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend-global-state.h"
+
+typedef struct VirtIOMMCPCI VirtIOMMCPCI;
+
+/*
+ * virtio-mmc-pci: This extends VirtioPCIProxy.
+ */
+#define TYPE_VIRTIO_MMC_PCI "virtio-mmc-pci-base"
+DECLARE_INSTANCE_CHECKER(VirtIOMMCPCI, VIRTIO_MMC_PCI,
+ TYPE_VIRTIO_MMC_PCI)
+
+struct VirtIOMMCPCI {
+VirtIOPCIProxy parent_obj;
+VirtIOMMC vdev;
+BlockBackend *blk;
+};
+
+static void virtio_mmc_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+VirtIOMMCPCI *vmmc = VIRTIO_MMC_PCI(vpci_dev);
+DeviceState *dev = DEVICE(>vdev);
+
+if (!vmmc->blk) {
+error_setg(errp, "Drive property not set");
+return;
+}
+VirtIOMMC *vmmc_dev = >vdev;
+vmmc_dev->blk = vmmc->blk;
+blk_detach_dev(vmmc->blk, DEVICE(vpci_dev));
+
+qdev_set_parent_bus(dev, BUS(_dev->bus), errp);
+
+virtio_pci_force_virtio_1(vpci_dev);
+object_property_set_bool(OBJECT(dev), "realized", true, errp);
+}
+
+static Property virtio_mmc_properties[] = {
+DEFINE_PROP_DRIVE("drive", VirtIOMMCPCI, blk),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_mmc_pci_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+VirtioPCIClass *virtio_pci_class = VIRTIO_PCI_CLASS(oc);
+PCIDeviceClass *pci_device_class = PCI_DEVICE_CLASS(oc);
+
+device_class_set_props(dc, virtio_mmc_properties);
+set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
+
+virtio_pci_class->realize = virtio_mmc_pci_realize;
+
+pci_device_class->revision = VIRTIO_PCI_ABI_VERSION;
+pci_device_class->class_id = PCI_CLASS_MEMORY_FLASH;
+}
+
+static void virtio_mmc_pci_instance_init(Object *obj)
+{
+VirtIOMMCPCI *dev = VIRTIO_MMC_PCI(obj);
+
+virtio_instance_init_common(obj, >vdev, sizeof(dev->vdev),
+TYPE_VIRTIO_MMC);
+}
+
+static const VirtioPCIDeviceTypeInfo virtio_mmc_pci_info = {
+.base_name = TYPE_VIRTIO_MMC_PCI,
+.generic_name  = "virtio-mmc-pci",
+.instance_size = sizeof(VirtIOMMCPCI),
+.class_init= virtio_mmc_pci_class_init,
+.instance_init = virtio_mmc_pci_instance_init,
+};
+
+static void

[PATCH] hw/display/tcx: Fix out-of-bounds access in tcx_blit_writel

2024-06-30 Thread Zheyu Ma

This patch addresses a potential out-of-bounds memory access issue in the
tcx_blit_writel function. It adds bounds checking to ensure that memory
accesses do not exceed the allocated VRAM size. If an out-of-bounds access
is detected, an error is logged using qemu_log_mask.

ASAN log:
==2960379==ERROR: AddressSanitizer: SEGV on unknown address 0x7f524752fd01 (pc 
0x7f525c2c4881 bp 0x7ffdaf87bfd0 sp 0x7ffdaf87b788 T0)
==2960379==The signal is caused by a READ memory access.
#0 0x7f525c2c4881 in memcpy 
string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:222
#1 0x55aa782bd5b1 in __asan_memcpy 
llvm/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cpp:22:3
#2 0x55aa7854dedd in tcx_blit_writel hw/display/tcx.c:590:13

Reproducer:
cat << EOF | qemu-system-sparc -display none \
-machine accel=qtest, -m 512M -machine LX -m 256 -qtest stdio
writel 0x562e98c4 0x3d92fd01
EOF

Signed-off-by: Zheyu Ma 
---
 hw/display/tcx.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/display/tcx.c b/hw/display/tcx.c
index 99507e7638..af43bea7f2 100644
--- a/hw/display/tcx.c
+++ b/hw/display/tcx.c
@@ -33,6 +33,7 @@
 #include "migration/vmstate.h"
 #include "qemu/error-report.h"
 #include "qemu/module.h"
+#include "qemu/log.h"
 #include "qom/object.h"
 
 #define TCX_ROM_FILE "QEMU,tcx.bin"
@@ -577,6 +578,14 @@ static void tcx_blit_writel(void *opaque, hwaddr addr,
 addr = (addr >> 3) & 0xf;
 adsr = val & 0xff;
 len = ((val >> 24) & 0x1f) + 1;
+
+if (addr + len > s->vram_size || adsr + len > s->vram_size) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: VRAM access out of bounds. addr: 0x%lx, adsr: 
0x%x, len: %u\n",
+  __func__, addr, adsr, len);
+return;
+}
+
 if (adsr == 0xff) {
 memset(>vram[addr], s->tmpblit, len);
 if (s->depth == 24) {
-- 
2.34.1

Question: xen + vhost user

2024-06-30 Thread Peng Fan

Hi All,

I am trying to enable vhost user input with xen hypervisor on i.MX95, using qemu
vhost-user-input. But meet " Invalid vring_addr message ". My xen domu cfg:

'-chardev', 'socket,path=/tmp/input.sock,id=mouse0',
'-device', 'vhost-user-input-pci,chardev=mouse0',

Anyone knows what missing?

Partial error log:
 Vhost user message 
Request: VHOST_USER_SET_VRING_ADDR (9)
Flags:   0x1
Size:40
vhost_vring_addr:
index:  0
flags:  0
desc_user_addr:   0x889b
used_user_addr:   0x889b04c0
avail_user_addr:  0x889b0400
log_guest_addr:   0x444714c0
Setting virtq addresses:
vring_desc  at (nil)
vring_used  at (nil)
vring_avail at (nil)

** (vhost-user-input:1816): CRITICAL **: 07:20:46.077: Invalid vring_addr 
message

Thanks,
Peng.

The full vhost user debug log:
./vhost-user-input --socket-path=/tmp/input.sock --evdev-path=/d
-path=/dev/input/event1 ./vhost-user-input --socket-path=/tmp/input.sock 
--evdev-
 Vhost user message 
Request: VHOST_USER_GET_FEATURES (1)
Flags:   0x1
Size:0
Sending back to guest u64: 0x00017500
 Vhost user message 
Request: VHOST_USER_GET_PROTOCOL_FEATURES (15)
Flags:   0x1
Size:0
 Vhost user message 
Request: VHOST_USER_SET_PROTOCOL_FEATURES (16)
Flags:   0x1
Size:8
u64: 0x8e2b
 Vhost user message 
Request: VHOST_USER_GET_QUEUE_NUM (17)
Flags:   0x1
Size:0
 Vhost user message 
Request: VHOST_USER_GET_MAX_MEM_SLOTS (36)
Flags:   0x1
Size:0
u64: 0x0020
 Vhost user message 
Request: VHOST_USER_SET_BACKEND_REQ_FD (21)
Flags:   0x9
Size:0
Fds: 6
Got backend_fd: 6
 Vhost user message 
Request: VHOST_USER_SET_OWNER (3)
Flags:   0x1
Size:0
 Vhost user message 
Request: VHOST_USER_GET_FEATURES (1)
Flags:   0x1
Size:0
Sending back to guest u64: 0x00017500
 Vhost user message 
Request: VHOST_USER_SET_VRING_CALL (13)
Flags:   0x1
Size:8
Fds: 7
u64: 0x
Got call_fd: 7 for vq: 0
 Vhost user message 
Request: VHOST_USER_SET_VRING_ERR (14)
Flags:   0x1
Size:8
Fds: 8
u64: 0x
 Vhost user message 
Request: VHOST_USER_SET_VRING_CALL (13)
Flags:   0x1
Size:8
Fds: 9
u64: 0x0001
Got call_fd: 9 for vq: 1
 Vhost user message 
Request: VHOST_USER_SET_VRING_ERR (14)
Flags:   0x1
Size:8
Fds: 10
u64: 0x0001
(XEN) d2v0 Unhandled SMC/HVC: 0x8450
(XEN) d2v0 Unhandled SMC/HVC: 0x8600ff01
(XEN) d2v0: vGICD: RAZ on reserved register offset 0x0c
(XEN) d2v0: vGICD: unhandled word write 0x00 to ICACTIVER4
(XEN) d2v0: vGICR: SGI: unhandled word write 0x00 to ICACTIVER0
 Vhost user message 
Request: VHOST_USER_SET_CONFIG (25)
Flags:   0x9
Size:148
 Vhost user message 
Request: VHOST_USER_SET_CONFIG (25)
Flags:   0x9
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:   0x1
Size:148
 Vhost user message 
Request: VHOST_USER_GET_CONFIG (24)
Flags:

[PATCH v2 3/3] target/ppc : Update VSX storage access insns to use tcg_gen_qemu _ld/st_i128.

2024-06-30 Thread Chinmay Rath

Updated many VSX instructions to use tcg_gen_qemu_ld/st_i128, instead of using
tcg_gen_qemu_ld/st_i64 consecutively.
Introduced functions {get,set}_vsr_full to facilitate the above & for future 
use.

Suggested-by: Richard Henderson 
Signed-off-by: Chinmay Rath 
---
 target/ppc/translate/vsx-impl.c.inc | 70 +
 1 file changed, 31 insertions(+), 39 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.c.inc 
b/target/ppc/translate/vsx-impl.c.inc
index 26ebf3fedf..b622831a73 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -10,6 +10,16 @@ static inline void set_cpu_vsr(int n, TCGv_i64 src, bool 
high)
 tcg_gen_st_i64(src, tcg_env, vsr64_offset(n, high));
 }
 
+static inline void get_vsr_full(TCGv_i128 dst, int reg)
+{
+tcg_gen_ld_i128(dst, tcg_env, vsr_full_offset(reg));
+}
+
+static inline void set_vsr_full(int reg, TCGv_i128 src)
+{
+tcg_gen_st_i128(src, tcg_env, vsr_full_offset(reg));
+}
+
 static inline TCGv_ptr gen_vsr_ptr(int reg)
 {
 TCGv_ptr r = tcg_temp_new_ptr();
@@ -196,20 +206,17 @@ static bool trans_LXVH8X(DisasContext *ctx, arg_LXVH8X *a)
 static bool trans_LXVB16X(DisasContext *ctx, arg_LXVB16X *a)
 {
 TCGv EA;
-TCGv_i64 xth, xtl;
+TCGv_i128 data;
 
 REQUIRE_VSX(ctx);
 REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 
-xth = tcg_temp_new_i64();
-xtl = tcg_temp_new_i64();
+data = tcg_temp_new_i128();
 gen_set_access_type(ctx, ACCESS_INT);
 EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
-tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEUQ);
-tcg_gen_addi_tl(EA, EA, 8);
-tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEUQ);
-set_cpu_vsr(a->rt, xth, true);
-set_cpu_vsr(a->rt, xtl, false);
+tcg_gen_qemu_ld_i128(data, EA, ctx->mem_idx,
+ MO_BE | MO_128 | MO_ATOM_IFALIGN_PAIR);
+set_vsr_full(a->rt, data);
 return true;
 }
 
@@ -385,20 +392,17 @@ static bool trans_STXVH8X(DisasContext *ctx, arg_STXVH8X 
*a)
 static bool trans_STXVB16X(DisasContext *ctx, arg_STXVB16X *a)
 {
 TCGv EA;
-TCGv_i64 xsh, xsl;
+TCGv_i128 data;
 
 REQUIRE_VSX(ctx);
 REQUIRE_INSNS_FLAGS2(ctx, ISA300);
 
-xsh = tcg_temp_new_i64();
-xsl = tcg_temp_new_i64();
-get_cpu_vsr(xsh, a->rt, true);
-get_cpu_vsr(xsl, a->rt, false);
+data = tcg_temp_new_i128();
 gen_set_access_type(ctx, ACCESS_INT);
 EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
-tcg_gen_qemu_st_i64(xsh, EA, ctx->mem_idx, MO_BEUQ);
-tcg_gen_addi_tl(EA, EA, 8);
-tcg_gen_qemu_st_i64(xsl, EA, ctx->mem_idx, MO_BEUQ);
+get_vsr_full(data, a->rt);
+tcg_gen_qemu_st_i128(data, EA, ctx->mem_idx,
+ MO_BE | MO_128 | MO_ATOM_IFALIGN_PAIR);
 return true;
 }
 
@@ -2175,13 +2179,13 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv 
displ,
  int rt, bool store, bool paired)
 {
 TCGv ea;
-TCGv_i64 xt;
+TCGv_i128 data;
 MemOp mop;
 int rt1, rt2;
 
-xt = tcg_temp_new_i64();
+data = tcg_temp_new_i128();
 
-mop = DEF_MEMOP(MO_UQ);
+mop = DEF_MEMOP(MO_128 | MO_ATOM_IFALIGN_PAIR);
 
 gen_set_access_type(ctx, ACCESS_INT);
 ea = do_ea_calc(ctx, ra, displ);
@@ -2195,32 +2199,20 @@ static bool do_lstxv(DisasContext *ctx, int ra, TCGv 
displ,
 }
 
 if (store) {
-get_cpu_vsr(xt, rt1, !ctx->le_mode);
-tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
-gen_addr_add(ctx, ea, ea, 8);
-get_cpu_vsr(xt, rt1, ctx->le_mode);
-tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
+get_vsr_full(data, rt1);
+tcg_gen_qemu_st_i128(data, ea, ctx->mem_idx, mop);
 if (paired) {
 gen_addr_add(ctx, ea, ea, 8);
-get_cpu_vsr(xt, rt2, !ctx->le_mode);
-tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
-gen_addr_add(ctx, ea, ea, 8);
-get_cpu_vsr(xt, rt2, ctx->le_mode);
-tcg_gen_qemu_st_i64(xt, ea, ctx->mem_idx, mop);
+get_vsr_full(data, rt2);
+tcg_gen_qemu_st_i128(data, ea, ctx->mem_idx, mop);
 }
 } else {
-tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-set_cpu_vsr(rt1, xt, !ctx->le_mode);
-gen_addr_add(ctx, ea, ea, 8);
-tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-set_cpu_vsr(rt1, xt, ctx->le_mode);
+tcg_gen_qemu_ld_i128(data, ea, ctx->mem_idx, mop);
+set_vsr_full(rt1, data);
 if (paired) {
 gen_addr_add(ctx, ea, ea, 8);
-tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-set_cpu_vsr(rt2, xt, !ctx->le_mode);
-gen_addr_add(ctx, ea, ea, 8);
-tcg_gen_qemu_ld_i64(xt, ea, ctx->mem_idx, mop);
-set_cpu_vsr(rt2, xt, ctx->le_mode);
+tcg_gen_qemu_ld_i128(data, ea, ctx->mem_idx, mop);
+set_vsr_full(rt2, data);
 }
 }
 return true;
-- 
2.39.3

[PATCH v2 2/3] target/ppc: Update VMX storage access insns to use tcg_gen_qemu_ld/st_i128.

2024-06-30 Thread Chinmay Rath

Updated instructions {l, st}vx to use tcg_gen_qemu_ld/st_i128,
instead of using 64 bits loads/stores in succession.
Introduced functions {get, set}_avr_full in vmx-impl.c.inc to
facilitate the above, and potential future usage.

Suggested-by: Richard Henderson 
Signed-off-by: Chinmay Rath 
---
 target/ppc/translate/vmx-impl.c.inc | 42 ++---
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc
index a182d2cf81..70d0ad2e71 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -24,25 +24,29 @@ static inline void set_avr64(int regno, TCGv_i64 src, bool 
high)
 tcg_gen_st_i64(src, tcg_env, avr64_offset(regno, high));
 }
 
+static inline void get_avr_full(TCGv_i128 dst, int regno)
+{
+tcg_gen_ld_i128(dst, tcg_env, avr_full_offset(regno));
+}
+
+static inline void set_avr_full(int regno, TCGv_i128 src)
+{
+tcg_gen_st_i128(src, tcg_env, avr_full_offset(regno));
+}
+
 static bool trans_LVX(DisasContext *ctx, arg_X *a)
 {
 TCGv EA;
-TCGv_i64 avr;
+TCGv_i128 avr;
 REQUIRE_INSNS_FLAGS(ctx, ALTIVEC);
 REQUIRE_VECTOR(ctx);
 gen_set_access_type(ctx, ACCESS_INT);
-avr = tcg_temp_new_i64();
+avr = tcg_temp_new_i128();
 EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
 tcg_gen_andi_tl(EA, EA, ~0xf);
-/*
- * We only need to swap high and low halves. gen_qemu_ld64_i64
- * does necessary 64-bit byteswap already.
- */
-gen_qemu_ld64_i64(ctx, avr, EA);
-set_avr64(a->rt, avr, !ctx->le_mode);
-tcg_gen_addi_tl(EA, EA, 8);
-gen_qemu_ld64_i64(ctx, avr, EA);
-set_avr64(a->rt, avr, ctx->le_mode);
+tcg_gen_qemu_ld_i128(avr, EA, ctx->mem_idx,
+ DEF_MEMOP(MO_128 | MO_ATOM_IFALIGN_PAIR));
+set_avr_full(a->rt, avr);
 return true;
 }
 
@@ -56,22 +60,16 @@ static bool trans_LVXL(DisasContext *ctx, arg_LVXL *a)
 static bool trans_STVX(DisasContext *ctx, arg_STVX *a)
 {
 TCGv EA;
-TCGv_i64 avr;
+TCGv_i128 avr;
 REQUIRE_INSNS_FLAGS(ctx, ALTIVEC);
 REQUIRE_VECTOR(ctx);
 gen_set_access_type(ctx, ACCESS_INT);
-avr = tcg_temp_new_i64();
+avr = tcg_temp_new_i128();
 EA = do_ea_calc(ctx, a->ra, cpu_gpr[a->rb]);
 tcg_gen_andi_tl(EA, EA, ~0xf);
-/*
- * We only need to swap high and low halves. gen_qemu_st64_i64
- * does necessary 64-bit byteswap already.
- */
-get_avr64(avr, a->rt, !ctx->le_mode);
-gen_qemu_st64_i64(ctx, avr, EA);
-tcg_gen_addi_tl(EA, EA, 8);
-get_avr64(avr, a->rt, ctx->le_mode);
-gen_qemu_st64_i64(ctx, avr, EA);
+get_avr_full(avr, a->rt);
+tcg_gen_qemu_st_i128(avr, EA, ctx->mem_idx,
+ DEF_MEMOP(MO_128 | MO_ATOM_IFALIGN_PAIR));
 return true;
 }
 
-- 
2.39.3

[PATCH v2 0/3] target/ppc: Update vector insns to use 128 bit

2024-06-30 Thread Chinmay Rath

Updating a bunch of VMX and VSX storage access instructions to use
tcg_gen_qemu_ld/st_i128 instead of using tcg_gen_qemu_ld/st_i64 in
succession; as suggested by Richard, in my decodetree patches.
Plus some minor clean-ups to facilitate the above in case of VMX insns.

Change log:

v2 : Applied IFALIGN_PAIR memop changes in patches 2/3 and 3/3,
based on review comments by Richard in v1.

v1 : 
https://lore.kernel.org/qemu-devel/20240621114604.868415-1-ra...@linux.ibm.com/

Chinmay Rath (3):
  target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc.
  target/ppc: Update VMX storage access insns to use
tcg_gen_qemu_ld/st_i128.
  target/ppc : Update VSX storage access insns to use tcg_gen_qemu
_ld/st_i128.

 target/ppc/translate.c  | 10 -
 target/ppc/translate/vmx-impl.c.inc | 52 -
 target/ppc/translate/vsx-impl.c.inc | 70 +
 3 files changed, 61 insertions(+), 71 deletions(-)

-- 
2.39.3

[PATCH v2 1/3] target/ppc: Move get/set_avr64 functions to vmx-impl.c.inc.

2024-06-30 Thread Chinmay Rath

Those functions are used to ld/st data to and from Altivec registers,
in 64 bits chunks, and are only used in vmx-impl.c.inc file,
hence the clean-up movement.

Signed-off-by: Chinmay Rath 
---
 target/ppc/translate.c  | 10 --
 target/ppc/translate/vmx-impl.c.inc | 10 ++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index ad512e1922..f7f2c2db9e 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -6200,16 +6200,6 @@ static inline void set_fpr(int regno, TCGv_i64 src)
 tcg_gen_st_i64(tcg_constant_i64(0), tcg_env, vsr64_offset(regno, false));
 }
 
-static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
-{
-tcg_gen_ld_i64(dst, tcg_env, avr64_offset(regno, high));
-}
-
-static inline void set_avr64(int regno, TCGv_i64 src, bool high)
-{
-tcg_gen_st_i64(src, tcg_env, avr64_offset(regno, high));
-}
-
 /*
  * Helpers for decodetree used by !function for decoding arguments.
  */
diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc
index 152bcde0e3..a182d2cf81 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -14,6 +14,16 @@ static inline TCGv_ptr gen_avr_ptr(int reg)
 return r;
 }
 
+static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
+{
+tcg_gen_ld_i64(dst, tcg_env, avr64_offset(regno, high));
+}
+
+static inline void set_avr64(int regno, TCGv_i64 src, bool high)
+{
+tcg_gen_st_i64(src, tcg_env, avr64_offset(regno, high));
+}
+
 static bool trans_LVX(DisasContext *ctx, arg_X *a)
 {
 TCGv EA;
-- 
2.39.3

[PATCH v6] virtio-net: Fix network stall at the host side waiting for kick

2024-06-30 Thread Wencheng Yang

From: thomas 

Patch 06b12970174 ("virtio-net: fix network stall under load")
added double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. It will be lucky if the available buffer size becomes
okay after the double-check, then the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even if the guest has added some buffers, viritio-net would
stall at the host side forever.

The patch checks whether the guest has added some buffers
after last check of avail idx when the available buffers are
not sufficient, if so then recheck the available buffers
in the loop.

The patch also reverts patch "06b12970174".

The case below can reproduce the stall.

   Guest 0
 ++
 | iperf  |
---> | server |
 Host   |++
   ++   |...
   | iperf  |
   | client |  Guest n
   ++   |++
|| iperf  |
---> | server |
 ++

Boot many guests from qemu with virtio network:
 qemu ... -netdev tap,id=net_x \
-device virtio-net-pci-non-transitional,\
iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x

Each guest acts as iperf server with commands below:
 iperf3 -s -D -i 10 -p 8001
 iperf3 -s -D -i 10 -p 8002

The host as iperf client:
 iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 4
 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 4

After some time, the host loses connection to the guest,
the guest can send packet to the host, but can't receive
packet from host.

It's more likely to happen if SWIOTLB is enabled in the guest,
allocating and freeing bounce buffer takes some CPU ticks,
copying from/to bounce buffer takes more CPU ticks, compared
with that there is no bounce buffer in the guest.
Once the rate of producing packets from the host approximates
the rate of receiveing packets in the guest, the guest would
loop in NAPI.

 receive packets---
   | |
   v |
   free buf  virtnet_poll
   | |
   v |
 add buf to avail ring  ---
   |
   |  need kick the host?
   |  NAPI continues
   v
 receive packets---
   | |
   v |
   free buf  virtnet_poll
   | |
   v |
 add buf to avail ring  ---
   |
   v
  ...   ...

On the other hand, the host fetches free buf from avail
ring, if the buf in the avail ring is not enough, the
host notifies the guest the event by writing the avail
idx read from avail ring to the event idx of used ring,
then the host goes to sleep, waiting for the kick signal
from the guest.

Once the guest finds the host is waiting for kick singal
(in virtqueue_kick_prepare_split()), it kicks the host.

The host may stall forever at the sequences below:

 HostGuest
  ---
 fetch buf, send packet   receive packet ---
 ...  ... |
 fetch buf, send packet add buf   |
 ...add buf   virtnet_poll
buf not enough  avail idx-> add buf   |
read avail idx  add buf   |
add buf  ---
  receive packet ---
write event idx   ... |
waiting for kick add buf   virtnet_poll
  ... |
 ---
 no more packet, exit NAPI

In the first loop of NAPI above, indicated in the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buf.
 step 1: The buf is not enough, for example, a big packet
 needs 5 buf, but the available buf count is 3.
 The host read current avail idx.
 step 2: The guest adds some buf, then checks whether the
 host is waiting for kick signal, not at this time.
 The used ring is not empty, the guest continues
 the second loop of NAPI.
 step 3: The host writes the avail idx read from avail
 ring to used ring as event idx via
 virtio_queue_set_notification(q->rx_vq, 1).
 step 4: At the end of the second loop of NAPI, recheck
 whether kick is needed, as the event idx in the
 used ring

95 matches

Mail list logo