date:20240611


On Wed, 12 Jun 2024 00:58, Anton Johansson via  wrote:

For TBs crossing page boundaries, the 2nd page will never be
recorded/removed, as the index of the 2nd page is computed from the
address of the 1st page. This is due to a typo, fix it.

Signed-off-by: Anton Johansson 
---


Reviewed-by: Manos Pitsidianakis
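
To illustrate the class of bug described in the commit message, here is a
minimal sketch. It is not the QEMU code: PAGE_BITS, page_index() and
tb_page_indices() are hypothetical names, and a 4 KiB page size is assumed.
The only point is that the index of the second page has to be derived from
the last byte of the block, not from its first byte.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define PAGE_BITS 12

/* Index of the page that contains addr. */
static uint64_t page_index(uint64_t addr)
{
    return addr >> PAGE_BITS;
}

/*
 * Compute the indices of the first and last page touched by a block of
 * `size` bytes starting at `start`. Returns the number of pages touched
 * (1 or 2, since the block is limited to one page in length here).
 */
static int tb_page_indices(uint64_t start, uint64_t size,
                           uint64_t *first, uint64_t *last)
{
    assert(size > 0 && size <= (1u << PAGE_BITS));
    uint64_t end = start + size - 1;    /* last byte, inclusive */

    *first = page_index(start);
    *last = page_index(end);            /* a typo using `start` here hides page 2 */
    return *first == *last ? 1 : 2;
}
```

With start = 0x1ff0 and size = 0x20 the block ends at 0x200f, so the two
indices differ and both pages need to be recorded and removed.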

Re: [PATCH 00/26] hw/ppc: Prefer HumanReadableText over Monitor


On Mon, 10 Jun 2024 09:20, Philippe Mathieu-Daudé  wrote:

Hi,

This series removes uses of Monitor in hw/ppc/,
replacing them with the more generic HumanReadableText.
Care is taken to keep the commits bisectable by
updating functions one by one, which also eases review.

For rationale see previous series from Daniel:
https://lore.kernel.org/qemu-devel/20211028155457.967291-1-berra...@redhat.com/

Regards,

Phil.

Philippe Mathieu-Daudé (26):
 hw/ppc: Avoid using Monitor in pnv_phb3_msi_pic_print_info()
 hw/ppc: Avoid using Monitor in icp_pic_print_info()
 hw/ppc: Avoid using Monitor in xive_tctx_pic_print_info()
 hw/ppc: Avoid using Monitor in ics_pic_print_info()
 hw/ppc: Avoid using Monitor in PnvChipClass::intc_print_info()
 hw/ppc: Avoid using Monitor in xive_end_queue_pic_print_info()
 hw/ppc: Avoid using Monitor in spapr_xive_end_pic_print_info()
 hw/ppc: Avoid using Monitor in spapr_xive_pic_print_info()
 hw/ppc: Avoid using Monitor in xive_source_pic_print_info()
 hw/ppc: Avoid using Monitor in pnv_phb4_pic_print_info()
 hw/ppc: Avoid using Monitor in xive_eas_pic_print_info()
 hw/ppc: Avoid using Monitor in xive_end_pic_print_info()
 hw/ppc: Avoid using Monitor in xive_end_eas_pic_print_info()
 hw/ppc: Avoid using Monitor in xive_nvt_pic_print_info()
 hw/ppc: Avoid using Monitor in pnv_xive_pic_print_info()
 hw/ppc: Avoid using Monitor in pnv_psi_pic_print_info()
 hw/ppc: Avoid using Monitor in xive2_eas_pic_print_info()
 hw/ppc: Avoid using Monitor in xive2_end_eas_pic_print_info()
 hw/ppc: Avoid using Monitor in xive2_end_queue_pic_print_info()
 hw/ppc: Avoid using Monitor in xive2_end_pic_print_info()
 hw/ppc: Avoid using Monitor in xive2_nvp_pic_print_info()
 hw/ppc: Avoid using Monitor in pnv_xive2_pic_print_info()
 hw/ppc: Avoid using Monitor in
   SpaprInterruptControllerClass::print_info()
 hw/ppc: Avoid using Monitor in spapr_irq_print_info()
 hw/ppc: Avoid using Monitor in pnv_chip_power9_pic_print_info_child()
 hw/ppc: Avoid using Monitor in pic_print_info()

include/hw/pci-host/pnv_phb3.h |   2 +-
include/hw/pci-host/pnv_phb4.h |   2 +-
include/hw/ppc/pnv_chip.h  |   4 +-
include/hw/ppc/pnv_psi.h   |   2 +-
include/hw/ppc/pnv_xive.h  |   4 +-
include/hw/ppc/spapr_irq.h |   4 +-
include/hw/ppc/xics.h  |   4 +-
include/hw/ppc/xive.h  |   4 +-
include/hw/ppc/xive2_regs.h|   8 +--
include/hw/ppc/xive_regs.h |   8 +--
hw/intc/pnv_xive.c |  38 ++--
hw/intc/pnv_xive2.c|  48 +++
hw/intc/spapr_xive.c   |  41 ++---
hw/intc/xics.c |  25 
hw/intc/xics_spapr.c   |   7 +--
hw/intc/xive.c | 108 -
hw/intc/xive2.c|  87 +-
hw/pci-host/pnv_phb3_msi.c |  21 +++
hw/pci-host/pnv_phb4.c |  17 +++---
hw/ppc/pnv.c   |  52 
hw/ppc/pnv_psi.c   |   9 ++-
hw/ppc/spapr.c |  11 +++-
hw/ppc/spapr_irq.c |   4 +-
23 files changed, 256 insertions(+), 254 deletions(-)

--
2.41.0


For the series:

Reviewed-by: Manos Pitsidianakis

Re: [PATCH v4 0/4] hw/nvme: FDP and SR-IOV enhancements

2024-06-11 Thread Klaus Jensen

On May 29 21:42, Minwoo Im wrote:
> Hello,
> 
> This is v4 patchset to increase number of virtual functions for NVMe SR-IOV.
> Please consider the following change notes per version.
> 
> This patchset has been tested with more than 127 VFs using the following
> simple script.
> 
>   -device nvme-subsys,id=subsys0 \
>   -device ioh3420,id=rp2,multifunction=on,chassis=12 \
>   -device nvme,serial=foo,id=nvme0,bus=rp2,subsys=subsys0,mdts=9,msix_qsize=130,max_ioqpairs=260,sriov_max_vfs=129,sriov_vq_flexible=258,sriov_vi_flexible=129 \
> 
>   $ cat nvme-enable-vfs.sh
>   #!/bin/bash
> 
>   nr_vfs=129
> 
>   for (( i=1; i<=$nr_vfs; i++ ))
>   do
>   nvme virt-mgmt /dev/nvme0 -c $i -r 0 -a 8 -n 2
>   nvme virt-mgmt /dev/nvme0 -c $i -r 1 -a 8 -n 1
>   done
> 
>   bdf=":01:00.0"
>   sysfs="/sys/bus/pci/devices/$bdf"
>   nvme="/sys/bus/pci/drivers/nvme"
> 
>   echo 0 > $sysfs/sriov_drivers_autoprobe
>   echo $nr_vfs > $sysfs/sriov_numvfs
> 
>   for (( i=1; i<=$nr_vfs; i++ ))
>   do
>   nvme virt-mgmt /dev/nvme0 -c $i -a 9
> 
>   echo "nvme" > $sysfs/virtfn$(($i-1))/driver_override
>   bdf="$(basename $(readlink $sysfs/virtfn$(($i-1))))"
>   echo $bdf > $nvme/bind
>   done
> 
> Thanks,
> 
> v4:
>  - Rebased on the latest master.
>  - Update n->params.sriov_max_vfs to uint16_t as per spec.
> 
> v3:
>  - Replace patch [3/4] with one that allocates the secondary controller
>    list as a dynamic array rather than a static array sized for the
>    maximum number of VFs (suggested by Klaus).
> v2: 
>  - Added [2/4] commit to fix crash due to entry overflow
> 
> Minwoo Im (4):
>   hw/nvme: add Identify Endurance Group List
>   hw/nvme: separate identify data for sec. ctrl list
>   hw/nvme: Allocate sec-ctrl-list as a dynamic array
>   hw/nvme: Expand VI/VQ resource to uint32
> 
>  hw/nvme/ctrl.c   | 59 +++-
>  hw/nvme/nvme.h   | 19 +++---
>  hw/nvme/subsys.c | 10 +---
>  include/block/nvme.h |  1 +
>  4 files changed, 54 insertions(+), 35 deletions(-)
> 
> -- 
> 2.34.1
> 

Looks good Minwoo!

Grabbing for nvme-next.

Reviewed-by: Klaus Jensen 



Re: [PATCH] ui/gtk: Wait until the current guest frame is rendered before switching to RUN_STATE_SAVE_VM

2024-06-11 Thread Marc-André Lureau

Hi

On Wed, Jun 12, 2024 at 5:29 AM Kim, Dongwon  wrote:

> Hi,
>
> From: Marc-André Lureau 
> Sent: Wednesday, June 5, 2024 12:56 AM
> To: Kim, Dongwon 
> Cc: qemu-devel@nongnu.org; Peter Xu 
> Subject: Re: [PATCH] ui/gtk: Wait until the current guest frame is
> rendered before switching to RUN_STATE_SAVE_VM
>
> Hi
>
> On Tue, Jun 4, 2024 at 9:49 PM Kim, Dongwon 
> wrote:
> On 6/4/2024 4:12 AM, Marc-André Lureau wrote:
> > Hi
> >
> > On Thu, May 30, 2024 at 2:44 AM  > > wrote:
> >
> > From: Dongwon  dongwon@intel.com>>
> >
> > Make sure rendering of the current frame is finished before switching
> > the run state to RUN_STATE_SAVE_VM by waiting for egl-sync object to
> be
> > signaled.
> >
> >
> > Can you expand on what this solves?
>
> In the current scheme, the guest waits for the fence to be signaled for each
> frame it submits before moving on to the next frame. If the guest's state
> is saved while it is still waiting for the fence, the restored guest will
> keep waiting for a fence that was signaled long before the restore. One way
> to prevent this is to let the guest finish the current frame before changing
> the run state.
>
> After the UI sets a fence, hw_ops->gl_block(true) gets called, which will
> block virtio-gpu/virgl from processing commands (until the fence is
> signaled and gl_block/false called again).
>
> But this "blocking" state is not saved. So how does this affect
> save/restore? Please give more details, thanks
>
> Yeah sure. The "blocking" state is not saved, but the guest's state is saved
> while it is still waiting for the response to its last resource-flush virtio
> message. This virtio response, by the way, is sent to the guest when
> the pipeline is unblocked (that is, when the fence is signaled). Once the
> guest's state is saved, the current guest instance continues and
> receives the response as usual. The problem occurs when we restore
> the saved state: the restored guest waits for a response that was
> sent a while ago to the original instance.
>

Where is the pending response saved? Can you detail how you test this?

thanks


-- 
Marc-André Lureau

[PATCH v4 1/1] qga/linux: Add new api 'guest-network-get-route'

2024-06-11 Thread Dehan Meng

Administrators and users need the route information of a Linux VM
when debugging and troubleshooting network problems.

Signed-off-by: Dehan Meng 
---
 qga/commands-posix.c | 73 
 qga/commands-win32.c |  6 
 qga/qapi-schema.json | 68 +
 3 files changed, 147 insertions(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 6169bbf7a0..ffae88ca69 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2747,6 +2747,73 @@ GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
 return head;
 }
 
+char *hexToIPAddress(unsigned int hexValue, char ipAddress[16]);
+char *hexToIPAddress(unsigned int hexValue, char ipAddress[16])
+{
+unsigned int byte1 = (hexValue >> 24) & 0xFF;
+unsigned int byte2 = (hexValue >> 16) & 0xFF;
+unsigned int byte3 = (hexValue >> 8) & 0xFF;
+unsigned int byte4 = hexValue & 0xFF;
+
+snprintf(ipAddress, 16, "%u.%u.%u.%u", byte4, byte3, byte2, byte1);
+
+return ipAddress;
+}
+
+GuestNetworkRouteStatList *qmp_guest_network_get_route(Error **errp)
+{
+GuestNetworkRouteStatList *head = NULL, **tail = &head;
+const char *routeFile = "/proc/net/route";
+FILE *fp;
+size_t n;
+char *line = NULL;
+
+fp = fopen(routeFile, "r");
+if (fp == NULL) {
+error_setg_errno(errp, errno, "open(\"%s\")", routeFile);
+return NULL;
+}
+
+while (getline(&line, &n, fp) != -1) {
+GuestNetworkRouteStat *networkroute;
+int i;
+char Iface[16];
+unsigned int Destination, Gateway, Mask, Flags;
+int RefCnt, Use, Metric, MTU, Window, IRTT;
+
+i = (sscanf(line, "%s %X %X %x %d %d %d %X %d %d %d",
+Iface, &Destination, &Gateway, &Flags, &RefCnt,
+&Use, &Metric, &Mask, &MTU, &Window, &IRTT) == 11);
+if (i == EOF) {
+continue;
+}
+
+networkroute = g_new0(GuestNetworkRouteStat, 1);
+
+char DestAddress[16];
+char GateAddress[16];
+char MaskAddress[16];
+
+networkroute->iface = g_strdup(Iface);
+networkroute->destination = g_strdup(hexToIPAddress(Destination, DestAddress));
+networkroute->gateway = g_strdup(hexToIPAddress(Gateway, GateAddress));
+networkroute->mask = g_strdup(hexToIPAddress(Mask, MaskAddress));
+networkroute->metric = Metric;
+networkroute->flags = Flags;
+networkroute->refcnt = RefCnt;
+networkroute->use = Use;
+networkroute->mtu = MTU;
+networkroute->window = Window;
+networkroute->irtt = IRTT;
+
+QAPI_LIST_APPEND(tail, networkroute);
+}
+
+free(line);
+fclose(fp);
+return head;
+}
+
 #else /* defined(__linux__) */
 
 void qmp_guest_suspend_disk(Error **errp)
@@ -3118,6 +3185,12 @@ GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
 return NULL;
 }
 
+GuestNetworkRouteList *qmp_guest_network_get_route(Error **errp)
+{
+error_setg(errp, QERR_UNSUPPORTED);
+return NULL;
+}
+
 #endif /* CONFIG_FSFREEZE */
 
 #if !defined(CONFIG_FSTRIM)
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 697c65507c..e62c04800a 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -2522,3 +2522,9 @@ GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
 error_setg(errp, QERR_UNSUPPORTED);
 return NULL;
 }
+
+GuestNetworkRouteList *qmp_guest_network_get_route(Error **errp)
+{
+error_setg(errp, QERR_UNSUPPORTED);
+return NULL;
+}
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 876e2a8ea8..195f6cd4e7 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1789,3 +1789,71 @@
 { 'command': 'guest-get-cpustats',
   'returns': ['GuestCpuStats']
 }
+
+##
+# @GuestNetworkRouteStat:
+#
+# Route information, currently, only linux supported.
+#
+# @iface: The destination network or host's egress network interface in the routing table
+#
+# @destination: The IP address of the target network or host, the final destination of the packet
+#
+# @gateway: The IP address of the next hop router
+#
+# @mask: Subnet Mask
+#
+# @metric: Route metric
+#
+# @flags: Route flags (not for windows)
+#
+# @irtt: Initial round-trip delay (not for windows)
+#
+# @refcnt: The route's reference count (not for windows)
+#
+# @use: Route usage count (not for windows)
+#
+# @window: TCP window size, used for flow control (not for windows)
+#
+# @mtu: Data link layer maximum packet size (not for windows)
+#
+# Since: 9.1
+
+##
+{ 'struct': 'GuestNetworkRouteStat',
+  'data': {'iface': 'str',
+   'destination': 'str',
+   'gateway': 'str',
+   'metric': 'int',
+   'mask': 'str',
+   '*irtt': 'int',
+   '*flags': 'uint64',
+   '*refcnt': 'int',
+   '*use': 'int',
+   '*window': 'int',
+   '*mtu': 'int'
+   }}
+
+##
+# @GuestNetworkRoute:
+#
+# Get route information of system.
+#
+# @routes: A list of network route

[PATCH v4 0/1] qga/linux: Add new api 'guest-network-get-route'

2024-06-11 Thread Dehan Meng

v3 -> v4
- Fix some indentation issues
- Update 'Since 8.2' to 'Since 9.1'
- Remove useless enum and adjust this change.

v2 -> v3
- Remove the forward declaration and make the function 'hexToIPAddress' static.
- Use 'IFNAMSIZ' from the kernel headers instead of a hard-coded value
- Remove 'GUEST_NETWORK_ROUTE_TYPE_LINUX'
- Set flags 'has_xxx' for checking if a field exists or has a value set

v1 -> v2
- Replace snprintf() with g_strdup_printf() to avoid memory problems.
- Remove the parameter 'char ipAddress[16]' from the function
  'char *hexToIPAddress()'.
- Add logic to skip the first line of the file

Dehan Meng (1):
  qga/linux: Add new api 'guest-network-get-route'

 qga/commands-posix.c | 73 
 qga/commands-win32.c |  6 
 qga/qapi-schema.json | 68 +
 3 files changed, 147 insertions(+)

-- 
2.40.1
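
For readers following the parsing discussed in this series, here is a
minimal standalone sketch of the two steps involved: splitting one line of
/proc/net/route into its eleven fields and turning a hex field into a
dotted quad. It is not the posted patch. The names route_entry,
hex_to_ipv4() and parse_route_line() are hypothetical, plain snprintf() is
used instead of GLib so the example stays self-contained, and, as in the
patch, the least significant byte is printed first, which assumes a
little-endian guest.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one row of /proc/net/route. */
struct route_entry {
    char iface[16];
    unsigned int destination, gateway, mask, flags;
    int refcnt, use, metric, mtu, window, irtt;
};

/* Format value as a dotted quad, least significant byte first. */
static void hex_to_ipv4(unsigned int value, char out[16])
{
    assert(out != NULL);
    snprintf(out, 16, "%u.%u.%u.%u",
             value & 0xFF, (value >> 8) & 0xFF,
             (value >> 16) & 0xFF, (value >> 24) & 0xFF);
}

/*
 * Parse one line. Returns 1 if all 11 fields were converted and 0
 * otherwise, which also rejects the header line of the file.
 */
static int parse_route_line(const char *line, struct route_entry *r)
{
    int n = sscanf(line, "%15s %X %X %X %d %d %d %X %d %d %d",
                   r->iface, &r->destination, &r->gateway, &r->flags,
                   &r->refcnt, &r->use, &r->metric, &r->mask,
                   &r->mtu, &r->window, &r->irtt);
    return n == 11;
}
```

The sketch compares the sscanf() return value with the expected field
count directly, so a short or malformed line is skipped by the caller
instead of producing a partially filled entry.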

[PATCH] hw/loongarch/virt: Remove unused assignment

2024-06-11 Thread Bibo Mao

The local variable gap is misused. Remove the duplicated
assignment and resolve the error reported by Coverity.

Resolves: Coverity CID 1546441
Fixes: 3cc451cbce ("hw/loongarch: Refine fwcfg memory map")
Signed-off-by: Bibo Mao 
---
 hw/loongarch/virt.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 66cef201ab..2fe08583b8 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -1054,7 +1054,6 @@ static void fw_cfg_add_memory(MachineState *ms)
 memmap_add_entry(base, gap, 1);
 size -= gap;
 base = VIRT_HIGHMEM_BASE;
-gap = ram_size - VIRT_LOWMEM_SIZE;
 }
 
 if (size) {
@@ -1067,17 +1066,17 @@ static void fw_cfg_add_memory(MachineState *ms)
 }
 
 /* add fw_cfg memory map of other nodes */
-size = ram_size - numa_info[0].node_mem;
-gap  = VIRT_LOWMEM_BASE + VIRT_LOWMEM_SIZE;
-if (base < gap && (base + size) > gap) {
+if (numa_info[0].node_mem < gap && ram_size > gap) {
 /*
  * memory map for the maining nodes splited into two part
- *   lowram:  [base, +(gap - base))
- *   highram: [VIRT_HIGHMEM_BASE, +(size - (gap - base)))
+ * lowram:  [base, +(gap - numa_info[0].node_mem))
+ * highram: [VIRT_HIGHMEM_BASE, +(ram_size - gap))
  */
-memmap_add_entry(base, gap - base, 1);
-size -= gap - base;
+memmap_add_entry(base, gap - numa_info[0].node_mem, 1);
+size = ram_size - gap;
 base = VIRT_HIGHMEM_BASE;
+} else {
+size = ram_size - numa_info[0].node_mem;
 }
 
if (size)

base-commit: 80e8f0602168f451a93e71cbb1d59e93d745e62e
-- 
2.39.3
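
The lowram/highram split mentioned in the comment of this patch can be
sketched in isolation as follows. This is not fw_cfg_add_memory(): the
names mem_region and split_ram() are hypothetical, and the constants
(a 256 MiB low window at address 0 and a high window at 0x80000000) are
assumptions made for the example. The function takes a range expressed as
an offset into guest RAM and returns the one or two address ranges it
occupies.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch only. */
#define LOWMEM_BASE   0x0ULL
#define LOWMEM_SIZE   0x10000000ULL
#define HIGHMEM_BASE  0x80000000ULL

struct mem_region {
    uint64_t base;
    uint64_t size;
};

/*
 * Split the RAM range [offset, offset + size) into the part that fits
 * in the low window and the part that spills into the high window.
 * Returns the number of regions written to out (1 or 2).
 */
static int split_ram(uint64_t offset, uint64_t size, struct mem_region out[2])
{
    int n = 0;

    assert(size > 0);
    if (offset < LOWMEM_SIZE) {
        uint64_t low = LOWMEM_SIZE - offset;
        if (low > size) {
            low = size;
        }
        out[n].base = LOWMEM_BASE + offset;
        out[n].size = low;
        n++;
        offset += low;
        size -= low;
    }
    if (size) {
        /* Whatever is left continues at the start of the high window. */
        out[n].base = HIGHMEM_BASE + (offset - LOWMEM_SIZE);
        out[n].size = size;
        n++;
    }
    return n;
}
```

For a node that starts 128 MiB into RAM and is 256 MiB large, the first
128 MiB land in the low window and the remaining 128 MiB start at the high
window base.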

[PATCH v7 1/2] hw/misc/riscv_iopmp: Add RISC-V IOPMP device

2024-06-11 Thread Ethan Chen via

Support basic functions of IOPMP specification v0.9.1 rapid-k model.
The specification url:
https://github.com/riscv-non-isa/iopmp-spec/releases/tag/v0.9.1

IOPMP checks whether a memory access from a device is valid. This implementation
uses an IOMMU to change the address space that the device accesses. There are
three possible results of an access: valid, blocked, and stalled (stall is not
supported in this patch).

If an access is valid, the target address space is downstream_as.
If an access is blocked, it goes to blocked_io_as. Depending on the value of
the IOPMP ERR_CFG register, blocked_io_as either returns a bus error or
responds with success and fabricated data.

Signed-off-by: Ethan Chen 
---
 hw/misc/Kconfig   |3 +
 hw/misc/meson.build   |1 +
 hw/misc/riscv_iopmp.c | 1002 +
 hw/misc/trace-events  |4 +
 include/hw/misc/riscv_iopmp.h |  152 +
 5 files changed, 1162 insertions(+)
 create mode 100644 hw/misc/riscv_iopmp.c
 create mode 100644 include/hw/misc/riscv_iopmp.h

diff --git a/hw/misc/Kconfig b/hw/misc/Kconfig
index 1e08785b83..427f0c702e 100644
--- a/hw/misc/Kconfig
+++ b/hw/misc/Kconfig
@@ -213,4 +213,7 @@ config IOSB
 config XLNX_VERSAL_TRNG
 bool
 
+config RISCV_IOPMP
+bool
+
 source macio/Kconfig
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 86596a3888..f83cd108f8 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -34,6 +34,7 @@ system_ss.add(when: 'CONFIG_SIFIVE_E_PRCI', if_true: files('sifive_e_prci.c'))
 system_ss.add(when: 'CONFIG_SIFIVE_E_AON', if_true: files('sifive_e_aon.c'))
 system_ss.add(when: 'CONFIG_SIFIVE_U_OTP', if_true: files('sifive_u_otp.c'))
 system_ss.add(when: 'CONFIG_SIFIVE_U_PRCI', if_true: files('sifive_u_prci.c'))
+specific_ss.add(when: 'CONFIG_RISCV_IOPMP', if_true: files('riscv_iopmp.c'))
 
 subdir('macio')
 
diff --git a/hw/misc/riscv_iopmp.c b/hw/misc/riscv_iopmp.c
new file mode 100644
index 00..75b28dc559
--- /dev/null
+++ b/hw/misc/riscv_iopmp.c
@@ -0,0 +1,1002 @@
+/*
+ * QEMU RISC-V IOPMP (Input Output Physical Memory Protection)
+ *
+ * Copyright (c) 2023 Andes Tech. Corp.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "trace.h"
+#include "exec/exec-all.h"
+#include "exec/address-spaces.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "hw/misc/riscv_iopmp.h"
+#include "memory.h"
+#include "hw/irq.h"
+#include "hw/registerfields.h"
+#include "trace.h"
+
+#define TYPE_IOPMP_IOMMU_MEMORY_REGION "iopmp-iommu-memory-region"
+
+REG32(VERSION, 0x00)
+FIELD(VERSION, VENDOR, 0, 24)
+FIELD(VERSION, SPECVER , 24, 8)
+REG32(IMP, 0x04)
+FIELD(IMP, IMPID, 0, 32)
+REG32(HWCFG0, 0x08)
+FIELD(HWCFG0, MODEL, 0, 4)
+FIELD(HWCFG0, TOR_EN, 4, 1)
+FIELD(HWCFG0, SPS_EN, 5, 1)
+FIELD(HWCFG0, USER_CFG_EN, 6, 1)
+FIELD(HWCFG0, PRIENT_PROG, 7, 1)
+FIELD(HWCFG0, RRID_TRANSL_EN, 8, 1)
+FIELD(HWCFG0, RRID_TRANSL_PROG, 9, 1)
+FIELD(HWCFG0, CHK_X, 10, 1)
+FIELD(HWCFG0, NO_X, 11, 1)
+FIELD(HWCFG0, NO_W, 12, 1)
+FIELD(HWCFG0, STALL_EN, 13, 1)
+FIELD(HWCFG0, PEIS, 14, 1)
+FIELD(HWCFG0, PEES, 15, 1)
+FIELD(HWCFG0, MFR_EN, 16, 1)
+FIELD(HWCFG0, MD_NUM, 24, 7)
+FIELD(HWCFG0, ENABLE, 31, 1)
+REG32(HWCFG1, 0x0C)
+FIELD(HWCFG1, RRID_NUM, 0, 16)
+FIELD(HWCFG1, ENTRY_NUM, 16, 16)
+REG32(HWCFG2, 0x10)
+FIELD(HWCFG2, PRIO_ENTRY, 0, 16)
+FIELD(HWCFG2, RRID_TRANSL, 16, 16)
+REG32(ENTRYOFFSET, 0x14)
+FIELD(ENTRYOFFSET, OFFSET, 0, 32)
+REG32(MDSTALL, 0x30)
+FIELD(MDSTALL, EXEMPT, 0, 1)
+FIELD(MDSTALL, MD, 1, 31)
+REG32(MDSTALLH, 0x34)
+FIELD(MDSTALLH, MD, 0, 32)
+REG32(RRIDSCP, 0x38)
+FIELD(RRIDSCP, RRID, 0, 16)
+FIELD(RRIDSCP, OP, 30, 2)
+REG32(MDLCK, 0x40)
+FIELD(MDLCK, L, 0, 1)
+FIELD(MDLCK, MD, 1, 31)
+REG32(MDLCKH, 0x44)
+FIELD(MDLCKH, MDH, 0, 32)
+REG32(MDCFGLCK, 0x48)
+FIELD(MDCFGLCK, L, 0, 1)
+FIELD(MDCFGLCK, F, 1, 7)
+REG32(ENTRYLCK, 0x4C)
+FIELD(ENTRYLCK, L, 0, 1)
+FIELD(ENTRYLCK, F, 1, 16)
+REG32(ERR_CFG, 0x60)
+FIELD(ERR_CFG, L, 0, 1)
+FIELD(ERR_CFG, IE, 1, 1)
+FIELD(ERR_CFG, IRE, 2, 1)
+FIELD(ERR_CFG, IWE, 3, 1)
+FIELD(ERR_CFG, IXE, 4, 1)
+FIELD(ERR_CFG, RRE, 5, 1)
+FIELD(ERR_CFG, RWE, 6, 1)
+FIELD(ERR_CFG, RXE, 7, 1)
+REG32(ERR_REQINFO,

[PATCH v7 2/2] hw/riscv/virt: Add IOPMP support

2024-06-11 Thread Ethan Chen via

If a requestor device is connected to the IOPMP device, its memory access will
be checked by the IOPMP rule.

- Add 'iopmp=on' option to add an iopmp device and make the Generic PCI Express
  Bridge connect to IOPMP.

Signed-off-by: Ethan Chen 
---
 docs/system/riscv/virt.rst |  6 
 hw/riscv/Kconfig   |  1 +
 hw/riscv/virt.c| 57 --
 include/hw/riscv/virt.h|  5 +++-
 4 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/docs/system/riscv/virt.rst b/docs/system/riscv/virt.rst
index 9a06f95a34..3b2576f905 100644
--- a/docs/system/riscv/virt.rst
+++ b/docs/system/riscv/virt.rst
@@ -116,6 +116,12 @@ The following machine-specific options are supported:
   having AIA IMSIC (i.e. "aia=aplic-imsic" selected). When not specified,
   the default number of per-HART VS-level AIA IMSIC pages is 0.
 
+- iopmp=[on|off]
+
+  When this option is "on", an IOPMP device is added to machine. It checks dma
+  operations from the generic PCIe host bridge. This option is assumed to be
+  "off".
+
 Running Linux kernel
 
 
diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
index a2030e3a6f..0b45a5ade2 100644
--- a/hw/riscv/Kconfig
+++ b/hw/riscv/Kconfig
@@ -56,6 +56,7 @@ config RISCV_VIRT
 select PLATFORM_BUS
 select ACPI
 select ACPI_PCI
+select RISCV_IOPMP
 
 config SHAKTI_C
 bool
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 4fdb660525..53a1b71c71 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -55,6 +55,7 @@
 #include "hw/acpi/aml-build.h"
 #include "qapi/qapi-visit-common.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/misc/riscv_iopmp.h"
 
 /* KVM AIA only supports APLIC MSI. APLIC Wired is always emulated by QEMU. */
 static bool virt_use_kvm_aia(RISCVVirtState *s)
@@ -82,6 +83,7 @@ static const MemMapEntry virt_memmap[] = {
 [VIRT_UART0] ={ 0x1000, 0x100 },
 [VIRT_VIRTIO] =   { 0x10001000,0x1000 },
 [VIRT_FW_CFG] =   { 0x1010,  0x18 },
+[VIRT_IOPMP] ={ 0x1020,  0x10 },
 [VIRT_FLASH] ={ 0x2000, 0x400 },
 [VIRT_IMSIC_M] =  { 0x2400, VIRT_IMSIC_MAX_SIZE },
 [VIRT_IMSIC_S] =  { 0x2800, VIRT_IMSIC_MAX_SIZE },
@@ -1006,6 +1008,24 @@ static void create_fdt_virtio_iommu(RISCVVirtState *s, uint16_t bdf)
bdf + 1, iommu_phandle, bdf + 1, 0x - bdf);
 }
 
+static void create_fdt_iopmp(RISCVVirtState *s, const MemMapEntry *memmap,
+ uint32_t irq_mmio_phandle) {
+g_autofree char *name = NULL;
+MachineState *ms = MACHINE(s);
+
+name = g_strdup_printf("/soc/iopmp@%lx", (long)memmap[VIRT_IOPMP].base);
+qemu_fdt_add_subnode(ms->fdt, name);
+qemu_fdt_setprop_string(ms->fdt, name, "compatible", "riscv_iopmp");
+qemu_fdt_setprop_cells(ms->fdt, name, "reg", 0x0, memmap[VIRT_IOPMP].base,
+0x0, memmap[VIRT_IOPMP].size);
+qemu_fdt_setprop_cell(ms->fdt, name, "interrupt-parent", irq_mmio_phandle);
+if (s->aia_type == VIRT_AIA_TYPE_NONE) {
+qemu_fdt_setprop_cell(ms->fdt, name, "interrupts", IOPMP_IRQ);
+} else {
+qemu_fdt_setprop_cells(ms->fdt, name, "interrupts", IOPMP_IRQ, 0x4);
+}
+}
+
 static void finalize_fdt(RISCVVirtState *s)
 {
 uint32_t phandle = 1, irq_mmio_phandle = 1, msi_pcie_phandle = 1;
@@ -1024,6 +1044,10 @@ static void finalize_fdt(RISCVVirtState *s)
 create_fdt_uart(s, virt_memmap, irq_mmio_phandle);
 
 create_fdt_rtc(s, virt_memmap, irq_mmio_phandle);
+
+if (s->have_iopmp) {
+create_fdt_iopmp(s, virt_memmap, irq_mmio_phandle);
+}
 }
 
 static void create_fdt(RISCVVirtState *s, const MemMapEntry *memmap)
@@ -1404,7 +1428,7 @@ static void virt_machine_init(MachineState *machine)
 RISCVVirtState *s = RISCV_VIRT_MACHINE(machine);
 MemoryRegion *system_memory = get_system_memory();
 MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
-DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
+DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip, *gpex_dev;
 int i, base_hartid, hart_count;
 int socket_count = riscv_socket_count(machine);
 
@@ -1570,7 +1594,7 @@ static void virt_machine_init(MachineState *machine)
 qdev_get_gpio_in(virtio_irqchip, VIRTIO_IRQ + i));
 }
 
-gpex_pcie_init(system_memory, pcie_irqchip, s);
+gpex_dev = gpex_pcie_init(system_memory, pcie_irqchip, s);
 
 create_platform_bus(s, mmio_irqchip);
 
@@ -1581,6 +1605,14 @@ static void virt_machine_init(MachineState *machine)
 sysbus_create_simple("goldfish_rtc", memmap[VIRT_RTC].base,
 qdev_get_gpio_in(mmio_irqchip, RTC_IRQ));
 
+if (s->have_iopmp) {
+DeviceState *iopmp_dev = sysbus_create_simple(TYPE_IOPMP,
+memmap[VIRT_IOPMP].base,
+qdev_get_gpio_in(DEVICE(mmio_irqchip), IOPMP_IRQ));
+
+iopmp_setup_pci(iopmp_dev, PCI_HOST_BRIDGE(gpex_dev)->bus);

[PATCH v7 0/2] Support RISC-V IOPMP

2024-06-11 Thread Ethan Chen via

Because the referenced specification version changed, this series has changed
a lot in this version.

This series implements basic functions of IOPMP specification v0.9.1 rapid-k
model.
The specification url:
https://github.com/riscv-non-isa/iopmp-spec/releases/tag/v0.9.1

When IOPMP is enabled, memory accesses from devices are checked by IOPMP.

CPU as an IOPMP requestor has not been implemented because the IOTLB does not
support recording sections outside the current CPU address space.

Changes for v7:

  - Change the specification version to v0.9.1
  - Remove the sps extension
  - Remove stall support and transaction information, which need requestor
    device support.
  - Remove iopmp_cascade option for virt machine
  - Refine 'addr' range checks switch case (Daniel)


Ethan Chen (2):
  hw/misc/riscv_iopmp: Add RISC-V IOPMP device
  hw/riscv/virt: Add IOPMP support

 docs/system/riscv/virt.rst|6 +
 hw/misc/Kconfig   |3 +
 hw/misc/meson.build   |1 +
 hw/misc/riscv_iopmp.c | 1002 +
 hw/misc/trace-events  |4 +
 hw/riscv/Kconfig  |1 +
 hw/riscv/virt.c   |   57 +-
 include/hw/misc/riscv_iopmp.h |  152 +
 include/hw/riscv/virt.h   |5 +-
 9 files changed, 1228 insertions(+), 3 deletions(-)
 create mode 100644 hw/misc/riscv_iopmp.c
 create mode 100644 include/hw/misc/riscv_iopmp.h

-- 
2.34.1
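
As a rough illustration of the routing described in the commit message of
patch 1/2 (a valid access goes to downstream_as, a blocked access goes to
blocked_io_as, stall is not supported), here is a minimal sketch. All enum
and function names are hypothetical, and the suppress_error flag merely
stands in for whatever the device derives from ERR_CFG; the real register
decoding is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of an IOPMP check, as listed in the commit message. */
enum iopmp_result { IOPMP_VALID, IOPMP_BLOCKED, IOPMP_STALLED };

/* Hypothetical labels for the address space a transaction is routed to. */
enum iopmp_target { TARGET_DOWNSTREAM_AS, TARGET_BLOCKED_IO_AS };

/* Hypothetical responses of the blocked-I/O address space. */
enum blocked_response { RESP_BUS_ERROR, RESP_FABRICATED_DATA };

/* Pick the target address space for a checked access. */
static enum iopmp_target iopmp_select_target(enum iopmp_result result)
{
    /* Stall is not supported by this series, so it must never show up. */
    assert(result != IOPMP_STALLED);
    return result == IOPMP_VALID ? TARGET_DOWNSTREAM_AS : TARGET_BLOCKED_IO_AS;
}

/* Decide how the blocked-I/O address space answers a blocked access. */
static enum blocked_response blocked_io_response(bool suppress_error)
{
    return suppress_error ? RESP_FABRICATED_DATA : RESP_BUS_ERROR;
}
```

Keeping the routing decision separate from the response policy mirrors the
description above, where the address space is chosen first and the ERR_CFG
value only matters once an access has been blocked.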

Re: [PATCH 3/6] target/riscv: Add support for Control Transfer Records extension CSRs.

2024-06-11 Thread Jason Chien

It makes sense. Thank you for the explanation.

On Mon, Jun 10, 2024 at 10:12 PM, Rajnesh Kanwal wrote:

>
> Thanks Jason for your review.
>
> On Tue, Jun 4, 2024 at 11:14 AM Jason Chien 
> wrote:
> >
> >
> > On 2024/5/30 at 12:09 AM, Rajnesh Kanwal wrote:
> >
> > This commit adds support for [m|s|vs]ctrcontrol, sctrstatus and
> > sctrdepth CSRs handling.
> >
> > Signed-off-by: Rajnesh Kanwal 
> > ---
> >  target/riscv/cpu.h |   5 ++
> >  target/riscv/cpu_cfg.h |   2 +
> >  target/riscv/csr.c | 159 +
> >  3 files changed, 166 insertions(+)
> >
> > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > index a185e2d494..3d4d5172b8 100644
> > --- a/target/riscv/cpu.h
> > +++ b/target/riscv/cpu.h
> > @@ -263,6 +263,11 @@ struct CPUArchState {
> >  target_ulong mcause;
> >  target_ulong mtval;  /* since: priv-1.10.0 */
> >
> > +uint64_t mctrctl;
> > +uint32_t sctrdepth;
> > +uint32_t sctrstatus;
> > +uint64_t vsctrctl;
> > +
> >  /* Machine and Supervisor interrupt priorities */
> >  uint8_t miprio[64];
> >  uint8_t siprio[64];
> > diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
> > index d9354dc80a..d329a65811 100644
> > --- a/target/riscv/cpu_cfg.h
> > +++ b/target/riscv/cpu_cfg.h
> > @@ -123,6 +123,8 @@ struct RISCVCPUConfig {
> >  bool ext_zvfhmin;
> >  bool ext_smaia;
> >  bool ext_ssaia;
> > +bool ext_smctr;
> > +bool ext_ssctr;
> >  bool ext_sscofpmf;
> >  bool ext_smepmp;
> >  bool rvv_ta_all_1s;
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 2f92e4b717..888084d8e5 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -621,6 +621,61 @@ static RISCVException pointer_masking(CPURISCVState *env, int csrno)
> >  return RISCV_EXCP_ILLEGAL_INST;
> >  }
> >
> > +/*
> > + * M-mode:
> > + * Without ext_smctr raise illegal inst excep.
> > + * Otherwise everything is accessible to m-mode.
> > + *
> > + * S-mode:
> > + * Without ext_ssctr or mstateen.ctr raise illegal inst excep.
> > + * Otherwise everything other than mctrctl is accessible.
> > + *
> > + * VS-mode:
> > + * Without ext_ssctr or mstateen.ctr raise illegal inst excep.
> > + * Without hstateen.ctr raise virtual illegal inst excep.
> > + * Otherwise allow vsctrctl, sctrstatus, 0x200-0x2ff entry range.
> > + * Always raise illegal instruction exception for sctrdepth.
> > + */
> > +static RISCVException ctr_mmode(CPURISCVState *env, int csrno)
> > +{
> > +/* Check if smctr-ext is present */
> > +if (riscv_cpu_cfg(env)->ext_smctr) {
> > +return RISCV_EXCP_NONE;
> > +}
> > +
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> > +static RISCVException ctr_smode(CPURISCVState *env, int csrno)
> > +{
> > +if ((env->priv == PRV_M && riscv_cpu_cfg(env)->ext_smctr) ||
> > +(env->priv == PRV_S && !env->virt_enabled &&
> > + riscv_cpu_cfg(env)->ext_ssctr)) {
> > +return smstateen_acc_ok(env, 0, SMSTATEEN0_CTR);
> > +}
> > +
> > +if (env->priv == PRV_S && env->virt_enabled &&
> > +riscv_cpu_cfg(env)->ext_ssctr) {
> > +if (csrno == CSR_SCTRSTATUS) {
> >
> > missing sctrctl?
> >
> > +return smstateen_acc_ok(env, 0, SMSTATEEN0_CTR);
> > +}
> > +
> > +return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > +}
> > +
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> >
> > I think there is no need to bind M-mode with ext_smctr, S-mode with
> ext_ssctr and VS-mode with ext_ssctr, since this predicate function is for
> S-mode CSRs, which are defined in both smctr and ssctr, we just need to
> check at least one of ext_ssctr or ext_smctr is true.
> >
> > The spec states that:
> > Attempts to access sctrdepth from VS-mode or VU-mode raise a
> virtual-instruction exception, unless CTR state enable access restrictions
> apply.
> >
> > In my understanding, we should check the presence of smstateen extension
> first, and
> >
> > if smstateen is implemented:
> >
> > for sctrctl and sctrstatus, call smstateen_acc_ok()
> > for sctrdepth, call smstateen_acc_ok(), and if there is any exception
> returned, always report virtual-instruction exception.
>
> For sctrdepth, we are supposed to always return a virt-inst exception in
> case of
> VS-VU mode unless CTR state enable access restrictions apply.
>
> So for sctrdepth, call smstateen_acc_ok(), and if there is no exception
> returned
> (mstateen.CTR=1 and hstateen.CTR=1 for virt mode), check if we are in
> virtual
> mode and return virtual-instruction exception otherwise return
> RISCV_EXCP_NONE.
> Note that if hstateen.CTR=0, smstateen_acc_ok() will return
> virtual-instruction
> exception which means regardless of the hstateen.CTR state, we will always
> return virtual-instruction exception for VS/VU mode access to sctrdepth.
>
> Basically this covers following rules for sctrdepth:
>
> if mstateen.ctr == 0
> return RISCV_EXCP_ILLEGAL_INST; // For all modes lower than M-mode.
> else
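
The sctrdepth rules spelled out in the quoted reply (illegal instruction
below M-mode when mstateen.CTR is 0, otherwise a virtual-instruction
exception for any virtualized access, otherwise no exception) can be
written down as a small decision function. This is only a sketch of that
prose, not the QEMU predicate: the enum and the function name are
hypothetical, and extension-presence checks as well as U-mode privilege
checks are assumed to be handled elsewhere.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch. */
enum ctr_excp {
    CTR_EXCP_NONE,
    CTR_EXCP_ILLEGAL_INST,
    CTR_EXCP_VIRT_INST
};

/*
 * Access check for sctrdepth.
 *   m_mode:       the access comes from M-mode
 *   virt:         the access comes from VS-mode or VU-mode
 *   mstateen_ctr: value of the mstateen CTR bit
 * The hstateen CTR bit is not an input: per the discussion, a
 * virtualized access gets a virtual-instruction exception either way.
 */
static enum ctr_excp sctrdepth_access(bool m_mode, bool virt, bool mstateen_ctr)
{
    if (m_mode) {
        return CTR_EXCP_NONE;
    }
    if (!mstateen_ctr) {
        return CTR_EXCP_ILLEGAL_INST;
    }
    if (virt) {
        return CTR_EXCP_VIRT_INST;
    }
    return CTR_EXCP_NONE;
}
```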

Re: [PATCH v3] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs

2024-06-11 Thread Donald Dutile





On 6/11/24 10:05 PM, Zhenyu Zhang wrote:

Multiple warning messages and corresponding backtraces are observed when Linux
guest is booted on the host with Fujitsu CPUs. One of them is shown as below.

[0.032443] [ cut here ]
[0.032446] uart-pl011 900.pl011: ARCH_DMA_MINALIGN smaller than
CTR_EL0.CWG (128 < 256)
[0.032454] WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:54
arch_setup_dma_ops+0xbc/0xcc
[0.032470] Modules linked in:
[0.032475] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-452.el9.aarch64
[0.032481] Hardware name: linux,dummy-virt (DT)
[0.032484] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[0.032490] pc : arch_setup_dma_ops+0xbc/0xcc
[0.032496] lr : arch_setup_dma_ops+0xbc/0xcc
[0.032501] sp : 80008003b860
[0.032503] x29: 80008003b860 x28:  x27: aae4b949049c
[0.032510] x26:  x25:  x24: 
[0.032517] x23: 0100 x22:  x21: 
[0.032523] x20: 0001 x19: 2f06c02ea400 x18: 
[0.032529] x17: 208a5f76 x16: 6589dbcb x15: aae4ba071c89
[0.032535] x14:  x13: aae4ba071c84 x12: 455f525443206e61
[0.032541] x11: 68742072656c6c61 x10: 0029 x9 : aae4b7d21da4
[0.032547] x8 : 0029 x7 : 4c414e494d5f414d x6 : 0029
[0.032553] x5 : 000f x4 : aae4b9617a00 x3 : 0001
[0.032558] x2 :  x1 :  x0 : 2f06c029be40
[0.032564] Call trace:
[0.032566]  arch_setup_dma_ops+0xbc/0xcc
[0.032572]  of_dma_configure_id+0x138/0x300
[0.032591]  amba_dma_configure+0x34/0xc0
[0.032600]  really_probe+0x78/0x3dc
[0.032614]  __driver_probe_device+0x108/0x160
[0.032619]  driver_probe_device+0x44/0x114
[0.032624]  __device_attach_driver+0xb8/0x14c
[0.032629]  bus_for_each_drv+0x88/0xe4
[0.032634]  __device_attach+0xb0/0x1e0
[0.032638]  device_initial_probe+0x18/0x20
[0.032643]  bus_probe_device+0xa8/0xb0
[0.032648]  device_add+0x4b4/0x6c0
[0.032652]  amba_device_try_add.part.0+0x48/0x360
[0.032657]  amba_device_add+0x104/0x144
[0.032662]  of_amba_device_create.isra.0+0x100/0x1c4
[0.032666]  of_platform_bus_create+0x294/0x35c
[0.032669]  of_platform_populate+0x5c/0x150
[0.032672]  of_platform_default_populate_init+0xd0/0xec
[0.032697]  do_one_initcall+0x4c/0x2e0
[0.032701]  do_initcalls+0x100/0x13c
[0.032707]  kernel_init_freeable+0x1c8/0x21c
[0.032712]  kernel_init+0x28/0x140
[0.032731]  ret_from_fork+0x10/0x20
[0.032735] ---[ end trace  ]---

In Linux, a check is applied to every device that is exposed through a
device-tree node. The warning is raised when the device isn't DMA coherent
and the cache line size is larger than ARCH_DMA_MINALIGN (128 bytes). The
cache line size is sourced from CTR_EL0[CWG], which corresponds to 256 bytes
on the guest CPUs. Devices claim DMA coherency through the 'dma-coherent'
property in their own device-tree nodes or in a parent node.

Fix the issue by adding the 'dma-coherent' property to the device-tree root
node, meaning all devices are DMA coherent by default.

Signed-off-by: Zhenyu Zhang 
---
v3: Add comments explaining why we add 'dma-coherent' property (Peter)
---
  hw/arm/virt.c | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3c93c0c0a6..3cefac6d43 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -271,6 +271,17 @@ static void create_fdt(VirtMachineState *vms)
  qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
  qemu_fdt_setprop_string(fdt, "/", "model", "linux,dummy-virt");
  
+/*
+ * For QEMU, all DMA is coherent. Advertising this in the root node
+ * has two benefits:
+ *
+ * - It avoids potential bugs where we forget to mark a DMA
+ *   capable device as being dma-coherent
+ * - It avoids spurious warnings from the Linux kernel about
+ *   devices which can't do DMA at all
+ */
+qemu_fdt_setprop(fdt, "/", "dma-coherent", NULL, 0);
+
  /* /chosen must exist for load_dtb to fill in necessary properties later */
  qemu_fdt_add_subnode(fdt, "/chosen");
  if (vms->dtb_randomness) {


+1 to Peter's suggested comment, otherwise, unless privy to this thread,
one would wonder how/why.

Reviewed-by: Donald Dutile

Re: [PATCH v3] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs

2024-06-11 Thread Gavin Shan


On 6/12/24 12:05, Zhenyu Zhang wrote:

Multiple warning messages and corresponding backtraces are observed when a Linux
guest is booted on a host with Fujitsu CPUs. One of them is shown below.

[0.032443] [ cut here ]
[0.032446] uart-pl011 900.pl011: ARCH_DMA_MINALIGN smaller than
CTR_EL0.CWG (128 < 256)
[0.032454] WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:54
arch_setup_dma_ops+0xbc/0xcc
[0.032470] Modules linked in:
[0.032475] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-452.el9.aarch64
[0.032481] Hardware name: linux,dummy-virt (DT)
[0.032484] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[0.032490] pc : arch_setup_dma_ops+0xbc/0xcc
[0.032496] lr : arch_setup_dma_ops+0xbc/0xcc
[0.032501] sp : 80008003b860
[0.032503] x29: 80008003b860 x28:  x27: aae4b949049c
[0.032510] x26:  x25:  x24: 
[0.032517] x23: 0100 x22:  x21: 
[0.032523] x20: 0001 x19: 2f06c02ea400 x18: 
[0.032529] x17: 208a5f76 x16: 6589dbcb x15: aae4ba071c89
[0.032535] x14:  x13: aae4ba071c84 x12: 455f525443206e61
[0.032541] x11: 68742072656c6c61 x10: 0029 x9 : aae4b7d21da4
[0.032547] x8 : 0029 x7 : 4c414e494d5f414d x6 : 0029
[0.032553] x5 : 000f x4 : aae4b9617a00 x3 : 0001
[0.032558] x2 :  x1 :  x0 : 2f06c029be40
[0.032564] Call trace:
[0.032566]  arch_setup_dma_ops+0xbc/0xcc
[0.032572]  of_dma_configure_id+0x138/0x300
[0.032591]  amba_dma_configure+0x34/0xc0
[0.032600]  really_probe+0x78/0x3dc
[0.032614]  __driver_probe_device+0x108/0x160
[0.032619]  driver_probe_device+0x44/0x114
[0.032624]  __device_attach_driver+0xb8/0x14c
[0.032629]  bus_for_each_drv+0x88/0xe4
[0.032634]  __device_attach+0xb0/0x1e0
[0.032638]  device_initial_probe+0x18/0x20
[0.032643]  bus_probe_device+0xa8/0xb0
[0.032648]  device_add+0x4b4/0x6c0
[0.032652]  amba_device_try_add.part.0+0x48/0x360
[0.032657]  amba_device_add+0x104/0x144
[0.032662]  of_amba_device_create.isra.0+0x100/0x1c4
[0.032666]  of_platform_bus_create+0x294/0x35c
[0.032669]  of_platform_populate+0x5c/0x150
[0.032672]  of_platform_default_populate_init+0xd0/0xec
[0.032697]  do_one_initcall+0x4c/0x2e0
[0.032701]  do_initcalls+0x100/0x13c
[0.032707]  kernel_init_freeable+0x1c8/0x21c
[0.032712]  kernel_init+0x28/0x140
[0.032731]  ret_from_fork+0x10/0x20
[0.032735] ---[ end trace  ]---

In Linux, a check is applied to every device that is exposed through a
device-tree node. The warning is raised when the device isn't DMA coherent
and the cache line size is larger than ARCH_DMA_MINALIGN (128 bytes). The
cache line size is sourced from CTR_EL0[CWG], which corresponds to 256 bytes
on the guest CPUs. Devices claim DMA coherency through the 'dma-coherent'
property in their own device-tree nodes or in a parent node.

Fix the issue by adding the 'dma-coherent' property to the device-tree root
node, meaning all devices are DMA coherent by default.

Signed-off-by: Zhenyu Zhang 
---
v3: Add comments explaining why we add 'dma-coherent' property (Peter)
---
  hw/arm/virt.c | 11 +++
  1 file changed, 11 insertions(+)



Reviewed-by: Gavin Shan

[PATCH v3] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs

2024-06-11 Thread Zhenyu Zhang

Multiple warning messages and corresponding backtraces are observed when a Linux
guest is booted on a host with Fujitsu CPUs. One of them is shown below.

[0.032443] [ cut here ]
[0.032446] uart-pl011 900.pl011: ARCH_DMA_MINALIGN smaller than
CTR_EL0.CWG (128 < 256)
[0.032454] WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:54
arch_setup_dma_ops+0xbc/0xcc
[0.032470] Modules linked in:
[0.032475] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-452.el9.aarch64
[0.032481] Hardware name: linux,dummy-virt (DT)
[0.032484] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[0.032490] pc : arch_setup_dma_ops+0xbc/0xcc
[0.032496] lr : arch_setup_dma_ops+0xbc/0xcc
[0.032501] sp : 80008003b860
[0.032503] x29: 80008003b860 x28:  x27: aae4b949049c
[0.032510] x26:  x25:  x24: 
[0.032517] x23: 0100 x22:  x21: 
[0.032523] x20: 0001 x19: 2f06c02ea400 x18: 
[0.032529] x17: 208a5f76 x16: 6589dbcb x15: aae4ba071c89
[0.032535] x14:  x13: aae4ba071c84 x12: 455f525443206e61
[0.032541] x11: 68742072656c6c61 x10: 0029 x9 : aae4b7d21da4
[0.032547] x8 : 0029 x7 : 4c414e494d5f414d x6 : 0029
[0.032553] x5 : 000f x4 : aae4b9617a00 x3 : 0001
[0.032558] x2 :  x1 :  x0 : 2f06c029be40
[0.032564] Call trace:
[0.032566]  arch_setup_dma_ops+0xbc/0xcc
[0.032572]  of_dma_configure_id+0x138/0x300
[0.032591]  amba_dma_configure+0x34/0xc0
[0.032600]  really_probe+0x78/0x3dc
[0.032614]  __driver_probe_device+0x108/0x160
[0.032619]  driver_probe_device+0x44/0x114
[0.032624]  __device_attach_driver+0xb8/0x14c
[0.032629]  bus_for_each_drv+0x88/0xe4
[0.032634]  __device_attach+0xb0/0x1e0
[0.032638]  device_initial_probe+0x18/0x20
[0.032643]  bus_probe_device+0xa8/0xb0
[0.032648]  device_add+0x4b4/0x6c0
[0.032652]  amba_device_try_add.part.0+0x48/0x360
[0.032657]  amba_device_add+0x104/0x144
[0.032662]  of_amba_device_create.isra.0+0x100/0x1c4
[0.032666]  of_platform_bus_create+0x294/0x35c
[0.032669]  of_platform_populate+0x5c/0x150
[0.032672]  of_platform_default_populate_init+0xd0/0xec
[0.032697]  do_one_initcall+0x4c/0x2e0
[0.032701]  do_initcalls+0x100/0x13c
[0.032707]  kernel_init_freeable+0x1c8/0x21c
[0.032712]  kernel_init+0x28/0x140
[0.032731]  ret_from_fork+0x10/0x20
[0.032735] ---[ end trace  ]---

In Linux, a check is applied to every device that is exposed through a
device-tree node. The warning is raised when the device isn't DMA coherent
and the cache line size is larger than ARCH_DMA_MINALIGN (128 bytes). The
cache line size is sourced from CTR_EL0[CWG], which corresponds to 256 bytes
on the guest CPUs. Devices claim DMA coherency through the 'dma-coherent'
property in their own device-tree nodes or in a parent node.

Fix the issue by adding the 'dma-coherent' property to the device-tree root
node, meaning all devices are DMA coherent by default.

Signed-off-by: Zhenyu Zhang 
---
v3: Add comments explaining why we add 'dma-coherent' property (Peter)
---
 hw/arm/virt.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3c93c0c0a6..3cefac6d43 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -271,6 +271,17 @@ static void create_fdt(VirtMachineState *vms)
 qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
 qemu_fdt_setprop_string(fdt, "/", "model", "linux,dummy-virt");
 
+/*
+ * For QEMU, all DMA is coherent. Advertising this in the root node
+ * has two benefits:
+ *
+ * - It avoids potential bugs where we forget to mark a DMA
+ *   capable device as being dma-coherent
+ * - It avoids spurious warnings from the Linux kernel about
+ *   devices which can't do DMA at all
+ */
+qemu_fdt_setprop(fdt, "/", "dma-coherent", NULL, 0);
+
 /* /chosen must exist for load_dtb to fill in necessary properties later */
 qemu_fdt_add_subnode(fdt, "/chosen");
 if (vms->dtb_randomness) {
-- 
2.43.0
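
The check described in the commit message can be sketched roughly as below.
This is an illustration under stated assumptions, not the kernel's code: the
helper names and the ARCH_DMA_MINALIGN_BYTES constant are hypothetical, and
the sketch assumes the Arm encoding in which CTR_EL0.CWG sits in bits [27:24]
and holds log2 of the granule size in 4-byte words.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, taken from the "(128 < 256)" text in the warning above. */
#define ARCH_DMA_MINALIGN_BYTES 128u

/*
 * Decode the cache writeback granule, in bytes, from a CTR_EL0 value.
 * CWG == 0 means "not provided"; return 0 so that callers can tell.
 */
static unsigned int cwg_bytes(uint64_t ctr_el0)
{
    unsigned int cwg = (unsigned int)((ctr_el0 >> 24) & 0xf);

    return cwg ? 4u << cwg : 0u;
}

/* Hypothetical predicate: would a device trigger the warning? */
static bool would_warn(uint64_t ctr_el0, bool dma_coherent)
{
    return !dma_coherent && cwg_bytes(ctr_el0) > ARCH_DMA_MINALIGN_BYTES;
}
```

With CWG = 6 the granule decodes to 256 bytes, so a non-coherent device would
trip the check, while a device covered by 'dma-coherent' (directly or through
the root node) would not.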

RE: [PATCH] ui/gtk: Wait until the current guest frame is rendered before switching to RUN_STATE_SAVE_VM

2024-06-11 Thread Kim, Dongwon

Hi, 

From: Marc-André Lureau  
Sent: Wednesday, June 5, 2024 12:56 AM
To: Kim, Dongwon 
Cc: qemu-devel@nongnu.org; Peter Xu 
Subject: Re: [PATCH] ui/gtk: Wait until the current guest frame is rendered 
before switching to RUN_STATE_SAVE_VM

Hi

On Tue, Jun 4, 2024 at 9:49 PM Kim, Dongwon  
wrote:
On 6/4/2024 4:12 AM, Marc-André Lureau wrote:
> Hi
> 
> On Thu, May 30, 2024 at 2:44 AM  wrote:
> 
>     From: Dongwon
> 
>     Make sure rendering of the current frame is finished before switching
>     the run state to RUN_STATE_SAVE_VM by waiting for egl-sync object to be
>     signaled.
> 
> 
> Can you expand on what this solves?

In the current scheme, the guest waits for the fence to be signaled for each
frame it submits before moving on to the next frame. If the guest's state
is saved while it is still waiting for the fence, the guest will, once
restored to that point, keep waiting for a fence that was signaled a while
ago. One way to prevent this is to let it finish the current frame before
changing the state.

After the UI sets a fence, hw_ops->gl_block(true) gets called, which will block 
virtio-gpu/virgl from processing commands (until the fence is signaled and 
gl_block/false called again).

But this "blocking" state is not saved. So how does this affect save/restore? 
Please give more details, thanks

Yeah, sure. The "blocking" state is not saved, but the guest's state is saved
while it is still waiting for the response to its last resource-flush virtio
message. This virtio response, by the way, is set to be sent to the guest when
the pipeline is unblocked (that is, when the fence is signaled). Once the
guest's state is saved, the current instance of the guest continues and
receives the response as usual. The problem appears when we restore the saved
guest state, because the restored guest will be waiting for a response that
was sent a while ago to the original instance.

> 
> 
>     Cc: Marc-André Lureau
>     Cc: Vivek Kasireddy
>     Signed-off-by: Dongwon Kim
>     ---
>       ui/egl-helpers.c |  2 --
>       ui/gtk.c         | 19 +++
>       2 files changed, 19 insertions(+), 2 deletions(-)
> 
>     diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
>     index 99b2ebbe23..dafeb36074 100644
>     --- a/ui/egl-helpers.c
>     +++ b/ui/egl-helpers.c
>     @@ -396,8 +396,6 @@ void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
>               fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
>                                                     sync);
>               qemu_dmabuf_set_fence_fd(dmabuf, fence_fd);
>     -        eglDestroySyncKHR(qemu_egl_display, sync);
>     -        qemu_dmabuf_set_sync(dmabuf, NULL);
> 
> 
> If this function is called multiple times, it will now set a new 
> fence_fd each time, and potentially leak older fd. Maybe it could first 
> check if a fence_fd exists instead.

We can make that change.
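
The "check first" idea suggested above could look roughly like the sketch
below. It is only an illustration: `struct fake_dmabuf`, `fake_create_fence_fd()`
and `create_fence_once()` are hypothetical stand-ins, not QEMU's QemuDmaBuf or
EGL helpers, and the fake descriptor numbers exist purely to make the behaviour
observable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the dmabuf bookkeeping; not QEMU's QemuDmaBuf. */
struct fake_dmabuf {
    int fence_fd;   /* -1 when no fence is outstanding */
};

/* Hypothetical fence factory: hands out increasing fake descriptors. */
static int next_fake_fd = 100;

static int fake_create_fence_fd(void)
{
    return next_fake_fd++;
}

/*
 * Create a fence only if none is pending. Returns true when a new fence
 * was created, false when the existing one was kept.
 */
static bool create_fence_once(struct fake_dmabuf *buf)
{
    if (buf->fence_fd >= 0) {
        return false;   /* keep the existing fd, so nothing is overwritten */
    }
    buf->fence_fd = fake_create_fence_fd();
    return true;
}
```

Calling the helper twice in a row leaves the first descriptor in place, which
is the property that avoids overwriting (and thereby leaking) an older fd.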

> 
>           }
>       }
> 
>     diff --git a/ui/gtk.c b/ui/gtk.c
>     index 93b13b7a30..cf0dd6abed 100644
>     --- a/ui/gtk.c
>     +++ b/ui/gtk.c
>     @@ -600,9 +600,12 @@ void gd_hw_gl_flushed(void *vcon)
> 
>           fence_fd = qemu_dmabuf_get_fence_fd(dmabuf);
>           if (fence_fd >= 0) {
>     +        void *sync = qemu_dmabuf_get_sync(dmabuf);
>               qemu_set_fd_handler(fence_fd, NULL, NULL, NULL);
>               close(fence_fd);
>               qemu_dmabuf_set_fence_fd(dmabuf, -1);
>     +        eglDestroySyncKHR(qemu_egl_display, sync);
>     +        qemu_dmabuf_set_sync(dmabuf, NULL);
>               graphic_hw_gl_block(vc->gfx.dcl.con, false);
>           }
>       }
>     @@ -682,6 +685,22 @@ static const DisplayGLCtxOps egl_ctx_ops = {
>       static void gd_change_runstate(void *opaque, bool running,
>     RunState state)
>       {
>           GtkDisplayState *s = opaque;
>     +    QemuDmaBuf *dmabuf;
>     +    int i;
>     +
>     +    if (state == RUN_STATE_SAVE_VM) {
>     +        for (i = 0; i < s->nb_vcs; i++) {
>     +            VirtualConsole *vc = &s->vc[i];
>     +            dmabuf = vc->gfx.guest_fb.dmabuf;
>     +            if (dmabuf && qemu_dmabuf_get_fence_fd(dmabuf) >= 0) {
>     +                /* wait for the rendering to be completed */
>     +                eglClientWaitSync(qemu_egl_display,
>     +                                  qemu_dmabuf_get_sync(dmabuf),
>     +                                  EGL_SYNC_FLUSH_COMMANDS_BIT_KHR,
>     +                                  10);
> 
> 
>   I don't think adding waiting points in the migration path is 
> appropriate. Perhaps once

Re: qemu-riscv32 usermode still broken?

2024-06-11 Thread Alistair Francis

On Tue, Jun 11, 2024 at 6:57 PM Andreas K. Huettel  wrote:
>
> Hi Alistair,
>
> >
> > Ok!
> >
> > So on my x86 machine I see this
> >
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=285545,
> > si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
> > wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}],
> > WNOHANG|WSTOPPED|WCONTINUED, NULL) = 285545
> > wait4(-1, 0x7ffe3eeb8210, WNOHANG|WSTOPPED|WCONTINUED, NULL) = 0
> > rt_sigreturn({mask=[INT]})  = 0
> > close(3)= 0
> >
> > It all looks ok.
>
> This was fixed in the meantime (hooray!), sorry I didn't think anyone
> would still look at the old thread. The commit is given below.
>
> Since then we've been able to build riscv32 stages for Gentoo just fine
> using qemu-user, see
> https://www.gentoo.org/downloads/#riscv

Great!

Alistair

>
> Cheers,
> Andreas
>
> commit f0907ff4cae743f1a4ef3d0a55a047029eed06ff
> Author: Richard Henderson 
> AuthorDate: Fri Apr 5 11:58:14 2024 -1000
> Commit: Richard Henderson 
> CommitDate: Tue Apr 9 07:43:11 2024 -1000
>
> linux-user: Fix waitid return of siginfo_t and rusage
>
> The copy back to siginfo_t should be conditional only on arg3,
> not the specific values that might have been written.
> The copy back to rusage was missing entirely.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2262
> Signed-off-by: Richard Henderson 
> Tested-by: Alex Fan 
> Reviewed-by: Philippe Mathieu-Daudé 
>
>
>
> >
> > Maybe the host_to_target_siginfo() function in QEMU is the issue?
> > Something in here?
> > https://github.com/qemu/qemu/blob/master/linux-user/signal.c#L335
> >
> > Nothing jumps out with a quick look though
> >
> > Alistair
> >
> > >
> > >
> > >
> > > --
> > > Andreas K. Hüttel
> > > dilfri...@gentoo.org
> > > Gentoo Linux developer
> > > (council, toolchain, base-system, perl, libreoffice)
> >
>
>
> --
> Andreas K. Hüttel
> dilfri...@gentoo.org
> Gentoo Linux developer
> (council, toolchain, base-system, perl, libreoffice)

Re: [PATCH RESEND 2/6] target/riscv: Introduce extension implied rule helpers

2024-06-11 Thread Frank Chang

On Wed, Jun 5, 2024 at 2:32 PM  wrote:

> From: Frank Chang 
>
> Introduce helpers to enable the extensions based on the implied rules.
> The implied extensions are enabled recursively, so we don't have to
> expand all of them manually. This also eliminates the old-fashioned
> ordering requirement. For example, Zvksg implies Zvks, Zvks implies
> Zvksed, etc., removing the need to check the implied rules of Zvksg
> before Zvks.
>
> Signed-off-by: Frank Chang 
> ---
>  target/riscv/tcg/tcg-cpu.c | 89 ++
>  1 file changed, 89 insertions(+)
>
> diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
> index 683f604d9f..899d605d36 100644
> --- a/target/riscv/tcg/tcg-cpu.c
> +++ b/target/riscv/tcg/tcg-cpu.c
> @@ -36,6 +36,9 @@
>  static GHashTable *multi_ext_user_opts;
>  static GHashTable *misa_ext_user_opts;
>
> +static GHashTable *misa_implied_rules;
> +static GHashTable *ext_implied_rules;
> +
>  static bool cpu_cfg_ext_is_user_set(uint32_t ext_offset)
>  {
>  return g_hash_table_contains(multi_ext_user_opts,
> @@ -833,11 +836,95 @@ static void riscv_cpu_validate_profiles(RISCVCPU
> *cpu)
>  }
>  }
>
> +static void riscv_cpu_init_implied_exts_rules(void)
> +{
> +RISCVCPUImpliedExtsRule *rule;
> +int i;
> +
> +for (i = 0; (rule = riscv_misa_implied_rules[i]); i++) {
> +g_hash_table_insert(misa_implied_rules,
> GUINT_TO_POINTER(rule->ext),
> +(gpointer)rule);
> +}
> +
> +for (i = 0; (rule = riscv_ext_implied_rules[i]); i++) {
> +g_hash_table_insert(ext_implied_rules,
> GUINT_TO_POINTER(rule->ext),
> +(gpointer)rule);
> +}
> +}
> +
> +static void cpu_enable_implied_rule(RISCVCPU *cpu,
> +RISCVCPUImpliedExtsRule *rule)
> +{
> +CPURISCVState *env = &cpu->env;
> +RISCVCPUImpliedExtsRule *ir;
> +target_ulong hartid = 0;
> +int i;
> +
> +#if !defined(CONFIG_USER_ONLY)
> +hartid = env->mhartid;
> +#endif
> +
> +if (!(rule->enabled & BIT_ULL(hartid))) {
> +/* Enable the implied MISAs. */
> +if (rule->implied_misas) {
> +riscv_cpu_set_misa_ext(env, env->misa_ext |
> rule->implied_misas);
> +
> +for (i = 0; misa_bits[i] != 0; i++) {
> +if (rule->implied_misas & misa_bits[i]) {
> +ir = g_hash_table_lookup(misa_implied_rules,
> +
>  GUINT_TO_POINTER(misa_bits[i]));
> +
> +if (ir) {
> +cpu_enable_implied_rule(cpu, ir);
> +}
> +}
> +}
> +}
> +
> +/* Enable the implied extensions. */
> +for (i = 0; rule->implied_exts[i] != RISCV_IMPLIED_EXTS_RULE_END;
> i++) {
> +cpu_cfg_ext_auto_update(cpu, rule->implied_exts[i], true);
> +
> +ir = g_hash_table_lookup(ext_implied_rules,
> +
>  GUINT_TO_POINTER(rule->implied_exts[i]));
> +
> +if (ir) {
> +cpu_enable_implied_rule(cpu, ir);
> +}
> +}
> +
> +rule->enabled |= BIT_ULL(hartid);
>

Should I use the qatomic API here to set the enabled bitmask?

This wouldn't impact the results, but it may cause the implied rules
to be traversed and re-enabled (which does no harm) if the enabled bit
of a hart is accidentally cleared by another hart.


> +}
> +}
> +
> +static void riscv_cpu_enable_implied_rules(RISCVCPU *cpu)
> +{
> +RISCVCPUImpliedExtsRule *rule;
> +int i;
> +
> +/* Enable the implied MISAs. */
> +for (i = 0; (rule = riscv_misa_implied_rules[i]); i++) {
> +if (riscv_has_ext(&cpu->env, rule->ext)) {
> +cpu_enable_implied_rule(cpu, rule);
> +}
> +}
> +
> +/* Enable the implied extensions. */
> +for (i = 0; (rule = riscv_ext_implied_rules[i]); i++) {
> +if (isa_ext_is_enabled(cpu, rule->ext)) {
> +cpu_enable_implied_rule(cpu, rule);
> +}
> +}
> +}
> +
>  void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error **errp)
>  {
>  CPURISCVState *env = &cpu->env;
>  Error *local_err = NULL;
>
> +riscv_cpu_init_implied_exts_rules();
> +riscv_cpu_enable_implied_rules(cpu);
> +
>  riscv_cpu_validate_misa_priv(env, &local_err);
>  if (local_err != NULL) {
>  error_propagate(errp, local_err);
> @@ -1343,6 +1430,8 @@ static void riscv_tcg_cpu_instance_init(CPUState *cs)
>
>  misa_ext_user_opts = g_hash_table_new(NULL, g_direct_equal);
>  multi_ext_user_opts = g_hash_table_new(NULL, g_direct_equal);
> +misa_implied_rules = g_hash_table_new(NULL, g_direct_equal);
> +ext_implied_rules = g_hash_table_new(NULL, g_direct_equal);
>  riscv_cpu_add_user_properties(obj);
>
>  if (riscv_cpu_has_max_extensions(obj)) {
> --
> 2.43.2
>
>
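
As a rough illustration of the recursive expansion and of the per-hart
"enabled" bitmask question raised in this mail, here is a toy sketch. It does
not use QEMU's types or the qatomic API: C11 `atomic_fetch_or` stands in for an
atomic update, and every name below (`struct implied_rule`, `rules`,
`expand_implied`, the EXT_* values) is hypothetical. Unlike the patch, the
sketch sets the hart's bit before walking the implied list (test-and-set)
rather than after.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { EXT_A, EXT_B, EXT_C, EXT_D, EXT_COUNT, EXT_END = -1 };

struct implied_rule {
    int ext;                    /* extension that triggers the rule */
    int implied[EXT_COUNT];     /* EXT_END-terminated list */
    _Atomic uint64_t enabled;   /* one bit per hart already processed */
};

/* Toy table: A implies B, B implies C. Deliberately not in dependency order. */
static struct implied_rule rules[] = {
    { .ext = EXT_B, .implied = { EXT_C, EXT_END } },
    { .ext = EXT_A, .implied = { EXT_B, EXT_END } },
};

static struct implied_rule *find_rule(int ext)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (rules[i].ext == ext) {
            return &rules[i];
        }
    }
    return NULL;
}

static void enable_rule(uint64_t *ext_mask, struct implied_rule *rule,
                        unsigned int hart)
{
    uint64_t bit = UINT64_C(1) << hart;

    /* Atomic OR; the previous value says whether this hart was done already. */
    if (atomic_fetch_or(&rule->enabled, bit) & bit) {
        return;
    }
    for (int i = 0; rule->implied[i] != EXT_END; i++) {
        struct implied_rule *next = find_rule(rule->implied[i]);

        *ext_mask |= UINT64_C(1) << rule->implied[i];
        if (next) {
            enable_rule(ext_mask, next, hart);   /* follow the chain */
        }
    }
}

/* Return ext_mask with everything implied by its set bits added. */
static uint64_t expand_implied(uint64_t ext_mask, unsigned int hart)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (ext_mask & (UINT64_C(1) << rules[i].ext)) {
            enable_rule(&ext_mask, &rules[i], hart);
        }
    }
    return ext_mask;
}
```

Because the recursion follows each implied extension to its own rule, the
table order does not matter: enabling A pulls in B and then C even though the
B rule is listed first.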

[PATCH] accel/tcg: Fix typo causing tb->page_addr[1] to not be recorded

2024-06-11 Thread Anton Johansson via

For TBs crossing page boundaries, the 2nd page will never be
recorded/removed, as the index of the 2nd page is computed from the
address of the 1st page. This is due to a typo, fix it.

Signed-off-by: Anton Johansson 
---
 accel/tcg/tb-maint.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 19ae6793f3..cc0f5afd47 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -713,7 +713,7 @@ static void tb_record(TranslationBlock *tb)
 tb_page_addr_t paddr0 = tb_page_addr0(tb);
 tb_page_addr_t paddr1 = tb_page_addr1(tb);
 tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS;
-tb_page_addr_t pindex1 = paddr0 >> TARGET_PAGE_BITS;
+tb_page_addr_t pindex1 = paddr1 >> TARGET_PAGE_BITS;
 
 assert(paddr0 != -1);
 if (unlikely(paddr1 != -1) && pindex0 != pindex1) {
@@ -745,7 +745,7 @@ static void tb_remove(TranslationBlock *tb)
 tb_page_addr_t paddr0 = tb_page_addr0(tb);
 tb_page_addr_t paddr1 = tb_page_addr1(tb);
 tb_page_addr_t pindex0 = paddr0 >> TARGET_PAGE_BITS;
-tb_page_addr_t pindex1 = paddr0 >> TARGET_PAGE_BITS;
+tb_page_addr_t pindex1 = paddr1 >> TARGET_PAGE_BITS;
 
 assert(paddr0 != -1);
 if (unlikely(paddr1 != -1) && pindex0 != pindex1) {
-- 
2.45.0
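
To see why the one-character typo matters, consider the toy model below. It is
a sketch under assumptions, not QEMU code: PAGE_BITS is fixed at 12 here
(TARGET_PAGE_BITS depends on the target), UINT64_MAX models the -1 "no second
page" value, and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS 12   /* assumed 4 KiB pages for this illustration */

/* Fixed logic: the second index is derived from the second address. */
static bool spans_two_pages(uint64_t paddr0, uint64_t paddr1)
{
    uint64_t pindex0 = paddr0 >> PAGE_BITS;
    uint64_t pindex1 = paddr1 >> PAGE_BITS;

    return paddr1 != UINT64_MAX && pindex0 != pindex1;
}

/* Typo variant: both indices come from paddr0, so they can never differ. */
static bool spans_two_pages_buggy(uint64_t paddr0, uint64_t paddr1)
{
    uint64_t pindex0 = paddr0 >> PAGE_BITS;
    uint64_t pindex1 = paddr0 >> PAGE_BITS;

    return paddr1 != UINT64_MAX && pindex0 != pindex1;
}
```

For a block that starts at 0x1ff8 and continues at 0x2000, the fixed version
reports two pages while the typo variant never does, which matches the
description that the second page was never recorded or removed.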

Re: [PATCH v3 1/6] Add an "info pg" command that prints the current page tables

2024-06-11 Thread Don Porter


On 6/7/24 3:16 AM, Daniel P. Berrangé wrote:

On Thu, Jun 06, 2024 at 10:02:48AM -0400, Don Porter wrote:
Please don't add new HMP commands that don't have a QMP
equivalent.

This should be adding an 'x-query-pg' QMP command, which
returns HumanReadableText, and then call that from the HMP

There is guidance on this here:

   
https://www.qemu.org/docs/master/devel/writing-monitor-commands.html#writing-a-debugging-aid-returning-unstructured-text

If you need more real examples, look at the various
'x-query-' commands in qapi/machine.json  and
their impl.


Thank you both for the pointers.  This makes sense to me;
outputting a string is much cleaner.  Will implement in v4...

-dp

Re: [RFC PATCH v2 0/2] ui/gtk: Introduce new param - Connectors

2024-06-11 Thread Kim, Dongwon

Hi Marc-André,

On 6/5/2024 12:26 AM, Marc-André Lureau wrote:

Hi

On Tue, Jun 4, 2024 at 9:59 PM Kim, Dongwon  wrote:

Hi Marc-André,

On 6/4/2024 3:37 AM, Marc-André Lureau wrote:
 > Hi
 >
 > On Fri, May 31, 2024 at 11:00 PM  wrote:
 >
 >     From: Dongwon Kim
 >
 >     This patch series is a replacement of
 >     https://mail.gnu.org/archive/html/qemu-devel/2023-06/msg03989.html
 >
 >     There is a need, expressed by several users, to assign ownership of one
 >     or more physical monitors/connectors to individual guests. This creates
 >     a clear notion of which guest's contents are being displayed on any
 >     given monitor. Given that there is always a display server/compositor
 >     running on the host, monitor ownership can never truly be transferred
 >     to guests. However, the closest approximation is to request the host
 >     compositor to fullscreen the guest's windows on individual monitors.
 >     This allows for various configurations, such as displaying four
 >     different guests' windows on four different monitors, a single guest's
 >     windows (or virtual consoles) on four monitors, or any similar
 >     combination.
 >
 >     This patch series attempts to accomplish this by introducing a new
 >     parameter named "connector" to assign monitors to the GFX VCs
 >     associated with a guest. If the assigned monitor is not connected, the
 >     guest's window will not be displayed, similar to how a host compositor
 >     behaves when connectors are not connected. Once the monitor is
 >     hot-plugged, the guest's window(s) will be positioned on the assigned
 >     monitor.
 >
 >     Usage example:
 >
 >     -display gtk,gl=on,connectors=DP-1:eDP-1:HDMI-2...
 >
 >     In this example, the first graphics virtual console will be placed
 >     on the DP-1 display, the second on eDP-1, and the third on HDMI-2.
 >
 > Unfortunately, this approach with GTK is doomed. gtk4 dropped the
 > gtk_window_set_position() altogether.

Do you mean we have a plan to lift GTK version in QEMU? Are we going to
lose all GTK3 specific features?

No concrete plan, no. But eventually GTK3 will go away some day.

There are users who still rely on features provided by GTK3, and we also
have customers moving from VMware and VirtualBox who have requested this
feature. Their use cases are current and active. If window repositioning
is no longer supported someday, we would need to make this feature
obsolete, but many users/customers would benefit from it until then.

fwiw, I wish QEMU wouldn't have N built-in UIs/Spice/VNC, but different
projects elsewhere using -display dbus. There is
https://gitlab.gnome.org/GNOME/libmks or
https://gitlab.com/marcandre.lureau/qemu-display gtk4 efforts.

As you know, there cannot be a one-size-fits-all solution that would
work for all users, which is probably why there are many QEMU UIs.

 >
 > It's not even clear how the different monitors/outputs/connectors are
 > actually named, whether they are stable etc (not mentioning the
 > portability).
 >
 > Window placement & geometry is a job for the compositor. Can you discuss
 > this issue with GTK devs & the compositor you are targeting?

I guess you are talking about wayland compositor. We are mainly using
Xorg on the host and this feature works pretty good on it. I am

Xorg may not be going away soon, but it's used less and less. As one of 
the developers, I am no longer running/testing it for a long time. I 
wish we would just drop its support tbh.

There are features offered by Xorg that are not offered by Wayland
compositors and again, we have customers that rely on these features.
One of them is the ability to position the window via
gtk_window_set_position(). There are strong arguments made on either
side when it comes to window positioning:
https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/247

Until there is a way to do this with Wayland compositors, we have to 
unfortunately rely on Gnome + Xorg.

wondering if we limit the feature to Xorg case or adding some warning
messages with error return

Re: [RFC PATCH v1 1/6] build-sys: Add rust feature option

2024-06-11 Thread Stefan Hajnoczi

On Tue, 11 Jun 2024 at 13:54, Manos Pitsidianakis
 wrote:
>
> On Tue, 11 Jun 2024 at 17:05, Stefan Hajnoczi  wrote:
> >
> > On Mon, Jun 10, 2024 at 09:22:36PM +0300, Manos Pitsidianakis wrote:
> > > Add options for Rust in meson_options.txt, meson.build, configure to
> > > prepare for adding Rust code in the followup commits.
> > >
> > > `rust` is a reserved meson name, so we have to use an alternative.
> > > `with_rust` was chosen.
> > >
> > > Signed-off-by: Manos Pitsidianakis 
> > > ---
> > > The cargo wrapper script hardcodes some rust target triples. This is
> > > just temporary.
> > > ---
> > >  .gitignore   |   2 +
> > >  configure|  12 +++
> > >  meson.build  |  11 ++
> > >  meson_options.txt|   4 +
> > >  scripts/cargo_wrapper.py | 211 +++
> > >  5 files changed, 240 insertions(+)
> > >  create mode 100644 scripts/cargo_wrapper.py
> > >
> > > diff --git a/.gitignore b/.gitignore
> > > index 61fa39967b..f42b0d937e 100644
> > > --- a/.gitignore
> > > +++ b/.gitignore
> > > @@ -2,6 +2,8 @@
> > >  /build/
> > >  /.cache/
> > >  /.vscode/
> > > +/target/
> > > +rust/**/target
> >
> > Are these necessary since the cargo build command-line below uses
> > --target-dir ?
> >
> > Adding new build output directories outside build/ makes it harder to
> > clean up the source tree and ensure no state from previous builds
> > remains.
>
> Agreed! These build directories would show up when using cargo
> directly instead of through the cargo_wrapper.py script, i.e. during
> development. I'd consider it an edge case, it won't happen much and if
> it does it's better to gitignore them than accidentally checking them
> in. Also, whatever artifacts are in a `target` directory won't be used
> for compilation with qemu inside a build directory.

Why would someone bypass the build system? I don't think we should
encourage developers to do this.

>
>
> > >  *.pyc
> > >  .sdk
> > >  .stgit-*
> > > diff --git a/configure b/configure
> > > index 38ee257701..c195630771 100755
> > > --- a/configure
> > > +++ b/configure
> > > @@ -302,6 +302,9 @@ else
> > >objcc="${objcc-${cross_prefix}clang}"
> > >  fi
> > >
> > > +with_rust="auto"
> > > +with_rust_target_triple=""
> > > +
> > >  ar="${AR-${cross_prefix}ar}"
> > >  as="${AS-${cross_prefix}as}"
> > >  ccas="${CCAS-$cc}"
> > > @@ -760,6 +763,12 @@ for opt do
> > >;;
> > >--gdb=*) gdb_bin="$optarg"
> > >;;
> > > +  --enable-rust) with_rust=enabled
> > > +  ;;
> > > +  --disable-rust) with_rust=disabled
> > > +  ;;
> > > +  --rust-target-triple=*) with_rust_target_triple="$optarg"
> > > +  ;;
> > ># everything else has the same name in configure and meson
> > >--*) meson_option_parse "$opt" "$optarg"
> > >;;
> > > @@ -1796,6 +1805,9 @@ if test "$skip_meson" = no; then
> > >test -n "${LIB_FUZZING_ENGINE+xxx}" && meson_option_add 
> > > "-Dfuzzing_engine=$LIB_FUZZING_ENGINE"
> > >test "$plugins" = yes && meson_option_add "-Dplugins=true"
> > >test "$tcg" != enabled && meson_option_add "-Dtcg=$tcg"
> > > +  test "$with_rust" != enabled && meson_option_add 
> > > "-Dwith_rust=$with_rust"
> > > +  test "$with_rust" != enabled && meson_option_add 
> > > "-Dwith_rust=$with_rust"
> >
> > Duplicate line.
>
> Thanks!
>
> >
> > > +  test "$with_rust_target_triple" != "" && meson_option_add 
> > > "-Dwith_rust_target_triple=$with_rust_target_triple"
> > >run_meson() {
> > >  NINJA=$ninja $meson setup "$@" "$PWD" "$source_path"
> > >}
> > > diff --git a/meson.build b/meson.build
> > > index a9de71d450..3533889852 100644
> > > --- a/meson.build
> > > +++ b/meson.build
> > > @@ -290,6 +290,12 @@ foreach lang : all_languages
> > >endif
> > >  endforeach
> > >
> > > +cargo = not_found
> > > +if get_option('with_rust').allowed()
> > > +  cargo = find_program('cargo', required: get_option('with_rust'))
> > > +endif
> > > +with_rust = cargo.found()
> > > +
> > >  # default flags for all hosts
> > >  # We use -fwrapv to tell the compiler that we require a C dialect where
> > >  # left shift of signed integers is well defined and has the expected
> > > @@ -2066,6 +2072,7 @@ endif
> > >
> > >  config_host_data = configuration_data()
> > >
> > > +config_host_data.set('CONFIG_WITH_RUST', with_rust)
> > >  audio_drivers_selected = []
> > >  if have_system
> > >audio_drivers_available = {
> > > @@ -4190,6 +4197,10 @@ if 'objc' in all_languages
> > >  else
> > >summary_info += {'Objective-C compiler': false}
> > >  endif
> > > +summary_info += {'Rust support':  with_rust}
> > > +if with_rust and get_option('with_rust_target_triple') != ''
> > > +  summary_info += {'Rust target': 
> > > get_option('with_rust_target_triple')}
> > > +endif
> > >  option_cflags = (get_option('debug') ? ['-g'] : [])
> > >  if get_option('optimization') != 'plain'
> > >option_cflags += ['-O' + get_option('optimization')]
> > > diff --git a/meson_options.txt

Re: [PATCH v4 2/4] vvfat: Fix usage of `info.file.offset`

Am 11.06.2024 um 18:22 hat Amjad Alsharafi geschrieben:
> On Tue, Jun 11, 2024 at 04:30:53PM +0200, Kevin Wolf wrote:
> > Am 11.06.2024 um 14:31 hat Amjad Alsharafi geschrieben:
> > > On Mon, Jun 10, 2024 at 06:49:43PM +0200, Kevin Wolf wrote:
> > > > Am 05.06.2024 um 02:58 hat Amjad Alsharafi geschrieben:
> > > > > The field is marked as "the offset in the file (in clusters)", but it
> > > > > was being used like this
> > > > > `cluster_size*(nums)+mapping->info.file.offset`, which is incorrect.
> > > > > 
> > > > > Additionally, removed the `abort` when `first_mapping_index` does not
> > > > > > match, as this matches the case when adding new clusters for files, and
> > > > > > it's inevitable that we reach this condition when doing that if the
> > > > > clusters are not after one another, so there is no reason to `abort`
> > > > > here, execution continues and the new clusters are written to disk
> > > > > correctly.
> > > > > 
> > > > > Signed-off-by: Amjad Alsharafi 
> > > > 
> > > > Can you help me understand how first_mapping_index really works?
> > > > 
> > > > It seems to me that you get a chain of mappings for each file on the FAT
> > > > filesystem, which are just the contiguous areas in it, and
> > > > first_mapping_index refers to the mapping at the start of the file. But
> > > > for much of the time, it actually doesn't seem to be set at all, so you
> > > > have mapping->first_mapping_index == -1. Do you understand the rules
> > > > around when it's set and when it isn't?
> > > 
> > > Yeah. So `first_mapping_index` is the index of the first mapping, each
> > > mapping is a group of clusters that are contiguous in the file.
> > > It's mostly `-1` because the first mapping will have the value set as
> > > `-1` and not its own index, this value will only be set when the file
> > > contains more than one mapping, and this will only happen when you add
> > > clusters to a file that are not contiguous with the existing clusters.
> > 
> > Ah, that makes some sense. Not sure if it's optimal, but it's a rule I
> > can work with. So just to confirm, this is the invariant that we think
> > should always hold true, right?
> > 
> > assert((mapping->mode & MODE_DIRECTORY) ||
> >!mapping->info.file.offset ||
> >mapping->first_mapping_index > 0);
> > 
> 
> Yes.
> 
> We can add this into `get_cluster_count_for_direntry` loop.

Maybe even find_mapping_for_cluster() because we think it should apply
always? It's called by get_cluster_count_for_direntry(), but also by
other functions.

Either way, I think this should be a separate patch.
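
For illustration only, the invariant quoted above could be wrapped in a
small predicate along the following lines. This is a sketch under
assumptions: the struct is a trimmed-down, hypothetical stand-in for
vvfat's mapping type, the MODE_DIRECTORY value is arbitrary, and
mapping_invariant_holds() is not an existing function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary flag value chosen for this sketch only. */
#define MODE_DIRECTORY 4

/* Hypothetical, trimmed-down stand-in for vvfat's mapping structure. */
typedef struct mapping_t {
    int mode;
    int first_mapping_index;      /* -1 in the first mapping of a file */
    struct {
        struct {
            uint32_t offset;      /* offset in the file, in clusters */
        } file;
    } info;
} mapping_t;

/*
 * A file mapping that does not start at offset 0 continues an earlier
 * mapping, so it has to point back at the first mapping of its file.
 * Directories are exempt.
 */
bool mapping_invariant_holds(const mapping_t *mapping)
{
    return (mapping->mode & MODE_DIRECTORY) ||
           !mapping->info.file.offset ||
           mapping->first_mapping_index > 0;
}
```

A caller would then write assert(mapping_invariant_holds(mapping)) at
whichever place ends up being chosen for the check.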

> I'm thinking of also converting those `abort` into `assert`, since
> the line `copy_it = 1;` was confusing me, since it was after the `abort`.

I agree for the abort() that you removed, but I'm not sure about the
other one. I have a feeling the copy_it = 1 might actually be correct
there (if the copying logic is implemented correctly; I didn't check
that).

> > > And actually, thanks to that I noticed another bug not fixed in PATCH 3, 
> > > We are doing this check 
> > > `s->current_mapping->first_mapping_index != mapping->first_mapping_index`
> > > to know if we should switch to the new mapping or not. 
> > > If we were reading from the first mapping (`first_mapping_index == -1`)
> > > and we jumped to the second mapping (`first_mapping_index == n`), we
> > > will catch this condition and switch to the new mapping.
> > > 
> > > But if the file has more than 2 mappings, and we jumped to the 3rd
> > > mapping, we will not catch this since (`first_mapping_index == n`) for
> > > both of them haha. I think a better check is to check the `mapping`
> > > pointer directly. (I'll add it also in the next series together with a
> > > test for it.)
> > 
> > This comparison is exactly what confused me. I didn't realise that the
> > first mapping in the chain has a different value here, so I thought this
> > must mean that we're looking at a different file now - but of course I
> > couldn't see a reason for that because we're iterating through a single
> > file in this function.
> > 
> > But even now that I know that the condition triggers when switching from
> > the first to the second mapping, it doesn't make sense to me. We don't
> > have to copy things around just because a file is non-contiguous.
> > 
> > What we want to catch is if the order of mappings has changed compared
> > to the old state. Do we need a linked list, maybe a prev_mapping_index,
> > instead of first_mapping_index so that we can compare if it is still the
> > same as before?
> 
> I think this would be the better design (tbh, that's what I thought 
> `first_mapping_index` would do), though not sure if other components
> depend so much on the current design that it would be hard to change.
> 
> I'll try to implement this `prev_mapping_index` and see how it goes.
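
To make the problem with the first_mapping_index comparison concrete,
here is a minimal sketch. It assumes a hypothetical one-field model of a
mapping, and both helper names are made up for the example. With three
mappings in one file, the index comparison sees the step from the first
to the second mapping but not the step from the second to the third,
while a pointer comparison sees both.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only the field under discussion is kept. */
typedef struct mapping_t {
    int first_mapping_index; /* -1 in the first mapping, else its index */
} mapping_t;

/* Style of the existing check: compare the back-references. */
bool switched_by_index(const mapping_t *current, const mapping_t *next)
{
    return current->first_mapping_index != next->first_mapping_index;
}

/* Alternative suggested in the thread: compare the mappings themselves. */
bool switched_by_pointer(const mapping_t *current, const mapping_t *next)
{
    return current != next;
}
```

Neither helper says anything about whether the order of mappings has
changed, which is the property a prev_mapping_index would be meant to
capture.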

Let's try not to do too much at once. We know that vvfat is a mess,
nobody fully understands it, and the write support

Re: [RFC PATCH v1 1/6] build-sys: Add rust feature option

On Tue, 11 Jun 2024 at 17:05, Stefan Hajnoczi  wrote:
>
> On Mon, Jun 10, 2024 at 09:22:36PM +0300, Manos Pitsidianakis wrote:
> > Add options for Rust in meson_options.txt, meson.build, configure to
> > prepare for adding Rust code in the followup commits.
> >
> > `rust` is a reserved meson name, so we have to use an alternative.
> > `with_rust` was chosen.
> >
> > Signed-off-by: Manos Pitsidianakis 
> > ---
> > The cargo wrapper script hardcodes some rust target triples. This is
> > just temporary.
> > ---
> >  .gitignore   |   2 +
> >  configure|  12 +++
> >  meson.build  |  11 ++
> >  meson_options.txt|   4 +
> >  scripts/cargo_wrapper.py | 211 +++
> >  5 files changed, 240 insertions(+)
> >  create mode 100644 scripts/cargo_wrapper.py
> >
> > diff --git a/.gitignore b/.gitignore
> > index 61fa39967b..f42b0d937e 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -2,6 +2,8 @@
> >  /build/
> >  /.cache/
> >  /.vscode/
> > +/target/
> > +rust/**/target
>
> Are these necessary since the cargo build command-line below uses
> --target-dir ?
>
> Adding new build output directories outside build/ makes it harder to
> clean up the source tree and ensure no state from previous builds
> remains.

Agreed! These build directories would show up when using cargo
directly instead of through the cargo_wrapper.py script, i.e. during
development. I'd consider it an edge case: it won't happen much, and if
it does, it's better to gitignore them than to check them in
accidentally. Also, whatever artifacts are in a `target` directory won't
be used for compilation with qemu inside a build directory.


> >  *.pyc
> >  .sdk
> >  .stgit-*
> > diff --git a/configure b/configure
> > index 38ee257701..c195630771 100755
> > --- a/configure
> > +++ b/configure
> > @@ -302,6 +302,9 @@ else
> >objcc="${objcc-${cross_prefix}clang}"
> >  fi
> >
> > +with_rust="auto"
> > +with_rust_target_triple=""
> > +
> >  ar="${AR-${cross_prefix}ar}"
> >  as="${AS-${cross_prefix}as}"
> >  ccas="${CCAS-$cc}"
> > @@ -760,6 +763,12 @@ for opt do
> >;;
> >--gdb=*) gdb_bin="$optarg"
> >;;
> > +  --enable-rust) with_rust=enabled
> > +  ;;
> > +  --disable-rust) with_rust=disabled
> > +  ;;
> > +  --rust-target-triple=*) with_rust_target_triple="$optarg"
> > +  ;;
> ># everything else has the same name in configure and meson
> >--*) meson_option_parse "$opt" "$optarg"
> >;;
> > @@ -1796,6 +1805,9 @@ if test "$skip_meson" = no; then
> >test -n "${LIB_FUZZING_ENGINE+xxx}" && meson_option_add 
> > "-Dfuzzing_engine=$LIB_FUZZING_ENGINE"
> >test "$plugins" = yes && meson_option_add "-Dplugins=true"
> >test "$tcg" != enabled && meson_option_add "-Dtcg=$tcg"
> > +  test "$with_rust" != enabled && meson_option_add "-Dwith_rust=$with_rust"
> > +  test "$with_rust" != enabled && meson_option_add "-Dwith_rust=$with_rust"
>
> Duplicate line.

Thanks!

>
> > +  test "$with_rust_target_triple" != "" && meson_option_add 
> > "-Dwith_rust_target_triple=$with_rust_target_triple"
> >run_meson() {
> >  NINJA=$ninja $meson setup "$@" "$PWD" "$source_path"
> >}
> > diff --git a/meson.build b/meson.build
> > index a9de71d450..3533889852 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -290,6 +290,12 @@ foreach lang : all_languages
> >endif
> >  endforeach
> >
> > +cargo = not_found
> > +if get_option('with_rust').allowed()
> > +  cargo = find_program('cargo', required: get_option('with_rust'))
> > +endif
> > +with_rust = cargo.found()
> > +
> >  # default flags for all hosts
> >  # We use -fwrapv to tell the compiler that we require a C dialect where
> >  # left shift of signed integers is well defined and has the expected
> > @@ -2066,6 +2072,7 @@ endif
> >
> >  config_host_data = configuration_data()
> >
> > +config_host_data.set('CONFIG_WITH_RUST', with_rust)
> >  audio_drivers_selected = []
> >  if have_system
> >audio_drivers_available = {
> > @@ -4190,6 +4197,10 @@ if 'objc' in all_languages
> >  else
> >summary_info += {'Objective-C compiler': false}
> >  endif
> > +summary_info += {'Rust support':  with_rust}
> > +if with_rust and get_option('with_rust_target_triple') != ''
> > +  summary_info += {'Rust target': 
> > get_option('with_rust_target_triple')}
> > +endif
> >  option_cflags = (get_option('debug') ? ['-g'] : [])
> >  if get_option('optimization') != 'plain'
> >option_cflags += ['-O' + get_option('optimization')]
> > diff --git a/meson_options.txt b/meson_options.txt
> > index 4c1583eb40..223491b731 100644
> > --- a/meson_options.txt
> > +++ b/meson_options.txt
> > @@ -366,3 +366,7 @@ option('qemu_ga_version', type: 'string', value: '',
> >
> >  option('hexagon_idef_parser', type : 'boolean', value : true,
> > description: 'use idef-parser to automatically generate TCG code 
> > for the Hexagon frontend')
> > +option('with_rust', type: 'feature', value: 'auto',
> > +   description:

Re: [PATCH v4 5/5] iotests: add backup-discard-source

Am 13.03.2024 um 16:28 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Add test for a new backup option: discard-source.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Fiona Ebner 
> Tested-by: Fiona Ebner 

This test fails for me, and it already does so after this commit that
introduced it. I haven't checked what get_actual_size() does, but I'm running
on XFS, so its preallocation could be causing this. We generally avoid
checking the number of allocated blocks in image files for this reason.
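
As a rough illustration of why such checks are fragile, the sketch
below reads both the logical length and the allocated size of a file
with stat(). It assumes that the "actual size" in the test is derived
from the allocated block count, which I have not verified, and
file_sizes() is a made-up helper name. The allocated value is whatever
the filesystem decides, preallocation included.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Report the logical length and the allocated size of a file.
 * Returns 0 on success, -1 if stat() fails.
 */
int file_sizes(const char *path, int64_t *logical, int64_t *allocated)
{
    struct stat st;

    if (stat(path, &st) < 0) {
        return -1;
    }
    *logical = st.st_size;
    /* On Linux, st_blocks counts 512-byte units. */
    *allocated = (int64_t)st.st_blocks * 512;
    return 0;
}
```

Two files with the same logical length can report different allocated
sizes on different filesystems, so a fixed threshold on the latter is
not portable.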

Kevin


backup-discard-source   fail   [19:45:49] [19:45:50]   0.8s 
failed, exit status 1
--- /home/kwolf/source/qemu/tests/qemu-iotests/tests/backup-discard-source.out
+++ 
/home/kwolf/source/qemu/build-clang/scratch/qcow2-file-backup-discard-source/backup-discard-source.out.bad
@@ -1,5 +1,14 @@
-..
+F.
+==
+FAIL: test_discard_cbw (__main__.TestBackup.test_discard_cbw)
+1. do backup(discard_source=True), which should inform
+--
+Traceback (most recent call last):
+  File 
"/home/kwolf/source/qemu/tests/qemu-iotests/tests/backup-discard-source", line 
147, in test_discard_cbw
+self.assertLess(get_actual_size(self.vm, 'temp'), 512 * 1024)
+AssertionError: 1249280 not less than 524288
+
 --
 Ran 2 tests

-OK
+FAILED (failures=1)
Failures: backup-discard-source
Failed 1 of 1 iotests

Re: [PATCH v1] virtio-iommu: add error check before assert

On Tue, 11 Jun 2024 at 18:01, Philippe Mathieu-Daudé  wrote:
>
> On 11/6/24 14:23, Manos Pitsidianakis wrote:
> > A fuzzer case discovered by Zheyu Ma causes an assert failure.
> >
> > Add a check before the assert, and respond with an error before moving
> > on to the next queue element.
> >
> > To reproduce the failure:
> >
> > cat << EOF | \
> > qemu-system-x86_64 \
> > -display none -machine accel=qtest -m 512M -machine q35 -nodefaults \
> > -device virtio-iommu -qtest stdio
> > outl 0xcf8 0x8804
> > outw 0xcfc 0x06
> > outl 0xcf8 0x8820
> > outl 0xcfc 0xe0004000
> > write 0x1e 0x1 0x01
> > write 0xe0004020 0x4 0x1000
> > write 0xe0004028 0x4 0x00101000
> > write 0xe000401c 0x1 0x01
> > write 0x106000 0x1 0x05
> > write 0x11 0x1 0x60
> > write 0x12 0x1 0x10
> > write 0x19 0x1 0x04
> > write 0x1c 0x1 0x01
> > write 0x100018 0x1 0x04
> > write 0x10001c 0x1 0x02
> > write 0x101003 0x1 0x01
> > write 0xe0007001 0x1 0x00
> > EOF
> >
> > Reported-by: Zheyu Ma 
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2359
> > Signed-off-by: Manos Pitsidianakis 
> > ---
> >   hw/virtio/virtio-iommu.c | 12 
> >   1 file changed, 12 insertions(+)
> >
> > diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> > index 1326c6ec41..9b99def39f 100644
> > --- a/hw/virtio/virtio-iommu.c
> > +++ b/hw/virtio/virtio-iommu.c
> > @@ -818,6 +818,18 @@ static void virtio_iommu_handle_command(VirtIODevice 
> > *vdev, VirtQueue *vq)
> >   out:
> >   sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> > buf ? buf : , output_size);
> > +if (unlikely(sz != output_size)) {
>
> Is this a normal guest behavior? Should we log it as GUEST_ERROR?

It's not; it would be a misuse of the virtio spec (implementation) by
the guest. The internal device error (VIRTIO_IOMMU_S_DEVERR) would be
logged by the kernel; should we log it as well?
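
For readers following along, the shape of the check is roughly the
following. This is a self-contained sketch, not the QEMU code:
copy_to_iov() is a simplified stand-in for a scatter-gather copy helper
and send_reply() is a made-up name. The point is only that a short copy
is detected and reported instead of tripping an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy up to len bytes from buf into the scatter-gather list iov[0..cnt).
 * Returns the number of bytes actually copied, which is smaller than len
 * when the destination buffers are too small.
 */
size_t copy_to_iov(const struct iovec *iov, size_t cnt,
                   const void *buf, size_t len)
{
    size_t done = 0;

    for (size_t i = 0; i < cnt && done < len; i++) {
        size_t n = len - done;

        if (n > iov[i].iov_len) {
            n = iov[i].iov_len;
        }
        if (n == 0) {
            continue; /* skip empty segments */
        }
        memcpy(iov[i].iov_base, (const char *)buf + done, n);
        done += n;
    }
    return done;
}

/* Returns 0 if the whole reply fit, -1 if the buffers were too small. */
int send_reply(const struct iovec *iov, size_t cnt,
               const void *reply, size_t reply_len)
{
    size_t sz = copy_to_iov(iov, cnt, reply, reply_len);

    if (sz != reply_len) {
        return -1; /* report an error to the caller instead of asserting */
    }
    return 0;
}
```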

[PULL 3/8] aio: warn about iohandler_ctx special casing

From: Stefan Hajnoczi 

The main loop has two AioContexts: qemu_aio_context and iohandler_ctx.
The main loop runs them both, but nested aio_poll() calls on
qemu_aio_context exclude iohandler_ctx.

Which one should qemu_get_current_aio_context() return when called from
the main loop? Document that it's always qemu_aio_context.

This has subtle effects on functions that use
qemu_get_current_aio_context(). For example, aio_co_reschedule_self()
does not work when moving from iohandler_ctx to qemu_aio_context because
qemu_get_current_aio_context() does not differentiate these two
AioContexts.

Document this in order to reduce the chance of future bugs.
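
The effect can be shown with a toy model. Everything below is
hypothetical (the Toy* types and toy_* functions are not QEMU APIs); it
only mirrors the two documented rules: the main loop always reports
qemu_aio_context as current, and rescheduling does nothing when the
coroutine is believed to be in the target context already.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the two main-loop AioContexts. */
typedef struct ToyContext {
    const char *name;
} ToyContext;

ToyContext toy_qemu_aio_context = { "qemu_aio_context" };
ToyContext toy_iohandler_ctx = { "iohandler_ctx" };

/* Where the toy coroutine is really running. */
ToyContext *toy_running_in = &toy_iohandler_ctx;

/* Models the documented behaviour: the main loop reports this one only. */
ToyContext *toy_get_current_aio_context(void)
{
    return &toy_qemu_aio_context;
}

/*
 * Models "if already running in new_ctx, do nothing".
 * Returns true if the coroutine was actually moved.
 */
bool toy_reschedule_self(ToyContext *new_ctx)
{
    if (toy_get_current_aio_context() == new_ctx) {
        return false; /* believed to be there already */
    }
    toy_running_in = new_ctx;
    return true;
}
```

Starting in toy_iohandler_ctx, a request to move to
toy_qemu_aio_context is silently skipped, which is the case the new
comment warns about.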

Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240506190622.56095-3-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 include/block/aio.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index 8378553eb9..4ee81936ed 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -629,6 +629,9 @@ void aio_co_schedule(AioContext *ctx, Coroutine *co);
  *
  * Move the currently running coroutine to new_ctx. If the coroutine is already
  * running in new_ctx, do nothing.
+ *
+ * Note that this function cannot reschedule from iohandler_ctx to
+ * qemu_aio_context.
  */
 void coroutine_fn aio_co_reschedule_self(AioContext *new_ctx);
 
@@ -661,6 +664,9 @@ void aio_co_enter(AioContext *ctx, Coroutine *co);
  * If called from an IOThread this will be the IOThread's AioContext.  If
  * called from the main thread or with the "big QEMU lock" taken it
  * will be the main loop AioContext.
+ *
+ * Note that the return value is never the main loop's iohandler_ctx and the
+ * return value is the main loop AioContext instead.
  */
 AioContext *qemu_get_current_aio_context(void);
 
-- 
2.45.2

[PULL 6/8] linux-aio: add IO_CMD_FDSYNC command support

From: Prasad Pandit 

Libaio defines IO_CMD_FDSYNC command to sync all outstanding
asynchronous I/O operations, by flushing out file data to the
disk storage. Enable linux-aio to submit such aio request.

When using aio=native without fdsync() support, QEMU creates
pthreads, and destroying these pthreads results in TLB flushes.
In a real-time guest environment, TLB flushes cause a latency
spike. This patch helps to avoid such spikes.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Prasad Pandit 
Message-ID: <20240425070412.37248-1-ppan...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 include/block/raw-aio.h |  1 +
 block/file-posix.c  |  9 +
 block/linux-aio.c   | 21 -
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 20e000b8ef..626706827f 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -60,6 +60,7 @@ void laio_cleanup(LinuxAioState *s);
 int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
 int type, uint64_t dev_max_batch);
 
+bool laio_has_fdsync(int);
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
 #endif
diff --git a/block/file-posix.c b/block/file-posix.c
index 5c46938936..be25e35ff6 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -159,6 +159,7 @@ typedef struct BDRVRawState {
 bool has_discard:1;
 bool has_write_zeroes:1;
 bool use_linux_aio:1;
+bool has_laio_fdsync:1;
 bool use_linux_io_uring:1;
 int page_cache_inconsistent; /* errno from fdatasync failure */
 bool has_fallocate;
@@ -718,6 +719,9 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 ret = -EINVAL;
 goto fail;
 }
+if (s->use_linux_aio) {
+s->has_laio_fdsync = laio_has_fdsync(s->fd);
+}
 #else
 if (s->use_linux_aio) {
 error_setg(errp, "aio=native was specified, but is not supported "
@@ -2598,6 +2602,11 @@ static int coroutine_fn 
raw_co_flush_to_disk(BlockDriverState *bs)
 if (raw_check_linux_io_uring(s)) {
 return luring_co_submit(bs, s->fd, 0, NULL, QEMU_AIO_FLUSH);
 }
+#endif
+#ifdef CONFIG_LINUX_AIO
+if (s->has_laio_fdsync && raw_check_linux_aio(s)) {
+return laio_co_submit(s->fd, 0, NULL, QEMU_AIO_FLUSH, 0);
+}
 #endif
 return raw_thread_pool_submit(handle_aiocb_flush, );
 }
diff --git a/block/linux-aio.c b/block/linux-aio.c
index ec05d946f3..e3b5ec9aba 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -384,6 +384,9 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 case QEMU_AIO_READ:
 io_prep_preadv(iocbs, fd, qiov->iov, qiov->niov, offset);
 break;
+case QEMU_AIO_FLUSH:
+io_prep_fdsync(iocbs, fd);
+break;
 /* Currently Linux kernel does not support other operations */
 default:
 fprintf(stderr, "%s: invalid AIO request type 0x%x.\n",
@@ -412,7 +415,7 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, 
QEMUIOVector *qiov,
 AioContext *ctx = qemu_get_current_aio_context();
 struct qemu_laiocb laiocb = {
 .co = qemu_coroutine_self(),
-.nbytes = qiov->size,
+.nbytes = qiov ? qiov->size : 0,
 .ctx= aio_get_linux_aio(ctx),
 .ret= -EINPROGRESS,
 .is_read= (type == QEMU_AIO_READ),
@@ -486,3 +489,19 @@ void laio_cleanup(LinuxAioState *s)
 }
 g_free(s);
 }
+
+bool laio_has_fdsync(int fd)
+{
+struct iocb cb;
+struct iocb *cbs[] = {&cb, NULL};
+
+io_context_t ctx = 0;
+io_setup(1, &ctx);
+
+/* check if host kernel supports IO_CMD_FDSYNC */
+io_prep_fdsync(&cb, fd);
+int ret = io_submit(ctx, 1, cbs);
+
+io_destroy(ctx);
+return (ret == -EINVAL) ? false : true;
+}
-- 
2.45.2

[PULL 0/8] Block layer patches

The following changes since commit 80e8f0602168f451a93e71cbb1d59e93d745e62e:

  Merge tag 'bsd-user-misc-2024q2-pull-request' of gitlab.com:bsdimp/qemu into 
staging (2024-06-09 11:21:55 -0700)

are available in the Git repository at:

  https://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 3ab0f063e58ed9224237d69c4211ca83335164c4:

  crypto/block: drop qcrypto_block_open() n_threads argument (2024-06-10 
11:05:43 +0200)


Block layer patches

- crypto: Fix crash when used with multiqueue devices
- linux-aio: add IO_CMD_FDSYNC command support
- copy-before-write: Avoid integer overflows for timeout > 4s
- Fix crash with QMP block_resize and iothreads
- qemu-io: add cvtnum() error handling for zone commands
- Code cleanup


Denis V. Lunev via (1):
  block: drop force_dup parameter of raw_reconfigure_getfd()

Fiona Ebner (1):
  block/copy-before-write: use uint64_t for timeout in nanoseconds

Prasad J Pandit (1):
  linux-aio: add IO_CMD_FDSYNC command support

Stefan Hajnoczi (5):
  Revert "monitor: use aio_co_reschedule_self()"
  aio: warn about iohandler_ctx special casing
  qemu-io: add cvtnum() error handling for zone commands
  block/crypto: create ciphers on demand
  crypto/block: drop qcrypto_block_open() n_threads argument

 crypto/blockpriv.h |  13 +++--
 include/block/aio.h|   6 +++
 include/block/raw-aio.h|   1 +
 include/crypto/block.h |   2 -
 block/copy-before-write.c  |   2 +-
 block/crypto.c |   1 -
 block/file-posix.c |  17 --
 block/linux-aio.c  |  21 +++-
 block/qcow.c   |   2 +-
 block/qcow2.c  |   5 +-
 crypto/block-luks.c|   4 +-
 crypto/block-qcow.c|   8 ++-
 crypto/block.c | 114 -
 qapi/qmp-dispatch.c|   7 ++-
 qemu-io-cmds.c |  48 -
 tests/unit/test-crypto-block.c |   4 --
 16 files changed, 176 insertions(+), 79 deletions(-)

[PULL 4/8] qemu-io: add cvtnum() error handling for zone commands

From: Stefan Hajnoczi 

cvtnum() parses positive int64_t values and returns a negative errno on
failure. Print errors and return early when cvtnum() fails.

While we're at it, also reject nr_zones values greater than or equal to 2^32
since they cannot be represented.
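
The pattern is: parse into a wide signed type, check for a negative
errno, check the range, and only then narrow. A standalone sketch of
that pattern follows. parse_num() is a hypothetical stand-in for
cvtnum() built on strtoll() (the real function accepts more input
forms), and parse_nr_zones() is a made-up name.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for cvtnum(): parse a non-negative decimal
 * number and return a negative errno value on failure.
 */
int64_t parse_num(const char *s)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 10);
    if (errno == ERANGE) {
        return -ERANGE;
    }
    if (end == s || *end != '\0' || v < 0) {
        return -EINVAL;
    }
    return v;
}

/* Parse a zone count; returns 0 on success or a negative errno value. */
int parse_nr_zones(const char *s, unsigned int *nr_zones)
{
    int64_t val = parse_num(s);

    if (val < 0) {
        return (int)val;
    }
    if (val > UINT_MAX) {
        return -ERANGE; /* would not fit in unsigned int */
    }
    *nr_zones = (unsigned int)val;
    return 0;
}
```

Assigning the parse result straight to an unsigned int, as the old code
did, would turn both error codes and oversized values into
plausible-looking counts.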

Reported-by: Peter Maydell 
Cc: Sam Li 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240507180558.377233-1-stefa...@redhat.com>
Reviewed-by: Sam Li 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 qemu-io-cmds.c | 48 +++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index f5d7202a13..e2fab57183 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1739,12 +1739,26 @@ static int zone_report_f(BlockBackend *blk, int argc, 
char **argv)
 {
 int ret;
 int64_t offset;
+int64_t val;
 unsigned int nr_zones;
 
 ++optind;
 offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
 ++optind;
-nr_zones = cvtnum(argv[optind]);
+val = cvtnum(argv[optind]);
+if (val < 0) {
+print_cvtnum_err(val, argv[optind]);
+return val;
+}
+if (val > UINT_MAX) {
+printf("Number of zones must be less than 2^32\n");
+return -ERANGE;
+}
+nr_zones = val;
 
 g_autofree BlockZoneDescriptor *zones = NULL;
 zones = g_new(BlockZoneDescriptor, nr_zones);
@@ -1780,8 +1794,16 @@ static int zone_open_f(BlockBackend *blk, int argc, char 
**argv)
 int64_t offset, len;
 ++optind;
 offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
 ++optind;
 len = cvtnum(argv[optind]);
+if (len < 0) {
+print_cvtnum_err(len, argv[optind]);
+return len;
+}
 ret = blk_zone_mgmt(blk, BLK_ZO_OPEN, offset, len);
 if (ret < 0) {
 printf("zone open failed: %s\n", strerror(-ret));
@@ -1805,8 +1827,16 @@ static int zone_close_f(BlockBackend *blk, int argc, 
char **argv)
 int64_t offset, len;
 ++optind;
 offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
 ++optind;
 len = cvtnum(argv[optind]);
+if (len < 0) {
+print_cvtnum_err(len, argv[optind]);
+return len;
+}
 ret = blk_zone_mgmt(blk, BLK_ZO_CLOSE, offset, len);
 if (ret < 0) {
 printf("zone close failed: %s\n", strerror(-ret));
@@ -1830,8 +1860,16 @@ static int zone_finish_f(BlockBackend *blk, int argc, 
char **argv)
 int64_t offset, len;
 ++optind;
 offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
 ++optind;
 len = cvtnum(argv[optind]);
+if (len < 0) {
+print_cvtnum_err(len, argv[optind]);
+return len;
+}
 ret = blk_zone_mgmt(blk, BLK_ZO_FINISH, offset, len);
 if (ret < 0) {
 printf("zone finish failed: %s\n", strerror(-ret));
@@ -1855,8 +1893,16 @@ static int zone_reset_f(BlockBackend *blk, int argc, 
char **argv)
 int64_t offset, len;
 ++optind;
 offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
 ++optind;
 len = cvtnum(argv[optind]);
+if (len < 0) {
+print_cvtnum_err(len, argv[optind]);
+return len;
+}
 ret = blk_zone_mgmt(blk, BLK_ZO_RESET, offset, len);
 if (ret < 0) {
 printf("zone reset failed: %s\n", strerror(-ret));
-- 
2.45.2

[PULL 5/8] block/copy-before-write: use uint64_t for timeout in nanoseconds

From: Fiona Ebner 

rather than the uint32_t for which the maximum is slightly more than 4
seconds and larger values would overflow. The QAPI interface allows
specifying the number of seconds, so only values 0 to 4 are safe right
now, other values lead to a much lower timeout than a user expects.

The block_copy() call where this is used already takes a uint64_t for
the timeout, so no change required there.
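
The arithmetic behind this, as a small sketch. It assumes the
nanosecond value is obtained by multiplying the number of seconds by
10^9; the function names are made up and the constant is defined
locally for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000ULL

/* Old storage width: the nanosecond value is truncated to 32 bits. */
uint32_t timeout_ns_u32(uint32_t seconds)
{
    return (uint32_t)(seconds * NANOSECONDS_PER_SECOND);
}

/* New storage width: 64 bits hold the product for any 32-bit seconds. */
uint64_t timeout_ns_u64(uint32_t seconds)
{
    return seconds * NANOSECONDS_PER_SECOND;
}
```

With 32 bits, 5 seconds becomes 5000000000 mod 2^32 = 705032704 ns,
i.e. about 0.7 seconds, while 4 seconds (4000000000 ns) still fits.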

Fixes: 6db7fd1ca9 ("block/copy-before-write: implement cbw-timeout option")
Reported-by: Friedrich Weber 
Signed-off-by: Fiona Ebner 
Message-ID: <20240429141934.442154-1-f.eb...@proxmox.com>
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/copy-before-write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index cd65524e26..853e01a1eb 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -43,7 +43,7 @@ typedef struct BDRVCopyBeforeWriteState {
 BlockCopyState *bcs;
 BdrvChild *target;
 OnCbwError on_cbw_error;
-uint32_t cbw_timeout_ns;
+uint64_t cbw_timeout_ns;
 bool discard_source;
 
 /*
-- 
2.45.2

[PULL 8/8] crypto/block: drop qcrypto_block_open() n_threads argument

From: Stefan Hajnoczi 

The n_threads argument is no longer used since the previous commit.
Remove it.

Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240527155851.892885-3-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Acked-by: Daniel P. Berrangé 
Signed-off-by: Kevin Wolf 
---
 crypto/blockpriv.h | 1 -
 include/crypto/block.h | 2 --
 block/crypto.c | 1 -
 block/qcow.c   | 2 +-
 block/qcow2.c  | 5 ++---
 crypto/block-luks.c| 1 -
 crypto/block-qcow.c| 6 ++
 crypto/block.c | 3 +--
 tests/unit/test-crypto-block.c | 4 
 9 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 4bf6043d5d..b8f77cb5eb 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -59,7 +59,6 @@ struct QCryptoBlockDriver {
 QCryptoBlockReadFunc readfunc,
 void *opaque,
 unsigned int flags,
-size_t n_threads,
 Error **errp);
 
 int (*create)(QCryptoBlock *block,
diff --git a/include/crypto/block.h b/include/crypto/block.h
index 92e823c9f2..5b5d039800 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -76,7 +76,6 @@ typedef enum {
  * @readfunc: callback for reading data from the volume
  * @opaque: data to pass to @readfunc
  * @flags: bitmask of QCryptoBlockOpenFlags values
- * @n_threads: allow concurrent I/O from up to @n_threads threads
  * @errp: pointer to a NULL-initialized error object
  *
  * Create a new block encryption object for an existing
@@ -113,7 +112,6 @@ QCryptoBlock *qcrypto_block_open(QCryptoBlockOpenOptions 
*options,
  QCryptoBlockReadFunc readfunc,
  void *opaque,
  unsigned int flags,
- size_t n_threads,
  Error **errp);
 
 typedef enum {
diff --git a/block/crypto.c b/block/crypto.c
index 21eed909c1..4eed3ffa6a 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -363,7 +363,6 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
format,
block_crypto_read_func,
bs,
cflags,
-   1,
errp);
 
 if (!crypto->block) {
diff --git a/block/qcow.c b/block/qcow.c
index ca8e1d5ec8..c2f89db055 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -211,7 +211,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, 
int flags,
 cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
 }
 s->crypto = qcrypto_block_open(crypto_opts, "encrypt.",
-   NULL, NULL, cflags, 1, errp);
+   NULL, NULL, cflags, errp);
 if (!s->crypto) {
 ret = -EINVAL;
 goto fail;
diff --git a/block/qcow2.c b/block/qcow2.c
index 956128b409..10883a2494 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -321,7 +321,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
start_offset,
 }
 s->crypto = qcrypto_block_open(s->crypto_opts, "encrypt.",
qcow2_crypto_hdr_read_func,
-   bs, cflags, QCOW2_MAX_THREADS, 
errp);
+   bs, cflags, errp);
 if (!s->crypto) {
 return -EINVAL;
 }
@@ -1701,8 +1701,7 @@ qcow2_do_open(BlockDriverState *bs, QDict *options, int 
flags,
 cflags |= QCRYPTO_BLOCK_OPEN_NO_IO;
 }
 s->crypto = qcrypto_block_open(s->crypto_opts, "encrypt.",
-   NULL, NULL, cflags,
-   QCOW2_MAX_THREADS, errp);
+   NULL, NULL, cflags, errp);
 if (!s->crypto) {
 ret = -EINVAL;
 goto fail;
diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 3357852c0a..5b777c15d3 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -1189,7 +1189,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
 QCryptoBlockReadFunc readfunc,
 void *opaque,
 unsigned int flags,
-size_t n_threads,
 Error **errp)
 {
 QCryptoBlockLUKS *luks = NULL;
diff --git a/crypto/block-qcow.c b/crypto/block-qcow.c
index 02305058e3..42e9556e42 100644
--- a/crypto/block-qcow.c
+++ b/crypto/block-qcow.c
@@ -44,7 +44,6 @@ qcrypto_block_qcow_has_format(const uint8_t *buf 
G_GNUC_UNUSED,
 static int
 qcrypto_block_qcow_init(QCryptoBlock *block,
 const char *keysecret,
-

[PULL 2/8] Revert "monitor: use aio_co_reschedule_self()"

From: Stefan Hajnoczi 

Commit 1f25c172f837 ("monitor: use aio_co_reschedule_self()") was a code
cleanup that uses aio_co_reschedule_self() instead of open coding
coroutine rescheduling.

Bug RHEL-34618 was reported and Kevin Wolf identified
the root cause. I missed that aio_co_reschedule_self() ->
qemu_get_current_aio_context() only knows about
qemu_aio_context/IOThread AioContexts and not about iohandler_ctx. It
does not function correctly when going back from the iohandler_ctx to
qemu_aio_context.

Go back to open coding the AioContext transitions to avoid this bug.

This reverts commit 1f25c172f83704e350c0829438d832384084a74d.

Cc: qemu-sta...@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-34618
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240506190622.56095-2-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 qapi/qmp-dispatch.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index f3488afeef..176b549473 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -212,7 +212,8 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList 
*cmds, QObject *requ
  * executing the command handler so that it can make progress if it
  * involves an AIO_WAIT_WHILE().
  */
-aio_co_reschedule_self(qemu_get_aio_context());
+aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
+qemu_coroutine_yield();
 }
 
 monitor_set_cur(qemu_coroutine_self(), cur_mon);
@@ -226,7 +227,9 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList 
*cmds, QObject *requ
  * Move back to iohandler_ctx so that nested event loops for
  * qemu_aio_context don't start new monitor commands.
  */
-aio_co_reschedule_self(iohandler_get_aio_context());
+aio_co_schedule(iohandler_get_aio_context(),
+qemu_coroutine_self());
+qemu_coroutine_yield();
 }
 } else {
/*
-- 
2.45.2

[PULL 1/8] block: drop force_dup parameter of raw_reconfigure_getfd()

From: "Denis V. Lunev via" 

Since commit 72373e40fbc, this parameter is always passed as 'false'
from the caller.

Signed-off-by: Denis V. Lunev 
CC: Andrey Zhadchenko 
CC: Kevin Wolf 
CC: Hanna Reitz 
Message-ID: <20240430170213.148558-1-...@openvz.org>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 35684f7e21..5c46938936 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1039,8 +1039,7 @@ static int fcntl_setfl(int fd, int flag)
 }
 
 static int raw_reconfigure_getfd(BlockDriverState *bs, int flags,
- int *open_flags, uint64_t perm, bool 
force_dup,
- Error **errp)
+ int *open_flags, uint64_t perm, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 int fd = -1;
@@ -1068,7 +1067,7 @@ static int raw_reconfigure_getfd(BlockDriverState *bs, 
int flags,
 assert((s->open_flags & O_ASYNC) == 0);
 #endif
 
-if (!force_dup && *open_flags == s->open_flags) {
+if (*open_flags == s->open_flags) {
 /* We're lucky, the existing fd is fine */
 return s->fd;
 }
@@ -3748,8 +3747,7 @@ static int raw_check_perm(BlockDriverState *bs, uint64_t 
perm, uint64_t shared,
 int ret;
 
 /* We may need a new fd if auto-read-only switches the mode */
-ret = raw_reconfigure_getfd(bs, input_flags, &s->perm_change_flags, perm,
-false, errp);
+ret = raw_reconfigure_getfd(bs, input_flags, &s->perm_change_flags, perm, errp);
 if (ret < 0) {
 return ret;
 } else if (ret != s->fd) {
-- 
2.45.2

[PULL 7/8] block/crypto: create ciphers on demand

From: Stefan Hajnoczi 

Ciphers are pre-allocated by qcrypto_block_init_cipher() depending on
the given number of threads. The -device
virtio-blk-pci,iothread-vq-mapping= feature allows users to assign
multiple IOThreads to a virtio-blk device, but the association between
the virtio-blk device and the block driver happens after the block
driver is already open.

When the number of threads given to qcrypto_block_init_cipher() is
smaller than the actual number of threads at runtime, the
block->n_free_ciphers > 0 assertion in qcrypto_block_pop_cipher() can
fail.

Get rid of qcrypto_block_init_cipher() n_thread's argument and allocate
ciphers on demand.
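
For readers who want a picture of the on-demand scheme described above, here is
a minimal sketch. It is not the code from this patch: the names Cipher,
CipherPool, pool_pop and pool_push are hypothetical, the cipher is a dummy
struct, and the locking that a real implementation needs around the free list
is left out. The idea shown is only that a pop creates a new object when the
free list is empty, so no thread count has to be known up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cipher object; not the QEMU type. */
typedef struct Cipher {
    int id;
} Cipher;

/* Toy pool: caches idle ciphers and creates new ones on demand. */
typedef struct CipherPool {
    Cipher **free_ciphers;  /* stack of idle ciphers */
    size_t max_free;        /* capacity of free_ciphers */
    size_t n_free;          /* number of idle ciphers currently cached */
    int n_created;          /* total number of ciphers ever allocated */
} CipherPool;

static void pool_init(CipherPool *p)
{
    p->free_ciphers = NULL;
    p->max_free = 0;
    p->n_free = 0;
    p->n_created = 0;
}

/* Reuse an idle cipher if there is one, otherwise allocate a fresh one. */
static Cipher *pool_pop(CipherPool *p)
{
    Cipher *c;

    if (p->n_free > 0) {
        return p->free_ciphers[--p->n_free];
    }
    c = malloc(sizeof(*c));
    if (c != NULL) {
        c->id = p->n_created++;
    }
    return c;
}

/* Return a cipher to the pool, growing the free list when it is full. */
static void pool_push(CipherPool *p, Cipher *c)
{
    if (p->n_free == p->max_free) {
        size_t new_max = p->max_free ? p->max_free * 2 : 4;
        Cipher **grown = realloc(p->free_ciphers, new_max * sizeof(*grown));

        if (grown == NULL) {
            free(c);        /* cannot cache it, so drop it */
            return;
        }
        p->free_ciphers = grown;
        p->max_free = new_max;
    }
    p->free_ciphers[p->n_free++] = c;
    assert(p->n_free <= p->max_free);
}

/* Free every cached cipher and the free list itself. */
static void pool_destroy(CipherPool *p)
{
    while (p->n_free > 0) {
        free(p->free_ciphers[--p->n_free]);
    }
    free(p->free_ciphers);
    p->free_ciphers = NULL;
    p->max_free = 0;
}
```

With this shape, a pop can never hit an "n_free > 0" assertion: an empty free
list simply means one more allocation.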

Reported-by: Qing Wang 
Buglink: https://issues.redhat.com/browse/RHEL-36159
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240527155851.892885-2-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Acked-by: Daniel P. Berrangé 
Signed-off-by: Kevin Wolf 
---
 crypto/blockpriv.h  |  12 +++--
 crypto/block-luks.c |   3 +-
 crypto/block-qcow.c |   2 +-
 crypto/block.c  | 111 ++--
 4 files changed, 78 insertions(+), 50 deletions(-)

diff --git a/crypto/blockpriv.h b/crypto/blockpriv.h
index 836f3b4726..4bf6043d5d 100644
--- a/crypto/blockpriv.h
+++ b/crypto/blockpriv.h
@@ -32,8 +32,14 @@ struct QCryptoBlock {
 const QCryptoBlockDriver *driver;
 void *opaque;
 
-QCryptoCipher **ciphers;
-size_t n_ciphers;
+/* Cipher parameters */
+QCryptoCipherAlgorithm alg;
+QCryptoCipherMode mode;
+uint8_t *key;
+size_t nkey;
+
+QCryptoCipher **free_ciphers;
+size_t max_free_ciphers;
 size_t n_free_ciphers;
 QCryptoIVGen *ivgen;
 QemuMutex mutex;
@@ -130,7 +136,7 @@ int qcrypto_block_init_cipher(QCryptoBlock *block,
   QCryptoCipherAlgorithm alg,
   QCryptoCipherMode mode,
   const uint8_t *key, size_t nkey,
-  size_t n_threads, Error **errp);
+  Error **errp);
 
 void qcrypto_block_free_cipher(QCryptoBlock *block);
 
diff --git a/crypto/block-luks.c b/crypto/block-luks.c
index 3ee928fb5a..3357852c0a 100644
--- a/crypto/block-luks.c
+++ b/crypto/block-luks.c
@@ -1262,7 +1262,6 @@ qcrypto_block_luks_open(QCryptoBlock *block,
   luks->cipher_mode,
   masterkey,
   luks->header.master_key_len,
-  n_threads,
   errp) < 0) {
 goto fail;
 }
@@ -1456,7 +1455,7 @@ qcrypto_block_luks_create(QCryptoBlock *block,
 /* Setup the block device payload encryption objects */
 if (qcrypto_block_init_cipher(block, luks_opts.cipher_alg,
   luks_opts.cipher_mode, masterkey,
-  luks->header.master_key_len, 1, errp) < 0) {
+  luks->header.master_key_len, errp) < 0) {
 goto error;
 }
 
diff --git a/crypto/block-qcow.c b/crypto/block-qcow.c
index 4d7cf36a8f..02305058e3 100644
--- a/crypto/block-qcow.c
+++ b/crypto/block-qcow.c
@@ -75,7 +75,7 @@ qcrypto_block_qcow_init(QCryptoBlock *block,
 ret = qcrypto_block_init_cipher(block, QCRYPTO_CIPHER_ALG_AES_128,
 QCRYPTO_CIPHER_MODE_CBC,
 keybuf, G_N_ELEMENTS(keybuf),
-n_threads, errp);
+errp);
 if (ret < 0) {
 ret = -ENOTSUP;
 goto fail;
diff --git a/crypto/block.c b/crypto/block.c
index 506ea1d1a3..ba6d1cebc7 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qemu/lockable.h"
 #include "blockpriv.h"
 #include "block-qcow.h"
 #include "block-luks.h"
@@ -57,6 +58,8 @@ QCryptoBlock *qcrypto_block_open(QCryptoBlockOpenOptions 
*options,
 {
 QCryptoBlock *block = g_new0(QCryptoBlock, 1);
 
+qemu_mutex_init(&block->mutex);
+
 block->format = options->format;
 
 if (options->format >= G_N_ELEMENTS(qcrypto_block_drivers) ||
@@ -76,8 +79,6 @@ QCryptoBlock *qcrypto_block_open(QCryptoBlockOpenOptions 
*options,
 return NULL;
 }
 
-qemu_mutex_init(&block->mutex);
-
 return block;
 }
 
@@ -92,6 +93,8 @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions 
*options,
 {
 QCryptoBlock *block = g_new0(QCryptoBlock, 1);
 
+qemu_mutex_init(&block->mutex);
+
 block->format = options->format;
 
 if (options->format >= G_N_ELEMENTS(qcrypto_block_drivers) ||
@@ -111,8 +114,6 @@ QCryptoBlock 
*qcrypto_block_create(QCryptoBlockCreateOptions *options,
 return NULL;
 }
 
-qemu_mutex_init(&block->mutex);
-
 return block;
 }
 
@@ -227,37 +228,42 @@ QCryptoCipher *qcrypto_block_get_cipher(QCryptoBlock 
*block)
  * This function is used only in test

[PATCH 0/1] i386/tcg fix for IRET as used in dotnet runtime

2024-06-11 Thread Robert R. Henry

This patch fixes the i386/tcg implementation of the IRET instruction
so that IRET can return from user space to user space, as used by the
dotnet runtime to switch threads.

This fixes https://gitlab.com/qemu-project/qemu/-/issues/249

I debugged this issue 4+ years ago, and wrote this patch then.

At the time, I did not fully understand the nuances of the priority
levels in the TCG emulation of the x86, nor of the x86 itself.
I understand less now!

I do not recall exactly how I was led to the conclusion that an
unhandled page fault in kernel space was due to a bug in the code
executed in the tcg emulator for IRET. Eventually, my approach to
debugging was to modify the source for the dotnet runtime so that
immediately prior to the IRET I executed an x87 fpatan2 instruction,
knowing that no modern program used that instruction, and that there
was a single point in QEMU source code that emulated that, making it a
convenient place to put gdb breakpoints to enable further breakpoints in
the IRET emulation code.

With this change the page faults go away, and the dotnet program
completes as expected. For the curious,
https://github.com/dotnet/runtime/blob/main/src/coreclr/pal/src/arch/amd64/context2.S#L241
shows how the dotnet runtime uses iret.

I have booted BSD, Solaris and Mac OS X with this change, and await
results for booting Windows from the Windows kernel team.

I have not tested this with other modern JITers, such as Java,
v8, or HHVM.

Robert R. Henry (1):
i386/tcg: Allow IRET from user mode to user mode for dotnet runtime

target/i386/tcg/seg_helper.c | 78 ++--
1 file changed, 47 insertions(+), 31 deletions(-)

--
2.34.1

[PATCH 1/1] i386/tcg: Allow IRET from user mode to user mode for dotnet runtime

2024-06-11 Thread Robert R. Henry

This fixes a bug wherein i386/tcg assumed an interrupt return using
the IRET instruction was always returning from kernel mode to either
kernel mode or user mode. This assumption is violated when IRET is used
as a clever way to restore thread state, as for example in the dotnet
runtime. There, IRET returns from user mode to user mode.
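
To make the idea concrete, here is a toy model of a push/pop pair that picks
its memory accessor from a privilege level instead of always assuming kernel
mode. It is only a sketch under stated assumptions, not the code from this
patch: ToyStack, toy_pushq, toy_popq and access_for_level are hypothetical
names, guest memory is a small array, and the "accessor" is reduced to an enum
tag that records which kind of access would have been made.

```c
#include <assert.h>
#include <stdint.h>

/* Which kind of guest memory access a stack operation would perform. */
enum access_kind { ACCESS_KERNEL, ACCESS_DATA };

/* Toy guest stack: 16 quadword slots plus a record of the last access kind. */
typedef struct ToyStack {
    uint64_t slots[16];
    enum access_kind last_access;
} ToyStack;

/* Level 0 uses the kernel accessor; any other level uses the data accessor. */
static enum access_kind access_for_level(int level)
{
    return level == 0 ? ACCESS_KERNEL : ACCESS_DATA;
}

/* Push a quadword: decrement the stack pointer first, then store. */
static void toy_pushq(ToyStack *st, uint64_t *sp, uint64_t val, int dpl)
{
    assert(*sp >= 8 && *sp <= sizeof(st->slots) && *sp % 8 == 0);
    *sp -= 8;
    st->last_access = access_for_level(dpl);
    st->slots[*sp / 8] = val;
}

/* Pop a quadword: load first, then increment the stack pointer. */
static uint64_t toy_popq(ToyStack *st, uint64_t *sp, int cpl)
{
    uint64_t val;

    assert(*sp + 8 <= sizeof(st->slots) && *sp % 8 == 0);
    st->last_access = access_for_level(cpl);
    val = st->slots[*sp / 8];
    *sp += 8;
    return val;
}
```

Because the helpers are functions rather than statement macros, the stack
pointer has to be passed by address, which is why the callers hand over a
pointer to their local stack pointer variable.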

This bug manifested itself as a page fault in the guest Linux kernel.

This bug appears to have been in QEMU since the beginning.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/249
Signed-off-by: Robert R. Henry 
---
 target/i386/tcg/seg_helper.c | 78 ++--
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index 715db1f232..815d26e61d 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -843,20 +843,35 @@ static void do_interrupt_protected(CPUX86State *env, int 
intno, int is_int,
 
 #ifdef TARGET_X86_64
 
-#define PUSHQ_RA(sp, val, ra)   \
-{   \
-sp -= 8;\
-cpu_stq_kernel_ra(env, sp, (val), ra);  \
-}
-
-#define POPQ_RA(sp, val, ra)\
-{   \
-val = cpu_ldq_kernel_ra(env, sp, ra);   \
-sp += 8;\
-}
+#define PUSHQ_RA(sp, val, ra, cpl, dpl) \
+  FUNC_PUSHQ_RA(env, &sp, val, ra, cpl, dpl)
+
+static inline void FUNC_PUSHQ_RA(
+CPUX86State *env, target_ulong *sp,
+target_ulong val, target_ulong ra, int cpl, int dpl) {
+  *sp -= 8;
+  if (dpl == 0) {
+cpu_stq_kernel_ra(env, *sp, val, ra);
+  } else {
+cpu_stq_data_ra(env, *sp, val, ra);
+  } 
+}
 
-#define PUSHQ(sp, val) PUSHQ_RA(sp, val, 0)
-#define POPQ(sp, val) POPQ_RA(sp, val, 0)
+#define POPQ_RA(sp, val, ra, cpl, dpl) \
+  val = FUNC_POPQ_RA(env, &sp, ra, cpl, dpl)
+
+static inline target_ulong FUNC_POPQ_RA(
+CPUX86State *env, target_ulong *sp,
+target_ulong ra, int cpl, int dpl) {
+  target_ulong val;
+  if (cpl == 0) {  /* TODO perhaps both arms reduce to cpu_ldq_data_ra? */
+val = cpu_ldq_kernel_ra(env, *sp, ra);
+  } else {
+val = cpu_ldq_data_ra(env, *sp, ra);
+  }
+  *sp += 8;
+  return val;
+}
 
 static inline target_ulong get_rsp_from_tss(CPUX86State *env, int level)
 {
@@ -901,6 +916,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int 
is_int,
 uint32_t e1, e2, e3, ss, eflags;
 target_ulong old_eip, esp, offset;
 bool set_rf;
+const target_ulong retaddr = 0;
 
 has_error_code = 0;
 if (!is_int && !is_hw) {
@@ -989,13 +1005,13 @@ static void do_interrupt64(CPUX86State *env, int intno, 
int is_int,
 eflags |= RF_MASK;
 }
 
-PUSHQ(esp, env->segs[R_SS].selector);
-PUSHQ(esp, env->regs[R_ESP]);
-PUSHQ(esp, eflags);
-PUSHQ(esp, env->segs[R_CS].selector);
-PUSHQ(esp, old_eip);
+PUSHQ_RA(esp, env->segs[R_SS].selector, retaddr, cpl, dpl);
+PUSHQ_RA(esp, env->regs[R_ESP], retaddr, cpl, dpl);
+PUSHQ_RA(esp, eflags,   retaddr, cpl, dpl);
+PUSHQ_RA(esp, env->segs[R_CS].selector, retaddr, cpl, dpl);
+PUSHQ_RA(esp, old_eip,  retaddr, cpl, dpl);
 if (has_error_code) {
-PUSHQ(esp, error_code);
+PUSHQ_RA(esp, error_code, retaddr, cpl, dpl);
 }
 
 /* interrupt gate clear IF mask */
@@ -1621,8 +1637,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, 
target_ulong new_eip,
 
 /* 64 bit case */
 rsp = env->regs[R_ESP];
-PUSHQ_RA(rsp, env->segs[R_CS].selector, GETPC());
-PUSHQ_RA(rsp, next_eip, GETPC());
+PUSHQ_RA(rsp, env->segs[R_CS].selector, GETPC(), cpl, dpl);
+PUSHQ_RA(rsp, next_eip, GETPC(), cpl, dpl);
 /* from this point, not restartable */
 env->regs[R_ESP] = rsp;
 cpu_x86_load_seg_cache(env, R_CS, (new_cs & 0xfffc) | cpl,
@@ -1792,8 +1808,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, 
target_ulong new_eip,
 #ifdef TARGET_X86_64
 if (shift == 2) {
 /* XXX: verify if new stack address is canonical */
-PUSHQ_RA(sp, env->segs[R_SS].selector, GETPC());
-PUSHQ_RA(sp, env->regs[R_ESP], GETPC());
+PUSHQ_RA(sp, env->segs[R_SS].selector, GETPC(), cpl, dpl);
+PUSHQ_RA(sp, env->regs[R_ESP], GETPC(), cpl, dpl);
 /* parameters aren't supported for 64-bit call gates */
 } else
 #endif
@@ -1828,8 +1844,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, 
target_ulong new_eip,
 
 #ifdef TARGET_X86_64
 if (shift == 2) {
-PUSHQ_RA(sp, env->segs[R_CS].selector, GETPC());
-PUSHQ_RA(sp, next_eip, GETPC());
+PUSHQ_RA(sp, env->segs[R_CS].selector, GETPC(), cpl, dpl);
+PUSHQ_RA(sp, next_eip,

Re: [PATCH v4 2/4] vvfat: Fix usage of `info.file.offset`

2024-06-11 Thread Amjad Alsharafi

On Tue, Jun 11, 2024 at 04:30:53PM +0200, Kevin Wolf wrote:
> Am 11.06.2024 um 14:31 hat Amjad Alsharafi geschrieben:
> > On Mon, Jun 10, 2024 at 06:49:43PM +0200, Kevin Wolf wrote:
> > > Am 05.06.2024 um 02:58 hat Amjad Alsharafi geschrieben:
> > > > The field is marked as "the offset in the file (in clusters)", but it
> > > > was being used like this
> > > > `cluster_size*(nums)+mapping->info.file.offset`, which is incorrect.
> > > > 
> > > > Additionally, removed the `abort` when `first_mapping_index` does not
> > > > match, as this matches the case when adding new clusters for files, and
> > > > it's inevitable that we reach this condition when doing that if the
> > > > clusters are not after one another, so there is no reason to `abort`
> > > > here, execution continues and the new clusters are written to disk
> > > > correctly.
> > > > 
> > > > Signed-off-by: Amjad Alsharafi 
> > > 
> > > Can you help me understand how first_mapping_index really works?
> > > 
> > > It seems to me that you get a chain of mappings for each file on the FAT
> > > filesystem, which are just the contiguous areas in it, and
> > > first_mapping_index refers to the mapping at the start of the file. But
> > > for much of the time, it actually doesn't seem to be set at all, so you
> > > have mapping->first_mapping_index == -1. Do you understand the rules
> > > around when it's set and when it isn't?
> > 
> > Yeah. So `first_mapping_index` is the index of the first mapping, each
> > mapping is a group of clusters that are contiguous in the file.
> > It's mostly `-1` because the first mapping has the value set to
> > `-1` and not its own index; the value is only set when the file
> > contains more than one mapping, which only happens when you add
> > clusters to a file that are not contiguous with the existing clusters.
> 
> Ah, that makes some sense. Not sure if it's optimal, but it's a rule I
> can work with. So just to confirm, this is the invariant that we think
> should always hold true, right?
> 
> assert((mapping->mode & MODE_DIRECTORY) ||
>!mapping->info.file.offset ||
>mapping->first_mapping_index > 0);
> 

Yes.

We can add this to the loop in `get_cluster_count_for_direntry`.
I'm also thinking of converting those `abort` calls into `assert`s, because
the line `copy_it = 1;` confused me: it comes after the `abort`.

> > And actually, thanks to that I noticed another bug not fixed in PATCH 3, 
> > We are doing this check 
> > `s->current_mapping->first_mapping_index != mapping->first_mapping_index`
> > to know if we should switch to the new mapping or not. 
> > If we were reading from the first mapping (`first_mapping_index == -1`)
> > and we jumped to the second mapping (`first_mapping_index == n`), we
> > will catch this condition and switch to the new mapping.
> > 
> > But if the file has more than 2 mappings, and we jumped to the 3rd
> > mapping, we will not catch this since (`first_mapping_index == n`) for
> > both of them haha. I think a better check is to check the `mapping`
> > pointer directly. (I'll add it also in the next series together with a
> > test for it.)
> 
> This comparison is exactly what confused me. I didn't realise that the
> first mapping in the chain has a different value here, so I thought this
> must mean that we're looking at a different file now - but of course I
> couldn't see a reason for that because we're iterating through a single
> file in this function.
> 
> But even now that I know that the condition triggers when switching from
> the first to the second mapping, it doesn't make sense to me. We don't
> have to copy things around just because a file is non-contiguous.
> 
> What we want to catch is if the order of mappings has changed compared
> to the old state. Do we need a linked list, maybe a prev_mapping_index,
> instead of first_mapping_index so that we can compare if it is still the
> same as before?

I think this would be the better design (tbh, that's what I thought
`first_mapping_index` would do), though I'm not sure whether other components
depend so much on the current design that it would be hard to change.

I'll try to implement this `prev_mapping_index` and see how it goes.
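
To illustrate the mapping-switch problem discussed above, here is a small
self-contained sketch. It is not vvfat code: ToyMapping and the two helper
functions are hypothetical, and the value 4 stands in for "n", the index of
the first mapping in the chain. It only shows that comparing
`first_mapping_index` cannot tell the 2nd and 3rd mapping of a file apart,
while comparing the mapping pointers can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a mapping; only the field under discussion is kept. */
typedef struct ToyMapping {
    int first_mapping_index;  /* -1 for the first mapping, n for the rest */
} ToyMapping;

/* Check as described in the thread: compare the first_mapping_index values. */
static bool switched_by_index(const ToyMapping *cur, const ToyMapping *next)
{
    assert(cur != NULL && next != NULL);
    return cur->first_mapping_index != next->first_mapping_index;
}

/* Alternative check: compare the mapping pointers directly. */
static bool switched_by_pointer(const ToyMapping *cur, const ToyMapping *next)
{
    assert(cur != NULL && next != NULL);
    return cur != next;
}
```

For a chain of three mappings with values -1, 4 and 4, the index check sees
the jump from the 1st to the 2nd mapping but misses the jump from the 2nd to
the 3rd; the pointer check sees both.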

> 
> Or actually, I suppose that's the first block with an abort() in the
> code, just that it doesn't compare mappings, but their offsets.

I think I'm still confused about the whole logic there. The function
`get_cluster_count_for_direntry` is a mess, and it doesn't just
*get* the cluster count, it also schedules writeouts and may
copy clusters around.

> 
> > > 
> > > >  block/vvfat.c | 12 +++-
> > > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/block/vvfat.c b/block/vvfat.c
> > > > index 19da009a5b..f0642ac3e4 100644
> > > > --- a/block/vvfat.c
> > > > +++ b/block/vvfat.c
> > > > @@ -1408,7 +1408,9 @@ read_cluster_directory:
> > > >  
> > > >  assert(s->current_fd);
> > > >  
> > > > -
> > > >

Re: [PATCH v3 03/13] hw/riscv: add RISC-V IOMMU base emulation

2024-06-11 Thread Jason Chien


Hi Daniel,

On 2024/5/24 上午 01:39, Daniel Henrique Barboza wrote:

From: Tomasz Jeznach 

The RISC-V IOMMU specification is now ratified as-per the RISC-V
international process. The latest frozen specification can be found
at:

https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf

Add the foundation of the device emulation for RISC-V IOMMU, which
includes an IOMMU that has no capabilities but MSI interrupt support and
fault queue interfaces. We'll add more features incrementally in the
next patches.

Co-developed-by: Sebastien Boeuf 
Signed-off-by: Sebastien Boeuf 
Signed-off-by: Tomasz Jeznach 
Signed-off-by: Daniel Henrique Barboza 
---
  hw/riscv/Kconfig |4 +
  hw/riscv/meson.build |1 +
  hw/riscv/riscv-iommu.c   | 1602 ++
  hw/riscv/riscv-iommu.h   |  141 
  hw/riscv/trace-events|   11 +
  hw/riscv/trace.h |1 +
  include/hw/riscv/iommu.h |   36 +
  meson.build  |1 +
  8 files changed, 1797 insertions(+)
  create mode 100644 hw/riscv/riscv-iommu.c
  create mode 100644 hw/riscv/riscv-iommu.h
  create mode 100644 hw/riscv/trace-events
  create mode 100644 hw/riscv/trace.h
  create mode 100644 include/hw/riscv/iommu.h

diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
index a2030e3a6f..f69d6e3c8e 100644
--- a/hw/riscv/Kconfig
+++ b/hw/riscv/Kconfig
@@ -1,3 +1,6 @@
+config RISCV_IOMMU
+bool
+
  config RISCV_NUMA
  bool
  
@@ -47,6 +50,7 @@ config RISCV_VIRT

  select SERIAL
  select RISCV_ACLINT
  select RISCV_APLIC
+select RISCV_IOMMU
  select RISCV_IMSIC
  select SIFIVE_PLIC
  select SIFIVE_TEST
diff --git a/hw/riscv/meson.build b/hw/riscv/meson.build
index f872674093..cbc99c6e8e 100644
--- a/hw/riscv/meson.build
+++ b/hw/riscv/meson.build
@@ -10,5 +10,6 @@ riscv_ss.add(when: 'CONFIG_SIFIVE_U', if_true: 
files('sifive_u.c'))
  riscv_ss.add(when: 'CONFIG_SPIKE', if_true: files('spike.c'))
  riscv_ss.add(when: 'CONFIG_MICROCHIP_PFSOC', if_true: 
files('microchip_pfsoc.c'))
  riscv_ss.add(when: 'CONFIG_ACPI', if_true: files('virt-acpi-build.c'))
+riscv_ss.add(when: 'CONFIG_RISCV_IOMMU', if_true: files('riscv-iommu.c'))
  
  hw_arch += {'riscv': riscv_ss}

diff --git a/hw/riscv/riscv-iommu.c b/hw/riscv/riscv-iommu.c
new file mode 100644
index 00..39b4ff1405
--- /dev/null
+++ b/hw/riscv/riscv-iommu.c
@@ -0,0 +1,1602 @@
+/*
+ * QEMU emulation of an RISC-V IOMMU
+ *
+ * Copyright (C) 2021-2023, Rivos Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qom/object.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci_device.h"
+#include "hw/qdev-properties.h"
+#include "hw/riscv/riscv_hart.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qemu/timer.h"
+
+#include "cpu_bits.h"
+#include "riscv-iommu.h"
+#include "riscv-iommu-bits.h"
+#include "trace.h"
+
+#define LIMIT_CACHE_CTX   (1U << 7)
+#define LIMIT_CACHE_IOT   (1U << 20)
+
+/* Physical page number conversions */
+#define PPN_PHYS(ppn) ((ppn) << TARGET_PAGE_BITS)
+#define PPN_DOWN(phy) ((phy) >> TARGET_PAGE_BITS)
+
+typedef struct RISCVIOMMUContext RISCVIOMMUContext;
+typedef struct RISCVIOMMUEntry RISCVIOMMUEntry;
+
+/* Device assigned I/O address space */
+struct RISCVIOMMUSpace {
+IOMMUMemoryRegion iova_mr;  /* IOVA memory region for attached device */
+AddressSpace iova_as;   /* IOVA address space for attached device */
+RISCVIOMMUState *iommu; /* Managing IOMMU device state */
+uint32_t devid; /* Requester identifier, AKA device_id */
+bool notifier;  /* IOMMU unmap notifier enabled */
+QLIST_ENTRY(RISCVIOMMUSpace) list;
+};
+
+/* Device translation context state. */
+struct RISCVIOMMUContext {
+uint64_t devid:24;  /* Requester Id, AKA device_id */
+uint64_t pasid:20;  /* Process Address Space ID */
+uint64_t __rfu:20;  /* reserved */
+uint64_t tc;/* Translation Control */
+uint64_t ta;/* Translation Attributes */
+uint64_t msi_addr_mask; /* MSI filtering - address mask */
+uint64_t msi_addr_pattern;  /* MSI filtering - address pattern */
+uint64_t msiptp;/* MSI redirection page table pointer */
+};
+
+/* IOMMU index for transactions without PASID

Re: about QEMU TLS

2024-06-11 Thread Yu Zhang

Hello Daniel and all,

When I was using TLS encryption for VM live-migration, I noticed one
thing: the migration works regardless of the "endpoint" setting (that
is: either "endpoint=server", or "endpoint=client") on the target
server.
The line I added is:
"-object tls-creds-x509,id=tls0,dir=/path/to/qemutls,endpoint=client
(or server),verify-peer=on".

It seems that the "endpoint" setting is currently not strictly
enforced for VM migration. I'd like to know whether this is intentional,
to allow some flexibility, or whether it should be fixed from a
security perspective. Thank you very much!

Best regards,
Yu Zhang @ IONOS cloud

On Mon, Aug 21, 2023 at 4:29 PM Yu Zhang  wrote:
>
> Hello Daniel,
>
> sorry for my slow reply! I tested the approach you suggested in the
> following way:
>
> On the target server, start a VM in -incoming mode:
>
> qemu-7.1 \
> -uuid ${VM_UUID} \
>  ...
> -object tls-creds-x509,id=tls0,dir=${HOME}/qemutls,endpoint=server \
>  ...
> -incoming defer \
> -qmp unix:${SOCK},server,nowait \
> -qmp unix:${SOCK},server,nowait &
>
> Set the migrate parameter and wait for the incoming VM from source:
>
> echo '{"execute":"qmp_capabilities"}{ "execute":
> "migrate-set-parameters", "arguments": { "tls-creds": "tls0" }}' |
> sudo nc -U -w 1 ${SOCK}
> echo '{"execute":"qmp_capabilities"}{ "execute": "migrate",
> "arguments": { "uri": "tcp::8089" }}
>
> in HMP:
> (qemu) migrate_set_parameter tls-creds tls0
> (qemu) migrate_incoming tcp:[::]:8089
>
> On the source server, start a VM:
>
> qemu-7.1 \
> -uuid ${VM_UUID} \
>  ...
> -object tls-creds-x509,id=tls0,dir=${HOME}/qemutls,endpoint=client \
>  ...
> -qmp unix:${SOCK},server,nowait \
> -qmp unix:${SOCK},server,nowait &
>
> Set the migrate parameter and migrate the VM from source to target:
>
> echo '{"execute":"qmp_capabilities"}{ "execute":
> "migrate-set-parameters", "arguments": { "tls-creds": "tls0" }}' |
> sudo nc -U -w 1 ${SOCK}
> echo '{"execute":"qmp_capabilities"}{ "execute": "migrate",
> "arguments": { "uri": "tcp:10.41.19.32:8089" }}
>
> and query the migration after a few seconds:
>
> echo '{"execute":"qmp_capabilities"}{ "execute": "query-migrate" }' |
> sudo nc -U -w 1 ${SOCK}
>
> the migration completed successfully.
>
> To migrate the VM onward from this host (the target of the previous
> migration), the endpoint must be changed from "server" to "client" via
> QMP commands:
>
> echo '{"execute":"qmp_capabilities"}{ "execute": "object-del",
> "arguments": { "id": "tls0" }}' | sudo nc -U -w 1 ${SOCK}
> echo '{"execute":"qmp_capabilities"}{ "execute": "object-add",
> "arguments": { "id": "tls0", "qom-type": "tls-creds-x509", "endpoint":
> "client", "dir": "${HOME}/qemutls", "verify-peer": false }}' | sudo nc
> -U -w 1 ${SOCK}
>
> which in HMP commands are:
>
> (qemu) object_del tls0
> (qemu) object_add tls-creds-x509,id=tls0,dir=${HOME}/qemutls,endpoint=client
> (qemu) migrate_set_parameter tls-creds tls0
> (qemu) migrate tcp:10.41.16.10:8089
>
> So far as I tested, the TLS certificate must be valid for at least one
> day. Therefore, the VM migration with an expired TLS certificate can
> only be done in one day.
>
> Thank you so much for your kind reply!
> Best regards
>
> Yu Zhang @ IONOS Compute Platform
>
> On Thu, Aug 17, 2023 at 12:49 PM Daniel P. Berrangé  
> wrote:
> >
> > On Mon, Aug 07, 2023 at 12:07:31AM +0200, Yu Zhang wrote:
> > > Hi all,
> > >
> > > According to qemu docs [1], TLS parameters are specified as an object in
> > > the QEMU command line:
> > >
> > >-object tls-creds-x509,id=id,endpoint=endpoint,dir=/path/to/cred/dir 
> > > ...
> > >
> > > of which "endpoint" is a type of "QCryptoTLSCredsEndpoint" and can be
> > > either a "server" or a "client".
> > >
> > > I'd like to know:
> > >
> > > - When a VM is started with this config, is there a way (e.g. QMP) to
> > > change the value of "endpoint"?
> > >   If possible, how to do this? or else after the first migration of a VM,
> > > the VM has "endpoint=server",
> > >   which can't be migrated without stop / start.
> >
> > Use object_del + object_add to delete the old credentials and
> > create new ones.
> >
> > > - In which case does the QEMU reload its TLS certificate, e.g. when a QEMU
> > > VM has been run longer
> > >   than the valid period of its TLS certificate?
> >
> > The certs are loaded at the time the incoming/outgoing migration
> > operation is initiated, so they are always fresh.
> >
> > > - The migration is done by using HMP monitor on both source and target
> > > side. Is it possible to do it
> > >   by using QMP commands?
> >
> > Almost everything in HMP has an equivalent QMP command.
> >
> >
> > With regards,
> > Daniel
> > --
> > |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange 
> > :|
> > |: https://libvirt.org -o-https://fstop138.berrange.com 
> > :|
> > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange 
> > :|
> >

Re: [PATCH v2 4/7] migration/multifd: Add UADK initialization

On Tue, 11 Jun 2024 at 02:35, Fabiano Rosas  wrote:
>
> Shameer Kolothum via  writes:
>
> > Initialize UADK session and allocate buffers required. The actual
> > compression/decompression will only be done in a subsequent patch.
> >
> > Signed-off-by: Shameer Kolothum 
>
> Reviewed-by: Fabiano Rosas 

Reviewed-by: Zhangfei Gao

Re: [PATCH v2 6/7] migration/multifd: Switch to no compression when no hardware support

On Fri, 7 Jun 2024 at 21:54, Shameer Kolothum
 wrote:
>
> Send raw packets over if UADK hardware support is not available. This is to
> satisfy the QEMU qtest CI, which may run on platforms that don't have UADK
> hardware support. Subsequent patch will add support for uadk migration
> qtest.
>
> Reviewed-by: Fabiano Rosas 
> Signed-off-by: Shameer Kolothum 

Reviewed-by: Zhangfei Gao

Re: [PATCH v2 5/7] migration/multifd: Add UADK based compression and decompression

On Fri, 7 Jun 2024 at 21:54, Shameer Kolothum
 wrote:
>
> Uses UADK wd_do_comp_sync() API to (de)compress a normal page using
> hardware accelerator.
>
> Reviewed-by: Fabiano Rosas 
> Signed-off-by: Shameer Kolothum 

Reviewed-by: Zhangfei Gao

Re: [PATCH v2 3/7] migration/multifd: add uadk compression framework

On Fri, 7 Jun 2024 at 21:54, Shameer Kolothum
 wrote:
>
> Adds the skeleton to support uadk compression method.
> Complete functionality will be added in subsequent patches.
>
> Acked-by: Markus Armbruster 
> Reviewed-by: Fabiano Rosas 
> Signed-off-by: Shameer Kolothum 
Reviewed-by: Zhangfei Gao

Re: [PATCH v2 2/7] configure: Add uadk option

On Fri, 7 Jun 2024 at 21:54, Shameer Kolothum
 wrote:
>
> Add --enable-uadk and --disable-uadk options to enable and disable
> UADK compression accelerator. This is for using UADK based hardware
> accelerators for live migration.
>
> Reviewed-by: Fabiano Rosas 
> Signed-off-by: Shameer Kolothum 

Reviewed-by: Zhangfei Gao

Re: [PATCH v2 1/7] docs/migration: add uadk compression feature

On Fri, 7 Jun 2024 at 21:54, Shameer Kolothum
 wrote:
>
> Document UADK(User Space Accelerator Development Kit) library details
> and how to use that for migration.
>
> Signed-off-by: Shameer Kolothum 

Good job, thanks Shameer

Reviewed-by: Zhangfei Gao

Re: [RFC PATCH v1 0/6] Implement ARM PL011 in Rust

2024-06-11 Thread Pierrick Bouvier


On 6/11/24 02:21, Alex Bennée wrote:

Pierrick Bouvier  writes:


On 6/10/24 13:29, Manos Pitsidianakis wrote:

On Mon, 10 Jun 2024 22:37, Pierrick Bouvier  wrote:

Hello Manos,




Excellent work, and thanks for posting this RFC!

IMHO, having patches 2 and 5 split is a bit confusing, and exposing
(temporarily) the generated.rs file in patches is not a good move.
Any reason you kept it this way?

That was my first approach; I will rework it in the second version.
The generated code should not exist in committed code at all.
It was initially tricky to set up the dependency order correctly, so I
first committed it and then made it a dependency.



Maybe it could be better if build.rs file was *not* needed for new
devices/folders, and could be abstracted as a detail of the python
wrapper script instead of something that should be committed.

That'd mean you cannot work on the Rust files with a language server,
and you cannot run cargo build, cargo check, cargo clippy, etc. That's why I
left the alternative choice of including a manually generated bindings
file (generated.rs.inc).



Maybe I missed something, but it seems like it just checks/copies the
generated.rs file where it's expected. Definitely something that could
be done as part of the rust build.

Having to run the build before getting completion does not seem to be
a huge compromise.




As long as the Language Server can kick in after a first build. Rust
definitely leans in to the concept of the tooling helping you out while
coding.

I think for the C LSPs compile_commands.json is generated during the
configure step but I could be wrong.



Yes, meson generates it.
I agree having support for completion tooling is important nowadays, 
whether in C or in Rust.

Re: Re: [PATCH v5 00/10] Support persistent reservation operations

2024-06-11 Thread Stefan Hajnoczi

On Mon, Jun 10, 2024 at 07:55:20PM -0700, 卢长奇 wrote:
> Hi,
> 
> Sorry, I explained it in patch2 and forgot to reply to your email.
> 
> The existing PRManager only works with local scsi devices. This series
> will completely decouple devices and drivers. The device can not only be
> scsi, but also other devices such as nvme. The same is true for the
> driver, which is completely unrestricted.
> 
> And block/file-posix.c can implement the new block driver, and
> pr_manager can be executed after splicing ioctl commands in these
> drivers. This will be implemented in subsequent patches.

Thanks for explaining!

Stefan

> 
> On 2024/6/11 01:18, Stefan Hajnoczi wrote:
> > On Thu, Jun 06, 2024 at 08:24:34PM +0800, Changqi Lu wrote:
> >> Hi,
> >>
> >> patchv5 has been modified.
> >>
> >> Sincerely hope that everyone can help review the
> >> code and provide some suggestions.
> >>
> >> v4->v5:
> >> - Fixed a memory leak bug at hw/nvme/ctrl.c.
> >>
> >> v3->v4:
> >> - At the nvme layer, the two patches of enabling the ONCS
> >> function and enabling rescap are combined into one.
> >> - At the nvme layer, add helper functions for pr capacity
> >> conversion between the block layer and the nvme layer.
> >>
> >> v2->v3:
> >> In v2 Persist Through Power Loss (PTPL) is enabled by default.
> >> In v3 PTPL is supported, which is passed as a parameter.
> >>
> >> v1->v2:
> >> - Add sg_persist --report-capabilities for SCSI protocol and enable
> >> oncs and rescap for NVMe protocol.
> >> - Add persistent reservation capabilities constants and helper functions
> for
> >> SCSI and NVMe protocol.
> >> - Add comments for necessary APIs.
> >>
> >> v1:
> >> - Add seven APIs about persistent reservation command for block layer.
> >> These APIs including reading keys, reading reservations, registering,
> >> reserving, releasing, clearing and preempting.
> >> - Add the necessary pr-related operation APIs for both the
> >> SCSI protocol and NVMe protocol at the device layer.
> >> - Add scsi driver at the driver layer to verify the functions
> >
> > My question from v1 is unanswered:
> >
> > What is the relationship to the existing PRManager functionality
> > (docs/interop/pr-helper.rst) where block/file-posix.c interprets SCSI
> > ioctls and sends persistent reservation requests to an external helper
> > process?
> >
> > I wonder if block/file-posix.c can implement the new block driver
> > callbacks using pr_mgr (while keeping the existing scsi-generic
> > support).
> >
> > Thanks,
> > Stefan
> >
> >>
> >>
> >> Changqi Lu (10):
> >> block: add persistent reservation in/out api
> >> block/raw: add persistent reservation in/out driver
> >> scsi/constant: add persistent reservation in/out protocol constants
> >> scsi/util: add helper functions for persistent reservation types
> >> conversion
> >> hw/scsi: add persistent reservation in/out api for scsi device
> >> block/nvme: add reservation command protocol constants
> >> hw/nvme: add helper functions for converting reservation types
> >> hw/nvme: enable ONCS and rescap function
> >> hw/nvme: add reservation protocal command
> >> block/iscsi: add persistent reservation in/out driver
> >>
> >> block/block-backend.c | 397 ++
> >> block/io.c | 163 +++
> >> block/iscsi.c | 443 ++
> >> block/raw-format.c | 56 
> >> hw/nvme/ctrl.c | 326 +-
> >> hw/nvme/ns.c | 5 +
> >> hw/nvme/nvme.h | 84 ++
> >> hw/scsi/scsi-disk.c | 352 
> >> include/block/block-common.h | 40 +++
> >> include/block/block-io.h | 20 ++
> >> include/block/block_int-common.h | 84 ++
> >> include/block/nvme.h | 98 +++
> >> include/scsi/constants.h | 52 
> >> include/scsi/utils.h | 8 +
> >> include/sysemu/block-backend-io.h | 24 ++
> >> scsi/utils.c | 81 ++
> >> 16 files changed, 2231 insertions(+), 2 deletions(-)
> >>
> >> --
> >> 2.20.1
> >>



Re: [PATCH v1] virtio-iommu: add error check before assert

2024-06-11 Thread Philippe Mathieu-Daudé


On 11/6/24 14:23, Manos Pitsidianakis wrote:

A fuzzer case discovered by Zheyu Ma causes an assert failure.

Add a check before the assert, and respond with an error before moving
on to the next queue element.

To reproduce the failure:

cat << EOF | \
qemu-system-x86_64 \
-display none -machine accel=qtest -m 512M -machine q35 -nodefaults \
-device virtio-iommu -qtest stdio
outl 0xcf8 0x8804
outw 0xcfc 0x06
outl 0xcf8 0x8820
outl 0xcfc 0xe0004000
write 0x1e 0x1 0x01
write 0xe0004020 0x4 0x1000
write 0xe0004028 0x4 0x00101000
write 0xe000401c 0x1 0x01
write 0x106000 0x1 0x05
write 0x11 0x1 0x60
write 0x12 0x1 0x10
write 0x19 0x1 0x04
write 0x1c 0x1 0x01
write 0x100018 0x1 0x04
write 0x10001c 0x1 0x02
write 0x101003 0x1 0x01
write 0xe0007001 0x1 0x00
EOF

Reported-by: Zheyu Ma 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2359
Signed-off-by: Manos Pitsidianakis 
---
  hw/virtio/virtio-iommu.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 1326c6ec41..9b99def39f 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -818,6 +818,18 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
  out:
  sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
buf ? buf : &tail, output_size);
+if (unlikely(sz != output_size)) {


Is this a normal guest behavior? Should we log it as GUEST_ERROR?


+tail.status = VIRTIO_IOMMU_S_DEVERR;
+/* We checked that tail can fit earlier */
+output_size = sizeof(tail);
+g_free(buf);
+buf = NULL;
+sz = iov_from_buf(elem->in_sg,
+  elem->in_num,
+  0,
+  &tail,
+  output_size);
+}
  assert(sz == output_size);
  
  virtqueue_push(vq, elem, sz);


base-commit: 80e8f0602168f451a93e71cbb1d59e93d745e62e
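
For readers following the thread, the check-then-fallback pattern in the hunk
above can be sketched in isolation. This is only an illustration and not the
QEMU code: sketch_iovec, sketch_iov_from_buf(), sketch_tail, the SKETCH_S_*
values and sketch_write_reply() are all hypothetical stand-ins, and the sketch
assumes the caller has already verified that the reply buffers can hold the
tail, as the comment in the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a scatter-gather element. */
struct sketch_iovec {
    void *base;
    size_t len;
};

/* Hypothetical reply tail; the status values are placeholders only. */
struct sketch_tail {
    uint8_t status;
    uint8_t reserved[3];
};
enum { SKETCH_S_OK = 0, SKETCH_S_DEVERR = 1 };

/*
 * Hypothetical stand-in for a copy-into-iovec helper: copies at most
 * `bytes` bytes from `buf` into the list and returns how many were copied.
 */
static size_t sketch_iov_from_buf(const struct sketch_iovec *iov,
                                  unsigned iov_cnt,
                                  const void *buf, size_t bytes)
{
    size_t done = 0;

    for (unsigned i = 0; i < iov_cnt && done < bytes; i++) {
        size_t room = iov[i].len;
        size_t n = room < bytes - done ? room : bytes - done;

        memcpy(iov[i].base, (const uint8_t *)buf + done, n);
        done += n;
    }
    return done;
}

/*
 * Write the reply. If the full payload does not fit, fall back to writing
 * only the tail with an error status, so the final assert cannot fire on a
 * short copy of the payload.
 */
static size_t sketch_write_reply(const struct sketch_iovec *in_sg,
                                 unsigned in_num,
                                 const void *payload, size_t payload_size,
                                 struct sketch_tail *tail)
{
    size_t output_size = payload ? payload_size : sizeof(*tail);
    size_t sz = sketch_iov_from_buf(in_sg, in_num,
                                    payload ? payload : (const void *)tail,
                                    output_size);

    if (sz != output_size) {
        /* Short copy: report a device error in the tail instead. */
        tail->status = SKETCH_S_DEVERR;
        output_size = sizeof(*tail);
        sz = sketch_iov_from_buf(in_sg, in_num, tail, output_size);
    }
    /* Holds as long as the tail was checked to fit beforehand. */
    assert(sz == output_size);
    return sz;
}
```

With an 8-byte reply buffer and a 16-byte payload, sketch_write_reply()
takes the fallback path and returns sizeof(struct sketch_tail); with a
buffer large enough for the payload it returns the payload size unchanged.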

Re: [PATCH v2 3/3] hw/arm/virt: allow creation of a second NonSecure UART

2024-06-11 Thread Philippe Mathieu-Daudé


On 10/6/24 18:23, Peter Maydell wrote:

For some use-cases, it is helpful to have more than one UART
available to the guest.  If the second UART slot is not already used
for a TrustZone Secure-World-only UART, create it as a NonSecure UART
only when the user provides a serial backend (e.g.  via a second
-serial command line option).

This avoids problems where existing guest software only expects a
single UART, and gets confused by the second UART in the DTB.  The
major example of this is older EDK2 firmware, which will send the
GRUB bootloader output to UART1 and the guest serial output to UART0.
Users who want to use both UARTs with a guest setup including EDK2
are advised to update to EDK2 release edk2-stable202311 or newer.
(The prebuilt EDK2 blobs QEMU upstream provides are new enough.)
The relevant EDK2 changes are the ones described here:
https://bugzilla.tianocore.org/show_bug.cgi?id=4577

Inspired-by: Axel Heider 
Signed-off-by: Peter Maydell 
Tested-by: Laszlo Ersek 
---
  docs/system/arm/virt.rst |  6 +-
  include/hw/arm/virt.h|  1 +
  hw/arm/virt-acpi-build.c | 12 
  hw/arm/virt.c| 38 +++---
  4 files changed, 49 insertions(+), 8 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
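
The rule described in the commit message can be summarised as a small
decision function. This is a sketch derived from the description only, not
from the patch itself: the enum, the function name and both parameters are
hypothetical, and it assumes that enabling TrustZone is what claims the
second slot for the Secure-World-only UART.

```c
#include <stdbool.h>

/* Hypothetical roles for the second UART slot on the virt board. */
enum sketch_uart1_role {
    SKETCH_UART1_ABSENT,
    SKETCH_UART1_SECURE,
    SKETCH_UART1_NONSECURE,
};

/*
 * Decide what the second UART slot becomes.
 * secure_enabled:     assumption, the slot is taken by the Secure-only UART.
 * has_second_backend: the user supplied a second serial backend
 *                     (e.g. a second -serial option).
 */
static enum sketch_uart1_role sketch_uart1_role(bool secure_enabled,
                                                bool has_second_backend)
{
    if (secure_enabled) {
        return SKETCH_UART1_SECURE;
    }
    /* Only expose a NonSecure UART when a backend was actually provided,
     * so guests that expect a single UART see no change by default. */
    return has_second_backend ? SKETCH_UART1_NONSECURE : SKETCH_UART1_ABSENT;
}
```

The default case (no TrustZone, a single -serial option) yields
SKETCH_UART1_ABSENT, which matches the stated goal of not confusing existing
guest software that expects only one UART in the DTB.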

Re: [PATCH v4 2/4] vvfat: Fix usage of `info.file.offset`