from:"Stefano Garzarella"

Re: [PATCH v6 2/3] vhost-user-blk: split vhost_user_blk_sync_config()

2024-10-01 Thread Stefano Garzarella


On Fri, Sep 20, 2024 at 12:49:35PM GMT, Vladimir Sementsov-Ogievskiy wrote:

Split vhost_user_blk_sync_config() out from
vhost_user_blk_handle_config_change(), to be reused in the following
commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Acked-by: Raphael Norwitz 
---
hw/block/vhost-user-blk.c | 26 +++---
1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 5b7f46bbb0..48b3dabb8d 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -90,27 +90,39 @@ static void vhost_user_blk_set_config(VirtIODevice *vdev, 
const uint8_t *config)
s->blkcfg.wce = blkcfg->wce;
}

+static int vhost_user_blk_sync_config(DeviceState *dev, Error **errp)


I was going to ask why use `DeviceState *`, but then I saw the next 
commit where it's needed by `dc->sync_config` callback.


LGTM!

Reviewed-by: Stefano Garzarella 


+{
+int ret;
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
+ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
+   vdev->config_len, errp);
+if (ret < 0) {
+return ret;
+}
+
+memcpy(vdev->config, &s->blkcfg, vdev->config_len);
+virtio_notify_config(vdev);
+
+return 0;
+}
+
static int vhost_user_blk_handle_config_change(struct vhost_dev *dev)
{
int ret;
-VirtIODevice *vdev = dev->vdev;
-VHostUserBlk *s = VHOST_USER_BLK(dev->vdev);
Error *local_err = NULL;

if (!dev->started) {
return 0;
}

-ret = vhost_dev_get_config(dev, (uint8_t *)&s->blkcfg,
-   vdev->config_len, &local_err);
+ret = vhost_user_blk_sync_config(DEVICE(dev->vdev), &local_err);
if (ret < 0) {
error_report_err(local_err);
return ret;
}

-memcpy(dev->vdev->config, &s->blkcfg, vdev->config_len);
-virtio_notify_config(dev->vdev);
-
return 0;
}

--
2.34.1

Re: [PATCH V2] virtio/vhost-user: fix qemu abort when hotunplug vhost-user-net device

2024-10-01 Thread Stefano Garzarella


On Thu, Sep 19, 2024 at 03:29:44PM GMT, yaozhenguo wrote:

During hot-unplug of a vhost-user-net NIC, vhost_user_cleanup may add the
same rcu node to the rcu list twice. The function calls are as follows:

vhost_user_cleanup
   ->vhost_user_host_notifier_remove
   ->call_rcu(n, vhost_user_host_notifier_free, rcu);
   ->g_free_rcu(n, rcu);

When this happens, QEMU will abort in try_dequeue:

if (head == &dummy && qatomic_mb_read(&tail) == &dummy.next) {
   abort();
}

Backtrace is as follows:
0  __pthread_kill_implementation () at /usr/lib64/libc.so.6
1  raise () at /usr/lib64/libc.so.6
2  abort () at /usr/lib64/libc.so.6
3  try_dequeue () at ../util/rcu.c:235
4  call_rcu_thread (0) at ../util/rcu.c:288
5  qemu_thread_start (0) at ../util/qemu-thread-posix.c:541
6  start_thread () at /usr/lib64/libc.so.6
7  clone3 () at /usr/lib64/libc.so.6

The reason for the abort is that adding two identical nodes to the rcu list
turns it into a ring. After the dummy node is added to the tail of the list in
try_dequeue, the ring is opened, but this leads to a situation where only one
node is on the list while rcu_call_count has been incremented twice. This
causes try_dequeue to abort.

This issue happens when n->addr != 0 in vhost_user_host_notifier_remove. It can
happen in a hotplug stress test with a 32-queue vhost-user-net NIC, because
n->addr is set when VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG is handled
during the device hotplug process and it is cleared in vhost_dev_stop during
device hot-unplug. Since VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG is a message
sent by DPDK to qemu, it is asynchronous, so there is no guaranteed order
between setting n->addr and clearing n->addr. If n->addr is set by hotplug
after it has been cleared by hot-unplug, the issue occurs.
So, it is necessary to merge g_free_rcu and vhost_user_host_notifier_free into
one rcu node.

Fixes: 503e355465 ("virtio/vhost-user: dynamically assign 
VhostUserHostNotifiers")

Signed-off-by: yaozhenguo 
---

V1->V2:
   add n->addr check in vhost_user_get_vring_base and 
vhost_user_backend_handle_vring_host_notifier
   to prevent submitting the same node to the rcu list.

---
hw/virtio/vhost-user.c | 39 +--
include/hw/virtio/vhost-user.h |  1 +
2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 00561da..ba4c09c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -51,7 +51,6 @@
#else
#define VHOST_USER_MAX_RAM_SLOTS 512
#endif
-
/*
 * Maximum size of virtio device config space
 */
@@ -1185,9 +1184,16 @@ static int vhost_user_set_vring_num(struct vhost_dev 
*dev,

static void vhost_user_host_notifier_free(VhostUserHostNotifier *n)
{
-assert(n && n->unmap_addr);
-munmap(n->unmap_addr, qemu_real_host_page_size());
-n->unmap_addr = NULL;
+if (n->unmap_addr) {
+munmap(n->unmap_addr, qemu_real_host_page_size());
+n->unmap_addr = NULL;
+}
+if (n->need_free) {
+memory_region_transaction_begin();
+object_unparent(OBJECT(&n->mr));
+memory_region_transaction_commit();
+g_free(n);
+}
}

/*
@@ -1195,17 +1201,20 @@ static void 
vhost_user_host_notifier_free(VhostUserHostNotifier *n)
 * under rcu.
 */
static void vhost_user_host_notifier_remove(VhostUserHostNotifier *n,
-VirtIODevice *vdev)
+VirtIODevice *vdev, bool free)


What about `destroy` instead of `free`?

That way it is clearer that it should be true when called by
`vhost_user_state_destroy()`.


{
if (n->addr) {
if (vdev) {
+memory_region_transaction_begin();
virtio_queue_set_host_notifier_mr(vdev, n->idx, &n->mr, false);
+memory_region_transaction_commit();
}
assert(!n->unmap_addr);
n->unmap_addr = n->addr;
n->addr = NULL;
-call_rcu(n, vhost_user_host_notifier_free, rcu);
}


Instead of checking n->addr in the caller, I suggest moving the check
here:

  if (destroy || n->unmap_addr) {
      n->destroy = destroy;
      call_rcu(n, vhost_user_host_notifier_free, rcu);
  }

Thanks,
Stefano
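
The suggestion above can be sketched in isolation as follows. This is a minimal sketch, not the QEMU code: `struct notifier`, `schedule_free()` and `notifier_remove()` are hypothetical stand-ins, and `schedule_free()` only counts how often the node would be handed to `call_rcu()`. It does not model the callback running, nor a second removal before the callback has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VhostUserHostNotifier; not the QEMU struct. */
struct notifier {
    void *addr;        /* currently mapped address, NULL if none */
    void *unmap_addr;  /* address the deferred callback must unmap */
    bool destroy;      /* deferred callback should also free the notifier */
    int scheduled;     /* how often the node was queued (demo only) */
};

/* Stand-in for call_rcu(): only records that the node was queued. */
static void schedule_free(struct notifier *n)
{
    n->scheduled++;
}

static void notifier_remove(struct notifier *n, bool destroy)
{
    if (n->addr) {
        assert(!n->unmap_addr);
        n->unmap_addr = n->addr;
        n->addr = NULL;
    }

    /* Queue the node only when the callback has something to do. */
    if (destroy || n->unmap_addr) {
        n->destroy = destroy;
        schedule_free(n);
    }
}
```

With the check inside the function, callers no longer need to test `n->addr` themselves: a notifier with nothing mapped and `destroy == false` is simply left alone.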


+n->need_free = free;
+call_rcu(n, vhost_user_host_notifier_free, rcu);
}

static int vhost_user_set_vring_base(struct vhost_dev *dev,
@@ -1279,8 +1288,8 @@ static int vhost_user_get_vring_base(struct vhost_dev 
*dev,
struct vhost_user *u = dev->opaque;

VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index);
-if (n) {
-vhost_user_host_notifier_remove(n, dev->vdev);
+if (n && n->addr) {
+vhost_user_host_notifier_remove(n, dev->vdev, false);
}

ret = vhost_user_write(dev, &msg, NULL, 0);
@@ -1562,7 +1571,9 @@ static int 
vhost_user_backend_handle_vring_host_notifier(struct vhost_dev *dev,
 * new mapped address.
 */
n = fetch_or_create_notifier(user, queue_idx);
-

Re: [PATCH] vhost: Remove unused vhost_dev_{load|save}_inflight

2024-10-01 Thread Stefano Garzarella


On Wed, Sep 18, 2024 at 01:10:34PM GMT, d...@treblig.org wrote:

From: "Dr. David Alan Gilbert" 

vhost_dev_load_inflight and vhost_dev_save_inflight have been
unused since they were added in 2019 by:

5ad204bf2a ("vhost-user: Support transferring inflight buffer between qemu and 
backend")

Remove them, and their helper vhost_dev_resize_inflight.

Signed-off-by: Dr. David Alan Gilbert 
---
hw/virtio/vhost.c | 56 ---
include/hw/virtio/vhost.h |  2 --
2 files changed, 58 deletions(-)


Reviewed-by: Stefano Garzarella 



diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 7c5ef81b55..76f9b2aaad 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1930,62 +1930,6 @@ void vhost_dev_free_inflight(struct vhost_inflight 
*inflight)
}
}

-static int vhost_dev_resize_inflight(struct vhost_inflight *inflight,
- uint64_t new_size)
-{
-Error *err = NULL;
-int fd = -1;
-void *addr = qemu_memfd_alloc("vhost-inflight", new_size,
-  F_SEAL_GROW | F_SEAL_SHRINK | F_SEAL_SEAL,
-  &fd, &err);
-
-if (err) {
-error_report_err(err);
-return -ENOMEM;
-}
-
-vhost_dev_free_inflight(inflight);
-inflight->offset = 0;
-inflight->addr = addr;
-inflight->fd = fd;
-inflight->size = new_size;
-
-return 0;
-}
-
-void vhost_dev_save_inflight(struct vhost_inflight *inflight, QEMUFile *f)
-{
-if (inflight->addr) {
-qemu_put_be64(f, inflight->size);
-qemu_put_be16(f, inflight->queue_size);
-qemu_put_buffer(f, inflight->addr, inflight->size);
-} else {
-qemu_put_be64(f, 0);
-}
-}
-
-int vhost_dev_load_inflight(struct vhost_inflight *inflight, QEMUFile *f)
-{
-uint64_t size;
-
-size = qemu_get_be64(f);
-if (!size) {
-return 0;
-}
-
-if (inflight->size != size) {
-int ret = vhost_dev_resize_inflight(inflight, size);
-if (ret < 0) {
-return ret;
-}
-}
-inflight->queue_size = qemu_get_be16(f);
-
-qemu_get_buffer(f, inflight->addr, size);
-
-return 0;
-}
-
int vhost_dev_prepare_inflight(struct vhost_dev *hdev, VirtIODevice *vdev)
{
int r;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index c75be46c06..461c168c37 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -338,8 +338,6 @@ void vhost_virtqueue_stop(struct vhost_dev *dev, struct 
VirtIODevice *vdev,

void vhost_dev_reset_inflight(struct vhost_inflight *inflight);
void vhost_dev_free_inflight(struct vhost_inflight *inflight);
-void vhost_dev_save_inflight(struct vhost_inflight *inflight, QEMUFile *f);
-int vhost_dev_load_inflight(struct vhost_inflight *inflight, QEMUFile *f);
int vhost_dev_prepare_inflight(struct vhost_dev *hdev, VirtIODevice *vdev);
int vhost_dev_set_inflight(struct vhost_dev *dev,
   struct vhost_inflight *inflight);
--
2.46.0

Re: [PATCH v6 04/16] hw/i386: Add igvm-cfg object and processing for IGVM files

2024-10-01 Thread Stefano Garzarella


On Thu, Sep 26, 2024 at 12:41:53PM GMT, Roy Hopkins wrote:

An IGVM file contains configuration of guest state that should be
applied during configuration of the guest, before the guest is started.

This patch allows the user to add an igvm-cfg object to an X86 machine
configuration that allows an IGVM file to be configured that will be
applied to the guest before it is started.

If an IGVM configuration is provided then the IGVM file is processed at
the end of the board initialization, before the state transition to
PHASE_MACHINE_INITIALIZED.

Signed-off-by: Roy Hopkins 
Reviewed-by: Michael S. Tsirkin 
---
hw/i386/pc.c  | 12 
hw/i386/pc_piix.c | 10 ++
hw/i386/pc_q35.c  | 10 ++
include/hw/i386/x86.h |  3 +++
qemu-options.hx   | 28 
5 files changed, 63 insertions(+)


Reviewed-by: Stefano Garzarella 



diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8d84c22458..695fc1dbfe 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1843,6 +1843,18 @@ static void pc_machine_class_init(ObjectClass *oc, void 
*data)
object_class_property_add_bool(oc, "fd-bootchk",
pc_machine_get_fd_bootchk,
pc_machine_set_fd_bootchk);
+
+#if defined(CONFIG_IGVM)
+object_class_property_add_link(oc, "igvm-cfg",
+   TYPE_IGVM_CFG,
+   offsetof(X86MachineState, igvm),
+   object_property_allow_set_link,
+   OBJ_PROP_LINK_STRONG);
+object_class_property_set_description(oc, "igvm-cfg",
+  "Set IGVM configuration");
+#endif
+
+
}

static const TypeInfo pc_machine_info = {
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2bf6865d40..5adf7da6f4 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -360,6 +360,16 @@ static void pc_init1(MachineState *machine, const char 
*pci_type)
   x86_nvdimm_acpi_dsmio,
   x86ms->fw_cfg, OBJECT(pcms));
}
+
+#if defined(CONFIG_IGVM)
+/* Apply guest state from IGVM if supplied */
+if (x86ms->igvm) {
+if (IGVM_CFG_GET_CLASS(x86ms->igvm)
+->process(x86ms->igvm, machine->cgs, &error_fatal) < 0) {
+g_assert_not_reached();
+}
+}
+#endif
}

typedef enum PCSouthBridgeOption {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 8319b6d45e..483e0a0a40 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -324,6 +324,16 @@ static void pc_q35_init(MachineState *machine)
   x86_nvdimm_acpi_dsmio,
   x86ms->fw_cfg, OBJECT(pcms));
}
+
+#if defined(CONFIG_IGVM)
+/* Apply guest state from IGVM if supplied */
+if (x86ms->igvm) {
+if (IGVM_CFG_GET_CLASS(x86ms->igvm)
+->process(x86ms->igvm, machine->cgs, &error_fatal) < 0) {
+g_assert_not_reached();
+}
+}
+#endif
}

#define DEFINE_Q35_MACHINE(major, minor) \
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index d43cb3908e..01ac29acf6 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -25,6 +25,7 @@
#include "hw/intc/ioapic.h"
#include "hw/isa/isa.h"
#include "qom/object.h"
+#include "sysemu/igvm-cfg.h"

struct X86MachineClass {
/*< private >*/
@@ -97,6 +98,8 @@ struct X86MachineState {
 * which means no limitation on the guest's bus locks.
 */
uint64_t bus_lock_ratelimit;
+
+IgvmCfg *igvm;
};

#define X86_MACHINE_SMM  "smm"
diff --git a/qemu-options.hx b/qemu-options.hx
index d94e2cbbae..66292c160b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5927,6 +5927,34 @@ SRST
 -machine ...,memory-encryption=sev0 \\
 .

+``-object igvm-cfg,file=file``
+Create an IGVM configuration object that defines the initial state
+of the guest using a file that conforms to the Independent Guest
+Virtual Machine (IGVM) file format.
+
+This is currently only supported by ``-machine q35`` and
+``-machine pc``.
+
+The ``file`` parameter is used to specify the IGVM file to load.
+When provided, the IGVM file is used to populate the initial
+memory of the virtual machine and, depending on the platform, can
+define the initial processor state, memory map and parameters.
+
+The IGVM file is expected to contain the firmware for the virtual
+machine, therefore an ``igvm-cfg`` object cannot be provided along
+with other ways of specifying firmware, such as the ``-bios``
+parameter on x86 machines.
+
+e.g to launch a machine providing the firmware in an IGVM file
+
+.. parsed-literal::
+
+ # |qemu

Re: [PATCH v3 18/22] hw/virtio: fix -Werror=maybe-uninitialized

2024-09-30 Thread Stefano Garzarella


On Mon, Sep 30, 2024 at 12:14:53PM GMT, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

../hw/virtio/vhost-shadow-virtqueue.c:545:13: error: ‘r’ may be used 
uninitialized [-Werror=maybe-uninitialized]

Set `r` to 0 at every loop, since we don't check vhost_svq_get_buf()
return value.

Signed-off-by: Marc-André Lureau 
---
hw/virtio/vhost-shadow-virtqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Stefano Garzarella 



diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index fc5f408f77..3b2beaea24 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -526,10 +526,10 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num)
{
size_t len = 0;
-uint32_t r;

while (num--) {
int64_t start_us = g_get_monotonic_time();
+uint32_t r = 0;

do {
if (vhost_svq_more_used(svq)) {
--
2.45.2.827.g557ae147e6

Re: [PATCH v2 18/22] hw/virtio: fix -Werror=maybe-uninitialized false-positive

2024-09-30 Thread Stefano Garzarella

On Fri, Sep 27, 2024 at 3:08 PM Eugenio Perez Martin
 wrote:
>
> On Wed, Sep 25, 2024 at 10:08 AM Stefano Garzarella  
> wrote:
> >
> > On Tue, Sep 24, 2024 at 05:05:49PM GMT, marcandre.lur...@redhat.com wrote:
> > >From: Marc-André Lureau 
> >
> > For the title: I don't think it is a false positive, but a real fix,
> > indeed maybe not a complete one.
> >
> > >
> > >../hw/virtio/vhost-shadow-virtqueue.c:545:13: error: ‘r’ may be used 
> > >uninitialized [-Werror=maybe-uninitialized]
> > >
> > >Signed-off-by: Marc-André Lureau 
> > >---
> > > hw/virtio/vhost-shadow-virtqueue.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > >diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > >b/hw/virtio/vhost-shadow-virtqueue.c
> > >index fc5f408f77..cd29cc795b 100644
> > >--- a/hw/virtio/vhost-shadow-virtqueue.c
> > >+++ b/hw/virtio/vhost-shadow-virtqueue.c
> > >@@ -526,7 +526,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > > size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num)
> > > {
> > > size_t len = 0;
> > >-uint32_t r;
> > >+uint32_t r = 0;
> > >
> > > while (num--) {
> >
> > I think we should move the initialization to 0 here in the loop:
> >
> >uint32_t r = 0;
> >
> > > int64_t start_us = g_get_monotonic_time();
> >
> > ...
> >
> >vhost_svq_get_buf(svq, &r);
> >len += r;
> >}
> >
> > This is because we don't check vhost_svq_get_buf() return value.
> >
> > IIUC, in that function, `r` is set only if the return value of
> > vhost_svq_get_buf() is not null, so if we don't check its return value,
> > we should set `r` to 0 on every cycle (or check the return value of
> > course).
> >
>
> Sorry I missed this mail and I proposed the same :). I do think it is
> a real false positive though, in the sense that if we embed the
> vhost_svq_get_buf here the warning would go away.

I don't think so, I mean if we embed it and check the error path
better, yes, but now in vhost_svq_get_buf() if we fail, we return
NULL, but we don't set "len" to 0, so we would have the same warning.

Thanks,
Stefano

>
> But I understand it is better to change this function than trust the
> reviews long term.
>

Re: [PATCH v2 18/22] hw/virtio: fix -Werror=maybe-uninitialized false-positive

2024-09-30 Thread Stefano Garzarella

On Fri, Sep 27, 2024 at 3:05 PM Eugenio Perez Martin
 wrote:
>
> On Tue, Sep 24, 2024 at 3:07 PM  wrote:
> >
> > From: Marc-André Lureau 
> >
> > ../hw/virtio/vhost-shadow-virtqueue.c:545:13: error: ‘r’ may be used 
> > uninitialized [-Werror=maybe-uninitialized]
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  hw/virtio/vhost-shadow-virtqueue.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> > b/hw/virtio/vhost-shadow-virtqueue.c
> > index fc5f408f77..cd29cc795b 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -526,7 +526,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> >  size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num)
> >  {
> >  size_t len = 0;
> > -uint32_t r;
> > +uint32_t r = 0;
> >
>
> I understand this is a bulk changeset to avoid the warning, but does
> this mean we cannot use pointer arguments to just return information
> anymore? vhost_svq_get_buf just writes to it, it never reads it.

Sure we can, the problem here is that vhost_svq_get_buf() might return
without having written there (in the error path).

>
> If you post a second version and it is convenient for you, it would be
> useful to move it inside of the while.

I think it is the only way, if we keep it out we have the problem from
the second loop on (always in the error path).

>
> Any way we solve it,
>
> Acked-by: Eugenio Pérez 
>
> >  while (num--) {
> >  int64_t start_us = g_get_monotonic_time();
> > --
> > 2.45.2.827.g557ae147e6
> >
>

Re: [PATCH v2 22/22] RFC: hw/virtio: a potential leak fix

2024-09-25 Thread Stefano Garzarella


On Tue, Sep 24, 2024 at 05:05:53PM GMT, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

vhost_svq_get_buf() may return a VirtQueueElement that should be freed.

It's unclear to me if the vhost_svq_get_buf() call should always return NULL.

Signed-off-by: Marc-André Lureau 
---
hw/virtio/vhost-shadow-virtqueue.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index cd29cc795b..93742d9ddc 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -414,6 +414,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const 
VhostShadowVirtqueue *svq,
return i;
}

+G_GNUC_WARN_UNUSED_RESULT
static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
   uint32_t *len)
{
@@ -529,6 +530,7 @@ size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num)
uint32_t r = 0;

while (num--) {
+g_autofree VirtQueueElement *elem = NULL;


Yes, indeed it sounds like we should release the buffer, although from
the name of the function here, it sounds like we are just trying to
figure out if the queue has elements, so I expect there is another
function that is then called to process the buffers.

There's still a potential problem here that I pointed out in the other
patch, but I think we need Eugenio here.


int64_t start_us = g_get_monotonic_time();

do {
@@ -541,7 +543,7 @@ size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num)
}
} while (true);

-vhost_svq_get_buf(svq, &r);
+elem = vhost_svq_get_buf(svq, &r);
len += r;
}

--
2.45.2.827.g557ae147e6

Re: [PATCH v2 18/22] hw/virtio: fix -Werror=maybe-uninitialized false-positive

2024-09-25 Thread Stefano Garzarella


On Tue, Sep 24, 2024 at 05:05:49PM GMT, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 


For the title: I don't think it is a false positive, but a real fix,
although maybe not a complete one.



../hw/virtio/vhost-shadow-virtqueue.c:545:13: error: ‘r’ may be used 
uninitialized [-Werror=maybe-uninitialized]

Signed-off-by: Marc-André Lureau 
---
hw/virtio/vhost-shadow-virtqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index fc5f408f77..cd29cc795b 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -526,7 +526,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
size_t vhost_svq_poll(VhostShadowVirtqueue *svq, size_t num)
{
size_t len = 0;
-uint32_t r;
+uint32_t r = 0;

while (num--) {


I think we should move the initialization to 0 here in the loop:

  uint32_t r = 0;


int64_t start_us = g_get_monotonic_time();


...

  vhost_svq_get_buf(svq, &r);
  len += r;
  }

This is because we don't check vhost_svq_get_buf() return value.

IIUC, in that function, `r` is set only if the return value of
vhost_svq_get_buf() is not null, so if we don't check its return value,
we should set `r` to 0 on every cycle (or check the return value of
course).

Thanks,
Stefano
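
The difference between initializing `r` once and resetting it on every iteration can be shown with a small standalone sketch. The names `get_buf()`, `poll_init_once()` and `poll_init_each()` are hypothetical and only mimic the shape of the code discussed above: an out-parameter that is written on success and left untouched on the error path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes *len only on success. */
static int get_buf(int ok, uint32_t value, uint32_t *len)
{
    if (!ok) {
        return -1;          /* error path: *len is not written */
    }
    *len = value;
    return 0;
}

/* r is initialized once: a failing call reuses the previous length. */
static size_t poll_init_once(const int *ok, const uint32_t *val, size_t num)
{
    size_t total = 0;
    uint32_t r = 0;

    for (size_t i = 0; i < num; i++) {
        get_buf(ok[i], val[i], &r);   /* return value ignored on purpose */
        total += r;
    }
    return total;
}

/* r is reset on every iteration: a failing call contributes 0. */
static size_t poll_init_each(const int *ok, const uint32_t *val, size_t num)
{
    size_t total = 0;

    for (size_t i = 0; i < num; i++) {
        uint32_t r = 0;

        get_buf(ok[i], val[i], &r);   /* return value ignored on purpose */
        total += r;
    }
    return total;
}
```

With one successful call of length 5 followed by one failing call, the first variant counts the stale 5 a second time, while the second variant adds nothing for the failure. Checking the return value instead would avoid the problem as well.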

Re: [PATCH v2 13/22] hw/virtio-blk: fix -Werror=maybe-uninitialized false-positive

2024-09-25 Thread Stefano Garzarella


On Tue, Sep 24, 2024 at 05:05:44PM GMT, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

../hw/block/virtio-blk.c:1212:12: error: ‘rq’ may be used uninitialized 
[-Werror=maybe-uninitialized]

Signed-off-by: Marc-André Lureau 
Reviewed-by: Stefan Hajnoczi 
---
hw/block/virtio-blk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Stefano Garzarella 



diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 115795392c..9166d7974d 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1060,7 +1060,7 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool 
running,
VirtIOBlock *s = opaque;
uint16_t num_queues = s->conf.num_queues;
g_autofree VirtIOBlockReq **vq_rq = NULL;
-VirtIOBlockReq *rq;
+VirtIOBlockReq *rq = NULL;

if (!running) {
return;
--
2.45.2.827.g557ae147e6

Re: [PATCH v2 09/22] hw/vhost-scsi: fix -Werror=maybe-uninitialized

2024-09-25 Thread Stefano Garzarella


On Tue, Sep 24, 2024 at 05:05:40PM GMT, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

../hw/scsi/vhost-scsi.c:173:12: error: ‘ret’ may be used uninitialized 
[-Werror=maybe-uninitialized]

It can be reached when num_queues=0. It probably doesn't make much sense
to instantiate a vhost-scsi with 0 IO queues though. For now, make
vhost_scsi_set_workers() return success/0 anyway, when no workers have
been setup.


I agree: from vhost_scsi_set_workers() point of view, there is no need to
add a new worker in that case, so it should be fine to return 0.

If we really want to fail when num_queues=0, maybe it should be in
vhost_scsi_realize().



Signed-off-by: Marc-André Lureau 
---
hw/scsi/vhost-scsi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 49cff2a0cb..22d16dc26b 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -172,7 +172,7 @@ static int vhost_scsi_set_workers(VHostSCSICommon *vsc, 
bool per_virtqueue)
struct vhost_dev *dev = &vsc->dev;
struct vhost_vring_worker vq_worker;
struct vhost_worker_state worker;
-int i, ret;
+int i, ret = 0;

/* Use default worker */
if (!per_virtqueue || dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1) {


Another option could have been to edit this check:
  if (!per_virtqueue || dev->nvqs <= VHOST_SCSI_VQ_NUM_FIXED + 1) {
  return 0;
  }

But I'm fine with your change:

Reviewed-by: Stefano Garzarella
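
A minimal standalone sketch of the alternative check, assuming a hypothetical `attach_worker()` step and a hypothetical `VQ_NUM_FIXED` constant in place of the real vhost-scsi definitions. It keeps `ret` initialized as well, although with the `<=` check the loop always runs at least once when it is reached.

```c
#include <assert.h>

#define VQ_NUM_FIXED 2   /* hypothetical stand-in for VHOST_SCSI_VQ_NUM_FIXED */

/* Hypothetical per-queue setup step; always succeeds in this sketch. */
static int attach_worker(int vq_index)
{
    (void)vq_index;
    return 0;
}

static int set_workers(int nvqs, int per_virtqueue)
{
    int i, ret = 0;

    /* Default worker: also covers the case with no I/O queues at all. */
    if (!per_virtqueue || nvqs <= VQ_NUM_FIXED + 1) {
        return 0;
    }

    for (i = VQ_NUM_FIXED + 1; i < nvqs; i++) {
        ret = attach_worker(i);
        if (ret < 0) {
            break;
        }
    }
    return ret;
}
```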

Re: [PATCH V2 1/1] virtio-pci: Add lookup subregion of VirtIOPCIRegion MR

2024-09-03 Thread Stefano Garzarella

On Thu, Aug 29, 2024 at 01:13:43PM GMT, Gao,Shiyuan wrote:

>--- a/hw/virtio/virtio-pci.c
>+++ b/hw/virtio/virtio-pci.c
>@@ -610,19 +610,29 @@ static MemoryRegion 
*virtio_address_space_lookup(VirtIOPCIProxy *proxy,
> {
> int i;
> VirtIOPCIRegion *reg;
>+    MemoryRegion *mr = NULL;

`mr` looks unused.

>+    MemoryRegionSection mrs;

Please, can you move this declaration in the inner block where it's
used?

ok, I will move to inner block and remove unused `mr`.

>
> for (i = 0; i < ARRAY_SIZE(proxy->regs); ++i) {
> reg = &proxy->regs[i];
> if (*off >= reg->offset &&
> *off + len <= reg->offset + reg->size) {
>-    *off -= reg->offset;
>-    return &reg->mr;
>+    mrs = memory_region_find(&reg->mr, *off - reg->offset,
>len);
>+    if (!mrs.mr) {
>+    error_report("Failed to find memory region for address"
>+ "0x%" PRIx64 "", *off);
>+    return NULL;
>+    }
>+    *off = mrs.offset_within_region;
>+    memory_region_unref(mrs.mr);
>+    return mrs.mr;
> }
> }
>
> return NULL;
> }
>
>+

Unrelated change.

Perhaps the approach in Version 1 was clearer, but it was not universal.

Without this patch, only the common/isr/device/notify MRs are looked up
and their subregions are excluded.

When VHOST_USER_PROTOCOL_F_HOST_NOTIFIER is enabled, the notify MR has
host-notifier subregions and we need to use the host-notifier MR to
notify the hardware accelerator directly.

Furthermore, the common/isr/device MRs may also have subregions in
the future, so we need memory_region_find for each MR including
its subregions, which is more universal.

I see, I don't have much experience with this, but what you say I think
makes sense. I would wait for a comment from Michael or Jason.

Thanks,
Stefano
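
The idea of resolving an offset to the most specific subregion can be sketched without the QEMU memory API. Everything below is hypothetical (`struct region`, `lookup()`); it is not `memory_region_find()`, only an illustration of descending into subregions and rewriting the offset so that it is relative to the region that is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region tree; not QEMU's MemoryRegion. */
struct region {
    uint64_t addr;                /* offset inside the parent region */
    uint64_t size;
    const struct region *subs;    /* array of subregions, may be NULL */
    size_t nsubs;
};

/*
 * Return the innermost region that fully contains [*off, *off + len)
 * and make *off relative to that region.
 */
static const struct region *lookup(const struct region *r,
                                   uint64_t *off, uint64_t len)
{
    for (size_t i = 0; i < r->nsubs; i++) {
        const struct region *s = &r->subs[i];

        if (*off >= s->addr && *off + len <= s->addr + s->size) {
            *off -= s->addr;
            return lookup(s, off, len);
        }
    }
    return r;
}
```

Note that the containment test uses `<=` on the upper bound, so an access that ends exactly at the end of a subregion still resolves to that subregion.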

@@ -610,13 +610,22 @@ static MemoryRegion 
*virtio_address_space_lookup(VirtIOPCIProxy *proxy,
{
int i;
VirtIOPCIRegion *reg;
+MemoryRegion *mr, *submr;

for (i = 0; i < ARRAY_SIZE(proxy->regs); ++i) {
reg = &proxy->regs[i];
if (*off >= reg->offset &&
*off + len <= reg->offset + reg->size) {
*off -= reg->offset;
-return &reg->mr;
+mr = &reg->mr;
+QTAILQ_FOREACH(submr, &mr->subregions, subregions_link) {
+if (*off >= submr->addr &&
+*off + len < submr->addr + submr->size) {
+*off -= submr->addr;
+return submr;
+}
+}
+return mr;
}
}

Re: [PATCH] virtio/vhost-user: fix qemu crash when hotunplug vhost-user-net device

2024-09-03 Thread Stefano Garzarella

On Wed, Aug 28, 2024 at 02:50:57PM GMT, Zhenguo Yao wrote:

I am very sorry that my previous description was not accurate. Below I
will describe the steps to reproduce this problem and my analysis in
detail. The conditions for reproducing this problem are quite
demanding. First, let me introduce my environment. I use DPDK vdpa to
drive a DPU to implement a vhost-user type network card. The DPDK vdpa
software communicates with QEMU through the vhost protocol. The steps to
reproduce are as follows:

1. Start a Windows virtual machine.
2. Use the virsh command to hotplug and unplug a 32-queue
vhost-user-net network card.
3. After a while, QEMU will crash.

I added some logs to locate this problem. The following is how I
located the root cause of this problem based on logs and code
analysis. The steps to reproduce the problem involve hot plugging and
hot unplugging of network cards. Hot plugging of network cards
involves two processes. Process A is to insert the device into qemu,
and process B is the process of guest operating system initializing
the network card. When process A is completed, the virsh attach-device
command returns. At this time, you can call virsh detach-device to
perform hot unplug operations. Generally, this will not be a problem
because the network card initializes very quickly. However, since my
network card is a vhost-user type network card, and it is implemented
by DPDK vdpa, there is an asynchronous situation. When the Guest
operating system is initializing the network card, some messages from
vhost-user returned to QEMU may be delayed. The problem occurs here.

For example, the message VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG is
responsible for assigning the value of VhostUserHostNotifier->addr,
and it belongs to hotplug process B. Hot-unplugging a device also
involves two steps: libvirt sends two QMP commands to qemu, first
device_del and then netdev_del. If this message arrives after
the first hot-unplug command, there will be problems. The following is
a detailed analysis. The key call path of the device_del command is
virtio_set_status->vhost_dev_stop. The vhost_dev_stop function
traverses all queues and calls
vhost_user_get_vring_base-->vhost_user_host_notifier_remove. For the
queues where VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG has not
arrived yet, VhostUserHostNotifier->addr has no value, so nothing is
done. For those with a value, vhost_user_host_notifier_free is
submitted to the rcu thread and n->addr is cleared.

Next, libvirt will send the netdev_del QMP command.
netdev_del--> vhost_user_cleanup

Because VhostUserHostNotifier->addr is not empty for some queues at this
time, the call path is as follows:

1. vhost_user_host_notifier_remove->call_rcu(n,
vhost_user_host_notifier_free, rcu);
2. g_free_rcu(n, rcu);

Got it, so call_rcu() and g_free_rcu() must be avoided on the same node
in any case, right?

I went through docs/devel/rcu.txt and code, it's not explicit, but it
seems clear that only one should be used for cleanup.

The same rcu node is submitted twice. In call_rcu1, if two identical nodes
are added to the node list, only one is actually added, but
rcu_call_count is incremented twice. When the rcu thread processes a
node, it checks whether rcu_call_count is zero. If not, it takes a
node from the node list. Because the node list holds one node fewer
than rcu_call_count indicates, rcu_call_count stays non-zero while
the node list is empty. The rcu thread then aborts; the
code is as follows:
if (head == &dummy && qatomic_mb_read(&tail) == &dummy.next) {
   abort();
}
This is the reason why QEMU crashed. Since the latest QEMU cannot run
in our environment due to incompatibility of vhost-user messages, and
this problem is difficult to reproduce, I was unable to reproduce the
problem on the latest qemu. However, from the upstream code, since
VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG is asynchronous, we cannot
assume that all VhostUserHostNotifier->addr in vhost_user_cleanup are
empty. If addr is not empty, there will be a problem. Otherwise,
vhost_user_host_notifier_remove is not necessary to call in
vhost_user_cleanup.
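
The effect of queueing one node twice can be reproduced with a small single-threaded sketch. It assumes a simplified intrusive queue with a tail pointer; `struct node`, `struct queue` and the helpers are hypothetical and leave out the dummy node and the atomics used by the real RCU code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the RCU callback queue. */
struct node {
    struct node *next;
};

struct queue {
    struct node *head;    /* first queued node */
    struct node **tail;   /* address of the last 'next' pointer */
    int count;            /* number of enqueue requests */
};

static void queue_init(struct queue *q)
{
    q->head = NULL;
    q->tail = &q->head;
    q->count = 0;
}

/* Append a node by linking it through the saved tail pointer. */
static void enqueue(struct queue *q, struct node *n)
{
    struct node **old_tail;

    n->next = NULL;
    old_tail = q->tail;
    q->tail = &n->next;
    *old_tail = n;        /* second enqueue of n makes n->next == n */
    q->count++;
}

/* Count distinct nodes reachable from head, stopping at a self-loop. */
static int reachable_nodes(const struct queue *q, int limit)
{
    const struct node *n = q->head;
    int seen = 0;

    while (n && seen < limit) {
        seen++;
        if (n->next == n) {
            break;
        }
        n = n->next;
    }
    return seen;
}
```

After enqueueing the same node twice, the counter reports two pending entries, but only one node is reachable and it points to itself. This is the mismatch between rcu_call_count and the node list described above.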

Therefore, the two tasks, vhost_user_host_notifier_free and freeing the
VhostUserHostNotifier, need to be placed in a single rcu node, with
n->need_free determining whether the VhostUserHostNotifier is freed.

yeah, thank you for this detailed analysis, now it's clear.

My suggestion is to put some of it in the commit description as well,
maybe a summary with the main things.

On Tue, Aug 27, 2024 at 16:00, Stefano Garzarella wrote:

On Wed, Aug 07, 2024 at 05:55:08PM GMT, yaozhenguo wrote:
>When hotplug and hotunplug vhost-user-net device quickly.

I'd replace the . with ,

>qemu will crash. BT is as below:
>
>0  __pthread_kill_implementation () at /usr/lib64/libc.so.6

Re: [PATCH v5 00/16] Introduce support for IGVM files

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:02PM GMT, Roy Hopkins wrote:

Here is v5 of the set of patches to add support for IGVM files to QEMU. This is
based on commit 0f397dcfec of qemu.

This version addresses the review comments from v4 [1] plus changes required to
rebase on the master commit. As always, thanks to those that have been following
along, reviewing and testing this series. This v5 patch series is also available
on github: [2]

For testing IGVM support in QEMU you need to generate an IGVM file that is
configured for the platform you want to launch. You can use the `buildigvm`
test tool [3] to allow generation of IGVM files for all currently supported
platforms. Patch 10/16 contains information on how to generate an IGVM file
using this tool.


I left some minor comments. The patches I didn't respond to go into more
detail than my knowledge covers, but I looked at them and didn't find
anything obviously wrong, so for those feel free to add:

Acked-by: Stefano Garzarella 

Thanks,
Stefano



Changes in v5:

* Fix indentation and apply minimum version check for IGVM library in
 meson.build
* Remove unneeded duplicate macro definitions in confidential-guest-support.h
 and igvm-cfg.h
* Make igvm-cfg object file parameter mandatory instead of optional. Removed
 unused 'igvm_process()' function that checked the file was provided.
* Rename all QEMU IGVM functions and structs using QIGVM/qigvm prefix.
* A few small readability/style fixes.
* Address review comments on error handling, including removal of the v4 patch
 6: "Fix error handling in sev_encrypt_flash()".
* Update `FirmwareMapping` union in firmware.json to include `igvm`.

Patch summary:

1-11: Add support and documentation for processing IGVM files for SEV, SEV-ES,
SEV-SNP and native platforms.

12-15: Processing of policy and SEV-SNP ID_BLOCK from IGVM file.

16: Add pre-processing of IGVM file to support synchronization of 'SEV_FEATURES'
from IGVM VMSA to KVM.

[1] Link to v4:
https://lore.kernel.org/qemu-devel/cover.1720004383.git.roy.hopk...@suse.com/

[2] v5 patches also available here:
https://github.com/roy-hopkins/qemu/tree/igvm_master_v5

[3] `buildigvm` tool v0.2.0
https://github.com/roy-hopkins/buildigvm/releases/tag/v0.2.0

Roy Hopkins (16):
 meson: Add optional dependency on IGVM library
 backends/confidential-guest-support: Add functions to support IGVM
 backends/igvm: Add IGVM loader and configuration
 hw/i386: Add igvm-cfg object and processing for IGVM files
 i386/pc_sysfw: Ensure sysfw flash configuration does not conflict with
   IGVM
 sev: Update launch_update_data functions to use Error handling
 target/i386: Allow setting of R_LDTR and R_TR with
   cpu_x86_load_seg_cache()
 i386/sev: Refactor setting of reset vector and initial CPU state
 i386/sev: Implement ConfidentialGuestSupport functions for SEV
 docs/system: Add documentation on support for IGVM
 docs/interop/firmware.json: Add igvm to FirmwareDevice
 backends/confidential-guest-support: Add set_guest_policy() function
 backends/igvm: Process initialization sections in IGVM file
 backends/igvm: Handle policy for SEV guests
 i386/sev: Add implementation of CGS set_guest_policy()
 sev: Provide sev_features flags from IGVM VMSA to KVM_SEV_INIT2

backends/confidential-guest-support.c  |  43 +
backends/igvm-cfg.c|  52 ++
backends/igvm.c| 964 +
backends/igvm.h|  23 +
backends/meson.build   |   5 +
docs/interop/firmware.json |  30 +-
docs/system/i386/amd-memory-encryption.rst |   2 +
docs/system/igvm.rst   | 173 
docs/system/index.rst  |   1 +
hw/i386/pc.c   |  12 +
hw/i386/pc_piix.c  |  10 +
hw/i386/pc_q35.c   |  10 +
hw/i386/pc_sysfw.c |  31 +-
include/exec/confidential-guest-support.h  |  86 ++
include/hw/i386/x86.h  |   3 +
include/sysemu/igvm-cfg.h  |  47 +
meson.build|   8 +
meson_options.txt  |   2 +
qapi/qom.json  |  17 +
qemu-options.hx|  25 +
scripts/meson-buildoptions.sh  |   3 +
target/i386/cpu.h  |   9 +-
target/i386/sev.c  | 850 --
target/i386/sev.h  | 124 +++
24 files changed, 2446 insertions(+), 84 deletions(-)
create mode 100644 backends/igvm-cfg.c
create mode 100644 backends/igvm.c
create mode 100644 backends/igvm.h
create mode 100644 docs/system/igvm.rst
create mode 100644 include/sysemu/igvm-cfg.h

--
2.43.0

Re: [PATCH v5 13/16] backends/igvm: Process initialization sections in IGVM file

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:15PM GMT, Roy Hopkins wrote:

The initialization sections in IGVM files contain configuration that
should be applied to the guest platform before it is started. This
includes guest policy and other information that can affect the security
level and the startup measurement of a guest.

This commit introduces handling of the initialization sections during
processing of the IGVM file.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
---
backends/igvm.c | 21 +
1 file changed, 21 insertions(+)


Reviewed-by: Stefano Garzarella 
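
As a rough illustration of the pattern described in the commit message
above (a negative header count is treated as an error, otherwise every
header in the section is passed to a handler and processing stops at the
first failure), here is a self-contained sketch. `process_section`,
`HeaderHandler`, `Probe` and `probe_handler` are hypothetical names and do
not come from the IGVM library or from QEMU:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type: returns a negative value on failure. */
typedef int (*HeaderHandler)(unsigned index, void *opaque);

/*
 * Walk all headers of one section. A negative header_count is treated as
 * an error code reported by the parser; the walk stops at the first
 * handler that fails.
 */
static int process_section(int header_count, HeaderHandler handler,
                           void *opaque)
{
    unsigned i;

    if (header_count < 0) {
        return -1;
    }
    for (i = 0; i < (unsigned)header_count; i++) {
        if (handler(i, opaque) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Example handler state: counts calls and fails at index fail_at. */
typedef struct Probe {
    unsigned calls;
    unsigned fail_at;
} Probe;

static int probe_handler(unsigned index, void *opaque)
{
    Probe *p = opaque;

    p->calls++;
    return index == p->fail_at ? -1 : 0;
}
```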



diff --git a/backends/igvm.c b/backends/igvm.c
index 7a3fedcc76..9120922a95 100644
--- a/backends/igvm.c
+++ b/backends/igvm.c
@@ -787,6 +787,27 @@ int qigvm_process_file(IgvmCfg *cfg, ConfidentialGuestSupport *cgs,
}
}

+header_count =
+igvm_header_count(ctx.file, IGVM_HEADER_SECTION_INITIALIZATION);
+if (header_count < 0) {
+error_setg(
+errp,
+"Invalid initialization header count in IGVM file. Error code: %X",
+header_count);
+return -1;
+}
+
+for (ctx.current_header_index = 0;
+ ctx.current_header_index < (unsigned)header_count;
+ ctx.current_header_index++) {
+IgvmVariableHeaderType type =
+igvm_get_header_type(ctx.file, IGVM_HEADER_SECTION_INITIALIZATION,
+ ctx.current_header_index);
+if (qigvm_handler(&ctx, type, errp) < 0) {
+goto cleanup;
+}
+}
+
/*
 * Contiguous pages of data with compatible flags are grouped together in
 * order to reduce the number of memory regions we create. Make sure the
--
2.43.0

Re: [PATCH v5 12/16] backends/confidential-guest-support: Add set_guest_policy() function

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:14PM GMT, Roy Hopkins wrote:

For confidential guests a policy can be provided that defines the
security level, debug status, expected launch measurement and other
parameters that define the configuration of the confidential platform.

This commit adds a new function named set_guest_policy() that can be
implemented by each confidential platform, such as AMD SEV to set the
policy. This will allow configuration of the policy from a
multi-platform resource such as an IGVM file without the IGVM processor
requiring specific implementation details for each platform.

Signed-off-by: Roy Hopkins 
Reviewed-by: Daniel P. Berrangé 
Acked-by: Michael S. Tsirkin 
---
backends/confidential-guest-support.c | 12 
include/exec/confidential-guest-support.h | 21 +
2 files changed, 33 insertions(+)


Reviewed-by: Stefano Garzarella 
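
The design described above (a class-level callback with a default that
reports "not supported", overridden by each platform) can be sketched
outside of QOM like this. All names here (`GuestSupportClass`,
`PolicyType`, `sev_policy`, and the helper functions) are hypothetical
simplifications, and the error is reduced to a plain string instead of
QEMU's `Error` object:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy type. */
typedef enum { POLICY_SEV } PolicyType;

/* Hypothetical class structure holding the overridable callback. */
typedef struct GuestSupportClass {
    int (*set_guest_policy)(PolicyType type, uint64_t policy,
                            const char **err);
} GuestSupportClass;

/* Default: platforms that do not override this report an error. */
static int default_set_guest_policy(PolicyType type, uint64_t policy,
                                    const char **err)
{
    (void)type;
    (void)policy;
    if (err) {
        *err = "Setting confidential guest policy is not supported";
    }
    return -1;
}

/* Last policy accepted by the example platform below. */
static uint64_t sev_policy;

/* Example platform override: accepts only its own policy type. */
static int sev_set_guest_policy(PolicyType type, uint64_t policy,
                                const char **err)
{
    if (type != POLICY_SEV) {
        if (err) {
            *err = "unexpected policy type";
        }
        return -1;
    }
    sev_policy = policy;
    return 0;
}

static void base_class_init(GuestSupportClass *c)
{
    c->set_guest_policy = default_set_guest_policy;
}

static void sev_class_init(GuestSupportClass *c)
{
    base_class_init(c);
    c->set_guest_policy = sev_set_guest_policy;
}
```

The caller only ever goes through the class pointer, so it needs no
platform-specific knowledge, which is the point made in the commit message.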



diff --git a/backends/confidential-guest-support.c b/backends/confidential-guest-support.c
index 68e6fd9d18..3c46b2cd6b 100644
--- a/backends/confidential-guest-support.c
+++ b/backends/confidential-guest-support.c
@@ -38,6 +38,17 @@ static int set_guest_state(hwaddr gpa, uint8_t *ptr, uint64_t len,
return -1;
}

+static int set_guest_policy(ConfidentialGuestPolicyType policy_type,
+uint64_t policy,
+void *policy_data1, uint32_t policy_data1_size,
+void *policy_data2, uint32_t policy_data2_size,
+Error **errp)
+{
+error_setg(errp,
+   "Setting confidential guest policy is not supported for this platform");
+return -1;
+}
+
static int get_mem_map_entry(int index, ConfidentialGuestMemoryMapEntry *entry,
 Error **errp)
{
@@ -52,6 +63,7 @@ static void confidential_guest_support_class_init(ObjectClass *oc, void *data)
ConfidentialGuestSupportClass *cgsc = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
cgsc->check_support = check_support;
cgsc->set_guest_state = set_guest_state;
+cgsc->set_guest_policy = set_guest_policy;
cgsc->get_mem_map_entry = get_mem_map_entry;
}

diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index 058c7535ca..6a9ccc2454 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -59,6 +59,10 @@ typedef enum ConfidentialGuestPageType {
CGS_PAGE_TYPE_REQUIRED_MEMORY,
} ConfidentialGuestPageType;

+typedef enum ConfidentialGuestPolicyType {
+GUEST_POLICY_SEV,
+} ConfidentialGuestPolicyType;
+
struct ConfidentialGuestSupport {
Object parent;

@@ -123,6 +127,23 @@ typedef struct ConfidentialGuestSupportClass {
   ConfidentialGuestPageType memory_type,
   uint16_t cpu_index, Error **errp);

+/*
+ * Set the guest policy. The policy can be used to configure the
+ * confidential platform, such as if debug is enabled or not and can
+ * contain information about expected launch measurements, signed
+ * verification of guest configuration and other platform data.
+ *
+ * The format of the policy data is specific to each platform. For example,
+ * SEV-SNP uses a policy bitfield in the 'policy' argument and provides an
+ * ID block and ID authentication in the 'policy_data' parameters. The type
+ * of policy data is identified by the 'policy_type' argument.
+ */
+int (*set_guest_policy)(ConfidentialGuestPolicyType policy_type,
+uint64_t policy,
+void *policy_data1, uint32_t policy_data1_size,
+void *policy_data2, uint32_t policy_data2_size,
+Error **errp);
+
/*
 * Iterate the system memory map, getting the entry with the given index
 * that can be populated into guest memory.
--
2.43.0

Re: [PATCH v5 11/16] docs/interop/firmware.json: Add igvm to FirmwareDevice

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:13PM GMT, Roy Hopkins wrote:

Create an enum entry within FirmwareDevice for 'igvm' to describe that
an IGVM file can be used to map firmware into memory as an alternative
to pre-existing firmware devices.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
---
docs/interop/firmware.json | 30 --
1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
index 57f55f6c54..59ae39cb13 100644
--- a/docs/interop/firmware.json
+++ b/docs/interop/firmware.json
@@ -57,10 +57,17 @@
#
# @memory: The firmware is to be mapped into memory.
#
+# @igvm: The firmware is defined by a file conforming to the IGVM
+#specification and mapped into memory according to directives
+#defined in the file. This is similar to @memory but may
+#include additional processing defined by the IGVM file
+#including initial CPU state or population of metadata into
+#the guest address space. Since: 9.1


Since: 9.2


+#
# Since: 3.0
##
{ 'enum' : 'FirmwareDevice',
-  'data' : [ 'flash', 'kernel', 'memory' ] }
+  'data' : [ 'flash', 'kernel', 'memory', 'igvm' ] }

##
# @FirmwareArchitecture:
@@ -367,6 +374,24 @@
{ 'struct' : 'FirmwareMappingMemory',
  'data'   : { 'filename' : 'str' } }

+##
+# @FirmwareMappingIgvm:
+#
+# Describes loading and mapping properties for the firmware executable,
+# when @FirmwareDevice is @igvm.
+#
+# @filename: Identifies the IGVM file containing the firmware executable
+#along with other information used to configure the initial
+#state of the guest. The IGVM file may be shared by multiple
+#virtual machine definitions. This corresponds to creating
+#    an object on the command line with "-object igvm-cfg,
+#file=@filename".
+#
+# Since: 9.1


Ditto

With them fixed:

Reviewed-by: Stefano Garzarella 


+##
+{ 'struct' : 'FirmwareMappingIgvm',
+  'data'   : { 'filename' : 'str' } }
+
##
# @FirmwareMapping:
#
@@ -383,7 +408,8 @@
  'discriminator' : 'device',
  'data'  : { 'flash'  : 'FirmwareMappingFlash',
  'kernel' : 'FirmwareMappingKernel',
-  'memory' : 'FirmwareMappingMemory' } }
+  'memory' : 'FirmwareMappingMemory',
+  'igvm'   : 'FirmwareMappingIgvm' } }

##
# @Firmware:
--
2.43.0

Re: [PATCH v5 07/16] target/i386: Allow setting of R_LDTR and R_TR with cpu_x86_load_seg_cache()

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:09PM GMT, Roy Hopkins wrote:

The x86 segment registers are identified by the X86Seg enumeration which
includes LDTR and TR as well as the normal segment registers. The
function 'cpu_x86_load_seg_cache()' uses the enum to determine which
segment to set. However, specifying R_LDTR or R_TR results in an
out-of-bounds access of the segment array.

Possibly by coincidence, the function does correctly set LDTR or TR in
this case as the structures for these registers immediately follow the
array which is accessed out of bounds.

This patch adds correct handling for R_LDTR and R_TR in the function.

Signed-off-by: Roy Hopkins 
Reviewed-by: Michael S. Tsirkin 
---
target/i386/cpu.h | 9 -
1 file changed, 8 insertions(+), 1 deletion(-)


Reviewed-by: Stefano Garzarella 
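
To illustrate the out-of-bounds access described in the commit message
above, here is a small stand-alone sketch. The types and names (`SegCache`,
`CpuState`, `seg_cache_for`, the `SEG_*` constants) are hypothetical
simplifications; the layout assumed here, a six-entry segment array
followed by separate LDTR and TR entries, mirrors what the commit message
describes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register identifiers; LDTR and TR follow the six segments. */
enum { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS, SEG_LDTR, SEG_TR };
#define SEG_COUNT 6

typedef struct SegCache {
    uint32_t selector;
    uint64_t base;
    uint32_t limit;
} SegCache;

typedef struct CpuState {
    SegCache segs[SEG_COUNT];  /* indexing with SEG_LDTR/SEG_TR overruns this */
    SegCache ldt;
    SegCache tr;
} CpuState;

/*
 * Pick the cache entry for a register without ever indexing past the end
 * of segs[]. Returns NULL for an unknown register id.
 */
static SegCache *seg_cache_for(CpuState *env, int seg_reg)
{
    if (seg_reg == SEG_LDTR) {
        return &env->ldt;
    }
    if (seg_reg == SEG_TR) {
        return &env->tr;
    }
    if (seg_reg < 0 || seg_reg >= SEG_COUNT) {
        return NULL;
    }
    return &env->segs[seg_reg];
}
```

Unlike the patch, the sketch also rejects unknown ids; that extra check is
an addition for the example only.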



diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c6cc035df3..227bf2600a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2256,7 +2256,14 @@ static inline void cpu_x86_load_seg_cache(CPUX86State *env,
SegmentCache *sc;
unsigned int new_hflags;

-sc = &env->segs[seg_reg];
+if (seg_reg == R_LDTR) {
+sc = &env->ldt;
+} else if (seg_reg == R_TR) {
+sc = &env->tr;
+} else {
+sc = &env->segs[seg_reg];
+}
+
sc->selector = selector;
sc->base = base;
sc->limit = limit;
--
2.43.0

Re: [PATCH v5 06/16] sev: Update launch_update_data functions to use Error handling

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:08PM GMT, Roy Hopkins wrote:

The class function and implementations for updating launch data return
a code in case of error. In some cases an error message is generated and
in other cases, just the error return value is used.

This small refactor adds an 'Error **errp' parameter to all functions
which consistently set an error condition if a non-zero value is
returned.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
---
target/i386/sev.c | 68 +++
1 file changed, 33 insertions(+), 35 deletions(-)


Reviewed-by: Stefano Garzarella 
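
The convention described in the commit message above (every failing path
both returns an error code and fills in an error object supplied by the
caller) can be sketched without QEMU as follows. `MiniError`,
`mini_error_set` and `launch_update` are hypothetical names standing in
for QEMU's `Error`, `error_setg()` and the launch-update functions; this is
not the QEMU API itself:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified error object. */
typedef struct MiniError {
    char msg[128];
} MiniError;

/* Store a formatted message in *errp if the caller asked for one. */
static void mini_error_set(MiniError **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;  /* caller is not interested in the details */
    }
    *errp = malloc(sizeof(**errp));
    if (!*errp) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/*
 * Consistent behaviour: a negative return value always comes with an
 * error message, and success never touches errp.
 */
static int launch_update(int ioctl_ret, MiniError **errp)
{
    if (ioctl_ret) {
        mini_error_set(errp, "LAUNCH_UPDATE ret=%d", ioctl_ret);
        return -1;
    }
    return 0;
}
```

With this shape the caller decides whether to print, propagate or ignore
the message, instead of each helper reporting it on its own.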



diff --git a/target/i386/sev.c b/target/i386/sev.c
index a0d271f898..fab6d1bfb4 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -121,7 +121,8 @@ struct SevCommonStateClass {
   Error **errp);
int (*launch_start)(SevCommonState *sev_common);
void (*launch_finish)(SevCommonState *sev_common);
-int (*launch_update_data)(SevCommonState *sev_common, hwaddr gpa, uint8_t *ptr, size_t len);
+int (*launch_update_data)(SevCommonState *sev_common, hwaddr gpa,
+  uint8_t *ptr, size_t len, Error **errp);
int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
};

@@ -977,9 +978,8 @@ sev_snp_mask_cpuid_features(X86ConfidentialGuest *cg, uint32_t feature, uint32_t
return value;
}

-static int
-sev_launch_update_data(SevCommonState *sev_common, hwaddr gpa,
-   uint8_t *addr, size_t len)
+static int sev_launch_update_data(SevCommonState *sev_common, hwaddr gpa,
+  uint8_t *addr, size_t len, Error **errp)
{
int ret, fw_error;
struct kvm_sev_launch_update_data update;
@@ -994,8 +994,8 @@ sev_launch_update_data(SevCommonState *sev_common, hwaddr gpa,
ret = sev_ioctl(sev_common->sev_fd, KVM_SEV_LAUNCH_UPDATE_DATA,
&update, &fw_error);
if (ret) {
-error_report("%s: LAUNCH_UPDATE ret=%d fw_error=%d '%s'",
-__func__, ret, fw_error, fw_error_to_str(fw_error));
+error_setg(errp, "%s: LAUNCH_UPDATE ret=%d fw_error=%d '%s'", __func__,
+   ret, fw_error, fw_error_to_str(fw_error));
}

return ret;
@@ -1123,8 +1123,8 @@ sev_launch_finish(SevCommonState *sev_common)
migrate_add_blocker(&sev_mig_blocker, &error_fatal);
}

-static int
-snp_launch_update_data(uint64_t gpa, void *hva, size_t len, int type)
+static int snp_launch_update_data(uint64_t gpa, void *hva, size_t len,
+  int type, Error **errp)
{
SevLaunchUpdateData *data;

@@ -1139,23 +1139,21 @@ snp_launch_update_data(uint64_t gpa, void *hva, size_t len, int type)
return 0;
}

-static int
-sev_snp_launch_update_data(SevCommonState *sev_common, hwaddr gpa,
-   uint8_t *ptr, size_t len)
+static int sev_snp_launch_update_data(SevCommonState *sev_common, hwaddr gpa,
+  uint8_t *ptr, size_t len, Error **errp)
{
-   int ret = snp_launch_update_data(gpa, ptr, len,
- KVM_SEV_SNP_PAGE_TYPE_NORMAL);
-   return ret;
+return snp_launch_update_data(gpa, ptr, len,
+ KVM_SEV_SNP_PAGE_TYPE_NORMAL, errp);
}

static int
sev_snp_cpuid_info_fill(SnpCpuidInfo *snp_cpuid_info,
-const KvmCpuidInfo *kvm_cpuid_info)
+const KvmCpuidInfo *kvm_cpuid_info, Error **errp)
{
size_t i;

if (kvm_cpuid_info->cpuid.nent > SNP_CPUID_FUNCTION_MAXCOUNT) {
-error_report("SEV-SNP: CPUID entry count (%d) exceeds max (%d)",
+error_setg(errp, "SEV-SNP: CPUID entry count (%d) exceeds max (%d)",
 kvm_cpuid_info->cpuid.nent, SNP_CPUID_FUNCTION_MAXCOUNT);
return -1;
}
@@ -1197,8 +1195,8 @@ sev_snp_cpuid_info_fill(SnpCpuidInfo *snp_cpuid_info,
return 0;
}

-static int
-snp_launch_update_cpuid(uint32_t cpuid_addr, void *hva, size_t cpuid_len)
+static int snp_launch_update_cpuid(uint32_t cpuid_addr, void *hva,
+   size_t cpuid_len, Error **errp)
{
KvmCpuidInfo kvm_cpuid_info = {0};
SnpCpuidInfo snp_cpuid_info;
@@ -1215,26 +1213,25 @@ snp_launch_update_cpuid(uint32_t cpuid_addr, void *hva, size_t cpuid_len)
} while (ret == -E2BIG);

if (ret) {
-error_report("SEV-SNP: unable to query CPUID values for CPU: '%s'",
- strerror(-ret));
-return 1;
+error_setg(errp, "SEV-SNP: unable to query CPUID values for CPU: '%s'",
+   strerror(-ret));
+return -1;
}

-ret = sev_snp_cpuid_info_fill(&snp_cpuid_info, &kvm_cpuid_info);
-if (ret) {
-error_report("SEV-SNP: failed to generate CPUID tab

Re: [PATCH v5 05/16] i386/pc_sysfw: Ensure sysfw flash configuration does not conflict with IGVM

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:07PM GMT, Roy Hopkins wrote:

When using an IGVM file the configuration of the system firmware is
defined by IGVM directives contained in the file. In this case the user
should not configure any pflash devices.

This commit skips initialization of the ROM mode when pflash0 is not set
then checks to ensure no pflash devices have been configured when using
IGVM, exiting with an error message if this is not the case.

Signed-off-by: Roy Hopkins 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Michael S. Tsirkin 
---
hw/i386/pc_sysfw.c | 31 ---
1 file changed, 28 insertions(+), 3 deletions(-)


Reviewed-by: Stefano Garzarella 
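
The two rules from the commit message above can be condensed into a pair
of small predicates. This is only a sketch under simplifying assumptions:
`should_load_bios_rom` and `flash_config_allowed` are hypothetical helpers,
and the flash devices are reduced to an array of "configured" flags:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * ROM mode is only used when no pflash0 is set and no IGVM file is in
 * use; with IGVM the firmware has to come from the IGVM file.
 */
static bool should_load_bios_rom(bool igvm_in_use, bool pflash0_set)
{
    return !igvm_in_use && !pflash0_set;
}

/*
 * With IGVM in use, no pflash device may be configured at all. Without
 * IGVM any flash configuration is acceptable.
 */
static bool flash_config_allowed(bool igvm_in_use,
                                 const bool flash_present[], size_t count)
{
    size_t i;

    if (!igvm_in_use) {
        return true;
    }
    for (i = 0; i < count; i++) {
        if (flash_present[i]) {
            return false;
        }
    }
    return true;
}
```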



diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index ef80281d28..f5e40b3ef6 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -219,7 +219,13 @@ void pc_system_firmware_init(PCMachineState *pcms,
BlockBackend *pflash_blk[ARRAY_SIZE(pcms->flash)];

if (!pcmc->pci_enabled) {
-x86_bios_rom_init(X86_MACHINE(pcms), "bios.bin", rom_memory, true);
+/*
+ * If an IGVM file is specified then the firmware must be provided
+ * in the IGVM file.
+ */
+if (!X86_MACHINE(pcms)->igvm) {
+x86_bios_rom_init(X86_MACHINE(pcms), "bios.bin", rom_memory, true);
+}
return;
}

@@ -239,8 +245,13 @@ void pc_system_firmware_init(PCMachineState *pcms,
}

if (!pflash_blk[0]) {
-/* Machine property pflash0 not set, use ROM mode */
-x86_bios_rom_init(X86_MACHINE(pcms), "bios.bin", rom_memory, false);
+/*
+ * Machine property pflash0 not set, use ROM mode unless using IGVM,
+ * in which case the firmware must be provided by the IGVM file.
+ */
+if (!X86_MACHINE(pcms)->igvm) {
+x86_bios_rom_init(X86_MACHINE(pcms), "bios.bin", rom_memory, false);
+}
} else {
if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
/*
@@ -256,6 +267,20 @@ void pc_system_firmware_init(PCMachineState *pcms,
}

pc_system_flash_cleanup_unused(pcms);
+
+/*
+ * The user should not have specified any pflash devices when using IGVM
+ * to configure the guest.
+ */
+if (X86_MACHINE(pcms)->igvm) {
+for (i = 0; i < ARRAY_SIZE(pcms->flash); i++) {
+if (pcms->flash[i]) {
+error_report("pflash devices cannot be configured when "
+ "using IGVM");
+exit(1);
+}
+}
+}
}

void x86_firmware_configure(hwaddr gpa, void *ptr, int size)
--
2.43.0

Re: [PATCH v5 04/16] hw/i386: Add igvm-cfg object and processing for IGVM files

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:06PM GMT, Roy Hopkins wrote:

An IGVM file contains configuration of guest state that should be
applied during configuration of the guest, before the guest is started.

This patch allows the user to add an igvm-cfg object to an X86 machine
configuration that allows an IGVM file to be configured that will be
applied to the guest before it is started.

If an IGVM configuration is provided then the IGVM file is processed at
the end of the board initialization, before the state transition to
PHASE_MACHINE_INITIALIZED.

Signed-off-by: Roy Hopkins 
Reviewed-by: Michael S. Tsirkin 
---
hw/i386/pc.c  | 12 
hw/i386/pc_piix.c | 10 ++
hw/i386/pc_q35.c  | 10 ++
include/hw/i386/x86.h |  3 +++
qemu-options.hx   | 25 +
5 files changed, 60 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c74931d577..30bbe05e3e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1827,6 +1827,18 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
object_class_property_add_bool(oc, "fd-bootchk",
pc_machine_get_fd_bootchk,
pc_machine_set_fd_bootchk);
+
+#if defined(CONFIG_IGVM)
+object_class_property_add_link(oc, "igvm-cfg",
+   TYPE_IGVM_CFG,
+   offsetof(X86MachineState, igvm),
+   object_property_allow_set_link,
+   OBJ_PROP_LINK_STRONG);
+object_class_property_set_description(oc, "igvm-cfg",
+  "Set IGVM configuration");
+#endif
+
+
}

static const TypeInfo pc_machine_info = {
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index d9e69243b4..78367985b4 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -365,6 +365,16 @@ static void pc_init1(MachineState *machine, const char *pci_type)
   x86_nvdimm_acpi_dsmio,
   x86ms->fw_cfg, OBJECT(pcms));
}
+
+#if defined(CONFIG_IGVM)
+/* Apply guest state from IGVM if supplied */
+if (x86ms->igvm) {
+if (IGVM_CFG_GET_CLASS(x86ms->igvm)
+->process(x86ms->igvm, machine->cgs, &error_fatal) < 0) {
+g_assert_not_reached();
+}
+}
+#endif
}

typedef enum PCSouthBridgeOption {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 9d108b194e..08ef8dc17a 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -329,6 +329,16 @@ static void pc_q35_init(MachineState *machine)
   x86_nvdimm_acpi_dsmio,
   x86ms->fw_cfg, OBJECT(pcms));
}
+
+#if defined(CONFIG_IGVM)
+/* Apply guest state from IGVM if supplied */
+if (x86ms->igvm) {
+if (IGVM_CFG_GET_CLASS(x86ms->igvm)
+->process(x86ms->igvm, machine->cgs, &error_fatal) < 0) {
+g_assert_not_reached();
+}
+}
+#endif
}

#define DEFINE_Q35_MACHINE(major, minor) \
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index d43cb3908e..01ac29acf6 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -25,6 +25,7 @@
#include "hw/intc/ioapic.h"
#include "hw/isa/isa.h"
#include "qom/object.h"
+#include "sysemu/igvm-cfg.h"

struct X86MachineClass {
/*< private >*/
@@ -97,6 +98,8 @@ struct X86MachineState {
 * which means no limitation on the guest's bus locks.
 */
uint64_t bus_lock_ratelimit;
+
+IgvmCfg *igvm;
};

#define X86_MACHINE_SMM  "smm"
diff --git a/qemu-options.hx b/qemu-options.hx
index cee0da2014..b6eee49075 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5927,6 +5927,31 @@ SRST
 -machine ...,memory-encryption=sev0 \\
 .

+``-object igvm-cfg,file=file``
+Create an IGVM configuration object that defines the initial state
+of the guest using a file that conforms to the Independent Guest
+Virtual Machine (IGVM) file format.
+
+The ``file`` parameter is used to specify the IGVM file to load.
+When provided, the IGVM file is used to populate the initial
+memory of the virtual machine and, depending on the platform, can
+define the initial processor state, memory map and parameters.
+
+The IGVM file is expected to contain the firmware for the virtual
+machine, therefore an ``igvm-cfg`` object cannot be provided along
+with other ways of specifying firmware, such as the ``-bios``
+parameter on x86 machines.
+
+e.g. to launch a machine providing the firmware in an IGVM file
+
+.. parsed-literal::
+
+ # |qemu_system_x86| \\
+ .. \\
+ -object igvm-cfg,id=igvm0,file=bios.igvm \\
+ -machine ...,igvm-cfg=igvm0 \\
+ .
+


Should we mention that this is supported only by `q35` and `pc` machines?


``-object authz-simple,id=id,identity=stri

Re: [PATCH v5 03/16] backends/igvm: Add IGVM loader and configuration

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:05PM GMT, Roy Hopkins wrote:

Adds an IGVM loader to QEMU which processes a given IGVM file and
applies the directives within the file to the current guest
configuration.

The IGVM loader can be used to configure both confidential and
non-confidential guests. For confidential guests, the
ConfidentialGuestSupport object for the system is used to encrypt
memory, apply the initial CPU state and perform other confidential guest
operations.

The loader is configured via a new IgvmCfg QOM object which allows the
user to provide a path to the IGVM file to process.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
---
backends/igvm-cfg.c   |  52 +++
backends/igvm.c   | 805 ++
backends/igvm.h   |  23 ++
backends/meson.build  |   2 +
include/sysemu/igvm-cfg.h |  47 +++
qapi/qom.json |  17 +
6 files changed, 946 insertions(+)
create mode 100644 backends/igvm-cfg.c
create mode 100644 backends/igvm.c
create mode 100644 backends/igvm.h
create mode 100644 include/sysemu/igvm-cfg.h
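
The `file` property described above boils down to a getter and a setter
that each work on their own copy of the string. A minimal stand-alone
sketch, using plain libc instead of GLib and QOM, could look like this;
`Cfg`, `dup_string`, `cfg_set_file` and `cfg_get_file` are hypothetical
names for illustration only:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration object holding the IGVM file name. */
typedef struct Cfg {
    char *filename;
} Cfg;

/* Duplicate a string; NULL in gives NULL out. */
static char *dup_string(const char *s)
{
    size_t n;
    char *copy;

    if (!s) {
        return NULL;
    }
    n = strlen(s) + 1;
    copy = malloc(n);
    if (copy) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Setter: release the previous value, then store a private copy. */
static void cfg_set_file(Cfg *cfg, const char *value)
{
    free(cfg->filename);
    cfg->filename = dup_string(value);
}

/* Getter: hand out a copy that the caller owns and must free. */
static char *cfg_get_file(const Cfg *cfg)
{
    return dup_string(cfg->filename);
}
```

Freeing the old value in the setter is what keeps repeated assignments
from leaking memory.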

diff --git a/backends/igvm-cfg.c b/backends/igvm-cfg.c
new file mode 100644
index 00..63f8856c7b
--- /dev/null
+++ b/backends/igvm-cfg.c
@@ -0,0 +1,52 @@
+/*
+ * QEMU IGVM interface
+ *
+ * Copyright (C) 2023-2024 SUSE
+ *
+ * Authors:
+ *  Roy Hopkins 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/igvm-cfg.h"
+#include "igvm.h"
+#include "qom/object_interfaces.h"
+
+static char *get_igvm(Object *obj, Error **errp)
+{
+IgvmCfg *igvm = IGVM_CFG(obj);
+return g_strdup(igvm->filename);
+}
+
+static void set_igvm(Object *obj, const char *value, Error **errp)
+{
+IgvmCfg *igvm = IGVM_CFG(obj);
+g_free(igvm->filename);
+igvm->filename = g_strdup(value);
+}
+
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(IgvmCfg, igvm_cfg, IGVM_CFG, OBJECT,
+   { TYPE_USER_CREATABLE }, { NULL })
+
+static void igvm_cfg_class_init(ObjectClass *oc, void *data)
+{
+IgvmCfgClass *igvmc = IGVM_CFG_CLASS(oc);
+
+object_class_property_add_str(oc, "file", get_igvm, set_igvm);
+object_class_property_set_description(oc, "file",
+  "Set the IGVM filename to use");
+
+igvmc->process = qigvm_process_file;
+}
+
+static void igvm_cfg_init(Object *obj)
+{
+}
+
+static void igvm_cfg_finalize(Object *obj)
+{
+}
diff --git a/backends/igvm.c b/backends/igvm.c
new file mode 100644
index 00..7a3fedcc76
--- /dev/null
+++ b/backends/igvm.c
@@ -0,0 +1,805 @@
+/*
+ * QEMU IGVM configuration backend for guests
+ *
+ * Copyright (C) 2023-2024 SUSE
+ *
+ * Authors:
+ *  Roy Hopkins 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "igvm.h"
+#include "qapi/error.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "hw/core/cpu.h"
+
+#include 
+#include 
+
+typedef struct QIgvmParameterData {
+QTAILQ_ENTRY(QIgvmParameterData) next;
+uint8_t *data;
+uint32_t size;
+uint32_t index;
+} QIgvmParameterData;
+
+/*
+ * QIgvm contains the information required during processing
+ * of a single IGVM file.
+ */
+typedef struct QIgvm {
+IgvmHandle file;
+ConfidentialGuestSupport *cgs;
+ConfidentialGuestSupportClass *cgsc;
+uint32_t compatibility_mask;
+unsigned current_header_index;
+QTAILQ_HEAD(, QIgvmParameterData) parameter_data;
+
+/* These variables keep track of contiguous page regions */
+IGVM_VHS_PAGE_DATA region_prev_page_data;
+uint64_t region_start;
+unsigned region_start_index;
+unsigned region_last_index;
+unsigned region_page_count;
+} QIgvm;
+
+static int qigvm_directive_page_data(QIgvm *ctx, const uint8_t *header_data,
+ Error **errp);
+static int qigvm_directive_vp_context(QIgvm *ctx, const uint8_t *header_data,
+  Error **errp);
+static int qigvm_directive_parameter_area(QIgvm *ctx,
+  const uint8_t *header_data,
+  Error **errp);
+static int qigvm_directive_parameter_insert(QIgvm *ctx,
+const uint8_t *header_data,
+Error **errp);
+static int qigvm_directive_memory_map(QIgvm *ctx, const uint8_t *header_data,
+  Error **errp);
+static int qigvm_directive_vp_count(QIgvm *ctx, const uint8_t *header_data,
+Error **errp);
+static int qigvm_directive_environment_info(QIgvm *ctx,
+const uint8_t *header_data,
+Error **errp);
+static int

Re: [PATCH v5 02/16] backends/confidential-guest-support: Add functions to support IGVM

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:04PM GMT, Roy Hopkins wrote:

In preparation for supporting the processing of IGVM files to configure
guests, this adds a set of functions to ConfidentialGuestSupport
allowing configuration of secure virtual machines that can be
implemented for each supported isolation platform type such as Intel TDX
or AMD SEV-SNP. These functions will be called by IGVM processing code
in subsequent patches.

This commit provides a default implementation of the functions that
either perform no action or generate an error when they are called.
Targets that support ConfidentialGuestSupport should override these
implementations.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
---
backends/confidential-guest-support.c | 31 +++
include/exec/confidential-guest-support.h | 65 +++
2 files changed, 96 insertions(+)

diff --git a/backends/confidential-guest-support.c b/backends/confidential-guest-support.c
index 052fde8db0..68e6fd9d18 100644
--- a/backends/confidential-guest-support.c
+++ b/backends/confidential-guest-support.c
@@ -14,14 +14,45 @@
#include "qemu/osdep.h"

#include "exec/confidential-guest-support.h"
+#include "qapi/error.h"

OBJECT_DEFINE_ABSTRACT_TYPE(ConfidentialGuestSupport,
confidential_guest_support,
CONFIDENTIAL_GUEST_SUPPORT,
OBJECT)

+static int check_support(ConfidentialGuestPlatformType platform,
+ uint16_t platform_version, uint8_t highest_vtl,
+ uint64_t shared_gpa_boundary)
+{
+/* Default: no support. */
+return 0;
+}
+
+static int set_guest_state(hwaddr gpa, uint8_t *ptr, uint64_t len,
+   ConfidentialGuestPageType memory_type,
+   uint16_t cpu_index, Error **errp)
+{
+error_setg(errp,
+   "Setting confidential guest state is not supported for this platform");
+return -1;
+}
+
+static int get_mem_map_entry(int index, ConfidentialGuestMemoryMapEntry *entry,
+ Error **errp)
+{
+error_setg(
+errp,
+"Obtaining the confidential guest memory map is not supported for this platform");
+return -1;
+}
+
static void confidential_guest_support_class_init(ObjectClass *oc, void *data)
{
+ConfidentialGuestSupportClass *cgsc = CONFIDENTIAL_GUEST_SUPPORT_CLASS(oc);
+cgsc->check_support = check_support;
+cgsc->set_guest_state = set_guest_state;
+cgsc->get_mem_map_entry = get_mem_map_entry;
}

static void confidential_guest_support_init(Object *obj)
diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h
index 02dc4e518f..058c7535ca 100644
--- a/include/exec/confidential-guest-support.h
+++ b/include/exec/confidential-guest-support.h
@@ -21,6 +21,7 @@
#ifndef CONFIG_USER_ONLY

#include "qom/object.h"
+#include "exec/hwaddr.h"

#define TYPE_CONFIDENTIAL_GUEST_SUPPORT "confidential-guest-support"
OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
@@ -28,6 +29,36 @@ OBJECT_DECLARE_TYPE(ConfidentialGuestSupport,
CONFIDENTIAL_GUEST_SUPPORT)


+typedef enum ConfidentialGuestPlatformType {
+CGS_PLATFORM_SEV,
+CGS_PLATFORM_SEV_ES,
+CGS_PLATFORM_SEV_SNP,
+} ConfidentialGuestPlatformType;
+
+typedef enum ConfidentialGuestMemoryType {
+CGS_MEM_RAM,
+CGS_MEM_RESERVED,
+CGS_MEM_ACPI,
+CGS_MEM_NVS,
+CGS_MEM_UNUSABLE,
+} ConfidentialGuestMemoryType;
+
+typedef struct ConfidentialGuestMemoryMapEntry {
+uint64_t gpa;
+uint64_t size;
+ConfidentialGuestMemoryType type;
+} ConfidentialGuestMemoryMapEntry;
+
+typedef enum ConfidentialGuestPageType {
+CGS_PAGE_TYPE_NORMAL,
+CGS_PAGE_TYPE_VMSA,
+CGS_PAGE_TYPE_ZERO,
+CGS_PAGE_TYPE_UNMEASURED,
+CGS_PAGE_TYPE_SECRETS,
+CGS_PAGE_TYPE_CPUID,
+CGS_PAGE_TYPE_REQUIRED_MEMORY,
+} ConfidentialGuestPageType;
+
struct ConfidentialGuestSupport {
Object parent;

@@ -66,6 +97,40 @@ typedef struct ConfidentialGuestSupportClass {

int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
int (*kvm_reset)(ConfidentialGuestSupport *cgs, Error **errp);
+
+/*
+ * Check to see if this confidential guest supports a particular
+ * platform or configuration
+ */


nit: What about using bool as return type?
I'm also fine with int, but I'd document the return value, since 0 in
this case is not supported, right?

BTW, code LGTM:

Reviewed-by: Stefano Garzarella 
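
To illustrate the nit above, here is a minimal sketch of what the callback
could look like with a bool return type and a documented return value. The
names `CgsSketchPlatform` and `cgs_sketch_check_support` are hypothetical
stand-ins, not the types or functions from the patch; only the shape of the
doc comment is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ConfidentialGuestPlatformType. */
typedef enum CgsSketchPlatform {
    CGS_SKETCH_SEV,
    CGS_SKETCH_SEV_ES,
    CGS_SKETCH_SEV_SNP,
} CgsSketchPlatform;

/*
 * Check whether this confidential guest supports a particular
 * platform or configuration.
 *
 * Returns true if the combination is supported, false otherwise.
 */
static bool cgs_sketch_check_support(CgsSketchPlatform platform,
                                     uint16_t platform_version,
                                     uint8_t highest_vtl,
                                     uint64_t shared_gpa_boundary)
{
    (void)platform;
    (void)platform_version;
    (void)highest_vtl;
    (void)shared_gpa_boundary;

    /* Default: no support. */
    return false;
}
```

With int kept as the return type, the same comment would instead need to
spell out which integer values mean "supported" and "not supported".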


+int (*check_support)(ConfidentialGuestPlatformType platform,
+ uint16_t platform_version, uint8_t highest_vtl,
+ uint64_t shared_gpa_boundary);
+
+/*
+ * Configure part of the state of a guest for a particular set of data, 
page
+ * type and gpa. This can be used for example to pre-popu

Re: [PATCH v5 01/16] meson: Add optional dependency on IGVM library

2024-09-02 Thread Stefano Garzarella


On Tue, Aug 13, 2024 at 04:01:03PM GMT, Roy Hopkins wrote:

The IGVM library allows Independent Guest Virtual Machine files to be
parsed and processed. IGVM files are used to configure guest memory
layout, initial processor state and other configuration pertaining to
secure virtual machines.

This adds the --enable-igvm configure option, enabled by default, which
attempts to locate and link against the IGVM library via pkgconfig and
sets CONFIG_IGVM if found.

The library is added to the system_ss target in backends/meson.build
where the IGVM parsing will be performed by the ConfidentialGuestSupport
object.

Signed-off-by: Roy Hopkins 
Acked-by: Michael S. Tsirkin 
---
backends/meson.build  | 3 +++
meson.build   | 8 
meson_options.txt | 2 ++
scripts/meson-buildoptions.sh | 3 +++
4 files changed, 16 insertions(+)


Reviewed-by: Stefano Garzarella 



diff --git a/backends/meson.build b/backends/meson.build
index da714b93d1..b092a19efc 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -32,6 +32,9 @@ if have_vhost_user_crypto
endif
system_ss.add(when: gio, if_true: files('dbus-vmstate.c'))
system_ss.add(when: 'CONFIG_SGX', if_true: files('hostmem-epc.c'))
+if igvm.found()
+  system_ss.add(igvm)
+endif

system_ss.add(when: 'CONFIG_SPDM_SOCKET', if_true: files('spdm-socket.c'))

diff --git a/meson.build b/meson.build
index c2a050b844..11976674ff 100644
--- a/meson.build
+++ b/meson.build
@@ -1289,6 +1289,12 @@ if host_os == 'linux' and (have_system or have_tools)
   method: 'pkg-config',
   required: get_option('libudev'))
endif
+igvm = not_found
+if not get_option('igvm').auto() or have_system
+  igvm = dependency('igvm', version: '>= 0.3.0',
+method: 'pkg-config',
+required: get_option('igvm'))
+endif

mpathlibs = [libudev]
mpathpersist = not_found
@@ -2420,6 +2426,7 @@ config_host_data.set('CONFIG_CFI', get_option('cfi'))
config_host_data.set('CONFIG_SELINUX', selinux.found())
config_host_data.set('CONFIG_XEN_BACKEND', xen.found())
config_host_data.set('CONFIG_LIBDW', libdw.found())
+config_host_data.set('CONFIG_IGVM', igvm.found())
if xen.found()
  # protect from xen.version() having less than three components
  xen_version = xen.version().split('.') + ['0', '0']
@@ -4520,6 +4527,7 @@ summary_info += {'seccomp support':   seccomp}
summary_info += {'GlusterFS support': glusterfs}
summary_info += {'hv-balloon support': hv_balloon}
summary_info += {'TPM support':   have_tpm}
+summary_info += {'IGVM support':  igvm}
summary_info += {'libssh support':libssh}
summary_info += {'lzo support':   lzo}
summary_info += {'snappy support':snappy}
diff --git a/meson_options.txt b/meson_options.txt
index 0269fa0f16..0b09c152dc 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -111,6 +111,8 @@ option('dbus_display', type: 'feature', value: 'auto',
   description: '-display dbus support')
option('tpm', type : 'feature', value : 'auto',
   description: 'TPM support')
+option('igvm', type: 'feature', value: 'auto',
+   description: 'Independent Guest Virtual Machine (IGVM) file support')

# Do not enable it by default even for Mingw32, because it doesn't
# work on Wine.
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index c97079a38c..264e46dd4a 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -128,6 +128,7 @@ meson_options_help() {
  printf "%s\n" '  hv-balloon  hv-balloon driver (requires Glib 2.68+ GTree 
API)'
  printf "%s\n" '  hvf HVF acceleration support'
  printf "%s\n" '  iconv   Font glyph conversion support'
+  printf "%s\n" '  igvmIGVM file support'
  printf "%s\n" '  jackJACK sound support'
  printf "%s\n" '  keyring Linux keyring support'
  printf "%s\n" '  kvm KVM acceleration support'
@@ -343,6 +344,8 @@ _meson_option_parse() {
--iasl=*) quote_sh "-Diasl=$2" ;;
--enable-iconv) printf "%s" -Diconv=enabled ;;
--disable-iconv) printf "%s" -Diconv=disabled ;;
+--enable-igvm) printf "%s" -Digvm=enabled ;;
+--disable-igvm) printf "%s" -Digvm=disabled ;;
--includedir=*) quote_sh "-Dincludedir=$2" ;;
--enable-install-blobs) printf "%s" -Dinstall_blobs=true ;;
--disable-install-blobs) printf "%s" -Dinstall_blobs=false ;;
--
2.43.0

Re: [PATCH V2 1/1] virtio-pci: Add lookup subregion of VirtIOPCIRegion MR

2024-08-27 Thread Stefano Garzarella


On Tue, Aug 20, 2024 at 07:56:31PM GMT, Gao Shiyuan wrote:

When the VHOST_USER_PROTOCOL_F_HOST_NOTIFIER feature is negotiated and
virtio_queue_set_host_notifier_mr succeeds on the system blk
device's queue, the VM can't load the MBR if the notify region's
address is above 4GB.

When the address of the notify region in the modern BAR is assigned above
4G, vp_notify in SeaBIOS uses the PCI Cfg Capability to write to the notify
region. This traps into QEMU and is handled by the host bridge when mmconfig
is not enabled. QEMU calls virtio_write_config and, since the write to the
BAR region goes through the PCI Cfg Capability, it calls
virtio_address_space_write.

virtio_queue_set_host_notifier_mr adds a host notifier subregion to the
notify region MR, so QEMU needs to write to the mmap address instead of using
the eventfd to notify the hardware accelerator at the vhost-user backend.
Therefore virtio_address_space_lookup, called from
virtio_address_space_write, needs to return the host-notifier subregion of
the notify MR instead of the notify MR itself.

Look up the subregion of the VirtIOPCIRegion MR instead of only looking up
the container MR.

Fixes: a93c8d8 ("virtio-pci: Replace modern_as with direct access to 
modern_bar")

Co-developed-by: Zuo Boqun 
Signed-off-by: Gao Shiyuan 
Signed-off-by: Zuo Boqun 
---
hw/virtio/virtio-pci.c | 14 --
1 file changed, 12 insertions(+), 2 deletions(-)

---
v1 -> v2:
* modify commit message
* replace direct iteration over subregions with memory_region_find.

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 9534730bba..5d2d27a6a3 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -610,19 +610,29 @@ static MemoryRegion 
*virtio_address_space_lookup(VirtIOPCIProxy *proxy,
{
int i;
VirtIOPCIRegion *reg;
+MemoryRegion *mr = NULL;


`mr` looks unused.


+MemoryRegionSection mrs;


Please, can you move this declaration into the inner block where it's
used?




for (i = 0; i < ARRAY_SIZE(proxy->regs); ++i) {
reg = &proxy->regs[i];
if (*off >= reg->offset &&
*off + len <= reg->offset + reg->size) {
-*off -= reg->offset;
-return &reg->mr;
+mrs = memory_region_find(&reg->mr, *off - reg->offset, len);

+if (!mrs.mr) {
+error_report("Failed to find memory region for address"
+ "0x%" PRIx64 "", *off);
+return NULL;
+}
+*off = mrs.offset_within_region;
+memory_region_unref(mrs.mr);
+return mrs.mr;
}
}

return NULL;
}

+


Unrelated change.

Thanks,
Stefano


/* Below are generic functions to do memcpy from/to an address space,
 * without byteswaps, with input validation.
 *
--
2.39.3 (Apple Git-146)
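
The lookup change discussed in this thread can be pictured with a toy model.
This is only a sketch under stated assumptions: `ToyRegion` and `toy_lookup`
are hypothetical and do not use QEMU's memory API (no memory_region_find, no
reference counting). It shows the idea from the commit message: return the
innermost subregion covering the access and rebase the offset onto it, and
fall back to the container when no subregion matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a memory region; a container may hold subregions. */
typedef struct ToyRegion {
    const char *name;
    uint64_t offset;              /* offset within the parent */
    uint64_t size;
    struct ToyRegion *subregions; /* NULL for a leaf region */
    size_t nr_subregions;
} ToyRegion;

/*
 * Find the region that covers [*off, *off + len) inside 'container'.
 * On success *off is rewritten to be relative to the returned region.
 * Returns NULL if the access does not fit in the container at all.
 */
static ToyRegion *toy_lookup(ToyRegion *container, uint64_t *off,
                             uint64_t len)
{
    assert(container && off);

    if (len > container->size || *off > container->size - len) {
        return NULL;
    }

    for (size_t i = 0; i < container->nr_subregions; i++) {
        ToyRegion *sub = &container->subregions[i];

        if (*off >= sub->offset && len <= sub->size &&
            *off - sub->offset <= sub->size - len) {
            *off -= sub->offset;   /* rebase onto the subregion */
            return sub;
        }
    }

    return container;              /* no subregion: plain container access */
}
```

In this model a write at offset 0x1004 of a notify region that has a
host-notifier subregion at 0x1000 resolves to the subregion with offset 0x4,
which is the behaviour the patch is after.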

Re: [PATCH] vhost-user: add NEED_REPLY flag

2024-08-27 Thread Stefano Garzarella


On Mon, Aug 12, 2024 at 12:53:19PM GMT, 陆知行 wrote:

Hi, can someone review this patch?
I find that requests which call vhost_user_get_u64 do not set the NEED_REPLY flag


Can you provide an example to trigger this issue?

Also, with this change all calls to vhost_user_get_u64() will set that
flag; does that follow the vhost-user specification?


Please use `scripts/checkpatch.pl` before sending patches, this one for 
example is missing SoB.


Thanks,
Stefano



luzhixing12345 wrote on Sun, Aug 4, 2024 at 23:50:


Front-end message requests which need reply should set NEED_REPLY_MASK
in flag, and response from slave need clear NEED_REPLY_MASK flag.

---
 hw/virtio/vhost-user.c| 2 +-
 subprojects/libvhost-user/libvhost-user.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 00561daa06..edf2271e0a 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1082,7 +1082,7 @@ static int vhost_user_get_u64(struct vhost_dev *dev,
int request, uint64_t *u64)
 int ret;
 VhostUserMsg msg = {
 .hdr.request = request,
-.hdr.flags = VHOST_USER_VERSION,
+.hdr.flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
 };

 if (vhost_user_per_device_request(request) && dev->vq_index != 0) {
diff --git a/subprojects/libvhost-user/libvhost-user.c
b/subprojects/libvhost-user/libvhost-user.c
index 9c630c2170..40f665bd7f 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -667,6 +667,7 @@ vu_send_reply(VuDev *dev, int conn_fd, VhostUserMsg
*vmsg)
 {
 /* Set the version in the flags when sending the reply */
 vmsg->flags &= ~VHOST_USER_VERSION_MASK;
+vmsg->flags &= ~VHOST_USER_NEED_REPLY_MASK;
 vmsg->flags |= VHOST_USER_VERSION;
 vmsg->flags |= VHOST_USER_REPLY_MASK;

--
2.34.1
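
For readers following the flag handling in this patch, here is a minimal
sketch of the bit manipulation it proposes. The `SKETCH_*` constants and the
two helper functions are hypothetical names; the bit positions follow the
header flag layout in the vhost-user specification (bits 0-1 version, bit 2
reply, bit 3 need_reply). Whether these requests should set need_reply at
all is the open question raised in the review, so this is not an endorsement
of the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit layout per the vhost-user header flags. */
#define SKETCH_VERSION_MASK     0x3u
#define SKETCH_VERSION          0x1u
#define SKETCH_REPLY_MASK       (0x1u << 2)
#define SKETCH_NEED_REPLY_MASK  (0x1u << 3)

/* Flags a front-end would put in a request header. */
static uint32_t sketch_request_flags(bool need_reply)
{
    uint32_t flags = SKETCH_VERSION;

    if (need_reply) {
        flags |= SKETCH_NEED_REPLY_MASK;
    }
    return flags;
}

/* Flags a back-end would send back, derived from the request flags. */
static uint32_t sketch_reply_flags(uint32_t request_flags)
{
    uint32_t flags = request_flags;

    flags &= ~SKETCH_VERSION_MASK;     /* reset the version bits */
    flags &= ~SKETCH_NEED_REPLY_MASK;  /* a reply does not ask for a reply */
    flags |= SKETCH_VERSION;
    flags |= SKETCH_REPLY_MASK;
    return flags;
}
```

With these values a request that sets need_reply carries flags 0x9, and the
matching reply carries 0x5.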

Re: [PATCH] virtio/vhost-user: fix qemu crash when hotunplug vhost-user-net device

2024-08-27 Thread Stefano Garzarella


On Wed, Aug 07, 2024 at 05:55:08PM GMT, yaozhenguo wrote:

When hotplug and hotunplug vhost-user-net device quickly.


I'd replace the . with ,


qemu will crash. BT is as below:

0  __pthread_kill_implementation () at /usr/lib64/libc.so.6
1  raise () at /usr/lib64/libc.so.6
2  abort () at /usr/lib64/libc.so.6
3  try_dequeue () at ../util/rcu.c:235
4  call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:288
5  qemu_thread_start (args=0x55b10d9ceaa0) at ../util/qemu-thread-posix.c:541
6  start_thread () at /usr/lib64/libc.so.6
7  clone3 () at /usr/lib64/libc.so.6

1. device_del qmp process

virtio_set_status
 vhost_dev_stop
   vhost_user_get_vring_base
 vhost_user_host_notifier_remove

vhost_user_slave_handle_vring_host_notifier maybe called asynchronous after

 ^
Now it's called vhost_user_backend_handle_vring_host_notifier, I'd 
suggest to use the new name.


vhost_user_host_notifier_remove. vhost_user_host_notifier_remove will 
not

all call_rcu because of notifier->addr is NULL at this time.


s/all/call ?



2. netdev_del qmp process

vhost_user_cleanup
  vhost_user_host_notifier_remove
  g_free_rcu

vhost_user_host_notifier_remove and g_free_rcu will sumbit same rcu_head


s/sumbit/submit


to rcu node list. rcu_call_count add twice but only one node is added.
rcu thread will abort when calling try_dequeue with node list is empty.


What's not clear to me is how 1 and 2 are related, could you explain 
that?



Fix this by moving g_free(n) to vhost_user_host_notifier_free.
Fixes: 503e355465 ("virtio/vhost-user: dynamically assign 
VhostUserHostNotifiers")
Signed-off-by: yaozhenguo 
---
hw/virtio/vhost-user.c | 23 +++
include/hw/virtio/vhost-user.h |  1 +
2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 00561daa06..7ab37c0da2 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1188,6 +1188,12 @@ static void 
vhost_user_host_notifier_free(VhostUserHostNotifier *n)
assert(n && n->unmap_addr);
munmap(n->unmap_addr, qemu_real_host_page_size());
n->unmap_addr = NULL;
+if (n->need_free) {
+memory_region_transaction_begin();
+object_unparent(OBJECT(&n->mr));
+memory_region_transaction_commit();
+g_free(n);
+}
}

/*
@@ -1195,7 +1201,7 @@ static void 
vhost_user_host_notifier_free(VhostUserHostNotifier *n)
 * under rcu.
 */
static void vhost_user_host_notifier_remove(VhostUserHostNotifier *n,
-VirtIODevice *vdev)
+VirtIODevice *vdev, bool free)
{
if (n->addr) {
if (vdev) {
@@ -1204,6 +1210,7 @@ static void 
vhost_user_host_notifier_remove(VhostUserHostNotifier *n,
assert(!n->unmap_addr);
n->unmap_addr = n->addr;
n->addr = NULL;
+n->need_free = free;
call_rcu(n, vhost_user_host_notifier_free, rcu);
}
}
@@ -1280,7 +1287,7 @@ static int vhost_user_get_vring_base(struct vhost_dev 
*dev,

VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index);
if (n) {
-vhost_user_host_notifier_remove(n, dev->vdev);
+vhost_user_host_notifier_remove(n, dev->vdev, false);
}

ret = vhost_user_write(dev, &msg, NULL, 0);
@@ -1562,7 +1569,7 @@ static int 
vhost_user_backend_handle_vring_host_notifier(struct vhost_dev *dev,
 * new mapped address.
 */
n = fetch_or_create_notifier(user, queue_idx);
-vhost_user_host_notifier_remove(n, vdev);
+vhost_user_host_notifier_remove(n, vdev, false);

if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
return 0;
@@ -2737,13 +2744,7 @@ static void vhost_user_state_destroy(gpointer data)
{
VhostUserHostNotifier *n = (VhostUserHostNotifier *) data;
if (n) {
-vhost_user_host_notifier_remove(n, NULL);
-object_unparent(OBJECT(&n->mr));
-/*
- * We can't free until vhost_user_host_notifier_remove has
- * done it's thing so schedule the free with RCU.
- */
-g_free_rcu(n, rcu);
+vhost_user_host_notifier_remove(n, NULL, true);


I'm not sure I understand the problem well, but could it be that now we 
don't see the problem anymore, but we have a memory leak?


Here for example could it be the case that `n->addr` is NULL and 
therefore `vhost_user_host_notifier_free` with `n->need_free = true` 
will never be submitted?
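
The scenario in the question above can be sketched with a toy model. All
names here (`ToyNotifier`, `toy_notifier_remove`, `toy_notifier_free`) are
hypothetical, and the deferred callback runs immediately instead of going
through RCU, so this only models the control flow of the patched functions,
not QEMU's behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyNotifier {
    void *addr;        /* currently mapped address, NULL if none */
    void *unmap_addr;  /* address waiting to be unmapped */
    bool need_free;
    bool freed;        /* stands in for g_free(n) so it can be observed */
} ToyNotifier;

/* Stand-in for the deferred callback (the patched *_notifier_free). */
static void toy_notifier_free(ToyNotifier *n)
{
    assert(n && n->unmap_addr);
    n->unmap_addr = NULL;
    if (n->need_free) {
        n->freed = true;
    }
}

/*
 * Stand-in for the patched *_notifier_remove(). Returns true if the
 * deferred callback was scheduled (here: run immediately).
 */
static bool toy_notifier_remove(ToyNotifier *n, bool free_it)
{
    if (!n->addr) {
        /* Nothing is scheduled, so need_free is never even recorded. */
        return false;
    }

    n->unmap_addr = n->addr;
    n->addr = NULL;
    n->need_free = free_it;
    toy_notifier_free(n);
    return true;
}
```

In this model a notifier whose addr is already NULL is never marked freed,
even when free_it is true, which is the possible leak being asked about.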



}
}

@@ -2765,9 +2766,7 @@ void vhost_user_cleanup(VhostUserState *user)
if (!user->chr) {
return;
}
-memory_region_transaction_begin();
user->notifiers = (GPtrArray *) g_ptr_array_free(user->notifiers, 
true);

-memory_region_transaction_commit();


This is no longer necessary, because the `user->notifiers` free function 
no longer calls `object_unparent(OBJECT(&n->mr))`, right?


Maybe it's worth mentioning in the commit description.


user->chr = NULL;
}

diff --git a/include/hw/virtio/vhost-us

[PATCH] block/blkio: use FUA flag on write zeroes only if supported

2024-08-08 Thread Stefano Garzarella

libblkio supports BLKIO_REQ_FUA with write zeros requests only since
version 1.4.0, so let's inform the block layer that the blkio driver
supports it only in this case. Otherwise we can have runtime errors
as reported in https://issues.redhat.com/browse/RHEL-32878

Fixes: fd66dbd424 ("blkio: add libblkio block driver")
Cc: qemu-sta...@nongnu.org
Buglink: https://issues.redhat.com/browse/RHEL-32878
Signed-off-by: Stefano Garzarella 
---
 meson.build   | 2 ++
 block/blkio.c | 6 --
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index c2a050b844..81ecd4bae7 100644
--- a/meson.build
+++ b/meson.build
@@ -2305,6 +2305,8 @@ config_host_data.set('CONFIG_BLKIO', blkio.found())
 if blkio.found()
   config_host_data.set('CONFIG_BLKIO_VHOST_VDPA_FD',
blkio.version().version_compare('>=1.3.0'))
+  config_host_data.set('CONFIG_BLKIO_WRITE_ZEROS_FUA',
+   blkio.version().version_compare('>=1.4.0'))
 endif
 config_host_data.set('CONFIG_CURL', curl.found())
 config_host_data.set('CONFIG_CURSES', curses.found())
diff --git a/block/blkio.c b/block/blkio.c
index 3d9a2e764c..e0e765af63 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -899,8 +899,10 @@ static int blkio_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 bs->supported_write_flags = BDRV_REQ_FUA | BDRV_REQ_REGISTERED_BUF;
-bs->supported_zero_flags = BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP |
-   BDRV_REQ_NO_FALLBACK;
+bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK;
+#ifdef CONFIG_BLKIO_WRITE_ZEROS_FUA
+bs->supported_zero_flags |= BDRV_REQ_FUA;
+#endif
 
 qemu_mutex_init(&s->blkio_lock);
 qemu_co_mutex_init(&s->bounce_lock);
-- 
2.45.2

Re: [PATCH] vhost-user: rewrite vu_dispatch with if-else

2024-08-05 Thread Stefano Garzarella


On Sun, Aug 04, 2024 at 10:23:53PM GMT, luzhixing12345 wrote:

rewrite with if-else instead of goto


Why?

IMHO it was better before this patch, with a single error path.



and I have a question: in the two incorrect cases

- need reply but no reply_requested
- no need reply but has reply_requested

should we call vu_panic or print warning message?

---
subprojects/libvhost-user/libvhost-user.c | 39 +--
subprojects/libvhost-user/libvhost-user.h |  6 ++--
2 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c

index 9c630c2170..187e25f9bb 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -2158,32 +2158,39 @@ vu_dispatch(VuDev *dev)
{
VhostUserMsg vmsg = { 0, };
int reply_requested;
-bool need_reply, success = false;
+bool need_reply, success = true;

if (!dev->read_msg(dev, dev->sock, &vmsg)) {
-goto end;
+success = false;
+free(vmsg.data);
+return success;
}

need_reply = vmsg.flags & VHOST_USER_NEED_REPLY_MASK;

reply_requested = vu_process_message(dev, &vmsg);
-if (!reply_requested && need_reply) {
-vmsg_set_reply_u64(&vmsg, 0);
-reply_requested = 1;
-}
-
-if (!reply_requested) {
-success = true;
-goto end;
-}

-if (!vu_send_reply(dev, dev->sock, &vmsg)) {
-goto end;
+if (need_reply) {
+if (reply_requested) {
+if (!vu_send_reply(dev, dev->sock, &vmsg)) {
+success = false;
+}
+} else {
+// need reply but no reply requested, return 0(u64)
+vmsg_set_reply_u64(&vmsg, 0);
+if (!vu_send_reply(dev, dev->sock, &vmsg)) {
+success = false;
+}
+}
+} else {
+// no need reply but reply requested, send a reply
+if (reply_requested) {
+if (!vu_send_reply(dev, dev->sock, &vmsg)) {
+success = false;
+}
+}
}

-success = true;
-
-end:
free(vmsg.data);
return success;
}
diff --git a/subprojects/libvhost-user/libvhost-user.h 
b/subprojects/libvhost-user/libvhost-user.h
index deb40e77b3..2daf8578f6 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -238,6 +238,8 @@ typedef struct VuDev VuDev;

typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
+typedef uint64_t (*vu_get_protocol_features_cb) (VuDev *dev);
+typedef void (*vu_set_protocol_features_cb) (VuDev *dev, uint64_t features);


Are these changes related?

Stefano


typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
  int *do_reply);
typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
@@ -256,9 +258,9 @@ typedef struct VuDevIface {
vu_set_features_cb set_features;
/* get the protocol feature bitmask from the underlying vhost
 * implementation */
-vu_get_features_cb get_protocol_features;
+vu_get_protocol_features_cb get_protocol_features;
/* enable protocol features in the underlying vhost implementation. */
-vu_set_features_cb set_protocol_features;
+vu_set_protocol_features_cb set_protocol_features;
/* process_msg is called for each vhost-user message received */
/* skip libvhost-user processing if return value != 0 */
vu_process_msg_cb process_msg;
--
2.34.1
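
To make the "single error path" point from the review concrete, here is a
toy version of the goto-based structure that the patch removes. `ToyResult`
and `toy_dispatch` are hypothetical, and the boolean parameters stand in for
the outcomes of read_msg, vu_process_message and vu_send_reply; this is a
sketch of the control flow only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* What the caller can observe; hypothetical, for illustration only. */
typedef struct ToyResult {
    bool success;
    bool reply_sent;
} ToyResult;

static ToyResult toy_dispatch(bool read_ok, bool need_reply,
                              bool reply_requested, bool send_ok)
{
    ToyResult res = { false, false };
    void *data = malloc(16);      /* stands in for vmsg.data */

    if (!read_ok) {
        goto end;
    }

    if (!reply_requested && need_reply) {
        /* The handler produced no reply but the sender wants one. */
        reply_requested = true;
    }

    if (!reply_requested) {
        res.success = true;
        goto end;
    }

    if (!send_ok) {
        goto end;
    }

    res.reply_sent = true;
    res.success = true;

end:
    free(data);                   /* one cleanup point for every path */
    return res;
}
```

Every exit goes through the same label, so the payload is released in one
place and a reply is sent exactly when either side asked for one.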

Re: [PATCH] docs: fix vhost-user protocol doc

2024-08-05 Thread Stefano Garzarella


On Sun, Aug 04, 2024 at 01:04:20PM GMT, luzhixing12345 wrote:

add a ref link to Memory region description

add extra type(64 bits) to Log description structure fields

fix ’s to 's

---
docs/interop/vhost-user.rst | 22 +-
1 file changed, 13 insertions(+), 9 deletions(-)


Please run `scripts/checkpatch.pl` before sending.

S-o-b missing here.



diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index d8419fd2f1..e34b305bd9 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -167,6 +167,8 @@ A vring address description
Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has
been negotiated. Otherwise it is a user address.

+.. _memory_region_description:
+
Memory region description
^

@@ -180,7 +182,7 @@ Memory region description

:user address: a 64-bit user address

-:mmap offset: 64-bit offset where region starts in the mapped memory
+:mmap offset: a 64-bit offset where region starts in the mapped memory

When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been
successfully negotiated, the memory region description contains two extra
@@ -190,7 +192,7 @@ fields at the end.
| guest address | size | user address | mmap offset | xen mmap flags | domid |
+---+--+--+-++---+

-:xen mmap flags: 32-bit bit field
+:xen mmap flags: a 32-bit bit field

- Bit 0 is set for Xen foreign memory mapping.
- Bit 1 is set for Xen grant memory mapping.
@@ -211,6 +213,8 @@ Single memory region description

:padding: 64-bit

+:region: :ref:`Memory region description `
+
A region is represented by Memory region description.


Should we merge this line with the one added?



Multiple Memory regions description


Should we extend also the Multiple Memory region description?


@@ -233,9 +237,9 @@ Log description
| log size | log offset |
+--++

-:log size: size of area used for logging
+:log size: a 64-bit size of area used for logging

-:log offset: offset from start of supplied file descriptor where
+:log offset: a 64-bit offset from start of supplied file descriptor where
 logging starts (i.e. where guest address 0 would be
 logged)

@@ -382,7 +386,7 @@ the kernel implementation.

The communication consists of the *front-end* sending message requests and
the *back-end* sending message replies. Most of the requests don't require
-replies. Here is a list of the ones that do:
+replies, except for the following requests:

* ``VHOST_USER_GET_FEATURES``
* ``VHOST_USER_GET_PROTOCOL_FEATURES``
@@ -1239,11 +1243,11 @@ Front-end message types
  (*a vring descriptor index for split virtqueues* vs. *vring descriptor
  indices for packed virtqueues*).

-  When and as long as all of a device’s vrings are stopped, it is
+  When and as long as all of a device's vrings are stopped, it is
  *suspended*, see :ref:`Suspended device state
  `.

-  The request payload’s *num* field is currently reserved and must be
+  The request payload's *num* field is currently reserved and must be
  set to 0.

``VHOST_USER_SET_VRING_KICK``
@@ -1662,7 +1666,7 @@ Front-end message types
  :reply payload: ``u64``

  Front-end and back-end negotiate a channel over which to transfer the
-  back-end’s internal state during migration.  Either side (front-end or
+  back-end's internal state during migration.  Either side (front-end or
  back-end) may create the channel.  The nature of this channel is not
  restricted or defined in this document, but whichever side creates it
  must create a file descriptor that is provided to the respectively
@@ -1714,7 +1718,7 @@ Front-end message types
  :request payload: N/A
  :reply payload: ``u64``

-  After transferring the back-end’s internal state during migration (see
+  After transferring the back-end's internal state during migration (see
  the :ref:`Migrating back-end state `
  section), check whether the back-end was able to successfully fully
  process the state.
--
2.34.1

Re: [PATCH v4 11/17] docs/system: Add documentation on support for IGVM

2024-07-29 Thread Stefano Garzarella

U state with VMSA
+---
+
+The initial state of guest CPUs can be defined in the IGVM file for AMD SEV-ES
+and SEV-SNP. The state data is provided as a VMSA structure as defined in Table
+B-4 in the AMD64 Architecture Programmer's Manual, Volume 2 [1].
+
+The IGVM VMSA is translated to CPU state in QEMU which is then synchronized
+by KVM to the guest VMSA during the launch process where it contributes to the
+launch measurement. See :ref:`amd-sev` for details on the launch process and
+guest launch measurement.
+
+It is important that no information is lost or changed when translating the
+VMSA provided by the IGVM file into the VMSA that is used to launch the guest.
+Therefore, QEMU restricts the VMSA fields that can be provided in the IGVM
+VMSA structure to the following registers:
+
+RAX, RCX, RDX, RBX, RBP, RSI, RDI, R8-R15, RSP, RIP, CS, DS, ES, FS, GS, SS,
+CR0, CR3, CR4, XCR0, EFER, PAT, GDT, IDT, LDTR, TR, DR6, DR7, RFLAGS, X87_FCW,
+MXCSR.
+
+When processing the IGVM file, QEMU will check if any fields other than the
+above are non-zero and generate an error if this is the case.
+
+KVM uses a hardcoded GPA of 0xF000 for the VMSA. When an IGVM file
+defines initial CPU state, the GPA for each VMSA must match this hardcoded
+value.
+
+Firmware Images with IGVM
+-
+
+When an IGVM filename is specified for a Confidential Guest Support object it
+overrides the default handling of system firmware: the firmware image, such as
+an OVMF binary should be contained as a payload of the IGVM file and not
+provided as a flash drive or via the ``-bios`` parameter. The default QEMU
+firmware is not automatically populated into the guest memory space.
+
+If an IGVM file is provided along with either the ``-bios`` parameter or pflash
+devices then an error is displayed and the guest startup is aborted.
+
+Running a guest configured using IGVM
+-
+
+To run a guest configured with IGVM you firstly need to generate an IGVM file
+that contains a guest configuration compatible with the platform you are
+targeting.
+
+The ``buildigvm`` tool [2] is an example of a tool that can be used to generate
+IGVM files for non-confidential X86 platforms as well as for SEV, SEV-ES and
+SEV-SNP confidential platforms.
+
+Example using this tool to generate an IGVM file for AMD SEV-SNP::
+
+buildigvm --firmware /path/to/OVMF.fd --output sev-snp.igvm \
+  --cpucount 4 sev-snp
+
+To run a guest configured with the generated IGVM you need to add an
+``igvm-cfg`` object and refer to it from the ``-machine`` parameter:
+
+Example (for AMD SEV)::
+
+qemu-system-x86_64 \
+ \
+-machine ...,confidential-guest-support=sev0,igvm-cfg=igvm0 \
+-object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \
+-object igvm-cfg,id=igvm0,file=/path/to/sev-snp.igvm
+
+References
+--
+
+[1] AMD64 Architecture Programmer's Manual, Volume 2: System Programming
+  Rev 3.41
+  
https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf
+
+[2] ``buildigvm`` - A tool to build example IGVM files containing OVMF firmware
+  https://github.com/roy-hopkins/buildigvm


Should we also put a reference to the tool we have in Coconut SVSM?

BTW, this patch LGTM:

Reviewed-by: Stefano Garzarella 
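
As a side note on the VMSA field restriction described in the documentation
above, one way to implement a "no other field may be non-zero" check is to
copy the structure, clear the supported fields, and verify that what remains
is all zero. The sketch below assumes a made-up, heavily reduced layout:
`ToyVmsa` is NOT the real VMSA and `toy_vmsa_only_supported_fields` is a
hypothetical helper, not the function from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced save area: not the real VMSA layout. */
typedef struct ToyVmsa {
    uint64_t rax;
    uint64_t rip;
    uint64_t cr0;
    uint64_t unsupported_a;   /* fields that would not be translated */
    uint64_t unsupported_b;
} ToyVmsa;

/* Returns true if only the supported fields (rax, rip, cr0) are set. */
static bool toy_vmsa_only_supported_fields(const ToyVmsa *vmsa)
{
    ToyVmsa copy;
    const unsigned char *bytes = (const unsigned char *)&copy;

    assert(vmsa);
    copy = *vmsa;

    /* Clear everything that is allowed to carry a value. */
    copy.rax = 0;
    copy.rip = 0;
    copy.cr0 = 0;

    /* Whatever is left must be zero. */
    for (size_t i = 0; i < sizeof(copy); i++) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}
```

The appeal of this approach is that a field added to the structure later is
rejected by default until it is explicitly cleared in the check.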


\ No newline at end of file
diff --git a/docs/system/index.rst b/docs/system/index.rst
index c21065e519..6235dfab87 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -38,4 +38,5 @@ or Hypervisor.Framework.
   security
   multi-process
   confidential-guest-support
+   igvm
   vm-templating
--
2.43.0

Re: [PATCH v4 03/17] backends/igvm: Add IGVM loader and configuration

2024-07-29 Thread Stefano Garzarella


On Wed, Jul 03, 2024 at 12:05:41PM GMT, Roy Hopkins wrote:

Adds an IGVM loader to QEMU which processes a given IGVM file and
applies the directives within the file to the current guest
configuration.

The IGVM loader can be used to configure both confidential and
non-confidential guests. For confidential guests, the
ConfidentialGuestSupport object for the system is used to encrypt
memory, apply the initial CPU state and perform other confidential guest
operations.

The loader is configured via a new IgvmCfg QOM object which allows the
user to provide a path to the IGVM file to process.

Signed-off-by: Roy Hopkins 
---
qapi/qom.json |  17 +
backends/igvm.h   |  23 ++
include/sysemu/igvm-cfg.h |  54 +++
backends/igvm-cfg.c   |  66 
backends/igvm.c   | 799 ++
backends/meson.build  |   2 +
6 files changed, 961 insertions(+)
create mode 100644 backends/igvm.h
create mode 100644 include/sysemu/igvm-cfg.h
create mode 100644 backends/igvm-cfg.c
create mode 100644 backends/igvm.c

diff --git a/qapi/qom.json b/qapi/qom.json
index 8bd299265e..93b416e697 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -874,6 +874,19 @@
  'base': 'RngProperties',
  'data': { '*filename': 'str' } }

+##
+# @IgvmCfgProperties:
+#
+# Properties common to objects that handle IGVM files.
+#
+# @file: IGVM file to use to configure guest (default: none)
+#
+# Since: 9.1
+##
+{ 'struct': 'IgvmCfgProperties',
+  'if': 'CONFIG_IGVM',
+  'data': { '*file': 'str' } }
+
##
# @SevCommonProperties:
#
@@ -1039,6 +1052,8 @@
'filter-redirector',
'filter-replay',
'filter-rewriter',
+{ 'name': 'igvm-cfg',
+  'if': 'CONFIG_IGVM' },
'input-barrier',
{ 'name': 'input-linux',
  'if': 'CONFIG_LINUX' },
@@ -,6 +1126,8 @@
  'filter-redirector':  'FilterRedirectorProperties',
  'filter-replay':  'NetfilterProperties',
  'filter-rewriter':'FilterRewriterProperties',
+  'igvm-cfg':   { 'type': 'IgvmCfgProperties',
+  'if': 'CONFIG_IGVM' },
  'input-barrier':  'InputBarrierProperties',
  'input-linux':{ 'type': 'InputLinuxProperties',
  'if': 'CONFIG_LINUX' },
diff --git a/backends/igvm.h b/backends/igvm.h
new file mode 100644
index 00..a206fb85da
--- /dev/null
+++ b/backends/igvm.h
@@ -0,0 +1,23 @@
+/*
+ * QEMU IGVM configuration backend for Confidential Guests
+ *
+ * Copyright (C) 2023-2024 SUSE
+ *
+ * Authors:
+ *  Roy Hopkins 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef BACKENDS_IGVM_H
+#define BACKENDS_IGVM_H
+
+#include "exec/confidential-guest-support.h"
+#include "sysemu/igvm-cfg.h"
+#include "qapi/error.h"
+
+int igvm_process_file(IgvmCfgState *igvm, ConfidentialGuestSupport *cgs,
+  Error **errp);
+
+#endif
diff --git a/include/sysemu/igvm-cfg.h b/include/sysemu/igvm-cfg.h
new file mode 100644
index 00..8ac8b33d8d
--- /dev/null
+++ b/include/sysemu/igvm-cfg.h
@@ -0,0 +1,54 @@
+/*
+ * QEMU IGVM interface
+ *
+ * Copyright (C) 2024 SUSE
+ *
+ * Authors:
+ *  Roy Hopkins 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_IGVM_CFG_H
+#define QEMU_IGVM_CFG_H
+
+#include "qom/object.h"
+
+typedef struct IgvmCfgState {
+ObjectClass parent_class;
+
+/*
+ * filename: Filename that specifies a file that contains the configuration
+ *   of the guest in Independent Guest Virtual Machine (IGVM)
+ *   format.
+ */
+char *filename;
+} IgvmCfgState;
+
+typedef struct IgvmCfgClass {
+ObjectClass parent_class;
+
+/*
+ * If an IGVM filename has been specified then process the IGVM file.
+ * Performs a no-op if no filename has been specified.
+ *
+ * Returns 0 for ok and -1 on error.
+ */
+int (*process)(IgvmCfgState *cfg, ConfidentialGuestSupport *cgs,
+   Error **errp);
+
+} IgvmCfgClass;
+
+#define TYPE_IGVM_CFG "igvm-cfg"
+
+#define IGVM_CFG_CLASS_SUFFIX "-" TYPE_IGVM_CFG
+#define IGVM_CFG_CLASS_NAME(a) (a IGVM_CFG_CLASS_SUFFIX)
+
+#define IGVM_CFG_CLASS(klass) \
+OBJECT_CLASS_CHECK(IgvmCfgClass, (klass), TYPE_IGVM_CFG)
+#define IGVM_CFG(obj) OBJECT_CHECK(IgvmCfgState, (obj), TYPE_IGVM_CFG)
+#define IGVM_CFG_GET_CLASS(obj) \
+OBJECT_GET_CLASS(IgvmCfgClass, (obj), TYPE_IGVM_CFG)
+
+#endif
diff --git a/backends/igvm-cfg.c b/backends/igvm-cfg.c
new file mode 100644
index 00..5e18f3fd5f
--- /dev/null
+++ b/backends/igvm-cfg.c
@@ -0,0 +1,66 @@
+/*
+ * QEMU IGVM interface
+ *
+ * Copyright (C) 2023-2024 SUSE
+ *
+ * Authors:
+ *  Roy Hopkins 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING

[PATCH v2] scripts/checkpatch: more checks on files imported from Linux

2024-07-18 Thread Stefano Garzarella

If a file imported from Linux is touched, emit a warning and suggest
using scripts/update-linux-headers.sh.

Also check that updates to files imported from Linux are not mixed with
other changes; if they are, emit an error.

Signed-off-by: Stefano Garzarella 
---
v2:
- added an error when mixing imported files with other changes [Daniel,
  Cornelia]

v1: https://patchew.org/QEMU/20240717093752.50595-1-sgarz...@redhat.com/
---
 scripts/checkpatch.pl | 24 
 1 file changed, 24 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index ff373a7083..65b6f46f90 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1374,6 +1374,9 @@ sub process {
my $in_header_lines = $file ? 0 : 1;
my $in_commit_log = 0;  #Scanning lines before patch
my $reported_maintainer_file = 0;
+   my $reported_mixing_imported_file = 0;
+   my $in_imported_file = 0;
+   my $in_no_imported_file = 0;
my $non_utf8_charset = 0;
 
our @report = ();
@@ -1673,6 +1676,27 @@ sub process {
 # ignore non-hunk lines and lines being removed
next if (!$hunk_line || $line =~ /^-/);
 
+# Check that updating imported files from Linux are not mixed with other changes
+   if ($realfile =~ /^(linux-headers|include\/standard-headers)\//) {
+   if (!$in_imported_file) {
+   WARN("added, moved or deleted file(s) " .
+"imported from Linux, are you using " .
+"scripts/update-linux-headers.sh?\n" .
+$herecurr);
+   }
+   $in_imported_file = 1;
+   } else {
+   $in_no_imported_file = 1;
+   }
+
+   if (!$reported_mixing_imported_file &&
+   $in_imported_file && $in_no_imported_file) {
+   ERROR("headers imported from Linux should be self-" .
+ "contained in a patch with no other changes\n" .
+ $herecurr);
+   $reported_mixing_imported_file = 1;
+   }
+
 # ignore files that are being periodically imported from Linux
next if ($realfile =~ /^(linux-headers|include\/standard-headers)\//);
 
-- 
2.45.2
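
For readers less comfortable with Perl, the bookkeeping added by this patch
can be sketched in C. This is only an illustration and not part of the
patch: is_imported_from_linux(), PatchCheck and check_patch_paths() are
hypothetical names, and the sketch works on a list of file paths rather
than on hunk lines as checkpatch.pl does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if path is under one of the directories imported from Linux. */
static bool is_imported_from_linux(const char *path)
{
    static const char *const prefixes[] = {
        "linux-headers/",
        "include/standard-headers/",
    };

    for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
        if (strncmp(path, prefixes[i], strlen(prefixes[i])) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical result type: one flag per diagnostic in the patch. */
typedef struct {
    bool touches_imported; /* corresponds to the WARN() */
    bool mixes_changes;    /* corresponds to the ERROR() */
} PatchCheck;

/* Scan every path touched by a patch and record both conditions. */
static PatchCheck check_patch_paths(const char *const *paths, size_t n)
{
    PatchCheck result = { false, false };
    bool saw_other = false;

    for (size_t i = 0; i < n; i++) {
        if (is_imported_from_linux(paths[i])) {
            result.touches_imported = true;
        } else {
            saw_other = true;
        }
    }
    result.mixes_changes = result.touches_imported && saw_other;
    return result;
}
```

A patch touching only linux-headers/ would set touches_imported alone; one
that also touches, say, hw/virtio/virtio.c would set mixes_changes as well.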

Re: [PATCH] scripts/checkpatch: emit a warning if an imported file is touched

2024-07-17 Thread Stefano Garzarella


On Wed, Jul 17, 2024 at 11:58:46AM GMT, Cornelia Huck wrote:

On Wed, Jul 17 2024, Stefano Garzarella  wrote:


If a file imported from Linux is touched, emit a warning and suggest
using scripts/update-linux-headers.sh

Signed-off-by: Stefano Garzarella 
---
 scripts/checkpatch.pl | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index ff373a7083..b0e8266fa2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1374,6 +1374,7 @@ sub process {
my $in_header_lines = $file ? 0 : 1;
my $in_commit_log = 0;  #Scanning lines before patch
my $reported_maintainer_file = 0;
+   my $reported_imported_file = 0;
my $non_utf8_charset = 0;

our @report = ();
@@ -1673,8 +1674,17 @@ sub process {
 # ignore non-hunk lines and lines being removed
next if (!$hunk_line || $line =~ /^-/);

-# ignore files that are being periodically imported from Linux
-   next if ($realfile =~ /^(linux-headers|include\/standard-headers)\//);
+# ignore files that are being periodically imported from Linux and emit a warning
+   if ($realfile =~ /^(linux-headers|include\/standard-headers)\//) {
+   if (!$reported_imported_file) {
+   $reported_imported_file = 1;
+   WARN("added, moved or deleted file(s) " .
+"imported from Linux, are you using " .
+"scripts/update-linux-headers.sh?\n" .
+$herecurr);
+   }
+   next;
+   }


Thanks, that looks useful -- just two comments (sorry, my perl-fu is
low):


Same perl-fu here ;-P


- Is there a way to check that this is a proper linux headers update?
 We'd have to rely on heuristics, but OTOH, we also usually want a
 headers update to use a certain format ($SUBJECT containing "headers
 update", patch description pointing to the version this update was
 done against.) Not sure if it is worth actually trying to figure this
 out.


I think it can be done, though we should first formalize the format
somewhere, or integrate the generation of the commit into
scripts/update-linux-headers.sh.

At that point we can add a check here based on that.


- A common issue is headers changes mixed in with other code changes,
 which should not happen -- can we check for that as well and advise
 to either do a headers update, or use a placeholder patch?


Yeah, Daniel suggested the same, I'll address in v2.

Thanks,
Stefano

Re: [PATCH] scripts/checkpatch: emit a warning if an imported file is touched

2024-07-17 Thread Stefano Garzarella


On Wed, Jul 17, 2024 at 10:50:51AM GMT, Daniel P. Berrangé wrote:

On Wed, Jul 17, 2024 at 11:37:52AM +0200, Stefano Garzarella wrote:

If a file imported from Linux is touched, emit a warning and suggest
using scripts/update-linux-headers.sh

Signed-off-by: Stefano Garzarella 
---
 scripts/checkpatch.pl | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index ff373a7083..b0e8266fa2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1374,6 +1374,7 @@ sub process {
my $in_header_lines = $file ? 0 : 1;
my $in_commit_log = 0;  #Scanning lines before patch
my $reported_maintainer_file = 0;
+   my $reported_imported_file = 0;
my $non_utf8_charset = 0;

our @report = ();
@@ -1673,8 +1674,17 @@ sub process {
 # ignore non-hunk lines and lines being removed
next if (!$hunk_line || $line =~ /^-/);

-# ignore files that are being periodically imported from Linux
-   next if ($realfile =~ /^(linux-headers|include\/standard-headers)\//);
+# ignore files that are being periodically imported from Linux and emit a warning
+   if ($realfile =~ /^(linux-headers|include\/standard-headers)\//) {
+   if (!$reported_imported_file) {
+   $reported_imported_file = 1;
+   WARN("added, moved or deleted file(s) " .
+"imported from Linux, are you using " .
+"scripts/update-linux-headers.sh?\n" .
+$herecurr);


This is a good hint, but can we add a further check that is a fatal error,
if the headers are changed in the same commit as non-header changes. When
importing headers, they should only ever be in a self-contained patch
with nothing else touched.


Yep, good point! I'll add that check in v2.

Thanks,
Stefano

[PATCH] scripts/checkpatch: emit a warning if an imported file is touched

2024-07-17 Thread Stefano Garzarella

If a file imported from Linux is touched, emit a warning and suggest
using scripts/update-linux-headers.sh

Signed-off-by: Stefano Garzarella 
---
 scripts/checkpatch.pl | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index ff373a7083..b0e8266fa2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -1374,6 +1374,7 @@ sub process {
my $in_header_lines = $file ? 0 : 1;
my $in_commit_log = 0;  #Scanning lines before patch
my $reported_maintainer_file = 0;
+   my $reported_imported_file = 0;
my $non_utf8_charset = 0;
 
our @report = ();
@@ -1673,8 +1674,17 @@ sub process {
 # ignore non-hunk lines and lines being removed
next if (!$hunk_line || $line =~ /^-/);
 
-# ignore files that are being periodically imported from Linux
-   next if ($realfile =~ /^(linux-headers|include\/standard-headers)\//);
+# ignore files that are being periodically imported from Linux and emit a warning
+   if ($realfile =~ /^(linux-headers|include\/standard-headers)\//) {
+   if (!$reported_imported_file) {
+   $reported_imported_file = 1;
+   WARN("added, moved or deleted file(s) " .
+"imported from Linux, are you using " .
+"scripts/update-linux-headers.sh?\n" .
+$herecurr);
+   }
+   next;
+   }
 
 #trailing whitespace
if ($line =~ /^\+.*\015/) {
-- 
2.45.2

[PATCH] contrib/vhost-user-blk: fix overflowing expression

2024-07-12 Thread Stefano Garzarella

Coverity reported:

  >>> CID 1549454:  Integer handling issues  (OVERFLOW_BEFORE_WIDEN)
  >>> Potentially overflowing expression
  "le32_to_cpu(desc->num_sectors) << 9" with type "uint32_t"
  (32 bits, unsigned) is evaluated using 32-bit arithmetic, and
  then used in a context that expects an expression of type
  "uint64_t" (64 bits, unsigned).
  199   le32_to_cpu(desc->num_sectors) << 9 };

Coverity noticed this issue after commit 5ab04420c3 ("contrib/vhost-user-*:
use QEMU bswap helper functions"), but it was pre-existing: it was
introduced by commit caa1ee4313 ("vhost-user-blk: add discard/write zeroes
features support").

Explicitly cast the 32-bit value before the shift to fix this issue.

Fixes: Coverity CID 1549454
Fixes: 5ab04420c3 ("contrib/vhost-user-*: use QEMU bswap helper functions")
Fixes: caa1ee4313 ("vhost-user-blk: add discard/write zeroes features support")
Cc: changpeng@intel.com
Suggested-by: Peter Maydell 
Signed-off-by: Stefano Garzarella 
---
 contrib/vhost-user-blk/vhost-user-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 9492146855..6cc18a1c04 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -196,7 +196,7 @@ vub_discard_write_zeroes(VubReq *req, struct iovec *iov, uint32_t iovcnt,
 VubDev *vdev_blk = req->vdev_blk;
 desc = buf;
 uint64_t range[2] = { le64_to_cpu(desc->sector) << 9,
-  le32_to_cpu(desc->num_sectors) << 9 };
+  (uint64_t)le32_to_cpu(desc->num_sectors) << 9 };
 if (type == VIRTIO_BLK_T_DISCARD) {
 if (ioctl(vdev_blk->blk_fd, BLKDISCARD, range) == 0) {
 g_free(buf);
-- 
2.45.2
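
The effect of the cast can be seen in isolation with a small sketch. It is
not taken from the patch: sectors_to_bytes_narrow() and
sectors_to_bytes_wide() are hypothetical helpers, and the example assumes a
platform where int is 32 bits, so that a uint32_t operand is shifted in
32-bit unsigned arithmetic.

```c
#include <stdint.h>

/*
 * Shift performed on a uint32_t: the result wraps modulo 2^32 and only
 * afterwards is widened to uint64_t (the pattern Coverity flagged).
 */
static uint64_t sectors_to_bytes_narrow(uint32_t num_sectors)
{
    return num_sectors << 9;
}

/* Widen first, then shift: the high bits are preserved. */
static uint64_t sectors_to_bytes_wide(uint32_t num_sectors)
{
    return (uint64_t)num_sectors << 9;
}
```

With num_sectors = 0x00800000 (2^23 sectors, i.e. 4 GiB), the narrow
version yields 0 while the widened one yields 0x100000000. For small
values the two agree, which is why the problem is easy to miss.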

Re: [PULL v3 52/85] contrib/vhost-user-*: use QEMU bswap helper functions

2024-07-12 Thread Stefano Garzarella


On Fri, Jul 12, 2024 at 03:24:47PM GMT, Peter Maydell wrote:

On Wed, 3 Jul 2024 at 23:48, Michael S. Tsirkin  wrote:


From: Stefano Garzarella 

Let's replace the calls to le*toh() and htole*() with qemu/bswap.h
helpers to make the code more portable.

Suggested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
Message-Id: <20240618100447.145697-1-sgarz...@redhat.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 contrib/vhost-user-blk/vhost-user-blk.c |  9 +
 contrib/vhost-user-input/main.c | 16 
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..9492146855 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
  */

 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "libvhost-user-glib.h"

@@ -194,8 +195,8 @@ vub_discard_write_zeroes(VubReq *req, struct iovec *iov, uint32_t iovcnt,
 #if defined(__linux__) && defined(BLKDISCARD) && defined(BLKZEROOUT)
 VubDev *vdev_blk = req->vdev_blk;
 desc = buf;
-uint64_t range[2] = { le64toh(desc->sector) << 9,
-  le32toh(desc->num_sectors) << 9 };
+uint64_t range[2] = { le64_to_cpu(desc->sector) << 9,
+  le32_to_cpu(desc->num_sectors) << 9 };


Hi; Coverity points out that this does a 32-bit shift, not a
64-bit one, so it could unintentionally chop the high bits off
if desc->num_sectors is big enough (CID 1549454).
We could fix this by making it
   (uint64_t)le32_to_cpu(desc->num_sectors) << 9
I think.


Yep, I think so! I'll send a patch.



(It looks like the issue was already there before, so


Yes, it predates this patch: it was introduced with commit caa1ee4313
("vhost-user-blk: add discard/write zeroes features support").



Coverity has just noticed it because of the code change here.)


Ah, I thought it ran on all the code, not just the changes.

Thanks,
Stefano

Re: [PATCH v8 00/13] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-07-05 Thread Stefano Garzarella


On Wed, Jul 03, 2024 at 06:49:30PM GMT, Michael S. Tsirkin wrote:

On Tue, Jun 18, 2024 at 12:00:30PM +0200, Stefano Garzarella wrote:

As discussed with Michael and Markus [1], this version also includes the patch
on which v7 depended to simplify the merge in Michael's tree.

The series is all reviewed, so if there are no new changes required, I would
ask to merge it.



I dropped patches 9 and 10 for now since otherwise make vm-build-freebsd
fails.

Pls figure it out and resend just 9 and 10.


I reproduced it locally, but I can't understand why it only happens on
certain architectures, in my case on loongarch64, ppc64, and riscv32:


326/846 qemu:qtest+qtest-loongarch64 / qtest-loongarch64/qos-test
ERROR  116.10s   killed by signal 6 SIGABRT
337/846 qemu:qtest+qtest-ppc64 / qtest-ppc64/qos-test
ERROR  115.10s   killed by signal 6 SIGABRT
339/846 qemu:qtest+qtest-riscv32 / qtest-riscv32/qos-test
ERROR  107.65s   killed by signal 6 SIGABRT

I focused on ppc64, running `gmake --output-sync -j6 check-qtest-ppc64`
in the FreeBSD VM, and it fails every time. In particular, the test that
fails is `vhost-user/reconnect`: with it disabled as follows, the
qos-test tests always pass:


diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 0fa8951c9f..c3d686f0ee 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -1118,9 +1119,11 @@ static void register_vhost_user_test(void)
  "virtio-net",
  test_migrate, &opts);

+#if 0
 opts.before = vhost_user_test_setup_reconnect;
 qos_add_test("vhost-user/reconnect", "virtio-net",
  test_reconnect, &opts);
+#endif

 opts.before = vhost_user_test_setup_connect_fail;
 qos_add_test("vhost-user/connect-fail", "virtio-net",

Analyzing the test, what happens is that after the disconnection the
test never receives the VHOST_USER_SET_MEM_TABLE message, so it never
gets the fds and the second `wait_for_fds()` fails after the 5 second
timeout (increasing the timeout doesn't help).


diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 0fa8951c9f..c3d686f0ee 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -976,6 +976,7 @@ static void test_reconnect(void *obj, void *arg, QGuestAllocator *alloc)

 g_source_set_callback(src, reconnect_cb, s, NULL);
 g_source_attach(src, s->context);
 g_source_unref(src);
+// THIS one is failing
 g_assert(wait_for_fds(s));
 wait_for_rings_started(s, 2);
 }

This is the test log (note: IIUC the QEMU failures happen after the test
exits on the assertion, so it could mean that the chardev reconnected
correctly):


▶ 28/30 
/ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/reconnect
 - ERROR:../src/tests/qtest/qos-test.c:191:subprocess_run_one_test: child 
process 
(/ppc64/pseries/spapr-pci-host-bridge/pci-bus-spapr/pci-bus/virtio-net-pci/virtio-net/virtio-net-tests/vhost-user/reconnect/subprocess
 [54991]) failed unexpectedly FAIL
▶ 28/30   
ERROR
[28-30/30] 🌒 qemu:qtest+qtest-ppc64 / qtest-ppc64/qmp-cmd-test  
 [28-30/30] 🌓 qemu:qtest+qtest-ppc64 / qtest-ppc64/migration-test   
  28/30 qemu:qtest+qtest-ppc64 / qtest-ppc64/qos-test   
ERROR   21.53s   killed by signal 6 SIGABRT
>>> PYTHON=/usr/home/qemu/qemu-test.OD8v2L/build/pyvenv/bin/python3.9 
ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 
G_TEST_DBUS_DAEMON=/usr/home/qemu/qemu-test.OD8v2L/src/tests/dbus-vmstate-daemon.sh 
QTEST_QEMU_BINARY=./qemu-system-ppc64 MALLOC_PERTURB_=141 QTEST_QEMU_IMG=./qemu-img 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 
/usr/home/qemu/qemu-test.OD8v2L/build/tests/qtest/qos-test --tap -k
 ✀  

stderr:
Vhost user backend fails to broadcast fake RARP
qemu-system-ppc64: -chardev 
socket,id=chr-reconnect,path=/tmp/vhost-test-Z5VMQ2/reconnect.sock,server=on: 
info: QEMU waiting for connection on: 
disconnected:unix:/tmp/vhost-test-Z5VMQ2/reconnect.sock,server=on
**
ERROR:../src/tests/qtest/vhost-user-test.c:255:wait_for_fds: assertion failed: 
(s->fds_num)
qemu-system-ppc64: Failed to set msg fds.
qemu-system-ppc64: vhost VQ 0 ring restore failed: -22: Invalid argument 
(22)
qemu-system-ppc64: Failed to set msg fds.
qemu-system-ppc64: vhost_set_vring_endian failed: Invalid argument (22)
qemu-system-ppc64: Failed to set msg fds.
qemu-system-ppc64: vhost VQ 1 ring restore failed: -22: In

[PATCH] MAINTAINERS: add Stefano Garzarella as vhost/vhost-user reviewer

2024-07-04 Thread Stefano Garzarella

I have recently been working on supporting vhost-user on any POSIX system,
so I want to help maintain it.

Cc: Michael S. Tsirkin 
Signed-off-by: Stefano Garzarella 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6725913c8b..47493f19d9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2207,6 +2207,7 @@ F: docs/devel/vfio-iommufd.rst
 
 vhost
 M: Michael S. Tsirkin 
+R: Stefano Garzarella 
 S: Supported
 F: hw/*/*vhost*
 F: docs/interop/vhost-user.json
-- 
2.45.2

Re: [Bug Report] Possible Missing Endianness Conversion

2024-07-01 Thread Stefano Garzarella


On Fri, Jun 28, 2024 at 03:53:09PM GMT, Peter Maydell wrote:

On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella  wrote:


On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:
>On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella  wrote:
>>
>> CCing Jason.
>>
>> On Mon, Jun 24, 2024 at 4:30 PM Xoykie  wrote:
>> >
>> > The virtio packed virtqueue support patch[1] suggests converting
>> > endianness by lines:
>> >
>> > virtio_tswap16s(vdev, &e->off_wrap);
>> > virtio_tswap16s(vdev, &e->flags);
>> >
>> > Though both of these conversion statements aren't present in the
>> > latest qemu code here[2]
>> >
>> > Is this intentional?
>>
>> Good catch!
>>
>> It looks like it was removed (maybe by mistake) by commit
>> d152cdd6f6 ("virtio: use virtio accessor to access packed event")
>
>That commit changes from:
>
>-address_space_read_cached(cache, off_off, &e->off_wrap,
>-  sizeof(e->off_wrap));
>-virtio_tswap16s(vdev, &e->off_wrap);
>
>which does a byte read of 2 bytes and then swaps the bytes
>depending on the host endianness and the value of
>virtio_access_is_big_endian()
>
>to this:
>
>+e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
>
>virtio_lduw_phys_cached() is a small function which calls
>either lduw_be_phys_cached() or lduw_le_phys_cached()
>depending on the value of virtio_access_is_big_endian().
>(And lduw_be_phys_cached() and lduw_le_phys_cached() do
>the right thing for the host-endianness to do a "load
>a specifically big or little endian 16-bit value".)
>
>Which is to say that because we use a load/store function that's
>explicit about the size of the data type it is accessing, the
>function itself can handle doing the load as big or little
>endian, rather than the calling code having to do a manual swap after
>it has done a load-as-bag-of-bytes. This is generally preferable
>as it's less error-prone.

Thanks for the details!

So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?

I mean:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..2e5e67bdb9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
  /* Make sure flags is seen before off_wrap */
  smp_rmb();
  e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-virtio_tswap16s(vdev, &e->flags);
  }


That definitely looks like it's probably not correct...


Yeah, I just sent that patch: 
https://lore.kernel.org/qemu-devel/20240701075208.19634-1-sgarz...@redhat.com


We can continue the discussion there.

Thanks,
Stefano

[PATCH] virtio: remove virtio_tswap16s() call in vring_packed_event_read()

2024-07-01 Thread Stefano Garzarella

Commit d152cdd6f6 ("virtio: use virtio accessor to access packed event")
switched from address_space_read_cached() to virtio_lduw_phys_cached()
to access the packed descriptor event.

When we used address_space_read_cached(), we needed to call
virtio_tswap16s() to handle the endianness of the field, but
virtio_lduw_phys_cached() already handles it internally, so we no longer
need to call virtio_tswap16s() (the commit removed the call for `off_wrap`,
but forgot to remove it for `flags`).

Fixes: d152cdd6f6 ("virtio: use virtio accessor to access packed event")
Cc: jasow...@redhat.com
Cc: qemu-sta...@nongnu.org
Reported-by: Xoykie 
Link: https://lore.kernel.org/qemu-devel/cafu8rb_pjr77zmlsm0unf9xpnxfr_--tjr49f_ex32zbc5o...@mail.gmail.com
Signed-off-by: Stefano Garzarella 
---
 hw/virtio/virtio.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..2e5e67bdb9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
 /* Make sure flags is seen before off_wrap */
 smp_rmb();
 e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-virtio_tswap16s(vdev, &e->flags);
 }
 
 static void vring_packed_off_wrap_write(VirtIODevice *vdev,
-- 
2.45.2
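
Why the leftover swap is harmful can be shown with a minimal sketch. It does
not use QEMU's helpers: load_le16() and swap16() are hypothetical stand-ins
for an endian-aware load and for an unconditional byte swap. In QEMU the
swap done by virtio_tswap16s() is conditional, so the problem would only
show up when the virtio and host endianness differ.

```c
#include <stdint.h>

/*
 * Endian-aware load: assembles a little-endian 16-bit value byte by byte,
 * so the result is correct on any host.
 */
static uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Unconditional byte swap of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}
```

For the bytes { 0x34, 0x12 }, load_le16() returns 0x1234, which is already
the final value. Applying swap16() on top of it gives 0x3412, i.e. a swap
after an endian-aware load corrupts the field instead of fixing it.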

Re: [PATCH v3 10/15] docs/interop/firmware.json: Add igvm to FirmwareDevice

2024-06-27 Thread Stefano Garzarella


On Fri, Jun 21, 2024 at 03:29:13PM GMT, Roy Hopkins wrote:

Create an enum entry within FirmwareDevice for 'igvm' to describe that
an IGVM file can be used to map firmware into memory as an alternative
to pre-existing firmware devices.

Signed-off-by: Roy Hopkins 
---
docs/interop/firmware.json | 9 -
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
index 54a1fc6c10..9a9178606e 100644
--- a/docs/interop/firmware.json
+++ b/docs/interop/firmware.json
@@ -55,10 +55,17 @@
#
# @memory: The firmware is to be mapped into memory.
#
+# @igvm: The firmware is defined by a file conforming to the IGVM
+#specification and mapped into memory according to directives
+#defined in the file. This is similar to @memory but may
+#include additional processing defined by the IGVM file
+#including initial CPU state or population of metadata into
+#the guest address space.


Should we add (Since: 9.1) ?

I'm not sure about that, since I don't see it used much in docs/interop/

Thanks,
Stefano


+#
# Since: 3.0
##
{ 'enum' : 'FirmwareDevice',
-  'data' : [ 'flash', 'kernel', 'memory' ] }
+  'data' : [ 'flash', 'kernel', 'memory', 'igvm' ] }

##
# @FirmwareTarget:
--
2.43.0

Re: [PATCH v3 06/15] sev: Update launch_update_data functions to use Error handling

2024-06-27 Thread Stefano Garzarella


On Fri, Jun 21, 2024 at 03:29:09PM GMT, Roy Hopkins wrote:

The class function and implementations for updating launch data return
a code in case of error. In some cases an error message is generated and
in other cases, just the error return value is used.

This small refactor adds an 'Error **errp' parameter to all functions
which consistently set an error condition if a non-zero value is
returned.

Signed-off-by: Roy Hopkins 
---
target/i386/sev.c | 67 +--
1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/target/i386/sev.c b/target/i386/sev.c
index 30b83f1d77..1900c3d9b4 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -121,7 +121,8 @@ struct SevCommonStateClass {
   Error **errp);
int (*launch_start)(SevCommonState *sev_common);
void (*launch_finish)(SevCommonState *sev_common);
-int (*launch_update_data)(SevCommonState *sev_common, hwaddr gpa, uint8_t *ptr, uint64_t len);
+int (*launch_update_data)(SevCommonState *sev_common, hwaddr gpa,
+  uint8_t *ptr, uint64_t len, Error **errp);
int (*kvm_init)(ConfidentialGuestSupport *cgs, Error **errp);
};

@@ -942,14 +943,17 @@ out:
return ret;
}

-static int
-sev_launch_update_data(SevCommonState *sev_common, hwaddr gpa, uint8_t *addr, uint64_t len)
+static int sev_launch_update_data(SevCommonState *sev_common, hwaddr gpa,
+  uint8_t *addr, uint64_t len, Error **errp)
{
int ret, fw_error;
struct kvm_sev_launch_update_data update;

if (!addr || !len) {
-return 1;


Why were we returning 1 before? Was that a mistake?
Maybe we should mention it in the patch or fix it in another patch.


+error_setg(errp,
+   "%s: Invalid parameters provided for updating launch data.",
+   __func__);
+return -1;
}

update.uaddr = (uintptr_t)addr;
@@ -958,8 +962,8 @@ sev_launch_update_data(SevCommonState *sev_common, hwaddr gpa, uint8_t *addr, ui
ret = sev_ioctl(sev_common->sev_fd, KVM_SEV_LAUNCH_UPDATE_DATA,
&update, &fw_error);
if (ret) {
-error_report("%s: LAUNCH_UPDATE ret=%d fw_error=%d '%s'",
-__func__, ret, fw_error, fw_error_to_str(fw_error));
+error_setg(errp, "%s: LAUNCH_UPDATE ret=%d fw_error=%d '%s'", __func__,
+   ret, fw_error, fw_error_to_str(fw_error));
}

return ret;
@@ -1087,9 +1091,8 @@ sev_launch_finish(SevCommonState *sev_common)
migrate_add_blocker(&sev_mig_blocker, &error_fatal);
}

-static int
-snp_launch_update_data(uint64_t gpa, void *hva,
-   uint32_t len, int type)
+static int snp_launch_update_data(uint64_t gpa, void *hva, uint32_t len,
+  int type, Error **errp)
{
SevLaunchUpdateData *data;

@@ -1104,13 +1107,12 @@ snp_launch_update_data(uint64_t gpa, void *hva,
return 0;
}

-static int
-sev_snp_launch_update_data(SevCommonState *sev_common, hwaddr gpa,
-   uint8_t *ptr, uint64_t len)
+static int sev_snp_launch_update_data(SevCommonState *sev_common, hwaddr gpa,
+  uint8_t *ptr, uint64_t len, Error **errp)
{
-   int ret = snp_launch_update_data(gpa, ptr, len,
- KVM_SEV_SNP_PAGE_TYPE_NORMAL);
-   return ret;
+int ret = snp_launch_update_data(gpa, ptr, len,
+ KVM_SEV_SNP_PAGE_TYPE_NORMAL, errp);
+return ret;


Pre-existing, but while we're at it maybe we can remove ret.


}

static int
@@ -1162,8 +1164,8 @@ sev_snp_cpuid_info_fill(SnpCpuidInfo *snp_cpuid_info,
return 0;
}

-static int
-snp_launch_update_cpuid(uint32_t cpuid_addr, void *hva, uint32_t cpuid_len)
+static int snp_launch_update_cpuid(uint32_t cpuid_addr, void *hva,
+   uint32_t cpuid_len, Error **errp)
{
KvmCpuidInfo kvm_cpuid_info = {0};
SnpCpuidInfo snp_cpuid_info;
@@ -1180,26 +1182,26 @@ snp_launch_update_cpuid(uint32_t cpuid_addr, void *hva, uint32_t cpuid_len)
} while (ret == -E2BIG);

if (ret) {
-error_report("SEV-SNP: unable to query CPUID values for CPU: '%s'",
- strerror(-ret));
-return 1;
+error_setg(errp, "SEV-SNP: unable to query CPUID values for CPU: '%s'",
+   strerror(-ret));
+return -1;
}

ret = sev_snp_cpuid_info_fill(&snp_cpuid_info, &kvm_cpuid_info);
if (ret) {
-error_report("SEV-SNP: failed to generate CPUID table information");
-return 1;
+error_setg(errp, "SEV-SNP: failed to generate CPUID table information");
+return -1;


Ditto for the 2 changes, although IIUC we never check the return value 
of snp_launch_update_cpuid().



}

memcpy(hva, &snp_cpuid_info, sizeof(snp_cpuid_info));

return snp_launch_update_data(cpuid_addr, hva, cpuid_le

Re: [PATCH v3 05/15] i386/pc_sysfw: Ensure sysfw flash configuration does not conflict with IGVM

2024-06-27 Thread Stefano Garzarella


On Fri, Jun 21, 2024 at 03:29:08PM GMT, Roy Hopkins wrote:

When using an IGVM file the configuration of the system firmware is
defined by IGVM directives contained in the file. In this case the user
should not configure any pflash devices.

This commit skips initialization of the ROM mode when pflash0 is not set
then checks to ensure no pflash devices have been configured when using
IGVM, exiting with an error message if this is not the case.

Signed-off-by: Roy Hopkins 
---
hw/i386/pc_sysfw.c | 23 +--
1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
index ef80281d28..39e94ce144 100644
--- a/hw/i386/pc_sysfw.c
+++ b/hw/i386/pc_sysfw.c
@@ -239,8 +239,13 @@ void pc_system_firmware_init(PCMachineState *pcms,
}

if (!pflash_blk[0]) {
-/* Machine property pflash0 not set, use ROM mode */
-x86_bios_rom_init(X86_MACHINE(pcms), "bios.bin", rom_memory, false);


We have the same call, a few lines above if `pci_enabled` is false, 
should we make the same change there as well?



+/*
+ * Machine property pflash0 not set, use ROM mode unless using IGVM,
+ * in which case the firmware must be provided by the IGVM file.
+ */
+if (!MACHINE(pcms)->igvm) {
+x86_bios_rom_init(X86_MACHINE(pcms), "bios.bin", rom_memory, false);
+}
} else {
if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
/*
@@ -256,6 +261,20 @@ void pc_system_firmware_init(PCMachineState *pcms,
}

pc_system_flash_cleanup_unused(pcms);
+
+/*
+ * The user should not have specified any pflash devices when using IGVM
+ * to configure the guest.
+ */
+if (MACHINE(pcms)->igvm) {
+for (i = 0; i < ARRAY_SIZE(pcms->flash); i++) {
+if (pcms->flash[i]) {
+error_report("pflash devices cannot be configured when "
+ "using IGVM");
+exit(1);
+}
+}
+}
}

void x86_firmware_configure(hwaddr gpa, void *ptr, int size)
--
2.43.0

Re: [PATCH v3 03/15] backends/igvm: Add IGVM loader and configuration

2024-06-27 Thread Stefano Garzarella


On Fri, Jun 21, 2024 at 03:29:06PM GMT, Roy Hopkins wrote:

Adds an IGVM loader to QEMU which processes a given IGVM file and
applies the directives within the file to the current guest
configuration.

The IGVM loader can be used to configure both confidential and
non-confidential guests. For confidential guests, the
ConfidentialGuestSupport object for the system is used to encrypt
memory, apply the initial CPU state and perform other confidential guest
operations.

The loader is configured via a new IgvmCfg QOM object which allows the
user to provide a path to the IGVM file to process.

Signed-off-by: Roy Hopkins 
---
qapi/qom.json |  16 +
backends/igvm.h   |  37 ++
include/sysemu/igvm-cfg.h |  54 +++
backends/igvm-cfg.c   |  66 
backends/igvm.c   | 791 ++
backends/meson.build  |   2 +
6 files changed, 966 insertions(+)
create mode 100644 backends/igvm.h
create mode 100644 include/sysemu/igvm-cfg.h
create mode 100644 backends/igvm-cfg.c
create mode 100644 backends/igvm.c

diff --git a/qapi/qom.json b/qapi/qom.json
index 8bd299265e..e586707c4c 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -874,6 +874,18 @@
  'base': 'RngProperties',
  'data': { '*filename': 'str' } }

+##
+# @IgvmCfgProperties:
+#
+# Properties common to objects that handle IGVM files.
+#
+# @file: IGVM file to use to configure guest (default: none)
+#
+# Since: 9.1
+##
+{ 'struct': 'IgvmCfgProperties',
+  'data': { '*file': 'str' } }


'if': 'CONFIG_IGVM'

I recently made a similar modification to the QAPI schema and Markus
suggested adding the 'if' there as well, see
https://lore.kernel.org/qemu-devel/87zfs2z7jo@pond.sub.org/




+
##
# @SevCommonProperties:
#
@@ -1039,6 +1051,8 @@
'filter-redirector',
'filter-replay',
'filter-rewriter',
+{ 'name': 'igvm-cfg',
+  'if': 'CONFIG_IGVM' },
'input-barrier',
{ 'name': 'input-linux',
  'if': 'CONFIG_LINUX' },
@@ -,6 +1125,8 @@
  'filter-redirector':  'FilterRedirectorProperties',
  'filter-replay':  'NetfilterProperties',
  'filter-rewriter':'FilterRewriterProperties',
+  'igvm-cfg':   { 'type': 'IgvmCfgProperties',
+  'if': 'CONFIG_IGVM' },
  'input-barrier':  'InputBarrierProperties',
  'input-linux':{ 'type': 'InputLinuxProperties',
  'if': 'CONFIG_LINUX' },
diff --git a/backends/igvm.h b/backends/igvm.h
new file mode 100644
index 00..3a3824b391
--- /dev/null
+++ b/backends/igvm.h
@@ -0,0 +1,37 @@
+/*
+ * QEMU IGVM configuration backend for Confidential Guests
+ *
+ * Copyright (C) 2023-2024 SUSE
+ *
+ * Authors:
+ *  Roy Hopkins 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef BACKENDS_IGVM_H
+#define BACKENDS_IGVM_H
+
+#include "exec/confidential-guest-support.h"
+#include "sysemu/igvm-cfg.h"
+#include "qapi/error.h"
+
+#if defined(CONFIG_IGVM)
+
+int igvm_process_file(IgvmCfgState *igvm, ConfidentialGuestSupport *cgs,
+  Error **errp);
+
+#else
+
+static inline int igvm_process_file(IgvmCfgState *igvm,
+ConfidentialGuestSupport *cgs, Error **errp)
+{
+error_setg(
+errp, "Invalid call to igvm_process_file when CONFIG_IGVM is disabled");
+return -1;
+}
+
+#endif
+
+#endif
diff --git a/include/sysemu/igvm-cfg.h b/include/sysemu/igvm-cfg.h
new file mode 100644
index 00..8ac8b33d8d
--- /dev/null
+++ b/include/sysemu/igvm-cfg.h
@@ -0,0 +1,54 @@
+/*
+ * QEMU IGVM interface
+ *
+ * Copyright (C) 2024 SUSE
+ *
+ * Authors:
+ *  Roy Hopkins 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_IGVM_CFG_H
+#define QEMU_IGVM_CFG_H
+
+#include "qom/object.h"
+
+typedef struct IgvmCfgState {
+ObjectClass parent_class;
+
+/*
+ * filename: Filename that specifies a file that contains the configuration
+ *   of the guest in Independent Guest Virtual Machine (IGVM)
+ *   format.
+ */
+char *filename;
+} IgvmCfgState;
+
+typedef struct IgvmCfgClass {
+ObjectClass parent_class;
+
+/*
+ * If an IGVM filename has been specified then process the IGVM file.
+ * Performs a no-op if no filename has been specified.
+ *
+ * Returns 0 for ok and -1 on error.
+ */
+int (*process)(IgvmCfgState *cfg, ConfidentialGuestSupport *cgs,
+   Error **errp);
+
+} IgvmCfgClass;
+
+#define TYPE_IGVM_CFG "igvm-cfg"
+
+#define IGVM_CFG_CLASS_SUFFIX "-" TYPE_IGVM_CFG
+#define IGVM_CFG_CLASS_NAME(a) (a IGVM_CFG_CLASS_SUFFIX)
+
+#define IGVM_CFG_CLASS(klass) \
+OBJECT_CLASS_CHECK(IgvmCfgClass, (klass), TYPE_IGVM_CFG)
+#define IGVM_CFG(obj) OBJECT_CHECK(IgvmCfgState, (ob

Re: [PATCH] qapi/qom: make some QOM properties depend on the build settings

2024-06-25 Thread Stefano Garzarella


Gentle ping :-)

On Tue, Jun 04, 2024 at 03:59:31PM GMT, Stefano Garzarella wrote:

Some QOM properties are associated with ObjectTypes that already
depend on CONFIG_* switches. So to avoid generating dead code,
let's also make the definition of those properties dependent on
the corresponding CONFIG_*.

Suggested-by: Markus Armbruster 
Signed-off-by: Stefano Garzarella 
---
qapi/qom.json | 21 ++---
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..ae93313a60 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -222,7 +222,8 @@
##
{ 'struct': 'CanHostSocketcanProperties',
  'data': { 'if': 'str',
-'canbus': 'str' } }
+'canbus': 'str' },
+  'if': 'CONFIG_LINUX' }

##
# @ColoCompareProperties:
@@ -305,7 +306,8 @@
##
{ 'struct': 'CryptodevVhostUserProperties',
  'base': 'CryptodevBackendProperties',
-  'data': { 'chardev': 'str' } }
+  'data': { 'chardev': 'str' },
+  'if': 'CONFIG_VHOST_CRYPTO' }

##
# @DBusVMStateProperties:
@@ -514,7 +516,8 @@
  'data': { 'evdev': 'str',
'*grab_all': 'bool',
'*repeat': 'bool',
-'*grab-toggle': 'GrabToggleKeys' } }
+'*grab-toggle': 'GrabToggleKeys' },
+  'if': 'CONFIG_LINUX' }

##
# @EventLoopBaseProperties:
@@ -719,7 +722,8 @@
  'base': 'MemoryBackendProperties',
  'data': { '*hugetlb': 'bool',
'*hugetlbsize': 'size',
-'*seal': 'bool' } }
+'*seal': 'bool' },
+  'if': 'CONFIG_LINUX' }

##
# @MemoryBackendEpcProperties:
@@ -736,7 +740,8 @@
##
{ 'struct': 'MemoryBackendEpcProperties',
  'base': 'MemoryBackendProperties',
-  'data': {} }
+  'data': {},
+  'if': 'CONFIG_LINUX' }

##
# @PrManagerHelperProperties:
@@ -749,7 +754,8 @@
# Since: 2.11
##
{ 'struct': 'PrManagerHelperProperties',
-  'data': { 'path': 'str' } }
+  'data': { 'path': 'str' },
+  'if': 'CONFIG_LINUX' }

##
# @QtestProperties:
@@ -872,7 +878,8 @@
##
{ 'struct': 'RngRandomProperties',
  'base': 'RngProperties',
-  'data': { '*filename': 'str' } }
+  'data': { '*filename': 'str' },
+  'if': 'CONFIG_POSIX' }

##
# @SevGuestProperties:
--
2.45.1

Re: [Bug Report] Possible Missing Endianness Conversion

2024-06-25 Thread Stefano Garzarella

On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote:

On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella  wrote:

CCing Jason.

On Mon, Jun 24, 2024 at 4:30 PM Xoykie  wrote:
>
> The virtio packed virtqueue support patch[1] suggests converting
> endianness by lines:
>
> virtio_tswap16s(vdev, &e->off_wrap);
> virtio_tswap16s(vdev, &e->flags);
>
> Though both of these conversion statements aren't present in the
> latest qemu code here[2]
>
> Is this intentional?

Good catch!

It looks like it was removed (maybe by mistake) by commit
d152cdd6f6 ("virtio: use virtio accessor to access packed event")

That commit changes from:

-address_space_read_cached(cache, off_off, &e->off_wrap,
-  sizeof(e->off_wrap));
-virtio_tswap16s(vdev, &e->off_wrap);

which does a byte read of 2 bytes and then swaps the bytes
depending on the host endianness and the value of
virtio_access_is_big_endian()

to this:

+e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);

virtio_lduw_phys_cached() is a small function which calls
either lduw_be_phys_cached() or lduw_le_phys_cached()
depending on the value of virtio_access_is_big_endian().
(And lduw_be_phys_cached() and lduw_le_phys_cached() do
the right thing for the host-endianness to do a "load
a specifically big or little endian 16-bit value".)

Which is to say that because we use a load/store function that's
explicit about the size of the data type it is accessing, the
function itself can handle doing the load as big or little
endian, rather than the calling code having to do a manual swap after
it has done a load-as-bag-of-bytes. This is generally preferable
as it's less error-prone.
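
The difference between the two styles can be sketched in standalone C. This is only an illustration under stated assumptions: `load_u16()`, `load_then_swap()` and the other helpers are hypothetical names, not QEMU's accessors, and a plain byte buffer stands in for guest memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: explicit-endianness 16-bit loads from a byte buffer. */
static uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Style 1: the accessor knows the data size and picks the byte order. */
static uint16_t load_u16(const uint8_t *p, bool big_endian)
{
    return big_endian ? load_be16(p) : load_le16(p);
}

/* Detect the byte order of the machine running this code. */
static bool host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Style 2: copy a bag of bytes, then swap by hand if the orders differ. */
static uint16_t load_then_swap(const uint8_t *p, bool big_endian)
{
    uint16_t v;

    memcpy(&v, p, sizeof(v));
    if (big_endian != host_is_big_endian()) {
        v = (uint16_t)((v >> 8) | (v << 8));
    }
    return v;
}
```

Both styles return the same value; the second one only works if the caller remembers the swap, which is the step that is easy to drop or to apply twice.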

Thanks for the details!

So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ?

I mean:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..2e5e67bdb9 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev,
 /* Make sure flags is seen before off_wrap */
 smp_rmb();
 e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off);
-virtio_tswap16s(vdev, &e->flags);
 }

 static void vring_packed_off_wrap_write(VirtIODevice *vdev,

Thanks,
Stefano

(Explicit swap-after-loading still has a place where the
code is doing a load of a whole structure out of the
guest and then swapping each struct field after the fact,
because it means we can do a single load-from-guest-memory
rather than a whole sequence of calls all the way down
through the memory subsystem.)

thanks
-- PMM

Re: [Bug Report] Possible Missing Endianness Conversion

2024-06-24 Thread Stefano Garzarella

CCing Jason.

On Mon, Jun 24, 2024 at 4:30 PM Xoykie  wrote:
>
> The virtio packed virtqueue support patch[1] suggests converting
> endianness by lines:
>
> virtio_tswap16s(vdev, &e->off_wrap);
> virtio_tswap16s(vdev, &e->flags);
>
> Though both of these conversion statements aren't present in the
> latest qemu code here[2]
>
> Is this intentional?

Good catch!

It looks like it was removed (maybe by mistake) by commit
d152cdd6f6 ("virtio: use virtio accessor to access packed event")

Jason can you confirm that?

Thanks,
Stefano

>
> [1]: https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html
> [2]: https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314
>

[PATCH v8 11/13] hostmem: add a new memory backend based on POSIX shm_open()

2024-06-18 Thread Stefano Garzarella

shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating a memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` (Linux only) is not available, since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but the name exists
only for a short time interval, since shm_unlink() is called right after
shm_open(). After that, only the fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.
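
The open-then-unlink pattern described above can be sketched as follows. This is a minimal illustration, not the code in backends/hostmem-shm.c: the helper name `anon_shm_fd()` and the name format are assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return an fd backed by POSIX shared memory that
 * no longer has a name, so it behaves like anonymous shareable memory.
 */
static int anon_shm_fd(size_t size)
{
    char name[64];
    int fd;

    /* The name only has to be unique for the short time it exists. */
    snprintf(name, sizeof(name), "/shm-ex-%ld", (long)getpid());

    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        return -1;
    }

    /* Drop the name right away; the object lives on through the fd. */
    shm_unlink(name);

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would mmap() the returned fd with MAP_SHARED and pass the fd to the other process (for vhost-user, over the UNIX socket). On older glibc versions linking may need -lrt.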

Acked-by: David Hildenbrand 
Acked-by: Stefan Hajnoczi 
Acked-by: Markus Armbruster  (QAPI schema)
Signed-off-by: Stefano Garzarella 
---
v8
- Fixed QAPI documentation about share option [Markus]
v7
- changed default value documentation for @share rebasing on
  20240611130231.83152-1-sgarz...@redhat.com [Markus]
- used `memory-backend-shm` instead of `shm` and wrapped to 70 columns
  [Markus]
- added 'if': 'CONFIG_POSIX' to MemoryBackendShmProperties [Markus]
v5
- fixed documentation in qapi/qom.json and qemu-options.hx [Markus]
v4
- fail if we find "share=off" in shm_backend_memory_alloc() [David]
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  23 +-
 backends/hostmem-shm.c | 123 +
 backends/meson.build   |   1 +
 qemu-options.hx|  16 
 5 files changed, 164 insertions(+), 4 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 9b8f6a7ab5..92b0fea76c 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -601,8 +601,8 @@
 #
 # @share: if false, the memory is private to QEMU; if true, it is
 # shared (default false for backends memory-backend-file and
-# memory-backend-ram, true for backends memory-backend-epc and
-# memory-backend-memfd)
+# memory-backend-ram, true for backends memory-backend-epc,
+# memory-backend-memfd, and memory-backend-shm)
 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
@@ -721,6 +721,21 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# This memory backend supports only shared memory, which is the
+# default.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { },
+  'if': 'CONFIG_POSIX' }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -1049,6 +1064,8 @@
 { 'name': 'memory-backend-memfd',
   'if': 'CONFIG_LINUX' },
 'memory-backend-ram',
+{ 'name': 'memory-backend-shm',
+  'if': 'CONFIG_POSIX' },
 'pef-guest',
 { 'name': 'pr-manager-helper',
   'if': 'CONFIG_LINUX' },
@@ -1121,6 +1138,8 @@
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'CONFIG_LINUX' },
   'memory-backend-ram': 'MemoryBackendProperties',
+  'memory-backend-shm': { 'type': 'MemoryBackendShmProperties',
+  'if': 'CONFIG_POSIX' },
   'pr-manager-helper':  { 'type': 'PrManagerHelperPropert

[PATCH v8 13/13] tests/qtest/vhost-user-test: add a test case for memory-backend-shm

2024-06-18 Thread Stefano Garzarella

`memory-backend-shm` can be used with vhost-user devices, so let's
add a new test case for it.

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefano Garzarella 
---
 tests/qtest/vhost-user-test.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d4e437265f..8c1d903b2a 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -44,6 +44,8 @@
 "mem-path=%s,share=on -numa node,memdev=mem"
 #define QEMU_CMD_MEMFD  " -m %d -object memory-backend-memfd,id=mem,size=%dM," \
 " -numa node,memdev=mem"
+#define QEMU_CMD_SHM" -m %d -object memory-backend-shm,id=mem,size=%dM," \
+" -numa node,memdev=mem"
 #define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce=on"
 
@@ -195,6 +197,7 @@ enum test_memfd {
 TEST_MEMFD_AUTO,
 TEST_MEMFD_YES,
 TEST_MEMFD_NO,
+TEST_MEMFD_SHM,
 };
 
 static void append_vhost_net_opts(TestServer *s, GString *cmd_line,
@@ -228,6 +231,8 @@ static void append_mem_opts(TestServer *server, GString *cmd_line,
 
 if (memfd == TEST_MEMFD_YES) {
 g_string_append_printf(cmd_line, QEMU_CMD_MEMFD, size, size);
+} else if (memfd == TEST_MEMFD_SHM) {
+g_string_append_printf(cmd_line, QEMU_CMD_SHM, size, size);
 } else {
 const char *root = init_hugepagefs() ? : server->tmpfs;
 
@@ -788,6 +793,19 @@ static void *vhost_user_test_setup_memfd(GString *cmd_line, void *arg)
 return server;
 }
 
+static void *vhost_user_test_setup_shm(GString *cmd_line, void *arg)
+{
+TestServer *server = test_server_new("vhost-user-test", arg);
+test_server_listen(server);
+
+append_mem_opts(server, cmd_line, 256, TEST_MEMFD_SHM);
+server->vu_ops->append_opts(server, cmd_line, "");
+
+g_test_queue_destroy(vhost_user_test_cleanup, server);
+
+return server;
+}
+
 static void test_read_guest_mem(void *obj, void *arg, QGuestAllocator *alloc)
 {
 TestServer *server = arg;
@@ -1081,6 +1099,11 @@ static void register_vhost_user_test(void)
  "virtio-net",
  test_read_guest_mem, &opts);
 
+opts.before = vhost_user_test_setup_shm;
+qos_add_test("vhost-user/read-guest-mem/shm",
+ "virtio-net",
+ test_read_guest_mem, &opts);
+
 if (qemu_memfd_check(MFD_ALLOW_SEALING)) {
 opts.before = vhost_user_test_setup_memfd;
 qos_add_test("vhost-user/read-guest-mem/memfd",
-- 
2.45.2

[PATCH v8 12/13] tests/qtest/vhost-user-blk-test: use memory-backend-shm

2024-06-18 Thread Stefano Garzarella

`memory-backend-memfd` is available only on Linux while the new
`memory-backend-shm` can be used on any POSIX-compliant operating
system. Let's use it so we can run the test in multiple environments.

Since we are here, let's remove `share=on`, which is the default for shm
(and also for memfd).

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
v6
- removed `share=on` since it's the default [David]
---
 tests/qtest/vhost-user-blk-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
index 117b9acd10..ea90d41232 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -906,7 +906,7 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
vhost_user_blk_bin);
 
 g_string_append_printf(cmd_line,
-" -object memory-backend-memfd,id=mem,size=256M,share=on "
+" -object memory-backend-shm,id=mem,size=256M "
 " -M memory-backend=mem -m 256M ");
 
 for (i = 0; i < vus_instances; i++) {
-- 
2.45.2

[PATCH v8 09/13] libvhost-user: enable it on any POSIX system

2024-06-18 Thread Stefano Garzarella

The vhost-user protocol is not really Linux-specific so let's enable
libvhost-user for any POSIX system.

Compiling it on macOS and FreeBSD, some problems came up:
- avoid including linux/vhost.h, which is available only on Linux
  (vhost_types.h contains many of the things we need)
- macOS doesn't provide sys/endian.h, so let's define the byte-order
  conversion macros ourselves
  (note: libvhost-user doesn't include QEMU's headers, so we can't
   use "qemu/bswap.h")
- define eventfd_[write|read] as write/read wrappers when the system
  doesn't provide them (e.g. macOS)
- copy SEAL defines from include/qemu/memfd.h to make the code work
  on FreeBSD where MFD_ALLOW_SEALING is defined
- define MAP_NORESERVE if it's not defined (e.g. on FreeBSD)

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v5:
- fixed typos in the commit description [Phil]
---
 meson.build   |  2 +-
 subprojects/libvhost-user/libvhost-user.h |  2 +-
 subprojects/libvhost-user/libvhost-user.c | 60 +--
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index 2ba95a8c35..487153a431 100644
--- a/meson.build
+++ b/meson.build
@@ -3190,7 +3190,7 @@ if have_system and vfio_user_server_allowed
 endif
 
 vhost_user = not_found
-if host_os == 'linux' and have_vhost_user
+if have_vhost_user
   libvhost_user = subproject('libvhost-user')
   vhost_user = libvhost_user.get_variable('vhost_user_dep')
 endif
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index deb40e77b3..e13e1d3931 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -18,9 +18,9 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/vhost_types.h"
 
 /* Based on qemu/hw/virtio/vhost-user.c */
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 2c20cdc16e..57e58d4adb 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -28,9 +28,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 
 /* Necessary to provide VIRTIO_F_VERSION_1 on system
  * with older linux headers. Must appear before
@@ -39,8 +37,8 @@
 #include "standard-headers/linux/virtio_config.h"
 
 #if defined(__linux__)
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -52,6 +50,62 @@
 
 #endif
 
+#if defined(__APPLE__) && (__MACH__)
+#include 
+#define htobe16(x) OSSwapHostToBigInt16(x)
+#define htole16(x) OSSwapHostToLittleInt16(x)
+#define be16toh(x) OSSwapBigToHostInt16(x)
+#define le16toh(x) OSSwapLittleToHostInt16(x)
+
+#define htobe32(x) OSSwapHostToBigInt32(x)
+#define htole32(x) OSSwapHostToLittleInt32(x)
+#define be32toh(x) OSSwapBigToHostInt32(x)
+#define le32toh(x) OSSwapLittleToHostInt32(x)
+
+#define htobe64(x) OSSwapHostToBigInt64(x)
+#define htole64(x) OSSwapHostToLittleInt64(x)
+#define be64toh(x) OSSwapBigToHostInt64(x)
+#define le64toh(x) OSSwapLittleToHostInt64(x)
+#endif
+
+#ifdef CONFIG_EVENTFD
+#include 
+#else
+#define eventfd_t uint64_t
+
+int eventfd_write(int fd, eventfd_t value)
+{
+return (write(fd, &value, sizeof(value)) == sizeof(value)) ? 0 : -1;
+}
+
+int eventfd_read(int fd, eventfd_t *value)
+{
+return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1;
+}
+#endif
+
+#ifdef MFD_ALLOW_SEALING
+#include 
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001  /* prevent further seals from being set */
+#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004  /* prevent file from growing */
+#define F_SEAL_WRITE0x0008  /* prevent writes */
+#endif
+#endif
+
+#ifndef MAP_NORESERVE
+#define MAP_NORESERVE 0
+#endif
+
 #include "include/atomic.h"
 
 #include "libvhost-user.h"
-- 
2.45.2

[PATCH v8 10/13] contrib/vhost-user-blk: enable it on any POSIX system

2024-06-18 Thread Stefano Garzarella

Let's make the code more portable by adding defines from
block/file-posix.c to support O_DIRECT in other systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Acked-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefano Garzarella 
---
v6:
- reverted v5 changes since we can't move O_DSYNC and O_DIRECT in osdep
  [Daniel, failing tests on Windows]
v5:
- O_DSYNC and O_DIRECT definition are now in osdep [Phil]
- commit updated since we moved out all code changes
v4:
- moved using of "qemu/bswap.h" API in a separate patch [Phil]
---
 meson.build |  2 --
 contrib/vhost-user-blk/vhost-user-blk.c | 14 ++
 util/meson.build|  4 +++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 487153a431..f86ce92364 100644
--- a/meson.build
+++ b/meson.build
@@ -2001,8 +2001,6 @@ has_statx = cc.has_header_symbol('sys/stat.h', 'STATX_BASIC_STATS', prefix: gnu_
 has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: gnu_source_prefix)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
-  .require(host_os == 'linux',
-   error_message: 'vhost_user_blk_server requires linux') \
   .require(have_vhost_user,
error_message: 'vhost_user_blk_server requires vhost-user support') \
   .disable_auto_if(not have_tools and not have_system) \
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 9492146855..a450337685 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -25,6 +25,20 @@
 #include 
 #endif
 
+/* OS X does not have O_DSYNC */
+#ifndef O_DSYNC
+#ifdef O_SYNC
+#define O_DSYNC O_SYNC
+#elif defined(O_FSYNC)
+#define O_DSYNC O_FSYNC
+#endif
+#endif
+
+/* Approximate O_DIRECT with O_DSYNC if O_DIRECT isn't available */
+#ifndef O_DIRECT
+#define O_DIRECT O_DSYNC
+#endif
+
 enum {
 VHOST_USER_BLK_MAX_QUEUES = 8,
 };
diff --git a/util/meson.build b/util/meson.build
index 72b505df11..c414178ace 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -112,10 +112,12 @@ if have_block
 util_ss.add(files('filemonitor-stub.c'))
   endif
   if host_os == 'linux'
-util_ss.add(files('vhost-user-server.c'), vhost_user)
 util_ss.add(files('vfio-helpers.c'))
 util_ss.add(files('chardev_open.c'))
   endif
+  if host_os != 'windows'
+util_ss.add(files('vhost-user-server.c'), vhost_user)
+  endif
   util_ss.add(files('yank.c'))
 endif
 
-- 
2.45.2

[PATCH v8 07/13] contrib/vhost-user-*: use QEMU bswap helper functions

2024-06-18 Thread Stefano Garzarella

Let's replace the calls to le*toh() and htole*() with qemu/bswap.h
helpers to make the code more portable.

Suggested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 contrib/vhost-user-blk/vhost-user-blk.c |  9 +
 contrib/vhost-user-input/main.c | 16 
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..9492146855 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "libvhost-user-glib.h"
 
@@ -194,8 +195,8 @@ vub_discard_write_zeroes(VubReq *req, struct iovec *iov, uint32_t iovcnt,
 #if defined(__linux__) && defined(BLKDISCARD) && defined(BLKZEROOUT)
 VubDev *vdev_blk = req->vdev_blk;
 desc = buf;
-uint64_t range[2] = { le64toh(desc->sector) << 9,
-  le32toh(desc->num_sectors) << 9 };
+uint64_t range[2] = { le64_to_cpu(desc->sector) << 9,
+  le32_to_cpu(desc->num_sectors) << 9 };
 if (type == VIRTIO_BLK_T_DISCARD) {
 if (ioctl(vdev_blk->blk_fd, BLKDISCARD, range) == 0) {
 g_free(buf);
@@ -267,13 +268,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
 req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base;
 in_num--;
 
-type = le32toh(req->out->type);
+type = le32_to_cpu(req->out->type);
 switch (type & ~VIRTIO_BLK_T_BARRIER) {
 case VIRTIO_BLK_T_IN:
 case VIRTIO_BLK_T_OUT: {
 ssize_t ret = 0;
 bool is_write = type & VIRTIO_BLK_T_OUT;
-req->sector_num = le64toh(req->out->sector);
+req->sector_num = le64_to_cpu(req->out->sector);
 if (is_write) {
 ret  = vub_writev(req, &elem->out_sg[1], out_num);
 } else {
diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c
index 081230da54..f3362d41ac 100644
--- a/contrib/vhost-user-input/main.c
+++ b/contrib/vhost-user-input/main.c
@@ -51,8 +51,8 @@ static void vi_input_send(VuInput *vi, struct virtio_input_event *event)
 vi->queue[vi->qindex++].event = *event;
 
 /* ... until we see a report sync ... */
-if (event->type != htole16(EV_SYN) ||
-event->code != htole16(SYN_REPORT)) {
+if (event->type != cpu_to_le16(EV_SYN) ||
+event->code != cpu_to_le16(SYN_REPORT)) {
 return;
 }
 
@@ -103,9 +103,9 @@ vi_evdev_watch(VuDev *dev, int condition, void *data)
 
 g_debug("input %d %d %d", evdev.type, evdev.code, evdev.value);
 
-virtio.type  = htole16(evdev.type);
-virtio.code  = htole16(evdev.code);
-virtio.value = htole32(evdev.value);
+virtio.type  = cpu_to_le16(evdev.type);
+virtio.code  = cpu_to_le16(evdev.code);
+virtio.value = cpu_to_le32(evdev.value);
 vi_input_send(vi, &virtio);
 }
 }
@@ -124,9 +124,9 @@ static void vi_handle_status(VuInput *vi, virtio_input_event *event)
 
 evdev.input_event_sec = tval.tv_sec;
 evdev.input_event_usec = tval.tv_usec;
-evdev.type = le16toh(event->type);
-evdev.code = le16toh(event->code);
-evdev.value = le32toh(event->value);
+evdev.type = le16_to_cpu(event->type);
+evdev.code = le16_to_cpu(event->code);
+evdev.value = le32_to_cpu(event->value);
 
 rc = write(vi->evdevfd, &evdev, sizeof(evdev));
 if (rc == -1) {
-- 
2.45.2

[PATCH v8 06/13] contrib/vhost-user-blk: fix bind() using the right size of the address

2024-06-18 Thread Stefano Garzarella

On macOS, when passing the `-s /tmp/vhost.socket` parameter to the
vhost-user-blk application, the bind was done on the `/tmp/vhost.socke`
pathname, missing the last character.

This sounds like one of the portability problems described in the
unix(7) manpage:

   Pathname sockets
   When binding a socket to a pathname, a few rules should
   be observed for maximum portability and ease of coding:

   •  The pathname in sun_path should be null-terminated.

   •  The length of the pathname, including the terminating
      null byte, should not exceed the size of sun_path.

   •  The addrlen argument that describes the enclosing
      sockaddr_un structure should have a value of at least:

          offsetof(struct sockaddr_un, sun_path) +
          strlen(addr.sun_path)+1

      or, more simply, addrlen can be specified as
      sizeof(struct sockaddr_un).

So let's follow the last advice and simplify the code as well.
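
As a standalone illustration of that advice, a minimal sketch could look like the following. The helper name `unix_bound_socket()` is hypothetical and this is not the code from vhost-user-blk.c; it only shows the address being zeroed, the path copied with its terminating null byte, and sizeof(un) passed as addrlen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a UNIX stream socket bound to 'path'. */
static int unix_bound_socket(const char *path)
{
    struct sockaddr_un un;
    int sock;

    /* The path plus its terminating null byte must fit in sun_path. */
    if (strlen(path) >= sizeof(un.sun_path)) {
        return -1;
    }

    sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock < 0) {
        return -1;
    }

    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);

    /* Remove a stale socket file, then bind using the whole structure. */
    (void)unlink(path);
    if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```

Passing sizeof(un) avoids computing the length by hand, which is where the missing character came from.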

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 contrib/vhost-user-blk/vhost-user-blk.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 89e5f11a64..a8ab9269a2 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -469,7 +469,6 @@ static int unix_sock_new(char *unix_fn)
 {
 int sock;
 struct sockaddr_un un;
-size_t len;
 
 assert(unix_fn);
 
@@ -481,10 +480,9 @@ static int unix_sock_new(char *unix_fn)
 
 un.sun_family = AF_UNIX;
 (void)snprintf(un.sun_path, sizeof(un.sun_path), "%s", unix_fn);
-len = sizeof(un.sun_family) + strlen(un.sun_path);
 
 (void)unlink(unix_fn);
-if (bind(sock, (struct sockaddr *)&un, len) < 0) {
+if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) {
 perror("bind");
 goto fail;
 }
-- 
2.45.2

[PATCH v8 08/13] vhost-user: enable frontends on any POSIX system

2024-06-18 Thread Stefano Garzarella

The vhost-user protocol is not really Linux-specific so let's enable
vhost-user frontends for any POSIX system.

In vhost_net.c we use VHOST_FILE_UNBIND which is defined in a Linux
specific header, let's define it for other systems as well.

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 meson.build| 1 -
 hw/net/vhost_net.c | 5 +
 hw/block/Kconfig   | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 97e00d6f59..2ba95a8c35 100644
--- a/meson.build
+++ b/meson.build
@@ -151,7 +151,6 @@ have_tpm = get_option('tpm') \
 
 # vhost
 have_vhost_user = get_option('vhost_user') \
-  .disable_auto_if(host_os != 'linux') \
   .require(host_os != 'windows',
error_message: 'vhost-user is not available on Windows').allowed()
 have_vhost_vdpa = get_option('vhost_vdpa') \
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..fced429813 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -34,8 +34,13 @@
 #include "standard-headers/linux/virtio_ring.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/virtio-bus.h"
+#if defined(__linux__)
 #include "linux-headers/linux/vhost.h"
+#endif
 
+#ifndef VHOST_FILE_UNBIND
+#define VHOST_FILE_UNBIND -1
+#endif
 
 /* Features supported by host kernel. */
 static const int kernel_feature_bits[] = {
diff --git a/hw/block/Kconfig b/hw/block/Kconfig
index 9e8f28f982..29ee09e434 100644
--- a/hw/block/Kconfig
+++ b/hw/block/Kconfig
@@ -40,7 +40,7 @@ config VHOST_USER_BLK
 bool
 # Only PCI devices are provided for now
 default y if VIRTIO_PCI
-depends on VIRTIO && VHOST_USER && LINUX
+depends on VIRTIO && VHOST_USER
 
 config SWIM
 bool
-- 
2.45.2

[PATCH v8 05/13] vhost-user-server: do not set memory fd non-blocking

2024-06-18 Thread Stefano Garzarella

In vhost-user-server we set all fds received from the other peer
to non-blocking mode. For some of them (e.g. memfd, shm_open, etc.)
this is not really needed, because we don't use these fds for blocking
operations, but only to map memory.

In addition, on some systems this operation can fail (e.g. on macOS,
setting an fd returned by shm_open() non-blocking fails with errno
= ENOTTY).

So, let's avoid setting fds non-blocking for those messages that we
know carry memory fds (e.g. VHOST_USER_ADD_MEM_REG,
VHOST_USER_SET_MEM_TABLE).

Reviewed-by: Daniel P. Berrangé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
v3:
- avoiding setting fd non-blocking for messages where we have memory fd
  (Eric)
---
 util/vhost-user-server.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 3bfb1ad3ec..b19229074a 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -65,6 +65,18 @@ static void vmsg_close_fds(VhostUserMsg *vmsg)
 static void vmsg_unblock_fds(VhostUserMsg *vmsg)
 {
 int i;
+
+/*
+ * These messages carry fd used to map memory, not to send/receive messages,
+ * so this operation is useless. In addition, in some systems this
+ * operation can fail (e.g. in macOS setting an fd returned by shm_open()
+ * non-blocking fails with errno = ENOTTY)
+ */
+if (vmsg->request == VHOST_USER_ADD_MEM_REG ||
+vmsg->request == VHOST_USER_SET_MEM_TABLE) {
+return;
+}
+
 for (i = 0; i < vmsg->fd_num; i++) {
 qemu_socket_set_nonblock(vmsg->fds[i]);
 }
-- 
2.45.2

[PATCH v8 03/13] libvhost-user: fail vu_message_write() if sendmsg() is failing

2024-06-18 Thread Stefano Garzarella

In vu_message_write() we use sendmsg() to send the message header,
then a write() to send the payload.

If sendmsg() fails we should avoid sending the payload, since we
were unable to send the header.

Discovered before fixing the issue with the previous patch, where
sendmsg() failed on macOS due to wrong parameters, but the frontend
still sent the payload which the backend incorrectly interpreted
as a wrong header.
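
The intended behaviour can be sketched outside libvhost-user as follows. The function `send_header_then_payload()` is a hypothetical stand-in for the message writer, with plain buffers instead of VhostUserMsg; it only illustrates returning early when the header could not be sent.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical sketch: send a header with sendmsg(), then the payload. */
static bool send_header_then_payload(int fd, const void *hdr, size_t hdr_len,
                                     const void *payload, size_t payload_len)
{
    struct iovec iov = { .iov_base = (void *)hdr, .iov_len = hdr_len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t rc;

    do {
        rc = sendmsg(fd, &msg, 0);
    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));

    if (rc <= 0) {
        /* Header not sent: a payload alone would be read as a bogus header. */
        return false;
    }

    if (payload_len) {
        do {
            rc = write(fd, payload, payload_len);
        } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
        if (rc <= 0) {
            return false;
        }
    }
    return true;
}
```

With an invalid descriptor the function returns false without attempting the second write, so the peer never sees a payload without its header.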

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 22bea0c775..a11afd1960 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -639,6 +639,11 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
         rc = sendmsg(conn_fd, &msg, 0);
     } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
 
+    if (rc <= 0) {
+        vu_panic(dev, "Error while writing: %s", strerror(errno));
+        return false;
+    }
+
     if (vmsg->size) {
         do {
             if (vmsg->data) {
-- 
2.45.2
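
The header-then-payload flow described in this patch can be sketched in
isolation roughly as follows. This is only an illustration under
assumptions: demo_send_msg() is a hypothetical helper, not part of
libvhost-user, and it reports failure through its return value instead
of calling vu_panic().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Send a header with sendmsg(), then the payload with write().
 * The payload is sent only if the header went out successfully.
 */
static bool demo_send_msg(int fd, const void *hdr, size_t hdr_len,
                          const void *payload, size_t payload_len)
{
    struct iovec iov = { .iov_base = (void *)hdr, .iov_len = hdr_len };
    struct msghdr msg;
    ssize_t rc;

    assert(hdr != NULL);

    /* No ancillary data: msg_control stays NULL, msg_controllen stays 0. */
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    do {
        rc = sendmsg(fd, &msg, 0);
    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));

    if (rc <= 0) {
        /* Header not sent: do not confuse the peer with a bare payload. */
        return false;
    }

    if (payload_len) {
        do {
            rc = write(fd, payload, payload_len);
        } while (rc < 0 && (errno == EINTR || errno == EAGAIN));

        if (rc <= 0) {
            return false;
        }
    }
    return true;
}
```

On a socketpair() the peer should read the header bytes followed by the
payload bytes, and a call on an invalid descriptor should return false
without writing anything.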

[PATCH v8 04/13] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported

2024-06-18 Thread Stefano Garzarella

libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD
message if MFD_ALLOW_SEALING is not defined, since it's not able
to create a memfd.

VHOST_USER_GET_INFLIGHT_FD is used only if
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask
that feature if the backend is not able to properly handle these
messages.

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a11afd1960..2c20cdc16e 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1674,6 +1674,17 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
         features |= dev->iface->get_protocol_features(dev);
     }
 
+#ifndef MFD_ALLOW_SEALING
+    /*
+     * If MFD_ALLOW_SEALING is not defined, we are not able to handle
+     * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd.
+     * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+     * is negotiated. A device implementation can enable it, so let's mask
+     * it to avoid a runtime panic.
+     */
+    features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+#endif
+
     vmsg_set_reply_u64(vmsg, features);
     return true;
 }
-- 
2.45.2
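
As a small worked example of the masking step, the sketch below clears
one bit from a 64-bit feature word. The names are hypothetical, the bit
number 12 is an assumption used only for illustration, and a runtime
flag replaces the compile-time #ifndef so that both cases can be
exercised in one build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit number, for illustration only. */
#define DEMO_PROTOCOL_F_INFLIGHT_SHMFD 12

/* Clear a single feature bit from a 64-bit feature word. */
static uint64_t demo_clear_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return features & ~(1ULL << bit);
}

/* Hide the inflight feature when the backend cannot create a memfd. */
static uint64_t demo_mask_features(uint64_t features, bool have_memfd)
{
    if (!have_memfd) {
        features = demo_clear_feature(features,
                                      DEMO_PROTOCOL_F_INFLIGHT_SHMFD);
    }
    return features;
}
```

With the bit masked in the reply, the frontend never negotiates the
feature, so it has no reason to send the message that the backend
cannot handle.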

[PATCH v8 00/13] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-06-18 Thread Stefano Garzarella

As discussed with Michael and Markus [1], this version also includes the patch
on which v7 depended to simplify the merge in Michael's tree.

The series is all reviewed, so if there are no new changes required, I would
ask to merge it.

[1] https://patchew.org/QEMU/20240612130140.63004-1-sgarz...@redhat.com/#vabzv4z6g3dd5yndvpmwktcfgbqrdg7qk2e5se6zuflrhss723@dws4vrzen6cs

Thanks,
Stefano

Changelog

v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
v3: https://patchew.org/QEMU/20240404122330.92710-1-sgarz...@redhat.com/
v4: https://patchew.org/QEMU/20240508074457.12367-1-sgarz...@redhat.com/
v5: https://patchew.org/QEMU/20240523145522.313012-1-sgarz...@redhat.com/
v6: https://patchew.org/QEMU/20240528103543.145412-1-sgarz...@redhat.com/
v7: https://patchew.org/QEMU/20240612130140.63004-1-sgarz...@redhat.com/
v8:
- Included the dependent patch in this series
  https://patchew.org/QEMU/20240611130231.83152-1-sgarz...@redhat.com/
- Fixed QAPI documentation about share option [Markus]

Description

The vhost-user protocol is not really Linux-specific, so let's try to support
QEMU's frontends and backends (including libvhost-user) in any POSIX system
with this series. The main use case is to be able to use virtio devices that
we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
in non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS or
FreeBSD that could go even independently of this series.

Patches 6, 7, 8, 9 enable building of frontends and backends (including
libvhost-user) with associated code changes to succeed in compilation.

Patch 10 adds `memory-backend-shm` that uses the POSIX shm_open() API to
create shared memory which is identified by an fd that can be shared with
vhost-user backends. This is useful on those systems (like macOS) where
we don't have memfd_create() or special filesystems like "/dev/shm".
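
To make the shm_open() approach concrete, here is a rough standalone
sketch of creating "practically anonymous" shared memory: the object is
created under a temporary name that is unlinked right away, so only the
fd remains. This is not the code from patch 10; demo_anon_shm() and the
name pattern are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a shared memory object of the given size and return its fd,
 * or -1 on failure. The name exists only between shm_open() and
 * shm_unlink().
 */
static int demo_anon_shm(size_t size)
{
    char name[32];
    int fd;

    assert(size > 0);

    /* Hypothetical name pattern, kept short for portability. */
    snprintf(name, sizeof(name), "/demo-shm-%ld", (long)getpid());

    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        return -1;
    }

    /* Drop the name: from now on the memory is reachable only via fd. */
    shm_unlink(name);

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned fd can be mapped with mmap(..., MAP_SHARED, fd, 0) and
passed to another process over a UNIX socket. On older glibc versions
linking may need -lrt.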

Patches 11 and 12 use `memory-backend-shm` in some vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered those
problems after testing patches 6 - 9, so I have included them in this series
for now. Please let me know if you prefer that I send them separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 40 (x86_64)
in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

  vhost-user-blk -s /tmp/vhost.socket \
    -b Fedora-Cloud-Base-39-1.5.x86_64.raw

  qemu-storage-daemon \
    --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
    --blockdev qcow2,file=file,node-name=qcow2 \
    --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

  qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
    -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
    -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
    -device ramfb -device usb-ehci -device usb-kbd \
    -object memory-backend-shm,id=mem,size=512M \
    -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

  qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
    -object memory-backend-shm,id=mem,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

  qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
    -object memory-backend-shm,id=mem,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at 
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Stefano Garzarella (13):
  qapi: clarify that the default is backend dependent
  libvhost-user: set msg.msg_control to NULL when it is empty
  libvhost-user: fail vu_message_write() if sendmsg() is failing
  libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
  vhost-user-server: do not set memory fd non-blocking
  contrib/vhost-user-blk: fix bind() using the right size of the address
  contrib/vhost-user-*: use QEMU bswap helper functions
  vhost-user: enable frontends on any POSIX system
  libvhost-user: enable it on any POSIX system
  contrib/vhost-user-blk: enable it on any POSIX system
  hostmem: add a new memory backend based on POSIX shm_open()
  tests/qtest/vhost-user-blk-test: use memory-backend-shm
  tests/qtest/vhost-user-test: add a test case for memory-backend-shm

 docs/system/devices/vhost-user.rst|   5 +-
 m

[PATCH v8 01/13] qapi: clarify that the default is backend dependent

2024-06-18 Thread Stefano Garzarella

The default value of the @share option of the @MemoryBackendProperties
really depends on the backend type, so let's document the default
values in the same place where we define the option to avoid
dispersing the information.

Cc: David Hildenbrand 
Suggested-by: Markus Armbruster 
Reviewed-by: Markus Armbruster 
Signed-off-by: Stefano Garzarella 
---
v2: https://patchew.org/QEMU/20240611130231.83152-1-sgarz...@redhat.com/
v1: https://patchew.org/QEMU/20240523133302.103858-1-sgarz...@redhat.com/
---
 qapi/qom.json | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 8bd299265e..9b8f6a7ab5 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -600,7 +600,9 @@
 # preallocation threads (default: none) (since 7.2)
 #
 # @share: if false, the memory is private to QEMU; if true, it is
-# shared (default: false)
+# shared (default false for backends memory-backend-file and
+# memory-backend-ram, true for backends memory-backend-epc and
+# memory-backend-memfd)
 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
@@ -700,8 +702,6 @@
 #
 # Properties for memory-backend-memfd objects.
 #
-# The @share boolean option is true by default with memfd.
-#
 # @hugetlb: if true, the file to be created resides in the hugetlbfs
 # filesystem (default: false)
 #
@@ -726,8 +726,6 @@
 #
 # Properties for memory-backend-epc objects.
 #
-# The @share boolean option is true by default with epc
-#
 # The @merge boolean option is false by default with epc
 #
 # The @dump boolean option is false by default with epc
-- 
2.45.2

[PATCH v8 02/13] libvhost-user: set msg.msg_control to NULL when it is empty

2024-06-18 Thread Stefano Garzarella

On some OS (e.g. macOS) sendmsg() returns -1 (errno EINVAL) if
the `struct msghdr` has the field `msg_controllen` set to 0, but
`msg_control` is not NULL.

Reviewed-by: Eric Blake 
Reviewed-by: David Hildenbrand 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a879149fef..22bea0c775 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -632,6 +632,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
         memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize);
     } else {
         msg.msg_controllen = 0;
+        msg.msg_control = NULL;
     }
 
     do {
-- 
2.45.2
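
A self-contained sketch of the rule this patch enforces (keep
msg_control and msg_controllen consistent, both set or both empty)
could look like the following. demo_send_with_fds() and DEMO_MAX_FDS
are hypothetical names for this example and do not exist in
libvhost-user.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define DEMO_MAX_FDS 8

/* Send len bytes from buf, optionally passing fd_num fds via SCM_RIGHTS. */
static ssize_t demo_send_with_fds(int sock, const void *buf, size_t len,
                                  const int *fds, size_t fd_num)
{
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(DEMO_MAX_FDS * sizeof(int))];
        struct cmsghdr align;
    } control;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;

    if (fd_num > DEMO_MAX_FDS) {
        errno = EINVAL;
        return -1;
    }
    assert(fd_num == 0 || fds != NULL);

    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    if (fd_num > 0) {
        size_t fdsize = fd_num * sizeof(int);
        struct cmsghdr *cmsg;

        msg.msg_control = control.buf;
        msg.msg_controllen = CMSG_SPACE(fdsize);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_len = CMSG_LEN(fdsize);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        memcpy(CMSG_DATA(cmsg), fds, fdsize);
    } else {
        /* Nothing to pass: pointer and length must both be empty. */
        msg.msg_control = NULL;
        msg.msg_controllen = 0;
    }

    return sendmsg(sock, &msg, 0);
}
```

The else branch is the part that matters here: a non-NULL msg_control
with a zero msg_controllen is what made sendmsg() fail with EINVAL on
macOS according to the commit message above.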

Re: [PATCH RESEND v7 00/12] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-06-17 Thread Stefano Garzarella


On Mon, Jun 17, 2024 at 09:42:21AM GMT, Michael S. Tsirkin wrote:

On Mon, Jun 17, 2024 at 02:59:14PM +0200, Stefano Garzarella wrote:

On Mon, Jun 17, 2024 at 02:02:30PM GMT, Markus Armbruster wrote:
> Stefano Garzarella  writes:
>
> > Hi Michael,
> >
> > On Wed, Jun 12, 2024 at 03:01:28PM GMT, Stefano Garzarella wrote:
> > > This series should be in a good shape, in which tree should we queue it?
> > > @Michael would your tree be okay?
> >
> > Markus suggested a small change to patch 10, so do you want me to
> > resend the whole series, or is it okay to resend just the last 3
> > patches (which are also the ones that depend on the other patch
> > queued by Markus)?
>
> I guess you mean
>
>[PATCH v2] qapi: clarify that the default is backend dependent
>Message-ID: <20240611130231.83152-1-sgarz...@redhat.com>

Yep!

>
> > In the last case I would ask you to queue up the first 9 patches of
> > this series if that is okay with you.
>
> Michael, feel free to merge the patch I queued.
>

I can also include it in v8 if it helps.

Thanks,
Stefano



If I'm to merge it, pls do.
Much less error prone.


Okay, I'll include it in v8.
I'll wait until tomorrow to see if there's any objection on the tree, 
but I think yours is the most suitable.


Thanks,
Stefano

Re: [PATCH RESEND v7 00/12] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-06-17 Thread Stefano Garzarella


On Mon, Jun 17, 2024 at 02:02:30PM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


Hi Michael,

On Wed, Jun 12, 2024 at 03:01:28PM GMT, Stefano Garzarella wrote:

This series should be in a good shape, in which tree should we queue it?
@Michael would your tree be okay?


Markus suggested a small change to patch 10, so do you want me to resend the 
whole series, or is it okay to resend just the last 3 patches (which are also 
the ones that depend on the other patch queued by Markus)?


I guess you mean

   [PATCH v2] qapi: clarify that the default is backend dependent
   Message-ID: <20240611130231.83152-1-sgarz...@redhat.com>


Yep!




In the last case I would ask you to queue up the first 9 patches of this series 
if that is okay with you.


Michael, feel free to merge the patch I queued.



I can also include it in v8 if it helps.

Thanks,
Stefano

Re: [PATCH RESEND v7 00/12] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-06-17 Thread Stefano Garzarella


Hi Michael,

On Wed, Jun 12, 2024 at 03:01:28PM GMT, Stefano Garzarella wrote:

This series should be in a good shape, in which tree should we queue it?
@Michael would your tree be okay?


Markus suggested a small change to patch 10, so do you want me to resend 
the whole series, or is it okay to resend just the last 3 patches (which 
are also the ones that depend on the other patch queued by Markus)?


In the last case I would ask you to queue up the first 9 patches of this 
series if that is okay with you.


Thanks,
Stefano



Thanks,
Stefano

Changelog

v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
v3: https://patchew.org/QEMU/20240404122330.92710-1-sgarz...@redhat.com/
v4: https://patchew.org/QEMU/20240508074457.12367-1-sgarz...@redhat.com/
v5: https://patchew.org/QEMU/20240523145522.313012-1-sgarz...@redhat.com/
v6: https://patchew.org/QEMU/20240528103543.145412-1-sgarz...@redhat.com/
v7:
- rebased on 
https://patchew.org/QEMU/20240611130231.83152-1-sgarz...@redhat.com/
 That patch is queued by Markus and only Patch 10 of this series depends on it.
- changed default value documentation for @share [Markus]
- used `memory-backend-shm` instead of `shm` and wrapped to 70 columns
 [Markus]
- added 'if': 'CONFIG_POSIX' to MemoryBackendShmProperties [Markus]

Description

The vhost-user protocol is not really Linux-specific, so let's try to support
QEMU's frontends and backends (including libvhost-user) in any POSIX system
with this series. The main use case is to be able to use virtio devices that
we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
in non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS or
FreeBSD that could go even independently of this series.

Patches 6, 7, 8, 9 enable building of frontends and backends (including
libvhost-user) with associated code changes to succeed in compilation.

Patch 10 adds `memory-backend-shm` that uses the POSIX shm_open() API to
create shared memory which is identified by an fd that can be shared with
vhost-user backends. This is useful on those systems (like macOS) where
we don't have memfd_create() or special filesystems like "/dev/shm".

Patches 11 and 12 use `memory-backend-shm` in some vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered those
problems after testing patches 6 - 9, so I have included them in this series
for now. Please let me know if you prefer that I send them separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 40 (x86_64)
in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

 vhost-user-blk -s /tmp/vhost.socket \
   -b Fedora-Cloud-Base-39-1.5.x86_64.raw

 qemu-storage-daemon \
   --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
   --blockdev qcow2,file=file,node-name=qcow2 \
   --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

 qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
   -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
   -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
   -device ramfb -device usb-ehci -device usb-kbd \
   -object memory-backend-shm,id=mem,size=512M \
   -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
   -chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

 qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
   -object memory-backend-shm,id=mem,size="512M" \
   -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
   -chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

 qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
   -object memory-backend-shm,id=mem,size="512M" \
   -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
   -chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at 
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Based-on: 20240611130231.83152-1-sgarz...@redhat.com

Stefano Garzarella (12):
 libvhost-user: set msg.msg_control to NULL when it is empty
 libvhost-user: fail vu_message_write() if sendmsg() is failing
 libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
 vhost-user-server: do not set memory fd non-blocking
 contrib/vhost-user-blk: fix bind() using the right size of the address
 contrib/vhost-user-*: use QEMU bswap helper functions
 vhost-user: enable frontends on any POSIX system
 libvhost-user: enable it on any POSIX system
 contrib/vhost-user-blk: enable it

Re: [PATCH RESEND v7 10/12] hostmem: add a new memory backend based on POSIX shm_open()

2024-06-13 Thread Stefano Garzarella


On Wed, Jun 12, 2024 at 03:20:48PM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating a memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Acked-by: David Hildenbrand 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 


[...]


diff --git a/qapi/qom.json b/qapi/qom.json
index 9b8f6a7ab5..94e4458288 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -601,8 +601,8 @@
 #
 # @share: if false, the memory is private to QEMU; if true, it is
 # shared (default false for backends memory-backend-file and
-# memory-backend-ram, true for backends memory-backend-epc and
-# memory-backend-memfd)
+# memory-backend-ram, true for backends memory-backend-epc,
+# memory-backend-memfd, and memory-backend-shm)
 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
@@ -721,6 +721,22 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }

+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# Setting @share boolean option (defined in the base type) to false
+# will cause a failure during allocation because it is not
+# supported by this backend.


This is QMP reference documentation.  "Failure during allocation" feels
like unnecessary detail there.  Maybe "This memory backend supports only
shared memory, which is the default."


I'll fix in v8!




+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { },
+  'if': 'CONFIG_POSIX' }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -1049,6 +1065,8 @@
 { 'name': 'memory-backend-memfd',
   'if': 'CONFIG_LINUX' },
 'memory-backend-ram',
+{ 'name': 'memory-backend-shm',
+  'if': 'CONFIG_POSIX' },
 'pef-guest',
 { 'name': 'pr-manager-helper',
   'if': 'CONFIG_LINUX' },
@@ -1121,6 +1139,8 @@
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'CONFIG_LINUX' },
   'memory-backend-ram': 'MemoryBackendProperties',
+  'memory-backend-shm': { 'type': 'MemoryBackendShmProperties',
+  'if': 'CONFIG_POSIX' },
   'pr-manager-helper':  { 'type': 'PrManagerHelperProperties',
   'if': 'CONFIG_LINUX' },
   'qtest':  'QtestProperties',


[...]

Other than that, QAPI schema
Acked-by: Markus Armbruster 



Thanks for the review!
Stefano

[PATCH RESEND v7 00/12] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-06-12 Thread Stefano Garzarella

This series should be in a good shape, in which tree should we queue it?
@Michael would your tree be okay?

Thanks,
Stefano

Changelog

v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
v3: https://patchew.org/QEMU/20240404122330.92710-1-sgarz...@redhat.com/
v4: https://patchew.org/QEMU/20240508074457.12367-1-sgarz...@redhat.com/
v5: https://patchew.org/QEMU/20240523145522.313012-1-sgarz...@redhat.com/
v6: https://patchew.org/QEMU/20240528103543.145412-1-sgarz...@redhat.com/
v7:
- rebased on 
https://patchew.org/QEMU/20240611130231.83152-1-sgarz...@redhat.com/
  That patch is queued by Markus and only Patch 10 of this series depends on it.
- changed default value documentation for @share [Markus]
- used `memory-backend-shm` instead of `shm` and wrapped to 70 columns
  [Markus]
- added 'if': 'CONFIG_POSIX' to MemoryBackendShmProperties [Markus]

Description

The vhost-user protocol is not really Linux-specific, so let's try to support
QEMU's frontends and backends (including libvhost-user) in any POSIX system
with this series. The main use case is to be able to use virtio devices that
we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
in non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS or
FreeBSD that could go even independently of this series.

Patches 6, 7, 8, 9 enable building of frontends and backends (including
libvhost-user) with associated code changes to succeed in compilation.

Patch 10 adds `memory-backend-shm` that uses the POSIX shm_open() API to
create shared memory which is identified by an fd that can be shared with
vhost-user backends. This is useful on those systems (like macOS) where
we don't have memfd_create() or special filesystems like "/dev/shm".

Patches 11 and 12 use `memory-backend-shm` in some vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered those
problems after testing patches 6 - 9, so I have included them in this series
for now. Please let me know if you prefer that I send them separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 40 (x86_64)
in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

  vhost-user-blk -s /tmp/vhost.socket \
    -b Fedora-Cloud-Base-39-1.5.x86_64.raw

  qemu-storage-daemon \
    --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
    --blockdev qcow2,file=file,node-name=qcow2 \
    --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

  qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
    -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
    -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
    -device ramfb -device usb-ehci -device usb-kbd \
    -object memory-backend-shm,id=mem,size=512M \
    -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

  qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
    -object memory-backend-shm,id=mem,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

  qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
    -object memory-backend-shm,id=mem,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at 
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Based-on: 20240611130231.83152-1-sgarz...@redhat.com

Stefano Garzarella (12):
  libvhost-user: set msg.msg_control to NULL when it is empty
  libvhost-user: fail vu_message_write() if sendmsg() is failing
  libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
  vhost-user-server: do not set memory fd non-blocking
  contrib/vhost-user-blk: fix bind() using the right size of the address
  contrib/vhost-user-*: use QEMU bswap helper functions
  vhost-user: enable frontends on any POSIX system
  libvhost-user: enable it on any POSIX system
  contrib/vhost-user-blk: enable it on any POSIX system
  hostmem: add a new memory backend based on POSIX shm_open()
  tests/qtest/vhost-user-blk-test: use memory-backend-shm
  tests/qtest/vhost-user-test: add a test case for memory-backend-shm

 docs/system/devices/vhost-user.rst|   5 +-
 meson.build   |   5 +-
 qapi/qom.json |  24 -
 subp

[PATCH RESEND v7 01/12] libvhost-user: set msg.msg_control to NULL when it is empty

2024-06-12 Thread Stefano Garzarella

On some OS (e.g. macOS) sendmsg() returns -1 (errno EINVAL) if
the `struct msghdr` has the field `msg_controllen` set to 0, but
`msg_control` is not NULL.

Reviewed-by: Eric Blake 
Reviewed-by: David Hildenbrand 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a879149fef..22bea0c775 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -632,6 +632,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
         memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize);
     } else {
         msg.msg_controllen = 0;
+        msg.msg_control = NULL;
     }
 
     do {
-- 
2.45.2

[PATCH RESEND v7 12/12] tests/qtest/vhost-user-test: add a test case for memory-backend-shm

2024-06-12 Thread Stefano Garzarella

`memory-backend-shm` can be used with vhost-user devices, so let's
add a new test case for it.

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefano Garzarella 
---
 tests/qtest/vhost-user-test.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d4e437265f..8c1d903b2a 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -44,6 +44,8 @@
                         "mem-path=%s,share=on -numa node,memdev=mem"
 #define QEMU_CMD_MEMFD  " -m %d -object memory-backend-memfd,id=mem,size=%dM," \
                         " -numa node,memdev=mem"
+#define QEMU_CMD_SHM    " -m %d -object memory-backend-shm,id=mem,size=%dM," \
+                        " -numa node,memdev=mem"
 #define QEMU_CMD_CHR    " -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce=on"
 
@@ -195,6 +197,7 @@ enum test_memfd {
     TEST_MEMFD_AUTO,
     TEST_MEMFD_YES,
     TEST_MEMFD_NO,
+    TEST_MEMFD_SHM,
 };
 
 static void append_vhost_net_opts(TestServer *s, GString *cmd_line,
@@ -228,6 +231,8 @@ static void append_mem_opts(TestServer *server, GString *cmd_line,
 
     if (memfd == TEST_MEMFD_YES) {
         g_string_append_printf(cmd_line, QEMU_CMD_MEMFD, size, size);
+    } else if (memfd == TEST_MEMFD_SHM) {
+        g_string_append_printf(cmd_line, QEMU_CMD_SHM, size, size);
     } else {
         const char *root = init_hugepagefs() ? : server->tmpfs;
 
@@ -788,6 +793,19 @@ static void *vhost_user_test_setup_memfd(GString *cmd_line, void *arg)
     return server;
 }
 
+static void *vhost_user_test_setup_shm(GString *cmd_line, void *arg)
+{
+    TestServer *server = test_server_new("vhost-user-test", arg);
+    test_server_listen(server);
+
+    append_mem_opts(server, cmd_line, 256, TEST_MEMFD_SHM);
+    server->vu_ops->append_opts(server, cmd_line, "");
+
+    g_test_queue_destroy(vhost_user_test_cleanup, server);
+
+    return server;
+}
+
 static void test_read_guest_mem(void *obj, void *arg, QGuestAllocator *alloc)
 {
     TestServer *server = arg;
@@ -1081,6 +1099,11 @@ static void register_vhost_user_test(void)
                  "virtio-net",
                  test_read_guest_mem, &opts);
 
+    opts.before = vhost_user_test_setup_shm;
+    qos_add_test("vhost-user/read-guest-mem/shm",
+                 "virtio-net",
+                 test_read_guest_mem, &opts);
+
     if (qemu_memfd_check(MFD_ALLOW_SEALING)) {
         opts.before = vhost_user_test_setup_memfd;
         qos_add_test("vhost-user/read-guest-mem/memfd",
-- 
2.45.2

[PATCH RESEND v7 02/12] libvhost-user: fail vu_message_write() if sendmsg() is failing

2024-06-12 Thread Stefano Garzarella

In vu_message_write() we use sendmsg() to send the message header,
then a write() to send the payload.

If sendmsg() fails we should avoid sending the payload, since we
were unable to send the header.

Discovered before fixing the issue with the previous patch, where
sendmsg() failed on macOS due to wrong parameters, but the frontend
still sent the payload which the backend incorrectly interpreted
as a wrong header.

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 22bea0c775..a11afd1960 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -639,6 +639,11 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
         rc = sendmsg(conn_fd, &msg, 0);
     } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
 
+    if (rc <= 0) {
+        vu_panic(dev, "Error while writing: %s", strerror(errno));
+        return false;
+    }
+
     if (vmsg->size) {
         do {
             if (vmsg->data) {
-- 
2.45.2

[PATCH RESEND v7 10/12] hostmem: add a new memory backend based on POSIX shm_open()

2024-06-12 Thread Stefano Garzarella

shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating a memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Acked-by: David Hildenbrand 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v7
- changed default value documentation for @share rebasing on
  20240611130231.83152-1-sgarz...@redhat.com [Markus]
- used `memory-backend-shm` instead of `shm` and wrapped to 70 columns
  [Markus]
- added 'if': 'CONFIG_POSIX' to MemoryBackendShmProperties [Markus]
v5
- fixed documentation in qapi/qom.json and qemu-options.hx [Markus]
v4
- fail if we find "share=off" in shm_backend_memory_alloc() [David]
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  24 +-
 backends/hostmem-shm.c | 123 +
 backends/meson.build   |   1 +
 qemu-options.hx|  16 
 5 files changed, 165 insertions(+), 4 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 9b8f6a7ab5..94e4458288 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -601,8 +601,8 @@
 #
 # @share: if false, the memory is private to QEMU; if true, it is
 # shared (default false for backends memory-backend-file and
-# memory-backend-ram, true for backends memory-backend-epc and
-# memory-backend-memfd)
+# memory-backend-ram, true for backends memory-backend-epc,
+# memory-backend-memfd, and memory-backend-shm)
 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
@@ -721,6 +721,22 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# Setting @share boolean option (defined in the base type) to false
+# will cause a failure during allocation because it is not
+# supported by this backend.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { },
+  'if': 'CONFIG_POSIX' }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -1049,6 +1065,8 @@
 { 'name': 'memory-backend-memfd',
   'if': 'CONFIG_LINUX' },
 'memory-backend-ram',
+{ 'name': 'memory-backend-shm',
+  'if': 'CONFIG_POSIX' },
 'pef-guest',
 { 'name': 'pr-manager-helper',
   'if': 'CONFIG_LINUX' },
@@ -1121,6 +1139,8 @@
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'CONFIG_LINUX' },
   'memory-backend-ram': 'MemoryBackendProperties',
+  'memory-backend-shm': { 'type': 'MemoryBackendShmProperties',
+  'if': 'CONFIG_POSIX' },
   'pr-manager-helper':  { 'type': 'PrManagerHelperProperties',

[PATCH RESEND v7 07/12] vhost-user: enable frontends on any POSIX system

2024-06-12 Thread Stefano Garzarella

The vhost-user protocol is not really Linux-specific so let's enable
vhost-user frontends for any POSIX system.

In vhost_net.c we use VHOST_FILE_UNBIND which is defined in a Linux
specific header, let's define it for other systems as well.

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 meson.build| 1 -
 hw/net/vhost_net.c | 5 +
 hw/block/Kconfig   | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index ec59effca2..239044070f 100644
--- a/meson.build
+++ b/meson.build
@@ -151,7 +151,6 @@ have_tpm = get_option('tpm') \
 
 # vhost
 have_vhost_user = get_option('vhost_user') \
-  .disable_auto_if(host_os != 'linux') \
   .require(host_os != 'windows',
error_message: 'vhost-user is not available on Windows').allowed()
 have_vhost_vdpa = get_option('vhost_vdpa') \
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..fced429813 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -34,8 +34,13 @@
 #include "standard-headers/linux/virtio_ring.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/virtio-bus.h"
+#if defined(__linux__)
 #include "linux-headers/linux/vhost.h"
+#endif
 
+#ifndef VHOST_FILE_UNBIND
+#define VHOST_FILE_UNBIND -1
+#endif
 
 /* Features supported by host kernel. */
 static const int kernel_feature_bits[] = {
diff --git a/hw/block/Kconfig b/hw/block/Kconfig
index 9e8f28f982..29ee09e434 100644
--- a/hw/block/Kconfig
+++ b/hw/block/Kconfig
@@ -40,7 +40,7 @@ config VHOST_USER_BLK
 bool
 # Only PCI devices are provided for now
 default y if VIRTIO_PCI
-depends on VIRTIO && VHOST_USER && LINUX
+depends on VIRTIO && VHOST_USER
 
 config SWIM
 bool
-- 
2.45.2

[PATCH RESEND v7 09/12] contrib/vhost-user-blk: enable it on any POSIX system

2024-06-12 Thread Stefano Garzarella

Let's make the code more portable by adding defines from
block/file-posix.c to support O_DIRECT in other systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Acked-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefano Garzarella 
---
v6:
- reverted v5 changes since we can't move O_DSYNC and O_DIRECT in osdep
  [Daniel, failing tests on Windows]
v5:
- O_DSYNC and O_DIRECT definition are now in osdep [Phil]
- commit updated since we moved out all code changes
v4:
- moved using of "qemu/bswap.h" API in a separate patch [Phil]
---
 meson.build |  2 --
 contrib/vhost-user-blk/vhost-user-blk.c | 14 ++
 util/meson.build|  4 +++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 6413e858ea..8436e0c3d6 100644
--- a/meson.build
+++ b/meson.build
@@ -1985,8 +1985,6 @@ has_statx = cc.has_header_symbol('sys/stat.h', 'STATX_BASIC_STATS', prefix: gnu_
 has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: gnu_source_prefix)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
-  .require(host_os == 'linux',
-   error_message: 'vhost_user_blk_server requires linux') \
   .require(have_vhost_user,
error_message: 'vhost_user_blk_server requires vhost-user support') \
   .disable_auto_if(not have_tools and not have_system) \
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 9492146855..a450337685 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -25,6 +25,20 @@
 #include 
 #endif
 
+/* OS X does not have O_DSYNC */
+#ifndef O_DSYNC
+#ifdef O_SYNC
+#define O_DSYNC O_SYNC
+#elif defined(O_FSYNC)
+#define O_DSYNC O_FSYNC
+#endif
+#endif
+
+/* Approximate O_DIRECT with O_DSYNC if O_DIRECT isn't available */
+#ifndef O_DIRECT
+#define O_DIRECT O_DSYNC
+#endif
+
 enum {
 VHOST_USER_BLK_MAX_QUEUES = 8,
 };
diff --git a/util/meson.build b/util/meson.build
index 72b505df11..c414178ace 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -112,10 +112,12 @@ if have_block
 util_ss.add(files('filemonitor-stub.c'))
   endif
   if host_os == 'linux'
-util_ss.add(files('vhost-user-server.c'), vhost_user)
 util_ss.add(files('vfio-helpers.c'))
 util_ss.add(files('chardev_open.c'))
   endif
+  if host_os != 'windows'
+util_ss.add(files('vhost-user-server.c'), vhost_user)
+  endif
   util_ss.add(files('yank.c'))
 endif
 
-- 
2.45.2

[PATCH RESEND v7 05/12] contrib/vhost-user-blk: fix bind() using the right size of the address

2024-06-12 Thread Stefano Garzarella

On macOS, when passing the `-s /tmp/vhost.socket` parameter to the
vhost-user-blk application, the bind was done on the `/tmp/vhost.socke`
pathname, missing the last character.

This sounds like one of the portability problems described in the
unix(7) manpage:

   Pathname sockets
   When  binding  a socket to a pathname, a few rules should
   be observed for maximum portability and ease of coding:

   •  The pathname in sun_path should be null-terminated.

   •  The length of the pathname, including the  terminating
  null byte, should not exceed the size of sun_path.

   •  The  addrlen  argument  that  describes  the enclosing
  sockaddr_un structure should have a value of at least:

  offsetof(struct sockaddr_un, sun_path) +
  strlen(addr.sun_path)+1

  or,  more  simply,  addrlen  can   be   specified   as
  sizeof(struct sockaddr_un).

So let's follow the last advice and simplify the code as well.

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 contrib/vhost-user-blk/vhost-user-blk.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 89e5f11a64..a8ab9269a2 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -469,7 +469,6 @@ static int unix_sock_new(char *unix_fn)
 {
 int sock;
 struct sockaddr_un un;
-size_t len;
 
 assert(unix_fn);
 
@@ -481,10 +480,9 @@ static int unix_sock_new(char *unix_fn)
 
 un.sun_family = AF_UNIX;
 (void)snprintf(un.sun_path, sizeof(un.sun_path), "%s", unix_fn);
-len = sizeof(un.sun_family) + strlen(un.sun_path);
 
 (void)unlink(unix_fn);
-if (bind(sock, (struct sockaddr *)&un, len) < 0) {
+if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) {
 perror("bind");
 goto fail;
 }
-- 
2.45.2
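
The unix(7) advice followed by this patch can be sketched as a standalone helper. This is an illustration only: `unix_bind_sketch()` is a hypothetical name, and unlike unix_sock_new() in the patch it zeroes the structure first and does not call listen().

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind a UNIX stream socket to path, or return -1. */
static int unix_bind_sketch(const char *path)
{
    struct sockaddr_un un;
    int sock;

    sock = socket(AF_UNIX, SOCK_STREAM, 0);
    if (sock < 0) {
        return -1;
    }

    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    /* snprintf() always null-terminates sun_path, truncating if needed. */
    (void)snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);

    (void)unlink(path);
    /* Pass the size of the whole structure, not a hand-computed length. */
    if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) {
        close(sock);
        return -1;
    }

    return sock;
}
```

The old computation, sizeof(un.sun_family) + strlen(un.sun_path), presumably came up one byte short on macOS because its sockaddr_un also has a sun_len field in front of sun_family; passing sizeof(un) avoids depending on the structure layout at all.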

[PATCH RESEND v7 03/12] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported

2024-06-12 Thread Stefano Garzarella

libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD
message if MFD_ALLOW_SEALING is not defined, since it's not able
to create a memfd.

VHOST_USER_GET_INFLIGHT_FD is used only if
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask
that feature if the backend is not able to properly handle these
messages.

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a11afd1960..2c20cdc16e 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1674,6 +1674,17 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
 features |= dev->iface->get_protocol_features(dev);
 }
 
+#ifndef MFD_ALLOW_SEALING
+/*
+ * If MFD_ALLOW_SEALING is not defined, we are not able to handle
+ * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd.
+ * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+ * is negotiated. A device implementation can enable it, so let's mask
+ * it to avoid a runtime panic.
+ */
+features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+#endif
+
 vmsg_set_reply_u64(vmsg, features);
 return true;
 }
-- 
2.45.2
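
The masking done by this patch can be sketched outside libvhost-user. Assumptions for this example: the bit number 12 for VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is taken from the vhost-user specification as I recall it, the names are hypothetical, and the decision is made with a runtime flag here, whereas the patch decides at compile time with #ifndef MFD_ALLOW_SEALING.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit number of VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. */
#define SKETCH_PROTOCOL_F_INFLIGHT_SHMFD 12

/*
 * Hypothetical helper: clear the inflight-shmfd protocol feature when
 * the backend has no way to create the memfd that feature relies on.
 */
static uint64_t mask_inflight_sketch(uint64_t features, bool can_create_memfd)
{
    if (!can_create_memfd) {
        features &= ~(1ULL << SKETCH_PROTOCOL_F_INFLIGHT_SHMFD);
    }
    return features;
}
```

Since the feature is never offered, the frontend should not send VHOST_USER_GET_INFLIGHT_FD, and the panic path is not reached.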

Re: [PATCH v7 00/13] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-06-12 Thread Stefano Garzarella

Ooops, wrong cover letter. I just resent the whole series.

Sorry for the confusion.
Stefano

On Wed, Jun 12, 2024 at 2:59 PM Stefano Garzarella  wrote:
>
> This series should be in good shape; in which tree should we queue it?
> @Michael would your tree be okay?
>
> Thanks,
> Stefano
>
> Changelog
>
> v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
> v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
> v3: https://patchew.org/QEMU/20240404122330.92710-1-sgarz...@redhat.com/
> v4: https://patchew.org/QEMU/20240508074457.12367-1-sgarz...@redhat.com/
> v5: https://patchew.org/QEMU/20240523145522.313012-1-sgarz...@redhat.com/
> v6: https://patchew.org/QEMU/20240528103543.145412-1-sgarz...@redhat.com/
> v7:
> - rebased on https://patchew.org/QEMU/20240611130231.83152-1-sgarz...@redhat.com/
>   That patch is queued by Markus and only Patch 10 of this series depends on it.
> - changed default value documentation for @share [Markus]
> - used `memory-backend-shm` instead of `shm` and wrapped to 70 columns
>   [Markus]
> - added 'if': 'CONFIG_POSIX' to MemoryBackendShmProperties [Markus]
>
> Description
>
> The vhost-user protocol is not really Linux-specific, so let's try to support
> QEMU's frontends and backends (including libvhost-user) in any POSIX system
> with this series. The main use case is to be able to use virtio devices that
> we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
> in non-Linux systems.
>
> The first 5 patches are more like fixes discovered at runtime on macOS or
> FreeBSD that could go even independently of this series.
>
> Patches 6, 7, 8, 9 enable building of frontends and backends (including
> libvhost-user) with associated code changes to succeed in compilation.
>
> Patch 10 adds `memory-backend-shm` that uses the POSIX shm_open() API to
> create shared memory which is identified by an fd that can be shared with
> vhost-user backends. This is useful on those systems (like macOS) where
> we don't have memfd_create() or special filesystems like "/dev/shm".
>
> Patches 11 and 12 use `memory-backend-shm` in some vhost-user tests.
>
> Maybe the first 5 patches can go separately, but I only discovered those
> problems after testing patches 6 - 9, so I have included them in this series
> for now. Please let me know if you prefer that I send them separately.
>
> I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
> (aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 40 (x86_64)
> in this way:
>
> - Start vhost-user-blk or QSD (same commands for all systems)
>
>   vhost-user-blk -s /tmp/vhost.socket \
> -b Fedora-Cloud-Base-39-1.5.x86_64.raw
>
>   qemu-storage-daemon \
> --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
> --blockdev qcow2,file=file,node-name=qcow2 \
> --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on
>
> - macOS (aarch64): start QEMU (using hvf accelerator)
>
>   qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
> -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
> -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
> -device ramfb -device usb-ehci -device usb-kbd \
> -object memory-backend-shm,id=mem,size=512M \
> -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
> -chardev socket,id=char0,path=/tmp/vhost.socket
>
> - FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)
>
>   qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
> -object memory-backend-shm,id=mem,size="512M" \
> -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
> -chardev socket,id=char0,path=/tmp/vhost.socket
>
> - Fedora (x86_64): start QEMU (using kvm accelerator)
>
>   qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
> -object memory-backend-shm,id=mem,size="512M" \
> -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
> -chardev socket,id=char0,path=/tmp/vhost.socket
>
> Branch pushed (and CI started) at 
> https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads
>
> Based-on: 20240611130231.83152-1-sgarz...@redhat.com
>
> Stefano Garzarella (13):
>   qapi: clarify that the default is backend dependent
>   libvhost-user: set msg.msg_control to NULL when it is empty
>   libvhost-user: fail vu_message_write() if sendmsg() is failing
>   libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
>   vhost-user-serve

[PATCH RESEND v7 11/12] tests/qtest/vhost-user-blk-test: use memory-backend-shm

2024-06-12 Thread Stefano Garzarella

`memory-backend-memfd` is available only on Linux while the new
`memory-backend-shm` can be used on any POSIX-compliant operating
system. Let's use it so we can run the test in multiple environments.

Since we are here, let's remove `share=on` which is the default for shm
(and also for memfd).

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
v6
- removed `share=on` since it's the default [David]
---
 tests/qtest/vhost-user-blk-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
index 117b9acd10..ea90d41232 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -906,7 +906,7 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
vhost_user_blk_bin);
 
 g_string_append_printf(cmd_line,
-" -object memory-backend-memfd,id=mem,size=256M,share=on "
+" -object memory-backend-shm,id=mem,size=256M "
 " -M memory-backend=mem -m 256M ");
 
 for (i = 0; i < vus_instances; i++) {
-- 
2.45.2

[PATCH RESEND v7 08/12] libvhost-user: enable it on any POSIX system

2024-06-12 Thread Stefano Garzarella

The vhost-user protocol is not really Linux-specific so let's enable
libvhost-user for any POSIX system.

Compiling it on macOS and FreeBSD, some problems came up:
- avoid including linux/vhost.h, which is available only on Linux
  (vhost_types.h contains many of the things we need)
- macOS doesn't provide sys/endian.h, so let's define the byte-order
  helpers ourselves
  (note: libvhost-user doesn't include QEMU's headers, so we can't
   use "qemu/bswap.h")
- define eventfd_[write|read] as write/read wrappers when the system
  doesn't provide them (e.g. macOS)
- copy SEAL defines from include/qemu/memfd.h to make the code work
  on FreeBSD where MFD_ALLOW_SEALING is defined
- define MAP_NORESERVE if it's not defined (e.g. on FreeBSD)

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v5:
- fixed typos in the commit description [Phil]
---
 meson.build   |  2 +-
 subprojects/libvhost-user/libvhost-user.h |  2 +-
 subprojects/libvhost-user/libvhost-user.c | 60 +--
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index 239044070f..6413e858ea 100644
--- a/meson.build
+++ b/meson.build
@@ -3172,7 +3172,7 @@ if have_system and vfio_user_server_allowed
 endif
 
 vhost_user = not_found
-if host_os == 'linux' and have_vhost_user
+if have_vhost_user
   libvhost_user = subproject('libvhost-user')
   vhost_user = libvhost_user.get_variable('vhost_user_dep')
 endif
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index deb40e77b3..e13e1d3931 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -18,9 +18,9 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/vhost_types.h"
 
 /* Based on qemu/hw/virtio/vhost-user.c */
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 2c20cdc16e..57e58d4adb 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -28,9 +28,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 
 /* Necessary to provide VIRTIO_F_VERSION_1 on system
  * with older linux headers. Must appear before
@@ -39,8 +37,8 @@
 #include "standard-headers/linux/virtio_config.h"
 
 #if defined(__linux__)
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -52,6 +50,62 @@
 
 #endif
 
+#if defined(__APPLE__) && (__MACH__)
+#include 
+#define htobe16(x) OSSwapHostToBigInt16(x)
+#define htole16(x) OSSwapHostToLittleInt16(x)
+#define be16toh(x) OSSwapBigToHostInt16(x)
+#define le16toh(x) OSSwapLittleToHostInt16(x)
+
+#define htobe32(x) OSSwapHostToBigInt32(x)
+#define htole32(x) OSSwapHostToLittleInt32(x)
+#define be32toh(x) OSSwapBigToHostInt32(x)
+#define le32toh(x) OSSwapLittleToHostInt32(x)
+
+#define htobe64(x) OSSwapHostToBigInt64(x)
+#define htole64(x) OSSwapHostToLittleInt64(x)
+#define be64toh(x) OSSwapBigToHostInt64(x)
+#define le64toh(x) OSSwapLittleToHostInt64(x)
+#endif
+
+#ifdef CONFIG_EVENTFD
+#include 
+#else
+#define eventfd_t uint64_t
+
+int eventfd_write(int fd, eventfd_t value)
+{
+return (write(fd, &value, sizeof(value)) == sizeof(value)) ? 0 : -1;
+}
+
+int eventfd_read(int fd, eventfd_t *value)
+{
+return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1;
+}
+#endif
+
+#ifdef MFD_ALLOW_SEALING
+#include 
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001  /* prevent further seals from being set */
+#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004  /* prevent file from growing */
+#define F_SEAL_WRITE0x0008  /* prevent writes */
+#endif
+#endif
+
+#ifndef MAP_NORESERVE
+#define MAP_NORESERVE 0
+#endif
+
 #include "include/atomic.h"
 
 #include "libvhost-user.h"
-- 
2.45.2
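
The eventfd fallback added by this patch is a thin wrapper around write() and read() of one 64-bit value. A self-contained sketch follows; the `sketch_` prefixed names are hypothetical and chosen so they do not clash with <sys/eventfd.h> on systems that do provide it.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

typedef uint64_t sketch_eventfd_t;

/* Write exactly one 64-bit value; 0 on success, -1 otherwise. */
static int sketch_eventfd_write(int fd, sketch_eventfd_t value)
{
    return (write(fd, &value, sizeof(value)) == (ssize_t)sizeof(value))
           ? 0 : -1;
}

/* Read exactly one 64-bit value; 0 on success, -1 otherwise. */
static int sketch_eventfd_read(int fd, sketch_eventfd_t *value)
{
    return (read(fd, value, sizeof(*value)) == (ssize_t)sizeof(*value))
           ? 0 : -1;
}
```

These wrappers only move 8 bytes through whatever fd they are given (a pipe, for instance). They do not reproduce the counter semantics of a real Linux eventfd, where successive writes are summed until the next read.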

[PATCH RESEND v7 06/12] contrib/vhost-user-*: use QEMU bswap helper functions

2024-06-12 Thread Stefano Garzarella

Let's replace the calls to le*toh() and htole*() with qemu/bswap.h
helpers to make the code more portable.

Suggested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 contrib/vhost-user-blk/vhost-user-blk.c |  9 +
 contrib/vhost-user-input/main.c | 16 
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..9492146855 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "libvhost-user-glib.h"
 
@@ -194,8 +195,8 @@ vub_discard_write_zeroes(VubReq *req, struct iovec *iov, uint32_t iovcnt,
 #if defined(__linux__) && defined(BLKDISCARD) && defined(BLKZEROOUT)
 VubDev *vdev_blk = req->vdev_blk;
 desc = buf;
-uint64_t range[2] = { le64toh(desc->sector) << 9,
-  le32toh(desc->num_sectors) << 9 };
+uint64_t range[2] = { le64_to_cpu(desc->sector) << 9,
+  le32_to_cpu(desc->num_sectors) << 9 };
 if (type == VIRTIO_BLK_T_DISCARD) {
 if (ioctl(vdev_blk->blk_fd, BLKDISCARD, range) == 0) {
 g_free(buf);
@@ -267,13 +268,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
 req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base;
 in_num--;
 
-type = le32toh(req->out->type);
+type = le32_to_cpu(req->out->type);
 switch (type & ~VIRTIO_BLK_T_BARRIER) {
 case VIRTIO_BLK_T_IN:
 case VIRTIO_BLK_T_OUT: {
 ssize_t ret = 0;
 bool is_write = type & VIRTIO_BLK_T_OUT;
-req->sector_num = le64toh(req->out->sector);
+req->sector_num = le64_to_cpu(req->out->sector);
 if (is_write) {
 ret  = vub_writev(req, &elem->out_sg[1], out_num);
 } else {
diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c
index 081230da54..f3362d41ac 100644
--- a/contrib/vhost-user-input/main.c
+++ b/contrib/vhost-user-input/main.c
@@ -51,8 +51,8 @@ static void vi_input_send(VuInput *vi, struct virtio_input_event *event)
 vi->queue[vi->qindex++].event = *event;
 
 /* ... until we see a report sync ... */
-if (event->type != htole16(EV_SYN) ||
-event->code != htole16(SYN_REPORT)) {
+if (event->type != cpu_to_le16(EV_SYN) ||
+event->code != cpu_to_le16(SYN_REPORT)) {
 return;
 }
 
@@ -103,9 +103,9 @@ vi_evdev_watch(VuDev *dev, int condition, void *data)
 
 g_debug("input %d %d %d", evdev.type, evdev.code, evdev.value);
 
-virtio.type  = htole16(evdev.type);
-virtio.code  = htole16(evdev.code);
-virtio.value = htole32(evdev.value);
+virtio.type  = cpu_to_le16(evdev.type);
+virtio.code  = cpu_to_le16(evdev.code);
+virtio.value = cpu_to_le32(evdev.value);
 vi_input_send(vi, &virtio);
 }
 }
@@ -124,9 +124,9 @@ static void vi_handle_status(VuInput *vi, virtio_input_event *event)
 
 evdev.input_event_sec = tval.tv_sec;
 evdev.input_event_usec = tval.tv_usec;
-evdev.type = le16toh(event->type);
-evdev.code = le16toh(event->code);
-evdev.value = le32toh(event->value);
+evdev.type = le16_to_cpu(event->type);
+evdev.code = le16_to_cpu(event->code);
+evdev.value = le32_to_cpu(event->value);
 
 rc = write(vi->evdevfd, &evdev, sizeof(evdev));
 if (rc == -1) {
-- 
2.45.2
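
For readers without the QEMU tree at hand, what helpers such as le16_to_cpu() and le32_to_cpu() compute can be approximated with byte-wise loads that work on any host endianness. The function names below are hypothetical and this is not how qemu/bswap.h is implemented; it is only a sketch of the conversion itself.

```c
#include <stdint.h>

/* Decode a little-endian 16-bit value from a byte buffer. */
static uint16_t le16_load_sketch(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode a little-endian 32-bit value from a byte buffer. */
static uint32_t le32_load_sketch(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

Because the result is assembled from individual bytes, it does not rely on <endian.h> or <sys/endian.h>, which is the portability concern this patch addresses by switching to QEMU's own helpers.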

[PATCH RESEND v7 04/12] vhost-user-server: do not set memory fd non-blocking

2024-06-12 Thread Stefano Garzarella

In vhost-user-server we set all fd received from the other peer
in non-blocking mode. For some of them (e.g. memfd, shm_open, etc.)
it's not really needed, because we don't use these fd with blocking
operations, but only to map memory.

In addition, in some systems this operation can fail (e.g. in macOS
setting an fd returned by shm_open() non-blocking fails with errno
= ENOTTY).

So, let's avoid setting fd non-blocking for those messages that we
know carry memory fd (e.g. VHOST_USER_ADD_MEM_REG,
VHOST_USER_SET_MEM_TABLE).

Reviewed-by: Daniel P. Berrangé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
v3:
- avoiding setting fd non-blocking for messages where we have memory fd
  (Eric)
---
 util/vhost-user-server.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 3bfb1ad3ec..b19229074a 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -65,6 +65,18 @@ static void vmsg_close_fds(VhostUserMsg *vmsg)
 static void vmsg_unblock_fds(VhostUserMsg *vmsg)
 {
 int i;
+
+/*
+ * These messages carry fd used to map memory, not to send/receive messages,
+ * so this operation is useless. In addition, in some systems this
+ * operation can fail (e.g. in macOS setting an fd returned by shm_open()
+ * non-blocking fails with errno = ENOTTY)
+ */
+if (vmsg->request == VHOST_USER_ADD_MEM_REG ||
+vmsg->request == VHOST_USER_SET_MEM_TABLE) {
+return;
+}
+
 for (i = 0; i < vmsg->fd_num; i++) {
 qemu_socket_set_nonblock(vmsg->fds[i]);
 }
-- 
2.45.2
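
The skip logic of this patch can be sketched with plain fcntl() calls. Assumptions for this example: the request numbers 5 and 37 are the values of VHOST_USER_SET_MEM_TABLE and VHOST_USER_ADD_MEM_REG in the vhost-user specification as I recall them, the names are hypothetical, and fcntl() stands in for qemu_socket_set_nonblock().

```c
#include <fcntl.h>

/* Assumed request numbers from the vhost-user specification. */
enum {
    SKETCH_SET_MEM_TABLE = 5,
    SKETCH_ADD_MEM_REG = 37,
};

/*
 * Hypothetical helper: set the received fds non-blocking unless the
 * message carries memory fds. Returns how many fds were changed.
 */
static int unblock_fds_sketch(int request, const int *fds, int fd_num)
{
    int i, changed = 0;

    /* Memory fds are only mapped, never used for blocking I/O. */
    if (request == SKETCH_ADD_MEM_REG || request == SKETCH_SET_MEM_TABLE) {
        return 0;
    }

    for (i = 0; i < fd_num; i++) {
        int flags = fcntl(fds[i], F_GETFL);

        if (flags >= 0 && fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) == 0) {
            changed++;
        }
    }

    return changed;
}
```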

[PATCH v7 00/13] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-06-12 Thread Stefano Garzarella

This series should be in good shape; in which tree should we queue it?
@Michael would your tree be okay?

Thanks,
Stefano

Changelog

v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
v3: https://patchew.org/QEMU/20240404122330.92710-1-sgarz...@redhat.com/
v4: https://patchew.org/QEMU/20240508074457.12367-1-sgarz...@redhat.com/
v5: https://patchew.org/QEMU/20240523145522.313012-1-sgarz...@redhat.com/
v6: https://patchew.org/QEMU/20240528103543.145412-1-sgarz...@redhat.com/
v7:
- rebased on 
https://patchew.org/QEMU/20240611130231.83152-1-sgarz...@redhat.com/
  That patch is queued by Markus and only Patch 10 of this series depends on it.
- changed default value documentation for @share [Markus]
- used `memory-backend-shm` instead of `shm` and wrapped to 70 columns
  [Markus]
- added 'if': 'CONFIG_POSIX' to MemoryBackendShmProperties [Markus]

Description

The vhost-user protocol is not really Linux-specific, so let's try to support
QEMU's frontends and backends (including libvhost-user) in any POSIX system
with this series. The main use case is to be able to use virtio devices that
we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
in non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS or
FreeBSD that could go even independently of this series.

Patches 6, 7, 8, 9 enable building of frontends and backends (including
libvhost-user) with associated code changes to succeed in compilation.

Patch 10 adds `memory-backend-shm` that uses the POSIX shm_open() API to
create shared memory which is identified by an fd that can be shared with
vhost-user backends. This is useful on those systems (like macOS) where
we don't have memfd_create() or special filesystems like "/dev/shm".

Patches 11 and 12 use `memory-backend-shm` in some vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered those
problems after testing patches 6 - 9, so I have included them in this series
for now. Please let me know if you prefer that I send them separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 40 (x86_64)
in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

  vhost-user-blk -s /tmp/vhost.socket \
-b Fedora-Cloud-Base-39-1.5.x86_64.raw

  qemu-storage-daemon \
--blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
--blockdev qcow2,file=file,node-name=qcow2 \
--export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

  qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
-drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
-device virtio-net-device,netdev=net0 -netdev user,id=net0 \
-device ramfb -device usb-ehci -device usb-kbd \
-object memory-backend-shm,id=mem,size=512M \
-device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
-chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

  qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
-object memory-backend-shm,id=mem,size="512M" \
-device vhost-user-blk-pci,num-queues=1,chardev=char0 \
-chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

  qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
-object memory-backend-shm,id=mem,size="512M" \
-device vhost-user-blk-pci,num-queues=1,chardev=char0 \
-chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at 
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Based-on: 20240611130231.83152-1-sgarz...@redhat.com

Stefano Garzarella (13):
  qapi: clarify that the default is backend dependent
  libvhost-user: set msg.msg_control to NULL when it is empty
  libvhost-user: fail vu_message_write() if sendmsg() is failing
  libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
  vhost-user-server: do not set memory fd non-blocking
  contrib/vhost-user-blk: fix bind() using the right size of the address
  contrib/vhost-user-*: use QEMU bswap helper functions
  vhost-user: enable frontends on any POSIX system
  libvhost-user: enable it on any POSIX system
  contrib/vhost-user-blk: enable it on any POSIX system
  hostmem: add a new memory backend based on POSIX shm_open()
  tests/qtest/vhost-user-blk-test: use memory-backend-shm
  tests/qtest/vhost-user-test: add a test case for memory-backend-shm

 docs/system/devices/vhost-user.rst|   5 +-
 meson.build   |   5 +-
 qapi/qom.json

[PATCH v2] qapi: clarify that the default is backend dependent

2024-06-11 Thread Stefano Garzarella

The default value of the @share option of the @MemoryBackendProperties
really depends on the backend type, so let's document the default
values in the same place where we define the option to avoid
dispersing the information.

Cc: David Hildenbrand 
Suggested-by: Markus Armbruster 
Signed-off-by: Stefano Garzarella 
---
v2:
- documented @share's default right where it's defined [Markus]

v1: https://patchew.org/QEMU/20240523133302.103858-1-sgarz...@redhat.com/
---
 qapi/qom.json | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 8bd299265e..9b8f6a7ab5 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -600,7 +600,9 @@
 # preallocation threads (default: none) (since 7.2)
 #
 # @share: if false, the memory is private to QEMU; if true, it is
-# shared (default: false)
+# shared (default false for backends memory-backend-file and
+# memory-backend-ram, true for backends memory-backend-epc and
+# memory-backend-memfd)
 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
@@ -700,8 +702,6 @@
 #
 # Properties for memory-backend-memfd objects.
 #
-# The @share boolean option is true by default with memfd.
-#
 # @hugetlb: if true, the file to be created resides in the hugetlbfs
 # filesystem (default: false)
 #
@@ -726,8 +726,6 @@
 #
 # Properties for memory-backend-epc objects.
 #
-# The @share boolean option is true by default with epc
-#
 # The @merge boolean option is false by default with epc
 #
 # The @dump boolean option is false by default with epc
-- 
2.45.2

Re: [PATCH] qapi: clarify that the default is backend dependent

2024-06-06 Thread Stefano Garzarella


On Tue, Jun 04, 2024 at 04:58:49PM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


On Mon, Jun 03, 2024 at 11:34:10AM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


The default value of the @share option of the @MemoryBackendProperties
really depends on the backend type, so let's document it explicitly and
add the default value where it was missing.

Cc: David Hildenbrand 
Suggested-by: Markus Armbruster 
Signed-off-by: Stefano Garzarella 
---
I followed how we document @share in memfd and epc, but I don't like it
very much, I just can't think of a better way, so if you have a suggestion
I can change them in all of them.

Thanks,
Stefano
---
 qapi/qom.json | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..8463bd32a2 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -600,7 +600,7 @@

  ##
  # @MemoryBackendProperties:
  #
  # Properties for objects of classes derived from memory-backend.
  #

[...]


 # preallocation threads (default: none) (since 7.2)
 #
 # @share: if false, the memory is private to QEMU; if true, it is
-# shared (default: false)
+# shared (default depends on the backend type)


Note for later: the backends are the branches of ObjectOptions that use
MemoryBackendProperties as branch type or as base of their branch type.
These are

   memory-backend-epc (uses MemoryBackendEpcProperties)
   memory-backend-file (uses MemoryBackendFileProperties)
   memory-backend-memfd (uses MemoryBackendMemfdProperties)
   memory-backend-ram (uses MemoryBackendProperties)


 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
@@ -639,6 +639,8 @@
 #
 # Properties for memory-backend-file objects.
 #
+# The @share boolean option is false by default with file.
+#
 # @align: the base address alignment when QEMU mmap(2)s @mem-path.
 # Some backend stores specified by @mem-path require an alignment
 # different than the default one used by QEMU, e.g. the device DAX


As stated in the commit message, this matches existing documentation in
memory-backend-epc

  # The @share boolean option is true by default with epc

and memory-backend-memfd

  # The @share boolean option is true by default with memfd.

I think "with FOO" could be clearer.  Perhaps something like "with
backend 'memory-backend-FOO'".


Ack, I'll do.



However, even with your patch, we're still missing memory-backend-ram.
I can see two solutions:

1. Create MemoryBackendRamProperties just to have a place for
documenting @share's default.

2. Document @share's default right where it's defined, roughly like
this:

  # @share: if false, the memory is private to QEMU; if true, it is
  # shared (default false for backends memory-backend-file and
  # memory-backend-ram, true for backends memory-backend-epc and
  # memory-backend-memfd)

CON: we need to remember to update this whenever we add another backend.

PRO: generated documentation is better, in my opinion.

Thoughts?



Maybe option 2 is slightly better and it's also clearer how to document the 
default for other backends.

When I added a new backend, it was not clear to me how to define the default 
for an inherited parameter.

I would go with 2 if you agree.


I actually like 2 better :)



Yeah, I'll do it ;-)

Thanks,
Stefano

[PATCH] qapi/qom: make some QOM properties depend on the build settings

2024-06-04 Thread Stefano Garzarella

Some QOM properties are associated with ObjectTypes that already
depend on CONFIG_* switches. So to avoid generating dead code,
let's also make the definition of those properties dependent on
the corresponding CONFIG_*.
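
To illustrate the idea in plain C, here is a minimal sketch of what
guarding a definition with a build switch achieves.  Everything named
example_* or EXAMPLE_* is hypothetical and only stands in for the
generated code and for QEMU's CONFIG_LINUX; it is not the output of the
QAPI generator.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a build-time switch; an assumption of this sketch. */
#if defined(__linux__)
#define EXAMPLE_CONFIG_LINUX 1
#endif

/*
 * The properties struct is compiled only where the object type that
 * uses it exists, so other builds carry no dead definition.
 */
#if defined(EXAMPLE_CONFIG_LINUX)
struct example_pr_manager_props {
    const char *path;
};
#endif

/* Object types available in this build. */
static const char *const example_built_types[] = {
    "memory-backend-ram",
#if defined(EXAMPLE_CONFIG_LINUX)
    "pr-manager-helper",
#endif
};

size_t example_built_type_count(void)
{
    return sizeof(example_built_types) / sizeof(example_built_types[0]);
}

const char *example_built_type(size_t i)
{
    assert(i < example_built_type_count());
    return example_built_types[i];
}
```

The type list and the struct are guarded by the same condition, which
is the consistency this patch aims for between the ObjectType enum and
the properties structs.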

Suggested-by: Markus Armbruster 
Signed-off-by: Stefano Garzarella 
---
 qapi/qom.json | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..ae93313a60 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -222,7 +222,8 @@
 ##
 { 'struct': 'CanHostSocketcanProperties',
   'data': { 'if': 'str',
-'canbus': 'str' } }
+'canbus': 'str' },
+  'if': 'CONFIG_LINUX' }
 
 ##
 # @ColoCompareProperties:
@@ -305,7 +306,8 @@
 ##
 { 'struct': 'CryptodevVhostUserProperties',
   'base': 'CryptodevBackendProperties',
-  'data': { 'chardev': 'str' } }
+  'data': { 'chardev': 'str' },
+  'if': 'CONFIG_VHOST_CRYPTO' }
 
 ##
 # @DBusVMStateProperties:
@@ -514,7 +516,8 @@
   'data': { 'evdev': 'str',
 '*grab_all': 'bool',
 '*repeat': 'bool',
-'*grab-toggle': 'GrabToggleKeys' } }
+'*grab-toggle': 'GrabToggleKeys' },
+  'if': 'CONFIG_LINUX' }
 
 ##
 # @EventLoopBaseProperties:
@@ -719,7 +722,8 @@
   'base': 'MemoryBackendProperties',
   'data': { '*hugetlb': 'bool',
 '*hugetlbsize': 'size',
-'*seal': 'bool' } }
+'*seal': 'bool' },
+  'if': 'CONFIG_LINUX' }
 
 ##
 # @MemoryBackendEpcProperties:
@@ -736,7 +740,8 @@
 ##
 { 'struct': 'MemoryBackendEpcProperties',
   'base': 'MemoryBackendProperties',
-  'data': {} }
+  'data': {},
+  'if': 'CONFIG_LINUX' }
 
 ##
 # @PrManagerHelperProperties:
@@ -749,7 +754,8 @@
 # Since: 2.11
 ##
 { 'struct': 'PrManagerHelperProperties',
-  'data': { 'path': 'str' } }
+  'data': { 'path': 'str' },
+  'if': 'CONFIG_LINUX' }
 
 ##
 # @QtestProperties:
@@ -872,7 +878,8 @@
 ##
 { 'struct': 'RngRandomProperties',
   'base': 'RngProperties',
-  'data': { '*filename': 'str' } }
+  'data': { '*filename': 'str' },
+  'if': 'CONFIG_POSIX' }
 
 ##
 # @SevGuestProperties:
-- 
2.45.1

Re: [PATCH v6 10/12] hostmem: add a new memory backend based on POSIX shm_open()

2024-06-04 Thread Stefano Garzarella


On Mon, Jun 03, 2024 at 11:42:35AM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


On Wed, May 29, 2024 at 04:50:20PM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Acked-by: David Hildenbrand 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v5
- fixed documentation in qapi/qom.json and qemu-options.hx [Markus]
v4
- fail if we find "share=off" in shm_backend_memory_alloc() [David]
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  19 +
 backends/hostmem-shm.c | 123 +
 backends/meson.build   |   1 +
 qemu-options.hx|  16 
 5 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object

 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.

 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..d40592d863 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,21 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }

+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm. Setting it to false
+# will cause a failure during allocation because it is not supported by this
+# backend.


docs/devel/qapi-code-gen.rst:

   For legibility, wrap text paragraphs so every line is at most 70
   characters long.

   Separate sentences with two spaces.

Result:

  # Properties for memory-backend-shm objects.
  #
  # The @share boolean option is true by default with shm.  Setting it
  # to false will cause a failure during allocation because it is not
  # supported by this backend.


Oops, sorry, I'll fix!



However, this contradicts the doc comment for @share:

  # @share: if false, the memory is private to QEMU; if true, it is
  # shared (default: false)

Your intention is to override that text.  But that's less than clear.
Moreover, the documentation of @share is pretty far from this override.
John Snow is working on patches that'll pull it closer.

Hmm, MemoryBackendMemfdProperties has the same override.

I think we should change the doc comment for @share to something like

  # @share: if false, the memory is private to QEMU; if true, it is
  # shared (default depends on the backend type)

and then document the actual default with each backend type.


Yes, I had already seen your comment to an earlier version and sent another 
separate patch:
https://patchew.org/QEMU/20240523133302.103858-1-sgarz...@redhat.com/

Is that okay?


Looks like I'm going through my post-vacation review backlog in
suboptimal order...

Replied there!


Thanks!




+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }


Let's add 'if': 'CONFIG_POSIX' here.



I think my response to your review at v4 fell through a crack :-)
https://patchew.org/QEMU/20240508074457.12367-1-sgarz...@redhat.com/20240508074457.12367-11-sgarz...@redhat.com/#z3lbtmkn6zlwdhd

Re: [PATCH] qapi: clarify that the default is backend dependent

2024-06-04 Thread Stefano Garzarella


On Mon, Jun 03, 2024 at 11:34:10AM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


The default value of the @share option of the @MemoryBackendProperties
really depends on the backend type, so let's document it explicitly and
add the default value where it was missing.

Cc: David Hildenbrand 
Suggested-by: Markus Armbruster 
Signed-off-by: Stefano Garzarella 
---
I followed how we document @share in memfd and epc, but I don't like it
very much, I just can't think of a better way, so if you have a suggestion
I can change them in all of them.

Thanks,
Stefano
---
 qapi/qom.json | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..8463bd32a2 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -600,7 +600,7 @@

  ##
  # @MemoryBackendProperties:
  #
  # Properties for objects of classes derived from memory-backend.
  #

[...]


 # preallocation threads (default: none) (since 7.2)
 #
 # @share: if false, the memory is private to QEMU; if true, it is
-# shared (default: false)
+# shared (default depends on the backend type)


Note for later: the backends are the branches of ObjectOptions that use
MemoryBackendProperties as branch type or as base of their branch type.
These are

   memory-backend-epc (uses MemoryBackendEpcProperties)
   memory-backend-file (uses MemoryBackendFileProperties)
   memory-backend-memfd (uses MemoryBackendMemfdProperties)
   memory-backend-ram (uses MemoryBackendProperties)


 #
 # @reserve: if true, reserve swap space (or huge pages) if applicable
 # (default: true) (since 6.1)
@@ -639,6 +639,8 @@
 #
 # Properties for memory-backend-file objects.
 #
+# The @share boolean option is false by default with file.
+#
 # @align: the base address alignment when QEMU mmap(2)s @mem-path.
 # Some backend stores specified by @mem-path require an alignment
 # different than the default one used by QEMU, e.g. the device DAX


As stated in the commit message, this matches existing documentation in
memory-backend-epc

  # The @share boolean option is true by default with epc

and memory-backend-memfd

  # The @share boolean option is true by default with memfd.

I think "with FOO" could be clearer.  Perhaps something like "with
backend 'memory-backend-FOO'".


Ack, I'll do.



However, even with your patch, we're still missing memory-backend-ram.
I can see two solutions:

1. Create MemoryBackendRamProperties just to have a place for
documenting @share's default.

2. Document @share's default right where it's defined, roughly like
this:

  # @share: if false, the memory is private to QEMU; if true, it is
  # shared (default false for backends memory-backend-file and
  # memory-backend-ram, true for backends memory-backend-epc and
  # memory-backend-memfd)

CON: we need to remember to update this whenever we add another backend.

PRO: generated documentation is better, in my opinion.

Thoughts?



Maybe option 2 is slightly better and it's also clearer how to document 
the default for other backends.


When I added a new backend, it was not clear to me how to define the 
default for an inherited parameter.


I would go with 2 if you agree.

Thanks,
Stefano

Re: [PATCH 1/1] vhost-vsock: add VIRTIO_F_RING_PACKED to feaure_bits

2024-05-30 Thread Stefano Garzarella

On Wed, May 29, 2024 at 02:49:28PM GMT, Halil Pasic wrote:

On Tue, 28 May 2024 17:32:26 +0200
Stefano Garzarella  wrote:

>1) The user is explicitly asking for a vhost device and giving the user
>a non vhost device is not an option.

I didn't get this point :-( can you elaborate?

I was thinking along the lines: QEMU gets told what devices to
provision, and that includes things like what virtio features,
and what kind of a backend.

In this example, the default for vsock-vhost is no VIRTIO_F_RING_PACKED,
but if we tell QEMU to create a vsock-vhost device with the feature
VIRTIO_F_RING_PACKED, things go south in a not nice way.

Given that vhost not supporting VIRTIO_F_RING_PACKED as of today is a
fact of life we must accept, there are multiple ways how such a situation
can be handled.

For instance vhost-net is handling this by the device not offering the
VIRTIO_F_RING_PACKED feature. This is at least what I think I have
observed, but I would not mind somebody confirming it. But for the sake
of the argument, let us look at other options.

The straightforward one would be to not realize the device, because we
can't provide what we have been asked to provide. And this actually
makes me think about migration! What would happen, were we to
eventually introduce packed to vhost and vhost-net, and then attempt to
migrate between a host that has this new feature and a host that has not? I
guess things would pretty much blow up in a very unpleasant way!

Yes, migration with vhost devices implies that the destination host 
supports at least the same features as the source host. We should 
consider what happens when migrating between 2 QEMUs where the 
destination doesn't support a required feature: I guess either the 
migration can't happen, or the device has to be removed and then 
re-added without that feature.
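
The "destination supports at least the same features" condition is a
subset check on the feature bitmaps. A minimal sketch in C, where
example_missing_on_destination() is a hypothetical helper and not an
existing QEMU function (only the two bit numbers come from the VIRTIO
specification):

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers from the VIRTIO specification. */
#define VIRTIO_F_VERSION_1   32
#define VIRTIO_F_RING_PACKED 34

/*
 * Hypothetical helper: return the features negotiated on the source
 * that the destination backend cannot offer. Zero means the
 * destination supports at least the same features.
 */
uint64_t example_missing_on_destination(uint64_t src_features,
                                        uint64_t dst_features)
{
    return src_features & ~dst_features;
}

void example_migration_usage(void)
{
    uint64_t new_host = (1ULL << VIRTIO_F_VERSION_1) |
                        (1ULL << VIRTIO_F_RING_PACKED);
    uint64_t old_host = 1ULL << VIRTIO_F_VERSION_1;

    /* New -> old: the packed ring feature would be lost. */
    assert(example_missing_on_destination(new_host, old_host) ==
           (1ULL << VIRTIO_F_RING_PACKED));

    /* Old -> new: nothing is missing. */
    assert(example_missing_on_destination(old_host, new_host) == 0);
}
```

A non-zero result is the case discussed above: either refuse the
migration or re-add the device without the missing feature.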

Then for some devices, at least in theory, it might be possible to
abandon not the feature but the backend. Along the lines: we were asked to
provide the feature X with backend Y, but since backend Y does not
support that feature and backend Z does, we will deterministically go
with backend Z. But IMHO this is a purely theoretical consideration, and
we shall not go this way.

Yep, I agree!

In any case, if we are asked to provide a device with properties such
that we can't actually do that, something has to go out of the window:
either some of the properties, or the entire device.

I see.

Thanks,
Stefano

Re: [PATCH v6 10/12] hostmem: add a new memory backend based on POSIX shm_open()

2024-05-29 Thread Stefano Garzarella


On Wed, May 29, 2024 at 04:50:20PM GMT, Markus Armbruster wrote:

Stefano Garzarella  writes:


shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Acked-by: David Hildenbrand 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v5
- fixed documentation in qapi/qom.json and qemu-options.hx [Markus]
v4
- fail if we find "share=off" in shm_backend_memory_alloc() [David]
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  19 +
 backends/hostmem-shm.c | 123 +
 backends/meson.build   |   1 +
 qemu-options.hx|  16 
 5 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object

 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.

 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..d40592d863 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,21 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }

+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm. Setting it to false
+# will cause a failure during allocation because it is not supported by this
+# backend.


docs/devel/qapi-code-gen.rst:

   For legibility, wrap text paragraphs so every line is at most 70
   characters long.

   Separate sentences with two spaces.

Result:

  # Properties for memory-backend-shm objects.
  #
  # The @share boolean option is true by default with shm.  Setting it
  # to false will cause a failure during allocation because it is not
  # supported by this backend.


Oops, sorry, I'll fix!



However, this contradicts the doc comment for @share:

  # @share: if false, the memory is private to QEMU; if true, it is
  # shared (default: false)

Your intention is to override that text.  But that's less than clear.
Moreover, the documentation of @share is pretty far from this override.
John Snow is working on patches that'll pull it closer.

Hmm, MemoryBackendMemfdProperties has the same override.

I think we should change the doc comment for @share to something like

  # @share: if false, the memory is private to QEMU; if true, it is
  # shared (default depends on the backend type)

and then document the actual default with each backend type.


Yes, I had already seen your comment to an earlier version and sent 
another separate patch:

https://patchew.org/QEMU/20240523133302.103858-1-sgarz...@redhat.com/

Is that okay?




+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }


Let's add 'if': 'CONFIG_POSIX' here.



I think my response to your review at v4 fell through a crack :-)
https://patchew.org/QEMU/20240508074457.12367-1-sgarz...@redhat.com/20240508074457.12367-11-sgarz...@redhat.com/#z3lbtmkn6zlwdhdea7owav3mblttxr3asrmlilwxmkla67tdby@732gn3uuupoq

I'll bring back my doubts here:

  Do you mean something like this:

  { 'struct': 'MemoryBackendShmProperties',
 'if': 'CO

Re: [PATCH 1/1] vhost-vsock: add VIRTIO_F_RING_PACKED to feaure_bits

2024-05-28 Thread Stefano Garzarella

On Mon, May 27, 2024 at 01:27:10PM GMT, Halil Pasic wrote:

On Thu, 16 May 2024 10:39:42 +0200
Stefano Garzarella  wrote:

[..]

>---
>
>This is a minimal fix, that follows the current patterns in the
>codebase, and not necessarily the best one.

Yeah, I did something similar with commit 562a7d23bf ("vhost: mask
VIRTIO_F_RING_RESET for vhost and vhost-user devices") so I think for
now it is the right approach.

I suggest also checking other devices like we did in that commit (e.g.
hw/scsi/vhost-scsi.c, hw/scsi/vhost-user-scsi.c, etc.)

Hi Stefano!

Thank you for chiming in, and sorry for the late response. I was hoping
that Michael is going to chime in and that I can base my reply on his
take. Anyway, here I go.

A very valid observation! I do agree that we need this for
basically every vhost device, and since:
* net/vhost-vdpa.c
* hw/net/vhost_net.c
* hw/virtio/vhost-user-fs.c
already have it, that translates to shotgun it to the rest. Which
isn't nice in my opinion, which is why I am hoping for a discussion
on this topic, and a better solution (even if it turns out to be
something like a common macro).

Yeah, I see your point and I agree on a better solution.

[..]

>
>The documentation however does kind of state, that feature_bits is
>supposed to contain the supported features. And under the assumption
>that feature bit not in feature_bits implies that the corresponding bit
>must not be set in the 3rd argument (features), then even with the
>current implementation we do end up with the intersection of the three
>as stated. And then vsock would be at fault for violating that
>assumption, and my fix would be the best thing to do -- I guess.
>
>Is the implementation the way it is for a good reason, I can't judge
>that with certainty for myself.

Yes, I think we should fix the documentation, and after a few years of
not looking at it I'm confused again about what it does.

I would prefer to fix the algorithm and make the whole thing less fragile.

But re-reading my commit for VIRTIO_F_RING_RESET, it seems that I had
interpreted `feature_bits` (2nd argument) as a list of features that
QEMU doesn't know how to emulate and therefore are required by the
backend (vhost/vhost-user/vdpa). Because the problem is that `features`
(3rd argument) is a set of features required by the driver that can be
provided by both QEMU and the backend.
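
To make that interpretation concrete, here is a minimal sketch of the
masking as I understand it from this thread: only the bits listed in
`feature_bits` are checked against the backend, every other bit in
`features` passes through untouched. The names example_mask_features()
and EXAMPLE_INVALID_BIT are hypothetical, this is not a copy of the
QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers from the VIRTIO specification. */
#define VIRTIO_F_VERSION_1   32
#define VIRTIO_F_RING_PACKED 34

/* Hypothetical terminator for the list of feature bits. */
#define EXAMPLE_INVALID_BIT  (-1)

/*
 * Clear from `features` every bit that is listed in `feature_bits`
 * but not offered by the backend. Bits that are not listed are never
 * checked, so they stay as they are.
 */
uint64_t example_mask_features(uint64_t backend_features,
                               const int *feature_bits,
                               uint64_t features)
{
    for (const int *bit = feature_bits; *bit != EXAMPLE_INVALID_BIT; bit++) {
        uint64_t mask = 1ULL << *bit;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}

void example_mask_usage(void)
{
    /* The backend does not support the packed ring. */
    uint64_t backend = 1ULL << VIRTIO_F_VERSION_1;
    uint64_t requested = (1ULL << VIRTIO_F_VERSION_1) |
                         (1ULL << VIRTIO_F_RING_PACKED);

    const int without_packed[] = { VIRTIO_F_VERSION_1, EXAMPLE_INVALID_BIT };
    const int with_packed[] = { VIRTIO_F_VERSION_1, VIRTIO_F_RING_PACKED,
                                EXAMPLE_INVALID_BIT };

    /* Not listed: the unsupported bit survives the masking. */
    assert(example_mask_features(backend, without_packed, requested) ==
           requested);

    /* Listed: the unsupported bit is cleared. */
    assert(example_mask_features(backend, with_packed, requested) ==
           (1ULL << VIRTIO_F_VERSION_1));
}
```

Under that assumption, a device whose list lacks VIRTIO_F_RING_PACKED
keeps offering the feature even though the backend cannot handle it,
which is why adding the bit to the list fixes the problem.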

Hm. I would say, this does sound like the sanest explanation, that might
justify the current code, but I will argue that for me, it isn't sane
enough.

Here comes my argument.

1) The user is explicitly asking for a vhost device and giving the user
a non vhost device is not an option.

I didn't get this point :-( can you elaborate?

2) The whole purpose of vhost is that at least the data plane is
implemented outside of QEMU (I am maybe a little sloppy here with
dataplane). That means a rather substantial portion of the device
implementation is not in QEMU, while QEMU remains in charge of the
setup.

Yep

3) Thus I would argue, that all the "transport feature bits" from 24 to
40 should have a corresponding vhost feature because the vhost part needs
some sort of a support.

What do we have there in bits from 24 to 40 according to the spec?
* VIRTIO_F_INDIRECT_DESC
* VIRTIO_F_EVENT_IDX
* VIRTIO_F_VERSION_1
* VIRTIO_F_ACCESS_PLATFORM
* VIRTIO_F_RING_PACKED
* VIRTIO_F_IN_ORDER
* VIRTIO_F_ORDER_PLATFORM
* VIRTIO_F_SR_IOV
* VIRTIO_F_NOTIFICATION_DATA
* VIRTIO_F_NOTIF_CONFIG_DATA
* VIRTIO_F_RING_RESET
and for transitional:
* VIRTIO_F_NOTIFY_ON_EMPTY
* VIRTIO_F_ANY_LAYOUT
* UNUSED

I would say, from these only VIRTIO_F_SR_IOV and
VIRTIO_F_NOTIF_CONFIG_DATA look iffy in a sense things may work out
for vhost devices without the vhost part doing something for it. And
even there, I don't think it would hurt to make vhost part of the
negotiation (I don't think those are supported by QEMU at this point).

I would very much prefer having a consolidated and safe handling for
these.

I completely agree on this!

4) I would also argue that a bunch of the device specific feature bits
should have vhost feature bits as well for the same reason:
features are also such that for a vhost device, the vhost part needs
some sort of a support.

Looking through all of these would require a lot of time, so instead
of that, let me use SCSI as an example. The features are:
* VIRTIO_SCSI_F_INOUT
* VIRTIO_SCSI_F_HOTPLUG
* VIRTIO_SCSI_F_CHANGE
* VIRTIO_SCSI_F_T10_PI

Then in the Linux kernel we have
   VHOST_SCSI_FEATURES = VHOST_FEATURES | (1ULL << VIRTIO_SCSI_F_HOTPLUG) |
  (1ULL << VIRTIO_SCSI_F_T10_PI)
but in QEMU kernel_feature_bits does not have
VIRTIO_SCSI_F_T10_PI which together does not make much sense to me. And I would
also expect VIRTIO_SCSI_F_INOUT to be a part of the negotiation, because
to me that the side that is processing the q

[PATCH v6 11/12] tests/qtest/vhost-user-blk-test: use memory-backend-shm

2024-05-28 Thread Stefano Garzarella

`memory-backend-memfd` is available only on Linux while the new
`memory-backend-shm` can be used on any POSIX-compliant operating
system. Let's use it so we can run the test in multiple environments.

Since we are here, let's remove `share=on` which is the default for shm
(and also for memfd).

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
v6
- removed `share=on` since it's the default [David]
---
 tests/qtest/vhost-user-blk-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
index 117b9acd10..ea90d41232 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -906,7 +906,7 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
vhost_user_blk_bin);
 
 g_string_append_printf(cmd_line,
-" -object memory-backend-memfd,id=mem,size=256M,share=on "
+" -object memory-backend-shm,id=mem,size=256M "
 " -M memory-backend=mem -m 256M ");
 
 for (i = 0; i < vus_instances; i++) {
-- 
2.45.1

[PATCH v6 12/12] tests/qtest/vhost-user-test: add a test case for memory-backend-shm

2024-05-28 Thread Stefano Garzarella

`memory-backend-shm` can be used with vhost-user devices, so let's
add a new test case for it.

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 tests/qtest/vhost-user-test.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d4e437265f..8c1d903b2a 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -44,6 +44,8 @@
 "mem-path=%s,share=on -numa node,memdev=mem"
 #define QEMU_CMD_MEMFD  " -m %d -object memory-backend-memfd,id=mem,size=%dM," \
 " -numa node,memdev=mem"
+#define QEMU_CMD_SHM" -m %d -object memory-backend-shm,id=mem,size=%dM," \
+" -numa node,memdev=mem"
 #define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce=on"
 
@@ -195,6 +197,7 @@ enum test_memfd {
 TEST_MEMFD_AUTO,
 TEST_MEMFD_YES,
 TEST_MEMFD_NO,
+TEST_MEMFD_SHM,
 };
 
 static void append_vhost_net_opts(TestServer *s, GString *cmd_line,
@@ -228,6 +231,8 @@ static void append_mem_opts(TestServer *server, GString *cmd_line,
 
 if (memfd == TEST_MEMFD_YES) {
 g_string_append_printf(cmd_line, QEMU_CMD_MEMFD, size, size);
+} else if (memfd == TEST_MEMFD_SHM) {
+g_string_append_printf(cmd_line, QEMU_CMD_SHM, size, size);
 } else {
 const char *root = init_hugepagefs() ? : server->tmpfs;
 
@@ -788,6 +793,19 @@ static void *vhost_user_test_setup_memfd(GString *cmd_line, void *arg)
 return server;
 }
 
+static void *vhost_user_test_setup_shm(GString *cmd_line, void *arg)
+{
+TestServer *server = test_server_new("vhost-user-test", arg);
+test_server_listen(server);
+
+append_mem_opts(server, cmd_line, 256, TEST_MEMFD_SHM);
+server->vu_ops->append_opts(server, cmd_line, "");
+
+g_test_queue_destroy(vhost_user_test_cleanup, server);
+
+return server;
+}
+
 static void test_read_guest_mem(void *obj, void *arg, QGuestAllocator *alloc)
 {
 TestServer *server = arg;
@@ -1081,6 +1099,11 @@ static void register_vhost_user_test(void)
  "virtio-net",
  test_read_guest_mem, &opts);
 
+opts.before = vhost_user_test_setup_shm;
+qos_add_test("vhost-user/read-guest-mem/shm",
+ "virtio-net",
+ test_read_guest_mem, &opts);
+
 if (qemu_memfd_check(MFD_ALLOW_SEALING)) {
 opts.before = vhost_user_test_setup_memfd;
 qos_add_test("vhost-user/read-guest-mem/memfd",
-- 
2.45.1

[PATCH v6 10/12] hostmem: add a new memory backend based on POSIX shm_open()

2024-05-28 Thread Stefano Garzarella

shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.
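
The open-then-unlink pattern described above can be sketched as
follows. This is a minimal standalone example under stated assumptions,
not the code added in backends/hostmem-shm.c: the helper name
anon_shm_create() and the "/example-shm-<pid>" name format are made up
for illustration, and on older C libraries the program may need to be
linked with -lrt.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a file descriptor backed by a POSIX
 * shared memory object of `size` bytes whose name has already been
 * removed, or -1 on error.
 */
int anon_shm_create(size_t size)
{
    char name[64];
    int fd;

    assert(size > 0);

    /* The name only has to be unique until shm_unlink() below. */
    snprintf(name, sizeof(name), "/example-shm-%ld", (long)getpid());

    /* O_EXCL makes the call fail instead of reusing an existing object. */
    fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        return -1;
    }

    /* Drop the name right away: from now on only the fd refers to it. */
    shm_unlink(name);

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }

    return fd;
}
```

The returned descriptor can be mapped with mmap(MAP_SHARED) and passed
to another process over a UNIX socket, in the same way a memfd would
be.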

Acked-by: David Hildenbrand 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v5
- fixed documentation in qapi/qom.json and qemu-options.hx [Markus]
v4
- fail if we find "share=off" in shm_backend_memory_alloc() [David]
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  19 +
 backends/hostmem-shm.c | 123 +
 backends/meson.build   |   1 +
 qemu-options.hx|  16 
 5 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..d40592d863 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,21 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm. Setting it to false
+# will cause a failure during allocation because it is not supported by this
+# backend.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -985,6 +1000,8 @@
 { 'name': 'memory-backend-memfd',
   'if': 'CONFIG_LINUX' },
 'memory-backend-ram',
+{ 'name': 'memory-backend-shm',
+  'if': 'CONFIG_POSIX' },
 'pef-guest',
 { 'name': 'pr-manager-helper',
   'if': 'CONFIG_LINUX' },
@@ -1056,6 +1073,8 @@
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'CONFIG_LINUX' },
   'memory-backend-ram': 'MemoryBackendProperties',
+  'memory-backend-shm': { 'type': 'MemoryBackendShmProperties',
+  'if': 'CONFIG_POSIX' },
   'pr-manager-helper':  { 'type': 'PrManagerHelperProperties',
           'if': 'CONFIG_LINUX' },
   'qtest':  'QtestProperties',
diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
new file mode 100644
index 00..374edc3db8
--- /dev/null
+++ b/backends/hostmem-shm.c
@@ -0,0 +1,123 @@
+/*
+ * QEMU host POSIX shared memory object backend
+ *
+ * Copyright (C) 2024 Red Hat Inc
+ *
+ * Authors:
+ *   Stefano Garzarella 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+
+#define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm"
+
+OBJECT_DECLARE_SIMPLE_TYPE(Hos

[PATCH v6 09/12] contrib/vhost-user-blk: enable it on any POSIX system

2024-05-28 Thread Stefano Garzarella

Let's make the code more portable by adding defines from
block/file-posix.c to support O_DIRECT in other systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Acked-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefano Garzarella 
---
v6:
- reverted v5 changes since we can't move O_DSYNC and O_DIRECT in osdep
  [Daniel, failing tests on Windows]
v5:
- O_DSYNC and O_DIRECT definition are now in osdep [Phil]
- commit updated since we moved out all code changes
v4:
- moved using of "qemu/bswap.h" API in a separate patch [Phil]
---
 meson.build |  2 --
 contrib/vhost-user-blk/vhost-user-blk.c | 14 ++
 util/meson.build|  4 +++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 48e476b237..c89ee7b578 100644
--- a/meson.build
+++ b/meson.build
@@ -1981,8 +1981,6 @@ has_statx = cc.has_header_symbol('sys/stat.h', 'STATX_BASIC_STATS', prefix: gnu_
 has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: gnu_source_prefix)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
-  .require(host_os == 'linux',
-   error_message: 'vhost_user_blk_server requires linux') \
   .require(have_vhost_user,
    error_message: 'vhost_user_blk_server requires vhost-user support') \
   .disable_auto_if(not have_tools and not have_system) \
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index 9492146855..a450337685 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -25,6 +25,20 @@
 #include 
 #endif
 
+/* OS X does not have O_DSYNC */
+#ifndef O_DSYNC
+#ifdef O_SYNC
+#define O_DSYNC O_SYNC
+#elif defined(O_FSYNC)
+#define O_DSYNC O_FSYNC
+#endif
+#endif
+
+/* Approximate O_DIRECT with O_DSYNC if O_DIRECT isn't available */
+#ifndef O_DIRECT
+#define O_DIRECT O_DSYNC
+#endif
+
 enum {
 VHOST_USER_BLK_MAX_QUEUES = 8,
 };
diff --git a/util/meson.build b/util/meson.build
index 72b505df11..c414178ace 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -112,10 +112,12 @@ if have_block
 util_ss.add(files('filemonitor-stub.c'))
   endif
   if host_os == 'linux'
-util_ss.add(files('vhost-user-server.c'), vhost_user)
 util_ss.add(files('vfio-helpers.c'))
 util_ss.add(files('chardev_open.c'))
   endif
+  if host_os != 'windows'
+util_ss.add(files('vhost-user-server.c'), vhost_user)
+  endif
   util_ss.add(files('yank.c'))
 endif
 
-- 
2.45.1

[PATCH v6 03/12] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported

2024-05-28 Thread Stefano Garzarella

libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD
message if MFD_ALLOW_SEALING is not defined, since it's not able
to create a memfd.

VHOST_USER_GET_INFLIGHT_FD is used only if
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask
that feature if the backend is not able to properly handle these
messages.
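
As a rough sketch of the masking idea: the patch does this at compile
time with #ifndef MFD_ALLOW_SEALING, while the illustration below uses a
runtime flag so that both cases can be exercised. The helper name and the
EXAMPLE_ constant are made up for the example, and the bit number 12 is
my reading of the vhost-user specification, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit number of VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. */
#define EXAMPLE_PROTOCOL_F_INFLIGHT_SHMFD 12

/*
 * Hypothetical helper: clear the inflight-shmfd feature bit when the
 * backend cannot create a memfd, so the frontend never negotiates it
 * and never sends VHOST_USER_GET_INFLIGHT_FD.
 */
static uint64_t example_mask_features(uint64_t features, bool can_create_memfd)
{
    assert(EXAMPLE_PROTOCOL_F_INFLIGHT_SHMFD < 64);

    if (!can_create_memfd) {
        features &= ~(1ULL << EXAMPLE_PROTOCOL_F_INFLIGHT_SHMFD);
    }

    return features;
}
```

All other feature bits pass through unchanged, so a device
implementation that enables the feature is simply overridden on systems
without memfd support.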

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a11afd1960..2c20cdc16e 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1674,6 +1674,17 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
 features |= dev->iface->get_protocol_features(dev);
 }
 
+#ifndef MFD_ALLOW_SEALING
+/*
+ * If MFD_ALLOW_SEALING is not defined, we are not able to handle
+ * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd.
+ * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+ * is negotiated. A device implementation can enable it, so let's mask
+ * it to avoid a runtime panic.
+ */
+features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+#endif
+
 vmsg_set_reply_u64(vmsg, features);
 return true;
 }
-- 
2.45.1

[PATCH v6 08/12] libvhost-user: enable it on any POSIX system

2024-05-28 Thread Stefano Garzarella

The vhost-user protocol is not really Linux-specific so let's enable
libvhost-user for any POSIX system.

Compiling it on macOS and FreeBSD, some problems came up:
- avoid including linux/vhost.h, which is available only on Linux
  (vhost_types.h contains many of the things we need)
- macOS doesn't provide sys/endian.h, so let's define the endian
  conversion macros ourselves
  (note: libvhost-user doesn't include QEMU's headers, so we can't
   use "qemu/bswap.h")
- define eventfd_[write|read] as write/read wrappers when the system
  doesn't provide them (e.g. macOS)
- copy SEAL defines from include/qemu/memfd.h to make the code work
  on FreeBSD, where MFD_ALLOW_SEALING is defined
- define MAP_NORESERVE if it's not defined (e.g. on FreeBSD)

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v5:
- fixed typos in the commit description [Phil]
---
 meson.build   |  2 +-
 subprojects/libvhost-user/libvhost-user.h |  2 +-
 subprojects/libvhost-user/libvhost-user.c | 60 +--
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index a72500be77..48e476b237 100644
--- a/meson.build
+++ b/meson.build
@@ -3162,7 +3162,7 @@ if have_system and vfio_user_server_allowed
 endif
 
 vhost_user = not_found
-if host_os == 'linux' and have_vhost_user
+if have_vhost_user
   libvhost_user = subproject('libvhost-user')
   vhost_user = libvhost_user.get_variable('vhost_user_dep')
 endif
diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index deb40e77b3..e13e1d3931 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -18,9 +18,9 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/vhost_types.h"
 
 /* Based on qemu/hw/virtio/vhost-user.c */
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 2c20cdc16e..57e58d4adb 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -28,9 +28,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 
 /* Necessary to provide VIRTIO_F_VERSION_1 on system
  * with older linux headers. Must appear before
@@ -39,8 +37,8 @@
 #include "standard-headers/linux/virtio_config.h"
 
 #if defined(__linux__)
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -52,6 +50,62 @@
 
 #endif
 
+#if defined(__APPLE__) && (__MACH__)
+#include 
+#define htobe16(x) OSSwapHostToBigInt16(x)
+#define htole16(x) OSSwapHostToLittleInt16(x)
+#define be16toh(x) OSSwapBigToHostInt16(x)
+#define le16toh(x) OSSwapLittleToHostInt16(x)
+
+#define htobe32(x) OSSwapHostToBigInt32(x)
+#define htole32(x) OSSwapHostToLittleInt32(x)
+#define be32toh(x) OSSwapBigToHostInt32(x)
+#define le32toh(x) OSSwapLittleToHostInt32(x)
+
+#define htobe64(x) OSSwapHostToBigInt64(x)
+#define htole64(x) OSSwapHostToLittleInt64(x)
+#define be64toh(x) OSSwapBigToHostInt64(x)
+#define le64toh(x) OSSwapLittleToHostInt64(x)
+#endif
+
+#ifdef CONFIG_EVENTFD
+#include 
+#else
+#define eventfd_t uint64_t
+
+int eventfd_write(int fd, eventfd_t value)
+{
+return (write(fd, &value, sizeof(value)) == sizeof(value)) ? 0 : -1;
+}
+
+int eventfd_read(int fd, eventfd_t *value)
+{
+return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1;
+}
+#endif
+
+#ifdef MFD_ALLOW_SEALING
+#include 
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001  /* prevent further seals from being set */
+#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004  /* prevent file from growing */
+#define F_SEAL_WRITE0x0008  /* prevent writes */
+#endif
+#endif
+
+#ifndef MAP_NORESERVE
+#define MAP_NORESERVE 0
+#endif
+
 #include "include/atomic.h"
 
 #include "libvhost-user.h"
-- 
2.45.1

[PATCH v6 02/12] libvhost-user: fail vu_message_write() if sendmsg() is failing

2024-05-28 Thread Stefano Garzarella

In vu_message_write() we use sendmsg() to send the message header,
then a write() to send the payload.

If sendmsg() fails we should avoid sending the payload, since we
were unable to send the header.

Discovered before fixing the issue with the previous patch, where
sendmsg() failed on macOS due to wrong parameters, but the frontend
still sent the payload which the backend incorrectly interpreted
as a wrong header.
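
The intended control flow (header first, payload only if the header went
out) can be sketched like this. It is a simplified illustration, not the
libvhost-user code: the function name is hypothetical, no ancillary data
is sent, and short writes are not retried.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send a header with sendmsg(), then the payload
 * with write(). If the header cannot be sent, the payload is skipped,
 * otherwise the peer could parse the payload as if it were a header.
 */
static bool example_send_message(int fd, const void *hdr, size_t hdr_len,
                                 const void *payload, size_t payload_len)
{
    struct iovec iov = { .iov_base = (void *)hdr, .iov_len = hdr_len };
    struct msghdr msg;
    ssize_t rc;

    assert(hdr != NULL && hdr_len > 0);

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    do {
        rc = sendmsg(fd, &msg, 0);
    } while (rc < 0 && (errno == EINTR || errno == EAGAIN));

    if (rc <= 0) {
        /* Header not sent: stop here instead of sending the payload. */
        return false;
    }

    if (payload_len > 0) {
        do {
            rc = write(fd, payload, payload_len);
        } while (rc < 0 && (errno == EINTR || errno == EAGAIN));

        if (rc <= 0) {
            return false;
        }
    }

    return true;
}
```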

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index 22bea0c775..a11afd1960 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -639,6 +639,11 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 rc = sendmsg(conn_fd, &msg, 0);
 } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
 
+if (rc <= 0) {
+vu_panic(dev, "Error while writing: %s", strerror(errno));
+return false;
+}
+
 if (vmsg->size) {
 do {
 if (vmsg->data) {
-- 
2.45.1

[PATCH v6 06/12] contrib/vhost-user-*: use QEMU bswap helper functions

2024-05-28 Thread Stefano Garzarella

Let's replace the calls to le*toh() and htole*() with qemu/bswap.h
helpers to make the code more portable.
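
For readers unfamiliar with this kind of helper, a byte-order conversion
that does not depend on <endian.h> or sys/endian.h can be written with
plain shifts. The sketch below is not QEMU's implementation of
"qemu/bswap.h"; the example_ names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Interpret a 32-bit value stored in little-endian order (any host). */
static uint32_t example_le32_to_cpu(uint32_t le)
{
    uint8_t b[4];

    memcpy(b, &le, sizeof(b));
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Produce the little-endian representation of a host-order value. */
static uint32_t example_cpu_to_le32(uint32_t host)
{
    uint8_t b[4] = {
        (uint8_t)(host & 0xff),
        (uint8_t)((host >> 8) & 0xff),
        (uint8_t)((host >> 16) & 0xff),
        (uint8_t)((host >> 24) & 0xff),
    };
    uint32_t le;

    memcpy(&le, b, sizeof(le));
    return le;
}

/* Converting to little-endian and back must give the original value. */
static void example_roundtrip_check(uint32_t value)
{
    assert(example_le32_to_cpu(example_cpu_to_le32(value)) == value);
}
```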

Suggested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 contrib/vhost-user-blk/vhost-user-blk.c |  9 +
 contrib/vhost-user-input/main.c | 16 
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..9492146855 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "libvhost-user-glib.h"
 
@@ -194,8 +195,8 @@ vub_discard_write_zeroes(VubReq *req, struct iovec *iov, uint32_t iovcnt,
 #if defined(__linux__) && defined(BLKDISCARD) && defined(BLKZEROOUT)
 VubDev *vdev_blk = req->vdev_blk;
 desc = buf;
-uint64_t range[2] = { le64toh(desc->sector) << 9,
-  le32toh(desc->num_sectors) << 9 };
+uint64_t range[2] = { le64_to_cpu(desc->sector) << 9,
+  le32_to_cpu(desc->num_sectors) << 9 };
 if (type == VIRTIO_BLK_T_DISCARD) {
 if (ioctl(vdev_blk->blk_fd, BLKDISCARD, range) == 0) {
 g_free(buf);
@@ -267,13 +268,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
 req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base;
 in_num--;
 
-type = le32toh(req->out->type);
+type = le32_to_cpu(req->out->type);
 switch (type & ~VIRTIO_BLK_T_BARRIER) {
 case VIRTIO_BLK_T_IN:
 case VIRTIO_BLK_T_OUT: {
 ssize_t ret = 0;
 bool is_write = type & VIRTIO_BLK_T_OUT;
-req->sector_num = le64toh(req->out->sector);
+req->sector_num = le64_to_cpu(req->out->sector);
 if (is_write) {
 ret  = vub_writev(req, &elem->out_sg[1], out_num);
 } else {
diff --git a/contrib/vhost-user-input/main.c b/contrib/vhost-user-input/main.c
index 081230da54..f3362d41ac 100644
--- a/contrib/vhost-user-input/main.c
+++ b/contrib/vhost-user-input/main.c
@@ -51,8 +51,8 @@ static void vi_input_send(VuInput *vi, struct virtio_input_event *event)
 vi->queue[vi->qindex++].event = *event;
 
 /* ... until we see a report sync ... */
-if (event->type != htole16(EV_SYN) ||
-event->code != htole16(SYN_REPORT)) {
+if (event->type != cpu_to_le16(EV_SYN) ||
+event->code != cpu_to_le16(SYN_REPORT)) {
 return;
 }
 
@@ -103,9 +103,9 @@ vi_evdev_watch(VuDev *dev, int condition, void *data)
 
 g_debug("input %d %d %d", evdev.type, evdev.code, evdev.value);
 
-virtio.type  = htole16(evdev.type);
-virtio.code  = htole16(evdev.code);
-virtio.value = htole32(evdev.value);
+virtio.type  = cpu_to_le16(evdev.type);
+virtio.code  = cpu_to_le16(evdev.code);
+virtio.value = cpu_to_le32(evdev.value);
 vi_input_send(vi, &virtio);
 }
 }
@@ -124,9 +124,9 @@ static void vi_handle_status(VuInput *vi, virtio_input_event *event)
 
 evdev.input_event_sec = tval.tv_sec;
 evdev.input_event_usec = tval.tv_usec;
-evdev.type = le16toh(event->type);
-evdev.code = le16toh(event->code);
-evdev.value = le32toh(event->value);
+evdev.type = le16_to_cpu(event->type);
+evdev.code = le16_to_cpu(event->code);
+evdev.value = le32_to_cpu(event->value);
 
 rc = write(vi->evdevfd, &evdev, sizeof(evdev));
 if (rc == -1) {
-- 
2.45.1

[PATCH v6 04/12] vhost-user-server: do not set memory fd non-blocking

2024-05-28 Thread Stefano Garzarella

In vhost-user-server we set all fd received from the other peer
in non-blocking mode. For some of them (e.g. memfd, shm_open, etc.)
it's not really needed, because we don't use these fd with blocking
operations, but only to map memory.

In addition, in some systems this operation can fail (e.g. in macOS
setting an fd returned by shm_open() non-blocking fails with errno
= ENOTTY).

So, let's avoid setting fd non-blocking for those messages that we
know carry memory fd (e.g. VHOST_USER_ADD_MEM_REG,
VHOST_USER_SET_MEM_TABLE).
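
A stand-alone sketch of this check, using fcntl() directly instead of
QEMU's qemu_socket_set_nonblock(): the enum below is a reduced,
hypothetical stand-in for the real request types, and its numeric values
are my reading of the vhost-user specification rather than something
taken from this patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Reduced, illustrative subset of vhost-user request types. */
enum example_request {
    EXAMPLE_SET_MEM_TABLE = 5,
    EXAMPLE_SET_VRING_KICK = 12,
    EXAMPLE_ADD_MEM_REG = 37,
};

/*
 * Hypothetical helper: set O_NONBLOCK on every fd of a message, except
 * for messages whose fds are only used to map memory.
 * Returns 0 on success, -1 if fcntl() fails.
 */
static int example_unblock_fds(enum example_request request,
                               const int *fds, size_t fd_num)
{
    size_t i;

    assert(fds != NULL || fd_num == 0);

    /* Memory fds are mmap()ed, never read or written: leave them alone. */
    if (request == EXAMPLE_ADD_MEM_REG || request == EXAMPLE_SET_MEM_TABLE) {
        return 0;
    }

    for (i = 0; i < fd_num; i++) {
        int flags = fcntl(fds[i], F_GETFL);

        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            return -1;
        }
    }

    return 0;
}
```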

Reviewed-by: Daniel P. Berrangé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
v3:
- avoiding setting fd non-blocking for messages where we have memory fd
  (Eric)
---
 util/vhost-user-server.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 3bfb1ad3ec..b19229074a 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -65,6 +65,18 @@ static void vmsg_close_fds(VhostUserMsg *vmsg)
 static void vmsg_unblock_fds(VhostUserMsg *vmsg)
 {
 int i;
+
+/*
+ * These messages carry fd used to map memory, not to send/receive messages,
+ * so this operation is useless. In addition, in some systems this
+ * operation can fail (e.g. in macOS setting an fd returned by shm_open()
+ * non-blocking fails with errno = ENOTTY)
+ */
+if (vmsg->request == VHOST_USER_ADD_MEM_REG ||
+vmsg->request == VHOST_USER_SET_MEM_TABLE) {
+return;
+}
+
 for (i = 0; i < vmsg->fd_num; i++) {
 qemu_socket_set_nonblock(vmsg->fds[i]);
 }
-- 
2.45.1

[PATCH v6 00/12] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-05-28 Thread Stefano Garzarella

v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
v3: https://patchew.org/QEMU/20240404122330.92710-1-sgarz...@redhat.com/
v4: https://patchew.org/QEMU/20240508074457.12367-1-sgarz...@redhat.com/
v5: https://patchew.org/QEMU/20240523145522.313012-1-sgarz...@redhat.com/
v6:
- rebased on 60b54b67c63d8f076152e0f7dccf39854dfc6a77
- added David R-b tags [thanks!]
- patch 9 (was split in 9 & 10 in v5): reverted v5 changes since we can't
  move O_DSYNC and O_DIRECT in osdep [Daniel, failing tests on Windows]
- patch 11: removed `share=on` since it's the default [David]
- the series is now fully acked/reviewed

The vhost-user protocol is not really Linux-specific, so let's try to support
QEMU's frontends and backends (including libvhost-user) in any POSIX system
with this series. The main use case is to be able to use virtio devices that
we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
in non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS or
FreeBSD that could go even independently of this series.

Patches 6, 7, 8, 9 enable building of frontends and backends (including
libvhost-user) with associated code changes to succeed in compilation.

Patch 10 adds `memory-backend-shm` that uses the POSIX shm_open() API to
create shared memory which is identified by an fd that can be shared with
vhost-user backends. This is useful on those systems (like macOS) where
we don't have memfd_create() or special filesystems like "/dev/shm".

Patches 11 and 12 use `memory-backend-shm` in some vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered those
problems after testing patches 6 - 9, so I have included them in this series
for now. Please let me know if you prefer that I send them separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 40 (x86_64)
in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

  vhost-user-blk -s /tmp/vhost.socket \
    -b Fedora-Cloud-Base-39-1.5.x86_64.raw

  qemu-storage-daemon \
    --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
    --blockdev qcow2,file=file,node-name=qcow2 \
    --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

  qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
    -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
    -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
    -device ramfb -device usb-ehci -device usb-kbd \
    -object memory-backend-shm,id=mem,size=512M \
    -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

  qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
    -object memory-backend-shm,id=mem,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

  qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
    -object memory-backend-shm,id=mem,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at 
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Thanks,
Stefano

Stefano Garzarella (12):
  libvhost-user: set msg.msg_control to NULL when it is empty
  libvhost-user: fail vu_message_write() if sendmsg() is failing
  libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
  vhost-user-server: do not set memory fd non-blocking
  contrib/vhost-user-blk: fix bind() using the right size of the address
  contrib/vhost-user-*: use QEMU bswap helper functions
  vhost-user: enable frontends on any POSIX system
  libvhost-user: enable it on any POSIX system
  contrib/vhost-user-blk: enable it on any POSIX system
  hostmem: add a new memory backend based on POSIX shm_open()
  tests/qtest/vhost-user-blk-test: use memory-backend-shm
  tests/qtest/vhost-user-test: add a test case for memory-backend-shm

 docs/system/devices/vhost-user.rst|   5 +-
 meson.build   |   5 +-
 qapi/qom.json |  19 
 subprojects/libvhost-user/libvhost-user.h |   2 +-
 backends/hostmem-shm.c| 123 ++
 contrib/vhost-user-blk/vhost-user-blk.c   |  27 +++--
 contrib/vhost-user-input/main.c   |  16 +--
 hw/net/vhost_net.c|   5 +
 subprojects/libvhost-user/libvhost-user.c |  77
