[PATCH trivial] qemu-nbd: mention --tls-hostname option in qemu-nbd --help

2024-02-06 Thread Michael Tokarev
This option was not documented.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1240
Signed-off-by: Michael Tokarev 
---
 qemu-nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index bac0b5e3ec..d7b3ccab21 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -114,6 +114,7 @@ static void usage(const char *name)
 "  --tls-creds=IDuse id of an earlier --object to provide TLS\n"
 "  --tls-authz=IDuse id of an earlier --object to provide\n"
 "authorization\n"
+"  --tls-hostname=HOSTNAME   override hostname used to check x509 
certificate\n"
 "  -T, --trace [[enable=]][,events=][,file=]\n"
 "specify tracing options\n"
 "  --forkfork off the server process and exit the parent\n"
-- 
2.39.2




Re: [PATCH 5/5] monitor: use aio_co_reschedule_self()

2024-02-06 Thread Markus Armbruster
Stefan Hajnoczi  writes:

> The aio_co_reschedule_self() API is designed to avoid the race
> condition between scheduling the coroutine in another AioContext and
> yielding.
>
> The QMP dispatch code uses the open-coded version that appears
> susceptible to the race condition at first glance:
>
>   aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
>   qemu_coroutine_yield();
>
> The code is actually safe because the iohandler and qemu_aio_context
> AioContext run under the Big QEMU Lock. Nevertheless, set a good example
> and use aio_co_reschedule_self() so it's obvious that there is no race.
>
> Suggested-by: Hanna Reitz 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Markus Armbruster 

Feel free to merge this together with the remainder of the series.




Re: [PATCH 00/15] qapi: Require member documentation (with loophole)

2024-02-06 Thread Peter Xu
On Mon, Feb 05, 2024 at 08:46:54AM +0100, Markus Armbruster wrote:
> qapi/migration.json
>   MigrateSetParameters 1

It's tls-authz.  I'll send a patch for this one.

Thanks,

-- 
Peter Xu




Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-06 Thread Jason Wang
On Tue, Feb 6, 2024 at 4:31 PM Stefano Garzarella  wrote:
>
> On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote:
> >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella  
> >wrote:
> >>
> >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
> >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> >> >status flag is set; with the current API of the kernel module, it is
> >> >impossible to enable the opposite order in our block export code because
> >> >userspace is not notified when a virtqueue is enabled.
> >
> >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK?
>
> It's not specific to virtio-blk, but to the generic vdpa device we have
> in QEMU (i.e. vhost-vdpa-device). Yep, after commit
> 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after
> DRIVER_OK.

Right.

>
> >Spec is not clear about this and that's why we introduce
> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK.
>
> Ah, I didn't know about this new feature. So after commit
> 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not
> complying with the specification, right?

Kind of, but as stated, it's just because the spec is unclear about the
behaviour. There's a chance that the spec will explicitly support it in
the future.

>
> >
> >>
> >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,
> >
> >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is
> >negotiated.
>
> At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not
> negotiated, should we return an error in vhost-vdpa kernel module if
> VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set?

I'm not sure if this can break some setups or not. It might be better
to leave it as is?

Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we don't know whether the
parent supports vq_ready after driver_ok.
With VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we know the parent supports
vq_ready after driver_ok.
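
(For readers following along, a minimal sketch of how a caller could probe
that backend feature bit, assuming the uapi constants from <linux/vhost.h>
and <linux/vhost_types.h>; the vdpa_fd name and the missing error handling
are illustrative only.)

  uint64_t features = 0;

  /* Ask the vhost-vdpa device which backend features the parent offers */
  if (ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &features) == 0 &&
      (features & (1ULL << VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK))) {
      /* the parent accepts enabling vrings after DRIVER_OK */
  }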

>
> >If this is truth, it seems a little more complicated, for
> >example the get_backend_features needs to be forward to the userspace?
>
> I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES
> for this? Or do you mean userspace on the VDUSE side?

Yes, since in this case the parent is in the userspace, there's no way
for VDUSE to know if user space supports vq_ready after driver_ok or
not.

As you may have noticed, we don't have a message for vq_ready which
implies that vq_ready after driver_ok can't be supported.

>
> >This seems suboptimal to implement this in the spec first and then we
> >can leverage the features. Or we can have another parameter for the
> >ioctl that creates the vduse device.
>
> I got a little lost, though in vhost-user, the device can always expect
> a vring_enable/disable, so I thought it was not complicated in VDUSE.

Yes, the problem is assuming we have a message for vq_ready, there
could be  a "legacy" userspace that doesn't support that.  So in that
case, VDUSE needs to know if the userspace parent can support that or
not.

>
> >
> >> I'll start another thread about that, but in the meantime I agree that
> >> we should fix QEMU since we need to work properly with old kernels as
> >> well.
> >>
> >> >
> >> >This requirement also matches the normal initialisation order as done by
> >> >the generic vhost code in QEMU. However, commit 6c482547 accidentally
> >> >changed the order for vdpa-dev and broke access to VDUSE devices with
> >> >this.
> >> >
> >> >This changes vdpa-dev to use the normal order again and use the standard
> >> >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> >> >used with vdpa-dev again after this fix.
> >>
> >> I like this approach and the patch LGTM, but I'm a bit worried about
> >> this function in hw/net/vhost_net.c:
> >>
> >>  int vhost_set_vring_enable(NetClientState *nc, int enable)
> >>  {
> >>  VHostNetState *net = get_vhost_net(nc);
> >>  const VhostOps *vhost_ops = net->dev.vhost_ops;
> >>
> >>  nc->vring_enable = enable;
> >>
> >>  if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
> >>  return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
> >>  }
> >>
> >>  return 0;
> >>  }
> >>
> >> @Eugenio, @Jason, should we change some things there if vhost-vdpa
> >> implements the vhost_set_vring_enable callback?
> >
> >Eugenio may know more, I remember we need to enable cvq first for
> >shadow virtqueue to restore some states.
> >
> >>
> >> Do you remember why we didn't implement it from the beginning?
> >
> >It seems the vrings parameter is introduced after vhost-vdpa is
> >implemented.
>
> Sorry, I mean why we didn't implement the vhost_set_vring_enable
> callback for vhost-vdpa from the beginning.

Adding Cindy who writes those codes for more thoughts.

Thanks

>
> Thanks,
> Stefano
>




Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread

2024-02-06 Thread Stefan Hajnoczi
On Fri, Feb 02, 2024 at 01:32:39PM +0100, Hanna Czenczek wrote:
> On 01.02.24 16:25, Hanna Czenczek wrote:
> > On 01.02.24 15:28, Stefan Hajnoczi wrote:
> 
> [...]
> 
> > > Did you find a scenario where the virtio-scsi AioContext is different
> > > from the scsi-hd BB's Aiocontext?
> > 
> > Technically, that’s the reason for this thread, specifically that
> > virtio_scsi_hotunplug() switches the BB back to the main context while
> > scsi_device_for_each_req_async_bh() is running.  Yes, we can fix that
> > specific case via the in-flight counter, but I’m wondering whether
> > there’s really any merit in requiring the BB to always be in
> > virtio-scsi’s context, or whether it would make more sense to schedule
> > everything in virtio-scsi’s context.  Now that BBs/BDSs can receive
> > requests from any context, that is.
> 
> Now that I know that wouldn’t be easy, let me turn this around: As far as I
> understand, scsi_device_for_each_req_async_bh() should still run in
> virtio-scsi’s context, but that’s hard, so we take the BB’s context, which
> we therefore require to be the same one. Further, (again AFAIU,)
> virtio-scsi’s context cannot change (only set in
> virtio_scsi_dataplane_setup(), which is run in
> virtio_scsi_device_realize()).  Therefore, why does the
> scsi_device_for_each_req_async() code accommodate for BB context changes?

1. scsi_disk_reset() -> scsi_device_purge_requests() is called without
   in-flight requests.
2. The BH is scheduled by scsi_device_purge_requests() ->
   scsi_device_for_each_req_async().
3. blk_drain() is a nop when there are no in-flight requests and does not
   flush BHs.
4. The AioContext changes when the virtio-scsi device resets.
5. The BH executes.

Kevin and I touched on the idea of flushing BHs in bdrv_drain() even
when there are no requests in flight. This hasn't been implemented as of
today, but would also reduce the chance of scenarios like the one I
mentioned.

I think it's safer to handle the case where the BH runs after an
AioContext change until either everything is thread-safe or the
AioContext never changes.
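
(A minimal sketch of that defensive pattern, assuming a hypothetical
ExampleState holding a BlockBackend *blk; this is not the actual
hw/scsi/scsi-bus.c code.)

  static void example_bh(void *opaque)
  {
      ExampleState *s = opaque;
      AioContext *ctx = blk_get_aio_context(s->blk);

      if (ctx != qemu_get_current_aio_context()) {
          /* The BB moved between scheduling and execution: run there instead */
          aio_bh_schedule_oneshot(ctx, example_bh, s);
          return;
      }

      /* ... now it is safe to walk requests in the BB's current context ... */
  }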

Stefan




Re: [PATCH v2 0/5] virtio-blk: iothread-vq-mapping cleanups

2024-02-06 Thread Michael S. Tsirkin
On Tue, Feb 06, 2024 at 02:06:05PM -0500, Stefan Hajnoczi wrote:
> v2:
> - Add comment in Patch 3 explaining why bounds check assertion [Manos]
> - Remove redundant nested if in Patch 1 [Hanna]
> 
> Hanna reviewed the iothread-vq-mapping patches after they were applied to
> qemu.git. This series consists of code cleanups that Hanna identified.
> 
> There are no functional changes or bug fixes that need to be backported to the
> stable tree here, but it may make sense to backport them in the future to
> avoid conflicts.

Reviewed-by: Michael S. Tsirkin 




> Stefan Hajnoczi (5):
>   virtio-blk: enforce iothread-vq-mapping validation
>   virtio-blk: clarify that there is at least 1 virtqueue
>   virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb()
>   virtio-blk: declare VirtIOBlock::rq with a type
>   monitor: use aio_co_reschedule_self()
> 
>  include/hw/virtio/virtio-blk.h |   2 +-
>  hw/block/virtio-blk.c  | 194 ++---
>  qapi/qmp-dispatch.c|   7 +-
>  3 files changed, 112 insertions(+), 91 deletions(-)
> 
> -- 
> 2.43.0




[PATCH v2 3/5] virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb()

2024-02-06 Thread Stefan Hajnoczi
Hanna Czenczek  noted that the array index in
virtio_blk_dma_restart_cb() is not bounds-checked:

  g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
  ...
  while (rq) {
  VirtIOBlockReq *next = rq->next;
  uint16_t idx = virtio_get_queue_index(rq->vq);

  rq->next = vq_rq[idx];
 ^^

The code is correct because both rq->vq and vq_rq[] depend on
num_queues, but this is indirect and not 100% obvious. Add an assertion.

Suggested-by: Hanna Czenczek 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e430ba583c..31212506ca 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1209,6 +1209,8 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool running,
 VirtIOBlockReq *next = rq->next;
 uint16_t idx = virtio_get_queue_index(rq->vq);
 
+/* Only num_queues vqs were created so vq_rq[idx] is within bounds */
+assert(idx < num_queues);
 rq->next = vq_rq[idx];
 vq_rq[idx] = rq;
 rq = next;
-- 
2.43.0




[PATCH v2 1/5] virtio-blk: enforce iothread-vq-mapping validation

2024-02-06 Thread Stefan Hajnoczi
Hanna Czenczek  noticed that the safety of
`vq_aio_context[vq->value] = ctx;` with user-defined vq->value inputs is
not obvious.

The code is structured in validate() + apply() steps so input validation
is there, but it happens way earlier and there is nothing that
guarantees apply() can only be called with validated inputs.

This patch moves the validate() call inside the apply() function so
validation is guaranteed. I also added the bounds checking assertion
that Hanna suggested.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
---
 hw/block/virtio-blk.c | 191 +++---
 1 file changed, 106 insertions(+), 85 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 227d83569f..6e3e3a23ee 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1485,6 +1485,72 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
 return 0;
 }
 
+static void virtio_resize_cb(void *opaque)
+{
+VirtIODevice *vdev = opaque;
+
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+virtio_notify_config(vdev);
+}
+
+static void virtio_blk_resize(void *opaque)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
+
+/*
+ * virtio_notify_config() needs to acquire the BQL,
+ * so it can't be called from an iothread. Instead, schedule
+ * it to be run in the main context BH.
+ */
+aio_bh_schedule_oneshot(qemu_get_aio_context(), virtio_resize_cb, vdev);
+}
+
+static void virtio_blk_ioeventfd_detach(VirtIOBlock *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+for (uint16_t i = 0; i < s->conf.num_queues; i++) {
+VirtQueue *vq = virtio_get_queue(vdev, i);
+virtio_queue_aio_detach_host_notifier(vq, s->vq_aio_context[i]);
+}
+}
+
+static void virtio_blk_ioeventfd_attach(VirtIOBlock *s)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+for (uint16_t i = 0; i < s->conf.num_queues; i++) {
+VirtQueue *vq = virtio_get_queue(vdev, i);
+virtio_queue_aio_attach_host_notifier(vq, s->vq_aio_context[i]);
+}
+}
+
+/* Suspend virtqueue ioeventfd processing during drain */
+static void virtio_blk_drained_begin(void *opaque)
+{
+VirtIOBlock *s = opaque;
+
+if (s->ioeventfd_started) {
+virtio_blk_ioeventfd_detach(s);
+}
+}
+
+/* Resume virtqueue ioeventfd processing after drain */
+static void virtio_blk_drained_end(void *opaque)
+{
+VirtIOBlock *s = opaque;
+
+if (s->ioeventfd_started) {
+virtio_blk_ioeventfd_attach(s);
+}
+}
+
+static const BlockDevOps virtio_block_ops = {
+.resize_cb = virtio_blk_resize,
+.drained_begin = virtio_blk_drained_begin,
+.drained_end   = virtio_blk_drained_end,
+};
+
 static bool
 validate_iothread_vq_mapping_list(IOThreadVirtQueueMappingList *list,
 uint16_t num_queues, Error **errp)
@@ -1547,81 +1613,33 @@ validate_iothread_vq_mapping_list(IOThreadVirtQueueMappingList *list,
 return true;
 }
 
-static void virtio_resize_cb(void *opaque)
-{
-VirtIODevice *vdev = opaque;
-
-assert(qemu_get_current_aio_context() == qemu_get_aio_context());
-virtio_notify_config(vdev);
-}
-
-static void virtio_blk_resize(void *opaque)
-{
-VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
-
-/*
- * virtio_notify_config() needs to acquire the BQL,
- * so it can't be called from an iothread. Instead, schedule
- * it to be run in the main context BH.
- */
-aio_bh_schedule_oneshot(qemu_get_aio_context(), virtio_resize_cb, vdev);
-}
-
-static void virtio_blk_ioeventfd_detach(VirtIOBlock *s)
-{
-VirtIODevice *vdev = VIRTIO_DEVICE(s);
-
-for (uint16_t i = 0; i < s->conf.num_queues; i++) {
-VirtQueue *vq = virtio_get_queue(vdev, i);
-virtio_queue_aio_detach_host_notifier(vq, s->vq_aio_context[i]);
-}
-}
-
-static void virtio_blk_ioeventfd_attach(VirtIOBlock *s)
-{
-VirtIODevice *vdev = VIRTIO_DEVICE(s);
-
-for (uint16_t i = 0; i < s->conf.num_queues; i++) {
-VirtQueue *vq = virtio_get_queue(vdev, i);
-virtio_queue_aio_attach_host_notifier(vq, s->vq_aio_context[i]);
-}
-}
-
-/* Suspend virtqueue ioeventfd processing during drain */
-static void virtio_blk_drained_begin(void *opaque)
-{
-VirtIOBlock *s = opaque;
-
-if (s->ioeventfd_started) {
-virtio_blk_ioeventfd_detach(s);
-}
-}
-
-/* Resume virtqueue ioeventfd processing after drain */
-static void virtio_blk_drained_end(void *opaque)
-{
-VirtIOBlock *s = opaque;
-
-if (s->ioeventfd_started) {
-virtio_blk_ioeventfd_attach(s);
-}
-}
-
-static const BlockDevOps virtio_block_ops = {
-.resize_cb = virtio_blk_resize,
-.drained_begin = virtio_blk_drained_begin,
-.drained_end   = virtio_blk_drained_end,
-};
-
-/* Generate vq:AioContext mappings from a validated iothread-vq-mapping list */
-static void
-apply_vq_mapping(IOThreadVirtQueueMappingList 

[PATCH v2 0/5] virtio-blk: iothread-vq-mapping cleanups

2024-02-06 Thread Stefan Hajnoczi
v2:
- Add comment in Patch 3 explaining why bounds check assertion [Manos]
- Remove redundant nested if in Patch 1 [Hanna]

Hanna reviewed the iothread-vq-mapping patches after they were applied to
qemu.git. This series consists of code cleanups that Hanna identified.

There are no functional changes or bug fixes that need to be backported to the
stable tree here, but it may make sense to backport them in the future to avoid
conflicts.

Stefan Hajnoczi (5):
  virtio-blk: enforce iothread-vq-mapping validation
  virtio-blk: clarify that there is at least 1 virtqueue
  virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb()
  virtio-blk: declare VirtIOBlock::rq with a type
  monitor: use aio_co_reschedule_self()

 include/hw/virtio/virtio-blk.h |   2 +-
 hw/block/virtio-blk.c  | 194 ++---
 qapi/qmp-dispatch.c|   7 +-
 3 files changed, 112 insertions(+), 91 deletions(-)

-- 
2.43.0




[PATCH v2 2/5] virtio-blk: clarify that there is at least 1 virtqueue

2024-02-06 Thread Stefan Hajnoczi
It is not possible to instantiate a virtio-blk device with 0 virtqueues.
The following check is located in ->realize():

  if (!conf->num_queues) {
  error_setg(errp, "num-queues property must be larger than 0");
  return;
  }

Later on we access s->vq_aio_context[0] under the assumption that there
is at least one virtqueue. Hanna Czenczek  noted that
it would help to show that the array index is already valid.

Add an assertion to document that s->vq_aio_context[0] is always
safe...and catch future code changes that break this assumption.

Suggested-by: Hanna Czenczek 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6e3e3a23ee..e430ba583c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1824,6 +1824,7 @@ static int virtio_blk_start_ioeventfd(VirtIODevice *vdev)
  * Try to change the AioContext so that block jobs and other operations can
  * co-locate their activity in the same AioContext. If it fails, nevermind.
  */
+assert(nvqs > 0); /* enforced during ->realize() */
 r = blk_set_aio_context(s->conf.conf.blk, s->vq_aio_context[0],
 &local_err);
 if (r < 0) {
-- 
2.43.0




[PATCH v2 4/5] virtio-blk: declare VirtIOBlock::rq with a type

2024-02-06 Thread Stefan Hajnoczi
The VirtIOBlock::rq field has had the type void * since its introduction
in commit 869a5c6df19a ("Stop VM on error in virtio-blk. (Gleb
Natapov)").

Perhaps this was done to avoid the forward declaration of
VirtIOBlockReq.

Hanna Czenczek  pointed out the missing type. Specify
the actual type because there is no need to use void * here.

Suggested-by: Hanna Czenczek 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-blk.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 833a9a344f..5c14110c4b 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -55,7 +55,7 @@ struct VirtIOBlock {
 VirtIODevice parent_obj;
 BlockBackend *blk;
 QemuMutex rq_lock;
-void *rq; /* protected by rq_lock */
+struct VirtIOBlockReq *rq; /* protected by rq_lock */
 VirtIOBlkConf conf;
 unsigned short sector_mask;
 bool original_wce;
-- 
2.43.0




[PATCH v2 5/5] monitor: use aio_co_reschedule_self()

2024-02-06 Thread Stefan Hajnoczi
The aio_co_reschedule_self() API is designed to avoid the race
condition between scheduling the coroutine in another AioContext and
yielding.

The QMP dispatch code uses the open-coded version that appears
susceptible to the race condition at first glance:

  aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
  qemu_coroutine_yield();

The code is actually safe because the iohandler and qemu_aio_context
AioContext run under the Big QEMU Lock. Nevertheless, set a good example
and use aio_co_reschedule_self() so it's obvious that there is no race.

Suggested-by: Hanna Reitz 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
 qapi/qmp-dispatch.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 176b549473..f3488afeef 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -212,8 +212,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList 
*cmds, QObject *requ
  * executing the command handler so that it can make progress if it
  * involves an AIO_WAIT_WHILE().
  */
-aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
-qemu_coroutine_yield();
+aio_co_reschedule_self(qemu_get_aio_context());
 }
 
 monitor_set_cur(qemu_coroutine_self(), cur_mon);
@@ -227,9 +226,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList 
*cmds, QObject *requ
  * Move back to iohandler_ctx so that nested event loops for
  * qemu_aio_context don't start new monitor commands.
  */
-aio_co_schedule(iohandler_get_aio_context(),
-qemu_coroutine_self());
-qemu_coroutine_yield();
+aio_co_reschedule_self(iohandler_get_aio_context());
 }
 } else {
/*
-- 
2.43.0




Re: [PATCH] virtio-blk: do not use C99 mixed declarations

2024-02-06 Thread Stefan Hajnoczi
On Tue, Feb 06, 2024 at 10:57:37AM -0500, Michael S. Tsirkin wrote:
> On Tue, Feb 06, 2024 at 09:04:09AM -0500, Stefan Hajnoczi wrote:
> > QEMU's coding style generally forbids C99 mixed declarations.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Acked-by: Michael S. Tsirkin 
> 
> or maybe it's time we moved on?

I sent a patch to allow C99 mixed declarations but a number of people
spoke out against changing the coding style so it didn't seem like a
good thing to pursue:
https://lore.kernel.org/qemu-devel/20240205171819.474283-1-stefa...@redhat.com/

Instead, I just want to make virtio-blk consistent with the coding
style...especially because I introduced one of the recent violations
:-).

Stefan




Re: [PATCH v2 0/3] virtio: Re-enable notifications after drain

2024-02-06 Thread Stefan Hajnoczi
On Fri, Feb 02, 2024 at 04:31:55PM +0100, Hanna Czenczek wrote:
> v1:
> 
> https://lists.nongnu.org/archive/html/qemu-block/2024-01/msg00336.html
> 
> 
> Hi,
> 
> This is basically the same series as v1: When using
> aio_set_event_notifier_poll(), the io_poll_end() callback is only
> invoked when polling ends, not when the notifier is being removed while
> in a polling section.  This can leave the virtqueue notifier disabled
> during drained sections, which however is not a bad thing.  We just need
> to ensure they are re-enabled after the drain, and kick the virtqueue
> once to pick up all the requests that came in during the drained
> section.
> 
> Patch 1 is a technically unrelated fix, but addresses a problem that
> became visible with patch 2 applied.
> 
> Patch 3 is a small (optional) clean-up patch.
> 
> 
> v2:
> - Changed the title of this series and patch 2 (was: "Keep notifications
>   disabled durin drain"): Keeping the notifier disabled was something
>   the initial RFC did, this version (v1 too) just ensures the notifier
>   is enabled after the drain, regardless of its state before.
> 
> - Use event_notifier_set() instead of virtio_queue_notify() in patch 2
> 
> - Added patch 3
> 
> 
> Hanna Czenczek (3):
>   virtio-scsi: Attach event vq notifier with no_poll
>   virtio: Re-enable notifications after drain
>   virtio-blk: Use ioeventfd_attach in start_ioeventfd
> 
>  include/block/aio.h   |  7 ++-
>  hw/block/virtio-blk.c | 21 ++---
>  hw/scsi/virtio-scsi.c |  7 ++-
>  hw/virtio/virtio.c| 42 ++
>  4 files changed, 64 insertions(+), 13 deletions(-)
> 
> -- 
> 2.43.0
> 

Reviewed-by: Stefan Hajnoczi 
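
(A rough sketch of the idea described in the cover letter, assuming a
VirtQueue and its host EventNotifier are at hand; this is not the actual
hw/virtio/virtio.c hunk.)

  static void example_drained_end(VirtQueue *vq, EventNotifier *host_notifier)
  {
      /* Re-enable guest->host notifications that the drain may have left off */
      virtio_queue_set_notification(vq, 1);

      /* Kick once so requests queued while notifications were off are seen */
      event_notifier_set(host_notifier);
  }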




Re: [PATCH 2/2] scsi: Await request purging

2024-02-06 Thread Stefan Hajnoczi
On Fri, Feb 02, 2024 at 03:47:55PM +0100, Hanna Czenczek wrote:
> scsi_device_for_each_req_async() currently does not provide any way to
> be awaited.  One of its callers is scsi_device_purge_requests(), which
> therefore currently does not guarantee that all requests are fully
> settled when it returns.
> 
> We want all requests to be settled, because scsi_device_purge_requests()
> is called through the unrealize path, including the one invoked by
> virtio_scsi_hotunplug() through qdev_simple_device_unplug_cb(), which
> most likely assumes that all SCSI requests are done then.
> 
> In fact, scsi_device_purge_requests() already contains a blk_drain(),
> but this will not fully await scsi_device_for_each_req_async(), only the
> I/O requests it potentially cancels (not the non-I/O requests).
> However, we can have scsi_device_for_each_req_async() increment the BB
> in-flight counter, and have scsi_device_for_each_req_async_bh()
> decrement it when it is done.  This way, the blk_drain() will fully
> await all SCSI requests to be purged.
> 
> This also removes the need for scsi_device_for_each_req_async_bh() to
> double-check the current context and potentially re-schedule itself,
> should it now differ from the BB's context: Changing a BB's AioContext
> with a root node is done through bdrv_try_change_aio_context(), which
> creates a drained section.  With this patch, we keep the BB in-flight
> counter elevated throughout, so we know the BB's context cannot change.
> 
> Signed-off-by: Hanna Czenczek 
> ---
>  hw/scsi/scsi-bus.c | 30 +-
>  1 file changed, 21 insertions(+), 9 deletions(-)

Reviewed-by: Stefan Hajnoczi 
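
(A minimal sketch of the in-flight-counter pattern the commit message
describes, using a hypothetical ExampleState with a BlockBackend *blk; the
real hunks live in hw/scsi/scsi-bus.c.)

  static void purge_requests_bh(void *opaque)
  {
      ExampleState *s = opaque;

      /* ... settle the remaining (non-I/O) requests here ... */

      blk_dec_in_flight(s->blk);    /* pairs with the increment below */
  }

  static void schedule_purge(ExampleState *s)
  {
      blk_inc_in_flight(s->blk);    /* blk_drain() now also awaits this BH */
      aio_bh_schedule_oneshot(blk_get_aio_context(s->blk),
                              purge_requests_bh, s);
  }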




Re: [PATCH 1/2] block-backend: Allow concurrent context changes

2024-02-06 Thread Stefan Hajnoczi
On Fri, Feb 02, 2024 at 03:47:54PM +0100, Hanna Czenczek wrote:
> Since AioContext locks have been removed, a BlockBackend's AioContext
> may really change at any time (only exception is that it is often
> confined to a drained section, as noted in this patch).  Therefore,
> blk_get_aio_context() cannot rely on its root node's context always
> matching that of the BlockBackend.
> 
> In practice, whether they match does not matter anymore anyway: Requests
> can be sent to BDSs from any context, so anyone who requests the BB's
> context should have no reason to require the root node to have the same
> context.  Therefore, we can and should remove the assertion to that
> effect.
> 
> In addition, because the context can be set and queried from different
> threads concurrently, it has to be accessed with atomic operations.
> 
> Buglink: https://issues.redhat.com/browse/RHEL-19381
> Suggested-by: Kevin Wolf 
> Signed-off-by: Hanna Czenczek 
> ---
>  block/block-backend.c | 22 +++---
>  1 file changed, 11 insertions(+), 11 deletions(-)

Reviewed-by: Stefan Hajnoczi 
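
(A rough sketch of the atomic-access pattern the commit message refers to,
assuming the BlockBackend's context field is called ctx; a sketch under that
assumption, not the exact block/block-backend.c change.)

  static void example_set_context(BlockBackend *blk, AioContext *new_ctx)
  {
      /* Writer: publish the new context with an atomic store */
      qatomic_set(&blk->ctx, new_ctx);
  }

  static AioContext *example_get_context(BlockBackend *blk)
  {
      /* Reader: any thread may query concurrently, without the BQL */
      return blk ? qatomic_read(&blk->ctx) : qemu_get_aio_context();
  }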




Re: [PATCH 0/2] block: Allow concurrent BB context changes

2024-02-06 Thread Stefan Hajnoczi
On Fri, Feb 02, 2024 at 03:47:53PM +0100, Hanna Czenczek wrote:
> Hi,
> 
> Without the AioContext lock, a BB's context may kind of change at any
> time (unless it has a root node, and I/O requests are pending).  That
> also means that its own context (BlockBackend.ctx) and that of its root
> node can differ sometimes (while the context is being changed).
> 
> blk_get_aio_context() doesn't know this yet and asserts that both are
> always equal (if there is a root node).  Because it's no longer true,
> and because callers don't seem to really care about the root node's
> context, we can and should remove the assertion and just return the BB's
> context.
> 
> Beyond that, the question is whether the callers of
> blk_get_aio_context() are OK with the context potentially changing
> concurrently.  Honestly, it isn't entirely clear to me; most look OK,
> except for the virtio-scsi code, which operates under the general
> assumption that the BB's context is always equal to that of the
> virtio-scsi device.  I doubt that this assumption always holds (it is
> definitely not obvious to me that it would), but then again, this series
> will not make matters worse in that regard, and that is what counts for
> me now.
> 
> One clear point of contention is scsi_device_for_each_req_async(), which
> is addressed by patch 2.  Right now, it schedules a BH in the BB
> context, then the BH double-checks whether the context still fits, and
> if not, re-schedules itself.  Because virtio-scsi's context is fixed,
> this seems to indicate to me that it wants to be able to deal with a
> case where BB and virtio-scsi context differ, which seems to break that
> aforementioned general virtio-scsi assumption.

I don't agree with the last sentence: virtio-scsi's context isn't fixed.

The AioContext changes when dataplane is started/stopped. virtio-scsi
switches AioContext between the IOThread's AioContext and the main
loop's qemu_aio_context.

However, virtio-scsi virtqueue processing only happens in the IOThread's
AioContext. Maybe this is what you meant when you said the AioContext is
fixed?

The BH function is aware that the current AioContext might not be the
same as the AioContext at the time the BH was scheduled. That doesn't
break assumptions in the code.

(It may be possible to rewrite virtio-blk, virtio-scsi, and core
VirtIODevice ioeventfd code to use the simpler model where the
AioContext really is fixed because things have changed significantly
over the years, but I looked a few weeks ago and it's difficult work.)

I'm just pointing out that I think this description is incomplete. I
*do* agree with what this patch series is doing :).

> Unfortunately, I definitely have to touch that code, because accepting
> concurrent changes of AioContexts breaks the double-check (just because
> the BB has the right context in that place does not mean it stays in
> that context); instead, we must prevent any concurrent change until the
> BH is done.  Because changing contexts generally involves a drained
> section, we can prevent it by keeping the BB in-flight counter elevated.
> 
> Question is, how to reason for that.  I’d really rather not include the
> need to follow the BB context in my argument, because I find that part a
> bit fishy.
> 
> Luckily, there’s a second, completely different reason for having
> scsi_device_for_each_req_async() increment the in-flight counter:
> Specifically, scsi_device_purge_requests() probably wants to await full
> completion of scsi_device_for_each_req_async(), and we can do that most
> easily in the very same way by incrementing the in-flight counter.  This
> way, the blk_drain() in scsi_device_purge_requests() will not only await
> all (cancelled) I/O requests, but also the non-I/O requests.
> 
> The fact that this prevents the BB AioContext from changing while the BH
> is scheduled/running then is just a nice side effect.
> 
> 
> Hanna Czenczek (2):
>   block-backend: Allow concurrent context changes
>   scsi: Await request purging
> 
>  block/block-backend.c | 22 +++---
>  hw/scsi/scsi-bus.c| 30 +-
>  2 files changed, 32 insertions(+), 20 deletions(-)
> 
> -- 
> 2.43.0
> 




Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-06 Thread Eugenio Perez Martin
On Mon, Feb 5, 2024 at 2:49 PM Kevin Wolf  wrote:
>
> Am 05.02.2024 um 13:22 hat Eugenio Perez Martin geschrieben:
> > On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf  wrote:
> > >
> > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> > > status flag is set; with the current API of the kernel module, it is
> > > impossible to enable the opposite order in our block export code because
> > > userspace is not notified when a virtqueue is enabled.
> > >
> > > This requirement also matches the normal initialisation order as done by
> > > the generic vhost code in QEMU. However, commit 6c482547 accidentally
> > > changed the order for vdpa-dev and broke access to VDUSE devices with
> > > this.
> > >
> > > This changes vdpa-dev to use the normal order again and use the standard
> > > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> > > used with vdpa-dev again after this fix.
> > >
> > > Cc: qemu-sta...@nongnu.org
> > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > Signed-off-by: Kevin Wolf 
> > > ---
> > >  hw/virtio/vdpa-dev.c   |  5 +
> > >  hw/virtio/vhost-vdpa.c | 17 +
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > > index eb9ecea83b..13e87f06f6 100644
> > > --- a/hw/virtio/vdpa-dev.c
> > > +++ b/hw/virtio/vdpa-dev.c
> > > @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice 
> > > *vdev, Error **errp)
> > >
> > >  s->dev.acked_features = vdev->guest_features;
> > >
> > > -ret = vhost_dev_start(&s->dev, vdev, false);
> > > +ret = vhost_dev_start(&s->dev, vdev, true);
> > >  if (ret < 0) {
> > >  error_setg_errno(errp, -ret, "Error starting vhost");
> > >  goto err_guest_notifiers;
> > >  }
> > > -for (i = 0; i < s->dev.nvqs; ++i) {
> > > -vhost_vdpa_set_vring_ready(&s->vdpa, i);
> > > -}
> > >  s->started = true;
> > >
> > >  /*
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index 3a43beb312..c4574d56c5 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, 
> > > unsigned idx)
> > >  return r;
> > >  }
> > >
> > > +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
> > > +{
> > > +struct vhost_vdpa *v = dev->opaque;
> > > +unsigned int i;
> > > +int ret;
> > > +
> > > +for (i = 0; i < dev->nvqs; ++i) {
> > > +ret = vhost_vdpa_set_vring_ready(v, i);
> > > +if (ret < 0) {
> > > +return ret;
> > > +}
> > > +}
> > > +
> > > +return 0;
> > > +}
> > > +
> > >  static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> > > int fd)
> > >  {
> > > @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
> > >  .vhost_set_features = vhost_vdpa_set_features,
> > >  .vhost_reset_device = vhost_vdpa_reset_device,
> > >  .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> > > +.vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
> > >  .vhost_get_config  = vhost_vdpa_get_config,
> > >  .vhost_set_config = vhost_vdpa_set_config,
> > >  .vhost_requires_shm_log = NULL,
> >
> > vhost-vdpa net enables CVQ before dataplane ones to configure all the
> > device in the destination of a live migration. To go back again to
> > this callback would cause the device to enable the dataplane before
> > virtqueues are configured again.
>
> Not that it makes a difference, but I don't think it would actually be
> going back. Even before your commit 6c482547, we were not making use of
> the generic callback but just called the function in a slightly
> different place (but less different than after commit 6c482547).
>
> But anyway... Why don't the other vhost backend need the same for
> vhost-net then? Do they just not support live migration?
>

They don't support control virtqueue. More specifically, control
virtqueue is handled directly in QEMU.

> I don't know the code well enough to say where the problem is, but if
> vhost-vdpa networking code relies on the usual vhost operations not
> being implemented and bypasses VhostOps to replace it, that sounds like
> a design problem to me.

I don't follow this. What vhost operation is expected not to be implemented?

> Maybe VhostOps needs a new operation to enable
> just a single virtqueue that can be used by the networking code instead?
>
> > How does the VDUSE userspace know how many queues it should enable? Can't
> > the kernel perform the necessary actions after DRIVER_OK, like
> > configuring the kick etc?
>
> Not sure if I understand the question. The vdpa kernel interface always
> enables individual queues, so the VDUSE userspace will enable whatever
> queues it was asked to enable. The only restriction is that the queues
> need to be enabled before setting DRIVER_OK.
>
> The 

Re: [PATCH] virtio-blk: do not use C99 mixed declarations

2024-02-06 Thread Michael S. Tsirkin
On Tue, Feb 06, 2024 at 09:04:09AM -0500, Stefan Hajnoczi wrote:
> QEMU's coding style generally forbids C99 mixed declarations.
> 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Michael S. Tsirkin 

or maybe it's time we moved on?

> ---
>  hw/block/virtio-blk.c | 25 ++---
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 227d83569f..f6009cd9b3 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -661,16 +661,16 @@ static void virtio_blk_zone_report_complete(void 
> *opaque, int ret)
>  int64_t zrp_size, n, j = 0;
>  int64_t nz = data->zone_report_data.nr_zones;
>  int8_t err_status = VIRTIO_BLK_S_OK;
> -
> -trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
> -if (ret) {
> -err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> -goto out;
> -}
> -
>  struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
>  .nr_zones = cpu_to_le64(nz),
>  };
> +
> +trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
> +if (ret) {
> +err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> +goto out;
> +}
> +
>  zrp_size = sizeof(struct virtio_blk_zone_report)
> + sizeof(struct virtio_blk_zone_descriptor) * nz;
> >  n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
> @@ -898,13 +898,14 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq 
> *req,
>  
>  int64_t offset = virtio_ldq_p(vdev, >out.sector) << 
> BDRV_SECTOR_BITS;
>  int64_t len = iov_size(out_iov, out_num);
> +ZoneCmdData *data;
>  
>  trace_virtio_blk_handle_zone_append(vdev, req, offset >> 
> BDRV_SECTOR_BITS);
> > -if (!check_zoned_request(s, offset, len, true, &err_status)) {
>  goto out;
>  }
>  
> -ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
> +data = g_malloc(sizeof(ZoneCmdData));
>  data->req = req;
>  data->in_iov = in_iov;
>  data->in_num = in_num;
> @@ -1191,14 +1192,15 @@ static void virtio_blk_dma_restart_cb(void *opaque, 
> bool running,
>  {
>  VirtIOBlock *s = opaque;
>  uint16_t num_queues = s->conf.num_queues;
> +g_autofree VirtIOBlockReq **vq_rq = NULL;
> +VirtIOBlockReq *rq;
>  
>  if (!running) {
>  return;
>  }
>  
>  /* Split the device-wide s->rq request list into per-vq request lists */
> -g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
> -VirtIOBlockReq *rq;
> +vq_rq = g_new0(VirtIOBlockReq *, num_queues);
>  
> >  WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
>  rq = s->rq;
> @@ -1924,6 +1926,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
> Error **errp)
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>  VirtIOBlock *s = VIRTIO_BLK(dev);
> >  VirtIOBlkConf *conf = &s->conf;
> +BlockDriverState *bs;
>  Error *err = NULL;
>  unsigned i;
>  
> @@ -1969,7 +1972,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
> Error **errp)
>  return;
>  }
>  
> -BlockDriverState *bs = blk_bs(conf->conf.blk);
> +bs = blk_bs(conf->conf.blk);
>  if (bs->bl.zoned != BLK_Z_NONE) {
> >  virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
>  if (bs->bl.zoned == BLK_Z_HM) {
> -- 
> 2.43.0




[PULL 0/1] Block patches

2024-02-06 Thread Stefan Hajnoczi
The following changes since commit 39a6e4f87e7b75a45b08d6dc8b8b7c2954c87440:

  Merge tag 'pull-qapi-2024-02-03' of https://repo.or.cz/qemu/armbru into 
staging (2024-02-03 13:31:58 +)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 1d52cc0ac27761e296b14655c2f5b2649ee69491:

  virtio-blk: avoid using ioeventfd state in irqfd conditional (2024-02-06 
10:22:18 -0500)


Pull request

A bug fix for in-flight I/O during ioeventfd shutdown.



Stefan Hajnoczi (1):
  virtio-blk: avoid using ioeventfd state in irqfd conditional

 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.43.0




[PULL 1/1] virtio-blk: avoid using ioeventfd state in irqfd conditional

2024-02-06 Thread Stefan Hajnoczi
Requests that complete in an IOThread use irqfd to notify the guest
while requests that complete in the main loop thread use the traditional
qdev irq code path. The reason for this conditional is that the irq code
path requires the BQL:

  if (s->ioeventfd_started && !s->ioeventfd_disabled) {
  virtio_notify_irqfd(vdev, req->vq);
  } else {
  virtio_notify(vdev, req->vq);
  }

There is a corner case where the conditional invokes the irq code path
instead of the irqfd code path:

  static void virtio_blk_stop_ioeventfd(VirtIODevice *vdev)
  {
  ...
  /*
   * Set ->ioeventfd_started to false before draining so that host notifiers
   * are not detached/attached anymore.
   */
  s->ioeventfd_started = false;

  /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
  blk_drain(s->conf.conf.blk);

During blk_drain() the conditional produces the wrong result because
ioeventfd_started is false.

Use qemu_in_iothread() instead of checking the ioeventfd state.

Buglink: https://issues.redhat.com/browse/RHEL-15394
Signed-off-by: Stefan Hajnoczi 
Message-id: 20240122172625.415386-1-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 227d83569f..287c31ee3c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -64,7 +64,7 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, 
unsigned char status)
 iov_discard_undo(&req->inhdr_undo);
 iov_discard_undo(&req->outhdr_undo);
 virtqueue_push(req->vq, &req->elem, req->in_len);
-if (s->ioeventfd_started && !s->ioeventfd_disabled) {
+if (qemu_in_iothread()) {
 virtio_notify_irqfd(vdev, req->vq);
 } else {
 virtio_notify(vdev, req->vq);
-- 
2.43.0




Re: [PATCH 1/5] virtio-blk: enforce iothread-vq-mapping validation

2024-02-06 Thread Stefan Hajnoczi
On Tue, Feb 06, 2024 at 09:29:29AM +0200, Manos Pitsidianakis wrote:
> On Mon, 05 Feb 2024 19:26, Stefan Hajnoczi  wrote:
> > Hanna Czenczek  noticed that the safety of
> > `vq_aio_context[vq->value] = ctx;` with user-defined vq->value inputs is
> > not obvious.
> > 
> > The code is structured in validate() + apply() steps so input validation
> > is there, but it happens way earlier and there is nothing that
> > guarantees apply() can only be called with validated inputs.
> > 
> > This patch moves the validate() call inside the apply() function so
> > validation is guaranteed. I also added the bounds checking assertion
> > that Hanna suggested.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> > hw/block/virtio-blk.c | 192 +++---
> > 1 file changed, 107 insertions(+), 85 deletions(-)
> > 
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index 227d83569f..e8b37fd5f4 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -1485,6 +1485,72 @@ static int virtio_blk_load_device(VirtIODevice 
> > *vdev, QEMUFile *f,
> > return 0;
> > }
> > 
> > +static void virtio_resize_cb(void *opaque)
> > +{
> > +VirtIODevice *vdev = opaque;
> > +
> > +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > +virtio_notify_config(vdev);
> > +}
> > +
> > +static void virtio_blk_resize(void *opaque)
> > +{
> > +VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
> > +
> > +/*
> > + * virtio_notify_config() needs to acquire the BQL,
> > + * so it can't be called from an iothread. Instead, schedule
> > + * it to be run in the main context BH.
> > + */
> > +aio_bh_schedule_oneshot(qemu_get_aio_context(), virtio_resize_cb, 
> > vdev);
> > +}
> > +
> > +static void virtio_blk_ioeventfd_detach(VirtIOBlock *s)
> > +{
> > +VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > +
> > +for (uint16_t i = 0; i < s->conf.num_queues; i++) {
> > +VirtQueue *vq = virtio_get_queue(vdev, i);
> > +virtio_queue_aio_detach_host_notifier(vq, s->vq_aio_context[i]);
> > +}
> > +}
> > +
> > +static void virtio_blk_ioeventfd_attach(VirtIOBlock *s)
> > +{
> > +VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > +
> > +for (uint16_t i = 0; i < s->conf.num_queues; i++) {
> > +VirtQueue *vq = virtio_get_queue(vdev, i);
> > +virtio_queue_aio_attach_host_notifier(vq, s->vq_aio_context[i]);
> > +}
> > +}
> > +
> > +/* Suspend virtqueue ioeventfd processing during drain */
> > +static void virtio_blk_drained_begin(void *opaque)
> > +{
> > +VirtIOBlock *s = opaque;
> > +
> > +if (s->ioeventfd_started) {
> > +virtio_blk_ioeventfd_detach(s);
> > +}
> > +}
> > +
> > +/* Resume virtqueue ioeventfd processing after drain */
> > +static void virtio_blk_drained_end(void *opaque)
> > +{
> > +VirtIOBlock *s = opaque;
> > +
> > +if (s->ioeventfd_started) {
> > +virtio_blk_ioeventfd_attach(s);
> > +}
> > +}
> > +
> > +static const BlockDevOps virtio_block_ops = {
> > +.resize_cb = virtio_blk_resize,
> > +.drained_begin = virtio_blk_drained_begin,
> > +.drained_end   = virtio_blk_drained_end,
> > +};
> > +
> > static bool
> > validate_iothread_vq_mapping_list(IOThreadVirtQueueMappingList *list,
> > uint16_t num_queues, Error **errp)
> > @@ -1547,81 +1613,33 @@ 
> > validate_iothread_vq_mapping_list(IOThreadVirtQueueMappingList *list,
> > return true;
> > }
> > 
> > -static void virtio_resize_cb(void *opaque)
> > -{
> > -VirtIODevice *vdev = opaque;
> > -
> > -assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > -virtio_notify_config(vdev);
> > -}
> > -
> > -static void virtio_blk_resize(void *opaque)
> > -{
> > -VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
> > -
> > -/*
> > - * virtio_notify_config() needs to acquire the BQL,
> > - * so it can't be called from an iothread. Instead, schedule
> > - * it to be run in the main context BH.
> > - */
> > -aio_bh_schedule_oneshot(qemu_get_aio_context(), virtio_resize_cb, 
> > vdev);
> > -}
> > -
> > -static void virtio_blk_ioeventfd_detach(VirtIOBlock *s)
> > -{
> > -VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > -
> > -for (uint16_t i = 0; i < s->conf.num_queues; i++) {
> > -VirtQueue *vq = virtio_get_queue(vdev, i);
> > -virtio_queue_aio_detach_host_notifier(vq, s->vq_aio_context[i]);
> > -}
> > -}
> > -
> > -static void virtio_blk_ioeventfd_attach(VirtIOBlock *s)
> > -{
> > -VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > -
> > -for (uint16_t i = 0; i < s->conf.num_queues; i++) {
> > -VirtQueue *vq = virtio_get_queue(vdev, i);
> > -virtio_queue_aio_attach_host_notifier(vq, s->vq_aio_context[i]);
> > -}
> > -}
> > -
> > -/* Suspend virtqueue ioeventfd processing during drain */
> > -static void virtio_blk_drained_begin(void *opaque)
> > -{
> > -VirtIOBlock *s = opaque;
> > -
> > -if 

Re: [PATCH] virtio-blk: do not use C99 mixed declarations

2024-02-06 Thread Hanna Czenczek

On 06.02.24 15:04, Stefan Hajnoczi wrote:

QEMU's coding style generally forbids C99 mixed declarations.

Signed-off-by: Stefan Hajnoczi 
---
  hw/block/virtio-blk.c | 25 ++---
  1 file changed, 14 insertions(+), 11 deletions(-)


Reviewed-by: Hanna Czenczek 




Re: [PATCH 5/5] monitor: use aio_co_reschedule_self()

2024-02-06 Thread Hanna Czenczek

On 05.02.24 18:26, Stefan Hajnoczi wrote:

The aio_co_reschedule_self() API is designed to avoid the race
condition between scheduling the coroutine in another AioContext and
yielding.

The QMP dispatch code uses the open-coded version that appears
susceptible to the race condition at first glance:

   aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
   qemu_coroutine_yield();

The code is actually safe because the iohandler and qemu_aio_context
AioContext run under the Big QEMU Lock. Nevertheless, set a good example
and use aio_co_reschedule_self() so it's obvious that there is no race.

Suggested-by: Hanna Reitz 
Signed-off-by: Stefan Hajnoczi 
---
  qapi/qmp-dispatch.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)


Reviewed-by: Hanna Czenczek 




Re: [PATCH 4/5] virtio-blk: declare VirtIOBlock::rq with a type

2024-02-06 Thread Hanna Czenczek

On 05.02.24 18:26, Stefan Hajnoczi wrote:

The VirtIOBlock::rq field has had the type void * since its introduction
in commit 869a5c6df19a ("Stop VM on error in virtio-blk. (Gleb
Natapov)").

Perhaps this was done to avoid the forward declaration of
VirtIOBlockReq.

Hanna Czenczek  pointed out the missing type. Specify
the actual type because there is no need to use void * here.

Suggested-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
  include/hw/virtio/virtio-blk.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Hanna Czenczek 




Re: [PATCH 3/5] virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb()

2024-02-06 Thread Hanna Czenczek

On 05.02.24 18:26, Stefan Hajnoczi wrote:

Hanna Czenczek  noted that the array index in
virtio_blk_dma_restart_cb() is not bounds-checked:

   g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
   ...
   while (rq) {
   VirtIOBlockReq *next = rq->next;
   uint16_t idx = virtio_get_queue_index(rq->vq);

   rq->next = vq_rq[idx];
  ^^

The code is correct because both rq->vq and vq_rq[] depend on
num_queues, but this is indirect and not 100% obvious. Add an assertion.

Suggested-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
---
  hw/block/virtio-blk.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Hanna Czenczek 




Re: [PATCH 2/5] virtio-blk: clarify that there is at least 1 virtqueue

2024-02-06 Thread Hanna Czenczek

On 05.02.24 18:26, Stefan Hajnoczi wrote:

It is not possible to instantiate a virtio-blk device with 0 virtqueues.
The following check is located in ->realize():

   if (!conf->num_queues) {
   error_setg(errp, "num-queues property must be larger than 0");
   return;
   }

Later on we access s->vq_aio_context[0] under the assumption that there
is at least one virtqueue. Hanna Czenczek  noted that
it would help to show that the array index is already valid.

Add an assertion to document that s->vq_aio_context[0] is always
safe...and catch future code changes that break this assumption.

Suggested-by: Hanna Czenczek
Signed-off-by: Stefan Hajnoczi
---
  hw/block/virtio-blk.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Hanna Czenczek 

Re: [PATCH 1/5] virtio-blk: enforce iothread-vq-mapping validation

2024-02-06 Thread Hanna Czenczek

On 05.02.24 18:26, Stefan Hajnoczi wrote:

Hanna Czenczek  noticed that the safety of
`vq_aio_context[vq->value] = ctx;` with user-defined vq->value inputs is
not obvious.

The code is structured in validate() + apply() steps so input validation
is there, but it happens way earlier and there is nothing that
guarantees apply() can only be called with validated inputs.

This patch moves the validate() call inside the apply() function so
validation is guaranteed. I also added the bounds checking assertion
that Hanna suggested.

Signed-off-by: Stefan Hajnoczi 
---
  hw/block/virtio-blk.c | 192 +++---
  1 file changed, 107 insertions(+), 85 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 227d83569f..e8b37fd5f4 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c


[...]


@@ -1660,6 +1681,14 @@ static bool virtio_blk_vq_aio_context_init(VirtIOBlock 
*s, Error **errp)
  BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
  VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
  
+if (conf->iothread && conf->iothread_vq_mapping_list) {

+if (conf->iothread) {


This inner condition should probably be dropped.  With that done:

Reviewed-by: Hanna Czenczek 


+error_setg(errp, "iothread and iothread-vq-mapping properties "
+ "cannot be set at the same time");
+return false;
+}
+}
+
  if (conf->iothread || conf->iothread_vq_mapping_list) {
  if (!k->set_guest_notifiers || !k->ioeventfd_assign) {
  error_setg(errp,
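
(For reference, the collapsed form of the check that this review comment asks
for — what the v2 changelog calls removing the redundant nested if — would
look roughly like the following sketch:)

  if (conf->iothread && conf->iothread_vq_mapping_list) {
      error_setg(errp, "iothread and iothread-vq-mapping properties "
                 "cannot be set at the same time");
      return false;
  }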





Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-06 Thread Stefano Garzarella
On Tue, Feb 6, 2024 at 9:31 AM Stefano Garzarella  wrote:
>
> On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote:
> >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella  
> >wrote:
> >>
> >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
> >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> >> >status flag is set; with the current API of the kernel module, it is
> >> >impossible to enable the opposite order in our block export code because
> >> >userspace is not notified when a virtqueue is enabled.
> >
> >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK?
>
> It's not specific to virtio-blk, but to the generic vdpa device we have
> in QEMU (i.e. vhost-vdpa-device). Yep, after commit
> 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after
> DRIVER_OK.
>
> >Spec is not clear about this and that's why we introduce
> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK.
>
> Ah, I didn't know about this new feature. So after commit
> 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not
> complying with the specification, right?
>
> >
> >>
> >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,
> >
> >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is
> >negotiated.
>
> At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not
> negotiated, should we return an error in vhost-vdpa kernel module if
> VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set?

I just sent a patch:
https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u

Then I discovered that we never check the return value of
vhost_vdpa_set_vring_ready() in QEMU.
I'll send a patch for that!

Stefano

>
> >If this is truth, it seems a little more complicated, for
> >example the get_backend_features needs to be forward to the userspace?
>
> I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES
> for this? Or do you mean userspace on the VDUSE side?
>
> >This seems suboptimal to implement this in the spec first and then we
> >can leverage the features. Or we can have another parameter for the
> >ioctl that creates the vduse device.
>
> I got a little lost, though in vhost-user, the device can always expect
> a vring_enable/disable, so I thought it was not complicated in VDUSE.
>
> >
> >> I'll start another thread about that, but in the meantime I agree that
> >> we should fix QEMU since we need to work properly with old kernels as
> >> well.
> >>
> >> >
> >> >This requirement also matches the normal initialisation order as done by
> >> >the generic vhost code in QEMU. However, commit 6c482547 accidentally
> >> >changed the order for vdpa-dev and broke access to VDUSE devices with
> >> >this.
> >> >
> >> >This changes vdpa-dev to use the normal order again and use the standard
> >> >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> >> >used with vdpa-dev again after this fix.
> >>
> >> I like this approach and the patch LGTM, but I'm a bit worried about
> >> this function in hw/net/vhost_net.c:
> >>
> >>  int vhost_set_vring_enable(NetClientState *nc, int enable)
> >>  {
> >>  VHostNetState *net = get_vhost_net(nc);
> >>  const VhostOps *vhost_ops = net->dev.vhost_ops;
> >>
> >>  nc->vring_enable = enable;
> >>
> >>  if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
> >>  return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
> >>  }
> >>
> >>  return 0;
> >>  }
> >>
> >> @Eugenio, @Jason, should we change some things there if vhost-vdpa
> >> implements the vhost_set_vring_enable callback?
> >
> >Eugenio may know more, I remember we need to enable cvq first for
> >shadow virtqueue to restore some states.
> >
> >>
> >> Do you remember why we didn't implement it from the beginning?
> >
> >It seems the vrings parameter is introduced after vhost-vdpa is
> >implemented.
>
> Sorry, I mean why we didn't implement the vhost_set_vring_enable
> callback for vhost-vdpa from the beginning.
>
> Thanks,
> Stefano




[PATCH] virtio-blk: do not use C99 mixed declarations

2024-02-06 Thread Stefan Hajnoczi
QEMU's coding style generally forbids C99 mixed declarations.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 227d83569f..f6009cd9b3 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -661,16 +661,16 @@ static void virtio_blk_zone_report_complete(void *opaque, 
int ret)
 int64_t zrp_size, n, j = 0;
 int64_t nz = data->zone_report_data.nr_zones;
 int8_t err_status = VIRTIO_BLK_S_OK;
-
-trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
-if (ret) {
-err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
-goto out;
-}
-
 struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
 .nr_zones = cpu_to_le64(nz),
 };
+
+trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
+if (ret) {
+err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+goto out;
+}
+
 zrp_size = sizeof(struct virtio_blk_zone_report)
+ sizeof(struct virtio_blk_zone_descriptor) * nz;
n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
@@ -898,13 +898,14 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
 
 int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
 int64_t len = iov_size(out_iov, out_num);
+ZoneCmdData *data;
 
 trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
 if (!check_zoned_request(s, offset, len, true, &err_status)) {
 goto out;
 }
 
-ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
+data = g_malloc(sizeof(ZoneCmdData));
 data->req = req;
 data->in_iov = in_iov;
 data->in_num = in_num;
@@ -1191,14 +1192,15 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool running,
 {
 VirtIOBlock *s = opaque;
 uint16_t num_queues = s->conf.num_queues;
+g_autofree VirtIOBlockReq **vq_rq = NULL;
+VirtIOBlockReq *rq;
 
 if (!running) {
 return;
 }
 
 /* Split the device-wide s->rq request list into per-vq request lists */
-g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
-VirtIOBlockReq *rq;
+vq_rq = g_new0(VirtIOBlockReq *, num_queues);
 
 WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
 rq = s->rq;
@@ -1924,6 +1926,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOBlock *s = VIRTIO_BLK(dev);
 VirtIOBlkConf *conf = &s->conf;
+BlockDriverState *bs;
 Error *err = NULL;
 unsigned i;
 
@@ -1969,7 +1972,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-BlockDriverState *bs = blk_bs(conf->conf.blk);
+bs = blk_bs(conf->conf.blk);
 if (bs->bl.zoned != BLK_Z_NONE) {
 virtio_add_feature(&s->host_features, VIRTIO_BLK_F_ZONED);
 if (bs->bl.zoned == BLK_Z_HM) {
-- 
2.43.0
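For readers unfamiliar with the rule, a minimal illustration of the style this
patch enforces (a hypothetical function, not taken from the patch):

/* Mixed declaration, discouraged by QEMU's coding style: */
void f(int n)
{
    do_work(n);
    int x = n * 2;   /* declaration after a statement */
    use(x);
}

/* Preferred: declare at the top of the block, assign later. */
void f(int n)
{
    int x;

    do_work(n);
    x = n * 2;
    use(x);
}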




Re: [PATCH] tests/cdrom-test: Add cdrom test for LoongArch virt machine

2024-02-06 Thread Thomas Huth

On 06/02/2024 03.29, maobibo wrote:

Hi Philippe,

On 2024/2/5 8:58 PM, Philippe Mathieu-Daudé wrote:

Hi Bibo,

On 5/2/24 03:13, Bibo Mao wrote:

The cdrom test is skipped on LoongArch systems when running "make check";
this patch enables the cdrom test for the LoongArch virt machine
platform.

With this patch, the cdrom test runs and passes on the LoongArch virt
machine type.

Signed-off-by: Bibo Mao 
---
  tests/qtest/cdrom-test.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/tests/qtest/cdrom-test.c b/tests/qtest/cdrom-test.c
index 0945383789..c8b97d8d9a 100644
--- a/tests/qtest/cdrom-test.c
+++ b/tests/qtest/cdrom-test.c
@@ -271,6 +271,9 @@ int main(int argc, char **argv)
  const char *virtmachine[] = { "virt", NULL };
  add_cdrom_param_tests(virtmachine);
  }
+    } else if (g_str_equal(arch, "loongarch64")) {
+    const char *virtmachine[] = { "virt", NULL };
+    add_cdrom_param_tests(virtmachine);


What is the default device used, virtio-blk-pci?


Yes, it is. For the virt machine type, the default block device type is
the virtio interface; it is set in loongarch_class_init():
    mc->block_default_type = IF_VIRTIO


Ok, then you might need to check whether your patch still works when you run 
"configure" with "--without-default-devices". You might need to check with 
'if (qtest_has_device("virtio-blk-pci"))' whether the device is really 
available in the binary, like it is done some lines earlier in the arm case.
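A hedged sketch of what that guard could look like, mirroring the arm branch
(whether virtio-blk-pci is the right device to probe for the LoongArch virt
machine is an assumption):

    } else if (g_str_equal(arch, "loongarch64")) {
        /* Only add the tests if the default virtio-blk device is built in. */
        if (qtest_has_device("virtio-blk-pci")) {
            const char *virtmachine[] = { "virt", NULL };
            add_cdrom_param_tests(virtmachine);
        }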


 Thomas




Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-06 Thread Stefano Garzarella

On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote:

On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella  wrote:


On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
>VDUSE requires that virtqueues are first enabled before the DRIVER_OK
>status flag is set; with the current API of the kernel module, it is
>impossible to enable the opposite order in our block export code because
>userspace is not notified when a virtqueue is enabled.


Did this mean virtio-blk will enable a virtqueue after DRIVER_OK?


It's not specific to virtio-blk, but to the generic vdpa device we have 
in QEMU (i.e. vhost-vdpa-device). Yep, after commit 
6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after 
DRIVER_OK.



Spec is not clear about this and that's why we introduce
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK.


Ah, I didn't know about this new feature. So after commit 
6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not 
complying with the specification, right?






Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,


I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is
negotiated.


At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not 
negotiated, should we return an error in vhost-vdpa kernel module if 
VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set?



If this is truth, it seems a little more complicated, for
example the get_backend_features needs to be forward to the userspace?


I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES 
for this? Or do you mean userspace on the VDUSE side?



This seems suboptimal to implement this in the spec first and then we
can leverage the features. Or we can have another parameter for the
ioctl that creates the vduse device.


I got a little lost, though in vhost-user, the device can always expect 
a vring_enable/disable, so I thought it was not complicated in VDUSE.





I'll start another thread about that, but in the meantime I agree that
we should fix QEMU since we need to work properly with old kernels as
well.

>
>This requirement also matches the normal initialisation order as done by
>the generic vhost code in QEMU. However, commit 6c482547 accidentally
>changed the order for vdpa-dev and broke access to VDUSE devices with
>this.
>
>This changes vdpa-dev to use the normal order again and use the standard
>vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
>used with vdpa-dev again after this fix.

I like this approach and the patch LGTM, but I'm a bit worried about
this function in hw/net/vhost_net.c:

 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
 VHostNetState *net = get_vhost_net(nc);
 const VhostOps *vhost_ops = net->dev.vhost_ops;

 nc->vring_enable = enable;

 if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
 return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
 }

 return 0;
 }

@Eugenio, @Jason, should we change some things there if vhost-vdpa
implements the vhost_set_vring_enable callback?
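For context, a hedged sketch of what a vhost-vdpa implementation of that
callback could look like (the loop over all vrings and the ioctl helper are
assumptions, not the proposed patch):

static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
{
    unsigned i;

    for (i = 0; i < dev->nvqs; i++) {
        struct vhost_vring_state state = {
            .index = dev->vq_index + i,
            .num = enable,
        };
        int r = vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);

        if (r < 0) {
            return r;
        }
    }

    return 0;
}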


Eugenio may know more, I remember we need to enable cvq first for
shadow virtqueue to restore some states.



Do you remember why we didn't implement it from the beginning?


It seems the vrings parameter was introduced after vhost-vdpa was
implemented.


Sorry, I mean why we didn't implement the vhost_set_vring_enable 
callback for vhost-vdpa from the beginning.


Thanks,
Stefano