date:20230404

Re: [PATCH 00/13] block: remove aio_disable_external() API

2023-04-04 Thread Stefan Hajnoczi

On Tue, Apr 04, 2023 at 03:43:20PM +0200, Paolo Bonzini wrote:
> On 4/3/23 20:29, Stefan Hajnoczi wrote:
> > The aio_disable_external() API temporarily suspends file descriptor 
> > monitoring
> > in the event loop. The block layer uses this to prevent new I/O requests 
> > being
> > submitted from the guest and elsewhere between bdrv_drained_begin() and
> > bdrv_drained_end().
> > 
> > While the block layer still needs to prevent new I/O requests in drained
> > sections, the aio_disable_external() API can be replaced with
> > .drained_begin/end/poll() callbacks that have been added to BdrvChildClass 
> > and
> > BlockDevOps.
> > 
> > This newer .bdrained_begin/end/poll() approach is attractive because it 
> > works
> > without specifying a specific AioContext. The block layer is moving towards
> > multi-queue and that means multiple AioContexts may be processing I/O
> > simultaneously.
> > 
> > The aio_disable_external() was always somewhat hacky. It suspends all file
> > descriptors that were registered with is_external=true, even if they have
> > nothing to do with the BlockDriverState graph nodes that are being drained.
> > It's better to solve a block layer problem in the block layer than to have 
> > an
> > odd event loop API solution.
> > 
> > That covers the motivation for this change, now on to the specifics of this
> > series:
> > 
> > While it would be nice if a single conceptual approach could be applied to 
> > all
> > is_external=true file descriptors, I ended up looking at callers on a
> > case-by-case basis. There are two general ways I migrated code away from
> > is_external=true:
> > 
> > 1. Block exports are typically best off unregistering fds in 
> > .drained_begin()
> > and registering them again in .drained_end(). The .drained_poll() 
> > function
> > waits for in-flight requests to finish using a reference counter.
> > 
> > 2. Emulated storage controllers like virtio-blk and virtio-scsi are a little
> > simpler. They can rely on BlockBackend's request queuing during drain
> > feature. Guest I/O request coroutines are suspended in a drained 
> > section and
> > resume upon the end of the drained section.
> 
> Sorry, I disagree with this.
> 
> Request queuing was shown to cause deadlocks; Hanna's latest patch is piling
> another hack upon it, instead in my opinion we should go in the direction of
> relying _less_ (or not at all) on request queuing.
> 
> I am strongly convinced that request queuing must apply only after
> bdrv_drained_begin has returned, which would also fix the IDE TRIM bug
> reported by Fiona Ebner.  The possible livelock scenario is generally not a
> problem because 1) outside an iothread you have anyway the BQL that prevents
> a vCPU from issuing more I/O operations during bdrv_drained_begin 2) in
> iothreads you have aio_disable_external() instead of .drained_begin().
> 
> It is also less tidy to start a request during the drained_begin phase,
> because a request that has been submitted has to be completed (cancel
> doesn't really work).
> 
> So in an ideal world, request queuing would not only apply only after
> bdrv_drained_begin has returned, it would log a warning and .drained_begin()
> should set up things so that there are no such warnings.

That's fine, I will give .drained_begin/end/poll() a try with virtio-blk
and virtio-scsi in the next revision.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref()

2023-04-04 Thread Stefan Hajnoczi

On Tue, Apr 04, 2023 at 03:46:34PM +0200, Paolo Bonzini wrote:
> On 4/3/23 20:30, Stefan Hajnoczi wrote:
> > These functions must be called with the AioContext acquired:
> > 
> >/* Callers must hold exp->ctx lock */
> >void blk_exp_ref(BlockExport *exp)
> >...
> >/* Callers must hold exp->ctx lock */
> >void blk_exp_unref(BlockExport *exp)
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   block/export/fuse.c | 4 
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/block/export/fuse.c b/block/export/fuse.c
> > index 06fa41079e..18394f9e07 100644
> > --- a/block/export/fuse.c
> > +++ b/block/export/fuse.c
> > @@ -244,7 +244,9 @@ static void read_from_fuse_export(void *opaque)
> >   FuseExport *exp = opaque;
> >   int ret;
> > +aio_context_acquire(exp->common.ctx);
> >   blk_exp_ref(&exp->common);
> > +aio_context_release(exp->common.ctx);
> >   do {
> >   ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
> > @@ -256,7 +258,9 @@ static void read_from_fuse_export(void *opaque)
> >   fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
> >   out:
> > +aio_context_acquire(exp->common.ctx);
> >   blk_exp_unref(&exp->common);
> > +aio_context_release(exp->common.ctx);
> >   }
> 
> Since the actual thread-unsafe work is done in a bottom half, perhaps
> instead you can use qatomic_inc and qatomic_fetch_dec in
> blk_exp_{ref,unref}?

Sure, I'll give that a try in the next revision.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH v9 0/5] Add zoned storage emulation to virtio-blk driver

2023-04-04 Thread Stefan Hajnoczi

On Tue, Apr 04, 2023 at 11:46:13PM +0800, Sam Li wrote:
> Stefan Hajnoczi  于2023年4月3日周一 20:18写道：
> >
> > On Wed, 29 Mar 2023 at 01:01, Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Mar 27, 2023 at 10:45:48PM +0800, Sam Li wrote:
> > >
> > > virtio bits look ok.
> > >
> > > Reviewed-by: Michael S. Tsirkin 
> > >
> > > merge through block layer tree I'm guessing?
> >
> > Sounds good. Thank you!
> 
> Hi Stefan,
> 
> I've sent the v8 zone append write to the list where I move the wps
> field to BlockDriverState. It will make a small change the emulation
> code, which is in hw/block/virtio-blk.c of [2/5] virtio-blk: add zoned
> storage emulation for zoned devices:
> - if (BDRV_ZT_IS_CONV(bs->bl.wps->wp[index])) {
> + if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {
> 
> Please let me know if you prefer a new version or not.

Yes, please.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH 2/4] block: Split padded I/O vectors exceeding IOV_MAX

2023-04-04 Thread Hanna Czenczek


On 04.04.23 10:10, Vladimir Sementsov-Ogievskiy wrote:

On 03.04.23 16:33, Hanna Czenczek wrote:

(Sorry for the rather late reply... Thanks for the review!)

On 20.03.23 11:31, Vladimir Sementsov-Ogievskiy wrote:

On 17.03.23 20:50, Hanna Czenczek wrote:


[...]


diff --git a/block/io.c b/block/io.c
index 8974d46941..1e9cdba17a 100644
--- a/block/io.c
+++ b/block/io.c


[..]


+    pad->write = write;
+
  return true;
  }
  @@ -1545,6 +1561,18 @@ zero_mem:
    static void bdrv_padding_destroy(BdrvRequestPadding *pad)


Maybe, rename to _finalize, to stress that it's not only freeing 
memory.


Sounds good!

[...]

@@ -1552,6 +1580,101 @@ static void 
bdrv_padding_destroy(BdrvRequestPadding *pad)

  memset(pad, 0, sizeof(*pad));
  }
  +/*
+ * Create pad->local_qiov by wrapping @iov in the padding head and 
tail, while
+ * ensuring that the resulting vector will not exceed IOV_MAX 
elements.

+ *
+ * To ensure this, when necessary, the first couple of elements 
(up to three)


maybe, "first two-three elements"


Sure (here and...

[...]


+    /*
+ * If padded_niov > IOV_MAX, we cannot just concatenate 
everything.
+ * Instead, merge the first couple of elements of @iov to 
reduce the number


maybe, "first two-three elements"


...here).




+ * of vector elements as necessary.
+ */
+    if (padded_niov > IOV_MAX) {



[..]

@@ -1653,8 +1786,8 @@ int coroutine_fn 
bdrv_co_preadv_part(BdrvChild *child,

  flags |= BDRV_REQ_COPY_ON_READ;
  }
  -    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, 
&bytes, &pad,

-   NULL, &flags);
+    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, 
&bytes, false,

+   &pad, NULL, &flags);
  if (ret < 0) {
  goto fail;
  }


a bit later:

tracked_request_end(&req);
bdrv_padding_destroy(&pad);


Now, the request is formally finished inside 
bdrv_padding_destroy().. Not sure, does it really violate something, 
but seems safer to swap these two calls. 


I’d rather not, for two reasons: First, tracked requests are (as far 
as I understand) only there to implement request serialization, and 
so only care about metadata (offset, length, and type), which is not 
changed by changes to the I/O vector.


Second, even if the state of the I/O vector were relevant to tracked 
requests, I think it would actually be the other way around, i.e. the 
tracked request must be ended before the padding is 
finalized/destroyed.  The tracked request is about the actual request 
we submit to `child` (which is why tracked_request_begin() is called 
after bdrv_pad_request()), and that request is done using the 
modified I/O vector.  So if the tracked request had any connection to 
the request’s I/O vector (which it doesn’t), it would be to this 
modified one, so we mustn’t invalidate it via bdrv_padding_finalize() 
while the tracked request lives.


Or, said differently: I generally try to clean up things in the 
inverse way they were set up, and because bdrv_pad_requests() comes 
before tracked_request_begin(), I think tracked_request_end() should 
come before bdrv_padding_finalize().


Note, that it's wise-versa in bdrv_co_pwritev_part().


Well, and it’s this way here.  We agree that for clean-up, the order 
doesn’t functionally matter, so either way is actually fine.


For me it's just simpler to think that the whole request, including 
filling user-given qiov with data on read part is inside 
tracked_request_begin() / tracked_request_end().


It isn’t, though, because padding must be done before the tracked 
request is created.  The tracked request uses the request’s actual 
offset and length, after padding, so bdrv_pad_request() must always be 
done before (i.e., outside) tracked_request_begin().


And moving the last manipulation with qiov out of it breaks this 
simple thought.
Guest should not care of it, as it doesn't know about request 
tracking.. But what about internal code? Some code may depend on some 
requests be finished after bdrv_drained_begin() call, but now they may 
be not fully finished, and some data may be not copied back to 
original qiov.


I agree with your point about sequence of objects finalization, but 
maybe, that just shows that copying data back to qiov should not be a 
part of bdrv_padding_finalize(), but instead be a separate function, 
called before tracked_request_end().


But my thought is that copying back shouldn’t be done before 
tracked_request_end(), because copying back is not part of the tracked 
request.  What we track is the padded request, which uses a modified I/O 
vector, so undoing that modification shouldn’t be done while the tracked 
request lives.


I know I’m inconsistent with regards to bdrv_co_pwritev_part(), which is 
because it doesn’t matter.  My actual position is that tracked requests 
are about metadata, so undoing/finalizing the padding (including 
potentially copying data back) has nothing to do with a tracked request

Re: [PATCH 01/16] qga/qapi-schema: Tidy up documentation of guest-fsfreeze-status

2023-04-04 Thread Konstantin Kostiuk

Reviewed-by: Konstantin Kostiuk 

On Tue, Apr 4, 2023 at 2:59 PM Markus Armbruster  wrote:

> Delete "error state indicates", because it doesn't make sense.
> I suspect it was an accident.
>
> Signed-off-by: Markus Armbruster 
> ---
>  qga/qapi-schema.json | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
> index 796434ed34..f349345116 100644
> --- a/qga/qapi-schema.json
> +++ b/qga/qapi-schema.json
> @@ -420,7 +420,7 @@
>  ##
>  # @guest-fsfreeze-status:
>  #
> -# Get guest fsfreeze state. error state indicates
> +# Get guest fsfreeze state.
>  #
>  # Returns: GuestFsfreezeStatus ("thawed", "frozen", etc., as defined
> below)
>  #
> --
> 2.39.2
>
>

Re: [PATCH] aio-wait: avoid AioContext lock in aio_wait_bh_oneshot()

2023-04-04 Thread Paolo Bonzini


On 4/4/23 17:33, Stefan Hajnoczi wrote:

There is no need for the AioContext lock in aio_wait_bh_oneshot().
It's easy to remove the lock from existing callers and then switch from
AIO_WAIT_WHILE() to AIO_WAIT_WHILE_UNLOCKED() in aio_wait_bh_oneshot().

Document that the AioContext lock should not be held across
aio_wait_bh_oneshot(). Holding a lock across aio_poll() can cause
deadlock so we don't want callers to do that.

This is a step towards getting rid of the AioContext lock.


Reviewed-by: Paolo Bonzini 

Paolo

Re: [PATCH v2 05/11] qemu-options: finesse the recommendations around -blockdev

2023-04-04 Thread Kevin Wolf

Am 04.04.2023 um 17:07 hat Michael Tokarev geschrieben:
> 04.04.2023 16:57, Kevin Wolf пишет:
> > Let's not make the use of -drive look more advisable than it really is.
> > If you're writing a management tool/script and you're still using -drive
> > today, you're doing it wrong.
> 
> Kevin, maybe I'm wrong here, but what to do with the situation which
> started it all, -- with -snapshot?
> 
> If anything, I think there should be a bold note that -snapshot is
> broken by -blockdev.  Users are learning that the *hard* way, after
> losing their data..

Ah, I missed this context.

Maybe -snapshot should error out if -blockdev is in use. You'd generally
expect that either -blockdev is used primarily and snapshots are done
externally (if the command line is generated by some management tool),
or that -drive is used consistently (by a human who likes the
convenience). In both cases, we wouldn't hit the error path.

There may be some exceptional cases where you have both -drive and
-blockdev (maybe because a human users needs more control for one
specific disk). This is the case where you can get a nasty surprise and
that would error out. If you legitimately want the -drive images
snapshotted, but not the -blockdev ones, you can still use individual
'-drive snapshot=on' options instead of the global '-snapshot' (and the
error message should mention this).

Would you see any problems with such an approach?

Kevin

Re: [PATCH v9 0/5] Add zoned storage emulation to virtio-blk driver

2023-04-04 Thread Sam Li

Stefan Hajnoczi  于2023年4月3日周一 20:18写道：
>
> On Wed, 29 Mar 2023 at 01:01, Michael S. Tsirkin  wrote:
> >
> > On Mon, Mar 27, 2023 at 10:45:48PM +0800, Sam Li wrote:
> >
> > virtio bits look ok.
> >
> > Reviewed-by: Michael S. Tsirkin 
> >
> > merge through block layer tree I'm guessing?
>
> Sounds good. Thank you!

Hi Stefan,

I've sent the v8 zone append write to the list where I move the wps
field to BlockDriverState. It will make a small change the emulation
code, which is in hw/block/virtio-blk.c of [2/5] virtio-blk: add zoned
storage emulation for zoned devices:
- if (BDRV_ZT_IS_CONV(bs->bl.wps->wp[index])) {
+ if (BDRV_ZT_IS_CONV(bs->wps->wp[index])) {

Please let me know if you prefer a new version or not.

Thanks,
Sam

[PATCH] aio-wait: avoid AioContext lock in aio_wait_bh_oneshot()

2023-04-04 Thread Stefan Hajnoczi

There is no need for the AioContext lock in aio_wait_bh_oneshot().
It's easy to remove the lock from existing callers and then switch from
AIO_WAIT_WHILE() to AIO_WAIT_WHILE_UNLOCKED() in aio_wait_bh_oneshot().

Document that the AioContext lock should not be held across
aio_wait_bh_oneshot(). Holding a lock across aio_poll() can cause
deadlock so we don't want callers to do that.

This is a step towards getting rid of the AioContext lock.

Cc: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio-wait.h| 2 +-
 hw/block/dataplane/virtio-blk.c | 3 ++-
 hw/scsi/virtio-scsi-dataplane.c | 2 --
 util/aio-wait.c | 2 +-
 4 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index da13357bb8..fdd14294ec 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -131,7 +131,7 @@ void aio_wait_kick(void);
  *
  * Run a BH in @ctx and wait for it to complete.
  *
- * Must be called from the main loop thread with @ctx acquired exactly once.
+ * Must be called from the main loop thread without @ctx acquired.
  * Note that main loop event processing may occur.
  */
 void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index b28d81737e..e0111efd6d 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -314,9 +314,10 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 s->stopping = true;
 trace_virtio_blk_data_plane_stop(s);
 
-aio_context_acquire(s->ctx);
 aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
 
+aio_context_acquire(s->ctx);
+
 /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
 blk_drain(s->conf->conf.blk);
 
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 20bb91766e..f3214e1c57 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -197,9 +197,7 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
 }
 s->dataplane_stopping = true;
 
-aio_context_acquire(s->ctx);
 aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
-aio_context_release(s->ctx);
 
 blk_drain_all(); /* ensure there are no in-flight requests */
 
diff --git a/util/aio-wait.c b/util/aio-wait.c
index 98c5accd29..b5336cf5fd 100644
--- a/util/aio-wait.c
+++ b/util/aio-wait.c
@@ -82,5 +82,5 @@ void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, 
void *opaque)
 assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 
 aio_bh_schedule_oneshot(ctx, aio_wait_bh, &data);
-AIO_WAIT_WHILE(ctx, !data.done);
+AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
 }
-- 
2.39.2

[PATCH v8 4/4] block: add some trace events for zone append

2023-04-04 Thread Sam Li

Signed-off-by: Sam Li 
Reviewed-by: Dmitry Fomichev 
Reviewed-by: Stefan Hajnoczi 
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index a7130b1024..825301467e 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2502,6 +2502,8 @@ out:
 if (!BDRV_ZT_IS_CONV(*wp)) {
 if (type & QEMU_AIO_ZONE_APPEND) {
 *s->offset = *wp;
+trace_zbd_zone_append_complete(bs, *s->offset
+>> BDRV_SECTOR_BITS);
 }
 /* Advance the wp if needed */
 if (offset + bytes > *wp) {
@@ -3546,6 +3548,7 @@ static int coroutine_fn 
raw_co_zone_append(BlockDriverState *bs,
 len += iov_len;
 }
 
+trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
 return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
 }
 #endif
diff --git a/block/trace-events b/block/trace-events
index 3f4e1d088a..32665158d6 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -211,6 +211,8 @@ file_hdev_is_sg(int type, int version) "SG device found: 
type=%d, version=%d"
 file_flush_fdatasync_failed(int err) "errno %d"
 zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report 
%d zones starting at sector offset 0x%" PRIx64 ""
 zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs 
%p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " 
sectors"
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" 
PRIx64 ""
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append 
sector 0x%" PRIx64 ""
 
 # ssh.c
 sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int 
sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
-- 
2.39.2

[PATCH v8 3/4] qemu-iotests: test zone append operation

2023-04-04 Thread Sam Li

The patch tests zone append writes by reporting the zone wp after
the completion of the call. "zap -p" option can print the sector
offset value after completion, which should be the start sector
where the append write begins.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 qemu-io-cmds.c | 75 ++
 tests/qemu-iotests/tests/zoned | 16 +++
 tests/qemu-iotests/tests/zoned.out | 16 +++
 3 files changed, 107 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index f35ea627d7..3f75d2f5a6 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1874,6 +1874,80 @@ static const cmdinfo_t zone_reset_cmd = {
 .oneline = "reset a zone write pointer in zone block device",
 };
 
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
+  int64_t *offset, int flags, int *total)
+{
+int async_ret = NOT_DONE;
+
+blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
+while (async_ret == NOT_DONE) {
+main_loop_wait(false);
+}
+
+*total = qiov->size;
+return async_ret < 0 ? async_ret : 1;
+}
+
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
+{
+int ret;
+bool pflag = false;
+int flags = 0;
+int total = 0;
+int64_t offset;
+char *buf;
+int c, nr_iov;
+int pattern = 0xcd;
+QEMUIOVector qiov;
+
+if (optind > argc - 3) {
+return -EINVAL;
+}
+
+if ((c = getopt(argc, argv, "p")) != -1) {
+pflag = true;
+}
+
+offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
+optind++;
+nr_iov = argc - optind;
+buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
+   flags & BDRV_REQ_REGISTERED_BUF);
+if (buf == NULL) {
+return -EINVAL;
+}
+ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
+if (ret < 0) {
+printf("zone append failed: %s\n", strerror(-ret));
+goto out;
+}
+
+if (pflag) {
+printf("After zap done, the append sector is 0x%" PRIx64 "\n",
+   tosector(offset));
+}
+
+out:
+qemu_io_free(blk, buf, qiov.size,
+ flags & BDRV_REQ_REGISTERED_BUF);
+qemu_iovec_destroy(&qiov);
+return ret;
+}
+
+static const cmdinfo_t zone_append_cmd = {
+.name = "zone_append",
+.altname = "zap",
+.cfunc = zone_append_f,
+.argmin = 3,
+.argmax = 4,
+.args = "offset len [len..]",
+.oneline = "append write a number of bytes at a specified offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
 .name   = "truncate",
@@ -2672,6 +2746,7 @@ static void __attribute((constructor)) 
init_qemuio_commands(void)
 qemuio_add_command(&zone_close_cmd);
 qemuio_add_command(&zone_finish_cmd);
 qemuio_add_command(&zone_reset_cmd);
+qemuio_add_command(&zone_append_cmd);
 qemuio_add_command(&truncate_cmd);
 qemuio_add_command(&length_cmd);
 qemuio_add_command(&info_cmd);
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
index 56f60616b5..3d23ce9cc1 100755
--- a/tests/qemu-iotests/tests/zoned
+++ b/tests/qemu-iotests/tests/zoned
@@ -82,6 +82,22 @@ echo "(5) resetting the second zone"
 $QEMU_IO $IMG -c "zrs 268435456 268435456"
 echo "After resetting a zone:"
 $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(6) append write" # the physical block size of the device is 4096
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone firstly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone secondly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone firstly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone secondly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/tests/zoned.out 
b/tests/qemu-iotests/tests/zoned.out
index b2d061da49..fe53ba4744 100644
--- a/tests/qemu-iotests/tests/zoned.out
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -50,4 +50,20 @@ start: 0x8, len 0x8, cap 0x8, wptr 0x10, 
zcond:14, [type: 2]
 (5) resetting the second zone
 After resetting a zone:
 start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
+
+
+(6) append write
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
+After zap done, the append sector is 0x0
+After appending the first zone firstly:
+start: 0x0, len 0x8, cap 0x8, wptr 0x18, zcond:2, [type: 2]
+After zap done, the append sector is 0x18
+After appending the first zone secondly:
+start: 0x0, len 0x8, cap 0x8, wptr 0x30,

[PATCH v8 1/4] file-posix: add tracking of the zone write pointers

2023-04-04 Thread Sam Li

Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to add
zone append emulation using regular writes. To do this, the file-posix
driver tracks the wp location of all zones of the device. It uses an
array of uint64_t. The most significant bit of each wp location indicates
if the zone type is conventional zones.

The zones wp can be changed due to the following operations issued:
- zone reset: change the wp to the start offset of that zone
- zone finish: change to the end location of that zone
- write to a zone
- zone append

Signed-off-by: Sam Li 
---
 block/file-posix.c   | 168 ++-
 include/block/block-common.h |  14 +++
 include/block/block_int-common.h |   5 +
 3 files changed, 184 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 65efe5147e..bc58f7193b 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1324,6 +1324,88 @@ static int hdev_get_max_segments(int fd, struct stat *st)
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
+/*
+ * If the reset_all flag is true, then the wps of zone whose state is
+ * not readonly or offline should be all reset to the start sector.
+ * Else, take the real wp of the device.
+ */
+static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
+unsigned int nrz, bool reset_all)
+{
+struct blk_zone *blkz;
+size_t rep_size;
+uint64_t sector = offset >> BDRV_SECTOR_BITS;
+int ret, n = 0, i = 0;
+rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+g_autofree struct blk_zone_report *rep = NULL;
+
+rep = g_malloc(rep_size);
+blkz = (struct blk_zone *)(rep + 1);
+while (n < nrz) {
+memset(rep, 0, rep_size);
+rep->sector = sector;
+rep->nr_zones = nrz - n;
+
+do {
+ret = ioctl(fd, BLKREPORTZONE, rep);
+} while (ret != 0 && errno == EINTR);
+if (ret != 0) {
+error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+fd, offset, errno);
+return -errno;
+}
+
+if (!rep->nr_zones) {
+break;
+}
+
+for (i = 0; i < rep->nr_zones; i++, n++) {
+/*
+ * The wp tracking cares only about sequential writes required and
+ * sequential write preferred zones so that the wp can advance to
+ * the right location.
+ * Use the most significant bit of the wp location to indicate the
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
+ */
+if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+wps->wp[i] &= 1ULL << 63;
+} else {
+switch(blkz[i].cond) {
+case BLK_ZONE_COND_FULL:
+case BLK_ZONE_COND_READONLY:
+/* Zone not writable */
+wps->wp[i] = (blkz[i].start + blkz[i].len) << 
BDRV_SECTOR_BITS;
+break;
+case BLK_ZONE_COND_OFFLINE:
+/* Zone not writable nor readable */
+wps->wp[i] = (blkz[i].start) << BDRV_SECTOR_BITS;
+break;
+default:
+if (reset_all) {
+wps->wp[i] = blkz[i].start << BDRV_SECTOR_BITS;
+} else {
+wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
+}
+break;
+}
+}
+}
+sector = blkz[i - 1].start + blkz[i - 1].len;
+}
+
+return 0;
+}
+
+static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
+unsigned int nrz)
+{
+if (get_zones_wp(fd, wps, offset, nrz, 0) < 0) {
+error_report("update zone wp failed");
+}
+}
+#endif
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
@@ -1413,6 +1495,23 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 if (ret >= 0) {
 bs->bl.max_active_zones = ret;
 }
+
+ret = get_sysfs_long_val(&st, "physical_block_size");
+if (ret >= 0) {
+bs->bl.write_granularity = ret;
+}
+
+/* The refresh_limits() function can be called multiple times. */
+bs->wps = NULL;
+bs->wps = g_malloc(sizeof(BlockZoneWps) +
+sizeof(int64_t) * bs->bl.nr_zones);
+ret = get_zones_wp(s->fd, bs->wps, 0, bs->bl.nr_zones, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "report wps failed");
+bs->wps = NULL;
+return;
+}
+qemu_co_mutex_init(&bs->wps->colock);
 return;
 }
 out:
@@ -2338,9 +2437,15 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, 
uint64_t offset,
 {

[PATCH v8 0/4] Add zone append write for zoned device

2023-04-04 Thread Sam Li

This patch series add zone append operation based on the previous
zoned device support part. The file-posix driver is modified to
add zone append emulation using regular writes.

v8:
- address review comments [Stefan]
  * fix zone_mgmt covering multiple zones case
  * fix memory leak bug of wps in refresh_limits()
  * mv BlockZoneWps field from BlockLimits to BlockDriverState
  * add check_qiov_request() to bdrv_co_zone_append

v7:
- address review comments
  * fix wp assignment [Stefan]
  * fix reset_all cases, skip R/O & offline zones [Dmitry, Damien]
  * fix locking on non-zap related cases [Stefan]
  * cleanups and typos correction
- add "zap -p" option to qemuio-cmds [Stefan]

v6:
- add small fixes

v5:
- fix locking conditions and error handling
- drop some trival optimizations
- add tracing points for zone append

v4:
- fix lock related issues[Damien]
- drop all field in zone_mgmt op [Damien]
- fix state checks in zong_mgmt command [Damien]
- return start sector of wp when issuing zap req [Damien]

v3:
- only read wps when it is locked [Damien]
- allow last smaller zone case [Damien]
- add zone type and state checks in zone_mgmt command [Damien]
- fix RESET_ALL related problems

v2:
- split patch to two patches for better reviewing
- change BlockZoneWps's structure to an array of integers
- use only mutex lock on locking conditions of zone wps
- coding styles and clean-ups

v1:
- introduce zone append write

Sam Li (4):
  file-posix: add tracking of the zone write pointers
  block: introduce zone append write for zoned devices
  qemu-iotests: test zone append operation
  block: add some trace events for zone append

 block/block-backend.c  |  60 
 block/file-posix.c | 221 -
 block/io.c |  27 
 block/io_uring.c   |   4 +
 block/linux-aio.c  |   3 +
 block/raw-format.c |   8 ++
 block/trace-events |   2 +
 include/block/block-common.h   |  14 ++
 include/block/block-io.h   |   4 +
 include/block/block_int-common.h   |   8 ++
 include/block/raw-aio.h|   4 +-
 include/sysemu/block-backend-io.h  |   9 ++
 qemu-io-cmds.c |  75 ++
 tests/qemu-iotests/tests/zoned |  16 +++
 tests/qemu-iotests/tests/zoned.out |  16 +++
 15 files changed, 464 insertions(+), 7 deletions(-)

-- 
2.39.2

[PATCH v8 2/4] block: introduce zone append write for zoned devices

2023-04-04 Thread Sam Li

A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the call may point at
any position within the zone to which the data is being appended. Upon
completion the device will respond with the position where the data has
been written in the zone.

Signed-off-by: Sam Li 
Reviewed-by: Dmitry Fomichev 
---
 block/block-backend.c | 60 +++
 block/file-posix.c| 56 +
 block/io.c| 27 ++
 block/io_uring.c  |  4 +++
 block/linux-aio.c |  3 ++
 block/raw-format.c|  8 +
 include/block/block-io.h  |  4 +++
 include/block/block_int-common.h  |  3 ++
 include/block/raw-aio.h   |  4 ++-
 include/sysemu/block-backend-io.h |  9 +
 10 files changed, 171 insertions(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f70b08e3f6..bcb3a1eff0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1888,6 +1888,45 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 return &acb->common;
 }
 
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)acb->bytes,
+   rwco->iobuf, rwco->flags);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque) {
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.ret= NOT_DONE,
+.flags  = flags,
+.iobuf  = qiov,
+};
+acb->bytes = (int64_t)offset;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
+aio_co_enter(blk_get_aio_context(blk), co);
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
 /*
  * Send a zone_report command.
  * offset is a byte offset from the start of the device. No alignment
@@ -1939,6 +1978,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 return ret;
 }
 
+/*
+ * Send a zone_append command.
+ */
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+blk_wait_while_drained(blk);
+if (!blk_is_available(blk)) {
+blk_dec_in_flight(blk);
+return -ENOMEDIUM;
+}
+
+ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
+blk_dec_in_flight(blk);
+return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index bc58f7193b..a7130b1024 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -160,6 +160,7 @@ typedef struct BDRVRawState {
 bool has_write_zeroes:1;
 bool use_linux_aio:1;
 bool use_linux_io_uring:1;
+int64_t *offset; /* offset of zone append operation */
 int page_cache_inconsistent; /* errno from fdatasync failure */
 bool has_fallocate;
 bool needs_alignment;
@@ -1685,7 +1686,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData 
*aiocb)
 ssize_t len;
 
 len = RETRY_ON_EINTR(
-(aiocb->aio_type & QEMU_AIO_WRITE) ?
+(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
 qemu_pwritev(aiocb->aio_fildes,
aiocb->io.iov,
aiocb->io.niov,
@@ -1714,7 +1715,7 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData 
*aiocb, char *buf)
 ssize_t len;
 
 while (offset < aiocb->aio_nbytes) {
-if (aiocb->aio_type & QEMU_AIO_WRITE) {
+if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
 len = pwrite(aiocb->aio_fildes,
  (const char *)buf + offset,
  aiocb->aio_nbytes - offset,
@@ -1807,7 +1808,7 @@ static int handle_aiocb_rw(void *opaque)
 }
 
 nbytes = handle_aiocb_rw_linear(aiocb, buf);
-if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
+if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
 char *p = buf;
 size_t count = aiocb->aio_nbytes, copy;
 int i;
@@ -2442,8 +2443,12 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, 
ui

[PULL 1/1] nbd/server: Request TCP_NODELAY

2023-04-04 Thread Eric Blake

Nagle's algorithm adds latency in order to reduce network packet
overhead on small packets.  But when we are already using corking to
merge smaller packets into transactional requests, the extra delay
from TCP defaults just gets in the way (see recent commit bd2cd4a4).

For reference, qemu as an NBD client already requests TCP_NODELAY (see
nbd_connect() in nbd/client-connection.c); as does libnbd as a client
[1], and nbdkit as a server [2].  Furthermore, the NBD spec recommends
the use of TCP_NODELAY [3].

[1] 
https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39
[2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430
[3] 
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases

CC: Florian Westphal 
Signed-off-by: Eric Blake 
Message-Id: <20230404004047.142086-1-ebl...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
---
 nbd/server.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/nbd/server.c b/nbd/server.c
index 848836d4140..3d8d0d81df2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2758,6 +2758,7 @@ void nbd_client_new(QIOChannelSocket *sioc,
 }
 client->tlsauthz = g_strdup(tlsauthz);
 client->sioc = sioc;
+qio_channel_set_delay(QIO_CHANNEL(sioc), false);
 object_ref(OBJECT(client->sioc));
 client->ioc = QIO_CHANNEL(sioc);
 object_ref(OBJECT(client->ioc));
-- 
2.39.2

Re: s390x TCG migration failure

2023-04-04 Thread Thomas Huth


On 29/03/2023 08.36, Thomas Huth wrote:

On 29/03/2023 00.21, Nina Schoetterl-Glausch wrote:

On Tue, 2023-03-28 at 15:01 +0200, Thomas Huth wrote:

On 24/03/2023 19.41, Nina Schoetterl-Glausch wrote:

Hi,

We're seeing failures running s390x migration kvm-unit-tests tests with 
TCG.

Some initial findings:
What seems to be happening is that after migration a control block 
header accessed by the test code is all zeros which causes an unexpected 
exception.
I did a bisection which points to c8df4a7aef ("migration: Split 
save_live_pending() into state_pending_*") as the culprit.
The migration issue persists after applying the fix e264705012 
("migration: I messed state_pending_exact/estimate") on top of c8df4a7aef.


   Hi Nina,

could you please open a ticket in the QEMU bug tracker and add the "8.0"
label there, so that it correctly shows up in the list of things that should
be fixed before the 8.0 release?


https://gitlab.com/qemu-project/qemu/-/issues/1565 


Thanks!


Ping!

Juan, did you have a chance to look at this issue yet? ... we're getting 
quite close to the 8.0 release, and IMHO this sounds like a critical bug?


 Thomas

Re: [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event

2023-04-04 Thread Michael S. Tsirkin

On Mon, Apr 03, 2023 at 02:29:52PM -0400, Stefan Hajnoczi wrote:
> Only report a transport reset event to the guest after the SCSIDevice
> has been unrealized by qdev_simple_device_unplug_cb().
> 
> qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> to false so that scsi_device_find/get() no longer see it.
> 
> scsi_target_emulate_report_luns() also needs to be updated to filter out
> SCSIDevices that are unrealized.
> 
> These changes ensure that the guest driver does not see the SCSIDevice
> that's being unplugged if it responds very quickly to the transport
> reset event.
> 
> Signed-off-by: Stefan Hajnoczi 


Reviewed-by: Michael S. Tsirkin 

Feel free to merge.

> ---
>  hw/scsi/scsi-bus.c|  3 ++-
>  hw/scsi/virtio-scsi.c | 18 +-
>  2 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> index c97176110c..f9bd064833 100644
> --- a/hw/scsi/scsi-bus.c
> +++ b/hw/scsi/scsi-bus.c
> @@ -487,7 +487,8 @@ static bool scsi_target_emulate_report_luns(SCSITargetReq 
> *r)
>  DeviceState *qdev = kid->child;
>  SCSIDevice *dev = SCSI_DEVICE(qdev);
>  
> -if (dev->channel == channel && dev->id == id && dev->lun != 0) {
> +if (dev->channel == channel && dev->id == id && dev->lun != 0 &&
> +qatomic_load_acquire(&dev->qdev.realized)) {
>  store_lun(tmp, dev->lun);
>  g_byte_array_append(buf, tmp, 8);
>  len += 8;
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 612c525d9d..000961446c 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -1063,15 +1063,6 @@ static void virtio_scsi_hotunplug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  SCSIDevice *sd = SCSI_DEVICE(dev);
>  AioContext *ctx = s->ctx ?: qemu_get_aio_context();
>  
> -if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> -virtio_scsi_acquire(s);
> -virtio_scsi_push_event(s, sd,
> -   VIRTIO_SCSI_T_TRANSPORT_RESET,
> -   VIRTIO_SCSI_EVT_RESET_REMOVED);
> -scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> -virtio_scsi_release(s);
> -}
> -
>  aio_disable_external(ctx);
>  qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
>  aio_enable_external(ctx);
> @@ -1082,6 +1073,15 @@ static void virtio_scsi_hotunplug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL);
>  virtio_scsi_release(s);
>  }
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> +virtio_scsi_acquire(s);
> +virtio_scsi_push_event(s, sd,
> +   VIRTIO_SCSI_T_TRANSPORT_RESET,
> +   VIRTIO_SCSI_EVT_RESET_REMOVED);
> +scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
> +virtio_scsi_release(s);
> +}
>  }
>  
>  static struct SCSIBusInfo virtio_scsi_scsi_info = {
> -- 
> 2.39.2

Re: [PATCH v2 05/11] qemu-options: finesse the recommendations around -blockdev

2023-04-04 Thread Michael Tokarev


04.04.2023 16:57, Kevin Wolf пишет:

Let's not make the use of -drive look more advisable than it really is.
If you're writing a management tool/script and you're still using -drive
today, you're doing it wrong.


Kevin, maybe I'm wrong here, but what to do with the situation which
started it all, -- with -snapshot?

If anything, I think there should be a bold note that -snapshot is broken
by -blockdev.  Users are learning that the *hard* way, after losing their
data..

/mjt

[PULL 07/10] tests/qemu-iotests: explicitly invoke 'check' via 'python'

2023-04-04 Thread Alex Bennée

From: Daniel P. Berrangé 

The 'check' script will use "#!/usr/bin/env python3" by default
to locate python, but this doesn't work in distros which lack a
bare 'python3' binary like NetBSD.

We need to explicitly invoke 'check' by referring to the 'python'
variable in meson, which resolves to the detected python binary
that QEMU intends to use.

This fixes a regression introduced by

  commit 51ab5f8bd795d8980351f8531e54995ff9e6d163
  Author: Daniel P. Berrangé 
  Date:   Wed Mar 15 17:43:23 2023 +

iotests: register each I/O test separately with meson

Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230329124539.822022-1-berra...@redhat.com>
Signed-off-by: Alex Bennée 
Reviewed-by: Thomas Huth 
Message-Id: <20230403134920.2132362-9-alex.ben...@linaro.org>

diff --git a/tests/qemu-iotests/meson.build b/tests/qemu-iotests/meson.build
index a162f683ef..9735071a29 100644
--- a/tests/qemu-iotests/meson.build
+++ b/tests/qemu-iotests/meson.build
@@ -47,19 +47,20 @@ foreach format, speed: qemu_iotests_formats
   endif
 
   rc = run_command(
-  [qemu_iotests_check_cmd] + args + ['-n'],
+  [python, qemu_iotests_check_cmd] + args + ['-n'],
   check: true,
   )
 
   foreach item: rc.stdout().strip().split()
-  args = ['-tap', '-' + format, item,
+  args = [qemu_iotests_check_cmd,
+  '-tap', '-' + format, item,
   '--source-dir', meson.current_source_dir(),
   '--build-dir', meson.current_build_dir()]
   # Some individual tests take as long as 45 seconds
   # Bump the timeout to 3 minutes for some headroom
   # on slow machines to minimize spurious failures
   test('io-' + format + '-' + item,
-   qemu_iotests_check_cmd,
+   python,
args: args,
depends: qemu_iotests_binaries,
env: qemu_iotests_env,
-- 
2.39.2

Re: [PATCH v2 05/11] qemu-options: finesse the recommendations around -blockdev

2023-04-04 Thread Alex Bennée



Kevin Wolf  writes:

> Am 03.04.2023 um 15:49 hat Alex Bennée geschrieben:
>> We are a bit premature in recommending -blockdev/-device as the best
>> way to configure block devices, especially in the common case.
>> Improve the language to hopefully make things clearer.
>> 
>> Suggested-by: Michael Tokarev 
>> Signed-off-by: Alex Bennée 
>> Reviewed-by: Thomas Huth 
>> Message-Id: <20230330101141.30199-5-alex.ben...@linaro.org>
>> ---
>>  qemu-options.hx | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 59bdf67a2c..9a69ed838e 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -1143,10 +1143,14 @@ have gone through several iterations as the feature 
>> set and complexity
>>  of the block layer have grown. Many online guides to QEMU often
>>  reference older and deprecated options, which can lead to confusion.
>>  
>> -The recommended modern way to describe disks is to use a combination of
>> +The most explicit way to describe disks is to use a combination of
>>  ``-device`` to specify the hardware device and ``-blockdev`` to
>>  describe the backend. The device defines what the guest sees and the
>> -backend describes how QEMU handles the data.
>> +backend describes how QEMU handles the data. The ``-drive`` option
>> +combines the device and backend into a single command line options
>> +which is useful in the majority of cases. Older options like ``-hda``
>> +bake in a lot of assumptions from the days when QEMU was emulating a
>> +legacy PC, they are not recommended for modern configurations.
>
> Let's not make the use of -drive look more advisable than it really is.
> If you're writing a management tool/script and you're still using -drive
> today, you're doing it wrong.
>
> Maybe this is actually the point where we should just clearly define
> that -blockdev is the only supported stable API (like QMP), and that
> -drive etc. are convenient shortcuts for human users with no
> compatibility promise (like HMP).

OK I'll drop this patch from today's PR and await a better description
in due course.

>
> What stopped us from doing so is that there are certain boards that
> don't allow the user to configure the onboard devices, but that look at
> -drive. These wouldn't provide any stable API any more after this
> change. However, if this hasn't been solved in many years, maybe it's
> time to view it as the board's problem, and use this change to motivate
> them to implement ways to configure the devices. Or maybe some don't
> even want to bother with a stable API, who knows.
>
> Kevin


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v7 1/4] file-posix: add tracking of the zone write pointers

2023-04-04 Thread Sam Li

Stefan Hajnoczi  于2023年4月4日周二 01:04写道：
>
> On Thu, Mar 23, 2023 at 01:19:04PM +0800, Sam Li wrote:
> > Since Linux doesn't have a user API to issue zone append operations to
> > zoned devices from user space, the file-posix driver is modified to add
> > zone append emulation using regular writes. To do this, the file-posix
> > driver tracks the wp location of all zones of the device. It uses an
> > array of uint64_t. The most significant bit of each wp location indicates
> > if the zone type is conventional zones.
> >
> > The zones wp can be changed due to the following operations issued:
> > - zone reset: change the wp to the start offset of that zone
> > - zone finish: change to the end location of that zone
> > - write to a zone
> > - zone append
> >
> > Signed-off-by: Sam Li 
> > ---
> >  block/file-posix.c   | 168 ++-
> >  include/block/block-common.h |  14 +++
> >  include/block/block_int-common.h |   5 +
> >  3 files changed, 183 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 65efe5147e..0fb425dcae 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1324,6 +1324,85 @@ static int hdev_get_max_segments(int fd, struct stat 
> > *st)
> >  #endif
> >  }
> >
> > +#if defined(CONFIG_BLKZONED)
> > +/*
> > + * If the ra (reset_all) flag > 0, then the wp of that zone should be 
> > reset to
> > + * the start sector. Else, take the real wp of the device.
> > + */
> > +static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> > +unsigned int nrz, int ra) {
>
> Please use bool for true/false and use clear variable names:
> int ra -> bool reset_all
>
> > +struct blk_zone *blkz;
> > +size_t rep_size;
> > +uint64_t sector = offset >> BDRV_SECTOR_BITS;
> > +int ret, n = 0, i = 0;
> > +rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct 
> > blk_zone);
> > +g_autofree struct blk_zone_report *rep = NULL;
> > +
> > +rep = g_malloc(rep_size);
> > +blkz = (struct blk_zone *)(rep + 1);
> > +while (n < nrz) {
> > +memset(rep, 0, rep_size);
> > +rep->sector = sector;
> > +rep->nr_zones = nrz - n;
> > +
> > +do {
> > +ret = ioctl(fd, BLKREPORTZONE, rep);
> > +} while (ret != 0 && errno == EINTR);
> > +if (ret != 0) {
> > +error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed 
> > %d",
> > +fd, offset, errno);
> > +return -errno;
> > +}
> > +
> > +if (!rep->nr_zones) {
> > +break;
> > +}
> > +
> > +for (i = 0; i < rep->nr_zones; i++, n++) {
> > +/*
> > + * The wp tracking cares only about sequential writes required 
> > and
> > + * sequential write preferred zones so that the wp can advance 
> > to
> > + * the right location.
> > + * Use the most significant bit of the wp location to indicate 
> > the
> > + * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
> > + */
> > +if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> > +wps->wp[i] &= 1ULL << 63;
> > +} else {
> > +switch(blkz[i].cond) {
> > +case BLK_ZONE_COND_FULL:
> > +case BLK_ZONE_COND_READONLY:
> > +/* Zone not writable */
> > +wps->wp[i] = (blkz[i].start + blkz[i].len) << 
> > BDRV_SECTOR_BITS;
> > +break;
> > +case BLK_ZONE_COND_OFFLINE:
> > +/* Zone not writable nor readable */
> > +wps->wp[i] = (blkz[i].start) << BDRV_SECTOR_BITS;
> > +break;
> > +default:
> > +if (ra > 0) {
> > +wps->wp[i] = blkz[i].start << BDRV_SECTOR_BITS;
> > +} else {
> > +wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
> > +}
> > +break;
> > +}
> > +}
> > +}
> > +sector = blkz[i - 1].start + blkz[i - 1].len;
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> > +unsigned int nrz) {
>
> QEMU coding style puts the opening curly bracket on a new line:
>
>   static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
>   unsigned int nrz)
>   {
>
> > +if (get_zones_wp(fd, wps, offset, nrz, 0) < 0) {
> > +error_report("update zone wp failed");
> > +}
> > +}
> > +#endif
> > +
> >  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >  {
> >  BDRVRawState *s = bs->opaque;
> > @@ -1413,6 +1492,21 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> > Error **errp)
> >

Re: [PATCH v2 05/11] qemu-options: finesse the recommendations around -blockdev

2023-04-04 Thread Kevin Wolf

Am 03.04.2023 um 15:49 hat Alex Bennée geschrieben:
> We are a bit premature in recommending -blockdev/-device as the best
> way to configure block devices, especially in the common case.
> Improve the language to hopefully make things clearer.
> 
> Suggested-by: Michael Tokarev 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Thomas Huth 
> Message-Id: <20230330101141.30199-5-alex.ben...@linaro.org>
> ---
>  qemu-options.hx | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 59bdf67a2c..9a69ed838e 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -1143,10 +1143,14 @@ have gone through several iterations as the feature 
> set and complexity
>  of the block layer have grown. Many online guides to QEMU often
>  reference older and deprecated options, which can lead to confusion.
>  
> -The recommended modern way to describe disks is to use a combination of
> +The most explicit way to describe disks is to use a combination of
>  ``-device`` to specify the hardware device and ``-blockdev`` to
>  describe the backend. The device defines what the guest sees and the
> -backend describes how QEMU handles the data.
> +backend describes how QEMU handles the data. The ``-drive`` option
> +combines the device and backend into a single command line options
> +which is useful in the majority of cases. Older options like ``-hda``
> +bake in a lot of assumptions from the days when QEMU was emulating a
> +legacy PC, they are not recommended for modern configurations.

Let's not make the use of -drive look more advisable than it really is.
If you're writing a management tool/script and you're still using -drive
today, you're doing it wrong.

Maybe this is actually the point where we should just clearly define
that -blockdev is the only supported stable API (like QMP), and that
-drive etc. are convenient shortcuts for human users with no
compatibility promise (like HMP).

What stopped us from doing so is that there are certain boards that
don't allow the user to configure the onboard devices, but that look at
-drive. These wouldn't provide any stable API any more after this
change. However, if this hasn't been solved in many years, maybe it's
time to view it as the board's problem, and use this change to motivate
them to implement ways to configure the devices. Or maybe some don't
even want to bother with a stable API, who knows.

Kevin

Re: [PATCH v2 for 8.0?] nbd/server: Request TCP_NODELAY

2023-04-04 Thread Paolo Bonzini


On 4/4/23 02:40, Eric Blake wrote:

Nagle's algorithm adds latency in order to reduce network packet
overhead on small packets.  But when we are already using corking to
merge smaller packets into transactional requests, the extra delay
from TCP defaults just gets in the way (see recent commit bd2cd4a4).

For reference, qemu as an NBD client already requests TCP_NODELAY (see
nbd_connect() in nbd/client-connection.c); as does libnbd as a client
[1], and nbdkit as a server [2].  Furthermore, the NBD spec recommends
the use of TCP_NODELAY [3].

[1] 
https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39
[2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430
[3] 
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases

CC: Florian Westphal 
Signed-off-by: Eric Blake 
Message-Id: <20230327192947.1324372-1-ebl...@redhat.com>
---

v2 fix typo, enhance commit message

Given that corking made it in through Kevin's tree for 8.0-rc2 but
this one did not, but I didn't get any R-b, is there any objection to
me doing a pull request to get this into 8.0-rc3?

  nbd/server.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/nbd/server.c b/nbd/server.c
index 848836d4140..3d8d0d81df2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2758,6 +2758,7 @@ void nbd_client_new(QIOChannelSocket *sioc,
  }
  client->tlsauthz = g_strdup(tlsauthz);
  client->sioc = sioc;
+qio_channel_set_delay(QIO_CHANNEL(sioc), false);
  object_ref(OBJECT(client->sioc));
  client->ioc = QIO_CHANNEL(sioc);
  object_ref(OBJECT(client->ioc));

base-commit: efcd0ec14b0fe9ee0ee70277763b2d538d19238d


Acked-by: Paolo Bonzini

Re: [PATCH] block/nvme: use AIO_WAIT_WHILE_UNLOCKED()

2023-04-04 Thread Paolo Bonzini


On 4/4/23 13:20, Stefan Hajnoczi wrote:

A few Admin Queue commands are submitted during nvme_file_open(). They
are synchronous since device initialization cannot continue until the
commands complete.

AIO_WAIT_WHILE() is currently used, but the block/nvme.c code actually
doesn't rely on the AioContext lock. Replace it with
AIO_WAIT_WHILE_UNLOCKED(NULL, condition). There is no change in behavior
and the dependency on the AioContext lock is eliminated.

This is a step towards removing the AioContext lock.

Signed-off-by: Stefan Hajnoczi 
---
  block/nvme.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 5b744c2bda..829b9c04db 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -512,7 +512,6 @@ static int nvme_admin_cmd_sync(BlockDriverState *bs, 
NvmeCmd *cmd)
  {
  BDRVNVMeState *s = bs->opaque;
  NVMeQueuePair *q = s->queues[INDEX_ADMIN];
-AioContext *aio_context = bdrv_get_aio_context(bs);
  NVMeRequest *req;
  int ret = -EINPROGRESS;
  req = nvme_get_free_req_nowait(q);
@@ -521,7 +520,7 @@ static int nvme_admin_cmd_sync(BlockDriverState *bs, 
NvmeCmd *cmd)
  }
  nvme_submit_command(q, req, cmd, nvme_admin_cmd_sync_cb, &ret);
  
-AIO_WAIT_WHILE(aio_context, ret == -EINPROGRESS);

+AIO_WAIT_WHILE_UNLOCKED(NULL, ret == -EINPROGRESS);
  return ret;
  }




Reviewed-by: Paolo Bonzini

Re: [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event

2023-04-04 Thread Paolo Bonzini


On 4/4/23 15:06, Stefan Hajnoczi wrote:

Would this be more useful as a qdev_is_realized() helper?

Yes. There are no other users, but I think a helper makes sense.


Agreed; anyway,

Reviewed-by: Paolo Bonzini 

Paolo

Re: [PATCH 11/13] block/fuse: take AioContext lock around blk_exp_ref/unref()

2023-04-04 Thread Paolo Bonzini


On 4/3/23 20:30, Stefan Hajnoczi wrote:

These functions must be called with the AioContext acquired:

   /* Callers must hold exp->ctx lock */
   void blk_exp_ref(BlockExport *exp)
   ...
   /* Callers must hold exp->ctx lock */
   void blk_exp_unref(BlockExport *exp)

Signed-off-by: Stefan Hajnoczi 
---
  block/export/fuse.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 06fa41079e..18394f9e07 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -244,7 +244,9 @@ static void read_from_fuse_export(void *opaque)
  FuseExport *exp = opaque;
  int ret;
  
+aio_context_acquire(exp->common.ctx);

  blk_exp_ref(&exp->common);
+aio_context_release(exp->common.ctx);
  
  do {

  ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf);
@@ -256,7 +258,9 @@ static void read_from_fuse_export(void *opaque)
  fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
  
  out:

+aio_context_acquire(exp->common.ctx);
  blk_exp_unref(&exp->common);
+aio_context_release(exp->common.ctx);
  }


Since the actual thread-unsafe work is done in a bottom half, perhaps 
instead you can use qatomic_inc and qatomic_fetch_dec in 
blk_exp_{ref,unref}?


Paolo

Re: [PATCH 00/13] block: remove aio_disable_external() API

2023-04-04 Thread Paolo Bonzini


On 4/3/23 20:29, Stefan Hajnoczi wrote:

The aio_disable_external() API temporarily suspends file descriptor monitoring
in the event loop. The block layer uses this to prevent new I/O requests being
submitted from the guest and elsewhere between bdrv_drained_begin() and
bdrv_drained_end().

While the block layer still needs to prevent new I/O requests in drained
sections, the aio_disable_external() API can be replaced with
.drained_begin/end/poll() callbacks that have been added to BdrvChildClass and
BlockDevOps.

This newer .bdrained_begin/end/poll() approach is attractive because it works
without specifying a specific AioContext. The block layer is moving towards
multi-queue and that means multiple AioContexts may be processing I/O
simultaneously.

The aio_disable_external() was always somewhat hacky. It suspends all file
descriptors that were registered with is_external=true, even if they have
nothing to do with the BlockDriverState graph nodes that are being drained.
It's better to solve a block layer problem in the block layer than to have an
odd event loop API solution.

That covers the motivation for this change, now on to the specifics of this
series:

While it would be nice if a single conceptual approach could be applied to all
is_external=true file descriptors, I ended up looking at callers on a
case-by-case basis. There are two general ways I migrated code away from
is_external=true:

1. Block exports are typically best off unregistering fds in .drained_begin()
and registering them again in .drained_end(). The .drained_poll() function
waits for in-flight requests to finish using a reference counter.

2. Emulated storage controllers like virtio-blk and virtio-scsi are a little
simpler. They can rely on BlockBackend's request queuing during drain
feature. Guest I/O request coroutines are suspended in a drained section and
resume upon the end of the drained section.


Sorry, I disagree with this.

Request queuing was shown to cause deadlocks; Hanna's latest patch is 
piling another hack upon it, instead in my opinion we should go in the 
direction of relying _less_ (or not at all) on request queuing.


I am strongly convinced that request queuing must apply only after 
bdrv_drained_begin has returned, which would also fix the IDE TRIM bug 
reported by Fiona Ebner.  The possible livelock scenario is generally 
not a problem because 1) outside an iothread you have anyway the BQL 
that prevents a vCPU from issuing more I/O operations during 
bdrv_drained_begin 2) in iothreads you have aio_disable_external() 
instead of .drained_begin().


It is also less tidy to start a request during the drained_begin phase, 
because a request that has been submitted has to be completed (cancel 
doesn't really work).


So in an ideal world, request queuing would not only apply only after 
bdrv_drained_begin has returned, it would log a warning and 
.drained_begin() should set up things so that there are no such warnings.


Thanks,

Paolo

Re: [PATCH 04/13] util/vhost-user-server: rename refcount to in_flight counter

2023-04-04 Thread Paolo Bonzini


On 4/3/23 20:29, Stefan Hajnoczi wrote:

The VuServer object has a refcount field and ref/unref APIs. The name is
confusing because it's actually an in-flight request counter instead of
a refcount.

Normally a refcount destroys the object upon reaching zero. The VuServer
counter is used to wake up the vhost-user coroutine when there are no
more requests.

Avoid confusing by renaming refcount and ref/unref to in_flight and
inc/dec.

Signed-off-by: Stefan Hajnoczi 
---
  include/qemu/vhost-user-server.h |  6 +++---
  block/export/vhost-user-blk-server.c | 11 +++
  util/vhost-user-server.c | 14 +++---
  3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h
index 25c72433ca..bc0ac9ddb6 100644
--- a/include/qemu/vhost-user-server.h
+++ b/include/qemu/vhost-user-server.h
@@ -41,7 +41,7 @@ typedef struct {
  const VuDevIface *vu_iface;
  
  /* Protected by ctx lock */

-unsigned int refcount;
+unsigned int in_flight;
  bool wait_idle;
  VuDev vu_dev;
  QIOChannel *ioc; /* The I/O channel with the client */
@@ -60,8 +60,8 @@ bool vhost_user_server_start(VuServer *server,
  
  void vhost_user_server_stop(VuServer *server);
  
-void vhost_user_server_ref(VuServer *server);

-void vhost_user_server_unref(VuServer *server);
+void vhost_user_server_inc_in_flight(VuServer *server);
+void vhost_user_server_dec_in_flight(VuServer *server);
  
  void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx);

  void vhost_user_server_detach_aio_context(VuServer *server);
diff --git a/block/export/vhost-user-blk-server.c 
b/block/export/vhost-user-blk-server.c
index 3409d9e02e..e93f2ed6b4 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -49,7 +49,10 @@ static void vu_blk_req_complete(VuBlkReq *req, size_t in_len)
  free(req);
  }
  
-/* Called with server refcount increased, must decrease before returning */

+/*
+ * Called with server in_flight counter increased, must decrease before
+ * returning.
+ */
  static void coroutine_fn vu_blk_virtio_process_req(void *opaque)
  {
  VuBlkReq *req = opaque;
@@ -67,12 +70,12 @@ static void coroutine_fn vu_blk_virtio_process_req(void 
*opaque)
  in_num, out_num);
  if (in_len < 0) {
  free(req);
-vhost_user_server_unref(server);
+vhost_user_server_dec_in_flight(server);
  return;
  }
  
  vu_blk_req_complete(req, in_len);

-vhost_user_server_unref(server);
+vhost_user_server_dec_in_flight(server);
  }
  
  static void vu_blk_process_vq(VuDev *vu_dev, int idx)

@@ -94,7 +97,7 @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx)
  Coroutine *co =
  qemu_coroutine_create(vu_blk_virtio_process_req, req);
  
-vhost_user_server_ref(server);

+vhost_user_server_inc_in_flight(server);
  qemu_coroutine_enter(co);
  }
  }
diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 5b6216069c..1622f8cfb3 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -75,16 +75,16 @@ static void panic_cb(VuDev *vu_dev, const char *buf)
  error_report("vu_panic: %s", buf);
  }
  
-void vhost_user_server_ref(VuServer *server)

+void vhost_user_server_inc_in_flight(VuServer *server)
  {
  assert(!server->wait_idle);
-server->refcount++;
+server->in_flight++;
  }
  
-void vhost_user_server_unref(VuServer *server)

+void vhost_user_server_dec_in_flight(VuServer *server)
  {
-server->refcount--;
-if (server->wait_idle && !server->refcount) {
+server->in_flight--;
+if (server->wait_idle && !server->in_flight) {
  aio_co_wake(server->co_trip);
  }
  }
@@ -192,13 +192,13 @@ static coroutine_fn void vu_client_trip(void *opaque)
  /* Keep running */
  }
  
-if (server->refcount) {

+if (server->in_flight) {
  /* Wait for requests to complete before we can unmap the memory */
  server->wait_idle = true;
  qemu_coroutine_yield();
  server->wait_idle = false;
  }
-assert(server->refcount == 0);
+assert(server->in_flight == 0);
  
  vu_deinit(vu_dev);
  


Reviewed-by: Paolo Bonzini

Re: [PATCH 02/13] virtio-scsi: stop using aio_disable_external() during unplug

2023-04-04 Thread Paolo Bonzini


On 4/3/23 20:29, Stefan Hajnoczi wrote:

This patch is part of an effort to remove the aio_disable_external()
API because it does not fit in a multi-queue block layer world where
many AioContexts may be submitting requests to the same disk.

The SCSI emulation code is already in good shape to stop using
aio_disable_external(). It was only used by commit 9c5aad84da1c
("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi
disk") to ensure that virtio_scsi_hotunplug() works while the guest
driver is submitting I/O.

Ensure virtio_scsi_hotunplug() is safe as follows:

1. qdev_simple_device_unplug_cb() -> qdev_unrealize() ->
device_set_realized() calls qatomic_set(&dev->realized, false) so
that future scsi_device_get() calls return NULL because they exclude
SCSIDevices with realized=false.

That means virtio-scsi will reject new I/O requests to this
SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while
virtio_scsi_hotunplug() is still executing. We are protected against
new requests!

2. Add a call to scsi_device_purge_requests() from scsi_unrealize() so
that in-flight requests are cancelled synchronously. This ensures
that no in-flight requests remain once qdev_simple_device_unplug_cb()
returns.

Thanks to these two conditions we don't need aio_disable_external()
anymore.

Cc: Zhengui Li 
Signed-off-by: Stefan Hajnoczi 
---
  hw/scsi/scsi-disk.c   | 1 +
  hw/scsi/virtio-scsi.c | 3 ---
  2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 97c9b1c8cd..e01bd84541 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2522,6 +2522,7 @@ static void scsi_realize(SCSIDevice *dev, Error **errp)
  
  static void scsi_unrealize(SCSIDevice *dev)

  {
+scsi_device_purge_requests(dev, SENSE_CODE(RESET));
  del_boot_device_lchs(&dev->qdev, NULL);
  }
  
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c

index 000961446c..a02f9233ec 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1061,11 +1061,8 @@ static void virtio_scsi_hotunplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
  VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev);
  VirtIOSCSI *s = VIRTIO_SCSI(vdev);
  SCSIDevice *sd = SCSI_DEVICE(dev);
-AioContext *ctx = s->ctx ?: qemu_get_aio_context();
  
-aio_disable_external(ctx);

  qdev_simple_device_unplug_cb(hotplug_dev, dev, errp);
-aio_enable_external(ctx);
  
  if (s->ctx) {

  virtio_scsi_acquire(s);


Reviewed-by: Paolo Bonzini

Re: [PATCH 15/16] qapi: Format since information the conventional way: (since X.Y)

2023-04-04 Thread Markus Armbruster

Vladimir Sementsov-Ogievskiy  writes:

> On 04.04.23 14:59, Markus Armbruster wrote:
>> @@ -2741,7 +2741,7 @@
>>   #
>>   # @on-error: the action to take on an error (default report).
>>   #'stop' and 'enospc' can only be used if the block device
>> -#supports io-status (see BlockInfo).  Since 1.3.
>> +#supports io-status (see BlockInfo).  (Since 1.3)
>>   #
>>   # @filter-node-name: the node name that should be assigned to the
>>   #filter driver that the stream job inserts into the 
>> graph
>> diff --git a/qapi/stats.json b/qapi/stats.json
>> index f17495ee65..36d5f4dc94 100644
>> --- a/qapi/stats.json
>> +++ b/qapi/stats.json
>> @@ -69,7 +69,7 @@
>>   #
>>   # @vcpu: statistics that apply to a single virtual CPU.
>>   #
>> -# @cryptodev: statistics that apply to a crypto device. since 8.0
>> +# @cryptodev: statistics that apply to a crypto device (since 8.0)
>>   #
>>   # Since: 7.1
>>   ##
>> diff --git a/qapi/tpm.json b/qapi/tpm.json
>> index 4e2ea9756a..eac87d30b2 100644
>> --- a/qapi/tpm.json
>> +++ b/qapi/tpm.json
>> @@ -44,8 +44,7 @@
>>   # An enumeration of TPM types
>>   #
>>   # @passthrough: TPM passthrough type
>> -# @emulator: Software Emulator TPM type
>> -#Since: 2.11
>> +# @emulator: Software Emulator TPM type (since 2.11)
>
> Seems, we don't have any preference between "some text (since VER)" vs "some 
> text. (Since VER)" ?

I personally use (Since VER) after a full sentence ending in
punctuation, and (since VER) after something that doesn't end in
punctuation.

Re: [PATCH 16/16] qapi storage-daemon/qapi: Fix documentation section structure

2023-04-04 Thread zhenwei pi


Looks good to me, thanks!

Acked-by: zhenwei pi 

On 4/4/23 19:59, Markus Armbruster wrote:

In the QEMU QMP Reference Manual, subsection "Block core (VM
unrelated)" is empty.  Its contents is at the end of subsection
"Background jobs" instead.  That's because qapi/job.json is includeded
first from qapi/block-core.json, which makes qapi/job.json's
documentation go between qapi/block-core.json's subsection heading and
contents.

In the QEMU Storage Daemon QMP Reference Manual, section "Block
Devices" contains nothing but an empty subsection "Block core (VM
unrelated)".  The latter's contents is at the end section "Socket data
types", along with subsection "Block device exports".  Subsection
"Background jobs" is at the end of section "Cryptography".  All this
is because storage-daemon/qapi/qapi-schema.json includes modules in a
confused order.

Fix both as follows.

Turn subsection "Background jobs" into a section.

Move it before section "Block devices" in the QEMU QMP Reference
Manual, by including qapi/jobs.json right before qapi/block.json.

Reorder include directives in storage-daemon/qapi/qapi-schema.json to
match the order in qapi/qapi-schema.json, so that the QEMU Storage
Daemon QMP Reference Manual's section structure the QEMU QMP Reference
Manual's.

In the QEMU QMP Reference Manual, qapi/cryptodev.json's documentation
is at the end of section "Virtio devices".  That's because it lacks a
section heading, and therefore gets squashed into whatever section
happens to precede it.

Add section heading so it's in section "Cryptography devices".

Signed-off-by: Markus Armbruster 
---
  qapi/cryptodev.json  |  4 
  qapi/job.json|  2 +-
  qapi/qapi-schema.json|  2 +-
  storage-daemon/qapi/qapi-schema.json | 22 +++---
  4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/qapi/cryptodev.json b/qapi/cryptodev.json
index f33f96a692..cf960ea81f 100644
--- a/qapi/cryptodev.json
+++ b/qapi/cryptodev.json
@@ -4,6 +4,10 @@
  # This work is licensed under the terms of the GNU GPL, version 2 or later.
  # See the COPYING file in the top-level directory.
  
+##

+# = Cryptography devices
+##
+
  ##
  # @QCryptodevBackendAlgType:
  #
diff --git a/qapi/job.json b/qapi/job.json
index bc4104757a..9e29a796c5 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -2,7 +2,7 @@
  # vim: filetype=python
  
  ##

-# == Background jobs
+# = Background jobs
  ##
  
  ##

diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index e57d8ff801..bb7217da26 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -43,11 +43,11 @@
  { 'include': 'sockets.json' }
  { 'include': 'run-state.json' }
  { 'include': 'crypto.json' }
+{ 'include': 'job.json' }
  { 'include': 'block.json' }
  { 'include': 'block-export.json' }
  { 'include': 'char.json' }
  { 'include': 'dump.json' }
-{ 'include': 'job.json' }
  { 'include': 'net.json' }
  { 'include': 'rdma.json' }
  { 'include': 'rocker.json' }
diff --git a/storage-daemon/qapi/qapi-schema.json 
b/storage-daemon/qapi/qapi-schema.json
index 67749d1101..f10c949490 100644
--- a/storage-daemon/qapi/qapi-schema.json
+++ b/storage-daemon/qapi/qapi-schema.json
@@ -15,18 +15,26 @@
  
  { 'include': '../../qapi/pragma.json' }
  
+# Documentation generated with qapi-gen.py is in source order, with

+# included sub-schemas inserted at the first include directive
+# (subsequent include directives have no effect).  To get a sane and
+# stable order, it's best to include each sub-schema just once, or
+# include it first right here.
+
+{ 'include': '../../qapi/common.json' }
+{ 'include': '../../qapi/sockets.json' }
+{ 'include': '../../qapi/crypto.json' }
+{ 'include': '../../qapi/job.json' }
+
  ##
  # = Block devices
  ##
  { 'include': '../../qapi/block-core.json' }
  { 'include': '../../qapi/block-export.json' }
+
  { 'include': '../../qapi/char.json' }
-{ 'include': '../../qapi/common.json' }
-{ 'include': '../../qapi/control.json' }
-{ 'include': '../../qapi/crypto.json' }
-{ 'include': '../../qapi/introspect.json' }
-{ 'include': '../../qapi/job.json' }
  { 'include': '../../qapi/authz.json' }
-{ 'include': '../../qapi/qom.json' }
-{ 'include': '../../qapi/sockets.json' }
  { 'include': '../../qapi/transaction.json' }
+{ 'include': '../../qapi/control.json' }
+{ 'include': '../../qapi/introspect.json' }
+{ 'include': '../../qapi/qom.json' }


--
zhenwei pi

Re: [PATCH 16/16] qapi storage-daemon/qapi: Fix documentation section structure

2023-04-04 Thread Eric Blake

On Tue, Apr 04, 2023 at 01:59:12PM +0200, Markus Armbruster wrote:
> In the QEMU QMP Reference Manual, subsection "Block core (VM
> unrelated)" is empty.  Its contents is at the end of subsection
> "Background jobs" instead.  That's because qapi/job.json is includeded

included

> first from qapi/block-core.json, which makes qapi/job.json's
> documentation go between qapi/block-core.json's subsection heading and
> contents.
> 
> In the QEMU Storage Daemon QMP Reference Manual, section "Block
> Devices" contains nothing but an empty subsection "Block core (VM
> unrelated)".  The latter's contents is at the end section "Socket data
> types", along with subsection "Block device exports".  Subsection
> "Background jobs" is at the end of section "Cryptography".  All this
> is because storage-daemon/qapi/qapi-schema.json includes modules in a
> confused order.
> 
> Fix both as follows.
> 
> Turn subsection "Background jobs" into a section.
> 
> Move it before section "Block devices" in the QEMU QMP Reference
> Manual, by including qapi/jobs.json right before qapi/block.json.
> 
> Reorder include directives in storage-daemon/qapi/qapi-schema.json to
> match the order in qapi/qapi-schema.json, so that the QEMU Storage
> Daemon QMP Reference Manual's section structure the QEMU QMP Reference
> Manual's.
> 
> In the QEMU QMP Reference Manual, qapi/cryptodev.json's documentation
> is at the end of section "Virtio devices".  That's because it lacks a
> section heading, and therefore gets squashed into whatever section
> happens to precede it.
> 
> Add section heading so it's in section "Cryptography devices".
> 
> Signed-off-by: Markus Armbruster 
> ---
>  qapi/cryptodev.json  |  4 
>  qapi/job.json|  2 +-
>  qapi/qapi-schema.json|  2 +-
>  storage-daemon/qapi/qapi-schema.json | 22 +++---
>  4 files changed, 21 insertions(+), 9 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 00/16] qapi qga/qapi-schema: Doc fixes

2023-04-04 Thread Marc-André Lureau

On Tue, Apr 4, 2023 at 4:02 PM Markus Armbruster  wrote:
>
> It's always nice to get doc fixes into the release, but if it's too
> late, it's too late.
>
> Generated code does not change, except for the last patch, which moves
> a bit of code without changing it.
>
> Markus Armbruster (16):
>   qga/qapi-schema: Tidy up documentation of guest-fsfreeze-status
>   qga/qapi-schema: Fix a misspelled reference
>   qapi: Fix misspelled references
>   qapi: Fix up references to long gone error classes
>   qapi/block-core: Clean up after removal of dirty bitmap @status
>   qapi: @foo should be used to reference, not ``foo``
>   qapi: Tidy up examples
>   qapi: Delete largely misleading "Stability Considerations"
>   qapi: Fix bullet list markup in documentation
>   qapi: Fix unintended definition lists in documentation
>   qga/qapi-schema: Fix member documentation markup
>   qapi: Fix argument documentation markup
>   qapi: Replace ad hoc "since" documentation by member documentation
>   qapi: Fix misspelled section tags in doc comments
>   qapi: Format since information the conventional way: (since X.Y)
>   qapi storage-daemon/qapi: Fix documentation section structure
>
>  docs/devel/qapi-code-gen.rst |  8 ++-
>  docs/interop/firmware.json   |  6 +-
>  qapi/block-core.json | 82 ++--
>  qapi/block.json  |  2 +-
>  qapi/char.json   |  4 +-
>  qapi/control.json|  2 +-
>  qapi/cryptodev.json  |  4 ++
>  qapi/job.json|  4 +-
>  qapi/machine-target.json |  2 +-
>  qapi/machine.json| 26 +
>  qapi/migration.json  | 37 -
>  qapi/misc.json   |  6 +-
>  qapi/net.json| 25 +++--
>  qapi/qapi-schema.json| 24 +---
>  qapi/qdev.json   |  2 +-
>  qapi/qom.json|  4 +-
>  qapi/rdma.json   |  2 +-
>  qapi/replay.json |  3 +
>  qapi/run-state.json  |  6 +-
>  qapi/stats.json  |  3 +-
>  qapi/tpm.json|  3 +-
>  qapi/trace.json  |  1 +
>  qapi/ui.json | 12 ++--
>  qga/qapi-schema.json | 10 ++--
>  storage-daemon/qapi/qapi-schema.json | 22 +---
>  25 files changed, 154 insertions(+), 146 deletions(-)
>
> --
> 2.39.2
>
>

lgtm,
Reviewed-by: Marc-André Lureau 



-- 
Marc-André Lureau

Re: [PATCH 01/13] virtio-scsi: avoid race between unplug and transport event

2023-04-04 Thread Stefan Hajnoczi

On Mon, Apr 03, 2023 at 10:47:11PM +0200, Philippe Mathieu-Daudé wrote:
> On 3/4/23 20:29, Stefan Hajnoczi wrote:
> > Only report a transport reset event to the guest after the SCSIDevice
> > has been unrealized by qdev_simple_device_unplug_cb().
> > 
> > qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field
> > to false so that scsi_device_find/get() no longer see it.
> > 
> > scsi_target_emulate_report_luns() also needs to be updated to filter out
> > SCSIDevices that are unrealized.
> > 
> > These changes ensure that the guest driver does not see the SCSIDevice
> > that's being unplugged if it responds very quickly to the transport
> > reset event.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   hw/scsi/scsi-bus.c|  3 ++-
> >   hw/scsi/virtio-scsi.c | 18 +-
> >   2 files changed, 11 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> > index c97176110c..f9bd064833 100644
> > --- a/hw/scsi/scsi-bus.c
> > +++ b/hw/scsi/scsi-bus.c
> > @@ -487,7 +487,8 @@ static bool 
> > scsi_target_emulate_report_luns(SCSITargetReq *r)
> >   DeviceState *qdev = kid->child;
> >   SCSIDevice *dev = SCSI_DEVICE(qdev);
> > -if (dev->channel == channel && dev->id == id && dev->lun != 0) 
> > {
> > +if (dev->channel == channel && dev->id == id && dev->lun != 0 
> > &&
> > +qatomic_load_acquire(&dev->qdev.realized)) {
> 
> Would this be more useful as a qdev_is_realized() helper?

Yes. There are no other users, but I think a helper makes sense.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH 00/16] qapi qga/qapi-schema: Doc fixes

2023-04-04 Thread Vladimir Sementsov-Ogievskiy


On 04.04.23 14:58, Markus Armbruster wrote:

It's always nice to get doc fixes into the release, but if it's too
late, it's too late.

Generated code does not change, except for the last patch, which moves
a bit of code without changing it.



I didn't deeply check the details, but looked through and nothing seems wrong 
to me. Good cleanup!

all patches:
Reviewed-by: Vladimir Sementsov-Ogievskiy 


PS: do you plan some automatic checks in build process to avoid similar 
style/naming problems in future?

--
Best regards,
Vladimir

Re: [PATCH 15/16] qapi: Format since information the conventional way: (since X.Y)

2023-04-04 Thread Vladimir Sementsov-Ogievskiy


On 04.04.23 14:59, Markus Armbruster wrote:

@@ -2741,7 +2741,7 @@
  #
  # @on-error: the action to take on an error (default report).
  #'stop' and 'enospc' can only be used if the block device
-#supports io-status (see BlockInfo).  Since 1.3.
+#supports io-status (see BlockInfo).  (Since 1.3)
  #
  # @filter-node-name: the node name that should be assigned to the
  #filter driver that the stream job inserts into the graph
diff --git a/qapi/stats.json b/qapi/stats.json
index f17495ee65..36d5f4dc94 100644
--- a/qapi/stats.json
+++ b/qapi/stats.json
@@ -69,7 +69,7 @@
  #
  # @vcpu: statistics that apply to a single virtual CPU.
  #
-# @cryptodev: statistics that apply to a crypto device. since 8.0
+# @cryptodev: statistics that apply to a crypto device (since 8.0)
  #
  # Since: 7.1
  ##
diff --git a/qapi/tpm.json b/qapi/tpm.json
index 4e2ea9756a..eac87d30b2 100644
--- a/qapi/tpm.json
+++ b/qapi/tpm.json
@@ -44,8 +44,7 @@
  # An enumeration of TPM types
  #
  # @passthrough: TPM passthrough type
-# @emulator: Software Emulator TPM type
-#Since: 2.11
+# @emulator: Software Emulator TPM type (since 2.11)


Seems, we don't have any preference between "some text (since VER)" vs "some text. 
(Since VER)" ?

--
Best regards,
Vladimir

Re: [PATCH 10/16] qapi: Fix unintended definition lists in documentation

2023-04-04 Thread Peter Maydell

On Tue, 4 Apr 2023 at 12:59, Markus Armbruster  wrote:
>
> rST parses something like
>
> first line
> second line
>
> as a definition list item, where "first line" is the term being
> defined by "second line".
>
> This bites us in a couple of places.  Here's one:
>
> # @bps_max: total throughput limit during bursts,
> # in bytes (Since 1.7)
>
> scripts/qapi/parser.py parses this into an "argument section" with
> name "bps_max" and text
>
> total throughput limit during bursts,
>   in bytes (Since 1.7)
>
> docs/sphinx/qapidoc.py duly passes the text to the rST parser, which
> parses it as another definition list.  Comes out as nested
> definitions: term "bps_max: int (optional)" defined as term "total
> throughput limit during bursts," defined as "in bytes (Since 1.7)".
>
> rST truly is the Perl of ASCII-based markups.
>
> Fix by deleting the extra indentation.
>
> Reported-by: Peter Maydell 
> Signed-off-by: Markus Armbruster 

Reviewed-by: Peter Maydell 

thanks
-- PMM

[PATCH 07/16] qapi: Tidy up examples

2023-04-04 Thread Markus Armbruster

A few examples neglect to prefix QMP input with '->'.  Fix that.

A few examples neglect to show output.  Provide some.  The example
output for query-vcpu-dirty-limit could use further improvement.  Add
a TODO comment.

Use "Examples:" instead of "Example:" where multiple examples are
given.

One example section numbers its two examples.  Not done elswehre;
drop.

Another example section separates them with "or".  Likewise.

Signed-off-by: Markus Armbruster 
---
 qapi/block-core.json | 14 ++
 qapi/block.json  |  2 +-
 qapi/char.json   |  4 ++--
 qapi/machine.json|  7 ---
 qapi/migration.json  | 33 ++---
 qapi/net.json|  4 +---
 qapi/qdev.json   |  2 +-
 qapi/qom.json|  2 +-
 qapi/replay.json |  3 +++
 qapi/ui.json |  2 +-
 10 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index eeb2ed3f16..a5a5007b28 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4574,9 +4574,8 @@
 #
 # Since: 2.9
 #
-# Example:
+# Examples:
 #
-# 1.
 # -> { "execute": "blockdev-add",
 #  "arguments": {
 #   "driver": "qcow2",
@@ -4589,7 +4588,6 @@
 # }
 # <- { "return": {} }
 #
-# 2.
 # -> { "execute": "blockdev-add",
 #  "arguments": {
 #   "driver": "qcow2",
@@ -5596,7 +5594,7 @@
 #
 # Since: 2.7
 #
-# Example:
+# Examples:
 #
 # 1. Add a new node to a quorum
 # -> { "execute": "blockdev-add",
@@ -5646,7 +5644,7 @@
 #
 # Since: 2.12
 #
-# Example:
+# Examples:
 #
 # 1. Move a node into an IOThread
 # -> { "execute": "x-blockdev-set-iothread",
@@ -5731,18 +5729,18 @@
 #
 # Since: 2.0
 #
-# Example:
+# Examples:
 #
 # 1. Read operation
 #
-# { "event": "QUORUM_REPORT_BAD",
+# <- { "event": "QUORUM_REPORT_BAD",
 #  "data": { "node-name": "node0", "sector-num": 345435, "sectors-count": 
5,
 #"type": "read" },
 #  "timestamp": { "seconds": 1344522075, "microseconds": 745528 } }
 #
 # 2. Flush operation
 #
-# { "event": "QUORUM_REPORT_BAD",
+# <- { "event": "QUORUM_REPORT_BAD",
 #  "data": { "node-name": "node0", "sector-num": 0, "sectors-count": 
2097120,
 #"type": "flush", "error": "Broken pipe" },
 #  "timestamp": { "seconds": 1456406829, "microseconds": 291763 } }
diff --git a/qapi/block.json b/qapi/block.json
index 5fe068f903..94339a1761 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -457,7 +457,7 @@
 #
 # Since: 1.1
 #
-# Example:
+# Examples:
 #
 # -> { "execute": "block_set_io_throttle",
 #  "arguments": { "id": "virtio-blk-pci0/virtio-backend",
diff --git a/qapi/char.json b/qapi/char.json
index 923dc5056d..c9431dd0a7 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -637,7 +637,7 @@
 #
 # Since: 1.4
 #
-# Example:
+# Examples:
 #
 # -> { "execute" : "chardev-add",
 #  "arguments" : { "id" : "foo",
@@ -673,7 +673,7 @@
 #
 # Since: 2.10
 #
-# Example:
+# Examples:
 #
 # -> { "execute" : "chardev-change",
 #  "arguments" : { "id" : "baz",
diff --git a/qapi/machine.json b/qapi/machine.json
index 8c3c58c763..20541cb319 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -954,7 +954,7 @@
 #
 # Since: 2.7
 #
-# Example:
+# Examples:
 #
 # For pseries machine type started with -smp 2,cores=2,maxcpus=4 -cpu POWER8:
 #
@@ -1677,8 +1677,9 @@
 # Since: 7.2
 #
 # Example:
-#   {"execute": "dumpdtb"}
-#"arguments": { "filename": "fdt.dtb" } }
+# -> { "execute": "dumpdtb" }
+#  "arguments": { "filename": "fdt.dtb" } }
+# <- { "return": {} }
 #
 ##
 { 'command': 'dumpdtb',
diff --git a/qapi/migration.json b/qapi/migration.json
index 87c174dca2..477ee4d35b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -273,7 +273,7 @@
 #
 # Since: 0.14
 #
-# Example:
+# Examples:
 #
 # 1. Before the first migration
 #
@@ -521,6 +521,7 @@
 #
 # -> { "execute": "migrate-set-capabilities" , "arguments":
 #  { "capabilities": [ { "capability": "xbzrle", "state": true } ] } }
+# <- { "return": {} }
 #
 ##
 { 'command': 'migrate-set-capabilities',
@@ -989,6 +990,7 @@
 #
 # -> { "execute": "migrate-set-parameters" ,
 #  "arguments": { "compress-level": 1 } }
+# <- { "return": {} }
 #
 ##
 { 'command': 'migrate-set-parameters', 'boxed': true,
@@ -1279,8 +1281,8 @@
 #
 # Example:
 #
-# { "timestamp": {"seconds": 1449669631, "microseconds": 239225},
-#   "event": "MIGRATION_PASS", "data": {"pass": 2} }
+# <- { "timestamp": {"seconds": 1449669631, "microseconds": 239225},
+#   "event": "MIGRATION_PASS", "data": {"pass": 2} }
 #
 ##
 { 'event': 'MIGRATION_PASS',
@@ -1861,8 +1863,9 @@
 #
 # Example:
 #
-#   {"execute": "calc-dirty-rate", "arguments": {"calc-time": 1,
-#'sample-pages': 512} }
+# -> {"execute": "calc-dirty-rate", "arguments": {"calc-time": 1,
+# 'sample-pages': 512} }
+# <- { "return": {} }
 #
 ##
 { 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
@@ -1914,9

[PATCH 14/16] qapi: Fix misspelled section tags in doc comments

2023-04-04 Thread Markus Armbruster

Section tags are case sensitive and end with a colon.  Screwing up
either gets them interpreted as ordinary paragraph.  Fix a few.

Signed-off-by: Markus Armbruster 
---
 qapi/machine.json   | 4 ++--
 qapi/migration.json | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 172ef0ad96..135a4926a9 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -517,7 +517,7 @@
 # @targets: Target root bridge IDs from -device ...,id= for each root
 #   bridge.
 #
-# Since 7.1
+# Since: 7.1
 ##
 { 'struct': 'CXLFixedMemoryWindowOptions',
   'data': {
@@ -532,7 +532,7 @@
 #
 # @cxl-fmw: List of CXLFixedMemoryWindowOptions
 #
-# Since 7.1
+# Since: 7.1
 ##
 { 'struct' : 'CXLFMWProperties',
   'data': { 'cxl-fmw': ['CXLFixedMemoryWindowOptions'] }
diff --git a/qapi/migration.json b/qapi/migration.json
index 477ee4d35b..1b6c7b19e4 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1991,7 +1991,7 @@
 #
 # data: migration thread name
 #
-# returns: information about migration threads
+# Returns: information about migration threads
 #
 # Since: 7.2
 ##
-- 
2.39.2

[PATCH 06/16] qapi: @foo should be used to reference, not ``foo``

2023-04-04 Thread Markus Armbruster

Documentation suggests @foo is merely shorthand for ``foo``.  It's
not, it carries additional meaning: it's a reference to a QAPI schema
name.

Reword the documentation to spell that out.

Fix up the few ``foo`` that should be @foo.

Signed-off-by: Markus Armbruster 
---
 docs/devel/qapi-code-gen.rst | 8 +---
 docs/interop/firmware.json   | 6 +++---
 qapi/qom.json| 2 +-
 qapi/ui.json | 2 +-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst
index 23e7f2fb1c..23f9059db2 100644
--- a/docs/devel/qapi-code-gen.rst
+++ b/docs/devel/qapi-code-gen.rst
@@ -925,9 +925,11 @@ first character of the first line.
 
 The usual strong, *\*emphasized\** and literal markup
 should be used.  If you need a single literal ``*``, you will need to
-backslash-escape it.  As an extension beyond the usual rST syntax, you
-can also use ``@foo`` to reference a name in the schema; this is rendered
-the same way as foo.
+backslash-escape it.
+
+Use ``@foo`` to reference a name in the schema.  This is an rST
+extension.  It is rendered the same way as foo, but carries
+additional meaning.
 
 Example::
 
diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
index 56814f02b3..cc8f869186 100644
--- a/docs/interop/firmware.json
+++ b/docs/interop/firmware.json
@@ -258,7 +258,7 @@
 #
 # @mode: Describes how the firmware build handles code versus variable
 #storage. If not present, it must be treated as if it was
-#configured with value ``split``. Since: 7.0.0
+#configured with value @split. Since: 7.0.0
 #
 # @executable: Identifies the firmware executable. The @mode
 #  indicates whether there will be an associated
@@ -267,13 +267,13 @@
 #  -drive 
if=none,id=pflash0,readonly=on,file=@executable.@filename,format=@executable.@format
 #  -machine pflash0=pflash0
 #  or equivalent -blockdev instead of -drive. When
-#  @mode is ``combined`` the executable must be
+#  @mode is @combined the executable must be
 #  cloned before use and configured with readonly=off.
 #  With QEMU versions older than 4.0, you have to use
 #  -drive 
if=pflash,unit=0,readonly=on,file=@executable.@filename,format=@executable.@format
 #
 # @nvram-template: Identifies the NVRAM template compatible with
-#  @executable, when @mode is set to ``split``,
+#  @executable, when @mode is set to @split,
 #  otherwise it should not be present.
 #  Management software instantiates an
 #  individual copy -- a specific NVRAM file -- from
diff --git a/qapi/qom.json b/qapi/qom.json
index a877b879b9..4fe7a93a75 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -637,7 +637,7 @@
 #
 # @discard-data: if true, the file contents can be destroyed when QEMU exits,
 #to avoid unnecessarily flushing data to the backing file. Note
-#that ``discard-data`` is only an optimization, and QEMU might
+#that @discard-data is only an optimization, and QEMU might
 #not discard file contents if it aborts unexpectedly or is
 #terminated using SIGKILL. (default: false)
 #
diff --git a/qapi/ui.json b/qapi/ui.json
index 25f9d731df..c383c11097 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1247,7 +1247,7 @@
 #  available node on the host.
 #
 # @p2p: Whether to use peer-to-peer connections (accepted through
-#   ``add_client``).
+#   @add_client).
 #
 # @audiodev: Use the specified DBus audiodev to export audio.
 #
-- 
2.39.2

[PATCH 02/16] qga/qapi-schema: Fix a misspelled reference

2023-04-04 Thread Markus Armbruster

Code returns a list of GuestNetworkInterface, documentation claims
GuestNetworkInfo, which doesn't exist.  Fix the documentation.

Fixes: 3424fc9f16a1 (qemu-ga: add guest-network-get-interfaces command)
Signed-off-by: Markus Armbruster 
---
 qga/qapi-schema.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index f349345116..bb9bac0655 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -722,7 +722,7 @@
 # Get list of guest IP addresses, MAC addresses
 # and netmasks.
 #
-# Returns: List of GuestNetworkInfo on success.
+# Returns: List of GuestNetworkInterface on success.
 #
 # Since: 1.1
 ##
-- 
2.39.2

[PATCH 16/16] qapi storage-daemon/qapi: Fix documentation section structure

2023-04-04 Thread Markus Armbruster

In the QEMU QMP Reference Manual, subsection "Block core (VM
unrelated)" is empty.  Its contents is at the end of subsection
"Background jobs" instead.  That's because qapi/job.json is includeded
first from qapi/block-core.json, which makes qapi/job.json's
documentation go between qapi/block-core.json's subsection heading and
contents.

In the QEMU Storage Daemon QMP Reference Manual, section "Block
Devices" contains nothing but an empty subsection "Block core (VM
unrelated)".  The latter's contents is at the end section "Socket data
types", along with subsection "Block device exports".  Subsection
"Background jobs" is at the end of section "Cryptography".  All this
is because storage-daemon/qapi/qapi-schema.json includes modules in a
confused order.

Fix both as follows.

Turn subsection "Background jobs" into a section.

Move it before section "Block devices" in the QEMU QMP Reference
Manual, by including qapi/jobs.json right before qapi/block.json.

Reorder include directives in storage-daemon/qapi/qapi-schema.json to
match the order in qapi/qapi-schema.json, so that the QEMU Storage
Daemon QMP Reference Manual's section structure the QEMU QMP Reference
Manual's.

In the QEMU QMP Reference Manual, qapi/cryptodev.json's documentation
is at the end of section "Virtio devices".  That's because it lacks a
section heading, and therefore gets squashed into whatever section
happens to precede it.

Add section heading so it's in section "Cryptography devices".

Signed-off-by: Markus Armbruster 
---
 qapi/cryptodev.json  |  4 
 qapi/job.json|  2 +-
 qapi/qapi-schema.json|  2 +-
 storage-daemon/qapi/qapi-schema.json | 22 +++---
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/qapi/cryptodev.json b/qapi/cryptodev.json
index f33f96a692..cf960ea81f 100644
--- a/qapi/cryptodev.json
+++ b/qapi/cryptodev.json
@@ -4,6 +4,10 @@
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
 
+##
+# = Cryptography devices
+##
+
 ##
 # @QCryptodevBackendAlgType:
 #
diff --git a/qapi/job.json b/qapi/job.json
index bc4104757a..9e29a796c5 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -2,7 +2,7 @@
 # vim: filetype=python
 
 ##
-# == Background jobs
+# = Background jobs
 ##
 
 ##
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index e57d8ff801..bb7217da26 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -43,11 +43,11 @@
 { 'include': 'sockets.json' }
 { 'include': 'run-state.json' }
 { 'include': 'crypto.json' }
+{ 'include': 'job.json' }
 { 'include': 'block.json' }
 { 'include': 'block-export.json' }
 { 'include': 'char.json' }
 { 'include': 'dump.json' }
-{ 'include': 'job.json' }
 { 'include': 'net.json' }
 { 'include': 'rdma.json' }
 { 'include': 'rocker.json' }
diff --git a/storage-daemon/qapi/qapi-schema.json 
b/storage-daemon/qapi/qapi-schema.json
index 67749d1101..f10c949490 100644
--- a/storage-daemon/qapi/qapi-schema.json
+++ b/storage-daemon/qapi/qapi-schema.json
@@ -15,18 +15,26 @@
 
 { 'include': '../../qapi/pragma.json' }
 
+# Documentation generated with qapi-gen.py is in source order, with
+# included sub-schemas inserted at the first include directive
+# (subsequent include directives have no effect).  To get a sane and
+# stable order, it's best to include each sub-schema just once, or
+# include it first right here.
+
+{ 'include': '../../qapi/common.json' }
+{ 'include': '../../qapi/sockets.json' }
+{ 'include': '../../qapi/crypto.json' }
+{ 'include': '../../qapi/job.json' }
+
 ##
 # = Block devices
 ##
 { 'include': '../../qapi/block-core.json' }
 { 'include': '../../qapi/block-export.json' }
+
 { 'include': '../../qapi/char.json' }
-{ 'include': '../../qapi/common.json' }
-{ 'include': '../../qapi/control.json' }
-{ 'include': '../../qapi/crypto.json' }
-{ 'include': '../../qapi/introspect.json' }
-{ 'include': '../../qapi/job.json' }
 { 'include': '../../qapi/authz.json' }
-{ 'include': '../../qapi/qom.json' }
-{ 'include': '../../qapi/sockets.json' }
 { 'include': '../../qapi/transaction.json' }
+{ 'include': '../../qapi/control.json' }
+{ 'include': '../../qapi/introspect.json' }
+{ 'include': '../../qapi/qom.json' }
-- 
2.39.2

[PATCH 09/16] qapi: Fix bullet list markup in documentation

2023-04-04 Thread Markus Armbruster

Peter Maydell's commit 100cc4fe0f08 explains:

rST insists on a blank line before and after a bulleted list [...]
Add some extra blank lines in the doc comments so they're
acceptable rST input.

It missed one in qapi/trace.json.

Paolo Bonzini later added another instance in qapi/stats.json,
providing further, if unintended, evidence for his quip that rST is
the Perl of ASCII-based markups.

Both are parsed as ordinary paragraph, resulting in garbled output.

Add the blank lines we need to get the bullet lists recognized as
such.

Fixes: 100cc4fe0f0827f8da1a5c05f9c65e2aaa40e03d (qapi: Add blank lines before 
bulleted lists)
Fixes: 467ef823d83e (qmp: add filtering of statistics by target vCPU)
Signed-off-by: Markus Armbruster 
---
 qapi/stats.json | 1 +
 qapi/trace.json | 1 +
 2 files changed, 2 insertions(+)

diff --git a/qapi/stats.json b/qapi/stats.json
index 1f5d3c59ab..f17495ee65 100644
--- a/qapi/stats.json
+++ b/qapi/stats.json
@@ -107,6 +107,7 @@
 # The arguments to the query-stats command; specifies a target for which to
 # request statistics and optionally the required subset of information for
 # that target:
+#
 # - which vCPUs to request statistics for
 # - which providers to request statistics from
 # - which named values to return within each provider
diff --git a/qapi/trace.json b/qapi/trace.json
index 6c6982a587..f425d10764 100644
--- a/qapi/trace.json
+++ b/qapi/trace.json
@@ -87,6 +87,7 @@
 # @vcpu: The vCPU to act upon (all by default; since 2.7).
 #
 # An event's state is modified if:
+#
 # - its name matches the @name pattern, and
 # - if @vcpu is given, the event has the "vcpu" property.
 #
-- 
2.39.2

[PATCH 10/16] qapi: Fix unintended definition lists in documentation

2023-04-04 Thread Markus Armbruster

rST parses something like

first line
second line

as a definition list item, where "first line" is the term being
defined by "second line".

This bites us in a couple of places.  Here's one:

# @bps_max: total throughput limit during bursts,
# in bytes (Since 1.7)

scripts/qapi/parser.py parses this into an "argument section" with
name "bps_max" and text

total throughput limit during bursts,
  in bytes (Since 1.7)

docs/sphinx/qapidoc.py duly passes the text to the rST parser, which
parses it as another definition list.  Comes out as nested
definitions: term "bps_max: int (optional)" defined as term "total
throughput limit during bursts," defined as "in bytes (Since 1.7)".

rST truly is the Perl of ASCII-based markups.

Fix by deleting the extra indentation.

Reported-by: Peter Maydell 
Signed-off-by: Markus Armbruster 
---
 qapi/block-core.json | 52 ++--
 qapi/control.json|  2 +-
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index a5a5007b28..2382772e17 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -488,41 +488,41 @@
 #
 # @image: the info of image used (since: 1.6)
 #
-# @bps_max: total throughput limit during bursts,
-# in bytes (Since 1.7)
+# @bps_max: total throughput limit during bursts, in bytes
+#   (Since 1.7)
 #
-# @bps_rd_max: read throughput limit during bursts,
-#in bytes (Since 1.7)
+# @bps_rd_max: read throughput limit during bursts, in bytes
+#  (Since 1.7)
 #
-# @bps_wr_max: write throughput limit during bursts,
-#in bytes (Since 1.7)
+# @bps_wr_max: write throughput limit during bursts, in bytes
+#  (Since 1.7)
 #
-# @iops_max: total I/O operations per second during bursts,
-#  in bytes (Since 1.7)
+# @iops_max: total I/O operations per second during bursts, in bytes
+#(Since 1.7)
 #
-# @iops_rd_max: read I/O operations per second during bursts,
-# in bytes (Since 1.7)
+# @iops_rd_max: read I/O operations per second during bursts, in bytes
+#   (Since 1.7)
 #
-# @iops_wr_max: write I/O operations per second during bursts,
-# in bytes (Since 1.7)
+# @iops_wr_max: write I/O operations per second during bursts, in
+#   bytes (Since 1.7)
 #
-# @bps_max_length: maximum length of the @bps_max burst
-#period, in seconds. (Since 2.6)
+# @bps_max_length: maximum length of the @bps_max burst period, in
+#  seconds. (Since 2.6)
 #
-# @bps_rd_max_length: maximum length of the @bps_rd_max
-#   burst period, in seconds. (Since 2.6)
+# @bps_rd_max_length: maximum length of the @bps_rd_max burst period,
+# in seconds. (Since 2.6)
 #
-# @bps_wr_max_length: maximum length of the @bps_wr_max
-#   burst period, in seconds. (Since 2.6)
+# @bps_wr_max_length: maximum length of the @bps_wr_max burst period,
+# in seconds. (Since 2.6)
 #
-# @iops_max_length: maximum length of the @iops burst
-# period, in seconds. (Since 2.6)
+# @iops_max_length: maximum length of the @iops burst period, in
+#   seconds. (Since 2.6)
 #
-# @iops_rd_max_length: maximum length of the @iops_rd_max
-#burst period, in seconds. (Since 2.6)
+# @iops_rd_max_length: maximum length of the @iops_rd_max burst
+#  period, in seconds. (Since 2.6)
 #
-# @iops_wr_max_length: maximum length of the @iops_wr_max
-#burst period, in seconds. (Since 2.6)
+# @iops_wr_max_length: maximum length of the @iops_wr_max burst
+#  period, in seconds. (Since 2.6)
 #
 # @iops_size: an I/O size in bytes (Since 1.7)
 #
@@ -948,7 +948,7 @@
 #   by the device (Since 4.2)
 #
 # @invalid_rd_operations: The number of invalid read operations
-#  performed by the device (Since 2.5)
+# performed by the device (Since 2.5)
 #
 # @invalid_wr_operations: The number of invalid write operations
 # performed by the device (Since 2.5)
@@ -3735,7 +3735,7 @@
 # Driver specific block device options for Quorum
 #
 # @blkverify: true if the driver must print content mismatch
-#  set to false by default
+# set to false by default
 #
 # @children: the children block devices to use
 #
diff --git a/qapi/control.json b/qapi/control.json
index afca2043af..f83499280a 100644
--- a/qapi/control.json
+++ b/qapi/control.json
@@ -195,7 +195,7 @@
 # @id: Name of the monitor
 #
 # @mode: Selects the monitor mode (default: readline in the system
-#   emulator, control in qemu-storage-daemon)
+#emula

[PATCH 01/16] qga/qapi-schema: Tidy up documentation of guest-fsfreeze-status

2023-04-04 Thread Markus Armbruster

Delete "error state indicates", because it doesn't make sense.
I suspect it was an accident.

Signed-off-by: Markus Armbruster 
---
 qga/qapi-schema.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 796434ed34..f349345116 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -420,7 +420,7 @@
 ##
 # @guest-fsfreeze-status:
 #
-# Get guest fsfreeze state. error state indicates
+# Get guest fsfreeze state.
 #
 # Returns: GuestFsfreezeStatus ("thawed", "frozen", etc., as defined below)
 #
-- 
2.39.2

[PATCH 12/16] qapi: Fix argument documentation markup

2023-04-04 Thread Markus Armbruster

Member / argument documentation of BlockdevAmendOptionsQcow2,
job-resume, and RDMA_GID_STATUS_CHANGED is parsed as ordinary text due
to missing colon or space before the colon.  The generated
documentation shows these members / arguments as "Not documented".

The fix is obvious: add missing colons, delete extra spaces.

Signed-off-by: Markus Armbruster 
---
 qapi/block-core.json | 2 +-
 qapi/job.json| 2 +-
 qapi/rdma.json   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 2382772e17..9dd5ed9a47 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5155,7 +5155,7 @@
 # Driver specific image amend options for qcow2.
 # For now, only encryption options can be amended
 #
-# @encrypt  Encryption options to be amended
+# @encrypt: Encryption options to be amended
 #
 # Since: 5.1
 ##
diff --git a/qapi/job.json b/qapi/job.json
index d5f84e9615..bc4104757a 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -148,7 +148,7 @@
 # This command returns immediately after resuming a paused job. Resuming an
 # already running job is an error.
 #
-# @id : The job identifier.
+# @id: The job identifier.
 #
 # Since: 3.0
 ##
diff --git a/qapi/rdma.json b/qapi/rdma.json
index a1d2175a8b..5b6b66afa4 100644
--- a/qapi/rdma.json
+++ b/qapi/rdma.json
@@ -17,7 +17,7 @@
 #
 # @subnet-prefix: Subnet Prefix
 #
-# @interface-id : Interface ID
+# @interface-id: Interface ID
 #
 # Since: 4.0
 #
-- 
2.39.2

[PATCH 08/16] qapi: Delete largely misleading "Stability Considerations"

2023-04-04 Thread Markus Armbruster

Documentation section "Stability Considerations" dates back to the
early days of QMP (commit 82a56f0d83d (Monitor: Introduce the
qmp-commands.hx file)).  It became largely misleading years ago.
Delete it.

Signed-off-by: Markus Armbruster 
---
 qapi/qapi-schema.json | 22 --
 1 file changed, 22 deletions(-)

diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 7c09af5cc8..e57d8ff801 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -28,28 +28,6 @@
 #
 # Please, refer to the QMP specification (docs/interop/qmp-spec.txt) for
 # detailed information on the Server command and response formats.
-#
-# = Stability Considerations
-#
-# The current QMP command set (described in this file) may be useful for a
-# number of use cases, however it's limited and several commands have bad
-# defined semantics, specially with regard to command completion.
-#
-# These problems are going to be solved incrementally in the next QEMU releases
-# and we're going to establish a deprecation policy for badly defined commands.
-#
-# If you're planning to adopt QMP, please observe the following:
-#
-# 1. The deprecation policy will take effect and be documented soon, please
-#check the documentation of each used command as soon as a new release 
of
-#QEMU is available
-#
-# 2. DO NOT rely on anything which is not explicit documented
-#
-# 3. Errors, in special, are not documented. Applications should NOT check
-#for specific errors classes or data (it's strongly recommended to only
-#check for the "error" key)
-#
 ##
 
 { 'include': 'pragma.json' }
-- 
2.39.2

[PATCH 05/16] qapi/block-core: Clean up after removal of dirty bitmap @status

2023-04-04 Thread Markus Armbruster

Commit 81cbfd50886 (block: remove dirty bitmaps 'status' field)
removed deprecated BlockDirtyInfo member @status.  It neglected to
remove references to its enumeration values from the documentation of
its replacements.  Do that now.

Signed-off-by: Markus Armbruster 
---
 qapi/block-core.json | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 75f7c62405..eeb2ed3f16 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -582,11 +582,11 @@
 # @granularity: granularity of the dirty bitmap in bytes (since 1.4)
 #
 # @recording: true if the bitmap is recording new writes from the guest.
-# Replaces ``active`` and ``disabled`` statuses. (since 4.0)
+# (since 4.0)
 #
 # @busy: true if the bitmap is in-use by some operation (NBD or jobs)
 #and cannot be modified via QMP or used by another operation.
-#Replaces ``locked`` and ``frozen`` statuses. (since 4.0)
+#(since 4.0)
 #
 # @persistent: true if the bitmap was stored on disk, is scheduled to be stored
 #  on disk, or both. (since 4.0)
-- 
2.39.2

[PATCH 15/16] qapi: Format since information the conventional way: (since X.Y)

2023-04-04 Thread Markus Armbruster

Signed-off-by: Markus Armbruster 
---
 qapi/block-core.json | 6 +++---
 qapi/stats.json  | 2 +-
 qapi/tpm.json| 3 +--
 qapi/ui.json | 6 +++---
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9dd5ed9a47..b57978957f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1318,10 +1318,10 @@
 #  value is monotonically increasing.
 #
 # @busy: false if the job is known to be in a quiescent state, with
-#no pending I/O.  Since 1.3.
+#no pending I/O.  (Since 1.3)
 #
 # @paused: whether the job is paused or, if @busy is true, will
-#  pause itself as soon as possible.  Since 1.3.
+#  pause itself as soon as possible.  (Since 1.3)
 #
 # @speed: the rate limit, bytes per second
 #
@@ -2741,7 +2741,7 @@
 #
 # @on-error: the action to take on an error (default report).
 #'stop' and 'enospc' can only be used if the block device
-#supports io-status (see BlockInfo).  Since 1.3.
+#supports io-status (see BlockInfo).  (Since 1.3)
 #
 # @filter-node-name: the node name that should be assigned to the
 #filter driver that the stream job inserts into the graph
diff --git a/qapi/stats.json b/qapi/stats.json
index f17495ee65..36d5f4dc94 100644
--- a/qapi/stats.json
+++ b/qapi/stats.json
@@ -69,7 +69,7 @@
 #
 # @vcpu: statistics that apply to a single virtual CPU.
 #
-# @cryptodev: statistics that apply to a crypto device. since 8.0
+# @cryptodev: statistics that apply to a crypto device (since 8.0)
 #
 # Since: 7.1
 ##
diff --git a/qapi/tpm.json b/qapi/tpm.json
index 4e2ea9756a..eac87d30b2 100644
--- a/qapi/tpm.json
+++ b/qapi/tpm.json
@@ -44,8 +44,7 @@
 # An enumeration of TPM types
 #
 # @passthrough: TPM passthrough type
-# @emulator: Software Emulator TPM type
-#Since: 2.11
+# @emulator: Software Emulator TPM type (since 2.11)
 #
 # Since: 1.5
 ##
diff --git a/qapi/ui.json b/qapi/ui.json
index e9599dea50..88de458ba9 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1207,13 +1207,13 @@
 #   window resizes (virtio-gpu) this will default to "on",
 #   assuming the guest will resize the display to match
 #   the window size then.  Otherwise it defaults to "off".
-#   Since 3.1
+#   (Since 3.1)
 # @show-tabs:   Display the tab bar for switching between the various graphical
 #   interfaces (e.g. VGA and virtual console character devices)
 #   by default.
-#   Since 7.1
+#   (Since 7.1)
 # @show-menubar: Display the main window menubar. Defaults to "on".
-#Since 8.0
+#(Since 8.0)
 #
 # Since: 2.12
 ##
-- 
2.39.2

[PATCH 00/16] qapi qga/qapi-schema: Doc fixes

2023-04-04 Thread Markus Armbruster

It's always nice to get doc fixes into the release, but if it's too
late, it's too late.

Generated code does not change, except for the last patch, which moves
a bit of code without changing it.

Markus Armbruster (16):
  qga/qapi-schema: Tidy up documentation of guest-fsfreeze-status
  qga/qapi-schema: Fix a misspelled reference
  qapi: Fix misspelled references
  qapi: Fix up references to long gone error classes
  qapi/block-core: Clean up after removal of dirty bitmap @status
  qapi: @foo should be used to reference, not ``foo``
  qapi: Tidy up examples
  qapi: Delete largely misleading "Stability Considerations"
  qapi: Fix bullet list markup in documentation
  qapi: Fix unintended definition lists in documentation
  qga/qapi-schema: Fix member documentation markup
  qapi: Fix argument documentation markup
  qapi: Replace ad hoc "since" documentation by member documentation
  qapi: Fix misspelled section tags in doc comments
  qapi: Format since information the conventional way: (since X.Y)
  qapi storage-daemon/qapi: Fix documentation section structure

 docs/devel/qapi-code-gen.rst |  8 ++-
 docs/interop/firmware.json   |  6 +-
 qapi/block-core.json | 82 ++--
 qapi/block.json  |  2 +-
 qapi/char.json   |  4 +-
 qapi/control.json|  2 +-
 qapi/cryptodev.json  |  4 ++
 qapi/job.json|  4 +-
 qapi/machine-target.json |  2 +-
 qapi/machine.json| 26 +
 qapi/migration.json  | 37 -
 qapi/misc.json   |  6 +-
 qapi/net.json| 25 +++--
 qapi/qapi-schema.json| 24 +---
 qapi/qdev.json   |  2 +-
 qapi/qom.json|  4 +-
 qapi/rdma.json   |  2 +-
 qapi/replay.json |  3 +
 qapi/run-state.json  |  6 +-
 qapi/stats.json  |  3 +-
 qapi/tpm.json|  3 +-
 qapi/trace.json  |  1 +
 qapi/ui.json | 12 ++--
 qga/qapi-schema.json | 10 ++--
 storage-daemon/qapi/qapi-schema.json | 22 +---
 25 files changed, 154 insertions(+), 146 deletions(-)

-- 
2.39.2

[PATCH 03/16] qapi: Fix misspelled references

2023-04-04 Thread Markus Armbruster

query-cpu-definitions returns a list of CpuDefinitionInfo, but
documentation claims CpuDefInfo, which doesn't exist.

query-migrate-capabilities returns a list of
MigrationCapabilityStatus, but documentation claims
MigrationCapabilitiesStatus, which doesn't exist.

balloon and query-balloon can fail with KVMMissingCap, but
documentation claims KvmMissingCap, which doesn't exist.

Fix the documentation.

Fixes: e4e31c6324af (qapi: add query-cpu-definitions command (v2))
Fixes: bbf6da32b5bd (Add migration capabilities)
Fixes: d72f326431e2 (qapi: Convert balloon)
Fixes: 96637bcdf9e0 (qapi: Convert query-balloon)
Signed-off-by: Markus Armbruster 
---
 qapi/machine-target.json | 2 +-
 qapi/machine.json| 4 ++--
 qapi/migration.json  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index 2e267fa458..b94fbdb65e 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -331,7 +331,7 @@
 #
 # Return a list of supported virtual CPU definitions
 #
-# Returns: a list of CpuDefInfo
+# Returns: a list of CpuDefinitionInfo
 #
 # Since: 1.2
 ##
diff --git a/qapi/machine.json b/qapi/machine.json
index 604b686e59..8c3c58c763 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1029,7 +1029,7 @@
 #
 # Returns: - Nothing on success
 #  - If the balloon driver is enabled but not functional because the 
KVM
-#kernel module cannot support it, KvmMissingCap
+#kernel module cannot support it, KVMMissingCap
 #  - If no balloon device is present, DeviceNotActive
 #
 # Notes: This command just issues a request to the guest.  When it returns,
@@ -1067,7 +1067,7 @@
 #
 # Returns: - @BalloonInfo on success
 #  - If the balloon driver is enabled but not functional because the 
KVM
-#kernel module cannot support it, KvmMissingCap
+#kernel module cannot support it, KVMMissingCap
 #  - If no balloon device is present, DeviceNotActive
 #
 # Since: 0.14
diff --git a/qapi/migration.json b/qapi/migration.json
index c84fa10e86..87c174dca2 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -531,7 +531,7 @@
 #
 # Returns information about the current migration capabilities status
 #
-# Returns: @MigrationCapabilitiesStatus
+# Returns: @MigrationCapabilityStatus
 #
 # Since: 1.2
 #
-- 
2.39.2

[PATCH 11/16] qga/qapi-schema: Fix member documentation markup

2023-04-04 Thread Markus Armbruster

GuestDiskStatsInfo's member documentation is parsed as ordinary text
due to missing colons.  The generated documentation shows these
members as "Not documented".

The fix is obvious: add the missing colons.

Signed-off-by: Markus Armbruster 
---
 qga/qapi-schema.json | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index bb9bac0655..6a20eeb297 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1553,11 +1553,11 @@
 ##
 # @GuestDiskStatsInfo:
 #
-# @name disk name
+# @name: disk name
 #
-# @major major device number of disk
+# @major: major device number of disk
 #
-# @minor minor device number of disk
+# @minor: minor device number of disk
 ##
 { 'struct': 'GuestDiskStatsInfo',
   'data': {'name': 'str',
-- 
2.39.2

[PATCH 04/16] qapi: Fix up references to long gone error classes

2023-04-04 Thread Markus Armbruster

Commit de253f14912e88f4 (qmp: switch to the new error format on the
wire) removed most error classes.  Several later commits mistakenly
mentioned them in documentation.  Replace them by the actual error
class there.

Fixes: 44e3e053af56 (qmp: add interface blockdev-snapshot-delete-internal-sync)
Fixes: f323bc9e8b3b (qmp: add interface blockdev-snapshot-internal-sync)
Fixes: ba1c048a8f9c (qapi: Introduce add-fd, remove-fd, query-fdsets)
Fixes: ed61fc10e8c8 (QAPI: add command for live block commit, 'block-commit')
Fixes: e4c8f004c55d (qapi: convert sendkey)
Signed-off-by: Markus Armbruster 
---
 qapi/block-core.json | 4 ++--
 qapi/misc.json   | 6 +++---
 qapi/ui.json | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c05ad0c07e..75f7c62405 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5784,7 +5784,7 @@
 #  - If any snapshot matching @name exists, or @name is empty,
 #GenericError
 #  - If the format of the image used does not support it,
-#BlockFormatFeatureNotSupported
+#GenericError
 #
 # Since: 1.7
 #
@@ -5820,7 +5820,7 @@
 #  - If @device is not a valid block device, GenericError
 #  - If snapshot not found, GenericError
 #  - If the format of the image used does not support it,
-#BlockFormatFeatureNotSupported
+#GenericError
 #  - If @id and @name are both not specified, GenericError
 #
 # Since: 1.7
diff --git a/qapi/misc.json b/qapi/misc.json
index 6ddd16ea28..7e278ca1eb 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -349,8 +349,8 @@
 # @opaque: A free-form string that can be used to describe the fd.
 #
 # Returns: - @AddfdInfo on success
-#  - If file descriptor was not received, FdNotSupplied
-#  - If @fdset-id is a negative value, InvalidParameterValue
+#  - If file descriptor was not received, GenericError
+#  - If @fdset-id is a negative value, GenericError
 #
 # Notes: The list of fd sets is shared by all monitor connections.
 #
@@ -379,7 +379,7 @@
 # @fd: The file descriptor that is to be removed.
 #
 # Returns: - Nothing on success
-#  - If @fdset-id or @fd is not found, FdNotFound
+#  - If @fdset-id or @fd is not found, GenericError
 #
 # Since: 1.2
 #
diff --git a/qapi/ui.json b/qapi/ui.json
index 98322342f7..25f9d731df 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -985,7 +985,7 @@
 # to 100
 #
 # Returns: - Nothing on success
-#  - If key is unknown or redundant, InvalidParameter
+#  - If key is unknown or redundant, GenericError
 #
 # Since: 1.3
 #
-- 
2.39.2

[PATCH 13/16] qapi: Replace ad hoc "since" documentation by member documentation

2023-04-04 Thread Markus Armbruster

MemoryDeviceInfoKind, NetClientDriver, and GuestPanicAction mention
some members only in ad hoc since documentation.  The generated
documentation shows these members as "Not documented".

Replace by formal member documentation.

Add actual documentation text for the GuestPanicAction members, to
match existing member documentation there.  For the others, merely
move existing "since" information.

Signed-off-by: Markus Armbruster 
---
 qapi/machine.json   | 11 ---
 qapi/net.json   | 21 -
 qapi/run-state.json |  6 +-
 3 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 20541cb319..172ef0ad96 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1260,6 +1260,14 @@
 ##
 # @MemoryDeviceInfoKind:
 #
+# @nvdimm: since 2.12
+#
+# @virtio-pmem: since 4.1
+#
+# @virtio-mem: since 5.1
+#
+# @sgx-epc: since 6.2.
+#
 # Since: 2.1
 ##
 { 'enum': 'MemoryDeviceInfoKind',
@@ -1302,9 +1310,6 @@
 #
 # Union containing information about a memory device
 #
-# nvdimm is included since 2.12. virtio-pmem is included since 4.1.
-# virtio-mem is included since 5.1. sgx-epc is included since 6.2.
-#
 # Since: 2.1
 ##
 { 'union': 'MemoryDeviceInfo',
diff --git a/qapi/net.json b/qapi/net.json
index 1f1e148f01..3f9db0a718 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -641,14 +641,15 @@
 #
 # Available netdev drivers.
 #
-# Since: 2.7
+# @l2tpv3: since 2.1
+# @vhost-vdpa: since 5.1
+# @vmnet-host: since 7.1
+# @vmnet-shared: since 7.1
+# @vmnet-bridged: since 7.1
+# @stream: since 7.2
+# @dgram: since 7.2
 #
-#@vhost-vdpa since 5.1
-#@vmnet-host since 7.1
-#@vmnet-shared since 7.1
-#@vmnet-bridged since 7.1
-#@stream since 7.2
-#@dgram since 7.2
+# Since: 2.7
 ##
 { 'enum': 'NetClientDriver',
   'data': [ 'none', 'nic', 'user', 'tap', 'l2tpv3', 'socket', 'stream',
@@ -669,12 +670,6 @@
 #
 # Since: 1.2
 #
-#'l2tpv3' - since 2.1
-#'vmnet-host' - since 7.1
-#'vmnet-shared' - since 7.1
-#'vmnet-bridged' - since 7.1
-#'stream' since 7.2
-#'dgram' since 7.2
 ##
 { 'union': 'Netdev',
   'base': { 'id': 'str', 'type': 'NetClientDriver' },
diff --git a/qapi/run-state.json b/qapi/run-state.json
index 419c188dd1..52495884c5 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -468,7 +468,11 @@
 #
 # @pause: system pauses
 #
-# Since: 2.1 (poweroff since 2.8, run since 5.0)
+# @poweroff: system powers off (since 2.8)
+#
+# @run: system continues to run (since 5.0)
+#
+# Since: 2.1
 ##
 { 'enum': 'GuestPanicAction',
   'data': [ 'pause', 'poweroff', 'run' ] }
-- 
2.39.2

Re: [PATCH] block/nvme: use AIO_WAIT_WHILE_UNLOCKED()

2023-04-04 Thread Philippe Mathieu-Daudé


On 4/4/23 13:20, Stefan Hajnoczi wrote:

A few Admin Queue commands are submitted during nvme_file_open(). They
are synchronous since device initialization cannot continue until the
commands complete.

AIO_WAIT_WHILE() is currently used, but the block/nvme.c code actually
doesn't rely on the AioContext lock. Replace it with
AIO_WAIT_WHILE_UNLOCKED(NULL, condition). There is no change in behavior
and the dependency on the AioContext lock is eliminated.

This is a step towards removing the AioContext lock.

Signed-off-by: Stefan Hajnoczi 
---
  block/nvme.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé

[PATCH] block/nvme: use AIO_WAIT_WHILE_UNLOCKED()

2023-04-04 Thread Stefan Hajnoczi

A few Admin Queue commands are submitted during nvme_file_open(). They
are synchronous since device initialization cannot continue until the
commands complete.

AIO_WAIT_WHILE() is currently used, but the block/nvme.c code actually
doesn't rely on the AioContext lock. Replace it with
AIO_WAIT_WHILE_UNLOCKED(NULL, condition). There is no change in behavior
and the dependency on the AioContext lock is eliminated.

This is a step towards removing the AioContext lock.

Signed-off-by: Stefan Hajnoczi 
---
 block/nvme.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 5b744c2bda..829b9c04db 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -512,7 +512,6 @@ static int nvme_admin_cmd_sync(BlockDriverState *bs, 
NvmeCmd *cmd)
 {
 BDRVNVMeState *s = bs->opaque;
 NVMeQueuePair *q = s->queues[INDEX_ADMIN];
-AioContext *aio_context = bdrv_get_aio_context(bs);
 NVMeRequest *req;
 int ret = -EINPROGRESS;
 req = nvme_get_free_req_nowait(q);
@@ -521,7 +520,7 @@ static int nvme_admin_cmd_sync(BlockDriverState *bs, 
NvmeCmd *cmd)
 }
 nvme_submit_command(q, req, cmd, nvme_admin_cmd_sync_cb, &ret);
 
-AIO_WAIT_WHILE(aio_context, ret == -EINPROGRESS);
+AIO_WAIT_WHILE_UNLOCKED(NULL, ret == -EINPROGRESS);
 return ret;
 }
 
-- 
2.39.2

Re: [PATCH v2 09/11] tests/vm: use the default system python for NetBSD

2023-04-04 Thread Philippe Mathieu-Daudé


On 3/4/23 15:49, Alex Bennée wrote:

From: Daniel P. Berrangé 

Currently our NetBSD VM recipe requests instal of the python37 package
and explicitly tells QEMU to use that version of python. Since the
NetBSD base ISO was updated to version 9.3 though, the default system
python version is 3.9 which is sufficiently new for QEMU to rely on.
Rather than requesting an older python, just test against the default
system python which is what most users will have.

Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230329124601.822209-1-berra...@redhat.com>
Signed-off-by: Alex Bennée 
Reviewed-by: Thomas Huth 
Message-Id: <20230330101141.30199-9-alex.ben...@linaro.org>
---
  tests/vm/netbsd | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)


Tested-by: Philippe Mathieu-Daudé

Re: [PATCH 0/2] virtio-scsi: stop using aio_disable_external() during unplug

2023-04-04 Thread Daniil Tatianin


On 3/23/23 9:56 PM, Stefan Hajnoczi wrote:

The aio_disable_external() API is a solution for stopping I/O during critical
sections. The newer BlockDevOps->drained_begin/end/poll() callbacks offer a
cleaner solution that supports the upcoming multi-queue block layer. This
series removes aio_disable_external() from the virtio-scsi emulation code.

Patch 1 is a fix for something I noticed when reading the code.

Patch 2 replaces aio_disable_external() with functionality for safe hot unplug
that's mostly already present in the SCSI emulation code.

Stefan Hajnoczi (2):
   virtio-scsi: avoid race between unplug and transport event
   virtio-scsi: stop using aio_disable_external() during unplug

  hw/scsi/scsi-bus.c|  3 ++-
  hw/scsi/scsi-disk.c   |  1 +
  hw/scsi/virtio-scsi.c | 21 +
  3 files changed, 12 insertions(+), 13 deletions(-)



For both patches:
Reviewed-by: Daniil Tatianin

Re: [PATCH 08/13] hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore

2023-04-04 Thread David Woodhouse

On Mon, 2023-04-03 at 14:29 -0400, Stefan Hajnoczi wrote:
> There is no need to suspend activity between aio_disable_external() and
> aio_enable_external(), which is mainly used for the block layer's drain
> operation.
> 
> This is part of ongoing work to remove the aio_disable_external() API.
> 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: David Woodhouse 

Thanks.

> ---
>  hw/i386/kvm/xen_xenstore.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c
> index 900679af8a..6e81bc8791 100644
> --- a/hw/i386/kvm/xen_xenstore.c
> +++ b/hw/i386/kvm/xen_xenstore.c
> @@ -133,7 +133,7 @@ static void xen_xenstore_realize(DeviceState *dev, Error 
> **errp)
>  error_setg(errp, "Xenstore evtchn port init failed");
>  return;
>  }
> -    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true,
> +    aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), 
> false,
>     xen_xenstore_event, NULL, NULL, NULL, s);
>  
>  s->impl = xs_impl_create(xen_domid);



smime.p7s
Description: S/MIME cryptographic signature

Re: [PATCH 13/13] aio: remove aio_disable_external() API

2023-04-04 Thread Dr. David Alan Gilbert

* Juan Quintela (quint...@redhat.com) wrote:
> Stefan Hajnoczi  wrote:
> > All callers now pass is_external=false to aio_set_fd_handler() and
> > aio_set_event_notifier(). The aio_disable_external() API that
> > temporarily disables fd handlers that were registered is_external=true
> > is therefore dead code.
> >
> > Remove aio_disable_external(), aio_enable_external(), and the
> > is_external arguments to aio_set_fd_handler() and
> > aio_set_event_notifier().
> >
> > The entire test-fdmon-epoll test is removed because its sole purpose was
> > testing aio_disable_external().
> >
> > Parts of this patch were generated using the following coccinelle
> > (https://coccinelle.lip6.fr/) semantic patch:
> >
> >   @@
> >   expression ctx, fd, is_external, io_read, io_write, io_poll, 
> > io_poll_ready, opaque;
> >   @@
> >   - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, 
> > io_poll_ready, opaque)
> >   + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, 
> > opaque)
> >
> >   @@
> >   expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
> >   @@
> >   - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, 
> > io_poll_ready)
> >   + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
> >
> > Signed-off-by: Stefan Hajnoczi 
> 
> []
> 
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index df646be35e..aee41ca43e 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -3104,15 +3104,15 @@ static void 
> > qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
> >  {
> >  QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> >  if (io_read) {
> > -aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
> > -   false, io_read, io_write, NULL, NULL, opaque);
> > -aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
> > -   false, io_read, io_write, NULL, NULL, opaque);
> > +aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, 
> > io_read,
> > +   io_write, NULL, NULL, opaque);
> > +aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, 
> > io_read,
> > +   io_write, NULL, NULL, opaque);
> >  } else {
> > -aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
> > -   false, io_read, io_write, NULL, NULL, opaque);
> > -aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
> > -   false, io_read, io_write, NULL, NULL, opaque);
> > +aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, 
> > io_read,
> > +   io_write, NULL, NULL, opaque);
> > +aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, 
> > io_read,
> > +   io_write, NULL, NULL, opaque);
> >  }
> >  }
> 
> Reviewed-by: Juan Quintela 
> 
> For the migration bits.
> I don't even want to know why the RDMA code uses a low level block layer API.

I don't think it's block specific.
It looks like it's because qio_channel uses aio in the case where
something QIO_CHANNEL_ERR_BLOCK and then waits for the recovery; see
4d9f675 that added it.

Dave
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH 13/13] aio: remove aio_disable_external() API

2023-04-04 Thread Juan Quintela

Stefan Hajnoczi  wrote:
> All callers now pass is_external=false to aio_set_fd_handler() and
> aio_set_event_notifier(). The aio_disable_external() API that
> temporarily disables fd handlers that were registered is_external=true
> is therefore dead code.
>
> Remove aio_disable_external(), aio_enable_external(), and the
> is_external arguments to aio_set_fd_handler() and
> aio_set_event_notifier().
>
> The entire test-fdmon-epoll test is removed because its sole purpose was
> testing aio_disable_external().
>
> Parts of this patch were generated using the following coccinelle
> (https://coccinelle.lip6.fr/) semantic patch:
>
>   @@
>   expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, 
> opaque;
>   @@
>   - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, 
> io_poll_ready, opaque)
>   + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, 
> opaque)
>
>   @@
>   expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready;
>   @@
>   - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, 
> io_poll_ready)
>   + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready)
>
> Signed-off-by: Stefan Hajnoczi 

[]

> diff --git a/migration/rdma.c b/migration/rdma.c
> index df646be35e..aee41ca43e 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -3104,15 +3104,15 @@ static void 
> qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>  {
>  QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>  if (io_read) {
> -aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
> -   false, io_read, io_write, NULL, NULL, opaque);
> -aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
> -   false, io_read, io_write, NULL, NULL, opaque);
> +aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, io_read,
> +   io_write, NULL, NULL, opaque);
> +aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, io_read,
> +   io_write, NULL, NULL, opaque);
>  } else {
> -aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
> -   false, io_read, io_write, NULL, NULL, opaque);
> -aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
> -   false, io_read, io_write, NULL, NULL, opaque);
> +aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, 
> io_read,
> +   io_write, NULL, NULL, opaque);
> +aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, 
> io_read,
> +   io_write, NULL, NULL, opaque);
>  }
>  }

Reviewed-by: Juan Quintela 

For the migration bits.
I don't even want to know why the RDMA code uses a low level block layer API.

Re: [PATCH v2 for 8.0?] nbd/server: Request TCP_NODELAY

2023-04-04 Thread Philippe Mathieu-Daudé


On 4/4/23 02:40, Eric Blake wrote:

Nagle's algorithm adds latency in order to reduce network packet
overhead on small packets.  But when we are already using corking to
merge smaller packets into transactional requests, the extra delay
from TCP defaults just gets in the way (see recent commit bd2cd4a4).

For reference, qemu as an NBD client already requests TCP_NODELAY (see
nbd_connect() in nbd/client-connection.c); as does libnbd as a client
[1], and nbdkit as a server [2].  Furthermore, the NBD spec recommends
the use of TCP_NODELAY [3].

[1] 
https://gitlab.com/nbdkit/libnbd/-/blob/a48a1142/generator/states-connect.c#L39
[2] https://gitlab.com/nbdkit/nbdkit/-/blob/45b72f5b/server/sockets.c#L430
[3] 
https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#protocol-phases

CC: Florian Westphal 
Signed-off-by: Eric Blake 
Message-Id: <20230327192947.1324372-1-ebl...@redhat.com>
---

v2 fix typo, enhance commit message

Given that corking made it in through Kevin's tree for 8.0-rc2 but
this one did not, but I didn't get any R-b, is there any objection to
me doing a pull request to get this into 8.0-rc3?


FWIW, no objection.


  nbd/server.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 2/4] block: Split padded I/O vectors exceeding IOV_MAX

2023-04-04 Thread Vladimir Sementsov-Ogievskiy


On 03.04.23 16:33, Hanna Czenczek wrote:

(Sorry for the rather late reply... Thanks for the review!)

On 20.03.23 11:31, Vladimir Sementsov-Ogievskiy wrote:

On 17.03.23 20:50, Hanna Czenczek wrote:


[...]


diff --git a/block/io.c b/block/io.c
index 8974d46941..1e9cdba17a 100644
--- a/block/io.c
+++ b/block/io.c


[..]


+    pad->write = write;
+
  return true;
  }
  @@ -1545,6 +1561,18 @@ zero_mem:
    static void bdrv_padding_destroy(BdrvRequestPadding *pad)


Maybe, rename to _finalize, to stress that it's not only freeing memory.


Sounds good!

[...]


@@ -1552,6 +1580,101 @@ static void bdrv_padding_destroy(BdrvRequestPadding 
*pad)
  memset(pad, 0, sizeof(*pad));
  }
  +/*
+ * Create pad->local_qiov by wrapping @iov in the padding head and tail, while
+ * ensuring that the resulting vector will not exceed IOV_MAX elements.
+ *
+ * To ensure this, when necessary, the first couple of elements (up to three)


maybe, "first two-three elements"


Sure (here and...

[...]


+    /*
+ * If padded_niov > IOV_MAX, we cannot just concatenate everything.
+ * Instead, merge the first couple of elements of @iov to reduce the number


maybe, "first two-three elements"


...here).




+ * of vector elements as necessary.
+ */
+    if (padded_niov > IOV_MAX) {



[..]


@@ -1653,8 +1786,8 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
  flags |= BDRV_REQ_COPY_ON_READ;
  }
  -    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
-   NULL, &flags);
+    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, false,
+   &pad, NULL, &flags);
  if (ret < 0) {
  goto fail;
  }


a bit later:

tracked_request_end(&req);
bdrv_padding_destroy(&pad);


Now, the request is formally finished inside bdrv_padding_destroy().. Not sure, does it really violate something, but seems safer to swap these two calls. 


I’d rather not, for two reasons: First, tracked requests are (as far as I 
understand) only there to implement request serialization, and so only care 
about metadata (offset, length, and type), which is not changed by changes to 
the I/O vector.

Second, even if the state of the I/O vector were relevant to tracked requests, 
I think it would actually be the other way around, i.e. the tracked request 
must be ended before the padding is finalized/destroyed.  The tracked request 
is about the actual request we submit to `child` (which is why 
tracked_request_begin() is called after bdrv_pad_request()), and that request 
is done using the modified I/O vector.  So if the tracked request had any 
connection to the request’s I/O vector (which it doesn’t), it would be to this 
modified one, so we mustn’t invalidate it via bdrv_padding_finalize() while the 
tracked request lives.

Or, said differently: I generally try to clean up things in the inverse way 
they were set up, and because bdrv_pad_requests() comes before 
tracked_request_begin(), I think tracked_request_end() should come before 
bdrv_padding_finalize().


Note, that it's wise-versa in bdrv_co_pwritev_part().

For me it's just simpler to think that the whole request, including filling 
user-given qiov with data on read part is inside tracked_request_begin() / 
tracked_request_end(). And moving the last manipulation with qiov out of it 
breaks this simple thought.
Guest should not care of it, as it doesn't know about request tracking.. But 
what about internal code? Some code may depend on some requests be finished 
after bdrv_drained_begin() call, but now they may be not fully finished, and 
some data may be not copied back to original qiov.

I agree with your point about sequence of objects finalization, but maybe, that 
just shows that copying data back to qiov should not be a part of 
bdrv_padding_finalize(), but instead be a separate function, called before 
tracked_request_end().




With that:

Reviewed-by: Vladimir Sementsov-Ogievskiy 



PS, I feel here still exists small space for optimization:


The question is whether any optimization is really worth it, and I’m not sure 
it is.  The bug has been in qemu for over two years, and because the only 
report I’ve seen about it came from our QE department, it seems like a very 
rare case, so I find it more important for the code to be as simple as possible 
than to optimize.


move the logic to bdrv_init_padding(), and

1. allocate only one buffer
2. make the new collpase are to be attached to head or tail padding
3. avoid creating extra iov-slice, maybe with help of some new qemu_iovec_* API 
that can control number of copied/to-be-copied iovs and/or calculation number 
of iovs in qiov/qiov_offset/bytes slice


I’ve actually begun by trying to reuse the padding buffer, and to collapse 
head/tail into it, but found it to be rather complicated. See also my reply to 
Stefan here: 
https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg04774.html

Hanna



--
Best re

63 matches

Mail list logo