Re: [PATCH 0/3] iotests: multiprocessing!!

2021-12-07 Thread John Snow
On Mon, Dec 6, 2021 at 3:26 PM Vladimir Sementsov-Ogievskiy <
vsement...@virtuozzo.com> wrote:

> 06.12.2021 21:37, John Snow wrote:
> >
> >
> > On Fri, Dec 3, 2021 at 7:22 AM Vladimir Sementsov-Ogievskiy <
> vsement...@virtuozzo.com > wrote:
> >
> > Hi all!
> >
> > Finally, I cannot stand it any longer. So, I'm happy to present
> > multiprocessing support for the iotests test runner.
> >
> > testing on tmpfs:
> >
> > Before:
> >
> > time check -qcow2
> > ...
> > real    12m28.095s
> > user    9m53.398s
> > sys     2m55.548s
> >
> > After:
> >
> > time check -qcow2 -j 12
> > ...
> > real    2m12.136s
> > user    12m40.948s
> > sys     4m7.449s
> >
> >
> > VERY excellent. And this will probably flush a lot more bugs loose, too.
> (Which I consider a good thing!)
>
> Thanks!)
>
> > We could look into utilizing it for 'make check', but we'll have to be
> prepared for a greater risk of race conditions on the CI if we do. But...
> it's seriously hard to argue with this kind of optimization, very well done!
>
> I thought about this too. I think we can at least pass through the -j flag
> of "make -j9 check" to ./check.
>
> I think CIs mostly call make check without the -j flag. But I always call
> make -j9 check, and it always upsets me that all tests run in parallel
> except for iotests. So if it is possible to detect that we are called
> through "make -j9 check" and use "-j 9" for iotests/check in this case, it
> would be good.
>
> >
> >
> > Hmm, seems -j 6 should be enough. I have 6 cores, 2 threads per core.
> > Anyway, that's so fast!
> >
> > Vladimir Sementsov-Ogievskiy (3):
> >iotests/testrunner.py: add doc string for run_test()
> >iotests/testrunner.py: move updating last_elapsed to run_tests
> >iotests: check: multiprocessing support
> >
> >   tests/qemu-iotests/check |  4 +-
> >   tests/qemu-iotests/testrunner.py | 86
> 
> >   2 files changed, 80 insertions(+), 10 deletions(-)
> >
> > --
> > 2.31.1
> >
>
>
>
I'll also now add:

Tested-by: John Snow 

I tried to find a different workaround in just a few minutes, but that just
made it clear that your solution was right. While I had it checked out, I
ran it a few times and it looks good to me! (And no new problems from the
Python CI stuff, either.)
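
As a rough illustration of the MAKEFLAGS idea above (a hedged C sketch
only, assuming GNU make >= 4.2, which records the actual -jN value in the
MAKEFLAGS it exports to child processes; older versions pass only a bare
"-j" plus jobserver flags, and all names here are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the -jN job count from MAKEFLAGS, or 1 if none can be found. */
static int makeflags_jobs(void)
{
    const char *flags = getenv("MAKEFLAGS");

    if (!flags) {
        return 1;
    }
    /* Match a standalone "-jN" word, not the "-j" inside "--jobserver-...". */
    for (const char *p = flags; (p = strstr(p, "-j")) != NULL; p += 2) {
        if ((p == flags || p[-1] == ' ') && p[2] >= '0' && p[2] <= '9') {
            return atoi(p + 2);
        }
    }
    return 1;
}

int main(void)
{
    printf("jobs: %d\n", makeflags_jobs());
    return 0;
}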


Re: [RFC v3 0/4] tls: add macros for coroutine-safe TLS variables

2021-12-07 Thread Stefan Hajnoczi
On Tue, Dec 07, 2021 at 01:55:34PM +, Peter Maydell wrote:
> On Tue, 7 Dec 2021 at 13:53, Stefan Hajnoczi  wrote:
> >
> > On Mon, Dec 06, 2021 at 02:34:45PM +, Peter Maydell wrote:
> > > On Mon, 6 Dec 2021 at 14:33, Stefan Hajnoczi  wrote:
> > > >
> > > > v3:
> > > > - Added __attribute__((weak)) to get_ptr_*() [Florian]
> > >
> > > Do we really need it *only* on get_ptr_*() ? If we need to
> > > noinline the other two we probably also should use the same
> > > attribute weak to force no optimizations at all.
> >
> > The weak attribute can't be used on static functions, so I think we need
> > a different approach:
> >
> > In file included from ../util/async.c:35:
> > /builds/stefanha/qemu/include/qemu/coroutine-tls.h:201:11: error: weak 
> > declaration of 'get_ptr_my_aiocontext' must be public
> >  type *get_ptr_##var(void)  
> >   \
> >^~~~
> > ../util/async.c:673:1: note: in expansion of macro 
> > 'QEMU_DEFINE_STATIC_CO_TLS'
> >  QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
> >  ^
> >
> > Adding asm volatile("") seems to work though:
> > https://godbolt.org/z/3hn8Gh41d
> 
> You can see in the clang disassembly there that this isn't
> sufficient. The compiler puts in both calls, but it ignores
> the return results and always returns "true" from the function.

You're right! I missed that the return value of the call isn't used >_<.

Stefan




Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS

2021-12-07 Thread Wouter Verhelst
On Mon, Dec 06, 2021 at 05:00:47PM -0600, Eric Blake wrote:
> On Mon, Dec 06, 2021 at 02:40:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > >    Simple reply message
> > > 
> > >   The simple reply message MUST be sent by the server in response to all
> > >   requests if structured replies have not been negotiated using
> > > -`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, 
> > > a simple
> > > -reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
> > > -but only if the reply has no data payload.  The message looks as
> > > -follows:
> > > +`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
> > > +negotiated, a simple reply MAY be used as a reply to any request other
> > > +than `NBD_CMD_READ`, but only if the reply has no data payload.  If
> > > +extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > > +the message looks as follows:
> > > 
> > >   S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
> > >  `NBD_REPLY_MAGIC`)
> > > @@ -369,6 +398,16 @@ S: 64 bits, handle
> > >   S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> > >   *error* is zero)
> > > 
> > > +If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
> > > +the message looks like:
> > > +
> > > +S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)
> > > +S: 32 bits, error (MAY be zero)
> > > +S: 64 bits, handle
> > > +S: 128 bits, padding (MUST be zero)
> > > +S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
> > > +*error* is zero)
> > > +
> > 
> > If we go this way, let's put payload length into padding: it will help to 
> > make the protocol context-independent and less error-prone.

Agreed.

> Easy enough to do (the payload length will be 0 except for
> NBD_CMD_READ).

Indeed.
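
For concreteness, the resulting header could look like this C sketch
(struct and field names are illustrative, not part of the spec; byte
order and packing of the actual wire format are ignored here; the layout
follows the proposed message above, with the payload length carved out of
the padding as Vladimir suggests):

#include <stdint.h>

struct nbd_simple_reply_ext {
    uint32_t magic;          /* NBD_SIMPLE_REPLY_EXT_MAGIC (0x60d12fd6) */
    uint32_t error;          /* MAY be zero */
    uint64_t handle;
    uint64_t payload_length; /* 0 except for NBD_CMD_READ replies */
    uint64_t padding;        /* MUST be zero */
};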

> > Or, the other way: maybe just forbid the payload for simple-64bit? What's
> > the reason to allow 64-bit requests without structured reply negotiation?
> 
> The two happened to be orthogonal enough in my implementation.  It was
> easy to demonstrate either one without the other, and it IS easier to
> write a client using non-structured replies (structured reads ARE
> tougher than simple reads, even if it is less efficient when it comes
> to reading zeros).  But you are also right that we could require
> structured reads prior to allowing 64-bit operations, and then have
> only one supported reply type on the wire when negotiated.  Wouter,
> which way do you prefer?

Given that I *still* haven't gotten around to implementing structured
replies for nbd-server, I'd prefer not to require it, but that's not
really a decent argument IMO :-)

[... I haven't read this in much detail yet, intend to do that later...]

-- 
 w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}



Re: [RFC v3 0/4] tls: add macros for coroutine-safe TLS variables

2021-12-07 Thread Peter Maydell
On Tue, 7 Dec 2021 at 13:53, Stefan Hajnoczi  wrote:
>
> On Mon, Dec 06, 2021 at 02:34:45PM +, Peter Maydell wrote:
> > On Mon, 6 Dec 2021 at 14:33, Stefan Hajnoczi  wrote:
> > >
> > > v3:
> > > - Added __attribute__((weak)) to get_ptr_*() [Florian]
> >
> > Do we really need it *only* on get_ptr_*() ? If we need to
> > noinline the other two we probably also should use the same
> > attribute weak to force no optimizations at all.
>
> The weak attribute can't be used on static functions, so I think we need
> a different approach:
>
> In file included from ../util/async.c:35:
> /builds/stefanha/qemu/include/qemu/coroutine-tls.h:201:11: error: weak 
> declaration of 'get_ptr_my_aiocontext' must be public
>  type *get_ptr_##var(void)
> \
>^~~~
> ../util/async.c:673:1: note: in expansion of macro 'QEMU_DEFINE_STATIC_CO_TLS'
>  QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
>  ^
>
> Adding asm volatile("") seems to work though:
> https://godbolt.org/z/3hn8Gh41d

You can see in the clang disassembly there that this isn't
sufficient. The compiler puts in both calls, but it ignores
the return results and always returns "true" from the function.

-- PMM



Re: [RFC v3 0/4] tls: add macros for coroutine-safe TLS variables

2021-12-07 Thread Stefan Hajnoczi
On Mon, Dec 06, 2021 at 02:34:45PM +, Peter Maydell wrote:
> On Mon, 6 Dec 2021 at 14:33, Stefan Hajnoczi  wrote:
> >
> > v3:
> > - Added __attribute__((weak)) to get_ptr_*() [Florian]
> 
> Do we really need it *only* on get_ptr_*() ? If we need to
> noinline the other two we probably also should use the same
> attribute weak to force no optimizations at all.

The weak attribute can't be used on static functions, so I think we need
a different approach:

In file included from ../util/async.c:35:
/builds/stefanha/qemu/include/qemu/coroutine-tls.h:201:11: error: weak 
declaration of 'get_ptr_my_aiocontext' must be public
 type *get_ptr_##var(void)\
   ^~~~
../util/async.c:673:1: note: in expansion of macro 'QEMU_DEFINE_STATIC_CO_TLS'
 QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
 ^

Adding asm volatile("") seems to work though:
https://godbolt.org/z/3hn8Gh41d

The GCC documentation mentions combining noinline with asm(""):
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-noinline-function-attribute
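
To make the approach concrete, a minimal sketch (names are illustrative,
not the actual macro expansion):

static __thread AioContext *my_aiocontext;

/* noinline keeps the accessor out of callers; the empty asm acts as an
 * optimization barrier so the compiler cannot cache the returned address
 * across a coroutine switch. Note that the godbolt check above turned
 * out to be inconclusive: as Peter observes elsewhere in this thread,
 * clang emitted both calls but discarded the compared return values. */
__attribute__((noinline))
static AioContext **get_ptr_my_aiocontext(void)
{
    asm volatile("");
    return &my_aiocontext;
}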

Stefan




Re: [RFC v3 0/4] tls: add macros for coroutine-safe TLS variables

2021-12-07 Thread Stefan Hajnoczi
On Mon, Dec 06, 2021 at 02:34:45PM +, Peter Maydell wrote:
> On Mon, 6 Dec 2021 at 14:33, Stefan Hajnoczi  wrote:
> >
> > v3:
> > - Added __attribute__((weak)) to get_ptr_*() [Florian]
> 
> Do we really need it *only* on get_ptr_*() ? If we need to
> noinline the other two we probably also should use the same
> attribute weak to force no optimizations at all.

I don't know but it does seem safer to use weak in all cases.

Florian and others?

Stefan




[PATCH v3 1/6] aio-posix: split poll check from ready handler

2021-12-07 Thread Stefan Hajnoczi
Adaptive polling measures the execution time of the polling check plus
the handlers called when a polled event becomes ready. Handlers can take
a significant amount of time, making it look like polling ran for a long
time when in fact it was the event handler that did.

For example, on Linux the io_submit(2) syscall invoked when a virtio-blk
device's virtqueue becomes ready can take tens of microseconds. This
can exceed the default polling interval (32 microseconds) and cause
adaptive polling to stop polling.

By excluding the handler's execution time from the polling check we make
the adaptive polling calculation more accurate. As a result, the event
loop now stays in polling mode where previously it would have fallen
back to file descriptor monitoring.
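
As a sketch of how a caller would use the split API (MyDev and the
my_dev_*() helpers are hypothetical, not real QEMU code; the
aio_set_fd_handler() signature matches the header change below):

/* The poll function must be a cheap, side-effect-free check; the actual
 * processing moves to the io_poll_ready handler, whose runtime no longer
 * counts against the adaptive polling budget. */
static bool my_dev_poll(void *opaque)
{
    MyDev *dev = opaque;

    return my_dev_has_work(dev);    /* just a check, no processing */
}

static void my_dev_poll_ready(void *opaque)
{
    MyDev *dev = opaque;

    my_dev_process_work(dev);       /* may take tens of microseconds */
}

static void my_dev_attach(AioContext *ctx, int fd, MyDev *dev)
{
    aio_set_fd_handler(ctx, fd, true, my_dev_read, NULL,
                       my_dev_poll, my_dev_poll_ready, dev);
}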

The following data was collected with virtio-blk num-queues=2
event_idx=off using an IOThread. Before:

168k IOPS, IOThread syscalls:

  9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, 
nr: 16, iocbpp: 0x7fcb9f937db0)= 16
  9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, 
count: 8) = 8
  9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, 
count: 8) = 8
  9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 
4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3
  9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, 
count: 512)= 8
  9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, 
count: 512)= 8
  9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, 
count: 512)= 8
  9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, 
nr: 32, iocbpp: 0x7fca7d0cebe0)= 32

After:

174k IOPS (+3.6%), IOThread syscalls:

  9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, 
nr: 32, iocbpp: 0x7fd0cdd62be0)= 32
  9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, 
count: 8) = 8
  9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, 
count: 8) = 8
  9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, 
nr: 32, iocbpp: 0x7fd0d0388b50)= 32

Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because
the IOThread stays in polling mode instead of falling back to file
descriptor monitoring.

As usual, polling is not implemented on Windows so this patch ignores
the new io_poll_ready() callback in aio-win32.c.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h  |  4 +-
 util/aio-posix.h |  1 +
 block/curl.c | 11 ++---
 block/export/fuse.c  |  4 +-
 block/io_uring.c | 19 +
 block/iscsi.c|  4 +-
 block/linux-aio.c| 16 +---
 block/nfs.c  |  6 +--
 block/nvme.c | 51 +++
 block/ssh.c  |  4 +-
 block/win32-aio.c|  4 +-
 hw/virtio/virtio.c   | 16 +---
 hw/xen/xen-bus.c |  6 +--
 io/channel-command.c |  6 ++-
 io/channel-file.c|  3 +-
 io/channel-socket.c  |  3 +-
 migration/rdma.c |  8 ++--
 tests/unit/test-aio.c|  4 +-
 util/aio-posix.c | 89 ++--
 util/aio-win32.c |  4 +-
 util/async.c | 10 -
 util/main-loop.c |  4 +-
 util/qemu-coroutine-io.c |  5 ++-
 util/vhost-user-server.c | 11 ++---
 24 files changed, 191 insertions(+), 102 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 47fbe9d81f..5634173b12 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -469,6 +469,7 @@ void aio_set_fd_handler(AioContext *ctx,
 IOHandler *io_read,
 IOHandler *io_write,
 AioPollFn *io_poll,
+IOHandler *io_poll_ready,
 void *opaque);
 
 /* Set polling begin/end callbacks for a file descriptor that has already been
@@ -490,7 +491,8 @@ void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *notifier,
 bool is_external,
 EventNotifierHandler *io_read,
-AioPollFn *io_poll);
+AioPollFn *io_poll,
+EventNotifierHandler *io_poll_ready);
 
 /* Set polling begin/end callbacks for an event notifier that has already been
  * registered with aio_set_event_notifier.  Do nothing if the event notifier is
diff --git a/util/aio-posix.h b/util/aio-posix.h
index c80c04506a..7f2c37a684 100644
--- a/util/aio-posix.h
+++ b/util/aio-posix.h
@@ -24,6 +24,7 @@ struct AioHandler {
 IOHandler *io_read;
 IOHandler *io_write;
 AioPollFn *io_poll;
+IOHand

[PATCH v3 2/6] virtio: get rid of VirtIOHandleAIOOutput

2021-12-07 Thread Stefan Hajnoczi
The virtqueue host notifier API
virtio_queue_aio_set_host_notifier_handler() polls the virtqueue for new
buffers. AioContext previously required a bool progress return value
indicating whether an event was handled or not. This is no longer
necessary because the AioContext polling API has been split into a poll
check function and an event handler function. The event handler is only
run when we know there is work to do, so it doesn't return bool.

The VirtIOHandleAIOOutput function signature is now the same as
VirtIOHandleOutput. Get rid of the bool return value.

Further simplifications will be made for virtio-blk and virtio-scsi in
the next patch.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio.h  |  3 +--
 hw/block/dataplane/virtio-blk.c |  4 ++--
 hw/scsi/virtio-scsi-dataplane.c | 18 ++
 hw/virtio/virtio.c  | 12 
 4 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 8bab9cfb75..b90095628f 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -175,7 +175,6 @@ void virtio_error(VirtIODevice *vdev, const char *fmt, ...) 
GCC_FMT_ATTR(2, 3);
 void virtio_device_set_child_bus_name(VirtIODevice *vdev, char *bus_name);
 
 typedef void (*VirtIOHandleOutput)(VirtIODevice *, VirtQueue *);
-typedef bool (*VirtIOHandleAIOOutput)(VirtIODevice *, VirtQueue *);
 
 VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 VirtIOHandleOutput handle_output);
@@ -318,7 +317,7 @@ EventNotifier *virtio_queue_get_host_notifier(VirtQueue 
*vq);
 void virtio_queue_set_host_notifier_enabled(VirtQueue *vq, bool enabled);
 void virtio_queue_host_notifier_read(EventNotifier *n);
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
-VirtIOHandleAIOOutput 
handle_output);
+VirtIOHandleOutput handle_output);
 VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
 VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
 
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index ee5a5352dc..a2fa407b98 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -154,7 +154,7 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 g_free(s);
 }
 
-static bool virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
+static void virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
 VirtQueue *vq)
 {
 VirtIOBlock *s = (VirtIOBlock *)vdev;
@@ -162,7 +162,7 @@ static bool 
virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
 assert(s->dataplane);
 assert(s->dataplane_started);
 
-return virtio_blk_handle_vq(s, vq);
+virtio_blk_handle_vq(s, vq);
 }
 
 /* Context: QEMU global mutex held */
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 18eb824c97..76137de67f 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -49,49 +49,43 @@ void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error 
**errp)
 }
 }
 
-static bool virtio_scsi_data_plane_handle_cmd(VirtIODevice *vdev,
+static void virtio_scsi_data_plane_handle_cmd(VirtIODevice *vdev,
   VirtQueue *vq)
 {
-bool progress = false;
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
 virtio_scsi_acquire(s);
 if (!s->dataplane_fenced) {
 assert(s->ctx && s->dataplane_started);
-progress = virtio_scsi_handle_cmd_vq(s, vq);
+virtio_scsi_handle_cmd_vq(s, vq);
 }
 virtio_scsi_release(s);
-return progress;
 }
 
-static bool virtio_scsi_data_plane_handle_ctrl(VirtIODevice *vdev,
+static void virtio_scsi_data_plane_handle_ctrl(VirtIODevice *vdev,
VirtQueue *vq)
 {
-bool progress = false;
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
 virtio_scsi_acquire(s);
 if (!s->dataplane_fenced) {
 assert(s->ctx && s->dataplane_started);
-progress = virtio_scsi_handle_ctrl_vq(s, vq);
+virtio_scsi_handle_ctrl_vq(s, vq);
 }
 virtio_scsi_release(s);
-return progress;
 }
 
-static bool virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
+static void virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
 VirtQueue *vq)
 {
-bool progress = false;
 VirtIOSCSI *s = VIRTIO_SCSI(vdev);
 
 virtio_scsi_acquire(s);
 if (!s->dataplane_fenced) {
 assert(s->ctx && s->dataplane_started);
-progress = virtio_scsi_handle_event_vq(s, vq);
+virtio_scsi_handle_event_vq(s, vq);
 }
 virtio_scsi_release(s);
-return progress;
 }
 
 static int virtio_scsi_set_host_notifier(VirtIOSCSI *s, VirtQueue *vq, int n)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 0039e1c74c..c0

[PATCH v3 4/6] virtio-scsi: prepare virtio_scsi_handle_cmd for dataplane

2021-12-07 Thread Stefan Hajnoczi
Prepare virtio_scsi_handle_cmd() to be used by both dataplane and
non-dataplane by making the condition for starting ioeventfd more
specific. This way it won't trigger when dataplane has already been
started.

Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi/virtio-scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 51fd09522a..34a968ecfb 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -720,7 +720,7 @@ static void virtio_scsi_handle_cmd(VirtIODevice *vdev, 
VirtQueue *vq)
 /* use non-QOM casts in the data path */
 VirtIOSCSI *s = (VirtIOSCSI *)vdev;
 
-if (s->ctx) {
+if (s->ctx && !s->dataplane_started) {
 virtio_device_start_ioeventfd(vdev);
 if (!s->dataplane_fenced) {
 return;
-- 
2.33.1




[PATCH v3 3/6] virtio-blk: drop unused virtio_blk_handle_vq() return value

2021-12-07 Thread Stefan Hajnoczi
The return value of virtio_blk_handle_vq() is no longer used. Get rid of
it. This is a step towards unifying the dataplane and non-dataplane
virtqueue handler functions.

Prepare virtio_blk_handle_output() to be used by both dataplane and
non-dataplane by making the condition for starting ioeventfd more
specific. This way it won't trigger when dataplane has already been
started.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-blk.h |  2 +-
 hw/block/virtio-blk.c  | 14 +++---
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 29655a406d..d311c57cca 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -90,7 +90,7 @@ typedef struct MultiReqBuffer {
 bool is_write;
 } MultiReqBuffer;
 
-bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
+void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
 void virtio_blk_process_queued_requests(VirtIOBlock *s, bool is_bh);
 
 #endif
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f139cd7cc9..82676cdd01 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -767,12 +767,11 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, 
MultiReqBuffer *mrb)
 return 0;
 }
 
-bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
+void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 {
 VirtIOBlockReq *req;
 MultiReqBuffer mrb = {};
 bool suppress_notifications = virtio_queue_get_notification(vq);
-bool progress = false;
 
 aio_context_acquire(blk_get_aio_context(s->blk));
 blk_io_plug(s->blk);
@@ -783,7 +782,6 @@ bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 }
 
 while ((req = virtio_blk_get_request(s, vq))) {
-progress = true;
 if (virtio_blk_handle_request(req, &mrb)) {
 virtqueue_detach_element(req->vq, &req->elem, 0);
 virtio_blk_free_request(req);
@@ -802,19 +800,13 @@ bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 
 blk_io_unplug(s->blk);
 aio_context_release(blk_get_aio_context(s->blk));
-return progress;
-}
-
-static void virtio_blk_handle_output_do(VirtIOBlock *s, VirtQueue *vq)
-{
-virtio_blk_handle_vq(s, vq);
 }
 
 static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBlock *s = (VirtIOBlock *)vdev;
 
-if (s->dataplane) {
+if (s->dataplane && !s->dataplane_started) {
 /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
  * dataplane here instead of waiting for .set_status().
  */
@@ -823,7 +815,7 @@ static void virtio_blk_handle_output(VirtIODevice *vdev, 
VirtQueue *vq)
 return;
 }
 }
-virtio_blk_handle_output_do(s, vq);
+virtio_blk_handle_vq(s, vq);
 }
 
 void virtio_blk_process_queued_requests(VirtIOBlock *s, bool is_bh)
-- 
2.33.1




[PATCH v3 6/6] virtio: unify dataplane and non-dataplane ->handle_output()

2021-12-07 Thread Stefan Hajnoczi
Now that virtio-blk and virtio-scsi are ready, get rid of
the handle_aio_output() callback. It's no longer needed.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio.h  |  4 +--
 hw/block/dataplane/virtio-blk.c | 16 ++
 hw/scsi/virtio-scsi-dataplane.c | 54 -
 hw/virtio/virtio.c  | 32 +--
 4 files changed, 26 insertions(+), 80 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b90095628f..f095637058 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -316,8 +316,8 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev);
 EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
 void virtio_queue_set_host_notifier_enabled(VirtQueue *vq, bool enabled);
 void virtio_queue_host_notifier_read(EventNotifier *n);
-void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
-VirtIOHandleOutput handle_output);
+void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx);
+void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx);
 VirtQueue *virtio_vector_first_queue(VirtIODevice *vdev, uint16_t vector);
 VirtQueue *virtio_vector_next_queue(VirtQueue *vq);
 
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index a2fa407b98..49276e46f2 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -154,17 +154,6 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 g_free(s);
 }
 
-static void virtio_blk_data_plane_handle_output(VirtIODevice *vdev,
-VirtQueue *vq)
-{
-VirtIOBlock *s = (VirtIOBlock *)vdev;
-
-assert(s->dataplane);
-assert(s->dataplane_started);
-
-virtio_blk_handle_vq(s, vq);
-}
-
 /* Context: QEMU global mutex held */
 int virtio_blk_data_plane_start(VirtIODevice *vdev)
 {
@@ -258,8 +247,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 for (i = 0; i < nvqs; i++) {
 VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
-virtio_queue_aio_set_host_notifier_handler(vq, s->ctx,
-virtio_blk_data_plane_handle_output);
+virtio_queue_aio_attach_host_notifier(vq, s->ctx);
 }
 aio_context_release(s->ctx);
 return 0;
@@ -302,7 +290,7 @@ static void virtio_blk_data_plane_stop_bh(void *opaque)
 for (i = 0; i < s->conf->num_queues; i++) {
 VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
-virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
+virtio_queue_aio_detach_host_notifier(vq, s->ctx);
 }
 }
 
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 76137de67f..29575cbaf6 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -49,45 +49,6 @@ void virtio_scsi_dataplane_setup(VirtIOSCSI *s, Error **errp)
 }
 }
 
-static void virtio_scsi_data_plane_handle_cmd(VirtIODevice *vdev,
-  VirtQueue *vq)
-{
-VirtIOSCSI *s = VIRTIO_SCSI(vdev);
-
-virtio_scsi_acquire(s);
-if (!s->dataplane_fenced) {
-assert(s->ctx && s->dataplane_started);
-virtio_scsi_handle_cmd_vq(s, vq);
-}
-virtio_scsi_release(s);
-}
-
-static void virtio_scsi_data_plane_handle_ctrl(VirtIODevice *vdev,
-   VirtQueue *vq)
-{
-VirtIOSCSI *s = VIRTIO_SCSI(vdev);
-
-virtio_scsi_acquire(s);
-if (!s->dataplane_fenced) {
-assert(s->ctx && s->dataplane_started);
-virtio_scsi_handle_ctrl_vq(s, vq);
-}
-virtio_scsi_release(s);
-}
-
-static void virtio_scsi_data_plane_handle_event(VirtIODevice *vdev,
-VirtQueue *vq)
-{
-VirtIOSCSI *s = VIRTIO_SCSI(vdev);
-
-virtio_scsi_acquire(s);
-if (!s->dataplane_fenced) {
-assert(s->ctx && s->dataplane_started);
-virtio_scsi_handle_event_vq(s, vq);
-}
-virtio_scsi_release(s);
-}
-
 static int virtio_scsi_set_host_notifier(VirtIOSCSI *s, VirtQueue *vq, int n)
 {
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
@@ -112,10 +73,10 @@ static void virtio_scsi_dataplane_stop_bh(void *opaque)
 VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
 int i;
 
-virtio_queue_aio_set_host_notifier_handler(vs->ctrl_vq, s->ctx, NULL);
-virtio_queue_aio_set_host_notifier_handler(vs->event_vq, s->ctx, NULL);
+virtio_queue_aio_detach_host_notifier(vs->ctrl_vq, s->ctx);
+virtio_queue_aio_detach_host_notifier(vs->event_vq, s->ctx);
 for (i = 0; i < vs->conf.num_queues; i++) {
-virtio_queue_aio_set_host_notifier_handler(vs->cmd_vqs[i], s->ctx, 
NULL);
+virtio_queue_aio_detach_host_notifier(vs->cmd_vqs[i], s->ctx);
 }
 }
 
@@ -176,14 +137,11 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 memory_region_transaction_commit();
 
 aio_context_acquir

[PATCH v3 5/6] virtio: use ->handle_output() instead of ->handle_aio_output()

2021-12-07 Thread Stefan Hajnoczi
The difference between ->handle_output() and ->handle_aio_output() was
that ->handle_aio_output() returned a bool return value indicating
progress. This was needed by the old polling API but now that the bool
return value is gone, the two functions can be unified.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio/virtio.c | 33 +++--
 1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index c042be3935..a97a406d3c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -125,7 +125,6 @@ struct VirtQueue
 
 uint16_t vector;
 VirtIOHandleOutput handle_output;
-VirtIOHandleOutput handle_aio_output;
 VirtIODevice *vdev;
 EventNotifier guest_notifier;
 EventNotifier host_notifier;
@@ -2300,20 +2299,6 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, 
int align)
 }
 }
 
-static void virtio_queue_notify_aio_vq(VirtQueue *vq)
-{
-if (vq->vring.desc && vq->handle_aio_output) {
-VirtIODevice *vdev = vq->vdev;
-
-trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
-vq->handle_aio_output(vdev, vq);
-
-if (unlikely(vdev->start_on_kick)) {
-virtio_set_started(vdev, true);
-}
-}
-}
-
 static void virtio_queue_notify_vq(VirtQueue *vq)
 {
 if (vq->vring.desc && vq->handle_output) {
@@ -2392,7 +2377,6 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 vdev->vq[i].vring.num_default = queue_size;
 vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
 vdev->vq[i].handle_output = handle_output;
-vdev->vq[i].handle_aio_output = NULL;
 vdev->vq[i].used_elems = g_malloc0(sizeof(VirtQueueElement) *
queue_size);
 
@@ -2404,7 +2388,6 @@ void virtio_delete_queue(VirtQueue *vq)
 vq->vring.num = 0;
 vq->vring.num_default = 0;
 vq->handle_output = NULL;
-vq->handle_aio_output = NULL;
 g_free(vq->used_elems);
 vq->used_elems = NULL;
 virtio_virtqueue_reset_region_cache(vq);
@@ -3509,14 +3492,6 @@ EventNotifier *virtio_queue_get_guest_notifier(VirtQueue 
*vq)
 return &vq->guest_notifier;
 }
 
-static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
-{
-VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
-if (event_notifier_test_and_clear(n)) {
-virtio_queue_notify_aio_vq(vq);
-}
-}
-
 static void virtio_queue_host_notifier_aio_poll_begin(EventNotifier *n)
 {
 VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
@@ -3536,7 +3511,7 @@ static void 
virtio_queue_host_notifier_aio_poll_ready(EventNotifier *n)
 {
 VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
 
-virtio_queue_notify_aio_vq(vq);
+virtio_queue_notify_vq(vq);
 }
 
 static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
@@ -3551,9 +3526,8 @@ void virtio_queue_aio_set_host_notifier_handler(VirtQueue 
*vq, AioContext *ctx,
 VirtIOHandleOutput handle_output)
 {
 if (handle_output) {
-vq->handle_aio_output = handle_output;
 aio_set_event_notifier(ctx, &vq->host_notifier, true,
-   virtio_queue_host_notifier_aio_read,
+   virtio_queue_host_notifier_read,
virtio_queue_host_notifier_aio_poll,
virtio_queue_host_notifier_aio_poll_ready);
 aio_set_event_notifier_poll(ctx, &vq->host_notifier,
@@ -3563,8 +3537,7 @@ void virtio_queue_aio_set_host_notifier_handler(VirtQueue 
*vq, AioContext *ctx,
 aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, 
NULL);
 /* Test and clear notifier before after disabling event,
  * in case poll callback didn't have time to run. */
-virtio_queue_host_notifier_aio_read(&vq->host_notifier);
-vq->handle_aio_output = NULL;
+virtio_queue_host_notifier_read(&vq->host_notifier);
 }
 }
 
-- 
2.33.1




[PATCH v3 0/6] aio-posix: split poll check from ready handler

2021-12-07 Thread Stefan Hajnoczi
v3:
- Fixed FUSE export aio_set_fd_handler() call that I missed and double-checked
  for any other missing call sites using Coccinelle [Rich]
v2:
- Cleaned up unused return values in nvme and virtio-blk [Stefano]
- Documented try_poll_mode() ready_list argument [Stefano]
- Unified virtio-blk/scsi dataplane and non-dataplane virtqueue handlers 
[Stefano]

The first patch improves AioContext's adaptive polling execution time
measurement. This can result in better performance because the algorithm makes
better decisions about when to poll versus when to fall back to file descriptor
monitoring.

The remaining patches unify the virtio-blk and virtio-scsi dataplane and
non-dataplane virtqueue handlers. This became possible because the dataplane
handler function now has the same function signature as the non-dataplane
handler function. Stefano Garzarella prompted me to make this refactoring.

Stefan Hajnoczi (6):
  aio-posix: split poll check from ready handler
  virtio: get rid of VirtIOHandleAIOOutput
  virtio-blk: drop unused virtio_blk_handle_vq() return value
  virtio-scsi: prepare virtio_scsi_handle_cmd for dataplane
  virtio: use ->handle_output() instead of ->handle_aio_output()
  virtio: unify dataplane and non-dataplane ->handle_output()

 include/block/aio.h |  4 +-
 include/hw/virtio/virtio-blk.h  |  2 +-
 include/hw/virtio/virtio.h  |  5 +-
 util/aio-posix.h|  1 +
 block/curl.c| 11 ++--
 block/export/fuse.c |  4 +-
 block/io_uring.c| 19 ---
 block/iscsi.c   |  4 +-
 block/linux-aio.c   | 16 +++---
 block/nfs.c |  6 +--
 block/nvme.c| 51 ---
 block/ssh.c |  4 +-
 block/win32-aio.c   |  4 +-
 hw/block/dataplane/virtio-blk.c | 16 +-
 hw/block/virtio-blk.c   | 14 ++
 hw/scsi/virtio-scsi-dataplane.c | 60 +++---
 hw/scsi/virtio-scsi.c   |  2 +-
 hw/virtio/virtio.c  | 73 +--
 hw/xen/xen-bus.c|  6 +--
 io/channel-command.c|  6 ++-
 io/channel-file.c   |  3 +-
 io/channel-socket.c |  3 +-
 migration/rdma.c|  8 +--
 tests/unit/test-aio.c   |  4 +-
 util/aio-posix.c| 89 +
 util/aio-win32.c|  4 +-
 util/async.c| 10 +++-
 util/main-loop.c|  4 +-
 util/qemu-coroutine-io.c|  5 +-
 util/vhost-user-server.c| 11 ++--
 30 files changed, 219 insertions(+), 230 deletions(-)

-- 
2.33.1





Re: [PATCH] mirror: Avoid assertion failed in mirror_run

2021-12-07 Thread Hanna Reitz
[CC-ing qemu-block, Vladimir, Kevin, and John – when sending patches,
please look into the MAINTAINERS file or use the
scripts/get_maintainer.pl script to find out who to CC on them.  It's
very easy to overlook patches on qemu-devel :/]


On 07.12.21 11:56, Yi Wang wrote:

From: Long YunJian 

When doing blockcommit from an active leaf node, we sometimes get an
assertion failure with "mirror_run: Assertion
`QLIST_EMPTY(&bs->tracked_requests)' failed" messages. According to the
core file, bs->tracked_requests still has an I/O request, so the
assertion fails.
(gdb) bt
#0  0x7f410df707cf in raise () from /lib64/libc.so.6
#1  0x7f410df5ac05 in abort () from /lib64/libc.so.6
#2  0x7f410df5aad9 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x7f410df68db6 in __assert_fail () from /lib64/libc.so.6
#4  0x556915635371 in mirror_run (job=0x556916ff8600, errp=) 
at block/mirror.c:1092
#5  0x5569155e6c53 in job_co_entry (opaque=0x556916ff8600) at job.c:904
#6  0x5569156d9483 in coroutine_trampoline (i0=, i1=) at util/coroutine-ucontext.c:115
(gdb) p s->mirror_top_bs->backing->bs->tracked_requests
$12 = {lh_first = 0x7f3f07bfb8b0}
(gdb) p s->mirror_top_bs->backing->bs->tracked_requests->lh_first
$13 = (struct BdrvTrackedRequest *) 0x7f3f07bfb8b0

Actually, before executing assert(QLIST_EMPTY(&bs->tracked_requests)),
mirror_run() executes mirror_flush(s). New I/O requests may be handled,
and I/O may still be pending, during this flush. Just as the bdrv_close
function calls bdrv_drain(bs) followed by bdrv_flush(bs), we should add
a bdrv_drain call to handle pending I/O after mirror_flush.


Oh.  How is that happening, though?  I would have expected that flushing 
the target BB (and associated BDS) only flushes requests to the OS and 
lower layers, but the source node (which is `bs`) should (in the case of 
commit) always be above the target, so I wouldn’t have expected it to 
get any new requests due to this flush.


Do you have a reproducer for this?


Signed-off-by: Long YunJian 
Signed-off-by: Yi Wang 
---
  block/mirror.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index efec2c7674..1eec356310 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1079,6 +1079,8 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
  s->in_drain = false;
  continue;
  }
+/* in case flush left pending I/O */
+bdrv_drain(bs);


I don’t think this works, because if we drain, we would also need to 
flush the target again.  Essentially I believe we’d basically need 
something like


do {
    bdrv_drained_begin(bs);
    mirror_flush(s);
    if (!QLIST_EMPTY(&bs->tracked_requests)) {
        bdrv_drained_end(bs);
    }
} while (!QLIST_EMPTY(&bs->tracked_requests));

(Which I know is really ugly)

Hanna

  
  /* The two disks are in sync.  Exit and report successful
   * completion.





Re: [PATCH v10 7/8] qmp: add QMP command x-query-virtio-queue-element

2021-12-07 Thread Markus Armbruster
Jonah Palmer  writes:

> From: Laurent Vivier 
>
> This new command shows the information of a VirtQueue element.
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-stub.c |   9 +++
>  hw/virtio/virtio.c  | 154 
>  qapi/virtio.json| 183 
> 
>  3 files changed, 346 insertions(+)
>
> diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
> index 13e5f93..7ddb22c 100644
> --- a/hw/virtio/virtio-stub.c
> +++ b/hw/virtio/virtio-stub.c
> @@ -31,3 +31,12 @@ VirtQueueStatus *qmp_x_query_virtio_queue_status(const 
> char *path,
>  {
>  return qmp_virtio_unsupported(errp);
>  }
> +
> +VirtioQueueElement *qmp_x_query_virtio_queue_element(const char *path,
> + uint16_t queue,
> + bool has_index,
> + uint16_t index,
> + Error **errp)
> +{
> +return qmp_virtio_unsupported(errp);
> +}
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 459bfb2..8c6cc27 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -475,6 +475,19 @@ static inline void vring_used_write(VirtQueue *vq, 
> VRingUsedElem *uelem,
>  address_space_cache_invalidate(&caches->used, pa, sizeof(VRingUsedElem));
>  }
>  
> +/* Called within rcu_read_lock(). */
> +static inline uint16_t vring_used_flags(VirtQueue *vq)
> +{
> +VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
> +hwaddr pa = offsetof(VRingUsed, flags);
> +
> +if (!caches) {
> +return 0;
> +}
> +
> +return virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
> +}
> +
>  /* Called within rcu_read_lock().  */
>  static uint16_t vring_used_idx(VirtQueue *vq)
>  {
> @@ -4381,6 +4394,147 @@ VirtQueueStatus 
> *qmp_x_query_virtio_queue_status(const char *path,
>  return status;
>  }
>  
> +static strList *qmp_decode_vring_desc_flags(uint16_t flags)
> +{
> +strList *list = NULL;
> +strList *node;
> +int i;
> +
> +struct {
> +uint16_t flag;
> +const char *value;
> +} map[] = {
> +{ VRING_DESC_F_NEXT, "next" },
> +{ VRING_DESC_F_WRITE, "write" },
> +{ VRING_DESC_F_INDIRECT, "indirect" },
> +{ 1 << VRING_PACKED_DESC_F_AVAIL, "avail" },
> +{ 1 << VRING_PACKED_DESC_F_USED, "used" },
> +{ 0, "" }
> +};
> +
> +for (i = 0; map[i].flag; i++) {
> +if ((map[i].flag & flags) == 0) {
> +continue;
> +}
> +node = g_malloc0(sizeof(strList));
> +node->value = g_strdup(map[i].value);
> +node->next = list;
> +list = node;
> +}
> +
> +return list;
> +}
> +
> +VirtioQueueElement *qmp_x_query_virtio_queue_element(const char *path,
> + uint16_t queue,
> + bool has_index,
> + uint16_t index,
> + Error **errp)
> +{
> +VirtIODevice *vdev;
> +VirtQueue *vq;
> +VirtioQueueElement *element = NULL;
> +
> +vdev = virtio_device_find(path);
> +if (vdev == NULL) {
> +error_setg(errp, "Path %s is not a VirtIO device", path);
> +return NULL;
> +}
> +
> +if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) {
> +error_setg(errp, "Invalid virtqueue number %d", queue);
> +return NULL;
> +}
> +vq = &vdev->vq[queue];
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +error_setg(errp, "Packed ring not supported");
> +return NULL;
> +} else {
> +unsigned int head, i, max;
> +VRingMemoryRegionCaches *caches;
> +MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +MemoryRegionCache *desc_cache;
> +VRingDesc desc;
> +VirtioRingDescList *list = NULL;
> +VirtioRingDescList *node;
> +int rc; int ndescs;
> +
> +RCU_READ_LOCK_GUARD();
> +
> +max = vq->vring.num;
> +
> +if (!has_index) {
> +head = vring_avail_ring(vq, vq->last_avail_idx % vq->vring.num);
> +} else {
> +head = vring_avail_ring(vq, index % vq->vring.num);
> +}
> +i = head;
> +
> +caches = vring_get_region_caches(vq);
> +if (!caches) {
> +error_setg(errp, "Region caches not initialized");
> +return NULL;
> +}
> +if (caches->desc.len < max * sizeof(VRingDesc)) {
> +error_setg(errp, "Cannot map descriptor ring");
> +return NULL;
> +}
> +
> +desc_cache = &caches->desc;
> +vring_split_desc_read(vdev, &desc, desc_cache, i);
> +if (desc.flags & VRING_DESC

Re: [PATCH v10 6/8] qmp: add QMP commands for virtio/vhost queue-status

2021-12-07 Thread Markus Armbruster
Jonah Palmer  writes:

> From: Laurent Vivier 
>
> These new commands show the internal status of a VirtIODevice's
> VirtQueue and a vhost device's vhost_virtqueue (if active).
>
> Signed-off-by: Jonah Palmer 
> ---

[...]

> diff --git a/qapi/virtio.json b/qapi/virtio.json
> index 7ef1f95..56e56d2 100644
> --- a/qapi/virtio.json
> +++ b/qapi/virtio.json
> @@ -402,3 +402,255 @@
>'data': { 'transports': [ 'str' ],
>  '*dev-features': [ 'str' ],
>  '*unknown-dev-features': 'uint64' } }
> +
> +##
> +# @VirtQueueStatus:
> +#
> +# Information of a VirtIODevice VirtQueue, including most members of
> +# the VirtQueue data structure.
> +#
> +# @name: Name of the VirtIODevice that uses this VirtQueue
> +#
> +# @queue-index: VirtQueue queue_index
> +#
> +# @inuse: VirtQueue inuse
> +#
> +# @vring-num: VirtQueue vring.num
> +#
> +# @vring-num-default: VirtQueue vring.num_default
> +#
> +# @vring-align: VirtQueue vring.align
> +#
> +# @vring-desc: VirtQueue vring.desc (descriptor area)
> +#
> +# @vring-avail: VirtQueue vring.avail (driver area)
> +#
> +# @vring-used: VirtQueue vring.used (device area)
> +#
> +# @last-avail-idx: VirtQueue last_avail_idx or return of vhost_dev
> +#  vhost_get_vring_base (if vhost active)
> +#
> +# @shadow-avail-idx: VirtQueue shadow_avail_idx
> +#
> +# @used-idx: VirtQueue used_idx
> +#
> +# @signalled-used: VirtQueue signalled_used
> +#
> +# @signalled-used-valid: VirtQueue signalled_used_valid flag
> +#
> +# Since: 7.0
> +#
> +##
> +
> +{ 'struct': 'VirtQueueStatus',
> +  'data': { 'name': 'str',
> +'queue-index': 'uint16',
> +'inuse': 'uint32',
> +'vring-num': 'uint32',
> +'vring-num-default': 'uint32',
> +'vring-align': 'uint32',
> +'vring-desc': 'uint64',
> +'vring-avail': 'uint64',
> +'vring-used': 'uint64',
> +'*last-avail-idx': 'uint16',
> +'*shadow-avail-idx': 'uint16',
> +'used-idx': 'uint16',
> +'signalled-used': 'uint16',
> +'signalled-used-valid': 'bool' } }
> +
> +##
> +# @x-query-virtio-queue-status:
> +#
> +# Return the status of a given VirtIODevice's VirtQueue
> +#
> +# @path: VirtIODevice canonical QOM path
> +#
> +# @queue: VirtQueue index to examine
> +#
> +# Features:
> +# @unstable: This command is meant for debugging

End with a period for consistency with existing docs, like you did in
v9.

> +#
> +# Returns: VirtQueueStatus of the VirtQueue
> +#
> +# Notes: last_avail_idx will not be displayed in the case where
> +#the selected VirtIODevice has a running vhost device and
> +#the VirtIODevice VirtQueue index (queue) does not exist for
> +#the corresponding vhost device vhost_virtqueue. Also,
> +#shadow_avail_idx will not be displayed in the case where
> +#the selected VirtIODevice has a running vhost device.
> +#
> +# Since: 7.0
> +#
> +# Examples:
> +#
> +# 1. Get VirtQueueStatus for virtio-vsock (vhost-vsock running)
> +#
> +# -> { "execute": "x-query-virtio-queue-status",
> +#  "arguments": { "path": "/machine/peripheral/vsock0/virtio-backend",
> +# "queue": 1 }
> +#}
> +# <- { "return": {
> +#"signalled-used": 0,
> +#"inuse": 0,
> +#"vring-align": 4096,
> +#"vring-desc": 5217370112,
> +#"signalled-used-valid": false,
> +#"vring-num-default": 128,
> +#"vring-avail": 5217372160,
> +#"queue-index": 1,
> +#"last-avail-idx": 0,
> +#"vring-used": 5217372480,
> +#"used-idx": 0,
> +#"name": "vhost-vsock",
> +#"vring-num": 128 }
> +#}
> +#
> +# 2. Get VirtQueueStatus for virtio-serial (no vhost)
> +#
> +# -> { "execute": "x-query-virtio-queue-status",
> +#  "arguments": { "path": 
> "/machine/peripheral-anon/device[0]/virtio-backend",
> +# "queue": 20 }
> +#}
> +# <- { "return": {
> +#"signalled-used": 0,
> +#"inuse": 0,
> +#"vring-align": 4096,
> +#"vring-desc": 5182074880,
> +#"signalled-used-valid": false,
> +#"vring-num-default": 128,
> +#"vring-avail": 5182076928,
> +#"queue-index": 20,
> +#"last-avail-idx": 0,
> +#"vring-used": 5182077248,
> +#"used-idx": 0,
> +#"name": "virtio-serial",
> +#"shadow-avail-idx": 0,
> +#"vring-num": 128 }
> +#}
> +#
> +##
> +
> +{ 'command': 'x-query-virtio-queue-status',
> +  'data': { 'path': 'str', 'queue': 'uint16' },
> +  'returns': 'VirtQueueStatus',
> +  'features': [ 'unstable' ] }
> +
> +##
> +# @VirtVhostQueueStatus:
> +#
> +# Information of a vhost device's vhost_virtqueue, including most
> +# members of the vhost_dev vhost_virtqueue data structure.
> +#
> +# @name: Name of the VirtIODevice that uses thi

Re: [PATCH] spec: Add NBD_OPT_EXTENDED_HEADERS

2021-12-07 Thread Vladimir Sementsov-Ogievskiy

07.12.2021 02:00, Eric Blake wrote:

On Mon, Dec 06, 2021 at 02:40:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:

    Simple reply message

   The simple reply message MUST be sent by the server in response to all
   requests if structured replies have not been negotiated using
-`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been negotiated, a 
simple
-reply MAY be used as a reply to any request other than `NBD_CMD_READ`,
-but only if the reply has no data payload.  The message looks as
-follows:
+`NBD_OPT_STRUCTURED_REPLY`. If structured replies have been
+negotiated, a simple reply MAY be used as a reply to any request other
+than `NBD_CMD_READ`, but only if the reply has no data payload.  If
+extended headers were not negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks as follows:

   S: 32 bits, 0x67446698, magic (`NBD_SIMPLE_REPLY_MAGIC`; used to be
  `NBD_REPLY_MAGIC`)
@@ -369,6 +398,16 @@ S: 64 bits, handle
   S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
   *error* is zero)

+If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks like:
+
+S: 32 bits, 0x60d12fd6, magic (`NBD_SIMPLE_REPLY_EXT_MAGIC`)
+S: 32 bits, error (MAY be zero)
+S: 64 bits, handle
+S: 128 bits, padding (MUST be zero)
+S: (*length* bytes of data if the request is of type `NBD_CMD_READ` and
+*error* is zero)
+


If we go this way, let's put payload length into padding: it will help to make 
the protocol context-independent and less error-prone.


Easy enough to do (the payload length will be 0 except for
NBD_CMD_READ).



Or, the other way: maybe just forbid the payload for simple-64bit? What's the
reason to allow 64-bit requests without structured reply negotiation?


The two happened to be orthogonal enough in my implementation.  It was
easy to demonstrate either one without the other, and it IS easier to
write a client using non-structured replies (structured reads ARE
tougher than simple reads, even if it is less efficient when it comes
to reading zeros).  But you are also right that we could require
structured reads prior to allowing 64-bit operations, and then have
only one supported reply type on the wire when negotiated.  Wouter,
which way do you prefer?




    Structured reply chunk message

   Some of the major downsides of the default simple reply to
@@ -410,7 +449,9 @@ considered successful only if it did not contain any error 
chunks,
   although the client MAY be able to determine partial success based
   on the chunks received.

-A structured reply chunk message looks as follows:
+If extended headers were not negotiated using
+`NBD_OPT_EXTENDED_HEADERS`, a structured reply chunk message looks as
+follows:

   S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)
   S: 16 bits, flags
@@ -423,6 +464,17 @@ The use of *length* in the reply allows context-free 
division of
   the overall server traffic into individual reply messages; the
   *type* field describes how to further interpret the payload.

+If extended headers were negotiated using `NBD_OPT_EXTENDED_HEADERS`,
+the message looks like:
+
+S: 32 bits, 0x6e8a278c, magic (`NBD_STRUCTURED_REPLY_EXT_MAGIC`)
+S: 16 bits, flags
+S: 16 bits, type
+S: 64 bits, handle
+S: 64 bits, length of payload (unsigned)


Maybe 64 bits is too much for the payload. But who knows. And it's good
that it's symmetric to the 64-bit length in the request.


Indeed, both qemu and libnbd implementations explicitly kill the
connection to any server that replies with more than the max buffer
used for NBD_CMD_READ/WRITE (32M for qemu, 64M for libnbd).  And if
the spec is not already clear on the topic, I should add an
independent patch to NBD_CMD_BLOCK_STATUS to make it obvious that a
server cannot reply with too many extents because of such clients.

So none of my proof-of-concept code ever used the full 64-bits of the
reply header length.  On the other hand, there is indeed the symmetry
argument - if someone writes a server willing to accept a 4G
NBD_CMD_WRITE, then it should also support a 4G NBD_CMD_READ, even if
no known existing server or client allows buffers that large.




+S: 64 bits, padding (MUST be zero)


Hmm. Extra 8 bytes to be power-of-2. Does 32 bytes really perform better than 
24 bytes?


Consider:
struct header[100];

if sizeof(header[0]) is a power of 2 <= the cache line size (and the
compiler prefers to start arrays aligned to the cache line) then we
are guaranteed that all array members each reside in a single cache
line.  But if it is not a power of 2, some of the array members
straddle two cache lines.

Will there be code that wants to create an array of headers?  Perhaps
so, because that is a logical way (along with scatter/gather to
combine the header with variable-sized payloads) of tracking the
headers for multiple commands issued in parallel.
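
A toy illustration of that argument (assuming C11 static_assert, typical
LP64 layout, and a 64-byte cache line; these structs are illustrative,
not the proposed wire format):

#include <assert.h>
#include <stdint.h>

struct hdr24 { uint32_t magic; uint32_t error; uint64_t handle;
               uint64_t len; };
struct hdr32 { uint32_t magic; uint32_t error; uint64_t handle;
               uint64_t len; uint64_t pad; };

static_assert(sizeof(struct hdr24) == 24, "no implicit padding expected");
static_assert(sizeof(struct hdr32) == 32, "power-of-2 size");

/* In a cache-line-aligned array, every hdr32 element fits in a single
 * 64-byte line (64 % 32 == 0).  With hdr24, element 2 occupies bytes
 * 48..71 and straddles a line boundary. */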

Do I have actual performance numbers?  No.  But there are plenty of
Google hits for why sizing structs to a power of 2 is a g