Re: [PATCH 0/7] block/nbd: decouple reconnect from drain

2021-04-07 Thread Roman Kagan
On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 15.03.2021 09:06, Roman Kagan wrote:
> > The reconnection logic doesn't need to stop while in a drained section.
> > Moreover it has to be active during the drained section, as the requests
> > that were caught in-flight with the connection to the server broken can
> > only usefully get drained if the connection is restored.  Otherwise such
> > requests can only either stall resulting in a deadlock (before
> > 8c517de24a), or be aborted defeating the purpose of the reconnection
> > machinery (after 8c517de24a).
> > 
> > This series aims to just stop messing with the drained section in the
> > reconnection code.
> > 
> > While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> > connection_co reentrance"); as I've missed the point of that commit I'd
> > appreciate more scrutiny in this area.
> > 
> > Roman Kagan (7):
> >block/nbd: avoid touching freed connect_thread
> >block/nbd: use uniformly nbd_client_connecting_wait
> >block/nbd: assert attach/detach runs in the proper context
> >block/nbd: transfer reconnection stuff across aio_context switch
> >block/nbd: better document a case in nbd_co_establish_connection
> >block/nbd: decouple reconnect from drain
> >block/nbd: stop manipulating in_flight counter
> > 
> >   block/nbd.c  | 191 +++
> >   nbd/client.c |   2 -
> >   2 files changed, 86 insertions(+), 107 deletions(-)
> > 
> 
> 
> Hmm. The huge source of problems for this series is the weird logic around
> drain and aio context switches in the NBD driver.
> 
> Why do we have all this overly complicated logic with abuse of the in_flight
> counter in NBD? The answer is connection_co. NBD differs from other
> drivers in that it has a coroutine independent of the request coroutines. And
> we have to move this coroutine carefully to the new aio context. We can't
> just enter it from the new context; we have to be sure that
> connection_co is at one of the yield points that support reentering.
> 
> I have an idea of how to avoid all this: drop connection_co altogether.
> 
> 1. nbd negotiation goes to connection thread and becomes independent
> of any aio context.
> 
> 2. waiting for the server reply goes to the request code. So, instead of
> always reading the reply from the socket in connection_co, we read in the
> request coroutine, after sending the request. We'll need a CoMutex for
> it (as only one request coroutine should read from the socket), and be
> prepared for the incoming reply not being for _this_ request (in this case
> we should wake the other request and continue reading from the socket).
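
(For illustration, a very rough sketch of the per-request reading idea quoted
above; every name here is hypothetical, this is not actual block/nbd.c code.
A real implementation would hand the reader role over instead of holding the
mutex until its own reply arrives, but the skeleton is the same.)

    static int coroutine_fn nbd_co_wait_reply(BDRVNBDState *s, uint64_t my_handle)
    {
        int ret = 0;

        /* Only one request coroutine reads from the socket at a time. */
        qemu_co_mutex_lock(&s->receive_mutex);      /* hypothetical CoMutex field */

        while (!reply_received(s, my_handle)) {     /* hypothetical helper */
            /*
             * Hypothetical helper: reads one reply header plus payload and
             * marks the owning request as completed.
             */
            ret = read_one_reply(s);
            if (ret < 0) {
                break;                              /* connection broke */
            }
        }

        qemu_co_mutex_unlock(&s->receive_mutex);
        return ret;
    }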

The problem with this approach is that it would change the reconnect
behavior.

Currently connection_co's purpose is three-fold:

1) receive the header of the server response, identify the request it
   pertains to, and wake the respective request coroutine

2) take on the responsibility to reestablish the connection when it's
   lost

3) monitor the idle connection and initiate the reconnect as soon as the
   connection is lost

Points 1 and 2 can be moved to the request coroutines indeed.  However I
don't see how to do 3 without an extra ever-running coroutine.
Sacrificing it would mean that a connection loss wouldn't be noticed and
the recovery wouldn't be attempted until a request arrived.

This change looks to me like a degradation compared to the current
state.

Roman.



Re: [PULL for-6.0 v2 00/10] emulated nvme fixes for -rc3

2021-04-07 Thread Peter Maydell
On Wed, 7 Apr 2021 at 06:51, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Hi Peter,
>
> My apologies that these didn't make it for -rc2!
>
> I botched v1, so please pull this v2 instead.
>
>
> The following changes since commit d0d3dd401b70168a353450e031727affee828527:
>
>   Update version for v6.0.0-rc2 release (2021-04-06 18:34:34 +0100)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/qemu-nvme.git 
> tags/nvme-fixes-2021-04-07-pull-request
>
> for you to fetch changes up to 5dd79300df47f07d0e9d6a7bda43b23ff26001dc:
>
>   hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl (2021-04-07 
> 07:27:09 +0200)
>
> 
> emulated nvme fixes for -rc3
>
> v2:
>   - added missing patches
>
> 

Hi; this seems to generate a bunch of new warnings during 'make check'
(not sure exactly which test is producing these, due to the usual
interleaving when using -j8):

qemu-system-i386: -device nvme,addr=04.0,drive=drv0,serial=foo:
warning: drive property is deprecated; please use an nvme-ns device
instead
qemu-system-i386: -device
nvme,addr=04.0,drive=drv0,serial=foo,cmb_size_mb=2: warning: drive
property is deprecated; please use an nvme-ns device instead
qemu-system-ppc64: -device nvme,addr=04.0,drive=drv0,serial=foo:
warning: drive property is deprecated; please use an nvme-ns device
instead
qemu-system-ppc64: -device
nvme,addr=04.0,drive=drv0,serial=foo,cmb_size_mb=2: warning: drive
property is deprecated; please use an nvme-ns device instead
qemu-system-x86_64: -device nvme,addr=04.0,drive=drv0,serial=foo:
warning: drive property is deprecated; please use an nvme-ns device
instead
qemu-system-x86_64: -device
nvme,addr=04.0,drive=drv0,serial=foo,cmb_size_mb=2: warning: drive
property is deprecated; please use an nvme-ns device instead

thanks
-- PMM



Re: [PULL for-6.0 v2 00/10] emulated nvme fixes for -rc3

2021-04-07 Thread Klaus Jensen

On Apr  7 08:03, Peter Maydell wrote:

On Wed, 7 Apr 2021 at 06:51, Klaus Jensen  wrote:


From: Klaus Jensen 

Hi Peter,

My apologies that these didn't make it for -rc2!

I botched v1, so please pull this v2 instead.


The following changes since commit d0d3dd401b70168a353450e031727affee828527:

  Update version for v6.0.0-rc2 release (2021-04-06 18:34:34 +0100)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-2021-04-07-pull-request

for you to fetch changes up to 5dd79300df47f07d0e9d6a7bda43b23ff26001dc:

  hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl (2021-04-07 
07:27:09 +0200)


emulated nvme fixes for -rc3

v2:
  - added missing patches




Hi; this seems to generate a bunch of new warnings during 'make check'
(not sure exactly which test is producing these, due to the usual
interleaving when using -j8):

qemu-system-i386: -device nvme,addr=04.0,drive=drv0,serial=foo:
warning: drive property is deprecated; please use an nvme-ns device
instead
qemu-system-i386: -device
nvme,addr=04.0,drive=drv0,serial=foo,cmb_size_mb=2: warning: drive
property is deprecated; please use an nvme-ns device instead
qemu-system-ppc64: -device nvme,addr=04.0,drive=drv0,serial=foo:
warning: drive property is deprecated; please use an nvme-ns device
instead
qemu-system-ppc64: -device
nvme,addr=04.0,drive=drv0,serial=foo,cmb_size_mb=2: warning: drive
property is deprecated; please use an nvme-ns device instead
qemu-system-x86_64: -device nvme,addr=04.0,drive=drv0,serial=foo:
warning: drive property is deprecated; please use an nvme-ns device
instead
qemu-system-x86_64: -device
nvme,addr=04.0,drive=drv0,serial=foo,cmb_size_mb=2: warning: drive
property is deprecated; please use an nvme-ns device instead

thanks
-- PMM



Hi Peter,

tests/qtest/nvme-test.c is generating these warnings.

We didn't deprecate this formally, so I will remove the warning for now. 
The device works just fine with both "legacy" and "new-style" nvme-ns 
namespace definitions.


I'll do a v3.




Re: [PATCH 0/2] block/rbd: fix memory leaks

2021-04-07 Thread Markus Armbruster
Max Reitz  writes:

> On 29.03.21 17:01, Stefano Garzarella wrote:
>> This series fixes two memory leaks, found through valgrind, in the
>> rbd driver.
>> Stefano Garzarella (2):
>>block/rbd: fix memory leak in qemu_rbd_connect()
>>block/rbd: fix memory leak in qemu_rbd_co_create_opts()
>>   block/rbd.c | 10 ++
>>   1 file changed, 6 insertions(+), 4 deletions(-)
>
> Reviewed-by: Max Reitz 
>
> I’m not quite sure whether this is fit for 6.0...  I think it’s too
> late for rc2, so I don’t know.

This is the maintainers' call to make.

* PATCH 1:

  CON: Old bug, probably 2.9, i.e. four years

  PRO: The fix is straightforward

* PATCH 2:

  NEUTRAL: Not recent from upstream's point of view (5.0), but
  downstreams may have different ideas

  PRO: The fix is trivial

I encourage you to take at least PATCH 2.




Re: [PATCH 0/7] block/nbd: decouple reconnect from drain

2021-04-07 Thread Vladimir Sementsov-Ogievskiy

07.04.2021 10:45, Roman Kagan wrote:

On Wed, Mar 17, 2021 at 11:35:31AM +0300, Vladimir Sementsov-Ogievskiy wrote:

15.03.2021 09:06, Roman Kagan wrote:

The reconnection logic doesn't need to stop while in a drained section.
Moreover it has to be active during the drained section, as the requests
that were caught in-flight with the connection to the server broken can
only usefully get drained if the connection is restored.  Otherwise such
requests can only either stall resulting in a deadlock (before
8c517de24a), or be aborted defeating the purpose of the reconnection
machinery (after 8c517de24a).

This series aims to just stop messing with the drained section in the
reconnection code.

While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
connection_co reentrance"); as I've missed the point of that commit I'd
appreciate more scrutiny in this area.

Roman Kagan (7):
block/nbd: avoid touching freed connect_thread
block/nbd: use uniformly nbd_client_connecting_wait
block/nbd: assert attach/detach runs in the proper context
block/nbd: transfer reconnection stuff across aio_context switch
block/nbd: better document a case in nbd_co_establish_connection
block/nbd: decouple reconnect from drain
block/nbd: stop manipulating in_flight counter

   block/nbd.c  | 191 +++
   nbd/client.c |   2 -
   2 files changed, 86 insertions(+), 107 deletions(-)




Hmm. The huge source of problems for this series is the weird logic around
drain and aio context switches in the NBD driver.

Why do we have all this overly complicated logic with abuse of the in_flight
counter in NBD? The answer is connection_co. NBD differs from other
drivers in that it has a coroutine independent of the request coroutines. And
we have to move this coroutine carefully to the new aio context. We can't
just enter it from the new context; we have to be sure that
connection_co is at one of the yield points that support reentering.

I have an idea of how to avoid all this: drop connection_co altogether.

1. nbd negotiation goes to connection thread and becomes independent
of any aio context.

2. waiting for the server reply goes to the request code. So, instead of
always reading the reply from the socket in connection_co, we read in the
request coroutine, after sending the request. We'll need a CoMutex for
it (as only one request coroutine should read from the socket), and be
prepared for the incoming reply not being for _this_ request (in this case
we should wake the other request and continue reading from the socket).


The problem with this approach is that it would change the reconnect
behavior.

Currently connection_co's purpose is three-fold:

1) receive the header of the server response, identify the request it
pertains to, and wake the respective request coroutine

2) take on the responsibility to reestablish the connection when it's
lost

3) monitor the idle connection and initiate the reconnect as soon as the
connection is lost

Points 1 and 2 can be moved to the request coroutines indeed.  However I
don't see how to do 3 without an extra ever-running coroutine.
Sacrificing it would mean that a connection loss wouldn't be noticed and
the recovery wouldn't be attempted until a request arrived.

This change looks to me like a degradation compared to the current
state.



For 3 we can check the connection on a timeout:

 - getsockopt( .. SO_ERROR .. ), which could be done from the bs aio context, or
even from the reconnect-thread context

 - or, we can create a PING request: just use some request with parameters for
which we are sure the NBD server will take no action but report some expected
error. We can issue such a request on a timeout when there are no more requests
in flight, just to check that the connection still works.

Note that none of these (nor the current [3], which is just an endless read from
the socket) will work without keep-alive set, which is not the default for now.
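
(To make the getsockopt() variant concrete, a minimal standalone sketch with an
invented helper name, not existing QEMU code. As noted above, SO_ERROR only
reports errors the kernel has already observed, so a silently dead peer still
needs keep-alive to be noticed.)

    #include <stdbool.h>
    #include <sys/socket.h>     /* getsockopt, SOL_SOCKET, SO_ERROR */

    /* Returns false if the kernel has recorded an error on the socket. */
    static bool nbd_socket_looks_healthy(int sockfd)
    {
        int err = 0;
        socklen_t len = sizeof(err);

        if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) {
            return false;       /* can't even query the socket */
        }
        return err == 0;        /* SO_ERROR also clears the pending error */
    }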

Anyway, I think the first step is splitting the connect-thread out of nbd.c,
which is overcomplicated now; I'm going to send a refactoring series for this.

--
Best regards,
Vladimir



[PATCH 00/14] nbd: move reconnect-thread to separate file

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
Hi all!

There are problems with nbd driver:

 - nbd reconnect is cancelled on drain, which is bad, as Roman describes
   in his "[PATCH 0/7] block/nbd: decouple reconnect from drain"
 - the nbd driver is too complicated around drained sections and aio context
   switches. It's nearly impossible to follow all the logic, including the
   abuse of bs->in_flight, which is temporarily decreased in some places
   (like nbd_read_eof()). The additional reconnect thread and two different
   state machines (we have BDRVNBDState::state and
   BDRVNBDState::connect_thread->state) don't make things simpler :)

So, I have a plan:

1. Move nbd negotiation to connect_thread

2. Receive NBD replies in request coroutines, not in connection_co

   At this point we can drop connection_co, and once we don't have an
   endlessly running coroutine, the NBD driver becomes a usual block driver,
   and we can drop the abuse of bs->in_flight, and probably drop most of the
   complicated logic around drained sections and aio context switches in
   the nbd driver.

3. Still, as Roman describes, with [2] we lose the ability to
   reconnect immediately when the connection breaks (with the current code we
   have an endless read in reconnect_co, but actually for this to work
   keep-alive has to be set up correctly). So we'll need to reinvent it,
   checking the connection periodically on a timeout, with the help of
   getsockopt or just sending a kind of PING request (a zero-length READ or
   something like this).
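
(A hypothetical sketch of how such a periodic check could be wired up; all names
other than the QEMUTimer/aio calls are invented for illustration, this is not a
proposed patch.)

    #define NBD_HEALTH_CHECK_MS 5000    /* arbitrary interval for the sketch */

    static void nbd_health_check_cb(void *opaque)
    {
        BDRVNBDState *s = opaque;

        if (!nbd_connection_alive(s)) {     /* hypothetical: getsockopt or PING */
            nbd_trigger_reconnect(s);       /* hypothetical */
            return;
        }
        timer_mod(s->health_check_timer,    /* hypothetical field */
                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NBD_HEALTH_CHECK_MS);
    }

    /* Hypothetical: called once at open time, in the aio context of the BDS. */
    static void nbd_start_health_check(BlockDriverState *bs)
    {
        BDRVNBDState *s = bs->opaque;

        s->health_check_timer = aio_timer_new(bdrv_get_aio_context(bs),
                                              QEMU_CLOCK_REALTIME, SCALE_MS,
                                              nbd_health_check_cb, s);
        timer_mod(s->health_check_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NBD_HEALTH_CHECK_MS);
    }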

And this series is a kind of preparation. Its main point is moving the
connect-thread to a separate file.

This series may crash on iotest 277, so it's based on the corresponding
fix, "[PATCH 1/7] block/nbd: avoid touching freed connect_thread":

Based-on: <20210315060611.2989049-2-rvka...@yandex-team.ru>

Vladimir Sementsov-Ogievskiy (14):
  block/nbd: BDRVNBDState: drop unused connect_err
  block/nbd: nbd_co_establish_connection(): drop unused errp
  block/nbd: drop unused NBDConnectThread::err field
  block/nbd: split connect_thread_cb() out of connect_thread_func()
  block/nbd: rename NBDConnectThread to NBDConnectCB
  block/nbd: further segregation of connect-thread
  block/nbd: drop nbd_free_connect_thread()
  block/nbd: move nbd connect-thread to nbd/client-connect.c
  block/nbd: NBDConnectCB: drop bh_* fields
  block/nbd: move wait_connect field under mutex protection
  block/nbd: refactor connect_bh()
  block/nbd: refactor nbd_co_establish_connection
  block/nbd: nbd_co_establish_connection_cancel(): rename wake to
do_wake
  block/nbd: drop thr->state

 include/block/nbd.h  |   6 +
 block/nbd.c  | 266 ++-
 nbd/client-connect.c |  72 
 nbd/meson.build  |   1 +
 4 files changed, 162 insertions(+), 183 deletions(-)
 create mode 100644 nbd/client-connect.c

-- 
2.29.2




[PATCH 01/14] block/nbd: BDRVNBDState: drop unused connect_err

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
The field is actually unused. Let's make things a bit simpler by dropping
it and the corresponding logic.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..a47d6cfea3 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -122,7 +122,6 @@ typedef struct BDRVNBDState {
 int in_flight;
 NBDClientState state;
 int connect_status;
-Error *connect_err;
 bool wait_in_flight;
 
 QEMUTimer *reconnect_delay_timer;
@@ -578,7 +577,6 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
 int ret;
-Error *local_err = NULL;
 
 if (!nbd_client_connecting(s)) {
 return;
@@ -619,14 +617,14 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-if (nbd_co_establish_connection(s->bs, &local_err) < 0) {
+if (nbd_co_establish_connection(s->bs, NULL) < 0) {
 ret = -ECONNREFUSED;
 goto out;
 }
 
 bdrv_dec_in_flight(s->bs);
 
-ret = nbd_client_handshake(s->bs, &local_err);
+ret = nbd_client_handshake(s->bs, NULL);
 
 if (s->drained) {
 s->wait_drained_end = true;
@@ -642,9 +640,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState 
*s)
 
 out:
 s->connect_status = ret;
-error_free(s->connect_err);
-s->connect_err = NULL;
-error_propagate(&s->connect_err, local_err);
 
 if (ret >= 0) {
 /* successfully connected */
-- 
2.29.2




[PATCH 03/14] block/nbd: drop unused NBDConnectThread::err field

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
The only thing we do with this field is free it. Let's drop it for now for
simplicity.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 29c8bf..fbf5154048 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -94,12 +94,8 @@ typedef struct NBDConnectThread {
 QEMUBHFunc *bh_func;
 void *bh_opaque;
 
-/*
- * Result of last attempt. Valid in FAIL and SUCCESS states.
- * If you want to steal error, don't forget to set pointer to NULL.
- */
+/* Result of last attempt. Valid in FAIL and SUCCESS states. */
 QIOChannelSocket *sioc;
-Error *err;
 
 /* state and bh_ctx are protected by mutex */
 QemuMutex mutex;
@@ -385,7 +381,6 @@ static void nbd_free_connect_thread(NBDConnectThread *thr)
 if (thr->sioc) {
 qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
 }
-error_free(thr->err);
 qapi_free_SocketAddress(thr->saddr);
 g_free(thr);
 }
@@ -398,9 +393,7 @@ static void *connect_thread_func(void *opaque)
 
 thr->sioc = qio_channel_socket_new();
 
-error_free(thr->err);
-thr->err = NULL;
-ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, &thr->err);
+ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, NULL);
 if (ret < 0) {
 object_unref(OBJECT(thr->sioc));
 thr->sioc = NULL;
@@ -447,8 +440,6 @@ nbd_co_establish_connection(BlockDriverState *bs)
 switch (thr->state) {
 case CONNECT_THREAD_FAIL:
 case CONNECT_THREAD_NONE:
-error_free(thr->err);
-thr->err = NULL;
 thr->state = CONNECT_THREAD_RUNNING;
 qemu_thread_create(&thread, "nbd-connect",
connect_thread_func, thr, QEMU_THREAD_DETACHED);
@@ -491,8 +482,6 @@ nbd_co_establish_connection(BlockDriverState *bs)
 case CONNECT_THREAD_SUCCESS:
 case CONNECT_THREAD_FAIL:
 thr->state = CONNECT_THREAD_NONE;
-error_free(thr->err);
-thr->err = NULL;
 s->sioc = thr->sioc;
 thr->sioc = NULL;
 if (s->sioc) {
-- 
2.29.2




[PATCH 02/14] block/nbd: nbd_co_establish_connection(): drop unused errp

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
We are going to refactor the connection logic to make it more
understandable. Every bit that we can simplify in advance will help.
Drop errp for now; it's unused anyway. We'll probably reintroduce it in
the future.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a47d6cfea3..29c8bf 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -145,7 +145,7 @@ typedef struct BDRVNBDState {
 
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
-static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
+static int nbd_co_establish_connection(BlockDriverState *bs);
 static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
bool detach);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
@@ -435,7 +435,7 @@ static void *connect_thread_func(void *opaque)
 }
 
 static int coroutine_fn
-nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
+nbd_co_establish_connection(BlockDriverState *bs)
 {
 int ret;
 QemuThread thread;
@@ -491,7 +491,7 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 case CONNECT_THREAD_SUCCESS:
 case CONNECT_THREAD_FAIL:
 thr->state = CONNECT_THREAD_NONE;
-error_propagate(errp, thr->err);
+error_free(thr->err);
 thr->err = NULL;
 s->sioc = thr->sioc;
 thr->sioc = NULL;
@@ -509,7 +509,6 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
  * result may be used for next connection attempt.
  */
 ret = -1;
-error_setg(errp, "Connection attempt cancelled by other operation");
 break;
 
 case CONNECT_THREAD_NONE:
@@ -617,7 +616,7 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState 
*s)
 s->ioc = NULL;
 }
 
-if (nbd_co_establish_connection(s->bs, NULL) < 0) {
+if (nbd_co_establish_connection(s->bs) < 0) {
 ret = -ECONNREFUSED;
 goto out;
 }
-- 
2.29.2




[PATCH 08/14] block/nbd: move nbd connect-thread to nbd/client-connect.c

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
The connect-thread part is not directly tied to the block layer. A good
place for it is the nbd/ directory.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/nbd.h  |  6 
 block/nbd.c  | 46 
 nbd/client-connect.c | 72 
 nbd/meson.build  |  1 +
 4 files changed, 79 insertions(+), 46 deletions(-)
 create mode 100644 nbd/client-connect.c

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 5f34d23bb0..660ab4c266 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -406,4 +406,10 @@ const char *nbd_info_lookup(uint16_t info);
 const char *nbd_cmd_lookup(uint16_t info);
 const char *nbd_err_lookup(int err);
 
+
+typedef void (*NBDConnectThreadCallback)(QIOChannelSocket *sioc, int ret,
+ void *opaque);
+void nbd_connect_thread_start(const SocketAddress *saddr,
+  NBDConnectThreadCallback cb, void *cb_opaque);
+
 #endif
diff --git a/block/nbd.c b/block/nbd.c
index 4e669d762a..ba281e2d5a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -101,15 +101,6 @@ typedef struct NBDConnectCB {
 AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
 } NBDConnectCB;
 
-typedef void (*NBDConnectThreadCallback)(QIOChannelSocket *sioc, int ret,
- void *opaque);
-
-typedef struct NBDConnectThread {
-SocketAddress *saddr; /* address to connect to */
-NBDConnectThreadCallback cb;
-void *cb_opaque;
-} NBDConnectThread;
-
 typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
 QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -415,43 +406,6 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
 }
 }
 
-static void *connect_thread_func(void *opaque)
-{
-NBDConnectThread *thr = opaque;
-int ret;
-QIOChannelSocket *sioc = qio_channel_socket_new();
-
-ret = qio_channel_socket_connect_sync(sioc, thr->saddr, NULL);
-if (ret < 0) {
-object_unref(OBJECT(sioc));
-sioc = NULL;
-}
-
-thr->cb(sioc, ret, thr->cb_opaque);
-
-qapi_free_SocketAddress(thr->saddr);
-g_free(thr);
-
-return NULL;
-}
-
-static void nbd_connect_thread_start(const SocketAddress *saddr,
- NBDConnectThreadCallback cb,
- void *cb_opaque)
-{
-QemuThread thread;
-NBDConnectThread *thr = g_new(NBDConnectThread, 1);
-
-*thr = (NBDConnectThread) {
-.saddr = QAPI_CLONE(SocketAddress, saddr),
-.cb = cb,
-.cb_opaque = cb_opaque,
-};
-
-qemu_thread_create(&thread, "nbd-connect",
-   connect_thread_func, thr, QEMU_THREAD_DETACHED);
-}
-
 static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs)
 {
diff --git a/nbd/client-connect.c b/nbd/client-connect.c
new file mode 100644
index 00..9f22c41a34
--- /dev/null
+++ b/nbd/client-connect.c
@@ -0,0 +1,72 @@
+/*
+ * QEMU Block driver for  NBD
+ *
+ * Copyright (c) 2020 Virtuozzo International GmbH.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/qapi-visit-sockets.h"
+#include "qapi/clone-visitor.h"
+
+#include "block/nbd.h"
+
+typedef struct NBDConnectThread {
+SocketAddress *saddr; /* address to connect to */
+NBDConnectThreadCallback cb;
+void *cb_opaque;
+} NBDConnectThread;
+
+static void *connect_thread_func(void *opaque)
+{
+NBDConnectThread *thr = opaque;
+int ret;
+QIOChannelSocket *sioc = qio_channel_socket_new();
+
+ret = qio_channel_socket_connect_sync(sioc, thr->saddr, NULL);
+if (ret < 0) {
+object_unref(OBJECT(sioc));
+sioc = NULL;
+}
+
+thr->cb(sioc, ret, thr->cb_opaque);
+
+qapi_free_SocketAddress(thr->saddr);
+g_free(thr);
+
+

[PATCH 10/14] block/nbd: move wait_connect field under mutex protection

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
Move wait_connect to NBDConnectCB and protect it with the mutex. This gives
simpler logic than bothering with bh_ctx (which we can now drop).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 8bd52884c8..29bdbd38b6 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -88,13 +88,15 @@ typedef struct NBDConnectCB {
 /* Result of last attempt. Valid in FAIL and SUCCESS states. */
 QIOChannelSocket *sioc;
 
-/* state and bh_ctx are protected by mutex */
 QemuMutex mutex;
+/* All further fields are protected by mutex */
 NBDConnectThreadState state; /* current state of the thread */
-AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
 
 /* Link to NBD BDS. If NULL thread is detached, BDS is probably closed. */
 BlockDriverState *bs;
+
+/* connection_co is waiting in yield() */
+bool wait_connect;
 } NBDConnectCB;
 
 typedef struct BDRVNBDState {
@@ -129,7 +131,6 @@ typedef struct BDRVNBDState {
 char *x_dirty_bitmap;
 bool alloc_depth;
 
-bool wait_connect;
 NBDConnectCB *connect_thread;
 } BDRVNBDState;
 
@@ -365,8 +366,6 @@ static void connect_bh(void *opaque)
 {
 BDRVNBDState *state = opaque;
 
-assert(state->wait_connect);
-state->wait_connect = false;
 aio_co_wake(state->connection_co);
 }
 
@@ -374,6 +373,7 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
 {
 NBDConnectCB *thr = opaque;
 bool do_free = false;
+bool do_wake = false;
 BDRVNBDState *s = thr->bs ? thr->bs->opaque : NULL;
 
 qemu_mutex_lock(&thr->mutex);
@@ -383,12 +383,8 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
 switch (thr->state) {
 case CONNECT_THREAD_RUNNING:
 thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
-if (thr->bh_ctx) {
-aio_bh_schedule_oneshot(thr->bh_ctx, connect_bh, s);
-
-/* play safe, don't reuse bh_ctx on further connection attempts */
-thr->bh_ctx = NULL;
-}
+do_wake = thr->wait_connect;
+thr->wait_connect = false;
 break;
 case CONNECT_THREAD_RUNNING_DETACHED:
 do_free = true;
@@ -399,6 +395,17 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
 
 qemu_mutex_unlock(&thr->mutex);
 
+if (do_wake) {
+/*
+ * At this point we are sure that connection_co sleeps in the
+ * corresponding yield point and we here have an exclusive right
+ * (and obligations) to wake it.
+ * Direct call to aio_co_wake() from thread context works bad. So use
+ * aio_bh_schedule_oneshot() as a mediator.
+ */
+aio_bh_schedule_oneshot(bdrv_get_aio_context(thr->bs), connect_bh, s);
+}
+
 if (do_free) {
 g_free(thr);
 }
@@ -435,20 +442,14 @@ nbd_co_establish_connection(BlockDriverState *bs)
 abort();
 }
 
-thr->bh_ctx = qemu_get_current_aio_context();
+thr->wait_connect = true;
 
 qemu_mutex_unlock(&thr->mutex);
 
-
 /*
  * We are going to wait for connect-thread finish, but
  * nbd_client_co_drain_begin() can interrupt.
- *
- * Note that wait_connect variable is not visible for connect-thread. It
- * doesn't need mutex protection, it used only inside home aio context of
- * bs.
  */
-s->wait_connect = true;
 qemu_coroutine_yield();
 
 qemu_mutex_lock(&thr->mutex);
@@ -512,9 +513,8 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 
 if (thr->state == CONNECT_THREAD_RUNNING) {
 /* We can cancel only in running state, when bh is not yet scheduled */
-thr->bh_ctx = NULL;
-if (s->wait_connect) {
-s->wait_connect = false;
+if (thr->wait_connect) {
+thr->wait_connect = false;
 wake = true;
 }
 if (detach) {
-- 
2.29.2




[PATCH 12/14] block/nbd: refactor nbd_co_establish_connection

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
We are going to drop the nbd connect-thread states for simplicity. This is
a step that makes the following commit simpler.

Note that the yank_* calls moved out of thr->mutex. They aren't related to
it, and they are already called from other places in the file not under the
mutex.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6729561935..f0319d2256 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -413,7 +413,6 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
 static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs)
 {
-int ret;
 BDRVNBDState *s = bs->opaque;
 NBDConnectCB *thr = s->connect_thread;
 
@@ -430,10 +429,7 @@ nbd_co_establish_connection(BlockDriverState *bs)
 thr->state = CONNECT_THREAD_NONE;
 s->sioc = thr->sioc;
 thr->sioc = NULL;
-yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
-   nbd_yank, bs);
-qemu_mutex_unlock(&thr->mutex);
-return 0;
+goto out;
 case CONNECT_THREAD_RUNNING:
 /* Already running, will wait */
 break;
@@ -459,11 +455,6 @@ nbd_co_establish_connection(BlockDriverState *bs)
 thr->state = CONNECT_THREAD_NONE;
 s->sioc = thr->sioc;
 thr->sioc = NULL;
-if (s->sioc) {
-yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
-   nbd_yank, bs);
-}
-ret = (s->sioc ? 0 : -1);
 break;
 case CONNECT_THREAD_RUNNING:
 case CONNECT_THREAD_RUNNING_DETACHED:
@@ -472,7 +463,6 @@ nbd_co_establish_connection(BlockDriverState *bs)
  * failed. Still connect thread is executing in background, and its
  * result may be used for next connection attempt.
  */
-ret = -1;
 break;
 
 case CONNECT_THREAD_NONE:
@@ -486,9 +476,15 @@ nbd_co_establish_connection(BlockDriverState *bs)
 abort();
 }
 
+out:
 qemu_mutex_unlock(&thr->mutex);
 
-return ret;
+if (s->sioc) {
+yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
+   nbd_yank, bs);
+}
+
+return s->sioc ? 0 : -1;
 }
 
 /*
-- 
2.29.2




[PATCH 04/14] block/nbd: split connect_thread_cb() out of connect_thread_func()

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
We are going to split the connect-thread out into a separate file. Start by
splitting out the callback.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index fbf5154048..a9d351cbbc 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -385,20 +385,11 @@ static void nbd_free_connect_thread(NBDConnectThread *thr)
 g_free(thr);
 }
 
-static void *connect_thread_func(void *opaque)
+static void connect_thread_cb(int ret, void *opaque)
 {
 NBDConnectThread *thr = opaque;
-int ret;
 bool do_free = false;
 
-thr->sioc = qio_channel_socket_new();
-
-ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, NULL);
-if (ret < 0) {
-object_unref(OBJECT(thr->sioc));
-thr->sioc = NULL;
-}
-
 qemu_mutex_lock(&thr->mutex);
 
 switch (thr->state) {
@@ -423,6 +414,22 @@ static void *connect_thread_func(void *opaque)
 if (do_free) {
 nbd_free_connect_thread(thr);
 }
+}
+
+static void *connect_thread_func(void *opaque)
+{
+NBDConnectThread *thr = opaque;
+int ret;
+
+thr->sioc = qio_channel_socket_new();
+
+ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, NULL);
+if (ret < 0) {
+object_unref(OBJECT(thr->sioc));
+thr->sioc = NULL;
+}
+
+connect_thread_cb(ret, thr);
 
 return NULL;
 }
-- 
2.29.2




[PATCH 06/14] block/nbd: further segregation of connect-thread

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
Add a private NBDConnectThread state for the connect-thread and an
nbd_connect_thread_start() function. The next step will be moving the
connect-thread to a separate file.

Note that we stop publishing thr->sioc during
qio_channel_socket_connect_sync(), which probably means that we can't
early-cancel the qio_channel_socket_connect_sync() call in
nbd_free_connect_thread(). Still, once the thread is detached that doesn't
make much sense, and we had no guarantee anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 57 -
 1 file changed, 39 insertions(+), 18 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index e16c6d636a..23eb8adab1 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -85,8 +85,6 @@ typedef enum NBDConnectThreadState {
 } NBDConnectThreadState;
 
 typedef struct NBDConnectCB {
-/* Initialization constants */
-SocketAddress *saddr; /* address to connect to */
 /*
  * Bottom half to schedule on completion. Scheduled only if bh_ctx is not
  * NULL
@@ -103,6 +101,15 @@ typedef struct NBDConnectCB {
 AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
 } NBDConnectCB;
 
+typedef void (*NBDConnectThreadCallback)(QIOChannelSocket *sioc, int ret,
+ void *opaque);
+
+typedef struct NBDConnectThread {
+SocketAddress *saddr; /* address to connect to */
+NBDConnectThreadCallback cb;
+void *cb_opaque;
+} NBDConnectThread;
+
 typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
 QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -367,7 +374,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 s->connect_thread = g_new(NBDConnectCB, 1);
 
 *s->connect_thread = (NBDConnectCB) {
-.saddr = QAPI_CLONE(SocketAddress, s->saddr),
 .state = CONNECT_THREAD_NONE,
 .bh_func = connect_bh,
 .bh_opaque = s,
@@ -378,20 +384,18 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 
 static void nbd_free_connect_thread(NBDConnectCB *thr)
 {
-if (thr->sioc) {
-qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
-}
-qapi_free_SocketAddress(thr->saddr);
 g_free(thr);
 }
 
-static void connect_thread_cb(int ret, void *opaque)
+static void connect_thread_cb(QIOChannelSocket *sioc, int ret, void *opaque)
 {
 NBDConnectCB *thr = opaque;
 bool do_free = false;
 
 qemu_mutex_lock(&thr->mutex);
 
+thr->sioc = sioc;
+
 switch (thr->state) {
 case CONNECT_THREAD_RUNNING:
 thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
@@ -418,27 +422,45 @@ static void connect_thread_cb(int ret, void *opaque)
 
 static void *connect_thread_func(void *opaque)
 {
-NBDConnectCB *thr = opaque;
+NBDConnectThread *thr = opaque;
 int ret;
+QIOChannelSocket *sioc = qio_channel_socket_new();
 
-thr->sioc = qio_channel_socket_new();
-
-ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, NULL);
+ret = qio_channel_socket_connect_sync(sioc, thr->saddr, NULL);
 if (ret < 0) {
-object_unref(OBJECT(thr->sioc));
-thr->sioc = NULL;
+object_unref(OBJECT(sioc));
+sioc = NULL;
 }
 
-connect_thread_cb(ret, thr);
+thr->cb(sioc, ret, thr->cb_opaque);
+
+qapi_free_SocketAddress(thr->saddr);
+g_free(thr);
 
 return NULL;
 }
 
+static void nbd_connect_thread_start(const SocketAddress *saddr,
+ NBDConnectThreadCallback cb,
+ void *cb_opaque)
+{
+QemuThread thread;
+NBDConnectThread *thr = g_new(NBDConnectThread, 1);
+
+*thr = (NBDConnectThread) {
+.saddr = QAPI_CLONE(SocketAddress, saddr),
+.cb = cb,
+.cb_opaque = cb_opaque,
+};
+
+qemu_thread_create(&thread, "nbd-connect",
+   connect_thread_func, thr, QEMU_THREAD_DETACHED);
+}
+
 static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs)
 {
 int ret;
-QemuThread thread;
 BDRVNBDState *s = bs->opaque;
 NBDConnectCB *thr = s->connect_thread;
 
@@ -448,8 +470,7 @@ nbd_co_establish_connection(BlockDriverState *bs)
 case CONNECT_THREAD_FAIL:
 case CONNECT_THREAD_NONE:
 thr->state = CONNECT_THREAD_RUNNING;
-qemu_thread_create(&thread, "nbd-connect",
-   connect_thread_func, thr, QEMU_THREAD_DETACHED);
+nbd_connect_thread_start(s->saddr, connect_thread_cb, thr);
 break;
 case CONNECT_THREAD_SUCCESS:
 /* Previous attempt finally succeeded in background */
-- 
2.29.2




[PATCH 11/14] block/nbd: refactor connect_bh()

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
Now it's just a wrapper for aio_co_wake(). Make it more obvious.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 29bdbd38b6..6729561935 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -362,11 +362,9 @@ static void nbd_init_connect_thread(BlockDriverState *bs)
 qemu_mutex_init(&s->connect_thread->mutex);
 }
 
-static void connect_bh(void *opaque)
+static void coroutine_wake_bh(void *opaque)
 {
-BDRVNBDState *state = opaque;
-
-aio_co_wake(state->connection_co);
+aio_co_wake(opaque);
 }
 
 static void connect_thread_cb(QIOChannelSocket *sioc, int ret, void *opaque)
@@ -403,7 +401,8 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
  * Direct call to aio_co_wake() from thread context works bad. So use
  * aio_bh_schedule_oneshot() as a mediator.
  */
-aio_bh_schedule_oneshot(bdrv_get_aio_context(thr->bs), connect_bh, s);
+aio_bh_schedule_oneshot(bdrv_get_aio_context(thr->bs),
+coroutine_wake_bh, s->connection_co);
 }
 
 if (do_free) {
-- 
2.29.2




[PATCH 07/14] block/nbd: drop nbd_free_connect_thread()

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
Now it's only a wrapper around g_free(), so we don't need it. Also, what it
frees is the state of the nbd connect-thread callback, not the thread
itself, so the name is not very good. Drop it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 23eb8adab1..4e669d762a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -382,11 +382,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 qemu_mutex_init(&s->connect_thread->mutex);
 }
 
-static void nbd_free_connect_thread(NBDConnectCB *thr)
-{
-g_free(thr);
-}
-
 static void connect_thread_cb(QIOChannelSocket *sioc, int ret, void *opaque)
 {
 NBDConnectCB *thr = opaque;
@@ -416,7 +411,7 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
 qemu_mutex_unlock(&thr->mutex);
 
 if (do_free) {
-nbd_free_connect_thread(thr);
+g_free(thr);
 }
 }
 
@@ -581,7 +576,7 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 qemu_mutex_unlock(&thr->mutex);
 
 if (do_free) {
-nbd_free_connect_thread(thr);
+g_free(thr);
 s->connect_thread = NULL;
 }
 
-- 
2.29.2




[PATCH 05/14] block/nbd: rename NBDConnectThread to NBDConnectCB

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
Now it's mostly the state of the callback, so rename the structure, clearing
the way for the new, small NBDConnectThread.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a9d351cbbc..e16c6d636a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -84,7 +84,7 @@ typedef enum NBDConnectThreadState {
 CONNECT_THREAD_SUCCESS
 } NBDConnectThreadState;
 
-typedef struct NBDConnectThread {
+typedef struct NBDConnectCB {
 /* Initialization constants */
 SocketAddress *saddr; /* address to connect to */
 /*
@@ -101,7 +101,7 @@ typedef struct NBDConnectThread {
 QemuMutex mutex;
 NBDConnectThreadState state; /* current state of the thread */
 AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
-} NBDConnectThread;
+} NBDConnectCB;
 
 typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
@@ -136,7 +136,7 @@ typedef struct BDRVNBDState {
 bool alloc_depth;
 
 bool wait_connect;
-NBDConnectThread *connect_thread;
+NBDConnectCB *connect_thread;
 } BDRVNBDState;
 
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
@@ -364,9 +364,9 @@ static void connect_bh(void *opaque)
 
 static void nbd_init_connect_thread(BDRVNBDState *s)
 {
-s->connect_thread = g_new(NBDConnectThread, 1);
+s->connect_thread = g_new(NBDConnectCB, 1);
 
-*s->connect_thread = (NBDConnectThread) {
+*s->connect_thread = (NBDConnectCB) {
 .saddr = QAPI_CLONE(SocketAddress, s->saddr),
 .state = CONNECT_THREAD_NONE,
 .bh_func = connect_bh,
@@ -376,7 +376,7 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 qemu_mutex_init(&s->connect_thread->mutex);
 }
 
-static void nbd_free_connect_thread(NBDConnectThread *thr)
+static void nbd_free_connect_thread(NBDConnectCB *thr)
 {
 if (thr->sioc) {
 qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
@@ -387,7 +387,7 @@ static void nbd_free_connect_thread(NBDConnectThread *thr)
 
 static void connect_thread_cb(int ret, void *opaque)
 {
-NBDConnectThread *thr = opaque;
+NBDConnectCB *thr = opaque;
 bool do_free = false;
 
 qemu_mutex_lock(&thr->mutex);
@@ -418,7 +418,7 @@ static void connect_thread_cb(int ret, void *opaque)
 
 static void *connect_thread_func(void *opaque)
 {
-NBDConnectThread *thr = opaque;
+NBDConnectCB *thr = opaque;
 int ret;
 
 thr->sioc = qio_channel_socket_new();
@@ -440,7 +440,7 @@ nbd_co_establish_connection(BlockDriverState *bs)
 int ret;
 QemuThread thread;
 BDRVNBDState *s = bs->opaque;
-NBDConnectThread *thr = s->connect_thread;
+NBDConnectCB *thr = s->connect_thread;
 
 qemu_mutex_lock(&thr->mutex);
 
@@ -536,7 +536,7 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
bool detach)
 {
 BDRVNBDState *s = bs->opaque;
-NBDConnectThread *thr = s->connect_thread;
+NBDConnectCB *thr = s->connect_thread;
 bool wake = false;
 bool do_free = false;
 
-- 
2.29.2




[PATCH 09/14] block/nbd: NBDConnectCB: drop bh_* fields

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
Drop the bh_* fields and add a back-link to bs instead. We are on the way to
simplifying the reconnect logic in the nbd driver, so look forward to
further commits based on this one.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 39 +++
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index ba281e2d5a..8bd52884c8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -85,13 +85,6 @@ typedef enum NBDConnectThreadState {
 } NBDConnectThreadState;
 
 typedef struct NBDConnectCB {
-/*
- * Bottom half to schedule on completion. Scheduled only if bh_ctx is not
- * NULL
- */
-QEMUBHFunc *bh_func;
-void *bh_opaque;
-
 /* Result of last attempt. Valid in FAIL and SUCCESS states. */
 QIOChannelSocket *sioc;
 
@@ -99,6 +92,9 @@ typedef struct NBDConnectCB {
 QemuMutex mutex;
 NBDConnectThreadState state; /* current state of the thread */
 AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
+
+/* Link to NBD BDS. If NULL thread is detached, BDS is probably closed. */
+BlockDriverState *bs;
 } NBDConnectCB;
 
 typedef struct BDRVNBDState {
@@ -351,32 +347,34 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
-static void connect_bh(void *opaque)
+static void nbd_init_connect_thread(BlockDriverState *bs)
 {
-BDRVNBDState *state = opaque;
-
-assert(state->wait_connect);
-state->wait_connect = false;
-aio_co_wake(state->connection_co);
-}
+BDRVNBDState *s = bs->opaque;
 
-static void nbd_init_connect_thread(BDRVNBDState *s)
-{
 s->connect_thread = g_new(NBDConnectCB, 1);
 
 *s->connect_thread = (NBDConnectCB) {
 .state = CONNECT_THREAD_NONE,
-.bh_func = connect_bh,
-.bh_opaque = s,
+.bs = bs,
 };
 
 qemu_mutex_init(&s->connect_thread->mutex);
 }
 
+static void connect_bh(void *opaque)
+{
+BDRVNBDState *state = opaque;
+
+assert(state->wait_connect);
+state->wait_connect = false;
+aio_co_wake(state->connection_co);
+}
+
 static void connect_thread_cb(QIOChannelSocket *sioc, int ret, void *opaque)
 {
 NBDConnectCB *thr = opaque;
 bool do_free = false;
+BDRVNBDState *s = thr->bs ? thr->bs->opaque : NULL;
 
 qemu_mutex_lock(&thr->mutex);
 
@@ -386,7 +384,7 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
 case CONNECT_THREAD_RUNNING:
 thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
 if (thr->bh_ctx) {
-aio_bh_schedule_oneshot(thr->bh_ctx, thr->bh_func, thr->bh_opaque);
+aio_bh_schedule_oneshot(thr->bh_ctx, connect_bh, s);
 
 /* play safe, don't reuse bh_ctx on further connection attempts */
 thr->bh_ctx = NULL;
@@ -520,6 +518,7 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 wake = true;
 }
 if (detach) {
+thr->bs = NULL;
 thr->state = CONNECT_THREAD_RUNNING_DETACHED;
 s->connect_thread = NULL;
 }
@@ -2271,7 +2270,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 /* successfully connected */
 s->state = NBD_CLIENT_CONNECTED;
 
-nbd_init_connect_thread(s);
+nbd_init_connect_thread(bs);
 
 s->connection_co = qemu_coroutine_create(nbd_connection_entry, s);
 bdrv_inc_in_flight(bs);
-- 
2.29.2




[PATCH 13/14] block/nbd: nbd_co_establish_connection_cancel(): rename wake to do_wake

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
Rename the local variable to match do_free in the same function, and do_wake
and do_free in connect_thread_cb().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index f0319d2256..9cee5b6650 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -501,7 +501,7 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 {
 BDRVNBDState *s = bs->opaque;
 NBDConnectCB *thr = s->connect_thread;
-bool wake = false;
+bool do_wake = false;
 bool do_free = false;
 
 qemu_mutex_lock(&thr->mutex);
@@ -510,7 +510,7 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 /* We can cancel only in running state, when bh is not yet scheduled */
 if (thr->wait_connect) {
 thr->wait_connect = false;
-wake = true;
+do_wake = true;
 }
 if (detach) {
 thr->bs = NULL;
@@ -528,7 +528,7 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs,
 s->connect_thread = NULL;
 }
 
-if (wake) {
+if (do_wake) {
 aio_co_wake(s->connection_co);
 }
 }
-- 
2.29.2




Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures

2021-04-07 Thread Kevin Wolf
On 29.03.2021 at 17:43, Max Reitz wrote:
> On 29.03.21 17:05, Stefan Hajnoczi wrote:
> > Hi Kevin,
> > Peter hit an s390 qemu-iotests failure. I was able to reproduce it
> > easily. I haven't checked if it reproduces on other platforms too, but it
> > seems likely. Here is what I found.
> > 
> > qsd-jobs has race conditions:
> > 
> >    # Just make sure that this doesn't crash
> >    $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
> >        --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
> >        --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF
> >    {"execute":"qmp_capabilities"}
> >    {"execute": "block-commit", "arguments": {"device": "fmt0", "job-id": "job0"}}
> >    {"execute": "quit"}
> >    EOF
> > 
> > The intent is for "quit" to run while job0 still exists. This proves
> > that the program does not abort with an assertion failure when "quit" is
> > called while there are block jobs. bdrv_close_all() expects there to be
> > no jobs.
> > 
> > Here is the failure that reproduces 1 out of 3 times:
> > 
> ># build/tests/qemu-iotests/check -qcow2 qsd-jobs
> >QEMU  -- 
> > "/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64" -nodefaults 
> > -display none -accel qtest
> >QEMU_IMG  -- "/root/qemu/build/tests/qemu-iotests/../../qemu-img"
> >QEMU_IO   -- "/root/qemu/build/tests/qemu-iotests/../../qemu-io" 
> > --cache writeback --aio threads -f qcow2
> >QEMU_NBD  -- "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd"
> >IMGFMT-- qcow2
> >IMGPROTO  -- file
> >PLATFORM  -- Linux/s390x
> >TEST_DIR  -- /root/qemu/scratch
> >SOCK_DIR  -- /tmp/tmpapj5dydm
> >SOCKET_SCM_HELPER -- 
> > /root/qemu/build/tests/qemu-iotests/socket_scm_helper
> > 
> >qsd-jobs   fail   [10:42:23] [10:42:23]   0.1s   (last: 0.1s)  
> > output mismatch (see qsd-jobs.out.bad)
> >--- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out
> >+++ qsd-jobs.out.bad
> >@@ -9,11 +9,11 @@
> > {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
> > {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> > {"return": {}}
> >+{"return": {}}
> >+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "job0"}}
> >+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> > {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 
> > 0, "speed": 0, "type": "commit"}}
> >-{"return": {}}
> >-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "job0"}}
> >-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
> > {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job0"}}
> > {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, 
> > "offset": 0, "speed": 0, "type": "commit"}}
> >Failures: qsd-jobs
> >Failed 1 of 1 iotests
> > 
> > The timing of the QMP events seems to be non-deterministic relative to
> > the QMP command completions.
> > 
> > It's more complicated than simply lines moving up/down in the expected
> > output: the job state transitions are slightly different (running ->
> > ready vs running -> paused).
> > 
> > Should the test case filter out all JOB_* and BLOCK_JOB_* events?
> > 
> > Is there a way to make this test deterministic?
> 
> Would writing data to the overlay and setting a very restrictive speed help,
> so it doesn’t immediately go into the 'ready' state?  (See attachment.)

I haven't tested the patch yet, but this is the approach we normally use
in tests, so I agree. Do you want to send it as a proper patch?

Kevin




[PATCH 14/14] block/nbd: drop thr->state

2021-04-07 Thread Vladimir Sementsov-Ogievskiy
The thr->state variable mostly duplicates information that is already
obvious from the other fields: thr->bs == NULL means DETACHED,
thr->sioc != NULL means SUCCESS. The only bit of information we still need
is "is the thread running now or not". So drop state and add a simple
boolean instead. It simplifies the logic a lot.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 122 +++-
 1 file changed, 34 insertions(+), 88 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 9cee5b6650..5320a359f6 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -66,31 +66,16 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
-typedef enum NBDConnectThreadState {
-/* No thread, no pending results */
-CONNECT_THREAD_NONE,
-
-/* Thread is running, no results for now */
-CONNECT_THREAD_RUNNING,
-
+typedef struct NBDConnectCB {
 /*
- * Thread is running, but requestor exited. Thread should close
- * the new socket and free the connect state on exit.
+ * Result of last attempt. Set in connect_thread_cb()  on success. Should 
be
+ * set to NULL before starting the thread.
  */
-CONNECT_THREAD_RUNNING_DETACHED,
-
-/* Thread finished, results are stored in a state */
-CONNECT_THREAD_FAIL,
-CONNECT_THREAD_SUCCESS
-} NBDConnectThreadState;
-
-typedef struct NBDConnectCB {
-/* Result of last attempt. Valid in FAIL and SUCCESS states. */
 QIOChannelSocket *sioc;
 
 QemuMutex mutex;
 /* All further fields are protected by mutex */
-NBDConnectThreadState state; /* current state of the thread */
+bool running; /* thread is running now */
 
 /* Link to NBD BDS. If NULL thread is detached, BDS is probably closed. */
 BlockDriverState *bs;
@@ -354,10 +339,7 @@ static void nbd_init_connect_thread(BlockDriverState *bs)
 
 s->connect_thread = g_new(NBDConnectCB, 1);
 
-*s->connect_thread = (NBDConnectCB) {
-.state = CONNECT_THREAD_NONE,
-.bs = bs,
-};
+*s->connect_thread = (NBDConnectCB) { .bs = bs };
 
 qemu_mutex_init(&s->connect_thread->mutex);
 }
@@ -374,22 +356,21 @@ static void connect_thread_cb(QIOChannelSocket *sioc, int 
ret, void *opaque)
 bool do_wake = false;
 BDRVNBDState *s = thr->bs ? thr->bs->opaque : NULL;
 
+/* We are in context of connect thread ! */
+
 qemu_mutex_lock(&thr->mutex);
 
+assert(thr->running);
+assert(thr->sioc == NULL);
+assert(thr->bs || !thr->wait_connect);
+
+thr->running = false;
 thr->sioc = sioc;
 
-switch (thr->state) {
-case CONNECT_THREAD_RUNNING:
-thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
-do_wake = thr->wait_connect;
-thr->wait_connect = false;
-break;
-case CONNECT_THREAD_RUNNING_DETACHED:
-do_free = true;
-break;
-default:
-abort();
-}
+do_wake = thr->wait_connect;
+thr->wait_connect = false;
+
+do_free = !thr->bs; /* detached */
 
 qemu_mutex_unlock(&thr->mutex);
 
@@ -416,25 +397,21 @@ nbd_co_establish_connection(BlockDriverState *bs)
 BDRVNBDState *s = bs->opaque;
 NBDConnectCB *thr = s->connect_thread;
 
+assert(!s->sioc);
+
 qemu_mutex_lock(&thr->mutex);
 
-switch (thr->state) {
-case CONNECT_THREAD_FAIL:
-case CONNECT_THREAD_NONE:
-thr->state = CONNECT_THREAD_RUNNING;
-nbd_connect_thread_start(s->saddr, connect_thread_cb, thr);
-break;
-case CONNECT_THREAD_SUCCESS:
+if (thr->sioc) {
 /* Previous attempt finally succeeded in background */
-thr->state = CONNECT_THREAD_NONE;
+assert(!thr->running);
 s->sioc = thr->sioc;
 thr->sioc = NULL;
 goto out;
-case CONNECT_THREAD_RUNNING:
-/* Already running, will wait */
-break;
-default:
-abort();
+}
+
+if (!thr->running) {
+thr->running = true;
+nbd_connect_thread_start(s->saddr, connect_thread_cb, thr);
 }
 
 thr->wait_connect = true;
@@ -449,32 +426,8 @@ nbd_co_establish_connection(BlockDriverState *bs)
 
 qemu_mutex_lock(&thr->mutex);
 
-switch (thr->state) {
-case CONNECT_THREAD_SUCCESS:
-case CONNECT_THREAD_FAIL:
-thr->state = CONNECT_THREAD_NONE;
-s->sioc = thr->sioc;
-thr->sioc = NULL;
-break;
-case CONNECT_THREAD_RUNNING:
-case CONNECT_THREAD_RUNNING_DETACHED:
-/*
- * Obviously, drained section wants to start. Report the attempt as
- * failed. Still connect thread is executing in background, and its
- * result may be used for next connection attempt.
- */
-break;
-
-case CONNECT_THREAD_NONE:
-/*
- * Impossible. We've seen this thread running. So it should be
- * running or at least give some results.
- */
-abort();
-
-default:
-abort();
-}
+s->sioc = thr->sioc;
+thr->

Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures

2021-04-07 Thread Vladimir Sementsov-Ogievskiy

07.04.2021 13:58, Kevin Wolf wrote:

On 29.03.2021 at 17:43, Max Reitz wrote:

On 29.03.21 17:05, Stefan Hajnoczi wrote:

Hi Kevin,
Peter hit an s390 qemu-iotests failure. I was able to reproduce it
easily. I haven't checked if it reproduces on other platforms too, but it
seems likely. Here is what I found.

qsd-jobs has race conditions:

# Just make sure that this doesn't crash
$QSD --chardev stdio,id=stdio --monitor chardev=stdio \
--blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
--blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF
[...]

It's more complicated than simply lines moving up/down in the expected
output: the job state transitions are slightly different (running ->
ready vs running -> paused).

Should the test case filter out all JOB_* and BLOCK_JOB_* events?

Is there a way to make this test deterministic?


Would writing data to the overlay and setting a very restrictive speed help,
so it doesn’t immediately go into the 'ready' state?  (See attachment.)


I haven't tested the patch yet, but this is the approach we normally use
in tests, so I agree. Do you want to send it as a proper patch?



Max already sent two variants:

[PATCH] iotests/qsd-jobs: Filter events in the first test

and

[PATCH 0/2] iotests/qsd-jobs: Use common.qemu for the QSD

--
Best regards,
Vladimir



Re: [PATCH 01/14] block/nbd: BDRVNBDState: drop unused connect_err

2021-04-07 Thread Roman Kagan
On Wed, Apr 07, 2021 at 01:46:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> The field is actually unused. Let's make things a bit simpler dropping
> it and corresponding logic.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index c26dc5a54f..a47d6cfea3 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -122,7 +122,6 @@ typedef struct BDRVNBDState {
>  int in_flight;
>  NBDClientState state;
>  int connect_status;

This field is write-only too.  Do you have any plans on using it later?
Otherwise you may want to nuke it in this patch too.

Roman.



Re: [PATCH v4 09/23] job: call job_enter from job_pause

2021-04-07 Thread Max Reitz

On 16.01.21 22:46, Vladimir Sementsov-Ogievskiy wrote:

If main job coroutine called job_yield (while some background process
is in progress), we should give it a chance to call job_pause_point().
It will be used in backup, when moved on async block-copy.

Note, that job_user_pause is not enough: we want to handle
child_job_drained_begin() as well, which call job_pause().

Still, if job is already in job_do_yield() in job_pause_point() we
should not enter it.

iotest 109 output is modified: on stop we do bdrv_drain_all() which now
triggers job pause immediately (and pause after ready is standby).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  job.c  |  3 +++
  tests/qemu-iotests/109.out | 24 
  2 files changed, 27 insertions(+)


While looking into

https://lists.gnu.org/archive/html/qemu-block/2021-04/msg00035.html

I noticed this:

$ ./qemu-img create -f raw src.img 1G
$ ./qemu-img create -f raw dst.img 1G

$ (echo '
{"execute":"qmp_capabilities"}
{"execute":"blockdev-mirror",
 "arguments":{"job-id":"mirror",
  "device":"source",
  "target":"target",
  "sync":"full",
  "filter-node-name":"mirror-top"}}
'; sleep 3; echo '
{"execute":"human-monitor-command",
 "arguments":{"command-line":
  "qemu-io mirror-top \"write 0 1G\""}}') \
| x86_64-softmmu/qemu-system-x86_64 \
-qmp stdio \
-blockdev file,node-name=source,filename=src.img \
-blockdev file,node-name=target,filename=dst.img \
-object iothread,id=iothr0 \
-device virtio-blk,drive=source,iothread=iothr0

Before this commit, qemu-io returned an error that there is a permission 
conflict with virtio-blk.  After this commit, there is an abort (“qemu: 
qemu_mutex_unlock_impl: Operation not permitted”):


#0  0x7f8445a4eef5 in raise () at /usr/lib/libc.so.6
#1  0x7f8445a38862 in abort () at /usr/lib/libc.so.6
#2  0x55fbb14a36bf in error_exit
(err=<optimized out>, msg=msg@entry=0x55fbb1634790 <__func__.27> 
"qemu_mutex_unlock_impl")

at ../util/qemu-thread-posix.c:37
#3  0x55fbb14a3bc3 in qemu_mutex_unlock_impl
(mutex=mutex@entry=0x55fbb25ab6e0, file=file@entry=0x55fbb1636957 
"../util/async.c", line=line@entry=650)

at ../util/qemu-thread-posix.c:109
#4  0x55fbb14b2e75 in aio_context_release 
(ctx=ctx@entry=0x55fbb25ab680) at ../util/async.c:650

#5  0x55fbb13d2029 in bdrv_do_drained_begin
(bs=bs@entry=0x55fbb3a87000, recursive=recursive@entry=false, 
parent=parent@entry=0x0, 
ignore_bds_parents=ignore_bds_parents@entry=false, poll=poll@entry=true) 
at ../block/io.c:441

#6  0x55fbb13d2192 in bdrv_do_drained_begin
(poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, 
bs=0x55fbb3a87000) at ../block/io.c:448
#7  0x55fbb13c71a7 in blk_drain (blk=0x55fbb26c5a00) at 
../block/block-backend.c:1718
#8  0x55fbb13c8bbd in blk_unref (blk=0x55fbb26c5a00) at 
../block/block-backend.c:498

#9  blk_unref (blk=0x55fbb26c5a00) at ../block/block-backend.c:491
#10 0x55fbb1024863 in hmp_qemu_io (mon=0x7fffaf3fc7d0, 
qdict=<optimized out>)

at ../block/monitor/block-hmp-cmds.c:628

Can you make anything out of this?

Max




Re: [PATCH 02/14] block/nbd: nbd_co_establish_connection(): drop unused errp

2021-04-07 Thread Roman Kagan
On Wed, Apr 07, 2021 at 01:46:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We are going to refactor connection logic to make it more
> understandable. Every bit that we can simplify in advance will help.
> Drop errp for now, it's unused anyway. We'll probably reimplement it in
> future.

Although I agree that this passing errors around is a bit of an
overkill, my problem with NBD client is that it's notoriously silent
about problems it experiences, and those errors never pop up in the logs.

Given that these errors are not guest-triggerable, and probably indicate
serious problems at the infrastructure level, instead of endlessly
passing them around (as in the code ATM) or dropping them on the floor
(as you propose in the patch) I'd much rather log them immediately when
they are encountered.

I have a patch to that end, I'll try to port it on top of your series.
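
Something along these lines (just a sketch of the idea, not that patch;
"local_err" stands for whatever error object the connect attempt filled in):

if (local_err) {
    /* report the error right where it happened; this also frees it */
    error_report_err(local_err);
    local_err = NULL;
}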

Thanks,
Roman.



Re: [for-6.0] Non-deterministic qemu-iotests qsd-jobs failures

2021-04-07 Thread Kevin Wolf
Am 07.04.2021 um 13:03 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.04.2021 13:58, Kevin Wolf wrote:
> > Am 29.03.2021 um 17:43 hat Max Reitz geschrieben:
> > > On 29.03.21 17:05, Stefan Hajnoczi wrote:
> > > > Hi Kevin,
> > > > Peter hit an s390 qemu-iotests failure. I was able to reproduce it
> > > > easily. I haven't checked if it reproduce on other platforms too, but it
> > > > seems likely. Here is what I found.
> > > > 
> > > > qsd-jobs has race conditions:
> > > > 
> > > > # Just make sure that this doesn't crash
> > > > $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
> > > > --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
> > > > --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF | _filter_qmp
> > > > {"execute":"qmp_capabilities"}
> > > > {"execute": "block-commit", "arguments": {"device": "fmt0", 
> > > > "job-id": "job0"}}
> > > > {"execute": "quit"}
> > > > EOF
> > > > 
> > > > The intent is for "quit" to run while job0 still exists. This proves
> > > > that the program does not abort with an assertion failure when "quit" is
> > > > called while there are block jobs. bdrv_close_all() expects there to be
> > > > no jobs.
> > > > 
> > > > Here is the failure that reproduces 1 out of 3 times:
> > > > 
> > > > # build/tests/qemu-iotests/check -qcow2 qsd-jobs
> > > > QEMU  -- 
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64" 
> > > > -nodefaults -display none -accel qtest
> > > > QEMU_IMG  -- 
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-img"
> > > > QEMU_IO   -- 
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-io" --cache writeback 
> > > > --aio threads -f qcow2
> > > > QEMU_NBD  -- 
> > > > "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd"
> > > > IMGFMT-- qcow2
> > > > IMGPROTO  -- file
> > > > PLATFORM  -- Linux/s390x
> > > > TEST_DIR  -- /root/qemu/scratch
> > > > SOCK_DIR  -- /tmp/tmpapj5dydm
> > > > SOCKET_SCM_HELPER -- 
> > > > /root/qemu/build/tests/qemu-iotests/socket_scm_helper
> > > > 
> > > > qsd-jobs   fail   [10:42:23] [10:42:23]   0.1s   (last: 0.1s)  
> > > > output mismatch (see qsd-jobs.out.bad)
> > > > --- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out
> > > > +++ qsd-jobs.out.bad
> > > > @@ -9,11 +9,11 @@
> > > >  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": 
> > > > "job0"}}
> > > >  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": 
> > > > "job0"}}
> > > >  {"return": {}}
> > > > +{"return": {}}
> > > > +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": 
> > > > "job0"}}
> > > > +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": 
> > > > "job0"}}
> > > >  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > > >  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, 
> > > > "offset": 0, "speed": 0, "type": "commit"}}
> > > > -{"return": {}}
> > > > -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": 
> > > > "job0"}}
> > > > -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
> > > >  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": 
> > > > "job0"}}
> > > >  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": 
> > > > "job0"}}
> > > >  {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
> > > > "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, 
> > > > "offset": 0, "speed": 0, "type": "commit"}}
> > > > Failures: qsd-jobs
> > > > Failed 1 of 1 iotests
> > > > 
> > > > The timing of the QMP events seems to be non-deterministic relative to
> > > > the QMP command completions.
> > > > 
> > > > It's more complicated than simply lines moving up/down in the expected
> > > > output: the job state transitions are slightly different (running ->
> > > > ready vs running -> paused).
> > > > 
> > > > Should the test case filter out all JOB_* and BLOCK_JOB_* events?
> > > > 
> > > > Is there a way to make this test deterministic?
> > > 
> > > Would writi

Re: [PATCH v4 09/23] job: call job_enter from job_pause

2021-04-07 Thread Vladimir Sementsov-Ogievskiy

07.04.2021 14:19, Max Reitz wrote:

On 16.01.21 22:46, Vladimir Sementsov-Ogievskiy wrote:

If main job coroutine called job_yield (while some background process
is in progress), we should give it a chance to call job_pause_point().
It will be used in backup, when moved on async block-copy.

Note, that job_user_pause is not enough: we want to handle
child_job_drained_begin() as well, which call job_pause().

Still, if job is already in job_do_yield() in job_pause_point() we
should not enter it.

iotest 109 output is modified: on stop we do bdrv_drain_all() which now
triggers job pause immediately (and pause after ready is standby).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  job.c  |  3 +++
  tests/qemu-iotests/109.out | 24 
  2 files changed, 27 insertions(+)


While looking into

https://lists.gnu.org/archive/html/qemu-block/2021-04/msg00035.html

I noticed this:

$ ./qemu-img create -f raw src.img 1G
$ ./qemu-img create -f raw dst.img 1G

$ (echo '
    {"execute":"qmp_capabilities"}
    {"execute":"blockdev-mirror",
     "arguments":{"job-id":"mirror",
  "device":"source",
  "target":"target",
  "sync":"full",
  "filter-node-name":"mirror-top"}}
'; sleep 3; echo '
    {"execute":"human-monitor-command",
     "arguments":{"command-line":
  "qemu-io mirror-top \"write 0 1G\""}}') \
| x86_64-softmmu/qemu-system-x86_64 \
    -qmp stdio \
    -blockdev file,node-name=source,filename=src.img \
    -blockdev file,node-name=target,filename=dst.img \
    -object iothread,id=iothr0 \
    -device virtio-blk,drive=source,iothread=iothr0

Before this commit, qemu-io returned an error that there is a permission 
conflict with virtio-blk.  After this commit, there is an abort (“qemu: 
qemu_mutex_unlock_impl: Operation not permitted”):

#0  0x7f8445a4eef5 in raise () at /usr/lib/libc.so.6
#1  0x7f8445a38862 in abort () at /usr/lib/libc.so.6
#2  0x55fbb14a36bf in error_exit
    (err=<optimized out>, msg=msg@entry=0x55fbb1634790 <__func__.27> 
"qemu_mutex_unlock_impl")
    at ../util/qemu-thread-posix.c:37
#3  0x55fbb14a3bc3 in qemu_mutex_unlock_impl
    (mutex=mutex@entry=0x55fbb25ab6e0, file=file@entry=0x55fbb1636957 
"../util/async.c", line=line@entry=650)
    at ../util/qemu-thread-posix.c:109
#4  0x55fbb14b2e75 in aio_context_release (ctx=ctx@entry=0x55fbb25ab680) at 
../util/async.c:650
#5  0x55fbb13d2029 in bdrv_do_drained_begin
    (bs=bs@entry=0x55fbb3a87000, recursive=recursive@entry=false, 
parent=parent@entry=0x0, ignore_bds_parents=ignore_bds_parents@entry=false, 
poll=poll@entry=true) at ../block/io.c:441
#6  0x55fbb13d2192 in bdrv_do_drained_begin
    (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, 
bs=0x55fbb3a87000) at ../block/io.c:448
#7  0x55fbb13c71a7 in blk_drain (blk=0x55fbb26c5a00) at 
../block/block-backend.c:1718
#8  0x55fbb13c8bbd in blk_unref (blk=0x55fbb26c5a00) at 
../block/block-backend.c:498
#9  blk_unref (blk=0x55fbb26c5a00) at ../block/block-backend.c:491
#10 0x55fbb1024863 in hmp_qemu_io (mon=0x7fffaf3fc7d0, qdict=<optimized out>)
    at ../block/monitor/block-hmp-cmds.c:628

Can you make anything out of this?



Hmm.. Interesting.

man pthread_mutex_unlock
...
 EPERM  The  mutex type is PTHREAD_MUTEX_ERRORCHECK or 
PTHREAD_MUTEX_RECURSIVE, or the mutex is a
  robust mutex, and the current thread does not own the mutex.

So, the thread doesn't own the mutex. We have an iothread here.

AIO_WAIT_WHILE() documents that ctx must be acquired exactly once by the
caller. But I don't see where it is acquired in the call stack.
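
(For reference, the pattern that documentation describes is roughly the
following sketch, where the condition is made up and only the locking
matters:

AioContext *ctx = bdrv_get_aio_context(bs);

aio_context_acquire(ctx);
AIO_WAIT_WHILE(ctx, in_flight > 0); /* hypothetical condition */
aio_context_release(ctx);

so somewhere up the stack the context should have been acquired exactly
once.)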

The other question is why the permission conflict is lost with this commit.
Strange. I see that hmp_qemu_io creates the blk with perm=0 and
shared=BLK_PERM_ALL. How could it conflict even before the commit in
question?

--
Best regards,
Vladimir



Re: [PATCH 03/14] block/nbd: drop unused NBDConnectThread::err field

2021-04-07 Thread Roman Kagan
On Wed, Apr 07, 2021 at 01:46:26PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> The field is used only to free it. Let's drop it for now for
> simplicity.

Well, it's *now* (after your patch 2) only used to free it.  This makes
the reconnect process even further concealed from the user: the client
may be struggling to re-establish the connection but you'll never know
when looking at the logs.

As I wrote in my comment to patch 2 I believe these errors need to be
logged.  Whether to do it directly in the reconnection thread or punt it
to the spawner of it is another matter; I'd personally prefer to do
logging in the original bdrv context as it allows to (easier) tag the
log messages with the indication of which connection is suffering.

Thanks,
Roman.



Re: [PATCH 03/14] block/nbd: drop unused NBDConnectThread::err field

2021-04-07 Thread Vladimir Sementsov-Ogievskiy

07.04.2021 14:42, Roman Kagan wrote:

On Wed, Apr 07, 2021 at 01:46:26PM +0300, Vladimir Sementsov-Ogievskiy wrote:

The field is used only to free it. Let's drop it for now for
simplicity.


Well, it's *now* (after your patch 2) only used to free it.  This makes
the reconnect process even further concealed from the user: the client
may be struggling to re-establish the connection but you'll never know
when looking at the logs.

As I wrote in my comment to patch 2 I believe these errors need to be
logged.  Whether to do it directly in the reconnection thread or punt it
to the spawner of it is another matter; I'd personally prefer to do
logging in the original bdrv context as it allows to (easier) tag the
log messages with the indication of which connection is suffering.



I see your point. It may be more reasonable to rebase my series on your patch 
and just save the err. Could you send your patch based on master?


--
Best regards,
Vladimir



Re: [PATCH 0/2] block/rbd: fix memory leaks

2021-04-07 Thread Kevin Wolf
Am 29.03.2021 um 17:01 hat Stefano Garzarella geschrieben:
> This series fixes two memory leaks, found through valgrind, in the
> rbd driver.

Thanks, applied to the block branch.

Kevin




[PATCH-for-6.0?] hw/block/fdc: Fix 'fallback' property on sysbus floppy disk controllers

2021-04-07 Thread Philippe Mathieu-Daudé
Setting the 'fallback' property corrupts the QOM instance state
(FDCtrlSysBus) because it accesses an incorrect offset (it uses
the offset of the FDCtrlISABus state).

Fixes: a73275dd6fc ("fdc: Add fallback option")
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 82afda7f3a7..a825c2acbae 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -2893,7 +2893,7 @@ static Property sysbus_fdc_properties[] = {
 DEFINE_PROP_SIGNED("fdtypeB", FDCtrlSysBus, state.qdev_for_drives[1].type,
 FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type,
 FloppyDriveType),
-DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback,
+DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
 FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
 FloppyDriveType),
 DEFINE_PROP_END_OF_LIST(),
@@ -2918,7 +2918,7 @@ static Property sun4m_fdc_properties[] = {
 DEFINE_PROP_SIGNED("fdtype", FDCtrlSysBus, state.qdev_for_drives[0].type,
 FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type,
 FloppyDriveType),
-DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback,
+DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
 FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
 FloppyDriveType),
 DEFINE_PROP_END_OF_LIST(),
-- 
2.26.3




Re: [PATCH 1/6] qdev: introduce qapi/hmp command for kick/call event

2021-04-07 Thread Eduardo Habkost
On Thu, Mar 25, 2021 at 10:44:28PM -0700, Dongli Zhang wrote:
> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
> the loss of doorbell kick, e.g.,
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
> 
> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
> 
> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
> to help narrow down if the issue is due to loss of irq/kick. So far the new
> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
> or legacy IRQ).
> 
> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
> on purpose by admin at QEMU/host side for a specific device.
> 
> Signed-off-by: Dongli Zhang 
[...]
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index 605d57287a..c7795d4ba5 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -129,5 +129,6 @@ void hmp_info_replay(Monitor *mon, const QDict *qdict);
>  void hmp_replay_break(Monitor *mon, const QDict *qdict);
>  void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
>  void hmp_replay_seek(Monitor *mon, const QDict *qdict);
> +void hmp_x_debug_device_event(Monitor *mon, const QDict *qdict);
>  
>  #endif
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index b83178220b..711c4a297a 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -124,3 +124,33 @@
>  ##
>  { 'event': 'DEVICE_DELETED',
>'data': { '*device': 'str', 'path': 'str' } }
> +
> +##
> +# @x-debug-device-event:
> +#
> +# Generate device event for a specific device queue
> +#
> +# @dev: device path
> +#
> +# @event: event (e.g., kick or call) to trigger

Any specific reason to not use an enum here?

In addition to making the QAPI schema and documentation more
descriptive, it would save you the work of manually defining the
DEVICE_EVENT_* constants and implementing get_device_event().
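
For example, something like this (a sketch; the enum name is just a
suggestion):

{ 'enum': 'DeviceEventType',
  'data': [ 'kick', 'call' ] }

{ 'command': 'x-debug-device-event',
  'data': {'dev': 'str', 'event': 'DeviceEventType', 'queue': 'int'} }

would let the QAPI generator produce the constants and the string mapping
for you.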


> +#
> +# @queue: queue id
> +#
> +# Returns: Nothing on success
> +#
> +# Since: 6.1
> +#
> +# Notes: This is used to debug VM driver hang issue. The 'kick' event is to
> +#send notification to QEMU/vhost while the 'call' event is to
> +#interrupt VM on purpose.
> +#
> +# Example:
> +#
> +# -> { "execute": "x-debug-device_event",
> +#  "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick",
> +# "queue": 1 } }
> +# <- { "return": {} }
> +#
> +##
> +{ 'command': 'x-debug-device-event',
> +  'data': {'dev': 'str', 'event': 'str', 'queue': 'int'} }
[...]

-- 
Eduardo




Re: [PATCH] iotests/qsd-jobs: Filter events in the first test

2021-04-07 Thread Kevin Wolf
Am 01.04.2021 um 15:28 hat Max Reitz geschrieben:
> The job may or may not be ready before the 'quit' is issued.  Whether it
> is is irrelevant; for the purpose of the test, it only needs to still be
> there.  Filter the job status change and READY events from the output so
> it becomes reliable.
> 
> Reported-by: Peter Maydell 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 

Thanks, applied to the block branch.

Kevin




[RFC PATCH v2 00/11] qemu_iotests: improve debugging options

2021-04-07 Thread Emanuele Giuseppe Esposito
This series adds the option to attach gdbserver and valgrind
to the QEMU binary running in qemu_iotests.
It also allows redirecting the QEMU binary's output in the python tests
to stdout, instead of a log file.

Patches 1-6 introduce the -gdb option to both python and bash tests, 
7-10 extend the already existing -valgrind flag to work also on 
python tests, and patch 11 introduces -p to enable logging to stdout.

In particular, patches 1,2,4,8 focus on extending the QMP socket timers
when using gdb/valgrind; otherwise the python tests will fail due to
delays in the QMP responses.
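
Typical invocations would look something like this (test number and port
are only examples):

./check -qcow2 -gdb 040                      # gdbserver on localhost:12345
GDB_QEMU=localhost:9999 ./check -qcow2 -gdb 040
./check -qcow2 -valgrind 040
./check -qcow2 -p 040                        # QEMU output goes to stdout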

This series is tested on top of the previous series
"qemu-iotests: quality of life improvements"
but is independent of it, so it can be applied separately.

Signed-off-by: Emanuele Giuseppe Esposito 
---
v2:
- add valgrind and print patches
- better splitup of patches, and clearer commit messages

Emanuele Giuseppe Esposito (11):
  python: qemu: add timer parameter for qmp.accept socket
  python: qemu: pass the wrapper field from QEMUQtestmachine to
QEMUMachine
  qemu-iotests: add option to attach gdbserver
  qemu-iotests: delay QMP socket timers
  qemu_iotests: insert gdbserver command line as wrapper for qemu binary
  qemu-iotests: add gdbserver option to script tests too
  qemu_iotests: extend the check script to support valgrind for python
tests
  qemu_iotests: extend QMP socket timeout when using valgrind
  qemu_iotests: allow valgrind to print/delete the generated log file
  qemu_iotests: insert valgrind command line as wrapper for qemu binary
  qemu_iotests: add option to show qemu binary logs on stdout

 python/qemu/machine.py| 12 --
 python/qemu/qtest.py  |  8 ++--
 tests/qemu-iotests/check  |  7 +++-
 tests/qemu-iotests/common.rc  |  8 +++-
 tests/qemu-iotests/iotests.py | 69 ---
 tests/qemu-iotests/testenv.py | 24 ++--
 6 files changed, 111 insertions(+), 17 deletions(-)

-- 
2.30.2




[RFC PATCH v2 01/11] python: qemu: add timer parameter for qmp.accept socket

2021-04-07 Thread Emanuele Giuseppe Esposito
Extend the _post_launch function to take the timer as a
parameter instead of defaulting to 15 seconds.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 python/qemu/machine.py | 4 ++--
 python/qemu/qtest.py   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 6e44bda337..c721e07d63 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -321,9 +321,9 @@ def _pre_launch(self) -> None:
 nickname=self._name
 )
 
-def _post_launch(self) -> None:
+def _post_launch(self, timer) -> None:
 if self._qmp_connection:
-self._qmp.accept()
+self._qmp.accept(timer)
 
 def _post_shutdown(self) -> None:
 """
diff --git a/python/qemu/qtest.py b/python/qemu/qtest.py
index 39a0cf62fe..0d01715086 100644
--- a/python/qemu/qtest.py
+++ b/python/qemu/qtest.py
@@ -138,9 +138,9 @@ def _pre_launch(self) -> None:
 super()._pre_launch()
 self._qtest = QEMUQtestProtocol(self._qtest_path, server=True)
 
-def _post_launch(self) -> None:
+def _post_launch(self, timer) -> None:
 assert self._qtest is not None
-super()._post_launch()
+super()._post_launch(timer)
 self._qtest.accept()
 
 def _post_shutdown(self) -> None:
-- 
2.30.2




[RFC PATCH v2 02/11] python: qemu: pass the wrapper field from QEMUQtestmachine to QEMUMachine

2021-04-07 Thread Emanuele Giuseppe Esposito
Signed-off-by: Emanuele Giuseppe Esposito 
---
 python/qemu/machine.py | 2 +-
 python/qemu/qtest.py   | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index c721e07d63..18d32ebe45 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -109,7 +109,7 @@ def __init__(self,
 
 self._binary = binary
 self._args = list(args)
-self._wrapper = wrapper
+self._wrapper = list(wrapper)
 
 self._name = name or "qemu-%d" % os.getpid()
 self._test_dir = test_dir
diff --git a/python/qemu/qtest.py b/python/qemu/qtest.py
index 0d01715086..4c90daf430 100644
--- a/python/qemu/qtest.py
+++ b/python/qemu/qtest.py
@@ -111,6 +111,7 @@ class QEMUQtestMachine(QEMUMachine):
 def __init__(self,
  binary: str,
  args: Sequence[str] = (),
+ wrapper: Sequence[str] = (),
  name: Optional[str] = None,
  test_dir: str = "/var/tmp",
  socket_scm_helper: Optional[str] = None,
@@ -119,7 +120,8 @@ def __init__(self,
 name = "qemu-%d" % os.getpid()
 if sock_dir is None:
 sock_dir = test_dir
-super().__init__(binary, args, name=name, test_dir=test_dir,
+super().__init__(binary, args, wrapper=wrapper, name=name,
+ test_dir=test_dir,
  socket_scm_helper=socket_scm_helper,
  sock_dir=sock_dir)
 self._qtest: Optional[QEMUQtestProtocol] = None
-- 
2.30.2




[RFC PATCH v2 06/11] qemu-iotests: add gdbserver option to script tests too

2021-04-07 Thread Emanuele Giuseppe Esposito
The only limitation here is that running a script with gdbserver
will make the test output differ from the expected
results, making the test fail.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/qemu-iotests/common.rc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 65cdba5723..53a3310fee 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -166,8 +166,14 @@ _qemu_wrapper()
 if [ -n "${QEMU_NEED_PID}" ]; then
 echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid"
 fi
+
+GDB="${QEMU_PROG}"
+if [ ! -z ${GDB_QEMU} ]; then
+GDB="gdbserver ${GDB_QEMU} ${GDB}"
+fi
+
 VALGRIND_QEMU="${VALGRIND_QEMU_VM}" _qemu_proc_exec 
"${VALGRIND_LOGFILE}" \
-"$QEMU_PROG" $QEMU_OPTIONS "$@"
+   $GDB $QEMU_OPTIONS "$@"
 )
 RETVAL=$?
 _qemu_proc_valgrind_log "${VALGRIND_LOGFILE}" $RETVAL
-- 
2.30.2




[RFC PATCH v2 08/11] qemu_iotests: extend QMP socket timeout when using valgrind

2021-04-07 Thread Emanuele Giuseppe Esposito
As with gdbserver, valgrind delays the test execution, so
the default QMP socket timeout expires too soon.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 python/qemu/machine.py| 4 +++-
 tests/qemu-iotests/iotests.py | 8 
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 284b73385f..4b6eb39856 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -409,7 +409,9 @@ def _launch(self) -> None:
shell=False,
close_fds=False)
 
-timer = None if 'gdbserver' in self._wrapper else 15.0
+delay_timer = 'gdbserver' in self._wrapper
+delay_timer |= 'valgrind' in self._wrapper
+timer = None if delay_timer else 15.0
 self._post_launch(timer)
 
 def _early_cleanup(self) -> None:
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 7c28f0cb74..56733954b2 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -699,7 +699,7 @@ def qmp_to_opts(self, obj):
 def get_qmp_events_filtered(self, wait=60.0):
 result = []
 qmp_wait = wait
-if qemu_gdb:
+if qemu_gdb or qemu_valgrind:
 qmp_wait = 0.0
 for ev in self.get_qmp_events(wait=qmp_wait):
 result.append(filter_qmp_event(ev))
@@ -1003,7 +1003,7 @@ def cancel_and_wait(self, drive='drive0', force=False,
 self.assert_qmp(result, 'return', {})
 
 qmp_wait = wait
-if qemu_gdb:
+if qemu_gdb or qemu_valgrind:
 qmp_wait = 0.0
 
 if resume:
@@ -1029,7 +1029,7 @@ def wait_until_completed(self, drive='drive0', 
check_offset=True,
  wait=60.0, error=None):
 '''Wait for a block job to finish, returning the event'''
 qmp_wait = wait
-if qemu_gdb:
+if qemu_gdb or qemu_valgrind:
 qmp_wait = 0.0
 while True:
 for event in self.vm.get_qmp_events(wait=qmp_wait):
@@ -1077,7 +1077,7 @@ def complete_and_wait(self, drive='drive0', 
wait_ready=True,
 
 def pause_wait(self, job_id='job0'):
 def_timeout = 3
-if qemu_gdb:
+if qemu_gdb or qemu_valgrind:
 def_timeout = 3000
 with Timeout(def_timeout, "Timeout waiting for job to pause"):
 while True:
-- 
2.30.2




[RFC PATCH v2 05/11] qemu_iotests: insert gdbserver command line as wrapper for qemu binary

2021-04-07 Thread Emanuele Giuseppe Esposito
Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/qemu-iotests/iotests.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 17f07710db..8f6bb20af5 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -579,7 +579,8 @@ class VM(qtest.QEMUQtestMachine):
 
 def __init__(self, path_suffix=''):
 name = "qemu%s-%d" % (path_suffix, os.getpid())
-super().__init__(qemu_prog, qemu_opts, name=name,
+super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb,
+ name=name,
  test_dir=test_dir,
  socket_scm_helper=socket_scm_helper,
  sock_dir=sock_dir)
-- 
2.30.2




[RFC PATCH v2 03/11] qemu-iotests: add option to attach gdbserver

2021-04-07 Thread Emanuele Giuseppe Esposito
Add a -gdb flag and a GDB_QEMU environment variable
to the python tests to attach a gdbserver to each qemu instance.

If -gdb is not provided but $GDB_QEMU is set, the
environment variable is ignored.
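
For example (port and test number are just placeholders):

GDB_QEMU=localhost:12345 ./check -qcow2 -gdb 040

and then, from another terminal, attach with something like:

gdb /path/to/qemu-system-x86_64 -ex 'target remote localhost:12345'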

Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/qemu-iotests/check  |  6 +-
 tests/qemu-iotests/iotests.py |  4 
 tests/qemu-iotests/testenv.py | 18 +++---
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index d1c87ceaf1..6186495eee 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -33,6 +33,9 @@ def make_argparser() -> argparse.ArgumentParser:
help='pretty print output for make check')
 
 p.add_argument('-d', dest='debug', action='store_true', help='debug')
+p.add_argument('-gdb', action='store_true',
+   help="start gdbserver with $GDB_QEMU options. \
+ Default is localhost:12345")
 p.add_argument('-misalign', action='store_true',
help='misalign memory allocations')
 p.add_argument('--color', choices=['on', 'off', 'auto'],
@@ -112,7 +115,8 @@ if __name__ == '__main__':
 env = TestEnv(imgfmt=args.imgfmt, imgproto=args.imgproto,
   aiomode=args.aiomode, cachemode=args.cachemode,
   imgopts=args.imgopts, misalign=args.misalign,
-  debug=args.debug, valgrind=args.valgrind)
+  debug=args.debug, valgrind=args.valgrind,
+  gdb=args.gdb)
 
 testfinder = TestFinder(test_dir=env.source_iotests)
 
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 90d0b62523..05d0dc0751 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -75,6 +75,10 @@
 qemu_prog = os.environ.get('QEMU_PROG', 'qemu')
 qemu_opts = os.environ.get('QEMU_OPTIONS', '').strip().split(' ')
 
+qemu_gdb = []
+if os.environ.get('GDB_QEMU'):
+qemu_gdb = ['gdbserver'] + os.environ.get('GDB_QEMU').strip().split(' ')
+
 imgfmt = os.environ.get('IMGFMT', 'raw')
 imgproto = os.environ.get('IMGPROTO', 'file')
 output_dir = os.environ.get('OUTPUT_DIR', '.')
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 1fbec854c1..669eb6b925 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -72,7 +72,8 @@ class TestEnv(ContextManager['TestEnv']):
  'QEMU_NBD_OPTIONS', 'IMGOPTS', 'IMGFMT', 'IMGPROTO',
  'AIOMODE', 'CACHEMODE', 'VALGRIND_QEMU',
  'CACHEMODE_IS_DEFAULT', 'IMGFMT_GENERIC', 'IMGOPTSSYNTAX',
- 'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_']
+ 'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_',
+ 'GDB_QEMU']
 
 def get_env(self) -> Dict[str, str]:
 env = {}
@@ -163,7 +164,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
  imgopts: Optional[str] = None,
  misalign: bool = False,
  debug: bool = False,
- valgrind: bool = False) -> None:
+ valgrind: bool = False,
+ gdb: bool = False) -> None:
 self.imgfmt = imgfmt
 self.imgproto = imgproto
 self.aiomode = aiomode
@@ -171,6 +173,14 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: 
str,
 self.misalign = misalign
 self.debug = debug
 
+self.gdb_qemu = os.getenv('GDB_QEMU')
+
+if gdb and not self.gdb_qemu:
+self.gdb_qemu = 'localhost:12345'
+elif self.gdb_qemu and not gdb:
+del self.gdb_qemu
+del os.environ['GDB_QEMU']
+
 if valgrind:
 self.valgrind_qemu = 'y'
 
@@ -268,7 +278,9 @@ def print_env(self) -> None:
 PLATFORM  -- {platform}
 TEST_DIR  -- {TEST_DIR}
 SOCK_DIR  -- {SOCK_DIR}
-SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER}"""
+SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER}
+GDB_QEMU  -- "{GDB_QEMU}"
+"""
 
 args = collections.defaultdict(str, self.get_env())
 
-- 
2.30.2




[RFC PATCH v2 09/11] qemu_iotests: allow valgrind to print/delete the generated log file

2021-04-07 Thread Emanuele Giuseppe Esposito
When using valgrind on the test scripts, it generates a log file
in $TEST_DIR that is either printed (if valgrind finds problems) or
otherwise deleted. Provide the exact same behavior when using
-valgrind on the python tests.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/qemu-iotests/iotests.py | 20 
 1 file changed, 20 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 56733954b2..b6166b6f7b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -597,6 +597,26 @@ def __init__(self, path_suffix=''):
  sock_dir=sock_dir)
 self._num_drives = 0
 
+def subprocess_check_valgrind(self, valgrind) -> None:
+
+if not valgrind:
+return
+
+valgrind_filename =  test_dir + "/" + str(self._popen.pid) + 
".valgrind"
+
+if self.exitcode() == 99:
+with open(valgrind_filename) as f:
+content = f.readlines()
+for line in content:
+print(line, end ="")
+print("")
+else:
+os.remove(valgrind_filename)
+
+def _post_shutdown(self) -> None:
+super()._post_shutdown()
+self.subprocess_check_valgrind(qemu_valgrind)
+
 def add_object(self, opts):
 self._args.append('-object')
 self._args.append(opts)
-- 
2.30.2




[RFC PATCH v2 04/11] qemu-iotests: delay QMP socket timers

2021-04-07 Thread Emanuele Giuseppe Esposito
Attaching a gdbserver implies that the qmp socket
should wait indefinitely for an answer from QEMU.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 python/qemu/machine.py|  4 +++-
 tests/qemu-iotests/iotests.py | 21 +
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 18d32ebe45..284b73385f 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -408,7 +408,9 @@ def _launch(self) -> None:
stderr=subprocess.STDOUT,
shell=False,
close_fds=False)
-self._post_launch()
+
+timer = None if 'gdbserver' in self._wrapper else 15.0
+self._post_launch(timer)
 
 def _early_cleanup(self) -> None:
 """
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 05d0dc0751..17f07710db 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -686,7 +686,10 @@ def qmp_to_opts(self, obj):
 
 def get_qmp_events_filtered(self, wait=60.0):
 result = []
-for ev in self.get_qmp_events(wait=wait):
+qmp_wait = wait
+if qemu_gdb:
+qmp_wait = 0.0
+for ev in self.get_qmp_events(wait=qmp_wait):
 result.append(filter_qmp_event(ev))
 return result
 
@@ -987,13 +990,17 @@ def cancel_and_wait(self, drive='drive0', force=False,
 result = self.vm.qmp('block-job-cancel', device=drive, force=force)
 self.assert_qmp(result, 'return', {})
 
+qmp_wait = wait
+if qemu_gdb:
+qmp_wait = 0.0
+
 if resume:
 self.vm.resume_drive(drive)
 
 cancelled = False
 result = None
 while not cancelled:
-for event in self.vm.get_qmp_events(wait=wait):
+for event in self.vm.get_qmp_events(wait=qmp_wait):
 if event['event'] == 'BLOCK_JOB_COMPLETED' or \
event['event'] == 'BLOCK_JOB_CANCELLED':
 self.assert_qmp(event, 'data/device', drive)
@@ -1009,8 +1016,11 @@ def cancel_and_wait(self, drive='drive0', force=False,
 def wait_until_completed(self, drive='drive0', check_offset=True,
  wait=60.0, error=None):
 '''Wait for a block job to finish, returning the event'''
+qmp_wait = wait
+if qemu_gdb:
+qmp_wait = 0.0
 while True:
-for event in self.vm.get_qmp_events(wait=wait):
+for event in self.vm.get_qmp_events(wait=qmp_wait):
 if event['event'] == 'BLOCK_JOB_COMPLETED':
 self.assert_qmp(event, 'data/device', drive)
 if error is None:
@@ -1054,7 +1064,10 @@ def complete_and_wait(self, drive='drive0', 
wait_ready=True,
 self.assertTrue(event['data']['type'] in ['mirror', 'commit'])
 
 def pause_wait(self, job_id='job0'):
-with Timeout(3, "Timeout waiting for job to pause"):
+def_timeout = 3
+if qemu_gdb:
+def_timeout = 3000
+with Timeout(def_timeout, "Timeout waiting for job to pause"):
 while True:
 result = self.vm.qmp('query-block-jobs')
 found = False
-- 
2.30.2




[RFC PATCH v2 11/11] qemu_iotests: add option to show qemu binary logs on stdout

2021-04-07 Thread Emanuele Giuseppe Esposito
Using the flag -p, allow the qemu binary to print to stdout.
This is helpful especially for print-debugging.
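
For example (test number is just a placeholder):

./check -qcow2 -p 040

shows the QEMU binary's output on stdout instead of redirecting it to the
usual log file.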

Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/qemu-iotests/check  | 3 ++-
 tests/qemu-iotests/iotests.py | 9 +
 tests/qemu-iotests/testenv.py | 9 +++--
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 6186495eee..6c469c14ff 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -33,6 +33,7 @@ def make_argparser() -> argparse.ArgumentParser:
help='pretty print output for make check')
 
 p.add_argument('-d', dest='debug', action='store_true', help='debug')
+p.add_argument('-p', dest='print', action='store_true', help='enable 
prints')
 p.add_argument('-gdb', action='store_true',
help="start gdbserver with $GDB_QEMU options. \
  Default is localhost:12345")
@@ -116,7 +117,7 @@ if __name__ == '__main__':
   aiomode=args.aiomode, cachemode=args.cachemode,
   imgopts=args.imgopts, misalign=args.misalign,
   debug=args.debug, valgrind=args.valgrind,
-  gdb=args.gdb)
+  gdb=args.gdb, qprint=args.print)
 
 testfinder = TestFinder(test_dir=env.source_iotests)
 
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b23bfdfdff..bb29074b63 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -79,6 +79,8 @@
 if os.environ.get('GDB_QEMU'):
 qemu_gdb = ['gdbserver'] + os.environ.get('GDB_QEMU').strip().split(' ')
 
+qemu_print = os.environ.get('PRINT_QEMU', False)
+
 imgfmt = os.environ.get('IMGFMT', 'raw')
 imgproto = os.environ.get('IMGPROTO', 'file')
 output_dir = os.environ.get('OUTPUT_DIR', '.')
@@ -618,6 +620,13 @@ def _post_shutdown(self) -> None:
 super()._post_shutdown()
 self.subprocess_check_valgrind(qemu_valgrind)
 
+def _pre_launch(self) -> None:
+super()._pre_launch()
+if qemu_print and self._qemu_log_file != None:
+# set QEMU binary output to stdout
+self._qemu_log_file.close()
+self._qemu_log_file = None
+
 def add_object(self, opts):
 self._args.append('-object')
 self._args.append(opts)
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 86798bf752..9192d7154b 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -73,7 +73,7 @@ class TestEnv(ContextManager['TestEnv']):
  'AIOMODE', 'CACHEMODE', 'VALGRIND_QEMU',
  'CACHEMODE_IS_DEFAULT', 'IMGFMT_GENERIC', 'IMGOPTSSYNTAX',
  'IMGKEYSECRET', 'QEMU_DEFAULT_MACHINE', 'MALLOC_PERTURB_',
- 'GDB_QEMU']
+ 'GDB_QEMU', 'PRINT_QEMU']
 
 def get_env(self) -> Dict[str, str]:
 env = {}
@@ -165,7 +165,8 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
  misalign: bool = False,
  debug: bool = False,
  valgrind: bool = False,
- gdb: bool = False) -> None:
+ gdb: bool = False,
+ qprint: bool = False) -> None:
 self.imgfmt = imgfmt
 self.imgproto = imgproto
 self.aiomode = aiomode
@@ -173,6 +174,9 @@ def __init__(self, imgfmt: str, imgproto: str, aiomode: str,
 self.misalign = misalign
 self.debug = debug
 
+if qprint:
+self.print_qemu = 'y'
+
 self.gdb_qemu = os.getenv('GDB_QEMU')
 
 if gdb and not self.gdb_qemu:
@@ -281,6 +285,7 @@ def print_env(self) -> None:
 SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER}
 GDB_QEMU  -- "{GDB_QEMU}"
 VALGRIND_QEMU -- "{VALGRIND_QEMU}"
+PRINT_QEMU--  "{PRINT_QEMU}"
 """
 
 args = collections.defaultdict(str, self.get_env())
-- 
2.30.2




[RFC PATCH v2 10/11] qemu_iotests: insert valgrind command line as wrapper for qemu binary

2021-04-07 Thread Emanuele Giuseppe Esposito
Priority is given to the gdb command line, meaning that if both the -gdb
and -valgrind parameters are given, only gdb will be wrapped around
the qemu binary.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/qemu-iotests/iotests.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b6166b6f7b..b23bfdfdff 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -590,7 +590,8 @@ class VM(qtest.QEMUQtestMachine):
 
 def __init__(self, path_suffix=''):
 name = "qemu%s-%d" % (path_suffix, os.getpid())
-super().__init__(qemu_prog, qemu_opts, wrapper=qemu_gdb,
+wrapper = qemu_gdb if qemu_gdb else qemu_valgrind
+super().__init__(qemu_prog, qemu_opts, wrapper=wrapper,
  name=name,
  test_dir=test_dir,
  socket_scm_helper=socket_scm_helper,
-- 
2.30.2




[RFC PATCH v2 07/11] qemu_iotests: extend the check script to support valgrind for python tests

2021-04-07 Thread Emanuele Giuseppe Esposito
Currently, the check script only parses the option and sets the
VALGRIND_QEMU environment variable to "y".
Add another iotests python variable that prepares the command line,
identical to the one provided in the test scripts.

Because the python script does not know in advance the valgrind
PID to assign to the log file name, use the "%p" flag in valgrind
log file name that automatically puts the process PID at runtime.
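
The resulting wrapper is roughly (paths are illustrative):

valgrind --log-file=$TEST_DIR/%p.valgrind --error-exitcode=99 \
    qemu-system-x86_64 ...

where valgrind itself expands %p to the PID of the process at runtime.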

Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/qemu-iotests/iotests.py | 11 +++
 tests/qemu-iotests/testenv.py |  1 +
 2 files changed, 12 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 8f6bb20af5..7c28f0cb74 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -96,6 +96,17 @@
 sys.stderr.write('Please run this test via the "check" script\n')
 sys.exit(os.EX_USAGE)
 
+qemu_valgrind = []
+if os.environ.get('VALGRIND_QEMU') == "y" and \
+os.environ.get('NO_VALGRIND') != "y":
+valgrind_logfile = "--log-file=" + test_dir.strip()
+# %p allows to put the valgrind process PID, since
+# we don't know it a priori (subprocess.Peopen is
+# not yet invoked)
+valgrind_logfile += "/%p.valgrind"
+
+qemu_valgrind = ['valgrind', valgrind_logfile, '--error-exitcode=99']
+
 socket_scm_helper = os.environ.get('SOCKET_SCM_HELPER', 'socket_scm_helper')
 
 luks_default_secret_object = 'secret,id=keysec0,data=' + \
diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 669eb6b925..86798bf752 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -280,6 +280,7 @@ def print_env(self) -> None:
 SOCK_DIR  -- {SOCK_DIR}
 SOCKET_SCM_HELPER -- {SOCKET_SCM_HELPER}
 GDB_QEMU  -- "{GDB_QEMU}"
+VALGRIND_QEMU -- "{VALGRIND_QEMU}"
 """
 
 args = collections.defaultdict(str, self.get_env())
-- 
2.30.2




[for-6.1 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-04-07 Thread Greg Kurz
Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
a serious slow down may be observed on setups with a big enough number
of vCPUs.

Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):

  virtio-scsi  virtio-blk

1   0m20.922s   0m21.346s
2   0m21.230s   0m20.350s
4   0m21.761s   0m20.997s
8   0m22.770s   0m20.051s
16  0m22.038s   0m19.994s
32  0m22.928s   0m20.803s
64  0m26.583s   0m22.953s
128 0m41.273s   0m32.333s
256 2m4.727s1m16.924s
384 6m5.563s3m26.186s

Both perf and gprof indicate that QEMU is hogging CPUs when setting up
the ioeventfds:

 67.88%  swapper [kernel.kallsyms]  [k] power_pmu_enable
  9.47%  qemu-kvm[kernel.kallsyms]  [k] smp_call_function_single
  8.64%  qemu-kvm[kernel.kallsyms]  [k] power_pmu_enable
=>2.79%  qemu-kvmqemu-kvm   [.] memory_region_ioeventfd_before
=>2.12%  qemu-kvmqemu-kvm   [.] address_space_update_ioeventfds
  0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single

address_space_update_ioeventfds() is called when committing an MR
transaction, i.e. for each ioeventfd with the current code base,
and it internally loops over all ioeventfds:

static void address_space_update_ioeventfds(AddressSpace *as)
{
[...]
FOR_EACH_FLAT_RANGE(fr, view) {
for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {

This means that the setup of ioeventfds for these devices has
quadratic time complexity.

This series simply changes the device models to extend the transaction
to all virtqueues, as already done in the past in the generic
code with 710fccf80d78 ("virtio: improve virtio devices initialization
time").

Only virtio-scsi and virtio-blk are covered here, but a similar change
might also be beneficial to other device types such as host-scsi-pci,
vhost-user-scsi-pci and vhost-user-blk-pci.

  virtio-scsi  virtio-blk

1   0m21.271s   0m22.076s
2   0m20.912s   0m19.716s
4   0m20.508s   0m19.310s
8   0m21.374s   0m20.273s
16  0m21.559s   0m21.374s
32  0m22.532s   0m21.271s
64  0m26.550s   0m22.007s
128 0m29.115s   0m27.446s
256 0m44.752s   0m41.004s
384 1m2.884s0m58.023s

This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
which reported the issue for virtio-scsi-pci.

Changes since RFC:

As suggested by Stefan, simplify the code by directly beginning and
committing the memory transaction from the device model, without all
the virtio specific proxying code and no changes needed in the memory
subsystem.

Greg Kurz (4):
  virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
  virtio-blk: Configure all host notifiers in a single MR transaction
  virtio-scsi: Set host notifiers and callbacks separately
  virtio-scsi: Configure all host notifiers in a single MR transaction

 hw/block/dataplane/virtio-blk.c | 36 +++--
 hw/scsi/virtio-scsi-dataplane.c | 56 ++---
 2 files changed, 72 insertions(+), 20 deletions(-)

-- 
2.26.3





[for-6.1 3/4] virtio-scsi: Set host notifiers and callbacks separately

2021-04-07 Thread Greg Kurz
Host notifiers are guaranteed to be idle until the callbacks are
hooked up with virtio_queue_aio_set_host_notifier_handler(). They
thus don't need to be set or unset with the AioContext lock held.

Do this outside the critical section, like virtio-blk already
does: basically downgrading virtio_scsi_vring_init() to only
set up the host notifier and set the callback in the caller.

This will allow batching the addition/deletion of ioeventfds in
a single memory transaction, which is expected to greatly
improve initialization time.

Signed-off-by: Greg Kurz 
---
 hw/scsi/virtio-scsi-dataplane.c | 40 ++---
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 4ad879340645..b2cb3d9dcc64 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -94,8 +94,7 @@ static bool virtio_scsi_data_plane_handle_event(VirtIODevice 
*vdev,
 return progress;
 }
 
-static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
-  VirtIOHandleAIOOutput fn)
+static int virtio_scsi_set_host_notifier(VirtIOSCSI *s, VirtQueue *vq, int n)
 {
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(s)));
 int rc;
@@ -109,7 +108,6 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue 
*vq, int n,
 return rc;
 }
 
-virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, fn);
 return 0;
 }
 
@@ -154,38 +152,44 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
-aio_context_acquire(s->ctx);
-rc = virtio_scsi_vring_init(s, vs->ctrl_vq, 0,
-virtio_scsi_data_plane_handle_ctrl);
-if (rc) {
-goto fail_vrings;
+rc = virtio_scsi_set_host_notifier(s, vs->ctrl_vq, 0);
+if (rc != 0) {
+goto fail_host_notifiers;
 }
 
 vq_init_count++;
-rc = virtio_scsi_vring_init(s, vs->event_vq, 1,
-virtio_scsi_data_plane_handle_event);
-if (rc) {
-goto fail_vrings;
+rc = virtio_scsi_set_host_notifier(s, vs->event_vq, 1);
+if (rc != 0) {
+goto fail_host_notifiers;
 }
 
 vq_init_count++;
+
 for (i = 0; i < vs->conf.num_queues; i++) {
-rc = virtio_scsi_vring_init(s, vs->cmd_vqs[i], i + 2,
-virtio_scsi_data_plane_handle_cmd);
+rc = virtio_scsi_set_host_notifier(s, vs->cmd_vqs[i], i + 2);
 if (rc) {
-goto fail_vrings;
+goto fail_host_notifiers;
 }
 vq_init_count++;
 }
 
+aio_context_acquire(s->ctx);
+virtio_queue_aio_set_host_notifier_handler(vs->ctrl_vq, s->ctx,
+
virtio_scsi_data_plane_handle_ctrl);
+virtio_queue_aio_set_host_notifier_handler(vs->event_vq, s->ctx,
+   
virtio_scsi_data_plane_handle_event);
+
+for (i = 0; i < vs->conf.num_queues; i++) {
+virtio_queue_aio_set_host_notifier_handler(vs->cmd_vqs[i], s->ctx,
+ 
virtio_scsi_data_plane_handle_cmd);
+}
+
 s->dataplane_starting = false;
 s->dataplane_started = true;
 aio_context_release(s->ctx);
 return 0;
 
-fail_vrings:
-aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
-aio_context_release(s->ctx);
+fail_host_notifiers:
 for (i = 0; i < vq_init_count; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
-- 
2.26.3




[for-6.1 2/4] virtio-blk: Configure all host notifiers in a single MR transaction

2021-04-07 Thread Greg Kurz
This allows the virtio-blk-pci device to batch the setup of all its
host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 3m26.186s down to 0m58.023s for a
pseries machine with 384 vCPUs.

Note that memory_region_transaction_commit() must be called before
virtio_bus_cleanup_host_notifier() because the latter might close
ioeventfds that the transaction still assumes to be around when it
commits.

Signed-off-by: Greg Kurz 
---
 hw/block/dataplane/virtio-blk.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index d7b5c95d26d9..cd81893d1d01 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -198,19 +198,30 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
+memory_region_transaction_begin();
+
 /* Set up virtqueue notify */
 for (i = 0; i < nvqs; i++) {
 r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, true);
 if (r != 0) {
+int j = i;
+
 fprintf(stderr, "virtio-blk failed to set host notifier (%d)\n", 
r);
 while (i--) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+while (j--) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 goto fail_host_notifiers;
 }
 }
 
+memory_region_transaction_commit();
+
 s->starting = false;
 vblk->dataplane_started = true;
 trace_virtio_blk_data_plane_start(s);
@@ -246,8 +257,15 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 return 0;
 
   fail_aio_context:
+memory_region_transaction_begin();
+
 for (i = 0; i < nvqs; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+for (i = 0; i < nvqs; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
   fail_host_notifiers:
@@ -312,8 +330,15 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
 aio_context_release(s->ctx);
 
+memory_region_transaction_begin();
+
 for (i = 0; i < nvqs; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+for (i = 0; i < nvqs; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 
-- 
2.26.3




[for-6.1 4/4] virtio-scsi: Configure all host notifiers in a single MR transaction

2021-04-07 Thread Greg Kurz
This allows the virtio-scsi-pci device to batch the setup of all its
host notifiers. This significantly improves boot time of VMs with a
high number of vCPUs, e.g. from 6m5.563s down to 1m2.884s for a
pseries machine with 384 vCPUs.

Note that memory_region_transaction_commit() must be called before
virtio_bus_cleanup_host_notifier() because the latter might close
ioeventfds that the transaction still assumes to be around when it
commits.

Signed-off-by: Greg Kurz 
---
 hw/scsi/virtio-scsi-dataplane.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index b2cb3d9dcc64..28e003250a11 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -152,6 +152,8 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 goto fail_guest_notifiers;
 }
 
+memory_region_transaction_begin();
+
 rc = virtio_scsi_set_host_notifier(s, vs->ctrl_vq, 0);
 if (rc != 0) {
 goto fail_host_notifiers;
@@ -173,6 +175,8 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 vq_init_count++;
 }
 
+memory_region_transaction_commit();
+
 aio_context_acquire(s->ctx);
 virtio_queue_aio_set_host_notifier_handler(vs->ctrl_vq, s->ctx,
 
virtio_scsi_data_plane_handle_ctrl);
@@ -192,6 +196,11 @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
 fail_host_notifiers:
 for (i = 0; i < vq_init_count; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+for (i = 0; i < vq_init_count; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 k->set_guest_notifiers(qbus->parent, vs->conf.num_queues + 2, false);
@@ -229,8 +238,15 @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
 
 blk_drain_all(); /* ensure there are no in-flight requests */
 
+memory_region_transaction_begin();
+
 for (i = 0; i < vs->conf.num_queues + 2; i++) {
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+}
+
+memory_region_transaction_commit();
+
+for (i = 0; i < vs->conf.num_queues + 2; i++) {
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
 
-- 
2.26.3




[for-6.1 1/4] virtio-blk: Fix rollback path in virtio_blk_data_plane_start()

2021-04-07 Thread Greg Kurz
When dataplane multiqueue support was added in QEMU 2.7, the path
that would roll back the guest notifiers assignment in case of error
simply got dropped.

Later on, when Error was added to blk_set_aio_context() in QEMU 4.1,
another error path was introduced, but it omits to roll back both
host and guest notifiers.

It seems cleaner to fix the rollback path in one go. The patch is
simple enough that it can be adjusted if backported to a pre-4.1
QEMU.

Fixes: 51b04ac5c6a6 ("virtio-blk: dataplane multiqueue support")
Cc: stefa...@redhat.com
Fixes: 97896a4887a0 ("block: Add Error to blk_set_aio_context()")
Cc: kw...@redhat.com
Signed-off-by: Greg Kurz 
Reviewed-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index e9050c8987e7..d7b5c95d26d9 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -207,7 +207,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
 virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
 }
-goto fail_guest_notifiers;
+goto fail_host_notifiers;
 }
 }
 
@@ -221,7 +221,7 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(old_context);
 if (r < 0) {
 error_report_err(local_err);
-goto fail_guest_notifiers;
+goto fail_aio_context;
 }
 
 /* Process queued requests before the ones in vring */
@@ -245,6 +245,13 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 aio_context_release(s->ctx);
 return 0;
 
+  fail_aio_context:
+for (i = 0; i < nvqs; i++) {
+virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
+virtio_bus_cleanup_host_notifier(VIRTIO_BUS(qbus), i);
+}
+  fail_host_notifiers:
+k->set_guest_notifiers(qbus->parent, nvqs, false);
   fail_guest_notifiers:
 /*
  * If we failed to set up the guest notifiers queued requests will be
-- 
2.26.3




Re: [PATCH v5 2/6] qcow2: fix cache discarding in update_refcount()

2021-04-07 Thread Alberto Garcia
On Fri 26 Mar 2021 09:00:41 PM CET, Vladimir Sementsov-Ogievskiy 
 wrote:
> Here refcount of cluster at @cluster_offset reached 0, so we "free"
> that cluster. Not a cluster at @offset. The thing that saves us from the
> bug is that L2 tables and refblocks are discarded one by one. Still,
> let's be precise.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto



Re: [PATCH] iotests: Test mirror-top filter permissions

2021-04-07 Thread Kevin Wolf
Am 31.03.2021 um 14:28 hat Max Reitz geschrieben:
> Add a test accompanying commit 53431b9086b2832ca1aeff0c55e186e9ed79bd11
> ("block/mirror: Fix mirror_top's permissions").
> 
> Signed-off-by: Max Reitz 

Thanks, applied to the block branch.

Kevin




Re: [PATCH v2] iotests: add test for removing persistent bitmap from backing file

2021-04-07 Thread Kevin Wolf
Am 01.04.2021 um 18:15 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Just demonstrate one of x-blockdev-reopen usecases. We can't simply
> remove persistent bitmap from RO node (for example from backing file),
> as we need to remove it from the image too. So, we should reopen the
> node first.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Thanks, applied to the block branch.

Kevin




[PULL for-6.0 v3 00/10] emulated nvme fixes for -rc3

2021-04-07 Thread Klaus Jensen
From: Klaus Jensen 

Hi Peter,

Gitlab CI finally got back in business :)

Note: only cover letter sent for this v3.


The following changes since commit d0d3dd401b70168a353450e031727affee828527:

  Update version for v6.0.0-rc2 release (2021-04-06 18:34:34 +0100)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-20210407-pull-request

for you to fetch changes up to 7645f21f409b67eb9aad9feef6283c2e186e3703:

  hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl (2021-04-07 
10:48:33 +0200)


emulated nvme fixes for -rc3

v3:
  - removed unnecessary deprecation warning

v2:
  - added missing patches



Klaus Jensen (10):
  hw/block/nvme: fix pi constraint check
  hw/block/nvme: fix missing string representation for ns attachment
  hw/block/nvme: fix the nsid 'invalid' value
  hw/block/nvme: fix warning about legacy namespace configuration
  hw/block/nvme: update dmsrl limit on namespace detachment
  hw/block/nvme: fix handling of private namespaces
  hw/block/nvme: add missing copyright headers
  hw/block/nvme: fix ns attachment out-of-bounds read
  hw/block/nvme: fix assert crash in nvme_subsys_ns
  hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl

 hw/block/nvme-dif.h|  10 +++
 hw/block/nvme-ns.h |  12 ++--
 hw/block/nvme-subsys.h |  11 ++--
 hw/block/nvme.h|  41 +---
 include/block/nvme.h   |   1 +
 hw/block/nvme-dif.c|  10 +++
 hw/block/nvme-ns.c |  78 ++
 hw/block/nvme-subsys.c |  28 
 hw/block/nvme.c| 142 +
 hw/block/trace-events  |   1 -
 10 files changed, 155 insertions(+), 179 deletions(-)

-- 
2.31.1




Re: [PATCH 0/4] iotests/297: Cover tests/

2021-04-07 Thread Kevin Wolf
Am 29.03.2021 um 15:26 hat Max Reitz geschrieben:
> Hi,
> 
> When reviewing Vladimir’s new addition to tests/, I noticed that 297 so
> far does not cover named tests.  That isn’t so good.
> 
> This series makes it cover them, and because tests/ is rather sparse at
> this point, I decided to also fix up the two tests in there that don’t
> pass pylint’s scrutiny yet.  I think it would be nice if we could keep
> all of tests/ clean.

For patch 2, Vladimir already made the point I would have made.

For the rest:
Reviewed-by: Kevin Wolf 




Re: [PATCH v3 04/36] block: bdrv_append(): don't consume reference

2021-04-07 Thread Alberto Garcia
On Wed 17 Mar 2021 03:34:57 PM CET, Vladimir Sementsov-Ogievskiy 
 wrote:
> We have too many comments for this feature. It seems better just not to
> do it. Most real users (tests don't count) have to create an additional
> reference.
>
> Also drop the comment in external_snapshot_prepare:
>  - bdrv_append doesn't "remove" the old bs in the common sense; it
>    sounds strange
>  - the fact that bdrv_append can fail is obvious from the context
>  - the fact that we must roll back all changes on transaction abort is
>    known (that's the direct role of abort)
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

> @@ -4645,36 +4640,22 @@ int bdrv_replace_node(BlockDriverState *from, 
> BlockDriverState *to,
>   * bs_new must not be attached to a BlockBackend.
>   *
>   * This function does not create any image files.
> - *
> - * bdrv_append() takes ownership of a bs_new reference and unrefs it because
> - * that's what the callers commonly need. bs_new will be referenced by the 
> old
> - * parents of bs_top after bdrv_append() returns. If the caller needs to 
> keep a
> - * reference of its own, it must call bdrv_ref().
>   */
>  int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
>  Error **errp)

You could still mention explicitly that the old parents of @bs_top will
add a new reference to @bs_new.
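
For illustration, a hedged sketch of the caller-side pattern implied by the
new semantics (the function name below is hypothetical; bdrv_append(),
bdrv_ref() and bdrv_unref() are the real APIs):

/* Sketch only -- imagine this inside the QEMU block layer, where
 * bdrv_append() and bdrv_unref() are in scope.  With the patched
 * semantics, bdrv_append() no longer consumes the caller's reference to
 * bs_new: the old parents of bs_top take their own reference, and the
 * caller keeps (and later drops) the one it already owns. */
static int insert_filter_node(BlockDriverState *bs_new,
                              BlockDriverState *bs_top,
                              Error **errp)
{
    int ret = bdrv_append(bs_new, bs_top, errp);
    if (ret < 0) {
        return ret;   /* we still own our reference to bs_new */
    }

    /* Pre-patch, keeping bs_new around required an extra bdrv_ref(),
     * because bdrv_append() unref'ed it on behalf of the caller. */

    /* ... use bs_new ... */

    bdrv_unref(bs_new);   /* drop our own reference when done */
    return 0;
}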

Berto



Re: [PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread

2021-04-07 Thread Vladimir Sementsov-Ogievskiy

06.04.2021 19:20, Vladimir Sementsov-Ogievskiy wrote:

06.04.2021 18:51, Vladimir Sementsov-Ogievskiy wrote:

If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as the running thread may free
it at any time.

Still, nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to a local variable (just for better code style) and
uses it even after a yield point, when the thread may already be detached.

Fix that. Also, for safety, check that thr is non-NULL at the start of
nbd_co_establish_connection().

After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add an extra abort() just before the release. If it is
somehow possible to reach this "case:", it won't hurt. Anyway, a good
refactoring of all this reconnect mess will come soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---

Hi all! I hit a crash just running iotest 277 in a loop. I can't
reproduce it on master; it reproduces only on my branch with the nbd
reconnect refactorings.

Still, it seems quite possible that it may crash under some conditions,
so I propose this patch for 6.0. It's written so that it's obvious that
it will not hurt:

  pre-patch, in the first hunk we'd just crash if thr were NULL;
  in the second hunk it's safe to return -1, and using thr when
  s->connect_thread has already been zeroed is obviously wrong.


Ha, I accidentally reinvented what Roman already does in "[PATCH 1/7]
block/nbd: avoid touching freed connect_thread".

My additional first hunk is actually not needed, as
nbd_co_establish_connection() is called after
if (!nbd_client_connecting(s)) { return; }, so we should not be here after
nbd_co_establish_connection_cancel(bs, true), which is called with
s->state set to NBD_CLIENT_QUIT.

So, it would be more honest to take Roman's patch "[PATCH 1/7] block/nbd:
avoid touching freed connect_thread" :)


Still, I like my variant, because it makes it obvious that
s->connect_thread may only change to NULL, not to some new pointer.
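
To make the hazard concrete, here is a minimal, self-contained sketch of the
pattern being discussed (the names are illustrative, not QEMU's):

#include <stdbool.h>
#include <stddef.h>

/* A worker object owned by 'state' can be detached concurrently: the owner
 * pointer is set to NULL and the detached worker eventually frees itself.
 * Caching the pointer across a wait/yield point is then a use-after-free. */
typedef struct Worker { bool failed; } Worker;
typedef struct State  { Worker *worker; } State;

static void yield_point(void) { /* stands in for qemu_coroutine_yield() */ }

static int use_worker_unsafe(State *s)
{
    Worker *w = s->worker;        /* snapshot taken before the yield point */
    if (!w) {
        return -1;
    }
    yield_point();                /* detach (and free) may happen here */
    return w->failed ? -1 : 0;    /* BUG: w may already have been freed */
}

static int use_worker_safe(State *s)
{
    if (!s->worker) {
        return -1;
    }
    yield_point();
    /* Re-read the owner pointer: it only ever changes to NULL. */
    if (!s->worker) {
        return -1;
    }
    return s->worker->failed ? -1 : 0;
}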



Eric, could you take a look? If there are no more pending block patches, I
can try to send a pull request myself.



Kevin, I see you've staged several patches for rc3. This one is quite
simple; could you add it too?

--
Best regards,
Vladimir



Re: [PATCH 1/6] qdev: introduce qapi/hmp command for kick/call event

2021-04-07 Thread Dongli Zhang



On 4/7/21 6:40 AM, Eduardo Habkost wrote:
> On Thu, Mar 25, 2021 at 10:44:28PM -0700, Dongli Zhang wrote:
>> The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
>> the loss of doorbell kick, e.g.,
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html
>>
>>
>> ... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
>> fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").
>>
>> This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
>> to help narrow down if the issue is due to loss of irq/kick. So far the new
>> interface handles only two events: 'call' and 'kick'. Any device (e.g.,
>> virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
>> or legacy IRQ).
>>
>> The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
>> vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
>> on purpose by admin at QEMU/host side for a specific device.
>>
>> Signed-off-by: Dongli Zhang 
> [...]
>> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
>> index 605d57287a..c7795d4ba5 100644
>> --- a/include/monitor/hmp.h
>> +++ b/include/monitor/hmp.h
>> @@ -129,5 +129,6 @@ void hmp_info_replay(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_break(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_delete_break(Monitor *mon, const QDict *qdict);
>>  void hmp_replay_seek(Monitor *mon, const QDict *qdict);
>> +void hmp_x_debug_device_event(Monitor *mon, const QDict *qdict);
>>  
>>  #endif
>> diff --git a/qapi/qdev.json b/qapi/qdev.json
>> index b83178220b..711c4a297a 100644
>> --- a/qapi/qdev.json
>> +++ b/qapi/qdev.json
>> @@ -124,3 +124,33 @@
>>  ##
>>  { 'event': 'DEVICE_DELETED',
>>'data': { '*device': 'str', 'path': 'str' } }
>> +
>> +##
>> +# @x-debug-device-event:
>> +#
>> +# Generate device event for a specific device queue
>> +#
>> +# @dev: device path
>> +#
>> +# @event: event (e.g., kick or call) to trigger
> 
> Any specific reason to not use an enum here?
> 
> In addition to making the QAPI schema and documentation more
> descriptive, it would save you the work of manually defining the
> DEVICE_EVENT_* constants and implementing get_device_event().

Thank you very much for the suggestion!

I will use an enum in the JSON file.

Dongli Zhang

> 
> 
>> +#
>> +# @queue: queue id
>> +#
>> +# Returns: Nothing on success
>> +#
>> +# Since: 6.1
>> +#
>> +# Notes: This is used to debug VM driver hang issue. The 'kick' event is to
>> +#send notification to QEMU/vhost while the 'call' event is to
>> +#interrupt VM on purpose.
>> +#
>> +# Example:
>> +#
>> +# -> { "execute": "x-debug-device_event",
>> +#  "arguments": { "dev": "/machine/peripheral/vscsi0", "event": "kick",
>> +# "queue": 1 } }
>> +# <- { "return": {} }
>> +#
>> +##
>> +{ 'command': 'x-debug-device-event',
>> +  'data': {'dev': 'str', 'event': 'str', 'queue': 'int'} }
> [...]
> 



Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-07 Thread Dongli Zhang



On 4/6/21 7:20 PM, Jason Wang wrote:
> 
> On 2021/4/7 7:27 AM, Dongli Zhang wrote:
>>> This will answer your question that "Can it bypass the masking?".
>>>
>>> For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, to write to eventfd 
>>> is
>>> not able to bypass masking because masking is to unregister the eventfd. To
>>> write to eventfd does not take effect.
>>>
>>> However, it is possible to bypass masking for vhost-net because vhost-net
>>> registered a specific masked_notifier eventfd in order to mask irq. To 
>>> write to
>>> original eventfd still takes effect.
>>>
>>> We may leave the user to decide whether to write to 'masked_notifier' or
>>> original 'guest_notifier' for vhost-net.
>> My fault here. To write to masked_notifier will always be masked:(
> 
> 
> Only when there's no bug in the qemu.
> 
> 
>>
>> If it is EventNotifier level, we will not care whether the EventNotifier is
>> masked or not. It just provides an interface to write to EventNotifier.
> 
> 
> Yes.
> 
> 
>>
>> To dump the MSI-x table for both virtio and vfio will help confirm if the 
>> vector
>> is masked.
> 
> 
> That would be helpful as well. It's probably better to extend "info pci" 
> command.
> 
> Thanks

I will check whether to add this to "info pci" (introducing a new argument
option to "info pci") or to introduce a new command.

About the EventNotifiers, I will classify them as guest notifiers or host
notifiers so that it will be much easier for the user to tell whether the
eventfd is for injecting an IRQ or for kicking the doorbell.
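
For illustration, a hedged sketch of what such a debug hook could look like
(virtio_queue_get_guest_notifier(), virtio_queue_get_host_notifier() and
event_notifier_set() are existing QEMU APIs; the wrapper below and its name
are assumptions, not the proposed patch):

#include "qemu/osdep.h"
#include "qemu/event_notifier.h"
#include "hw/virtio/virtio.h"

/* Hypothetical debug helper: 'call' pokes the guest notifier (inject an
 * interrupt towards the VM), 'kick' pokes the host notifier (ring the
 * doorbell towards QEMU/vhost).  Whether the event is actually observed
 * still depends on masking, as discussed above. */
static void debug_poke_virtqueue(VirtQueue *vq, bool call)
{
    EventNotifier *e = call ? virtio_queue_get_guest_notifier(vq)
                            : virtio_queue_get_host_notifier(vq);

    event_notifier_set(e);
}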

Thank you very much for all suggestions!

Dongli Zhang

> 
> 
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
> 



Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-07 Thread Jason Wang



On 2021/4/8 1:51 PM, Dongli Zhang wrote:


On 4/6/21 7:20 PM, Jason Wang wrote:

On 2021/4/7 7:27 AM, Dongli Zhang wrote:

This will answer your question that "Can it bypass the masking?".

For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, to write to eventfd is
not able to bypass masking because masking is to unregister the eventfd. To
write to eventfd does not take effect.

However, it is possible to bypass masking for vhost-net because vhost-net
registered a specific masked_notifier eventfd in order to mask irq. To write to
original eventfd still takes effect.

We may leave the user to decide whether to write to 'masked_notifier' or
original 'guest_notifier' for vhost-net.

My fault here. To write to masked_notifier will always be masked:(


Only when there's no bug in the qemu.



If it is EventNotifier level, we will not care whether the EventNotifier is
masked or not. It just provides an interface to write to EventNotifier.


Yes.



To dump the MSI-x table for both virtio and vfio will help confirm if the vector
is masked.


That would be helpful as well. It's probably better to extend "info pci" 
command.

Thanks

I will check whether to add this to "info pci" (introducing a new argument
option to "info pci") or to introduce a new command.



It's better to just reuse "info pci".




About the EventNotifiers, I will classify them as guest notifiers or host
notifiers so that it will be much easier for the user to tell whether the
eventfd is for injecting an IRQ or for kicking the doorbell.



Sounds good.




Thank you very much for all suggestions!

Dongli Zhang



Thanks





Thank you very much!

Dongli Zhang