Re: [PATCH] hw/block/nvme: map prp fix if prp2 contains non-zero offset

2021-04-08 Thread Keith Busch
On Thu, Apr 08, 2021 at 09:53:13PM +0530, Padmakar Kalghatgi wrote:
> +/*
> + *   The first PRP list entry, pointed to by PRP2, can contain
> + *   offsets. Hence, we need to calculate the number of entries
> + *   in prp2 based on the offset it has.
> + */

This comment has some unnecessary spacing at the beginning.

> +nents = (n->page_size - (prp2 % n->page_size)) >> 3;

page_size is always a power of two, so let's replace the costly modulo
with:

nents = (n->page_size - (prp2 & (n->page_size - 1))) >> 3;
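
For reference, here is a standalone check of that identity (a sketch, not
QEMU code; the 4096-byte page size and the sample prp2 values are assumed
for illustration):

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    const uint64_t page_size = 4096;           /* assumed power of two */
    const uint64_t samples[] = { 0x1000, 0x1008, 0x1ff8, 0x2abc };

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        uint64_t prp2 = samples[i];
        uint32_t nents_mod  = (page_size - (prp2 % page_size)) >> 3;
        uint32_t nents_mask = (page_size - (prp2 & (page_size - 1))) >> 3;

        /* For a power-of-two page_size the two forms are equivalent */
        assert(nents_mod == nents_mask);
        printf("prp2=0x%" PRIx64 " -> %" PRIu32 " entries left in the page\n",
               prp2, nents_mask);
    }
    return 0;
}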



Re: [RFC PATCH v2 02/11] python: qemu: pass the wrapper field from QEMUQtestmachine to QEMUMachine

2021-04-08 Thread John Snow

On 4/7/21 9:50 AM, Emanuele Giuseppe Esposito wrote:

Signed-off-by: Emanuele Giuseppe Esposito 
---
  python/qemu/machine.py | 2 +-
  python/qemu/qtest.py   | 4 +++-
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index c721e07d63..18d32ebe45 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -109,7 +109,7 @@ def __init__(self,
  
  self._binary = binary

  self._args = list(args)
-self._wrapper = wrapper
+self._wrapper = list(wrapper)
 


Unrelated change?

(I'm assuming you want to copy the user's input to explicitly avoid
sharing state. A commit message blurb for this would be good.)



  self._name = name or "qemu-%d" % os.getpid()
  self._test_dir = test_dir
diff --git a/python/qemu/qtest.py b/python/qemu/qtest.py
index 0d01715086..4c90daf430 100644
--- a/python/qemu/qtest.py
+++ b/python/qemu/qtest.py
@@ -111,6 +111,7 @@ class QEMUQtestMachine(QEMUMachine):
  def __init__(self,
   binary: str,
   args: Sequence[str] = (),
+ wrapper: Sequence[str] = (),
   name: Optional[str] = None,
   test_dir: str = "/var/tmp",
   socket_scm_helper: Optional[str] = None,
@@ -119,7 +120,8 @@ def __init__(self,
  name = "qemu-%d" % os.getpid()
  if sock_dir is None:
  sock_dir = test_dir
-super().__init__(binary, args, name=name, test_dir=test_dir,
+super().__init__(binary, args, wrapper=wrapper, name=name,
+ test_dir=test_dir,
   socket_scm_helper=socket_scm_helper,
   sock_dir=sock_dir)
  self._qtest: Optional[QEMUQtestProtocol] = None



ACK




Re: [RFC PATCH v2 01/11] python: qemu: add timer parameter for qmp.accept socket

2021-04-08 Thread John Snow

On 4/7/21 9:50 AM, Emanuele Giuseppe Esposito wrote:

Extend the _post_launch function to take the timer as a
parameter instead of defaulting to 15 seconds.

Signed-off-by: Emanuele Giuseppe Esposito 
---
  python/qemu/machine.py | 4 ++--
  python/qemu/qtest.py   | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 6e44bda337..c721e07d63 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -321,9 +321,9 @@ def _pre_launch(self) -> None:
  nickname=self._name
  )
  
-def _post_launch(self) -> None:

+def _post_launch(self, timer) -> None:
  if self._qmp_connection:
-self._qmp.accept()
+self._qmp.accept(timer)
  
  def _post_shutdown(self) -> None:

  """
diff --git a/python/qemu/qtest.py b/python/qemu/qtest.py
index 39a0cf62fe..0d01715086 100644
--- a/python/qemu/qtest.py
+++ b/python/qemu/qtest.py
@@ -138,9 +138,9 @@ def _pre_launch(self) -> None:
  super()._pre_launch()
  self._qtest = QEMUQtestProtocol(self._qtest_path, server=True)
  
-def _post_launch(self) -> None:

+def _post_launch(self, timer) -> None:
  assert self._qtest is not None
-super()._post_launch()
+super()._post_launch(timer)
  self._qtest.accept()
  
  def _post_shutdown(self) -> None:




Are you forgetting to change _launch() to provide some default value for 
what timer needs to be?


I think for the "event" callbacks here, I'd prefer configuring the 
behavior as a property instead of passing it around as a parameter.


(Also, we have an awful lot of timeouts now... is it time to think about 
rewriting this using asyncio so that we can allow the callers to specify 
their own timeouts in with context blocks? Just a thought for later; we 
have an awful lot of timeouts scattered throughout machine.py, qmp.py, etc.)


--js




[PATCH 2/2] hw/block/nvme: drain namespaces on sq deletion

2021-04-08 Thread Klaus Jensen
From: Klaus Jensen 

For most commands, when issuing an AIO, the BlockAIOCB is stored in the
NvmeRequest aiocb pointer when the AIO is issued. The main use of this
is cancelling AIOs when deleting submission queues (it is currently not
used for Abort).

However, some commands like Dataset Management Zone Management Send
(zone reset) may involve more than one AIO and here the AIOs are issued
without saving a reference to the BlockAIOCB. This is a problem since
nvme_del_sq() will attempt to cancel outstanding AIOs, potentially with
an invalid BlockAIOCB since the aiocb pointer is not NULL'ed when the
request structure is recycled.

Fix this by

  1. making sure the aiocb pointer is NULL'ed when requests are recycled,
  2. only attempting to cancel the AIO if the aiocb is non-NULL, and
  3. draining all namespaces as a last resort if any AIOs could not be
     cancelled.

Fixes: dc04d25e2f3f ("hw/block/nvme: add support for the format nvm command")
Fixes: c94973288cd9 ("hw/block/nvme: add broadcast nsid support flush command")
Fixes: e4e430b3d6ba ("hw/block/nvme: add simple copy command")
Fixes: 5f5dc4c6a942 ("hw/block/nvme: zero out zones on reset")
Fixes: 2605257a26b8 ("hw/block/nvme: add the dataset management command")
Cc: Gollu Appalanaidu 
Cc: Minwoo Im 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 94bc373260be..3c4297e38a52 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -470,6 +470,7 @@ static void nvme_req_clear(NvmeRequest *req)
 {
 req->ns = NULL;
 req->opaque = NULL;
+req->aiocb = NULL;
 memset(&req->cqe, 0x0, sizeof(req->cqe));
 req->status = NVME_SUCCESS;
 }
@@ -3681,6 +3682,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest *req)
 NvmeSQueue *sq;
 NvmeCQueue *cq;
 uint16_t qid = le16_to_cpu(c->qid);
+int nsid;
 
 if (unlikely(!qid || nvme_check_sqid(n, qid))) {
 trace_pci_nvme_err_invalid_del_sq(qid);
@@ -3692,9 +3694,26 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest 
*req)
 sq = n->sq[qid];
 while (!QTAILQ_EMPTY(&sq->out_req_list)) {
 r = QTAILQ_FIRST(&sq->out_req_list);
-assert(r->aiocb);
-blk_aio_cancel(r->aiocb);
+if (r->aiocb) {
+blk_aio_cancel(r->aiocb);
+}
 }
+
+/*
+ * Drain all namespaces if there are still outstanding requests that we
+ * could not cancel explicitly.
+ */
+if (!QTAILQ_EMPTY(&sq->out_req_list)) {
+for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+NvmeNamespace *ns = nvme_ns(n, nsid);
+if (ns) {
+nvme_ns_drain(ns);
+}
+}
+}
+
+assert(QTAILQ_EMPTY(&sq->out_req_list));
+
 if (!nvme_check_cqid(n, sq->cqid)) {
 cq = n->cq[sq->cqid];
 QTAILQ_REMOVE(&cq->sq_list, sq, entry);
-- 
2.31.1




[PATCH 1/2] hw/block/nvme: store aiocb in compare

2021-04-08 Thread Klaus Jensen
From: Klaus Jensen 

nvme_compare() fails to store the aiocb from the blk_aio_preadv() call.
Fix this.

Fixes: 0a384f923f51 ("hw/block/nvme: add compare command")
Cc: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b1f056a0ebc..94bc373260be 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2837,7 +2837,8 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest 
*req)
 
 block_acct_start(blk_get_stats(blk), &req->acct, data_len,
  BLOCK_ACCT_READ);
-blk_aio_preadv(blk, offset, &req->data.iov, 0, nvme_compare_data_cb, req);
+req->aiocb = blk_aio_preadv(blk, offset, &req->data.iov, 0,
+nvme_compare_data_cb, req);
 
 return NVME_NO_COMPLETE;
 }
-- 
2.31.1




Re: [PATCH] hw/block/nvme: map prp fix if prp2 contains non-zero offset

2021-04-08 Thread Klaus Jensen

On Apr  8 21:53, Padmakar Kalghatgi wrote:

From: padmakar 

nvme_map_prp needs to calculate the number of list entries based on the
offset value. For the subsequent PRP2 list, we need to ensure the number of
entries is within the MAX number of PRP entries for a page.

Signed-off-by: Padmakar Kalghatgi 
---
hw/block/nvme.c | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d439e44..efb7368 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -577,7 +577,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
uint32_t nents, prp_trans;
int i = 0;

-nents = (len + n->page_size - 1) >> n->page_bits;
+/*
+ *   The first PRP list entry, pointed to by PRP2, can contain
+ *   offsets. Hence, we need to calculate the number of entries
+ *   in prp2 based on the offset it has.
+ */
+nents = (n->page_size - (prp2 % n->page_size)) >> 3;
prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
if (ret) {
@@ -588,7 +593,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
while (len != 0) {
uint64_t prp_ent = le64_to_cpu(prp_list[i]);

-if (i == n->max_prp_ents - 1 && len > n->page_size) {
+if (i == nents - 1 && len > n->page_size) {
if (unlikely(prp_ent & (n->page_size - 1))) {
trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
@@ -597,7 +602,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,

i = 0;
nents = (len + n->page_size - 1) >> n->page_bits;
-prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
+nents = MIN(nents, n->max_prp_ents);
+prp_trans = nents * sizeof(uint64_t);
ret = nvme_addr_read(n, prp_ent, (void *)prp_list,
 prp_trans);
if (ret) {
--
2.7.0.windows.1




LGTM.

Reviewed-by: Klaus Jensen 




Re: [RFC PATCH v2 04/11] qemu-iotests: delay QMP socket timers

2021-04-08 Thread Paolo Bonzini
On Thu, Apr 8, 2021 at 18:06, Emanuele Giuseppe Esposito
wrote:

>
>
> On 08/04/2021 17:40, Paolo Bonzini wrote:
> > On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote:
> >>   def get_qmp_events_filtered(self, wait=60.0):
> >>   result = []
> >> -for ev in self.get_qmp_events(wait=wait):
> >> +qmp_wait = wait
> >> +if qemu_gdb:
> >> +qmp_wait = 0.0
> >> +for ev in self.get_qmp_events(wait=qmp_wait):
> >>   result.append(filter_qmp_event(ev))
> >>   return result
> >
> > Should this be handled in get_qmp_events instead, since you're basically
> > changing all the callers?
>
> get_qmp_events is in python/machine.py, which as I understand might be
> used also by some other scripts, so I want to keep the changes there to
> the minimum. Also, machine.py has no access to qemu_gdb or
> qemu_valgrind, so passing a boolean or something to delay the timer
> would still require adding a similar check in all sections.
>
> Or do you have a cleaner way to do this?
>

Maybe a subclass IotestsMachine?

Paolo


> Emanuele
>
>


[PATCH] hw/block/nvme: map prp fix if prp2 contains non-zero offset

2021-04-08 Thread Padmakar Kalghatgi
From: padmakar 

nvme_map_prp needs to calculate the number of list entries based on the
offset value. For the subsequent PRP2 list, we need to ensure the number of
entries is within the MAX number of PRP entries for a page.

Signed-off-by: Padmakar Kalghatgi 
---
 hw/block/nvme.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d439e44..efb7368 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -577,7 +577,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
 uint32_t nents, prp_trans;
 int i = 0;
 
-nents = (len + n->page_size - 1) >> n->page_bits;
+/*
+ *   The first PRP list entry, pointed to by PRP2, can contain
+ *   offsets. Hence, we need to calculate the number of entries
+ *   in prp2 based on the offset it has.
+ */
+nents = (n->page_size - (prp2 % n->page_size)) >> 3;
 prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
 ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
 if (ret) {
@@ -588,7 +593,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
 while (len != 0) {
 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
 
-if (i == n->max_prp_ents - 1 && len > n->page_size) {
+if (i == nents - 1 && len > n->page_size) {
 if (unlikely(prp_ent & (n->page_size - 1))) {
 trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
 status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
@@ -597,7 +602,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, 
uint64_t prp1,
 
 i = 0;
 nents = (len + n->page_size - 1) >> n->page_bits;
-prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
+nents = MIN(nents, n->max_prp_ents);
+prp_trans = nents * sizeof(uint64_t);
 ret = nvme_addr_read(n, prp_ent, (void *)prp_list,
  prp_trans);
 if (ret) {
-- 
2.7.0.windows.1




Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete

2021-04-08 Thread Vladimir Sementsov-Ogievskiy

08.04.2021 20:04, John Snow wrote:

On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:

The job-complete command is async. Can we instead just add a boolean like
job->completion_requested, set it if job-complete is called in the STANDBY
state, and have job_resume call job_complete automatically if this boolean
is true?


job_complete has a synchronous setup, though -- we lose out on a lot of 
synchronous error checking in that circumstance.


yes, that's a problem..



I was not able to audit it to determine that it'd be safe to attempt that setup 
during a drained section -- I imagine it won't work and will fail, though.

So I thought we'd have to signal completion and run the setup *later*, but what do we do 
if we get an error then? Does the entire job fail? Do we emit some new event? 
("BLOCK_JOB_COMPLETION_FAILED" ?) Is it recoverable?



Isn't it possible even now that, after a successful job-complete, the job still
fails and we report BLOCK_JOB_COMPLETED with an error?

And actually, how much benefit does the user get from the fact that
job-complete may fail?

We can make job-complete a simple always-successful boolean flag setter, like
job-pause.

The actual completion will then be done in the background, when possible. If it
fails, the job just fails, like it does for any background I/O error. The user
has to check the error/success status of the final BLOCK_JOB_COMPLETED anyway.
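
A minimal standalone model of that idea (a sketch, not QEMU code; the names
ToyJob, completion_requested and toy_* are invented for illustration):

#include <stdbool.h>
#include <stdio.h>

typedef struct ToyJob {
    bool paused;                /* e.g. STANDBY because of a drained section */
    bool completion_requested;  /* the proposed boolean flag */
    bool completed;
} ToyJob;

static void toy_do_complete(ToyJob *job)
{
    /* The actual completion work; a failure here would surface via the
     * final BLOCK_JOB_COMPLETED-style event, not via job-complete. */
    job->completed = true;
    printf("job completed\n");
}

/* job-complete as an always-successful flag setter */
static void toy_complete_request(ToyJob *job)
{
    if (job->paused) {
        job->completion_requested = true;
        printf("completion deferred until resume\n");
        return;
    }
    toy_do_complete(job);
}

/* Resuming the job picks up the deferred completion */
static void toy_resume(ToyJob *job)
{
    job->paused = false;
    if (job->completion_requested) {
        job->completion_requested = false;
        toy_do_complete(job);
    }
}

int main(void)
{
    ToyJob job = { .paused = true };

    toy_complete_request(&job);   /* issued while the job is on STANDBY */
    toy_resume(&job);             /* drained section ends, job resumes */
    return job.completed ? 0 : 1;
}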

--
Best regards,
Vladimir



Re: [PATCH v2 00/10] block/nbd: move connection code to separate file

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:17PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> This substitutes "[PATCH 00/14] nbd: move reconnect-thread to separate file"
> Supersedes: <20210407104637.36033-1-vsement...@virtuozzo.com>
> 
> I want to simplify block/nbd.c which is overcomplicated now. First step
> is splitting out what could be split.
> 
> This series creates a new file, nbd/client-connection.c, and part of
> block/nbd.c is refactored and moved into it.
> 
> v2 is mostly rewritten. I decided to move a larger part, otherwise it
> doesn't make real sense.
> 
> Note also that v2 is based on master. Patch 01 actually solves the same
> problem as
> "[PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread" 
> [*]
> in a smarter way. So, if [*] goes first, this will be rebased to undo
> [*].
> 
> Vladimir Sementsov-Ogievskiy (10):
>   block/nbd: introduce NBDConnectThread reference counter
>   block/nbd: BDRVNBDState: drop unused connect_err and connect_status
>   util/async: aio_co_enter(): do aio_co_schedule in general case
>   block/nbd: simplify waking of nbd_co_establish_connection()
>   block/nbd: drop thr->state
>   block/nbd: bs-independent interface for nbd_co_establish_connection()
>   block/nbd: make nbd_co_establish_connection_cancel() bs-independent
>   block/nbd: rename NBDConnectThread to NBDClientConnection
>   block/nbd: introduce nbd_client_connection_new()
>   nbd: move connection code from block/nbd to nbd/client-connection
> 
>  include/block/nbd.h |  11 ++
>  block/nbd.c | 288 ++--
>  nbd/client-connection.c | 192 +++
>  util/async.c|  11 +-
>  nbd/meson.build |   1 +
>  5 files changed, 218 insertions(+), 285 deletions(-)
>  create mode 100644 nbd/client-connection.c

I think this is a nice cleanup overall, and makes the logic in
block/nbd.c much easier to reason about.

I guess it's 6.1 material though, as it looks somewhat too big for 6.0,
and the only serious bug it actually fixes can be addressed with a
band-aid mentioned above.

The problem I originally came across, that of requests being
canceled on drain despite reconnect, still remains, but I think the fix
for it should build on this series (and thus probably wait till after
6.0).

Thanks,
Roman.



Re: [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete

2021-04-08 Thread John Snow

On 4/8/21 12:20 PM, Max Reitz wrote:

Hi,

See patch 1 for a detailed explanation of the problem.

The gist is: Draining a READY job makes it transition to STANDBY, and
jobs on STANDBY cannot be completed.  Ending the drained section will
schedule the job (so it is then resumed), but not wait until it is
actually running again.

Therefore, it can happen that issuing block-job-complete fails when you
issue it right after some draining operation.

I tried to come up with an iotest reproducer, but in the end I only got
something that reproduced the issue like 2/10 times, and it required
heavy I/O, so it is nothing I would like to have as part of the iotests.
Instead, I opted for a unit test, which allows me to cheat a bit
(specifically, locking the job IO thread before ending the drained
section).


Max Reitz (3):
   job: Add job_wait_unpaused() for block-job-complete
   test-blockjob: Test job_wait_unpaused()
   iotests/041: block-job-complete on user-paused job

  include/qemu/job.h |  15 
  blockdev.c |   3 +
  job.c  |  42 +++
  tests/unit/test-blockjob.c | 140 +
  tests/qemu-iotests/041 |  13 +++-
  5 files changed, 212 insertions(+), 1 deletion(-)



Left comments and review on #1, skimmed 2/3. Not sure if it's 
appropriate for 6.0 yet, that might depend on the responses to my 
comments and other reviewers and so on.


Acked-by: John Snow 




Re: [PATCH v2 10/10] nbd: move connection code from block/nbd to nbd/client-connection

2021-04-08 Thread Vladimir Sementsov-Ogievskiy

08.04.2021 20:04, Roman Kagan wrote:

On Thu, Apr 08, 2021 at 05:08:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:

We now have a bs-independent connection API, which consists of four
functions:

   nbd_client_connection_new()
   nbd_client_connection_unref()
   nbd_co_establish_connection()
   nbd_co_establish_connection_cancel()

Move them to a separate file together with NBDClientConnection
structure which becomes private to the new API.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hmm. I kept only Virtuozzo's copyright in the new file, as I actually
developed the nbd reconnection code. Still, it's probably safer to keep all
copyrights. Let me know if you think so and I'll add them.


Not my call.


  include/block/nbd.h |  11 +++
  block/nbd.c | 167 --
  nbd/client-connection.c | 192 
  nbd/meson.build |   1 +
  4 files changed, 204 insertions(+), 167 deletions(-)
  create mode 100644 nbd/client-connection.c


Reviewed-by: Roman Kagan 



Thanks a lot for reviewing!

--
Best regards,
Vladimir



Re: [PATCH v2 10/10] nbd: move connection code from block/nbd to nbd/client-connection

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We now have a bs-independent connection API, which consists of four
> functions:
> 
>   nbd_client_connection_new()
>   nbd_client_connection_unref()
>   nbd_co_establish_connection()
>   nbd_co_establish_connection_cancel()
> 
> Move them to a separate file together with NBDClientConnection
> structure which becomes private to the new API.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Hmm. I kept only Virtuozzo's copyright in the new file, as I actually
> developed the nbd reconnection code. Still, it's probably safer to keep all
> copyrights. Let me know if you think so and I'll add them.

Not my call.

>  include/block/nbd.h |  11 +++
>  block/nbd.c | 167 --
>  nbd/client-connection.c | 192 
>  nbd/meson.build |   1 +
>  4 files changed, 204 insertions(+), 167 deletions(-)
>  create mode 100644 nbd/client-connection.c

Reviewed-by: Roman Kagan 



Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete

2021-04-08 Thread John Snow

On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
The job-complete command is async. Can we instead just add a boolean like
job->completion_requested, set it if job-complete is called in the STANDBY
state, and have job_resume call job_complete automatically if this boolean
is true?


job_complete has a synchronous setup, though -- we lose out on a lot of 
synchronous error checking in that circumstance.


I was not able to audit it to determine that it'd be safe to attempt 
that setup during a drained section -- I imagine it won't work and will 
fail, though.


So I thought we'd have to signal completion and run the setup *later*, 
but what do we do if we get an error then? Does the entire job fail? Do 
we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it recoverable?


So on and so forth. Seems like a lot of things to consider, unless I am 
making a giant fuss about nothing again, not like that's ever happened. O:-)


--js




Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete

2021-04-08 Thread Vladimir Sementsov-Ogievskiy

08.04.2021 19:20, Max Reitz wrote:

block-job-complete can only be applied when the job is READY, not when
it is on STANDBY (ready, but paused).  Draining a job technically pauses
it (which makes a READY job enter STANDBY), and ending the drained
section does not synchronously resume it, but only schedules the job,
which will then be resumed.  So attempting to complete a job immediately
after a drained section may sometimes fail.

That is bad at least because users cannot really work nicely around
this: A job may be paused and resumed at any time, so waiting for the
job to be in the READY state and then issuing a block-job-complete poses
a TOCTTOU problem.  The only way around it would be to issue
block-job-complete until it no longer fails due to the job being in the
STANDBY state, but that would not be nice.

We can solve the problem by allowing block-job-complete to be invoked on
jobs that are on STANDBY, if that status is the result of a drained
section (not because the user has paused the job), and that section has
ended.  That is, if the job is on STANDBY, but scheduled to be resumed.

Perhaps we could actually just directly allow this, seeing that mirror
is the only user of ready/complete, and that mirror_complete() could
probably work under the given circumstances, but there may be many side
effects to consider.

It is simpler to add a function job_wait_unpaused() that waits for the
job to be resumed (under said circumstances), and to make
qmp_block_job_complete() use it to delay job_complete() until then.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
Signed-off-by: Max Reitz 
---
  include/qemu/job.h | 15 +++
  blockdev.c |  3 +++
  job.c  | 42 ++
  3 files changed, 60 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index efc6fa7544..cf3082b6d7 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
   */
  int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error 
**errp);
  
+/**

+ * If the job has been paused because of a drained section, and that
+ * section has ended, wait until the job is resumed.
+ *
+ * Return 0 if the job is not paused, or if it has been successfully
+ * resumed.
+ * Return an error if the job has been paused in such a way that
+ * waiting will not resume it, i.e. if it has been paused by the user,
+ * or if it is still drained.
+ *
+ * Callers must be in the home AioContext and hold the AioContext lock
+ * of job->aio_context.
+ */
+int job_wait_unpaused(Job *job, Error **errp);
+
  #endif
diff --git a/blockdev.c b/blockdev.c
index a57590aae4..c0cc2fa364 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error 
**errp)
  return;
  }
  
+if (job_wait_unpaused(&job->job, errp) < 0) {

+return;
+}
  trace_qmp_block_job_complete(job);
 job_complete(&job->job, errp);
  aio_context_release(aio_context);
diff --git a/job.c b/job.c
index 289edee143..1ea30fd294 100644
--- a/job.c
+++ b/job.c
@@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, 
Error **errp), Error **errp)
  job_unref(job);
  return ret;
  }
+
+int job_wait_unpaused(Job *job, Error **errp)
+{
+/*
+ * Only run this function from the main context, because this is
+ * what we need, and this way we do not have to think about what
+ * happens if the user concurrently pauses the job from the main
+ * monitor.
+ */
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+
+/*
+ * Quick path (e.g. so we do not get an error if pause_count > 0
+ * but the job is not even paused)
+ */
+if (!job->paused) {
+return 0;
+}
+
+/* If the user has paused the job, waiting will not help */
+if (job->user_paused) {
+error_setg(errp, "Job '%s' has been paused by the user", job->id);
+return -EBUSY;
+}
+
+/* Similarly, if the job is still drained, waiting will not help either */
+if (job->pause_count > 0) {
+error_setg(errp, "Job '%s' is blocked and cannot be unpaused", 
job->id);
+return -EBUSY;
+}
+
+/*
+ * This function is specifically for waiting for a job to be
+ * resumed after a drained section.  Ending the drained section
+ * includes a job_enter(), which schedules the job loop to be run,
+ * and once it does, job->paused will be cleared.  Therefore, we
+ * do not need to invoke job_enter() here.
+ */
+AIO_WAIT_WHILE(job->aio_context, job->paused);
+
+return 0;
+}



Hmm.. It seems that when job->pause_count becomes 0, job_enter is called, and
the period when pause_count is 0 but paused is still true should be relatively
short. And the patch doesn't help if the user calls job-complete during a
drained section. So it looks like the patch will help relatively seldom.. Or

Re: [PATCH v2 09/10] block/nbd: introduce nbd_client_connection_new()

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:26PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> This is the last step of creating bs-independing nbd connection

s/independing/independent/

> interface. With next commit we can finally move it to separate file.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Roman Kagan 



Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete

2021-04-08 Thread John Snow

On 4/8/21 12:20 PM, Max Reitz wrote:

block-job-complete can only be applied when the job is READY, not when
it is on STANDBY (ready, but paused).  Draining a job technically pauses
it (which makes a READY job enter STANDBY), and ending the drained
section does not synchronously resume it, but only schedules the job,
which will then be resumed.  So attempting to complete a job immediately
after a drained section may sometimes fail.

That is bad at least because users cannot really work nicely around
this: A job may be paused and resumed at any time, so waiting for the
job to be in the READY state and then issuing a block-job-complete poses
a TOCTTOU problem.  The only way around it would be to issue
block-job-complete until it no longer fails due to the job being in the
STANDBY state, but that would not be nice.

We can solve the problem by allowing block-job-complete to be invoked on
jobs that are on STANDBY, if that status is the result of a drained
section (not because the user has paused the job), and that section has
ended.  That is, if the job is on STANDBY, but scheduled to be resumed.

Perhaps we could actually just directly allow this, seeing that mirror
is the only user of ready/complete, and that mirror_complete() could
probably work under the given circumstances, but there may be many side
effects to consider.

It is simpler to add a function job_wait_unpaused() that waits for the
job to be resumed (under said circumstances), and to make
qmp_block_job_complete() use it to delay job_complete() until then.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
Signed-off-by: Max Reitz 
---
  include/qemu/job.h | 15 +++
  blockdev.c |  3 +++
  job.c  | 42 ++
  3 files changed, 60 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index efc6fa7544..cf3082b6d7 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
   */
  int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error 
**errp);
  
+/**

+ * If the job has been paused because of a drained section, and that
+ * section has ended, wait until the job is resumed.
+ *
+ * Return 0 if the job is not paused, or if it has been successfully
+ * resumed.
+ * Return an error if the job has been paused in such a way that
+ * waiting will not resume it, i.e. if it has been paused by the user,
+ * or if it is still drained.
+ *
+ * Callers must be in the home AioContext and hold the AioContext lock
+ * of job->aio_context.
+ */
+int job_wait_unpaused(Job *job, Error **errp);
+
  #endif
diff --git a/blockdev.c b/blockdev.c
index a57590aae4..c0cc2fa364 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error 
**errp)
  return;
  }
  
+if (job_wait_unpaused(&job->job, errp) < 0) {

+return;
+}


After which point, we assume we've transitioned back to either RUNNING 
or READY, and



  trace_qmp_block_job_complete(job);
  job_complete(&job->job, errp);


This function checks the usual state table for permission to 
deliver/perform the verb.



  aio_context_release(aio_context);
diff --git a/job.c b/job.c
index 289edee143..1ea30fd294 100644
--- a/job.c
+++ b/job.c
@@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, 
Error **errp), Error **errp)
  job_unref(job);
  return ret;
  }
+
+int job_wait_unpaused(Job *job, Error **errp)
+{
+/*
+ * Only run this function from the main context, because this is
+ * what we need, and this way we do not have to think about what
+ * happens if the user concurrently pauses the job from the main
+ * monitor.
+ */
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+
+/*
+ * Quick path (e.g. so we do not get an error if pause_count > 0
+ * but the job is not even paused)
+ */
+if (!job->paused) {
+return 0;
+}
+
+/* If the user has paused the job, waiting will not help */
+if (job->user_paused) {
+error_setg(errp, "Job '%s' has been paused by the user", job->id);
+return -EBUSY;
+}
+


Or the job has encountered an error, if that error policy is set. It may be
more accurate to say that the job is currently paused/halted (for
some reason) and is awaiting an explicit unpause instruction.


"Job '%s' has been paused and needs to be explicitly resumed with 
job-resume", maybe?


Job '%s' has been paused and needs to be [explicitly] resumed
[by the user] [with job-resume]

Some combo of those runes.


+/* Similarly, if the job is still drained, waiting will not help either */
+if (job->pause_count > 0) {
+error_setg(errp, "Job '%s' is blocked and cannot be unpaused", 
job->id);
+return -EBUSY;
+}
+


This leaks an internal state detail out to the caller. In which 
circumstances does this happen? Do we 

Re: [PATCH v2 08/10] block/nbd: rename NBDConnectThread to NBDClientConnection

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We are going to move the connection code to its own file and want clear
> names and APIs.
> 
> The structure is shared between the user and (possibly) several runs of
> the connect thread. So it's wrong to call it "thread". Let's rename it to
> something more generic.
> 
> Appropriately rename the connect_thread and thr variables to conn.
> 
> The connect_thread_state_unref() function gets a new appropriate name too.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 127 ++--
>  1 file changed, 63 insertions(+), 64 deletions(-)

[To other reviewers: in addition to renaming there's one blank line
removed, hence the difference between (+) and (-)]

Reviewed-by: Roman Kagan 



Re: [PATCH v2 07/10] block/nbd: make nbd_co_establish_connection_cancel() bs-independent

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> nbd_co_establish_connection_cancel() actually needs only a pointer to
> NBDConnectThread. So, make it clean.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)

Reviewed-by: Roman Kagan 



Re: [PATCH v2 06/10] block/nbd: bs-independent interface for nbd_co_establish_connection()

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:23PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We are going to split the connection code into a separate file. Now we are
> ready to give nbd_co_establish_connection() a clean and bs-independent
> interface.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 49 +++--
>  1 file changed, 31 insertions(+), 18 deletions(-)

Reviewed-by: Roman Kagan 



Re: [PATCH v2 05/10] block/nbd: drop thr->state

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:22PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Actually, the only bit of information we need is "is the thread running or
> not". We don't need all these states. So, instead of thr->state, add a
> boolean variable thr->running and refactor the code.

There's a certain redundancy between thr->refcnt and thr->running.  I
wonder if it makes sense to employ this.

Can be done later if deemed useful.

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 103 ++--
>  1 file changed, 27 insertions(+), 76 deletions(-)

Reviewed-by: Roman Kagan 



[PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete

2021-04-08 Thread Max Reitz
Hi,

See patch 1 for a detailed explanation of the problem.

The gist is: Draining a READY job makes it transition to STANDBY, and
jobs on STANDBY cannot be completed.  Ending the drained section will
schedule the job (so it is then resumed), but not wait until it is
actually running again.

Therefore, it can happen that issuing block-job-complete fails when you
issue it right after some draining operation.

I tried to come up with an iotest reproducer, but in the end I only got
something that reproduced the issue like 2/10 times, and it required
heavy I/O, so it is nothing I would like to have as part of the iotests.
Instead, I opted for a unit test, which allows me to cheat a bit
(specifically, locking the job IO thread before ending the drained
section).


Max Reitz (3):
  job: Add job_wait_unpaused() for block-job-complete
  test-blockjob: Test job_wait_unpaused()
  iotests/041: block-job-complete on user-paused job

 include/qemu/job.h |  15 
 blockdev.c |   3 +
 job.c  |  42 +++
 tests/unit/test-blockjob.c | 140 +
 tests/qemu-iotests/041 |  13 +++-
 5 files changed, 212 insertions(+), 1 deletion(-)

-- 
2.29.2




[PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete

2021-04-08 Thread Max Reitz
block-job-complete can only be applied when the job is READY, not when
it is on STANDBY (ready, but paused).  Draining a job technically pauses
it (which makes a READY job enter STANDBY), and ending the drained
section does not synchronously resume it, but only schedules the job,
which will then be resumed.  So attempting to complete a job immediately
after a drained section may sometimes fail.

That is bad at least because users cannot really work nicely around
this: A job may be paused and resumed at any time, so waiting for the
job to be in the READY state and then issuing a block-job-complete poses
a TOCTTOU problem.  The only way around it would be to issue
block-job-complete until it no longer fails due to the job being in the
STANDBY state, but that would not be nice.

We can solve the problem by allowing block-job-complete to be invoked on
jobs that are on STANDBY, if that status is the result of a drained
section (not because the user has paused the job), and that section has
ended.  That is, if the job is on STANDBY, but scheduled to be resumed.

Perhaps we could actually just directly allow this, seeing that mirror
is the only user of ready/complete, and that mirror_complete() could
probably work under the given circumstances, but there may be many side
effects to consider.

It is simpler to add a function job_wait_unpaused() that waits for the
job to be resumed (under said circumstances), and to make
qmp_block_job_complete() use it to delay job_complete() until then.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
Signed-off-by: Max Reitz 
---
 include/qemu/job.h | 15 +++
 blockdev.c |  3 +++
 job.c  | 42 ++
 3 files changed, 60 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index efc6fa7544..cf3082b6d7 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error 
**errp);
 
+/**
+ * If the job has been paused because of a drained section, and that
+ * section has ended, wait until the job is resumed.
+ *
+ * Return 0 if the job is not paused, or if it has been successfully
+ * resumed.
+ * Return an error if the job has been paused in such a way that
+ * waiting will not resume it, i.e. if it has been paused by the user,
+ * or if it is still drained.
+ *
+ * Callers must be in the home AioContext and hold the AioContext lock
+ * of job->aio_context.
+ */
+int job_wait_unpaused(Job *job, Error **errp);
+
 #endif
diff --git a/blockdev.c b/blockdev.c
index a57590aae4..c0cc2fa364 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error 
**errp)
 return;
 }
 
+if (job_wait_unpaused(&job->job, errp) < 0) {
+return;
+}
 trace_qmp_block_job_complete(job);
 job_complete(&job->job, errp);
 aio_context_release(aio_context);
diff --git a/job.c b/job.c
index 289edee143..1ea30fd294 100644
--- a/job.c
+++ b/job.c
@@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, 
Error **errp), Error **errp)
 job_unref(job);
 return ret;
 }
+
+int job_wait_unpaused(Job *job, Error **errp)
+{
+/*
+ * Only run this function from the main context, because this is
+ * what we need, and this way we do not have to think about what
+ * happens if the user concurrently pauses the job from the main
+ * monitor.
+ */
+assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+
+/*
+ * Quick path (e.g. so we do not get an error if pause_count > 0
+ * but the job is not even paused)
+ */
+if (!job->paused) {
+return 0;
+}
+
+/* If the user has paused the job, waiting will not help */
+if (job->user_paused) {
+error_setg(errp, "Job '%s' has been paused by the user", job->id);
+return -EBUSY;
+}
+
+/* Similarly, if the job is still drained, waiting will not help either */
+if (job->pause_count > 0) {
+error_setg(errp, "Job '%s' is blocked and cannot be unpaused", 
job->id);
+return -EBUSY;
+}
+
+/*
+ * This function is specifically for waiting for a job to be
+ * resumed after a drained section.  Ending the drained section
+ * includes a job_enter(), which schedules the job loop to be run,
+ * and once it does, job->paused will be cleared.  Therefore, we
+ * do not need to invoke job_enter() here.
+ */
+AIO_WAIT_WHILE(job->aio_context, job->paused);
+
+return 0;
+}
-- 
2.29.2




[PATCH for-6.0? 2/3] test-blockjob: Test job_wait_unpaused()

2021-04-08 Thread Max Reitz
Create a job that remains on STANDBY after a drained section, and see
that invoking job_wait_unpaused() will get it unstuck.

Signed-off-by: Max Reitz 
---
 tests/unit/test-blockjob.c | 140 +
 1 file changed, 140 insertions(+)

diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c
index 7519847912..b7736e298d 100644
--- a/tests/unit/test-blockjob.c
+++ b/tests/unit/test-blockjob.c
@@ -16,6 +16,7 @@
 #include "block/blockjob_int.h"
 #include "sysemu/block-backend.h"
 #include "qapi/qmp/qdict.h"
+#include "iothread.h"
 
 static const BlockJobDriver test_block_job_driver = {
 .job_driver = {
@@ -375,6 +376,144 @@ static void test_cancel_concluded(void)
 cancel_common(s);
 }
 
+/* (See test_yielding_driver for the job description) */
+typedef struct YieldingJob {
+BlockJob common;
+bool should_complete;
+} YieldingJob;
+
+static void yielding_job_complete(Job *job, Error **errp)
+{
+YieldingJob *s = container_of(job, YieldingJob, common.job);
+s->should_complete = true;
+job_enter(job);
+}
+
+static int coroutine_fn yielding_job_run(Job *job, Error **errp)
+{
+YieldingJob *s = container_of(job, YieldingJob, common.job);
+
+job_transition_to_ready(job);
+
+while (!s->should_complete) {
+job_yield(job);
+}
+
+return 0;
+}
+
+/*
+ * This job transitions immediately to the READY state, and then
+ * yields until it is to complete.
+ */
+static const BlockJobDriver test_yielding_driver = {
+.job_driver = {
+.instance_size  = sizeof(YieldingJob),
+.free   = block_job_free,
+.user_resume= block_job_user_resume,
+.run= yielding_job_run,
+.complete   = yielding_job_complete,
+},
+};
+
+/*
+ * Test that job_wait_unpaused() can get jobs from a paused state to
+ * a running state so that job_complete() can be applied (assuming the
+ * pause occurred due to a drain that has already been lifted).
+ * (This is what QMP's block-job-complete does so it can be executed
+ * even immediately after some other operation instated and lifted a
+ * drain.)
+ *
+ * To do this, run YieldingJob in an IO thread, get it into the READY
+ * state, then have a drained section.  Before ending the section,
+ * acquire the context so the job will not be entered and will thus
+ * remain on STANDBY.
+ *
+ * Invoking job_complete() then will fail.
+ *
+ * However, job_wait_unpaused() should see the job is to be resumed,
+ * wait for it to be resumed, and then we can invoke job_complete()
+ * without error.
+ *
+ * Note that on the QMP interface, it is impossible to lock an IO
+ * thread before a drained section ends.  In practice, the
+ * bdrv_drain_all_end() and the aio_context_acquire() will be
+ * reversed.  However, that makes for worse reproducibility here:
+ * Sometimes, the job would no longer be in STANDBY then but already
+ * be started.  We cannot prevent that, because the IO thread runs
+ * concurrently.  We can only prevent it by taking the lock before
+ * ending the drained section, so we do that.
+ *
+ * (You can reverse the order of operations and most of the time the
+ * test will pass, but sometimes the assert(status == STANDBY) will
+ * fail.)
+ */
+static void test_complete_in_standby(void)
+{
+BlockBackend *blk;
+IOThread *iothread;
+AioContext *ctx;
+Job *job;
+BlockJob *bjob;
+Error *local_err = NULL;
+
+/* Create a test drive, move it to an IO thread */
+blk = create_blk(NULL);
+iothread = iothread_new();
+
+ctx = iothread_get_aio_context(iothread);
+blk_set_aio_context(blk, ctx, &error_abort);
+
+/* Create our test job */
+bjob = mk_job(blk, "job", &test_yielding_driver, true,
+  JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS);
+job = &bjob->job;
+assert(job->status == JOB_STATUS_CREATED);
+
+/* Wait for the job to become READY */
+job_start(job);
+aio_context_acquire(ctx);
+AIO_WAIT_WHILE(ctx, job->status != JOB_STATUS_READY);
+aio_context_release(ctx);
+
+/* Begin the drained section, pausing the job */
+bdrv_drain_all_begin();
+assert(job->status == JOB_STATUS_STANDBY);
+/* Lock the IO thread to prevent the job from being run */
+aio_context_acquire(ctx);
+/* This will schedule the job to resume it */
+bdrv_drain_all_end();
+
+/* But the job cannot run, so it will remain on standby */
+assert(job->status == JOB_STATUS_STANDBY);
+
+/* A job on standby cannot be completed */
+job_complete(job, &local_err);
+assert(local_err != NULL);
+error_free(local_err);
+local_err = NULL;
+
+/*
+ * But waiting for it and then completing it should work.
+ * (This is what qmp_block_job_complete() does.)
+ */
+job_wait_unpaused(job, &error_abort);
+job_complete(job, &error_abort);
+
+/* The test is done now, clean up. */
+job_finish_sync(job, NULL, &error_abort);
+assert(job->status == JOB_STATUS_PENDING);
+
+

[PATCH for-6.0? 3/3] iotests/041: block-job-complete on user-paused job

2021-04-08 Thread Max Reitz
Expand test_pause() to check what happens when issuing
block-job-complete on a job that is on STANDBY because it has been
paused by the user.  (This should be an error, and in particular not
hang job_wait_unpaused().)

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/041 | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
index 5cc02b24fc..d2c9669741 100755
--- a/tests/qemu-iotests/041
+++ b/tests/qemu-iotests/041
@@ -120,7 +120,18 @@ class TestSingleDrive(iotests.QMPTestCase):
 result = self.vm.qmp('block-job-resume', device='drive0')
 self.assert_qmp(result, 'return', {})
 
-self.complete_and_wait()
+self.wait_ready()
+
+# Check that a job on STANDBY cannot be completed
+self.pause_job('drive0')
+result = self.vm.qmp('block-job-complete', device='drive0')
+self.assert_qmp(result, 'error/desc',
+"Job 'drive0' has been paused by the user")
+
+result = self.vm.qmp('block-job-resume', device='drive0')
+self.assert_qmp(result, 'return', {})
+
+self.complete_and_wait(wait_ready=False)
 self.vm.shutdown()
 self.assertTrue(iotests.compare_images(test_img, target_img),
 'target image does not match source after mirroring')
-- 
2.29.2




Re: [PATCH v2 04/10] block/nbd: simplify waking of nbd_co_establish_connection()

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:21PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Instead of the connect_bh, bh_ctx and wait_connect fields we can live with
> only one link to the waiting coroutine, protected by a mutex.
> 
> So the new logic is:
> 
> nbd_co_establish_connection() sets wait_co under the mutex, releases the
> mutex and yields. Note that wait_co may be scheduled by the thread
> immediately after unlocking the mutex. Still, in the main thread (or
> iothread) we will not reach the code for entering the coroutine until the
> yield(), so we are safe.
> 
> Both connect_thread_func() and nbd_co_establish_connection_cancel() do
> the following to handle wait_co:
> 
> Under the mutex, if thr->wait_co is not NULL, call aio_co_wake() (which
> never tries to acquire the aio context since the previous commit, so we are
> safe to do it under thr->mutex) and set thr->wait_co to NULL.
> This way we protect ourselves from scheduling it twice.
> 
> Also, this commit makes nbd_co_establish_connection() less tied to
> bs (we have a generic pointer to the coroutine and do not use
> s->connection_co directly). So, we are on the way to splitting the
> connection API out of nbd.c (which is overcomplicated now).
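
For illustration, a rough standalone model of that "single waiter, woken at
most once, cleared under the mutex" pattern, using plain pthreads instead of
the actual QEMU coroutine/aio API; all names here are invented:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

typedef struct ToyConn {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool *wait_flag;    /* plays the role of wait_co: set while someone waits */
    bool connected;
    int wake_count;
} ToyConn;

static void toy_establish_connection(ToyConn *c)
{
    bool woken = false;

    pthread_mutex_lock(&c->mutex);
    if (!c->connected) {
        c->wait_flag = &woken;              /* register, then "yield" */
        while (!woken) {
            pthread_cond_wait(&c->cond, &c->mutex);
        }
    }
    pthread_mutex_unlock(&c->mutex);
}

static void toy_wake(ToyConn *c)
{
    pthread_mutex_lock(&c->mutex);
    c->connected = true;
    if (c->wait_flag) {                     /* wake the waiter at most once */
        *c->wait_flag = true;
        c->wait_flag = NULL;
        c->wake_count++;
        pthread_cond_signal(&c->cond);
    }
    pthread_mutex_unlock(&c->mutex);
}

static void *toy_connect_thread(void *opaque)
{
    usleep(1000);                           /* pretend to connect */
    toy_wake(opaque);                       /* connect path wakes the waiter */
    toy_wake(opaque);                       /* a second wake is a no-op */
    return NULL;
}

int main(void)
{
    ToyConn c = {
        .mutex = PTHREAD_MUTEX_INITIALIZER,
        .cond = PTHREAD_COND_INITIALIZER,
    };
    pthread_t thr;

    pthread_create(&thr, NULL, toy_connect_thread, &c);
    toy_establish_connection(&c);
    pthread_join(thr, NULL);
    printf("waiter woken %d time(s) (never more than once)\n", c.wake_count);
    return c.wake_count <= 1 ? 0 : 1;
}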
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 49 +
>  1 file changed, 9 insertions(+), 40 deletions(-)

Reviewed-by: Roman Kagan 



Re: [RFC PATCH v2 04/11] qemu-iotests: delay QMP socket timers

2021-04-08 Thread Emanuele Giuseppe Esposito




On 08/04/2021 17:40, Paolo Bonzini wrote:

On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote:

  def get_qmp_events_filtered(self, wait=60.0):
  result = []
-    for ev in self.get_qmp_events(wait=wait):
+    qmp_wait = wait
+    if qemu_gdb:
+    qmp_wait = 0.0
+    for ev in self.get_qmp_events(wait=qmp_wait):
  result.append(filter_qmp_event(ev))
  return result


Should this be handled in get_qmp_events instead, since you're basically 
changing all the callers?


get_qmp_events is in python/machine.py, which as I understand might be 
used also by some other scripts, so I want to keep the changes there to 
the minimum. Also, machine.py has no access to qemu_gdb or 
qemu_valgrind, so passing a boolean or something to delay the timer 
would still require adding a similar check in all sections.


Or do you have a cleaner way to do this?

Emanuele




Re: [RFC PATCH v2 03/11] qemu-iotests: add option to attach gdbserver

2021-04-08 Thread Emanuele Giuseppe Esposito




On 08/04/2021 17:40, Paolo Bonzini wrote:

On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote:

+    self.gdb_qemu = os.getenv('GDB_QEMU')
+
+    if gdb and not self.gdb_qemu:
+    self.gdb_qemu = 'localhost:12345'
+    elif self.gdb_qemu and not gdb:
+    del self.gdb_qemu
+    del os.environ['GDB_QEMU']


Alternatively:

     if gdb:
     self.gdb_qemu = os.environ.get('GDB_QEMU', 'localhost:12345')
     elif 'GDB_QEMU' in os.environ:
     del os.environ['GDB_QEMU']


makes sense, thanks.



+GDB_QEMU  -- "{GDB_QEMU}"


Perhaps only include this if gdbserver is actually in use?  (Or not at 
all, since gdbserver anyway prints the port).


You forgot that by default all logs go to a log file :) so unless you 
find the corresponding log, it is not easy to find the GDB port.


Thank you,
Emanuele



Paolo






[PATCH 5/5] blkdebug: protect rules and suspended_reqs with a lock

2021-04-08 Thread Emanuele Giuseppe Esposito
Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index dffd869b32..e92a35ccbb 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -54,6 +54,7 @@ typedef struct BDRVBlkdebugState {
 /* For blkdebug_refresh_filename() */
 char *config_file;
 
+QemuMutex lock;
 QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX];
 QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
 QLIST_HEAD(, BlkdebugSuspendedReq) suspended_reqs;
@@ -245,7 +246,9 @@ static int add_rule(void *opaque, QemuOpts *opts, Error 
**errp)
 };
 
 /* Add the rule */
+qemu_mutex_lock(&s->lock);
 QLIST_INSERT_HEAD(&s->rules[event], rule, next);
+qemu_mutex_unlock(&s->lock);
 
 return 0;
 }
@@ -468,6 +471,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 int ret;
 uint64_t align;
 
+qemu_mutex_init(&s->lock);
 opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
 if (!qemu_opts_absorb_qdict(opts, options, errp)) {
 ret = -EINVAL;
@@ -568,6 +572,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = 0;
 out:
 if (ret < 0) {
+qemu_mutex_destroy(&s->lock);
 g_free(s->config_file);
 }
 qemu_opts_del(opts);
@@ -582,6 +587,7 @@ static int rule_check(BlockDriverState *bs, uint64_t 
offset, uint64_t bytes,
 int error;
 bool immediately;
 
+qemu_mutex_lock(&s->lock);
 QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
 uint64_t inject_offset = rule->options.inject.offset;
 
@@ -595,6 +601,7 @@ static int rule_check(BlockDriverState *bs, uint64_t 
offset, uint64_t bytes,
 }
 
 if (!rule || !rule->options.inject.error) {
+qemu_mutex_unlock(&s->lock);
 return 0;
 }
 
@@ -606,6 +613,7 @@ static int rule_check(BlockDriverState *bs, uint64_t 
offset, uint64_t bytes,
 remove_rule(rule);
 }
 
+qemu_mutex_unlock(&s->lock);
 if (!immediately) {
 aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
 qemu_coroutine_yield();
@@ -771,8 +779,10 @@ static void blkdebug_close(BlockDriverState *bs)
 }
 
 g_free(s->config_file);
+qemu_mutex_destroy(&s->lock);
 }
 
+/* Called with lock held.  */
 static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule)
 {
 BDRVBlkdebugState *s = bs->opaque;
@@ -791,6 +801,7 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 }
 }
 
+/* Called with lock held.  */
 static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
  int *action_count)
 {
@@ -829,17 +840,22 @@ static void blkdebug_debug_event(BlockDriverState *bs, 
BlkdebugEvent event)
 
 assert((int)event >= 0 && event < BLKDBG__MAX);
 
+qemu_mutex_lock(&s->lock);
 s->new_state = s->state;
 QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) {
 process_rule(bs, rule, actions_count);
 }
 
+qemu_mutex_unlock(&s->lock);
+
 while (actions_count[ACTION_SUSPEND] > 0) {
 qemu_coroutine_yield();
 actions_count[ACTION_SUSPEND]--;
 }
 
+qemu_mutex_lock(&s->lock);
 s->state = s->new_state;
+qemu_mutex_unlock(&s->lock);
 }
 
 static int blkdebug_debug_breakpoint(BlockDriverState *bs, const char *event,
@@ -862,11 +878,14 @@ static int blkdebug_debug_breakpoint(BlockDriverState 
*bs, const char *event,
 .options.suspend.tag = g_strdup(tag),
 };
 
+qemu_mutex_lock(&s->lock);
 QLIST_INSERT_HEAD(&s->rules[blkdebug_event], rule, next);
+qemu_mutex_unlock(&s->lock);
 
 return 0;
 }
 
+/* Called with lock held.  */
 static int resume_req_by_tag(BDRVBlkdebugState *s, const char *tag, bool all)
 {
 BlkdebugSuspendedReq *r;
@@ -884,7 +903,9 @@ retry:
 g_free(r->tag);
 g_free(r);
 
+qemu_mutex_unlock(&s->lock);
 qemu_coroutine_enter(co);
+qemu_mutex_lock(&s->lock);
 
 if (all) {
 goto retry;
@@ -898,8 +919,12 @@ retry:
 static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag)
 {
 BDRVBlkdebugState *s = bs->opaque;
+int rc;
 
-return resume_req_by_tag(s, tag, false);
+qemu_mutex_lock(&s->lock);
+rc = resume_req_by_tag(s, tag, false);
+qemu_mutex_unlock(&s->lock);
+return rc;
 }
 
 static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
@@ -909,6 +934,7 @@ static int 
blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
 BlkdebugRule *rule, *next;
 int i, ret = -ENOENT;
 
+qemu_mutex_lock(&s->lock);
 for (i = 0; i < BLKDBG__MAX; i++) {
 QLIST_FOREACH_SAFE(rule, &s->rules[i], next, next) {
 if (rule->action == ACTION_SUSPEND &&
@@ -921,6 +947,7 @@ static int 
blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
 if (resume_req_by_tag(s, tag, true) == 0) {
 

[PATCH 4/5] blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE

2021-04-08 Thread Emanuele Giuseppe Esposito
Use actions_count to see how many yields to issue.

Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 388b5ed615..dffd869b32 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -789,7 +789,6 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 if (!qtest_enabled()) {
 printf("blkdebug: Suspended request '%s'\n", r->tag);
 }
-qemu_coroutine_yield();
 }
 
 static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
@@ -834,6 +833,12 @@ static void blkdebug_debug_event(BlockDriverState *bs, 
BlkdebugEvent event)
 QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) {
 process_rule(bs, rule, actions_count);
 }
+
+while (actions_count[ACTION_SUSPEND] > 0) {
+qemu_coroutine_yield();
+actions_count[ACTION_SUSPEND]--;
+}
+
 s->state = s->new_state;
 }
 
-- 
2.30.2




[PATCH 0/5] blkdebug: fix racing condition when iterating on

2021-04-08 Thread Emanuele Giuseppe Esposito
When qemu_coroutine_enter is executed in a loop
(even with QLIST_FOREACH_SAFE), the entered coroutine can modify the list,
for example removing an element, causing problems when control
is given back to the caller that continues iterating on the same list.

Patch 1 solves the issue in blkdebug_debug_resume by restarting
the list walk after every coroutine_enter if the list has to be fully iterated.
Patches 2, 3 and 4 aim to fix blkdebug_debug_event by gathering
all actions that the rules trigger in a counter and invoking
the respective coroutine_yield only after processing all rules.
Patch 5 adds a lock to protect rules and suspended_reqs.
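
For illustration, here is a tiny standalone model of that hazard and of the
restart-the-walk approach (comparable to the retry loop in
resume_req_by_tag); this is not QEMU code, and the list, the tags and the
side effect in visit() are made up:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Req {
    char tag[8];
    struct Req *next;
} Req;

static Req *head;

static void list_remove(const char *tag)
{
    for (Req **p = &head; *p; p = &(*p)->next) {
        if (strcmp((*p)->tag, tag) == 0) {
            Req *victim = *p;
            *p = victim->next;
            free(victim);
            return;
        }
    }
}

/* Stands in for entering a resumed coroutine: the resumed request may
 * remove *other* requests from the list before it returns. */
static void visit(const char *tag)
{
    printf("resuming %s\n", tag);
    if (strcmp(tag, "a") == 0) {
        list_remove("b");               /* side effect on the list */
    }
}

static void resume_all(void)
{
    /* Restart from the head after each visit, since the visit may have
     * modified the list behind our back. */
retry:
    for (Req *r = head; r; r = r->next) {
        char tag[8];

        strcpy(tag, r->tag);
        list_remove(tag);               /* take the element off the list */
        visit(tag);
        goto retry;
    }
}

int main(void)
{
    const char *tags[] = { "c", "b", "a" };     /* builds the list a -> b -> c */

    for (int i = 0; i < 3; i++) {
        Req *r = calloc(1, sizeof(*r));
        strcpy(r->tag, tags[i]);
        r->next = head;
        head = r;
    }
    resume_all();
    return 0;
}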

Signed-off-by: Emanuele Giuseppe Esposito 

Emanuele Giuseppe Esposito (5):
  blkdebug: refactor removal of a suspended request
  blkdebug: move post-resume handling to resume_req_by_tag
  blkdebug: track all actions
  blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE
  blkdebug: protect rules and suspended_reqs with a lock

 block/blkdebug.c | 113 +--
 1 file changed, 79 insertions(+), 34 deletions(-)

-- 
2.30.2




[PATCH 3/5] blkdebug: track all actions

2021-04-08 Thread Emanuele Giuseppe Esposito
Add a counter for each action that a rule can trigger.
This is mainly used to keep track of how many coroutine_yield()
calls we need to perform after processing all rules in the list.

Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index e37f999254..388b5ed615 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -74,6 +74,7 @@ enum {
 ACTION_INJECT_ERROR,
 ACTION_SET_STATE,
 ACTION_SUSPEND,
+ACTION__MAX,
 };
 
 typedef struct BlkdebugRule {
@@ -791,22 +792,22 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 qemu_coroutine_yield();
 }
 
-static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
-bool injected)
+static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
+ int *action_count)
 {
 BDRVBlkdebugState *s = bs->opaque;
 
 /* Only process rules for the current state */
 if (rule->state && rule->state != s->state) {
-return injected;
+return;
 }
 
 /* Take the action */
+action_count[rule->action]++;
 switch (rule->action) {
 case ACTION_INJECT_ERROR:
-if (!injected) {
+if (action_count[ACTION_INJECT_ERROR] == 1) {
 QSIMPLEQ_INIT(&s->active_rules);
-injected = true;
 }
 QSIMPLEQ_INSERT_HEAD(&s->active_rules, rule, active_next);
 break;
@@ -819,21 +820,19 @@ static bool process_rule(BlockDriverState *bs, struct 
BlkdebugRule *rule,
 suspend_request(bs, rule);
 break;
 }
-return injected;
 }
 
 static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
 {
 BDRVBlkdebugState *s = bs->opaque;
 struct BlkdebugRule *rule, *next;
-bool injected;
+int actions_count[ACTION__MAX] = { 0 };
 
 assert((int)event >= 0 && event < BLKDBG__MAX);
 
-injected = false;
 s->new_state = s->state;
 QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) {
-injected = process_rule(bs, rule, injected);
+process_rule(bs, rule, actions_count);
 }
 s->state = s->new_state;
 }
-- 
2.30.2




[PATCH 2/5] blkdebug: move post-resume handling to resume_req_by_tag

2021-04-08 Thread Emanuele Giuseppe Esposito
We want to move qemu_coroutine_yield() after the loop on rules,
because QLIST_FOREACH_SAFE is wrong if the rule list is modified
while the coroutine has yielded.  Therefore move the suspended
request to the heap and clean it up from the remove side.
All that is left is for blkdebug_debug_event to handle the
yielding.

Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 8f19d991fa..e37f999254 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -775,25 +775,20 @@ static void blkdebug_close(BlockDriverState *bs)
 static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule)
 {
 BDRVBlkdebugState *s = bs->opaque;
-BlkdebugSuspendedReq r;
+BlkdebugSuspendedReq *r;
 
-r = (BlkdebugSuspendedReq) {
-.co = qemu_coroutine_self(),
-.tag= g_strdup(rule->options.suspend.tag),
-};
+r = g_new(BlkdebugSuspendedReq, 1);
+
+r->co = qemu_coroutine_self();
+r->tag= g_strdup(rule->options.suspend.tag);
 
 remove_rule(rule);
-QLIST_INSERT_HEAD(&s->suspended_reqs, &r, next);
+QLIST_INSERT_HEAD(&s->suspended_reqs, r, next);
 
 if (!qtest_enabled()) {
-printf("blkdebug: Suspended request '%s'\n", r.tag);
+printf("blkdebug: Suspended request '%s'\n", r->tag);
 }
 qemu_coroutine_yield();
-if (!qtest_enabled()) {
-printf("blkdebug: Resuming request '%s'\n", r.tag);
-}
-
-g_free(r.tag);
 }
 
 static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
@@ -875,8 +870,18 @@ static int resume_req_by_tag(BDRVBlkdebugState *s, const 
char *tag, bool all)
 retry:
 QLIST_FOREACH(r, &s->suspended_reqs, next) {
 if (!strcmp(r->tag, tag)) {
+Coroutine *co = r->co;
+
+if (!qtest_enabled()) {
+printf("blkdebug: Resuming request '%s'\n", r->tag);
+}
+
 QLIST_REMOVE(r, next);
-qemu_coroutine_enter(r->co);
+g_free(r->tag);
+g_free(r);
+
+qemu_coroutine_enter(co);
+
 if (all) {
 goto retry;
 }
-- 
2.30.2




[PATCH 1/5] blkdebug: refactor removal of a suspended request

2021-04-08 Thread Emanuele Giuseppe Esposito
Extract to a separate function.  Do not rely on FOREACH_SAFE, which is
only "safe" if the *current* node is removed---not if another node is
removed.  Instead, just walk the entire list from the beginning when
asked to resume all suspended requests with a given tag.

Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 2c0b9b0ee8..8f19d991fa 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -793,7 +793,6 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 printf("blkdebug: Resuming request '%s'\n", r.tag);
 }
 
-QLIST_REMOVE(&r, next);
 g_free(r.tag);
 }
 
@@ -869,25 +868,35 @@ static int blkdebug_debug_breakpoint(BlockDriverState 
*bs, const char *event,
 return 0;
 }
 
-static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag)
+static int resume_req_by_tag(BDRVBlkdebugState *s, const char *tag, bool all)
 {
-BDRVBlkdebugState *s = bs->opaque;
-BlkdebugSuspendedReq *r, *next;
+BlkdebugSuspendedReq *r;
 
-QLIST_FOREACH_SAFE(r, &s->suspended_reqs, next, next) {
+retry:
+QLIST_FOREACH(r, &s->suspended_reqs, next) {
 if (!strcmp(r->tag, tag)) {
+QLIST_REMOVE(r, next);
 qemu_coroutine_enter(r->co);
+if (all) {
+goto retry;
+}
 return 0;
 }
 }
 return -ENOENT;
 }
 
+static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag)
+{
+BDRVBlkdebugState *s = bs->opaque;
+
+return resume_req_by_tag(s, tag, false);
+}
+
 static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
 const char *tag)
 {
 BDRVBlkdebugState *s = bs->opaque;
-BlkdebugSuspendedReq *r, *r_next;
 BlkdebugRule *rule, *next;
 int i, ret = -ENOENT;
 
@@ -900,11 +909,8 @@ static int 
blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
 }
 }
 }
-QLIST_FOREACH_SAFE(r, &s->suspended_reqs, next, r_next) {
-if (!strcmp(r->tag, tag)) {
-qemu_coroutine_enter(r->co);
-ret = 0;
-}
+if (resume_req_by_tag(s, tag, true) == 0) {
+ret = 0;
 }
 return ret;
 }
-- 
2.30.2




Re: [PATCH v2 03/10] util/async: aio_co_enter(): do aio_co_schedule in general case

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:20PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> With the following patch we want to call aio_co_wake() from thread.
> And it works bad.
> Assume we have no iothreads.
> Assume we have a coroutine A, which waits in the yield point for external
> aio_co_wake(), and no progress can be done until it happen.
> Main thread is in blocking aio_poll() (for example, in blk_read()).
> 
> Now, in a separate thread we do aio_co_wake(). It calls  aio_co_enter(),
> which goes through last "else" branch and do aio_context_acquire(ctx).
> 
> Now we have a deadlock, as aio_poll() will not release the context lock
> until some progress is done, and progress can't be done until
> aio_co_wake() wake the coroutine A. And it can't because it wait for
> aio_context_acquire().
> 
> Still, aio_co_schedule() works well in parallel with blocking
> aio_poll(). So let's use it in generic case and drop
> aio_context_acquire/aio_context_release branch from aio_co_enter().
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  util/async.c | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 674dbefb7c..f05b883a39 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -614,19 +614,12 @@ void aio_co_wake(struct Coroutine *co)
>  
>  void aio_co_enter(AioContext *ctx, struct Coroutine *co)
>  {
> -if (ctx != qemu_get_current_aio_context()) {
> -aio_co_schedule(ctx, co);
> -return;
> -}
> -
> -if (qemu_in_coroutine()) {
> +if (ctx == qemu_get_current_aio_context() && qemu_in_coroutine()) {
>  Coroutine *self = qemu_coroutine_self();
>  assert(self != co);
>  QSIMPLEQ_INSERT_TAIL(&ctx->co_queue_wakeup, co, co_queue_next);
>  } else {
> -aio_context_acquire(ctx);
> -qemu_aio_coroutine_enter(ctx, co);
> -aio_context_release(ctx);
> +aio_co_schedule(ctx, co);
>  }
>  }

I'm fine with the change, but I find the log message to be a bit
confusing (although correct).  AFAICS the problem is that calling
aio_co_enter from a thread which has no associated aio_context works
differently compared to calling it from a proper iothread: if the target
context was qemu_aio_context, an iothread would just schedule the
coroutine there, while a "dumb" thread would try lock the context
potentially resulting in a deadlock.  This patch makes "dumb" threads
and iothreads behave identically when entering a coroutine on a foreign
context.

You may want to rephrase the log message to that end.

Anyway

Reviewed-by: Roman Kagan 



Re: [RFC PATCH v2 04/11] qemu-iotests: delay QMP socket timers

2021-04-08 Thread Paolo Bonzini

On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote:

  def get_qmp_events_filtered(self, wait=60.0):
  result = []
-for ev in self.get_qmp_events(wait=wait):
+qmp_wait = wait
+if qemu_gdb:
+qmp_wait = 0.0
+for ev in self.get_qmp_events(wait=qmp_wait):
  result.append(filter_qmp_event(ev))
  return result


Should this be handled in get_qmp_events instead, since you're basically 
changing all the callers?


Paolo




Re: [RFC PATCH v2 03/11] qemu-iotests: add option to attach gdbserver

2021-04-08 Thread Paolo Bonzini

On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote:

+self.gdb_qemu = os.getenv('GDB_QEMU')
+
+if gdb and not self.gdb_qemu:
+self.gdb_qemu = 'localhost:12345'
+elif self.gdb_qemu and not gdb:
+del self.gdb_qemu
+del os.environ['GDB_QEMU']


Alternatively:

if gdb:
self.gdb_qemu = os.environ.get('GDB_QEMU', 'localhost:12345')
elif 'GDB_QEMU' in os.environ:
del os.environ['GDB_QEMU']


+GDB_QEMU  -- "{GDB_QEMU}"


Perhaps only include this if gdbserver is actually in use?  (Or not at 
all, since gdbserver anyway prints the port).


Paolo




Re: [PATCH v2 02/10] block/nbd: BDRVNBDState: drop unused connect_err and connect_status

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:19PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> These fields are write-only. Drop them.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 12 ++--
>  1 file changed, 2 insertions(+), 10 deletions(-)

Reviewed-by: Roman Kagan 



Re: [PATCH v2 01/10] block/nbd: introduce NBDConnectThread reference counter

2021-04-08 Thread Roman Kagan
On Thu, Apr 08, 2021 at 05:08:18PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> The structure is shared between the NBD BDS and the connection thread, and
> it is possible that the connect thread will finish after the BDS has been
> closed and released. To handle this we have the concept of the
> CONNECT_THREAD_RUNNING_DETACHED state: when the thread is running and the
> BDS is going to be closed, we don't free the structure, but instead move
> it to the CONNECT_THREAD_RUNNING_DETACHED state, so that the thread will
> free it.
> 
> Still, a more natural way to solve the problem is to use a reference
> counter for the shared structure. Let's do that. It makes the code
> smaller and more readable.
> 
> The new approach also makes the checks in nbd_co_establish_connection()
> redundant: now we are sure that s->connect_thread is valid during the
> whole life of the NBD BDS.
> 
> This also fixes a possible use-after-free of s->connect_thread if
> nbd_co_establish_connection_cancel() clears it during
> nbd_co_establish_connection(), and nbd_co_establish_connection() uses a
> local copy of s->connect_thread after the yield point.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 62 +
>  1 file changed, 20 insertions(+), 42 deletions(-)

Reviewed-by: Roman Kagan 



Re: [ovirt-users] Q: Sparsify/fstrim don't reduce actual disk image size

2021-04-08 Thread Nir Soffer
On Thu, Apr 8, 2021 at 10:04 AM Andrei Verovski  wrote:
>
> Hi,
>
> Many thanks, its worked ! Actual size shrunk from 584 to 9 GB, now I have 
> space to backup.
>
> Is there any guidelines how to format QCOW2 images (for Linux) so they can be 
> shrunk in an efficient way?
> With this NextCloud/Collabora LVM I did in the following order:
> swap
> ext2 boot
> ext4 root
> ext4 var (large, for data, all cloud data stored here)
>
> Ext4 partitions on LVM.
>
> Or this is not predictable how data will span QCOW2 space?

I think there is no way to avoid this issue with ovirt block storage.

Regardless of how the data is laid out in the qcow2 image, when we don't
have enough space oVirt extends the disk. This happens many times until
the disk reaches the virtual size (actually more, because of qcow2 metadata).

For example we start with 1g image:

[xx--]

Now you write more and oVirt extends the image:

[--]

This repeats many times...

[xxx---]

When you sparsify, some clusters are marked as zero (or discarded), but if
they are not at the end of the image we cannot shrink the image.

[xxxxx---x-xxxxx-x--xx--xx-]

When you copy the image to a new image, the discarded or zero parts are
skipped and the new image contains only the actual data.

If qemu one day supports a way to defragment an image (ideally online):

[---]

ovirt can shrink the logical volume after that.

[]

Adding qemu-block since this is an interesting use case that may be useful
to more qemu users.

Nir

> > On 7 Apr 2021, at 18:43, Nir Soffer  wrote:
> >
> > On Wed, Apr 7, 2021 at 5:36 PM Andrei Verovski  wrote:
> >>
> >> Hi !
> >>
> >> I have VM (under oVirt) with single disk thin provision (~600 GB)
> >
> > I guess you are using block storage?
> >
> >> running NextCloud on Debian 9.
> >> Right now VM HD is almost empty. Unfortunately, it occupies 584 GB 
> >> (virtual size = 600 GB)
> >> All partition (except swap and boot) are EXT4 with discard option.
> >
> > You don't need to use discard mount option. fstrim works without it.
> >
> >> in oVirt “enable discard = on”.
> >>
> >> # fstrim -av runs successfully:
> >> /var: 477.6 GiB (512851144704 bytes) trimmed on 
> >> /dev/mapper/vg--system-lv4--data
> >> /boot: 853.8 MiB (895229952 bytes) trimmed on 
> >> /dev/mapper/vg--system-lv2--boot
> >> /: 88.4 GiB (94888611840 bytes) trimmed on /dev/mapper/vg--system-lv3—sys
> >>
> >> When fstrim runs again, it trims zero. I even run “Sparsify” in oVirt. 
> >> Unfortunately, actual size is still 584 GB.
> >>
> >> Here is /etc/fstab
> >> /dev/mapper/vg--system-lv3--sys /   ext4
> >> discard,noatime,nodiratime,errors=remount-ro 0   1
> >> /dev/mapper/vg--system-lv2--boot /boot   ext2defaults0 
> >>   2
> >> /dev/mapper/vg--system-lv4--data /varext4
> >> discard,noatime,nodiratime 0   2
> >> /dev/mapper/vg--system-lv1--swap noneswapsw  0 
> >>   0
> >>
> >> When disk was partitioned/formatted, swap and boot were created first and 
> >> positioned at the beginning.
> >>
> >> What is wrong here? Is it possible to fix all this ?
> >
> > Discarding data mark the areas in the qcow2 image as zero, but it does not 
> > move
> > actual data around (which is very slow). So if the clusters were at
> > the end of the
> > image they remain there, and the actual file size is the same.
> >
> > The only way to reclaim the space is:
> > 1. sparsify disk - must be done *before* copying the disk.
> > 2. move this to another storage domain
> > 3. move disk back to the original storage domain
> >
> > We may have an easier and more efficient way in the future that
> > works with single storage domain, but it will have to copy the
> > entire disk. If the disk is really mostly empty it should be fast.
> >
> > Nir
> >
>




[PATCH v2 03/10] util/async: aio_co_enter(): do aio_co_schedule in general case

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
With the following patch we want to call aio_co_wake() from a thread,
and that works badly.
Assume we have no iothreads.
Assume we have a coroutine A, which waits in a yield point for an external
aio_co_wake(), and no progress can be made until that happens.
The main thread is in a blocking aio_poll() (for example, in blk_read()).

Now, in a separate thread we do aio_co_wake(). It calls aio_co_enter(),
which goes through the last "else" branch and does aio_context_acquire(ctx).

Now we have a deadlock: aio_poll() will not release the context lock
until some progress is made, and progress can't be made until
aio_co_wake() wakes the coroutine A. And it can't, because it waits for
aio_context_acquire().

Still, aio_co_schedule() works well in parallel with a blocking
aio_poll(). So let's use it in the generic case and drop the
aio_context_acquire/aio_context_release branch from aio_co_enter().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 util/async.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/util/async.c b/util/async.c
index 674dbefb7c..f05b883a39 100644
--- a/util/async.c
+++ b/util/async.c
@@ -614,19 +614,12 @@ void aio_co_wake(struct Coroutine *co)
 
 void aio_co_enter(AioContext *ctx, struct Coroutine *co)
 {
-if (ctx != qemu_get_current_aio_context()) {
-aio_co_schedule(ctx, co);
-return;
-}
-
-if (qemu_in_coroutine()) {
+if (ctx == qemu_get_current_aio_context() && qemu_in_coroutine()) {
 Coroutine *self = qemu_coroutine_self();
 assert(self != co);
 QSIMPLEQ_INSERT_TAIL(&ctx->co_queue_wakeup, co, co_queue_next);
 } else {
-aio_context_acquire(ctx);
-qemu_aio_coroutine_enter(ctx, co);
-aio_context_release(ctx);
+aio_co_schedule(ctx, co);
 }
 }
 
-- 
2.29.2




[PATCH v2 00/10] block/nbd: move connection code to separate file

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
Hi all!

This substitutes "[PATCH 00/14] nbd: move reconnect-thread to separate file"
Supersedes: <20210407104637.36033-1-vsement...@virtuozzo.com>

I want to simplify block/nbd.c, which is overcomplicated now. The first
step is splitting out what can be split.

This series creates a new file, nbd/client-connection.c; part of
block/nbd.c is refactored and moved there.

v2 is mostly rewritten. I decided to move a larger part, otherwise it
doesn't make real sense.

Note also that v2 is based on master. Patch 01 actually solves the same
problem as
"[PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread" 
[*]
in a smarter way. So, if [*] goes first, this will be rebased to undo
[*].

Vladimir Sementsov-Ogievskiy (10):
  block/nbd: introduce NBDConnectThread reference counter
  block/nbd: BDRVNBDState: drop unused connect_err and connect_status
  util/async: aio_co_enter(): do aio_co_schedule in general case
  block/nbd: simplify waking of nbd_co_establish_connection()
  block/nbd: drop thr->state
  block/nbd: bs-independent interface for nbd_co_establish_connection()
  block/nbd: make nbd_co_establish_connection_cancel() bs-independent
  block/nbd: rename NBDConnectThread to NBDClientConnection
  block/nbd: introduce nbd_client_connection_new()
  nbd: move connection code from block/nbd to nbd/client-connection

 include/block/nbd.h |  11 ++
 block/nbd.c | 288 ++--
 nbd/client-connection.c | 192 +++
 util/async.c|  11 +-
 nbd/meson.build |   1 +
 5 files changed, 218 insertions(+), 285 deletions(-)
 create mode 100644 nbd/client-connection.c

-- 
2.29.2




[PATCH v2 06/10] block/nbd: bs-independent interface for nbd_co_establish_connection()

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
We are going to split the connection code into a separate file. Now we are
ready to give nbd_co_establish_connection() a clean, bs-independent
interface.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 49 +++--
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 85c20e7810..a487fd1e68 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -122,7 +122,8 @@ typedef struct BDRVNBDState {
 static void connect_thread_state_unref(NBDConnectThread *thr);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
-static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
+static coroutine_fn QIOChannelSocket *
+nbd_co_establish_connection(NBDConnectThread *thr, Error **errp);
 static void nbd_co_establish_connection_cancel(BlockDriverState *bs);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
@@ -390,22 +391,36 @@ static void *connect_thread_func(void *opaque)
 return NULL;
 }
 
-static int coroutine_fn
-nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
+/*
+ * Get a new connection in context of @thr:
+ *   if thread is running, wait for completion
+ *   if thread is already succeeded in background, and user didn't get the
+ * result, just return it now
+ *   otherwise if thread is not running, start a thread and wait for completion
+ */
+static coroutine_fn QIOChannelSocket *
+nbd_co_establish_connection(NBDConnectThread *thr, Error **errp)
 {
+QIOChannelSocket *sioc = NULL;
 QemuThread thread;
-BDRVNBDState *s = bs->opaque;
-NBDConnectThread *thr = s->connect_thread;
-
-assert(!s->sioc);
 
 qemu_mutex_lock(&thr->mutex);
 
+/*
+ * Don't call nbd_co_establish_connection() in several coroutines in
+ * parallel. Only one call at once is supported.
+ */
+assert(!thr->wait_co);
+
 if (!thr->running) {
 if (thr->sioc) {
 /* Previous attempt finally succeeded in background */
-goto out;
+sioc = g_steal_pointer(&thr->sioc);
+qemu_mutex_unlock(&thr->mutex);
+
+return sioc;
 }
+
 thr->running = true;
 error_free(thr->err);
 thr->err = NULL;
@@ -420,13 +435,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 
 /*
  * We are going to wait for connect-thread finish, but
- * nbd_client_co_drain_begin() can interrupt.
+ * nbd_co_establish_connection_cancel() can interrupt.
  */
 qemu_coroutine_yield();
 
 qemu_mutex_lock(&thr->mutex);
 
-out:
 if (thr->running) {
 /*
  * Obviously, drained section wants to start. Report the attempt as
@@ -437,17 +451,12 @@ out:
 } else {
 error_propagate(errp, thr->err);
 thr->err = NULL;
-s->sioc = thr->sioc;
-thr->sioc = NULL;
-if (s->sioc) {
-yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
-   nbd_yank, bs);
-}
+sioc = g_steal_pointer(&thr->sioc);
 }
 
 qemu_mutex_unlock(&thr->mutex);
 
-return s->sioc ? 0 : -1;
+return sioc;
 }
 
 /*
@@ -514,11 +523,15 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-if (nbd_co_establish_connection(s->bs, NULL) < 0) {
+s->sioc = nbd_co_establish_connection(s->connect_thread, NULL);
+if (!s->sioc) {
 ret = -ECONNREFUSED;
 goto out;
 }
 
+yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank,
+   s->bs);
+
 bdrv_dec_in_flight(s->bs);
 
 ret = nbd_client_handshake(s->bs, NULL);
-- 
2.29.2




[PATCH v2 10/10] nbd: move connection code from block/nbd to nbd/client-connection

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
We now have a bs-independent connection API, which consists of four
functions:

  nbd_client_connection_new()
  nbd_client_connection_unref()
  nbd_co_establish_connection()
  nbd_co_establish_connection_cancel()

Move them to a separate file together with NBDClientConnection
structure which becomes private to the new API.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hmm. I keep only Virtuozzo's copyright in the new file, as I actually
developed the nbd reconnection code. Still, it is probably safer to keep
all copyrights. Let me know if you think so and I'll add them.

 include/block/nbd.h |  11 +++
 block/nbd.c | 167 --
 nbd/client-connection.c | 192 
 nbd/meson.build |   1 +
 4 files changed, 204 insertions(+), 167 deletions(-)
 create mode 100644 nbd/client-connection.c

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 5f34d23bb0..d4666b105e 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -406,4 +406,15 @@ const char *nbd_info_lookup(uint16_t info);
 const char *nbd_cmd_lookup(uint16_t info);
 const char *nbd_err_lookup(int err);
 
+/* nbd/client-connection.c */
+typedef struct NBDClientConnection NBDClientConnection;
+
+NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr);
+void nbd_client_connection_unref(NBDClientConnection *conn);
+
+QIOChannelSocket *coroutine_fn
+nbd_co_establish_connection(NBDClientConnection *conn, Error **errp);
+
+void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection 
*conn);
+
 #endif
diff --git a/block/nbd.c b/block/nbd.c
index 376ab9f92d..1db86b7340 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -66,25 +66,6 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
-typedef struct NBDClientConnection {
-/* Initialization constants */
-SocketAddress *saddr; /* address to connect to */
-
-/*
- * Result of last attempt. Valid in FAIL and SUCCESS states.
- * If you want to steal error, don't forget to set pointer to NULL.
- */
-QIOChannelSocket *sioc;
-Error *err;
-
-int refcnt; /* atomic access */
-
-QemuMutex mutex;
-/* All further fields are protected by mutex */
-bool running; /* thread is running now */
-Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
-} NBDClientConnection;
-
 typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
 QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
@@ -119,12 +100,8 @@ typedef struct BDRVNBDState {
 NBDClientConnection *conn;
 } BDRVNBDState;
 
-static void nbd_client_connection_unref(NBDClientConnection *conn);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
-static coroutine_fn QIOChannelSocket *
-nbd_co_establish_connection(NBDClientConnection *conn, Error **errp);
-static void nbd_co_establish_connection_cancel(NBDClientConnection *conn);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
@@ -336,150 +313,6 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
-static NBDClientConnection *
-nbd_client_connection_new(const SocketAddress *saddr)
-{
-NBDClientConnection *conn = g_new(NBDClientConnection, 1);
-
-*conn = (NBDClientConnection) {
-.saddr = QAPI_CLONE(SocketAddress, saddr),
-.refcnt = 1,
-};
-
-qemu_mutex_init(&conn->mutex);
-
-return conn;
-}
-
-static void nbd_client_connection_unref(NBDClientConnection *conn)
-{
-if (qatomic_dec_fetch(&conn->refcnt) == 0) {
-if (conn->sioc) {
-qio_channel_close(QIO_CHANNEL(conn->sioc), NULL);
-}
-error_free(conn->err);
-qapi_free_SocketAddress(conn->saddr);
-g_free(conn);
-}
-}
-
-static void *connect_thread_func(void *opaque)
-{
-NBDClientConnection *conn = opaque;
-int ret;
-
-conn->sioc = qio_channel_socket_new();
-
-error_free(conn->err);
-conn->err = NULL;
-ret = qio_channel_socket_connect_sync(conn->sioc, conn->saddr, &conn->err);
-if (ret < 0) {
-object_unref(OBJECT(conn->sioc));
-conn->sioc = NULL;
-}
-
-qemu_mutex_lock(&conn->mutex);
-
-assert(conn->running);
-conn->running = false;
-if (conn->wait_co) {
-aio_co_wake(conn->wait_co);
-conn->wait_co = NULL;
-}
-
-qemu_mutex_unlock(&conn->mutex);
-
-nbd_client_connection_unref(conn);
-
-return NULL;
-}
-
-/*
- * Get a new connection in context of @conn:
- *   if thread is running, wait for completion
- *   if thread is already succeeded in background, and user didn't get the
- * result, just return it now
- *   otherwise if thread is not running, start a thread and wait for completion
- */
-static coroutine_fn QIOChannelSocket *

[PATCH v2 07/10] block/nbd: make nbd_co_establish_connection_cancel() bs-independent

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
nbd_co_establish_connection_cancel() actually needs only a pointer to
NBDConnectThread. So, make its interface clean and bs-independent.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index a487fd1e68..ebbb0bec6a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -124,7 +124,7 @@ static int nbd_establish_connection(BlockDriverState *bs, 
SocketAddress *saddr,
 Error **errp);
 static coroutine_fn QIOChannelSocket *
 nbd_co_establish_connection(NBDConnectThread *thr, Error **errp);
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs);
+static void nbd_co_establish_connection_cancel(NBDConnectThread *thr);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
@@ -271,7 +271,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
 
-nbd_co_establish_connection_cancel(bs);
+nbd_co_establish_connection_cancel(s->connect_thread);
 
 reconnect_delay_timer_del(s);
 
@@ -311,7 +311,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 if (s->connection_co_sleep_ns_state) {
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
-nbd_co_establish_connection_cancel(bs);
+nbd_co_establish_connection_cancel(s->connect_thread);
 }
 if (qemu_in_coroutine()) {
 s->teardown_co = qemu_coroutine_self();
@@ -461,14 +461,12 @@ nbd_co_establish_connection(NBDConnectThread *thr, Error 
**errp)
 
 /*
  * nbd_co_establish_connection_cancel
- * Cancel nbd_co_establish_connection asynchronously: it will finish soon, to
- * allow drained section to begin.
+ * Cancel nbd_co_establish_connection() asynchronously. Note, that it doesn't
+ * stop the thread itself neither close the socket. It just safely wakes
+ * nbd_co_establish_connection() sleeping in the yield().
  */
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs)
+static void nbd_co_establish_connection_cancel(NBDConnectThread *thr)
 {
-BDRVNBDState *s = bs->opaque;
-NBDConnectThread *thr = s->connect_thread;
-
 qemu_mutex_lock(&thr->mutex);
 
 if (thr->wait_co) {
-- 
2.29.2




[PATCH v2 09/10] block/nbd: introduce nbd_client_connection_new()

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
This is the last step in creating a bs-independent NBD connection
interface. With the next commit we can finally move it to a separate file.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index ab3ef13366..376ab9f92d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -336,16 +336,19 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
-static void nbd_init_connect_thread(BDRVNBDState *s)
+static NBDClientConnection *
+nbd_client_connection_new(const SocketAddress *saddr)
 {
-s->conn = g_new(NBDClientConnection, 1);
+NBDClientConnection *conn = g_new(NBDClientConnection, 1);
 
-*s->conn = (NBDClientConnection) {
-.saddr = QAPI_CLONE(SocketAddress, s->saddr),
+*conn = (NBDClientConnection) {
+.saddr = QAPI_CLONE(SocketAddress, saddr),
 .refcnt = 1,
 };
 
-qemu_mutex_init(&s->conn->mutex);
+qemu_mutex_init(&conn->mutex);
+
+return conn;
 }
 
 static void nbd_client_connection_unref(NBDClientConnection *conn)
@@ -2211,7 +2214,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 /* successfully connected */
 s->state = NBD_CLIENT_CONNECTED;
 
-nbd_init_connect_thread(s);
+s->conn = nbd_client_connection_new(s->saddr);
 
 s->connection_co = qemu_coroutine_create(nbd_connection_entry, s);
 bdrv_inc_in_flight(bs);
-- 
2.29.2




[PATCH v2 02/10] block/nbd: BDRVNBDState: drop unused connect_err and connect_status

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
These fields are write-only. Drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 64b2977dc8..b0bbde741a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -117,8 +117,6 @@ typedef struct BDRVNBDState {
 bool wait_drained_end;
 int in_flight;
 NBDClientState state;
-int connect_status;
-Error *connect_err;
 bool wait_in_flight;
 
 QEMUTimer *reconnect_delay_timer;
@@ -556,7 +554,6 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs)
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
 int ret;
-Error *local_err = NULL;
 
 if (!nbd_client_connecting(s)) {
 return;
@@ -597,14 +594,14 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 s->ioc = NULL;
 }
 
-if (nbd_co_establish_connection(s->bs, &local_err) < 0) {
+if (nbd_co_establish_connection(s->bs, NULL) < 0) {
 ret = -ECONNREFUSED;
 goto out;
 }
 
 bdrv_dec_in_flight(s->bs);
 
-ret = nbd_client_handshake(s->bs, &local_err);
+ret = nbd_client_handshake(s->bs, NULL);
 
 if (s->drained) {
 s->wait_drained_end = true;
@@ -619,11 +616,6 @@ static coroutine_fn void 
nbd_reconnect_attempt(BDRVNBDState *s)
 bdrv_inc_in_flight(s->bs);
 
 out:
-s->connect_status = ret;
-error_free(s->connect_err);
-s->connect_err = NULL;
-error_propagate(&s->connect_err, local_err);
-
 if (ret >= 0) {
 /* successfully connected */
 s->state = NBD_CLIENT_CONNECTED;
-- 
2.29.2




[PATCH v2 05/10] block/nbd: drop thr->state

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
Actually, the only bit of information we need is "is thread running or
not". We don't need all these states. So, instead of thr->state add
boolean variable thr->running and refactor the code.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 103 ++--
 1 file changed, 27 insertions(+), 76 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 4e28982e53..85c20e7810 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -66,18 +66,6 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
-typedef enum NBDConnectThreadState {
-/* No thread, no pending results */
-CONNECT_THREAD_NONE,
-
-/* Thread is running, no results for now */
-CONNECT_THREAD_RUNNING,
-
-/* Thread finished, results are stored in a state */
-CONNECT_THREAD_FAIL,
-CONNECT_THREAD_SUCCESS
-} NBDConnectThreadState;
-
 typedef struct NBDConnectThread {
 /* Initialization constants */
 SocketAddress *saddr; /* address to connect to */
@@ -93,7 +81,7 @@ typedef struct NBDConnectThread {
 
 QemuMutex mutex;
 /* All further fields are protected by mutex */
-NBDConnectThreadState state; /* current state of the thread */
+bool running; /* thread is running now */
 Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
 } NBDConnectThread;
 
@@ -353,7 +341,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 
 *s->connect_thread = (NBDConnectThread) {
 .saddr = QAPI_CLONE(SocketAddress, s->saddr),
-.state = CONNECT_THREAD_NONE,
 .refcnt = 1,
 };
 
@@ -389,16 +376,11 @@ static void *connect_thread_func(void *opaque)
 
 qemu_mutex_lock(&thr->mutex);
 
-switch (thr->state) {
-case CONNECT_THREAD_RUNNING:
-thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
-if (thr->wait_co) {
-aio_co_wake(thr->wait_co);
-thr->wait_co = NULL;
-}
-break;
-default:
-abort();
+assert(thr->running);
+thr->running = false;
+if (thr->wait_co) {
+aio_co_wake(thr->wait_co);
+thr->wait_co = NULL;
 }
 
 qemu_mutex_unlock(&thr->mutex);
@@ -411,37 +393,25 @@ static void *connect_thread_func(void *opaque)
 static int coroutine_fn
 nbd_co_establish_connection(BlockDriverState *bs, Error **errp)
 {
-int ret;
 QemuThread thread;
 BDRVNBDState *s = bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
 
+assert(!s->sioc);
+
 qemu_mutex_lock(&thr->mutex);
 
-switch (thr->state) {
-case CONNECT_THREAD_FAIL:
-case CONNECT_THREAD_NONE:
+if (!thr->running) {
+if (thr->sioc) {
+/* Previous attempt finally succeeded in background */
+goto out;
+}
+thr->running = true;
 error_free(thr->err);
 thr->err = NULL;
-thr->state = CONNECT_THREAD_RUNNING;
 qatomic_inc(&thr->refcnt); /* for thread */
 qemu_thread_create(, "nbd-connect",
connect_thread_func, thr, QEMU_THREAD_DETACHED);
-break;
-case CONNECT_THREAD_SUCCESS:
-/* Previous attempt finally succeeded in background */
-thr->state = CONNECT_THREAD_NONE;
-s->sioc = thr->sioc;
-thr->sioc = NULL;
-yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
-   nbd_yank, bs);
-qemu_mutex_unlock(&thr->mutex);
-return 0;
-case CONNECT_THREAD_RUNNING:
-/* Already running, will wait */
-break;
-default:
-abort();
 }
 
 thr->wait_co = qemu_coroutine_self();
@@ -456,10 +426,15 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 
 qemu_mutex_lock(&thr->mutex);
 
-switch (thr->state) {
-case CONNECT_THREAD_SUCCESS:
-case CONNECT_THREAD_FAIL:
-thr->state = CONNECT_THREAD_NONE;
+out:
+if (thr->running) {
+/*
+ * Obviously, drained section wants to start. Report the attempt as
+ * failed. Still connect thread is executing in background, and its
+ * result may be used for next connection attempt.
+ */
+error_setg(errp, "Connection attempt cancelled by other operation");
+} else {
 error_propagate(errp, thr->err);
 thr->err = NULL;
 s->sioc = thr->sioc;
@@ -468,32 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name),
nbd_yank, bs);
 }
-ret = (s->sioc ? 0 : -1);
-break;
-case CONNECT_THREAD_RUNNING:
-/*
- * Obviously, drained section wants to start. Report the attempt as
- * failed. Still connect thread is executing in background, and its
- * result may be used for next connection attempt.
- */
-ret = -1;
-error_setg(errp, "Connection attempt cancelled by 

[PATCH v2 01/10] block/nbd: introduce NBDConnectThread reference counter

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
The structure is shared between the NBD BDS and the connection thread, and
it is possible that the connect thread will finish after the BDS has been
closed and released. To handle this we have the concept of the
CONNECT_THREAD_RUNNING_DETACHED state: when the thread is running and the
BDS is going to be closed, we don't free the structure, but instead move
it to the CONNECT_THREAD_RUNNING_DETACHED state, so that the thread will
free it.

Still, a more natural way to solve the problem is to use a reference
counter for the shared structure. Let's do that. It makes the code
smaller and more readable.

The new approach also makes the checks in nbd_co_establish_connection()
redundant: now we are sure that s->connect_thread is valid during the
whole life of the NBD BDS.

This also fixes a possible use-after-free of s->connect_thread if
nbd_co_establish_connection_cancel() clears it during
nbd_co_establish_connection(), and nbd_co_establish_connection() uses a
local copy of s->connect_thread after the yield point.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 62 +
 1 file changed, 20 insertions(+), 42 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..64b2977dc8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -73,12 +73,6 @@ typedef enum NBDConnectThreadState {
 /* Thread is running, no results for now */
 CONNECT_THREAD_RUNNING,
 
-/*
- * Thread is running, but requestor exited. Thread should close
- * the new socket and free the connect state on exit.
- */
-CONNECT_THREAD_RUNNING_DETACHED,
-
 /* Thread finished, results are stored in a state */
 CONNECT_THREAD_FAIL,
 CONNECT_THREAD_SUCCESS
@@ -101,6 +95,8 @@ typedef struct NBDConnectThread {
 QIOChannelSocket *sioc;
 Error *err;
 
+int refcnt; /* atomic access */
+
 /* state and bh_ctx are protected by mutex */
 QemuMutex mutex;
 NBDConnectThreadState state; /* current state of the thread */
@@ -144,16 +140,18 @@ typedef struct BDRVNBDState {
 NBDConnectThread *connect_thread;
 } BDRVNBDState;
 
+static void connect_thread_state_unref(NBDConnectThread *thr);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
 static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp);
-static void nbd_co_establish_connection_cancel(BlockDriverState *bs,
-   bool detach);
+static void nbd_co_establish_connection_cancel(BlockDriverState *bs);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
+connect_thread_state_unref(s->connect_thread);
+s->connect_thread = NULL;
 object_unref(OBJECT(s->tlscreds));
 qapi_free_SocketAddress(s->saddr);
 s->saddr = NULL;
@@ -293,7 +291,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
 
-nbd_co_establish_connection_cancel(bs, false);
+nbd_co_establish_connection_cancel(bs);
 
 reconnect_delay_timer_del(s);
 
@@ -333,7 +331,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 if (s->connection_co_sleep_ns_state) {
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
-nbd_co_establish_connection_cancel(bs, true);
+nbd_co_establish_connection_cancel(bs);
 }
 if (qemu_in_coroutine()) {
 s->teardown_co = qemu_coroutine_self();
@@ -376,26 +374,28 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 .state = CONNECT_THREAD_NONE,
 .bh_func = connect_bh,
 .bh_opaque = s,
+.refcnt = 1,
 };
 
 qemu_mutex_init(&s->connect_thread->mutex);
 }
 
-static void nbd_free_connect_thread(NBDConnectThread *thr)
+static void connect_thread_state_unref(NBDConnectThread *thr)
 {
-if (thr->sioc) {
-qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
+if (qatomic_dec_fetch(&thr->refcnt) == 0) {
+if (thr->sioc) {
+qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
+}
+error_free(thr->err);
+qapi_free_SocketAddress(thr->saddr);
+g_free(thr);
 }
-error_free(thr->err);
-qapi_free_SocketAddress(thr->saddr);
-g_free(thr);
 }
 
 static void *connect_thread_func(void *opaque)
 {
 NBDConnectThread *thr = opaque;
 int ret;
-bool do_free = false;
 
 thr->sioc = qio_channel_socket_new();
 
@@ -419,18 +419,13 @@ static void *connect_thread_func(void *opaque)
 thr->bh_ctx = NULL;
 }
 break;
-case CONNECT_THREAD_RUNNING_DETACHED:
-do_free = true;
-break;
 default:
 abort();
 }
 
 qemu_mutex_unlock(&thr->mutex);
 
-if (do_free) {
-nbd_free_connect_thread(thr);
-}
+connect_thread_state_unref(thr);
 
 return NULL;
 }
@@ -451,6 +446,7 @@ 

[PATCH v2 08/10] block/nbd: rename NBDConnectThread to NBDClientConnection

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
We are going to move the connection code to its own file and want clear
names and APIs.

The structure is shared between the user and (possibly) several runs of
the connect thread, so it's wrong to call it a "thread". Let's rename it
to something more generic.

Appropriately rename the connect_thread and thr variables to conn.

The connect_thread_state_unref() function gets a new, appropriate name too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 127 ++--
 1 file changed, 63 insertions(+), 64 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index ebbb0bec6a..ab3ef13366 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -66,7 +66,7 @@ typedef enum NBDClientState {
 NBD_CLIENT_QUIT
 } NBDClientState;
 
-typedef struct NBDConnectThread {
+typedef struct NBDClientConnection {
 /* Initialization constants */
 SocketAddress *saddr; /* address to connect to */
 
@@ -83,7 +83,7 @@ typedef struct NBDConnectThread {
 /* All further fields are protected by mutex */
 bool running; /* thread is running now */
 Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
-} NBDConnectThread;
+} NBDClientConnection;
 
 typedef struct BDRVNBDState {
 QIOChannelSocket *sioc; /* The master data channel */
@@ -116,22 +116,22 @@ typedef struct BDRVNBDState {
 char *x_dirty_bitmap;
 bool alloc_depth;
 
-NBDConnectThread *connect_thread;
+NBDClientConnection *conn;
 } BDRVNBDState;
 
-static void connect_thread_state_unref(NBDConnectThread *thr);
+static void nbd_client_connection_unref(NBDClientConnection *conn);
 static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr,
 Error **errp);
 static coroutine_fn QIOChannelSocket *
-nbd_co_establish_connection(NBDConnectThread *thr, Error **errp);
-static void nbd_co_establish_connection_cancel(NBDConnectThread *thr);
+nbd_co_establish_connection(NBDClientConnection *conn, Error **errp);
+static void nbd_co_establish_connection_cancel(NBDClientConnection *conn);
 static int nbd_client_handshake(BlockDriverState *bs, Error **errp);
 static void nbd_yank(void *opaque);
 
 static void nbd_clear_bdrvstate(BDRVNBDState *s)
 {
-connect_thread_state_unref(s->connect_thread);
-s->connect_thread = NULL;
+nbd_client_connection_unref(s->conn);
+s->conn = NULL;
 object_unref(OBJECT(s->tlscreds));
 qapi_free_SocketAddress(s->saddr);
 s->saddr = NULL;
@@ -271,7 +271,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
 
-nbd_co_establish_connection_cancel(s->connect_thread);
+nbd_co_establish_connection_cancel(s->conn);
 
 reconnect_delay_timer_del(s);
 
@@ -311,7 +311,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 if (s->connection_co_sleep_ns_state) {
 qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 }
-nbd_co_establish_connection_cancel(s->connect_thread);
+nbd_co_establish_connection_cancel(s->conn);
 }
 if (qemu_in_coroutine()) {
 s->teardown_co = qemu_coroutine_self();
@@ -338,100 +338,100 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 
 static void nbd_init_connect_thread(BDRVNBDState *s)
 {
-s->connect_thread = g_new(NBDConnectThread, 1);
+s->conn = g_new(NBDClientConnection, 1);
 
-*s->connect_thread = (NBDConnectThread) {
+*s->conn = (NBDClientConnection) {
 .saddr = QAPI_CLONE(SocketAddress, s->saddr),
 .refcnt = 1,
 };
 
-qemu_mutex_init(>connect_thread->mutex);
+qemu_mutex_init(>conn->mutex);
 }
 
-static void connect_thread_state_unref(NBDConnectThread *thr)
+static void nbd_client_connection_unref(NBDClientConnection *conn)
 {
-if (qatomic_dec_fetch(&thr->refcnt) == 0) {
-if (thr->sioc) {
-qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
+if (qatomic_dec_fetch(&conn->refcnt) == 0) {
+if (conn->sioc) {
+qio_channel_close(QIO_CHANNEL(conn->sioc), NULL);
 }
-error_free(thr->err);
-qapi_free_SocketAddress(thr->saddr);
-g_free(thr);
+error_free(conn->err);
+qapi_free_SocketAddress(conn->saddr);
+g_free(conn);
 }
 }
 
 static void *connect_thread_func(void *opaque)
 {
-NBDConnectThread *thr = opaque;
+NBDClientConnection *conn = opaque;
 int ret;
 
-thr->sioc = qio_channel_socket_new();
+conn->sioc = qio_channel_socket_new();
 
-error_free(thr->err);
-thr->err = NULL;
-ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, &thr->err);
+error_free(conn->err);
+conn->err = NULL;
+ret = qio_channel_socket_connect_sync(conn->sioc, conn->saddr, &conn->err);
 if (ret < 0) {
-object_unref(OBJECT(thr->sioc));
-thr->sioc = NULL;
+object_unref(OBJECT(conn->sioc));
+conn->sioc = NULL;
 }
 
-

[PATCH v2 04/10] block/nbd: simplify waking of nbd_co_establish_connection()

2021-04-08 Thread Vladimir Sementsov-Ogievskiy
Instead of the connect_bh, bh_ctx and wait_connect fields we can live with
only one link to the waiting coroutine, protected by the mutex.

So the new logic is:

nbd_co_establish_connection() sets wait_co under the mutex, releases the
mutex and yields. Note that wait_co may be scheduled by the thread
immediately after unlocking the mutex. Still, in the main thread (or
iothread) we'll not reach the code for entering the coroutine until the
yield(), so we are safe.

Both connect_thread_func() and nbd_co_establish_connection_cancel() do
the following to handle wait_co:

Under the mutex, if thr->wait_co is not NULL, call aio_co_wake() (which
never tries to acquire the aio context since the previous commit, so we
are safe to do it under thr->mutex) and set thr->wait_co to NULL.
This way we protect ourselves from scheduling it twice.

Also, this commit makes nbd_co_establish_connection() less tied to
bs (we have a generic pointer to the coroutine instead of using
s->connection_co directly). So, we are on the way to splitting the
connection API out of nbd.c (which is overcomplicated now).
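
In condensed form the handshake looks like this (sketch only, not a
compilable excerpt of the patch):

    /* waiter: nbd_co_establish_connection() */
    qemu_mutex_lock(&thr->mutex);
    ...
    thr->wait_co = qemu_coroutine_self();
    qemu_mutex_unlock(&thr->mutex);
    qemu_coroutine_yield();            /* woken exactly once */

    /* waker: connect_thread_func() or nbd_co_establish_connection_cancel() */
    qemu_mutex_lock(&thr->mutex);
    if (thr->wait_co) {
        aio_co_wake(thr->wait_co);     /* safe under the mutex since the
                                        * previous commit */
        thr->wait_co = NULL;           /* never schedule it twice */
    }
    qemu_mutex_unlock(&thr->mutex);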

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 49 +
 1 file changed, 9 insertions(+), 40 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index b0bbde741a..4e28982e53 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -81,12 +81,6 @@ typedef enum NBDConnectThreadState {
 typedef struct NBDConnectThread {
 /* Initialization constants */
 SocketAddress *saddr; /* address to connect to */
-/*
- * Bottom half to schedule on completion. Scheduled only if bh_ctx is not
- * NULL
- */
-QEMUBHFunc *bh_func;
-void *bh_opaque;
 
 /*
  * Result of last attempt. Valid in FAIL and SUCCESS states.
@@ -97,10 +91,10 @@ typedef struct NBDConnectThread {
 
 int refcnt; /* atomic access */
 
-/* state and bh_ctx are protected by mutex */
 QemuMutex mutex;
+/* All further fields are protected by mutex */
 NBDConnectThreadState state; /* current state of the thread */
-AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */
+Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */
 } NBDConnectThread;
 
 typedef struct BDRVNBDState {
@@ -134,7 +128,6 @@ typedef struct BDRVNBDState {
 char *x_dirty_bitmap;
 bool alloc_depth;
 
-bool wait_connect;
 NBDConnectThread *connect_thread;
 } BDRVNBDState;
 
@@ -354,15 +347,6 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
 return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
-static void connect_bh(void *opaque)
-{
-BDRVNBDState *state = opaque;
-
-assert(state->wait_connect);
-state->wait_connect = false;
-aio_co_wake(state->connection_co);
-}
-
 static void nbd_init_connect_thread(BDRVNBDState *s)
 {
 s->connect_thread = g_new(NBDConnectThread, 1);
@@ -370,8 +354,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
 *s->connect_thread = (NBDConnectThread) {
 .saddr = QAPI_CLONE(SocketAddress, s->saddr),
 .state = CONNECT_THREAD_NONE,
-.bh_func = connect_bh,
-.bh_opaque = s,
 .refcnt = 1,
 };
 
@@ -410,11 +392,9 @@ static void *connect_thread_func(void *opaque)
 switch (thr->state) {
 case CONNECT_THREAD_RUNNING:
 thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
-if (thr->bh_ctx) {
-aio_bh_schedule_oneshot(thr->bh_ctx, thr->bh_func, thr->bh_opaque);
-
-/* play safe, don't reuse bh_ctx on further connection attempts */
-thr->bh_ctx = NULL;
+if (thr->wait_co) {
+aio_co_wake(thr->wait_co);
+thr->wait_co = NULL;
 }
 break;
 default:
@@ -464,20 +444,14 @@ nbd_co_establish_connection(BlockDriverState *bs, Error 
**errp)
 abort();
 }
 
-thr->bh_ctx = qemu_get_current_aio_context();
+thr->wait_co = qemu_coroutine_self();
 
 qemu_mutex_unlock(&thr->mutex);
 
-
 /*
  * We are going to wait for connect-thread finish, but
  * nbd_client_co_drain_begin() can interrupt.
- *
- * Note that wait_connect variable is not visible for connect-thread. It
- * doesn't need mutex protection, it used only inside home aio context of
- * bs.
  */
-s->wait_connect = true;
 qemu_coroutine_yield();
 
 qemu_mutex_lock(&thr->mutex);
@@ -531,24 +505,19 @@ static void 
nbd_co_establish_connection_cancel(BlockDriverState *bs)
 {
 BDRVNBDState *s = bs->opaque;
 NBDConnectThread *thr = s->connect_thread;
-bool wake = false;
 
 qemu_mutex_lock(&thr->mutex);
 
 if (thr->state == CONNECT_THREAD_RUNNING) {
 /* We can cancel only in running state, when bh is not yet scheduled */
-thr->bh_ctx = NULL;
-if (s->wait_connect) {
-s->wait_connect = false;
-wake = true;
+if (thr->wait_co) {
+aio_co_wake(thr->wait_co);
+thr->wait_co = NULL;
   

Re: [RFC PATCH v2 00/11] qemu_iotests: improve debugging options

2021-04-08 Thread Emanuele Giuseppe Esposito




On 08/04/2021 14:39, Markus Armbruster wrote:

Emanuele Giuseppe Esposito  writes:


On 08/04/2021 10:26, Markus Armbruster wrote:

Emanuele Giuseppe Esposito  writes:


This series adds the option to attach gdbserver and valgrind
to the QEMU binary running in qemu_iotests.
It also allows to redirect QEMU binaries output of the python tests
to the stdout, instead of a log file.

Patches 1-6 introduce the -gdb option to both python and bash tests,
7-10 extend the already existing -valgrind flag to work also on
python tests, and patch 11 introduces -p to enable logging to stdout.

In particular, patches 1,2,4,8 focus on extending the QMP socket timers
when using gdb/valgrind, otherwise the python tests will fail due to
delays in the QMP responses.

This series is tested on the previous serie
"qemu-iotests: quality of life improvements"
but independent from it, so it can be applied separately.

Signed-off-by: Emanuele Giuseppe Esposito 


How discoverable are these goodies for developers with only superficial
knowledge of iotests?



Not really sure what you mean, but

./check --help now shows:


-p enable prints
-gdb   start gdbserver with $GDB_QEMU options. Default is localhost:12345


Which I guess should be clear enough? Btw two-three weeks ago I didn't
know anything about these tests either.

I agree I can make -p more clear, saying "enable qemu binary prints to
stdout", and move -valgrind to the "optional arguments" instead of
"bash-only"


Yes, please (this is not a demand).

docs/devel/testing.rst section "QEMU iotests" is the place to explain
things in more detail than --help, if that's useful.  Right now it
simply points to check -h.



Ok, I will add a new section in testing.rst explaining the new flags.

Thank you,
Emanuele




Re: [RFC PATCH v2 00/11] qemu_iotests: improve debugging options

2021-04-08 Thread Markus Armbruster
Emanuele Giuseppe Esposito  writes:

> On 08/04/2021 10:26, Markus Armbruster wrote:
>> Emanuele Giuseppe Esposito  writes:
>> 
>>> This series adds the option to attach gdbserver and valgrind
>>> to the QEMU binary running in qemu_iotests.
>>> It also allows to redirect QEMU binaries output of the python tests
>>> to the stdout, instead of a log file.
>>>
>>> Patches 1-6 introduce the -gdb option to both python and bash tests,
>>> 7-10 extend the already existing -valgrind flag to work also on
>>> python tests, and patch 11 introduces -p to enable logging to stdout.
>>>
>>> In particular, patches 1,2,4,8 focus on extending the QMP socket timers
>>> when using gdb/valgrind, otherwise the python tests will fail due to
>>> delays in the QMP responses.
>>>
>>> This series is tested on the previous serie
>>> "qemu-iotests: quality of life improvements"
>>> but independent from it, so it can be applied separately.
>>>
>>> Signed-off-by: Emanuele Giuseppe Esposito 
>> 
>> How discoverable are these goodies for developers with only superficial
>> knowledge of iotests?
>> 
>
> Not really sure what you mean, but
>
> ./check --help now shows:
>
>> -p enable prints
>> -gdb   start gdbserver with $GDB_QEMU options. Default is localhost:12345
>
> Which I guess should be clear enough? Btw two-three weeks ago I didn't 
> know anything about these tests either.
>
> I agree I can make -p more clear, saying "enable qemu binary prints to 
> stdout", and move -valgrind to the "optional arguments" instead of 
> "bash-only"

Yes, please (this is not a demand).

docs/devel/testing.rst section "QEMU iotests" is the place to explain
things in more detail than --help, if that's useful.  Right now it
simply points to check -h.




[PATCH v2] docs: add nvme emulation documentation

2021-04-08 Thread Klaus Jensen
From: Klaus Jensen 

Remove the docs/specs/nvme.txt and replace it with proper documentation
in docs/system/nvme.rst.

Signed-off-by: Klaus Jensen 
Reviewed-by: Philippe Mathieu-Daudé 
---
v2:
 - add descriptions of the simple copy `mssrl`, `mcl` and `msrc`
   parameters. (philmd)

 docs/specs/nvme.txt   |  23 -
 docs/system/index.rst |   1 +
 docs/system/nvme.rst  | 225 ++
 MAINTAINERS   |   2 +-
 4 files changed, 227 insertions(+), 24 deletions(-)
 delete mode 100644 docs/specs/nvme.txt
 create mode 100644 docs/system/nvme.rst

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
deleted file mode 100644
index 56d393884e7a..
--- a/docs/specs/nvme.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-NVM Express Controller
-==
-
-The nvme device (-device nvme) emulates an NVM Express Controller.
-
-
-Reference Specifications
-
-
-The device currently implements most mandatory features of NVMe v1.3d, see
-
-  https://nvmexpress.org/resources/specifications/
-
-for the specification.
-
-
-Known issues
-
-
-* The accounting numbers in the SMART/Health are reset across power cycles
-
-* Interrupt Coalescing is not supported and is disabled by default in volation
-  of the specification.
diff --git a/docs/system/index.rst b/docs/system/index.rst
index 02d07071810f..b05af716a973 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -23,6 +23,7 @@ Contents:
net
virtio-net-failover
usb
+   nvme
ivshmem
linuxboot
generic-loader
diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst
new file mode 100644
index ..f7f63d6bf615
--- /dev/null
+++ b/docs/system/nvme.rst
@@ -0,0 +1,225 @@
+==
+NVMe Emulation
+==
+
+QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
+``nvme-subsys`` devices.
+
+See the following sections for specific information on
+
+  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
+  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
+`Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
+Protection`_,
+
+Adding NVMe Devices
+===
+
+Controller Emulation
+
+
+The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
+specification. All mandatory features are implement with a couple of exceptions
+and limitations:
+
+  * Accounting numbers in the SMART/Health log page are reset when the device
+is power cycled.
+  * Interrupt Coalescing is not supported and is disabled by default.
+
+The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
+following parameters:
+
+.. code-block:: console
+
+-drive file=nvm.img,if=none,id=nvm
+-device nvme,serial=deadbeef,drive=nvm
+
+There are a number of optional general parameters for the ``nvme`` device. Some
+are mentioned here, but see ``-device nvme,help`` to list all possible
+parameters.
+
+``max_ioqpairs=UINT32`` (default: ``64``)
+  Set the maximum number of allowed I/O queue pairs. This replaces the
+  deprecated ``num_queues`` parameter.
+
+``msix_qsize=UINT16`` (default: ``65``)
+  The number of MSI-X vectors that the device should support.
+
+``mdts=UINT8`` (default: ``7``)
+  Set the Maximum Data Transfer Size of the device.
+
+``use-intel-id`` (default: ``off``)
+  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
+  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
+  previously used.
+
+Additional Namespaces
+-
+
+In the simplest possible invocation sketched above, the device only supports a
+single namespace with the namespace identifier ``1``. To support multiple
+namespaces and additional features, the ``nvme-ns`` device must be used.
+
+.. code-block:: console
+
+   -device nvme,id=nvme-ctrl-0,serial=deadbeef
+   -drive file=nvm-1.img,if=none,id=nvm-1
+   -device nvme-ns,drive=nvm-1
+   -drive file=nvm-2.img,if=none,id=nvm-2
+   -device nvme-ns,drive=nvm-2
+
+The namespaces defined by the ``nvme-ns`` device will attach to the most
+recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
+identifiers are allocated automatically, starting from ``1``.
+
+There are a number of parameters available:
+
+``nsid`` (default: ``0``)
+  Explicitly set the namespace identifier.
+
+``uuid`` (default: *autogenerated*)
+  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
+  descriptor in the Namespace Identification Descriptor List.
+
+``bus``
+  If there are more ``nvme`` devices defined, this parameter may be used to
+  attach the namespace to a specific ``nvme`` device (identified by an ``id``
+  parameter on the controller device).
+
+NVM Subsystems
+--
+
+Additional features become available if the controller device (``nvme``) is
+linked to an NVM Subsystem device (``nvme-subsys``).
+

Re: [PATCH] docs: add nvme emulation documentation

2021-04-08 Thread Klaus Jensen

On Apr  8 12:17, Philippe Mathieu-Daudé wrote:

On 4/8/21 11:50 AM, Klaus Jensen wrote:

From: Klaus Jensen 

Remove the docs/specs/nvme.txt and replace it with proper documentation
in docs/system/nvme.rst.

Signed-off-by: Klaus Jensen 
---
 docs/specs/nvme.txt   |  23 -
 docs/system/index.rst |   1 +
 docs/system/nvme.rst  | 212 ++
 MAINTAINERS   |   2 +-
 4 files changed, 214 insertions(+), 24 deletions(-)
 delete mode 100644 docs/specs/nvme.txt
 create mode 100644 docs/system/nvme.rst



+Simple Copy
+---
+
+The device includes support for TP 4065 ("Simple Copy Command"). Copy command
+limits may be set with the ``mssrl``, ``mcl`` and ``msrc=UINT8`` ``nvme-ns``
+device parameters.


All parameters are described except mssrl/mcl/msrc.
Any particular reason?



Not really, missed them. I'll add them :)


Otherwise:
Reviewed-by: Philippe Mathieu-Daudé 



Thanks!
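
For reference, with those descriptions added, constraining the copy limits on
a namespace would look something like the line below; the parameter names are
the ones from the patch, the values are made up for illustration:

   -device nvme-ns,drive=nvm-1,mssrl=128,mcl=1024,msrc=127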




Re: [RFC PATCH v2 00/11] qemu_iotests: improve debugging options

2021-04-08 Thread Emanuele Giuseppe Esposito




On 08/04/2021 10:26, Markus Armbruster wrote:

Emanuele Giuseppe Esposito  writes:


This series adds the option to attach gdbserver and valgrind
to the QEMU binary running in qemu_iotests.
> It also allows redirecting the output of the QEMU binaries in the python tests
> to stdout, instead of a log file.

Patches 1-6 introduce the -gdb option to both python and bash tests,
7-10 extend the already existing -valgrind flag to work also on
python tests, and patch 11 introduces -p to enable logging to stdout.

In particular, patches 1,2,4,8 focus on extending the QMP socket timers
when using gdb/valgrind, otherwise the python tests will fail due to
delays in the QMP responses.

> This series is tested on the previous series
"qemu-iotests: quality of life improvements"
> but is independent from it, so it can be applied separately.

Signed-off-by: Emanuele Giuseppe Esposito 


How discoverable are these goodies for developers with only superficial
knowledge of iotests?



Not really sure what you mean, but

./check --help now shows:


-p enable prints
-gdb   start gdbserver with $GDB_QEMU options. Default is localhost:12345


Which I guess should be clear enough? Btw two-three weeks ago I didn't 
know anything about these tests either.


I agree I can make -p more clear, saying "enable qemu binary prints to 
stdout", and move -valgrind to the "optional arguments" instead of 
"bash-only"


Emanuele




Re: [PATCH-for-6.0?] hw/block/fdc: Fix 'fallback' property on sysbus floppy disk controllers

2021-04-08 Thread Kevin Wolf
Am 07.04.2021 um 15:37 hat Philippe Mathieu-Daudé geschrieben:
> Setting the 'fallback' property corrupts the QOM instance state
> (FDCtrlSysBus) because it accesses an incorrect offset (it uses
> the offset of the FDCtrlISABus state).
> 
> Fixes: a73275dd6fc ("fdc: Add fallback option")
> Signed-off-by: Philippe Mathieu-Daudé 

Thanks, applied to the block branch.

Kevin




Re: [PATCH 06/14] block/nbd: further segregation of connect-thread

2021-04-08 Thread Roman Kagan
On Wed, Apr 07, 2021 at 01:46:29PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add a dedicated NBDConnectThread state for the connect-thread and a
> nbd_connect_thread_start() function. The next step would be moving the
> connect-thread to a separate file.
> 
> Note that we stop publishing thr->sioc during
> qio_channel_socket_connect_sync(), which probably means that we can't
> early-cancel the qio_channel_socket_connect_sync() call in
> nbd_free_connect_thread(). Still, when the thread is detached that doesn't
> make much sense, and we have no guarantee anyway.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/nbd.c | 57 -
>  1 file changed, 39 insertions(+), 18 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index e16c6d636a..23eb8adab1 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -85,8 +85,6 @@ typedef enum NBDConnectThreadState {
>  } NBDConnectThreadState;
>  
>  typedef struct NBDConnectCB {
> -/* Initialization constants */
> -SocketAddress *saddr; /* address to connect to */
>  /*
>   * Bottom half to schedule on completion. Scheduled only if bh_ctx is not
>   * NULL
> @@ -103,6 +101,15 @@ typedef struct NBDConnectCB {
>  AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) 
> */
>  } NBDConnectCB;
>  
> +typedef void (*NBDConnectThreadCallback)(QIOChannelSocket *sioc, int ret,
> + void *opaque);
> +
> +typedef struct NBDConnectThread {
> +SocketAddress *saddr; /* address to connect to */
> +NBDConnectThreadCallback cb;
> +void *cb_opaque;
> +} NBDConnectThread;
> +
>  typedef struct BDRVNBDState {
>  QIOChannelSocket *sioc; /* The master data channel */
>  QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */
> @@ -367,7 +374,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
>  s->connect_thread = g_new(NBDConnectCB, 1);
>  
>  *s->connect_thread = (NBDConnectCB) {
> -.saddr = QAPI_CLONE(SocketAddress, s->saddr),
>  .state = CONNECT_THREAD_NONE,
>  .bh_func = connect_bh,
>  .bh_opaque = s,
> @@ -378,20 +384,18 @@ static void nbd_init_connect_thread(BDRVNBDState *s)
>  
>  static void nbd_free_connect_thread(NBDConnectCB *thr)
>  {
> -if (thr->sioc) {
> -qio_channel_close(QIO_CHANNEL(thr->sioc), NULL);
> -}

Doesn't this result in an open channel leak?  (The original code here
seems to be missing an unref, too.)

> -qapi_free_SocketAddress(thr->saddr);
>  g_free(thr);
>  }
>  
> -static void connect_thread_cb(int ret, void *opaque)
> +static void connect_thread_cb(QIOChannelSocket *sioc, int ret, void *opaque)
>  {
>  NBDConnectCB *thr = opaque;
>  bool do_free = false;
>  
>  qemu_mutex_lock(&thr->mutex);
>  
> +thr->sioc = sioc;
> +
>  switch (thr->state) {
>  case CONNECT_THREAD_RUNNING:
>  thr->state = ret < 0 ? CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS;
> @@ -418,27 +422,45 @@ static void connect_thread_cb(int ret, void *opaque)
>  
>  static void *connect_thread_func(void *opaque)
>  {
> -NBDConnectCB *thr = opaque;
> +NBDConnectThread *thr = opaque;
>  int ret;
> +QIOChannelSocket *sioc = qio_channel_socket_new();
>  
> -thr->sioc = qio_channel_socket_new();
> -
> -ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, NULL);
> +ret = qio_channel_socket_connect_sync(sioc, thr->saddr, NULL);
>  if (ret < 0) {
> -object_unref(OBJECT(thr->sioc));
> -thr->sioc = NULL;
> +object_unref(OBJECT(sioc));
> +sioc = NULL;
>  }
>  
> -connect_thread_cb(ret, thr);
> +thr->cb(sioc, ret, thr->cb_opaque);
> +
> +qapi_free_SocketAddress(thr->saddr);
> +g_free(thr);
>  
>  return NULL;
>  }
>  
> +static void nbd_connect_thread_start(const SocketAddress *saddr,
> + NBDConnectThreadCallback cb,
> + void *cb_opaque)
> +{
> +QemuThread thread;
> +NBDConnectThread *thr = g_new(NBDConnectThread, 1);
> +
> +*thr = (NBDConnectThread) {
> +.saddr = QAPI_CLONE(SocketAddress, saddr),
> +.cb = cb,
> +.cb_opaque = cb_opaque,
> +};
> +
> +qemu_thread_create(&thread, "nbd-connect",
> +   connect_thread_func, thr, QEMU_THREAD_DETACHED);
> +}
> +
>  static int coroutine_fn
>  nbd_co_establish_connection(BlockDriverState *bs)
>  {
>  int ret;
> -QemuThread thread;
>  BDRVNBDState *s = bs->opaque;
>  NBDConnectCB *thr = s->connect_thread;
>  
> @@ -448,8 +470,7 @@ nbd_co_establish_connection(BlockDriverState *bs)
>  case CONNECT_THREAD_FAIL:
>  case CONNECT_THREAD_NONE:
>  thr->state = CONNECT_THREAD_RUNNING;
> -qemu_thread_create(&thread, "nbd-connect",
> -   connect_thread_func, thr, QEMU_THREAD_DETACHED);
> +nbd_connect_thread_start(s->saddr, connect_thread_cb, 

Re: [PULL for-6.0 v3 00/10] emulated nvme fixes for -rc3

2021-04-08 Thread Peter Maydell
On Wed, 7 Apr 2021 at 17:53, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Hi Peter,
>
> Gitlab CI finally got back in business :)
>
> Note: only cover letter sent for this v3.
>
>
> The following changes since commit d0d3dd401b70168a353450e031727affee828527:
>
>   Update version for v6.0.0-rc2 release (2021-04-06 18:34:34 +0100)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-20210407-pull-request
>
> for you to fetch changes up to 7645f21f409b67eb9aad9feef6283c2e186e3703:
>
>   hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl (2021-04-07 
> 10:48:33 +0200)
>
> 
> emulated nvme fixes for -rc3
>
> v3:
>   - removed unnecessary deprecation warning
>
> v2:
>   - added missing patches
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0
for any user-visible changes.

-- PMM



Re: [PATCH] docs: add nvme emulation documentation

2021-04-08 Thread Philippe Mathieu-Daudé
On 4/8/21 11:50 AM, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Remove the docs/specs/nvme.txt and replace it with proper documentation
> in docs/system/nvme.rst.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  docs/specs/nvme.txt   |  23 -
>  docs/system/index.rst |   1 +
>  docs/system/nvme.rst  | 212 ++
>  MAINTAINERS   |   2 +-
>  4 files changed, 214 insertions(+), 24 deletions(-)
>  delete mode 100644 docs/specs/nvme.txt
>  create mode 100644 docs/system/nvme.rst

> +Simple Copy
> +---
> +
> +The device includes support for TP 4065 ("Simple Copy Command"). Copy command
> +limits may be set with the ``mssrl``, ``mcl`` and ``msrc=UINT8`` ``nvme-ns``
> +device parameters.

All parameters are described except mssrl/mcl/msrc.
Any particular reason?

Otherwise:
Reviewed-by: Philippe Mathieu-Daudé 




DROP THIS Re: [PATCH 00/14] nbd: move reconnect-thread to separate file

2021-04-08 Thread Vladimir Sementsov-Ogievskiy

07.04.2021 13:46, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

There are problems with the nbd driver:

  - nbd reconnect is cancelled on drain, which is bad as Roman describes
    in his "[PATCH 0/7] block/nbd: decouple reconnect from drain"
  - the nbd driver is too complicated around drained sections and aio context
    switch. It's nearly impossible to follow all the logic, including
    abuse of bs->in_flight, which is temporarily decreased in some places
    (like nbd_read_eof()). The additional reconnect thread and two different
    state machines (we have BDRVNBDState::state and
    BDRVNBDState::connect_thread->state) don't make things simpler :)

So, I have a plan:

1. Move nbd negotiation to connect_thread

2. Do receive NBD replies in request coroutines, not in connection_co

   At this point we can drop connection_co, and when we don't have an
   endless running coroutine, the NBD driver becomes a usual block driver,
   and we can drop the abuse of bs->in_flight, and probably drop most of
   the complicated logic around drained sections and aio context switch in
   the nbd driver.

3. Still, as Roman describes, with [2] we lose the possibility to
   reconnect immediately when the connection breaks (with the current code
   we have an endless read in reconnect_co, but actually for this to work
   keep-alive should be set up correctly). So, we'll need to reinvent it,
   checking the connection periodically by timeout, with the help of
   getsockopt or just sending a kind of PING request (zero-length READ or
   something like this).

And this series is a kind of preparation. The main point of it is moving the
connect-thread to a separate file.




Finally I don't like it. I'm in the process of creating a new series to replace 
this one, so don't waste your time on review.


--
Best regards,
Vladimir



Re: [PATCH-for-6.0?] hw/block/fdc: Fix 'fallback' property on sysbus floppy disk controllers

2021-04-08 Thread Philippe Mathieu-Daudé
On 4/8/21 11:38 AM, Markus Armbruster wrote:
> Philippe Mathieu-Daudé  writes:
> 
>> Setting the 'fallback' property corrupts the QOM instance state
>> (FDCtrlSysBus) because it accesses an incorrect offset (it uses
>> the offset of the FDCtrlISABus state).
>>
>> Fixes: a73275dd6fc ("fdc: Add fallback option")
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/block/fdc.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
>> index 82afda7f3a7..a825c2acbae 100644
>> --- a/hw/block/fdc.c
>> +++ b/hw/block/fdc.c
>> @@ -2893,7 +2893,7 @@ static Property sysbus_fdc_properties[] = {
>>  DEFINE_PROP_SIGNED("fdtypeB", FDCtrlSysBus, 
>> state.qdev_for_drives[1].type,
>>  FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type,
>>  FloppyDriveType),
>> -DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback,
>> +DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
>>  FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
>>  FloppyDriveType),
>>  DEFINE_PROP_END_OF_LIST(),
>> @@ -2918,7 +2918,7 @@ static Property sun4m_fdc_properties[] = {
>>  DEFINE_PROP_SIGNED("fdtype", FDCtrlSysBus, 
>> state.qdev_for_drives[0].type,
>>  FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type,
>>  FloppyDriveType),
>> -DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback,
>> +DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
>>  FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
>>  FloppyDriveType),
>>  DEFINE_PROP_END_OF_LIST(),
> 
> Reviewed-by: Markus Armbruster 
> 
> On whether to pick this into 6.0...
> 
> The patch has no effect unless someone or something uses "fallback" with
> a non-ISA FDC.  There it fixes a bug.  The bug's exact impact is
> unknown.  I figure I could find out, but it doesn't seem to be worth the
> bother.

non-ISA FDC is only used on MIPS/SPARC.

> Commit a73275dd6fc:
> 
> Currently, QEMU chooses a drive type automatically based on the inserted
> media. If there is no disk inserted, it chooses a 1.44MB drive type.
> 
> Change this behavior to be configurable, but leave it defaulted to 1.44.
> 
> This is not earnestly intended to be used by a user or a management
> library, but rather exists so that pre-2.6 board types can configure it
> to be a legacy value.
> 
> We do so only for "isa-fdc", in hw/core/machine.c.
> 
> I don't understand why we don't for the other devices, but that's
> outside this patch's scope.
> 
> Downstreams could do it, but it wouldn't work.  They need this commit to
> make it work.
> 
> Users (human or management application) should not use it, but of course
> they might anyway.  This commit makes such (unadvisable) usage safe.
> 
> The reward is low, but so is the risk.  If I was the maintainer, I'd be
> tempted to take it up to rc3.

Thanks for the impact analysis.

The fix seems harmless to me, but I'm fine having it fixed in 6.1
(this is an old bug, so not critical to have it fixed for 6.0).

Phil.



Re: [PATCH] docs: add nvme emulation documentation

2021-04-08 Thread Peter Maydell
On Thu, 8 Apr 2021 at 10:54, Klaus Jensen  wrote:
>
> Hi Peter,
>
> Are documentation updates acceptable for -rc3?

Yes; they're safe changes, generally.

thanks
-- PMM



Re: [PATCH] docs: add nvme emulation documentation

2021-04-08 Thread Klaus Jensen

Hi Peter,

Are documentation updates acceptable for -rc3?

On Apr  8 11:50, Klaus Jensen wrote:

From: Klaus Jensen 

Remove the docs/specs/nvme.txt and replace it with proper documentation
in docs/system/nvme.rst.

Signed-off-by: Klaus Jensen 
---
docs/specs/nvme.txt   |  23 -
docs/system/index.rst |   1 +
docs/system/nvme.rst  | 212 ++
MAINTAINERS   |   2 +-
4 files changed, 214 insertions(+), 24 deletions(-)
delete mode 100644 docs/specs/nvme.txt
create mode 100644 docs/system/nvme.rst

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
deleted file mode 100644
index 56d393884e7a..
--- a/docs/specs/nvme.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-NVM Express Controller
-==
-
-The nvme device (-device nvme) emulates an NVM Express Controller.
-
-
-Reference Specifications
-
-
-The device currently implements most mandatory features of NVMe v1.3d, see
-
-  https://nvmexpress.org/resources/specifications/
-
-for the specification.
-
-
-Known issues
-
-
-* The accounting numbers in the SMART/Health are reset across power cycles
-
-* Interrupt Coalescing is not supported and is disabled by default in volation
-  of the specification.
diff --git a/docs/system/index.rst b/docs/system/index.rst
index 02d07071810f..b05af716a973 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -23,6 +23,7 @@ Contents:
   net
   virtio-net-failover
   usb
+   nvme
   ivshmem
   linuxboot
   generic-loader
diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst
new file mode 100644
index ..ab760d9888f9
--- /dev/null
+++ b/docs/system/nvme.rst
@@ -0,0 +1,212 @@
+==
+NVMe Emulation
+==
+
+QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
+``nvme-subsys`` devices.
+
+See the following sections for specific information on
+
+  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
+  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
+`Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
+Protection`_,
+
+Adding NVMe Devices
+===
+
+Controller Emulation
+
+
+The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
+specification. All mandatory features are implemented with a couple of exceptions
+and limitations:
+
+  * Accounting numbers in the SMART/Health log page are reset when the device
+is power cycled.
+  * Interrupt Coalescing is not supported and is disabled by default.
+
+The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
+following parameters:
+
+.. code-block:: console
+
+-drive file=nvm.img,if=none,id=nvm
+-device nvme,serial=deadbeef,drive=nvm
+
+There are a number of optional general parameters for the ``nvme`` device. Some
+are mentioned here, but see ``-device nvme,help`` to list all possible
+parameters.
+
+``max_ioqpairs=UINT32`` (default: ``64``)
+  Set the maximum number of allowed I/O queue pairs. This replaces the
+  deprecated ``num_queues`` parameter.
+
+``msix_qsize=UINT16`` (default: ``65``)
+  The number of MSI-X vectors that the device should support.
+
+``mdts=UINT8`` (default: ``7``)
+  Set the Maximum Data Transfer Size of the device.
+
+``use-intel-id`` (default: ``off``)
+  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
+  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
+  previously used.
+
+Additional Namespaces
+-
+
+In the simplest possible invocation sketched above, the device only supports a
+single namespace with the namespace identifier ``1``. To support multiple
+namespaces and additional features, the ``nvme-ns`` device must be used.
+
+.. code-block:: console
+
+   -device nvme,id=nvme-ctrl-0,serial=deadbeef
+   -drive file=nvm-1.img,if=none,id=nvm-1
+   -device nvme-ns,drive=nvm-1
+   -drive file=nvm-2.img,if=none,id=nvm-2
+   -device nvme-ns,drive=nvm-2
+
+The namespaces defined by the ``nvme-ns`` device will attach to the most
+recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
+identifiers are allocated automatically, starting from ``1``.
+
+There are a number of parameters available:
+
+``nsid`` (default: ``0``)
+  Explicitly set the namespace identifier.
+
+``uuid`` (default: *autogenerated*)
+  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
+  descriptor in the Namespace Identification Descriptor List.
+
+``bus``
+  If there are more ``nvme`` devices defined, this parameter may be used to
+  attach the namespace to a specific ``nvme`` device (identified by an ``id``
+  parameter on the controller device).
+
+NVM Subsystems
+--
+
+Additional features become available if the controller device (``nvme``) is
+linked to an NVM Subsystem device (``nvme-subsys``).
+
+The NVM Subsystem emulation allows features 

[PATCH] docs: add nvme emulation documentation

2021-04-08 Thread Klaus Jensen
From: Klaus Jensen 

Remove the docs/specs/nvme.txt and replace it with proper documentation
in docs/system/nvme.rst.

Signed-off-by: Klaus Jensen 
---
 docs/specs/nvme.txt   |  23 -
 docs/system/index.rst |   1 +
 docs/system/nvme.rst  | 212 ++
 MAINTAINERS   |   2 +-
 4 files changed, 214 insertions(+), 24 deletions(-)
 delete mode 100644 docs/specs/nvme.txt
 create mode 100644 docs/system/nvme.rst

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
deleted file mode 100644
index 56d393884e7a..
--- a/docs/specs/nvme.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-NVM Express Controller
-==
-
-The nvme device (-device nvme) emulates an NVM Express Controller.
-
-
-Reference Specifications
-
-
-The device currently implements most mandatory features of NVMe v1.3d, see
-
-  https://nvmexpress.org/resources/specifications/
-
-for the specification.
-
-
-Known issues
-
-
-* The accounting numbers in the SMART/Health are reset across power cycles
-
-* Interrupt Coalescing is not supported and is disabled by default in volation
-  of the specification.
diff --git a/docs/system/index.rst b/docs/system/index.rst
index 02d07071810f..b05af716a973 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -23,6 +23,7 @@ Contents:
net
virtio-net-failover
usb
+   nvme
ivshmem
linuxboot
generic-loader
diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst
new file mode 100644
index ..ab760d9888f9
--- /dev/null
+++ b/docs/system/nvme.rst
@@ -0,0 +1,212 @@
+==
+NVMe Emulation
+==
+
+QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
+``nvme-subsys`` devices.
+
+See the following sections for specific information on
+
+  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
+  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
+`Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
+Protection`_,
+
+Adding NVMe Devices
+===
+
+Controller Emulation
+
+
+The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
+specification. All mandatory features are implemented with a couple of exceptions
+and limitations:
+
+  * Accounting numbers in the SMART/Health log page are reset when the device
+is power cycled.
+  * Interrupt Coalescing is not supported and is disabled by default.
+
+The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the
+following parameters:
+
+.. code-block:: console
+
+-drive file=nvm.img,if=none,id=nvm
+-device nvme,serial=deadbeef,drive=nvm
+
+There are a number of optional general parameters for the ``nvme`` device. Some
+are mentioned here, but see ``-device nvme,help`` to list all possible
+parameters.
+
+``max_ioqpairs=UINT32`` (default: ``64``)
+  Set the maximum number of allowed I/O queue pairs. This replaces the
+  deprecated ``num_queues`` parameter.
+
+``msix_qsize=UINT16`` (default: ``65``)
+  The number of MSI-X vectors that the device should support.
+
+``mdts=UINT8`` (default: ``7``)
+  Set the Maximum Data Transfer Size of the device.
+
+``use-intel-id`` (default: ``off``)
+  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
+  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
+  previously used.
+
+Additional Namespaces
+-
+
+In the simplest possible invocation sketched above, the device only supports a
+single namespace with the namespace identifier ``1``. To support multiple
+namespaces and additional features, the ``nvme-ns`` device must be used.
+
+.. code-block:: console
+
+   -device nvme,id=nvme-ctrl-0,serial=deadbeef
+   -drive file=nvm-1.img,if=none,id=nvm-1
+   -device nvme-ns,drive=nvm-1
+   -drive file=nvm-2.img,if=none,id=nvm-2
+   -device nvme-ns,drive=nvm-2
+
+The namespaces defined by the ``nvme-ns`` device will attach to the most
+recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace
+identifiers are allocated automatically, starting from ``1``.
+
+There are a number of parameters available:
+
+``nsid`` (default: ``0``)
+  Explicitly set the namespace identifier.
+
+``uuid`` (default: *autogenerated*)
+  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
+  descriptor in the Namespace Identification Descriptor List.
+
+``bus``
+  If there are more ``nvme`` devices defined, this parameter may be used to
+  attach the namespace to a specific ``nvme`` device (identified by an ``id``
+  parameter on the controller device).
+
+NVM Subsystems
+--
+
+Additional features become available if the controller device (``nvme``) is
+linked to an NVM Subsystem device (``nvme-subsys``).
+
+The NVM Subsystem emulation allows features such as shared namespaces and
+multipath I/O.
+
+.. code-block:: console
+
+   -device 

Re: [PATCH-for-6.0?] hw/block/fdc: Fix 'fallback' property on sysbus floppy disk controllers

2021-04-08 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> Setting the 'fallback' property corrupts the QOM instance state
> (FDCtrlSysBus) because it accesses an incorrect offset (it uses
> the offset of the FDCtrlISABus state).
>
> Fixes: a73275dd6fc ("fdc: Add fallback option")
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/block/fdc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 82afda7f3a7..a825c2acbae 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -2893,7 +2893,7 @@ static Property sysbus_fdc_properties[] = {
>  DEFINE_PROP_SIGNED("fdtypeB", FDCtrlSysBus, 
> state.qdev_for_drives[1].type,
>  FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type,
>  FloppyDriveType),
> -DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback,
> +DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
>  FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
>  FloppyDriveType),
>  DEFINE_PROP_END_OF_LIST(),
> @@ -2918,7 +2918,7 @@ static Property sun4m_fdc_properties[] = {
>  DEFINE_PROP_SIGNED("fdtype", FDCtrlSysBus, state.qdev_for_drives[0].type,
>  FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type,
>  FloppyDriveType),
> -DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback,
> +DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
>  FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
>  FloppyDriveType),
>  DEFINE_PROP_END_OF_LIST(),

Reviewed-by: Markus Armbruster 

On whether to pick this into 6.0...

The patch has no effect unless someone or something uses "fallback" with
a non-ISA FDC.  There it fixes a bug.  The bug's exact impact is
unknown.  I figure I could find out, but it doesn't seem to be worth the
bother.

Commit a73275dd6fc:

Currently, QEMU chooses a drive type automatically based on the inserted
media. If there is no disk inserted, it chooses a 1.44MB drive type.

Change this behavior to be configurable, but leave it defaulted to 1.44.

This is not earnestly intended to be used by a user or a management
library, but rather exists so that pre-2.6 board types can configure it
to be a legacy value.

We do so only for "isa-fdc", in hw/core/machine.c.

I don't understand why we don't for the other devices, but that's
outside this patch's scope.

Downstreams could do it, but it wouldn't work.  They need this commit to
make it work.

Users (human or management application) should not use it, but of course
they might anyway.  This commit makes such (unadvisable) usage safe.

The reward is low, but so is the risk.  If I was the maintainer, I'd be
tempted to take it up to rc3.




Re: [RFC PATCH v2 00/11] qemu_iotests: improve debugging options

2021-04-08 Thread Markus Armbruster
Emanuele Giuseppe Esposito  writes:

> This series adds the option to attach gdbserver and valgrind
> to the QEMU binary running in qemu_iotests.
> It also allows redirecting the output of the QEMU binaries in the python tests
> to stdout, instead of a log file.
>
> Patches 1-6 introduce the -gdb option to both python and bash tests, 
> 7-10 extend the already existing -valgrind flag to work also on 
> python tests, and patch 11 introduces -p to enable logging to stdout.
>
> In particular, patches 1,2,4,8 focus on extending the QMP socket timers
> when using gdb/valgrind, otherwise the python tests will fail due to
> delays in the QMP responses.
>
> This series is tested on the previous series
> "qemu-iotests: quality of life improvements"
> but is independent from it, so it can be applied separately.
>
> Signed-off-by: Emanuele Giuseppe Esposito 

How discoverable are these goodies for developers with only superficial
knowledge of iotests?




Re: [PATCH 0/2] block/rbd: fix memory leaks

2021-04-08 Thread Stefano Garzarella

On Wed, Apr 07, 2021 at 11:38:17AM +0200, Markus Armbruster wrote:

Max Reitz  writes:


On 29.03.21 17:01, Stefano Garzarella wrote:

This series fixes two memory leaks, found through valgrind, in the
rbd driver.
Stefano Garzarella (2):
   block/rbd: fix memory leak in qemu_rbd_connect()
   block/rbd: fix memory leak in qemu_rbd_co_create_opts()
  block/rbd.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)


Reviewed-by: Max Reitz 

I’m not quite sure whether this is fit for 6.0...  I think it’s too
late for rc2, so I don’t know.


This is the maintainers' call to make.

* PATCH 1:

 CON: Old bug, probably 2.9, i.e. four years

 PRO: The fix is straightforward

* PATCH 2:

 NEUTRAL: Not recent from upstream's point of view (5.0), but
 downstreams may have different ideas

 PRO: The fix is trivial

I encourage you to take at least PATCH 2.



Kevin queued them up, thank you both for the review,
Stefano




Re: [PATCH 1/2] block/rbd: fix memory leak in qemu_rbd_connect()

2021-04-08 Thread Stefano Garzarella

On Tue, Apr 06, 2021 at 10:22:30AM +0200, Markus Armbruster wrote:

Stefano Garzarella  writes:


In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host()
using g_strjoinv(), but it's only freed in the error path, leaking
memory in the success path as reported by valgrind:

  80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516
 at 0x4839809: malloc (vg_replace_malloc.c:307)
 by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
 by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8)
 by 0x87D07E: qemu_rbd_mon_host (rbd.c:538)
 by 0x87D07E: qemu_rbd_connect (rbd.c:562)
 by 0x87E1CE: qemu_rbd_open (rbd.c:740)
 by 0x840EB1: bdrv_open_driver (block.c:1528)
 by 0x8453A9: bdrv_open_common (block.c:1802)
 by 0x8453A9: bdrv_open_inherit (block.c:3444)
 by 0x8464C2: bdrv_open (block.c:3537)
 by 0x8108CD: qmp_blockdev_add (blockdev.c:3569)
 by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086)
 by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131)
 by 0x907EA4: aio_bh_poll (async.c:164)

Fix freeing 'mon_host' also when qemu_rbd_connect() ends correctly.

Signed-off-by: Stefano Garzarella 


I believe this
Fixes: 0a55679b4a5061f4d74bdb1a0e81611ba3390b00


Yep :-)



Reviewed-by: Markus Armbruster 



Thanks,
Stefano
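
As context for the trace quoted above, a sketch of the kind of valgrind run
that produces such reports (the QEMU command line is illustrative; any path
that opens an rbd blockdev will do):

  valgrind --leak-check=full --show-leak-kinds=definite \
  qemu-system-x86_64 -blockdev driver=rbd,node-name=rbd0,pool=rbd,image=test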




Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-04-08 Thread Jason Wang



On 2021/4/8 at 1:51 PM, Dongli Zhang wrote:


On 4/6/21 7:20 PM, Jason Wang wrote:

On 2021/4/7 at 7:27 AM, Dongli Zhang wrote:

This will answer your question "Can it bypass the masking?".

For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to the eventfd
cannot bypass masking, because masking unregisters the eventfd, so the write
does not take effect.

However, it is possible to bypass masking for vhost-net, because vhost-net
registers a dedicated masked_notifier eventfd in order to mask the irq, so
writing to the original eventfd still takes effect.

We may leave it to the user to decide whether to write to 'masked_notifier' or
the original 'guest_notifier' for vhost-net.

My fault here. Writing to masked_notifier will always be masked :(


Only when there's no bug in QEMU.



If it is at the EventNotifier level, we will not care whether the EventNotifier is
masked or not. It just provides an interface to write to the EventNotifier.


Yes.



Dumping the MSI-X table for both virtio and vfio will help confirm whether the vector
is masked.


That would be helpful as well. It's probably better to extend the "info pci" 
command.

Thanks

I will check whether to add it to "info pci" (introduce a new argument option to "info pci"), or
to introduce a new command.



It's better to just reuse "info pci".




About the EventNotifiers, I will classify them as guest notifiers or host notifiers
so that it will be much easier for the user to tell whether the eventfd is for
injecting an IRQ or kicking the doorbell.



Sounds good.




Thank you very much for all suggestions!

Dongli Zhang



Thanks





Thank you very much!

Dongli Zhang