Re: [PATCH] hw/block/nvme: map prp fix if prp2 contains non-zero offset
On Thu, Apr 08, 2021 at 09:53:13PM +0530, Padmakar Kalghatgi wrote:
> +/*
> + * The first PRP list entry, pointed by PRP2 can contain
> + * offsets. Hence, we need calculate the no of entries in
> + * prp2 based on the offset it has.
> + */

This comment has some unnecessary spacing at the beginning.

> +nents = (n->page_size - (prp2 % n->page_size)) >> 3;

page_size is always a power of two, so let's replace the costly modulo
with:

    nents = (n->page_size - (prp2 & (n->page_size - 1))) >> 3;
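The suggested replacement relies on the standard identity that for a
power-of-two p, x % p == x & (p - 1). A quick sketch (in Python, with an
illustrative 4 KiB page size — not taken from the QEMU code) showing the
two forms agree:

```python
def nents_mod(prp2: int, page_size: int) -> int:
    # Original form: bytes remaining in the page, divided by 8
    # (the size of one PRP entry).
    return (page_size - (prp2 % page_size)) >> 3

def nents_mask(prp2: int, page_size: int) -> int:
    # Suggested form: valid only when page_size is a power of two.
    return (page_size - (prp2 & (page_size - 1))) >> 3

# The two agree for every offset within (and beyond) a 4 KiB page.
page_size = 4096
assert all(nents_mod(p, page_size) == nents_mask(p, page_size)
           for p in range(2 * page_size))
print(nents_mod(0, page_size))     # 512 entries fit in a whole page
```

The compiler would likely perform this strength reduction itself when
page_size were a compile-time constant; here it is a runtime field, so
spelling out the mask helps.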
Re: [RFC PATCH v2 02/11] python: qemu: pass the wrapper field from QEMUQtestmachine to QEMUMachine
On 4/7/21 9:50 AM, Emanuele Giuseppe Esposito wrote:
> Signed-off-by: Emanuele Giuseppe Esposito
> ---
>  python/qemu/machine.py | 2 +-
>  python/qemu/qtest.py   | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index c721e07d63..18d32ebe45 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -109,7 +109,7 @@ def __init__(self,
>          self._binary = binary
>          self._args = list(args)
> -        self._wrapper = wrapper
> +        self._wrapper = list(wrapper)

Unrelated change? (I'm assuming you want to copy the user's input to
explicitly avoid sharing state. Commit message blurb for this would be
good.)

>          self._name = name or "qemu-%d" % os.getpid()
>          self._test_dir = test_dir
> diff --git a/python/qemu/qtest.py b/python/qemu/qtest.py
> index 0d01715086..4c90daf430 100644
> --- a/python/qemu/qtest.py
> +++ b/python/qemu/qtest.py
> @@ -111,6 +111,7 @@ class QEMUQtestMachine(QEMUMachine):
>      def __init__(self,
>                   binary: str,
>                   args: Sequence[str] = (),
> +                 wrapper: Sequence[str] = (),
>                   name: Optional[str] = None,
>                   test_dir: str = "/var/tmp",
>                   socket_scm_helper: Optional[str] = None,
> @@ -119,7 +120,8 @@ def __init__(self,
>              name = "qemu-%d" % os.getpid()
>          if sock_dir is None:
>              sock_dir = test_dir
> -        super().__init__(binary, args, name=name, test_dir=test_dir,
> +        super().__init__(binary, args, wrapper=wrapper, name=name,
> +                         test_dir=test_dir,
>                           socket_scm_helper=socket_scm_helper,
>                           sock_dir=sock_dir)
>          self._qtest: Optional[QEMUQtestProtocol] = None

ACK
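The `list(wrapper)` copy the review asks about guards against the
classic shared-mutable-state pitfall: if a constructor stores the
caller's sequence directly, later mutations by the caller leak into the
object. A minimal sketch (class names are illustrative, not the actual
QEMUMachine code):

```python
class CopyingMachine:
    def __init__(self, wrapper=()):
        # Defensive copy: our view of the wrapper can no longer be
        # changed behind our back.
        self._wrapper = list(wrapper)

class NaiveMachine:
    def __init__(self, wrapper=()):
        self._wrapper = wrapper  # shares the caller's list

shared = ["gdb", "--args"]
safe, naive = CopyingMachine(shared), NaiveMachine(shared)
shared.append("--quiet")          # caller mutates the list afterwards

print(safe._wrapper)   # ['gdb', '--args']
print(naive._wrapper)  # ['gdb', '--args', '--quiet']
```

The copy also normalizes any `Sequence[str]` (including the default
tuple `()`) to a list, matching how `args` is already handled.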
Re: [RFC PATCH v2 01/11] python: qemu: add timer parameter for qmp.accept socket
On 4/7/21 9:50 AM, Emanuele Giuseppe Esposito wrote:
> Extend the _post_launch function to include the timer as parameter
> instead of defaulting to 15 sec.
>
> Signed-off-by: Emanuele Giuseppe Esposito
> ---
>  python/qemu/machine.py | 4 ++--
>  python/qemu/qtest.py   | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index 6e44bda337..c721e07d63 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -321,9 +321,9 @@ def _pre_launch(self) -> None:
>              nickname=self._name
>          )
>
> -    def _post_launch(self) -> None:
> +    def _post_launch(self, timer) -> None:
>          if self._qmp_connection:
> -            self._qmp.accept()
> +            self._qmp.accept(timer)
>
>      def _post_shutdown(self) -> None:
>          """
> diff --git a/python/qemu/qtest.py b/python/qemu/qtest.py
> index 39a0cf62fe..0d01715086 100644
> --- a/python/qemu/qtest.py
> +++ b/python/qemu/qtest.py
> @@ -138,9 +138,9 @@ def _pre_launch(self) -> None:
>          super()._pre_launch()
>          self._qtest = QEMUQtestProtocol(self._qtest_path, server=True)
>
> -    def _post_launch(self) -> None:
> +    def _post_launch(self, timer) -> None:
>          assert self._qtest is not None
> -        super()._post_launch()
> +        super()._post_launch(timer)
>          self._qtest.accept()
>
>      def _post_shutdown(self) -> None:

Are you forgetting to change _launch() to provide some default value for
what timer needs to be?

I think for the "event" callbacks here, I'd prefer configuring the
behavior as a property instead of passing it around as a parameter.

(Also, we have an awful lot of timeouts now... is it time to think about
rewriting this using asyncio so that we can allow the callers to specify
their own timeouts in with context blocks? Just a thought for later; we
have an awful lot of timeouts scattered throughout machine.py, qmp.py,
etc.)

--js
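The property-based alternative suggested above would keep
`_post_launch()`'s signature unchanged and let callers configure the
timeout once, up front. A rough sketch of the idea, with invented names
(`qmp_timer` is hypothetical, not the actual machine.py API):

```python
class Machine:
    def __init__(self):
        self._qmp_timer = 15.0  # default accept() timeout, in seconds

    @property
    def qmp_timer(self):
        """Timeout used when accepting the QMP connection."""
        return self._qmp_timer

    @qmp_timer.setter
    def qmp_timer(self, value):
        self._qmp_timer = value

    def _post_launch(self):
        # Signature stays stable; behavior is configured via the
        # property instead of being threaded through every override.
        return f"accept(timeout={self._qmp_timer})"

m = Machine()
m.qmp_timer = 0.0        # e.g. disable the timeout when running under gdb
print(m._post_launch())  # accept(timeout=0.0)
```

Subclasses like QEMUQtestMachine would then not need their
`_post_launch()` overrides touched at all when the timeout policy
changes.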
[PATCH 2/2] hw/block/nvme: drain namespaces on sq deletion
From: Klaus Jensen

For most commands, when issuing an AIO, the BlockAIOCB is stored in the
NvmeRequest aiocb pointer when the AIO is issued. The main use of this
is cancelling AIOs when deleting submission queues (it is currently not
used for Abort).

However, some commands like Dataset Management and Zone Management Send
(zone reset) may involve more than one AIO, and here the AIOs are issued
without saving a reference to the BlockAIOCB. This is a problem since
nvme_del_sq() will attempt to cancel outstanding AIOs, potentially with
an invalid BlockAIOCB since the aiocb pointer is not NULL'ed when the
request structure is recycled.

Fix this by

  1. making sure the aiocb pointer is NULL'ed when requests are recycled
  2. only attempting to cancel the AIO if the aiocb is non-NULL
  3. if any AIOs could not be cancelled, draining all aio as a last
     resort.

Fixes: dc04d25e2f3f ("hw/block/nvme: add support for the format nvm command")
Fixes: c94973288cd9 ("hw/block/nvme: add broadcast nsid support flush command")
Fixes: e4e430b3d6ba ("hw/block/nvme: add simple copy command")
Fixes: 5f5dc4c6a942 ("hw/block/nvme: zero out zones on reset")
Fixes: 2605257a26b8 ("hw/block/nvme: add the dataset management command")
Cc: Gollu Appalanaidu
Cc: Minwoo Im
Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 94bc373260be..3c4297e38a52 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -470,6 +470,7 @@ static void nvme_req_clear(NvmeRequest *req)
 {
     req->ns = NULL;
     req->opaque = NULL;
+    req->aiocb = NULL;
     memset(&req->cqe, 0x0, sizeof(req->cqe));
     req->status = NVME_SUCCESS;
 }
@@ -3681,6 +3682,7 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest *req)
     NvmeSQueue *sq;
     NvmeCQueue *cq;
     uint16_t qid = le16_to_cpu(c->qid);
+    int nsid;

     if (unlikely(!qid || nvme_check_sqid(n, qid))) {
         trace_pci_nvme_err_invalid_del_sq(qid);
@@ -3692,9 +3694,26 @@ static uint16_t nvme_del_sq(NvmeCtrl *n, NvmeRequest *req)

     sq = n->sq[qid];
     while (!QTAILQ_EMPTY(&sq->out_req_list)) {
         r = QTAILQ_FIRST(&sq->out_req_list);
-        assert(r->aiocb);
-        blk_aio_cancel(r->aiocb);
+        if (r->aiocb) {
+            blk_aio_cancel(r->aiocb);
+        }
     }
+
+    /*
+     * Drain all namespaces if there are still outstanding requests that we
+     * could not cancel explicitly.
+     */
+    if (!QTAILQ_EMPTY(&sq->out_req_list)) {
+        for (nsid = 1; nsid <= NVME_MAX_NAMESPACES; nsid++) {
+            NvmeNamespace *ns = nvme_ns(n, nsid);
+            if (ns) {
+                nvme_ns_drain(ns);
+            }
+        }
+    }
+
+    assert(QTAILQ_EMPTY(&sq->out_req_list));
+
     if (!nvme_check_cqid(n, sq->cqid)) {
         cq = n->cq[sq->cqid];
         QTAILQ_REMOVE(&cq->sq_list, sq, entry);
--
2.31.1
[PATCH 1/2] hw/block/nvme: store aiocb in compare
From: Klaus Jensen

nvme_compare() fails to store the aiocb from the blk_aio_preadv() call.
Fix this.

Fixes: 0a384f923f51 ("hw/block/nvme: add compare command")
Cc: Gollu Appalanaidu
Signed-off-by: Klaus Jensen
---
 hw/block/nvme.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 6b1f056a0ebc..94bc373260be 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2837,7 +2837,8 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest *req)
     block_acct_start(blk_get_stats(blk), &req->acct, data_len,
                      BLOCK_ACCT_READ);
-    blk_aio_preadv(blk, offset, &ctx->data.iov, 0, nvme_compare_data_cb, req);
+    req->aiocb = blk_aio_preadv(blk, offset, &ctx->data.iov, 0,
+                                nvme_compare_data_cb, req);
     return NVME_NO_COMPLETE;
 }
--
2.31.1
Re: [PATCH] hw/block/nvme: map prp fix if prp2 contains non-zero offset
On Apr 8 21:53, Padmakar Kalghatgi wrote:
> From: padmakar
>
> nvme_map_prp needs to calculate the number of list entries based on
> the offset value. For the subsequent PRP2 list, need to ensure the
> number of entries is within the MAX number of PRP entries for a page.
>
> Signed-off-by: Padmakar Kalghatgi
> ---
>  hw/block/nvme.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index d439e44..efb7368 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -577,7 +577,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
>      uint32_t nents, prp_trans;
>      int i = 0;
>
> -    nents = (len + n->page_size - 1) >> n->page_bits;
> +    /*
> +     * The first PRP list entry, pointed by PRP2 can contain
> +     * offsets. Hence, we need calculate the no of entries in
> +     * prp2 based on the offset it has.
> +     */
> +    nents = (n->page_size - (prp2 % n->page_size)) >> 3;
>      prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
>      ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
>      if (ret) {
> @@ -588,7 +593,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
>      while (len != 0) {
>          uint64_t prp_ent = le64_to_cpu(prp_list[i]);
>
> -        if (i == n->max_prp_ents - 1 && len > n->page_size) {
> +        if (i == nents - 1 && len > n->page_size) {
>              if (unlikely(prp_ent & (n->page_size - 1))) {
>                  trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
>                  status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
> @@ -597,7 +602,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
>              i = 0;
>              nents = (len + n->page_size - 1) >> n->page_bits;
> -            prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
> +            nents = MIN(nents, n->max_prp_ents);
> +            prp_trans = nents * sizeof(uint64_t);
>              ret = nvme_addr_read(n, prp_ent, (void *)prp_list, prp_trans);
>              if (ret) {
> --
> 2.7.0.windows.1

LGTM.

Reviewed-by: Klaus Jensen
Re: [RFC PATCH v2 04/11] qemu-iotests: delay QMP socket timers
On Thu, Apr 8, 2021, 18:06 Emanuele Giuseppe Esposito wrote:
> On 08/04/2021 17:40, Paolo Bonzini wrote:
>> On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote:
>>>  def get_qmp_events_filtered(self, wait=60.0):
>>>      result = []
>>> -    for ev in self.get_qmp_events(wait=wait):
>>> +    qmp_wait = wait
>>> +    if qemu_gdb:
>>> +        qmp_wait = 0.0
>>> +    for ev in self.get_qmp_events(wait=qmp_wait):
>>>          result.append(filter_qmp_event(ev))
>>>      return result
>>
>> Should this be handled in get_qmp_events instead, since you're
>> basically changing all the callers?
>
> get_qmp_events is in python/machine.py, which as I understand might be
> used also by some other scripts, so I want to keep the changes there to
> the minimum. Also, machine.py has no access to qemu_gdb or
> qemu_valgrind, so passing a boolean or something to delay the timer
> would still require to add a similar check in all sections.
>
> Or do you have a cleaner way to do this?

Maybe a subclass IotestsMachine?

Paolo
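The subclass suggested at the end of the thread would keep machine.py
untouched while centralizing the gdb-aware timeout policy in one place
for all iotests callers. A hypothetical sketch (`IotestsMachine` and the
`qemu_gdb` flag are names from the discussion, not existing code; the
base class is a stand-in for QEMUMachine):

```python
class Machine:
    """Stand-in for python/qemu/machine.py's QEMUMachine."""
    def get_qmp_events(self, wait=60.0):
        # Real code would poll the QMP socket; return the wait actually
        # used so the override below is observable.
        return [{"event": "SHUTDOWN", "wait_used": wait}]

qemu_gdb = True  # iotests-level knob: the VM runs under gdb

class IotestsMachine(Machine):
    """Override the timeout policy once, for every caller."""
    def get_qmp_events(self, wait=60.0):
        if qemu_gdb:
            wait = 0.0  # no timeout while a human drives gdb
        return super().get_qmp_events(wait=wait)

print(IotestsMachine().get_qmp_events()[0]["wait_used"])  # 0.0
```

This keeps callers such as get_qmp_events_filtered() unchanged, instead
of sprinkling `if qemu_gdb:` checks across every call site.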
[PATCH] hw/block/nvme: map prp fix if prp2 contains non-zero offset
From: padmakar

nvme_map_prp needs to calculate the number of list entries based on the
offset value. For the subsequent PRP2 list, need to ensure the number of
entries is within the MAX number of PRP entries for a page.

Signed-off-by: Padmakar Kalghatgi
---
 hw/block/nvme.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d439e44..efb7368 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -577,7 +577,12 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
     uint32_t nents, prp_trans;
     int i = 0;

-    nents = (len + n->page_size - 1) >> n->page_bits;
+    /*
+     * The first PRP list entry, pointed by PRP2 can contain
+     * offsets. Hence, we need calculate the no of entries in
+     * prp2 based on the offset it has.
+     */
+    nents = (n->page_size - (prp2 % n->page_size)) >> 3;
     prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
     ret = nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
     if (ret) {
@@ -588,7 +593,7 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
     while (len != 0) {
         uint64_t prp_ent = le64_to_cpu(prp_list[i]);

-        if (i == n->max_prp_ents - 1 && len > n->page_size) {
+        if (i == nents - 1 && len > n->page_size) {
             if (unlikely(prp_ent & (n->page_size - 1))) {
                 trace_pci_nvme_err_invalid_prplist_ent(prp_ent);
                 status = NVME_INVALID_PRP_OFFSET | NVME_DNR;
@@ -597,7 +602,8 @@ static uint16_t nvme_map_prp(NvmeCtrl *n, NvmeSg *sg, uint64_t prp1,
             i = 0;
             nents = (len + n->page_size - 1) >> n->page_bits;
-            prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
+            nents = MIN(nents, n->max_prp_ents);
+            prp_trans = nents * sizeof(uint64_t);
             ret = nvme_addr_read(n, prp_ent, (void *)prp_list, prp_trans);
             if (ret) {
--
2.7.0.windows.1
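The arithmetic being fixed: a PRP list pointed to by PRP2 may start at a
non-zero offset into a page, so only the entries between that offset and
the end of the page can be read in the first transfer (each PRP entry is
8 bytes). A small Python model of the calculation (the page size and
addresses are illustrative values, not from the patch):

```python
def first_prp_list_nents(prp2: int, page_size: int) -> int:
    # Number of 8-byte PRP entries from prp2's in-page offset to the
    # end of the page.
    return (page_size - (prp2 % page_size)) >> 3

page_size = 4096  # => a whole page holds 512 PRP entries

# No offset: the full page of entries is available.
print(first_prp_list_nents(0x200000, page_size))  # 512

# Offset 0x10 into the page: two entries fewer fit.
print(first_prp_list_nents(0x200010, page_size))  # 510
```

The old code always assumed a full page (`(len + page_size - 1) >>
page_bits` entries, capped at max_prp_ents), which over-reads when PRP2
carries an offset; hence the second hunk also caps the chained-list
entry count at max_prp_ents explicitly.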
Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
08.04.2021 20:04, John Snow wrote:
> On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
>> job-complete command is async. Can we instead just add a boolean like
>> job->completion_requested, and set it if job-complete called in
>> STANDBY state, and on job_resume job_complete will be called
>> automatically if this boolean is true?
>
> job_complete has a synchronous setup, though -- we lose out on a lot
> of synchronous error checking in that circumstance.

yes, that's a problem..

> I was not able to audit it to determine that it'd be safe to attempt
> that setup during a drained section -- I imagine it won't work and
> will fail, though. So I thought we'd have to signal completion and run
> the setup *later*, but what do we do if we get an error then? Does the
> entire job fail? Do we emit some new event?
> ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it recoverable?

Isn't it possible even now, that after a successful job-complete the job
still fails and we report BLOCK_JOB_COMPLETED with an error?

And actually, how much benefit does the user get from the fact that
job-complete may fail? We can make job-complete a simple always-success
boolean flag setter, like job-pause. The actual completion would then be
done in the background, when possible. And if it fails, the job just
fails, like it does for any background I/O error. The user has to check
the error/success status of the final BLOCK_JOB_COMPLETED anyway.

--
Best regards,
Vladimir
Re: [PATCH v2 00/10] block/nbd: move connection code to separate file
On Thu, Apr 08, 2021 at 05:08:17PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> This substitutes "[PATCH 00/14] nbd: move reconnect-thread to separate file"
> Supersedes: <20210407104637.36033-1-vsement...@virtuozzo.com>
>
> I want to simplify block/nbd.c which is overcomplicated now. First step
> is splitting out what could be split.
>
> These series creates new file nbd/client-connection.c and part of
> block/nbd.c is refactored and moved.
>
> v2 is mostly rewritten. I decided move larger part, otherwise it
> doesn't make real sense.
>
> Note also that v2 is based on master. Patch 01 actually solves same
> problem as
> "[PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread" [*]
> in a smarter way. So, if [*] goes first, this will be rebased to undo
> [*].
>
> Vladimir Sementsov-Ogievskiy (10):
>   block/nbd: introduce NBDConnectThread reference counter
>   block/nbd: BDRVNBDState: drop unused connect_err and connect_status
>   util/async: aio_co_enter(): do aio_co_schedule in general case
>   block/nbd: simplify waking of nbd_co_establish_connection()
>   block/nbd: drop thr->state
>   block/nbd: bs-independent interface for nbd_co_establish_connection()
>   block/nbd: make nbd_co_establish_connection_cancel() bs-independent
>   block/nbd: rename NBDConnectThread to NBDClientConnection
>   block/nbd: introduce nbd_client_connection_new()
>   nbd: move connection code from block/nbd to nbd/client-connection
>
>  include/block/nbd.h     |  11 ++
>  block/nbd.c             | 288 ++--
>  nbd/client-connection.c | 192 +++
>  util/async.c            |  11 +-
>  nbd/meson.build         |   1 +
>  5 files changed, 218 insertions(+), 285 deletions(-)
>  create mode 100644 nbd/client-connection.c

I think this is a nice cleanup overall, and makes the logic in
block/nbd.c much easier to reason about. I guess it's 6.1 material
though, as it looks somewhat too big for 6.0, and the only serious bug
it actually fixes can be addressed with a band-aid mentioned above.

The problem I originally came across, that of the requests being
canceled on drain despite reconnect, still remains, but I think the fix
for it should build on this series (and thus probably wait till after
6.0).

Thanks,
Roman.
Re: [PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete
On 4/8/21 12:20 PM, Max Reitz wrote:
> Hi,
>
> See patch 1 for a detailed explanation of the problem. The gist is:
> Draining a READY job makes it transition to STANDBY, and jobs on
> STANDBY cannot be completed. Ending the drained section will schedule
> the job (so it is then resumed), but not wait until it is actually
> running again.
>
> Therefore, it can happen that issuing block-job-complete fails when
> you issue it right after some draining operation.
>
> I tried to come up with an iotest reproducer, but in the end I only
> got something that reproduced the issue like 2/10 times, and it
> required heavy I/O, so it is nothing I would like to have as part of
> the iotests. Instead, I opted for a unit test, which allows me to
> cheat a bit (specifically, locking the job IO thread before ending the
> drained section).
>
> Max Reitz (3):
>   job: Add job_wait_unpaused() for block-job-complete
>   test-blockjob: Test job_wait_unpaused()
>   iotests/041: block-job-complete on user-paused job
>
>  include/qemu/job.h         |  15 
>  blockdev.c                 |   3 +
>  job.c                      |  42 +++
>  tests/unit/test-blockjob.c | 140 +
>  tests/qemu-iotests/041     |  13 +++-
>  5 files changed, 212 insertions(+), 1 deletion(-)

Left comments and review on #1, skimmed 2/3.

Not sure if it's appropriate for 6.0 yet; that might depend on the
responses to my comments and other reviewers and so on.

Acked-by: John Snow
Re: [PATCH v2 10/10] nbd: move connection code from block/nbd to nbd/client-connection
08.04.2021 20:04, Roman Kagan wrote:
> On Thu, Apr 08, 2021 at 05:08:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> We now have bs-independent connection API, which consists of four
>> functions:
>>
>>   nbd_client_connection_new()
>>   nbd_client_connection_unref()
>>   nbd_co_establish_connection()
>>   nbd_co_establish_connection_cancel()
>>
>> Move them to a separate file together with NBDClientConnection
>> structure which becomes private to the new API.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>> ---
>>
>> Hmm. I keep only Virtuozzo's copyright in a new file, as actually
>> I've developed nbd-reconnection code. Still probably safer to save
>> all copyrights. Let me know if you think so and I'll add them.
>
> Not my call.
>
>>  include/block/nbd.h     |  11 +++
>>  block/nbd.c             | 167 --
>>  nbd/client-connection.c | 192 
>>  nbd/meson.build         |   1 +
>>  4 files changed, 204 insertions(+), 167 deletions(-)
>>  create mode 100644 nbd/client-connection.c
>
> Reviewed-by: Roman Kagan

Thanks a lot for reviewing!

--
Best regards,
Vladimir
Re: [PATCH v2 10/10] nbd: move connection code from block/nbd to nbd/client-connection
On Thu, Apr 08, 2021 at 05:08:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We now have bs-independent connection API, which consists of four
> functions:
>
>   nbd_client_connection_new()
>   nbd_client_connection_unref()
>   nbd_co_establish_connection()
>   nbd_co_establish_connection_cancel()
>
> Move them to a separate file together with NBDClientConnection
> structure which becomes private to the new API.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>
> Hmm. I keep only Virtuozzo's copyright in a new file, as actually I've
> developed nbd-reconnection code. Still probably safer to save all
> copyrights. Let me know if you think so and I'll add them.

Not my call.

>  include/block/nbd.h     |  11 +++
>  block/nbd.c             | 167 --
>  nbd/client-connection.c | 192 
>  nbd/meson.build         |   1 +
>  4 files changed, 204 insertions(+), 167 deletions(-)
>  create mode 100644 nbd/client-connection.c

Reviewed-by: Roman Kagan
Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
> job-complete command is async. Can we instead just add a boolean like
> job->completion_requested, and set it if job-complete called in
> STANDBY state, and on job_resume job_complete will be called
> automatically if this boolean is true?

job_complete has a synchronous setup, though -- we lose out on a lot of
synchronous error checking in that circumstance.

I was not able to audit it to determine that it'd be safe to attempt
that setup during a drained section -- I imagine it won't work and will
fail, though.

So I thought we'd have to signal completion and run the setup *later*,
but what do we do if we get an error then? Does the entire job fail? Do
we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it
recoverable?

So on and so forth. Seems like a lot of things to consider, unless I am
making a giant fuss about nothing again, not like that's ever happened.
O:-)

--js
Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
08.04.2021 19:20, Max Reitz wrote:
> block-job-complete can only be applied when the job is READY, not when
> it is on STANDBY (ready, but paused).  Draining a job technically
> pauses it (which makes a READY job enter STANDBY), and ending the
> drained section does not synchronously resume it, but only schedules
> the job, which will then be resumed.  So attempting to complete a job
> immediately after a drained section may sometimes fail.
>
> That is bad at least because users cannot really work nicely around
> this: A job may be paused and resumed at any time, so waiting for the
> job to be in the READY state and then issuing a block-job-complete
> poses a TOCTTOU problem.  The only way around it would be to issue
> block-job-complete until it no longer fails due to the job being in
> the STANDBY state, but that would not be nice.
>
> We can solve the problem by allowing block-job-complete to be invoked
> on jobs that are on STANDBY, if that status is the result of a drained
> section (not because the user has paused the job), and that section
> has ended.  That is, if the job is on STANDBY, but scheduled to be
> resumed.
>
> Perhaps we could actually just directly allow this, seeing that mirror
> is the only user of ready/complete, and that mirror_complete() could
> probably work under the given circumstances, but there may be many
> side effects to consider.
>
> It is simpler to add a function job_wait_unpaused() that waits for the
> job to be resumed (under said circumstances), and to make
> qmp_block_job_complete() use it to delay job_complete() until then.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
> Signed-off-by: Max Reitz
> ---
>  include/qemu/job.h | 15 +++
>  blockdev.c         |  3 +++
>  job.c              | 42 ++
>  3 files changed, 60 insertions(+)
>
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index efc6fa7544..cf3082b6d7 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
>   */
>  int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
>
> +/**
> + * If the job has been paused because of a drained section, and that
> + * section has ended, wait until the job is resumed.
> + *
> + * Return 0 if the job is not paused, or if it has been successfully
> + * resumed.
> + * Return an error if the job has been paused in such a way that
> + * waiting will not resume it, i.e. if it has been paused by the user,
> + * or if it is still drained.
> + *
> + * Callers must be in the home AioContext and hold the AioContext lock
> + * of job->aio_context.
> + */
> +int job_wait_unpaused(Job *job, Error **errp);
> +
>  #endif
> diff --git a/blockdev.c b/blockdev.c
> index a57590aae4..c0cc2fa364 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
>          return;
>      }
>
> +    if (job_wait_unpaused(&job->job, errp) < 0) {
> +        return;
> +    }
>      trace_qmp_block_job_complete(job);
>      job_complete(&job->job, errp);
>      aio_context_release(aio_context);
> diff --git a/job.c b/job.c
> index 289edee143..1ea30fd294 100644
> --- a/job.c
> +++ b/job.c
> @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>      job_unref(job);
>      return ret;
>  }
> +
> +int job_wait_unpaused(Job *job, Error **errp)
> +{
> +    /*
> +     * Only run this function from the main context, because this is
> +     * what we need, and this way we do not have to think about what
> +     * happens if the user concurrently pauses the job from the main
> +     * monitor.
> +     */
> +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> +
> +    /*
> +     * Quick path (e.g. so we do not get an error if pause_count > 0
> +     * but the job is not even paused)
> +     */
> +    if (!job->paused) {
> +        return 0;
> +    }
> +
> +    /* If the user has paused the job, waiting will not help */
> +    if (job->user_paused) {
> +        error_setg(errp, "Job '%s' has been paused by the user", job->id);
> +        return -EBUSY;
> +    }
> +
> +    /* Similarly, if the job is still drained, waiting will not help either */
> +    if (job->pause_count > 0) {
> +        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
> +        return -EBUSY;
> +    }
> +
> +    /*
> +     * This function is specifically for waiting for a job to be
> +     * resumed after a drained section.  Ending the drained section
> +     * includes a job_enter(), which schedules the job loop to be run,
> +     * and once it does, job->paused will be cleared.  Therefore, we
> +     * do not need to invoke job_enter() here.
> +     */
> +    AIO_WAIT_WHILE(job->aio_context, job->paused);
> +
> +    return 0;
> +}

Hmm.. It seems that when job->pause_count becomes 0, job_enter() is
called, and the period when pause_count is 0 but paused is still true
should be relatively short. And the patch doesn't help if the user calls
job-complete during a drained section. So it looks like the patch will
help relatively seldom.. Or
Re: [PATCH v2 09/10] block/nbd: introduce nbd_client_connection_new()
On Thu, Apr 08, 2021 at 05:08:26PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> This is the last step of creating bs-independing nbd connection

s/independing/independent/

> interface. With next commit we can finally move it to separate file.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  block/nbd.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Roman Kagan
Re: [PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
On 4/8/21 12:20 PM, Max Reitz wrote:
> block-job-complete can only be applied when the job is READY, not when
> it is on STANDBY (ready, but paused).  Draining a job technically
> pauses it (which makes a READY job enter STANDBY), and ending the
> drained section does not synchronously resume it, but only schedules
> the job, which will then be resumed.  So attempting to complete a job
> immediately after a drained section may sometimes fail.
>
> That is bad at least because users cannot really work nicely around
> this: A job may be paused and resumed at any time, so waiting for the
> job to be in the READY state and then issuing a block-job-complete
> poses a TOCTTOU problem.  The only way around it would be to issue
> block-job-complete until it no longer fails due to the job being in
> the STANDBY state, but that would not be nice.
>
> We can solve the problem by allowing block-job-complete to be invoked
> on jobs that are on STANDBY, if that status is the result of a drained
> section (not because the user has paused the job), and that section
> has ended.  That is, if the job is on STANDBY, but scheduled to be
> resumed.
>
> Perhaps we could actually just directly allow this, seeing that mirror
> is the only user of ready/complete, and that mirror_complete() could
> probably work under the given circumstances, but there may be many
> side effects to consider.
>
> It is simpler to add a function job_wait_unpaused() that waits for the
> job to be resumed (under said circumstances), and to make
> qmp_block_job_complete() use it to delay job_complete() until then.
>
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635
> Signed-off-by: Max Reitz
> ---
>  include/qemu/job.h | 15 +++
>  blockdev.c         |  3 +++
>  job.c              | 42 ++
>  3 files changed, 60 insertions(+)
>
> diff --git a/include/qemu/job.h b/include/qemu/job.h
> index efc6fa7544..cf3082b6d7 100644
> --- a/include/qemu/job.h
> +++ b/include/qemu/job.h
> @@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp);
>   */
>  int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp);
>
> +/**
> + * If the job has been paused because of a drained section, and that
> + * section has ended, wait until the job is resumed.
> + *
> + * Return 0 if the job is not paused, or if it has been successfully
> + * resumed.
> + * Return an error if the job has been paused in such a way that
> + * waiting will not resume it, i.e. if it has been paused by the user,
> + * or if it is still drained.
> + *
> + * Callers must be in the home AioContext and hold the AioContext lock
> + * of job->aio_context.
> + */
> +int job_wait_unpaused(Job *job, Error **errp);
> +
>  #endif
> diff --git a/blockdev.c b/blockdev.c
> index a57590aae4..c0cc2fa364 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp)
>          return;
>      }
>
> +    if (job_wait_unpaused(&job->job, errp) < 0) {
> +        return;
> +    }

After which point, we assume we've transitioned back to either RUNNING
or READY, and

>      trace_qmp_block_job_complete(job);
>      job_complete(&job->job, errp);

This function checks the usual state table for permission to
deliver/perform the verb.

>      aio_context_release(aio_context);
> diff --git a/job.c b/job.c
> index 289edee143..1ea30fd294 100644
> --- a/job.c
> +++ b/job.c
> @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>      job_unref(job);
>      return ret;
>  }
> +
> +int job_wait_unpaused(Job *job, Error **errp)
> +{
> +    /*
> +     * Only run this function from the main context, because this is
> +     * what we need, and this way we do not have to think about what
> +     * happens if the user concurrently pauses the job from the main
> +     * monitor.
> +     */
> +    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> +
> +    /*
> +     * Quick path (e.g. so we do not get an error if pause_count > 0
> +     * but the job is not even paused)
> +     */
> +    if (!job->paused) {
> +        return 0;
> +    }
> +
> +    /* If the user has paused the job, waiting will not help */
> +    if (job->user_paused) {
> +        error_setg(errp, "Job '%s' has been paused by the user", job->id);
> +        return -EBUSY;
> +    }

Or the job has encountered an error if that error policy is set. It is
maybe more accurate to say that the job is currently paused/halted (for
some reason) and is awaiting the explicit unpause instruction.

"Job '%s' has been paused and needs to be explicitly resumed with
job-resume", maybe?

Job '%s' has been paused and needs to be [explicitly] resumed [by the
user] [with job-resume]

Some combo of those runes.

> +    /* Similarly, if the job is still drained, waiting will not help either */
> +    if (job->pause_count > 0) {
> +        error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id);
> +        return -EBUSY;
> +    }

This leaks an internal state detail out to the caller. In which
circumstances does this happen? Do we
Re: [PATCH v2 08/10] block/nbd: rename NBDConnectThread to NBDClientConnection
On Thu, Apr 08, 2021 at 05:08:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We are going to move connection code to own file and want clear names
> and APIs.
>
> The structure is shared between user and (possibly) several runs of
> connect-thread. So it's wrong to call it "thread". Let's rename to
> something more generic.
>
> Appropriately rename connect_thread and thr variables to conn.
>
> connect_thread_state_unref() function gets new appropriate name too
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  block/nbd.c | 127 ++--
>  1 file changed, 63 insertions(+), 64 deletions(-)

[To other reviewers: in addition to renaming there's one blank line
removed, hence the difference between (+) and (-)]

Reviewed-by: Roman Kagan
Re: [PATCH v2 07/10] block/nbd: make nbd_co_establish_connection_cancel() bs-independent
On Thu, Apr 08, 2021 at 05:08:24PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> nbd_co_establish_connection_cancel() actually needs only pointer to
> NBDConnectThread. So, make it clean.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  block/nbd.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)

Reviewed-by: Roman Kagan
Re: [PATCH v2 06/10] block/nbd: bs-independent interface for nbd_co_establish_connection()
On Thu, Apr 08, 2021 at 05:08:23PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We are going to split connection code to separate file. Now we are
> ready to give nbd_co_establish_connection() clean and bs-independent
> interface.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy
> ---
>  block/nbd.c | 49 +++--
>  1 file changed, 31 insertions(+), 18 deletions(-)

Reviewed-by: Roman Kagan
Re: [PATCH v2 05/10] block/nbd: drop thr->state
On Thu, Apr 08, 2021 at 05:08:22PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Actually, the only bit of information we need is "is thread running or > not". We don't need all these states. So, instead of thr->state add > boolean variable thr->running and refactor the code. There's certain redundancy between thr->refcnt and thr->running. I wonder if it makes sense to employ this. Can be done later if deemed useful. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > block/nbd.c | 103 ++-- > 1 file changed, 27 insertions(+), 76 deletions(-) Reviewed-by: Roman Kagan
[PATCH for-6.0? 0/3] job: Add job_wait_unpaused() for block-job-complete
Hi, See patch 1 for a detailed explanation of the problem. The gist is: Draining a READY job makes it transition to STANDBY, and jobs on STANDBY cannot be completed. Ending the drained section will schedule the job (so it is then resumed), but not wait until it is actually running again. Therefore, it can happen that issuing block-job-complete fails when you issue it right after some draining operation. I tried to come up with an iotest reproducer, but in the end I only got something that reproduced the issue like 2/10 times, and it required heavy I/O, so it is nothing I would like to have as part of the iotests. Instead, I opted for a unit test, which allows me to cheat a bit (specifically, locking the job IO thread before ending the drained section). Max Reitz (3): job: Add job_wait_unpaused() for block-job-complete test-blockjob: Test job_wait_unpaused() iotests/041: block-job-complete on user-paused job include/qemu/job.h | 15 blockdev.c | 3 + job.c | 42 +++ tests/unit/test-blockjob.c | 140 + tests/qemu-iotests/041 | 13 +++- 5 files changed, 212 insertions(+), 1 deletion(-) -- 2.29.2
[PATCH for-6.0? 1/3] job: Add job_wait_unpaused() for block-job-complete
block-job-complete can only be applied when the job is READY, not when it is on STANDBY (ready, but paused). Draining a job technically pauses it (which makes a READY job enter STANDBY), and ending the drained section does not synchronously resume it, but only schedules the job, which will then be resumed. So attempting to complete a job immediately after a drained section may sometimes fail. That is bad at least because users cannot really work nicely around this: A job may be paused and resumed at any time, so waiting for the job to be in the READY state and then issuing a block-job-complete poses a TOCTTOU problem. The only way around it would be to issue block-job-complete until it no longer fails due to the job being in the STANDBY state, but that would not be nice. We can solve the problem by allowing block-job-complete to be invoked on jobs that are on STANDBY, if that status is the result of a drained section (not because the user has paused the job), and that section has ended. That is, if the job is on STANDBY, but scheduled to be resumed. Perhaps we could actually just directly allow this, seeing that mirror is the only user of ready/complete, and that mirror_complete() could probably work under the given circumstances, but there may be many side effects to consider. It is simpler to add a function job_wait_unpaused() that waits for the job to be resumed (under said circumstances), and to make qmp_block_job_complete() use it to delay job_complete() until then. 
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1945635 Signed-off-by: Max Reitz --- include/qemu/job.h | 15 +++ blockdev.c | 3 +++ job.c | 42 ++ 3 files changed, 60 insertions(+) diff --git a/include/qemu/job.h b/include/qemu/job.h index efc6fa7544..cf3082b6d7 100644 --- a/include/qemu/job.h +++ b/include/qemu/job.h @@ -563,4 +563,19 @@ void job_dismiss(Job **job, Error **errp); */ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp); +/** + * If the job has been paused because of a drained section, and that + * section has ended, wait until the job is resumed. + * + * Return 0 if the job is not paused, or if it has been successfully + * resumed. + * Return an error if the job has been paused in such a way that + * waiting will not resume it, i.e. if it has been paused by the user, + * or if it is still drained. + * + * Callers must be in the home AioContext and hold the AioContext lock + * of job->aio_context. + */ +int job_wait_unpaused(Job *job, Error **errp); + #endif diff --git a/blockdev.c b/blockdev.c index a57590aae4..c0cc2fa364 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3414,6 +3414,9 @@ void qmp_block_job_complete(const char *device, Error **errp) return; } +if (job_wait_unpaused(&job->job, errp) < 0) { +return; +} trace_qmp_block_job_complete(job); job_complete(&job->job, errp); aio_context_release(aio_context); diff --git a/job.c b/job.c index 289edee143..1ea30fd294 100644 --- a/job.c +++ b/job.c @@ -1023,3 +1023,45 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp) job_unref(job); return ret; } + +int job_wait_unpaused(Job *job, Error **errp) +{ +/* + * Only run this function from the main context, because this is + * what we need, and this way we do not have to think about what + * happens if the user concurrently pauses the job from the main + * monitor. + */ +assert(qemu_get_current_aio_context() == qemu_get_aio_context()); + +/* + * Quick path (e.g. 
so we do not get an error if pause_count > 0 + * but the job is not even paused) + */ +if (!job->paused) { +return 0; +} + +/* If the user has paused the job, waiting will not help */ +if (job->user_paused) { +error_setg(errp, "Job '%s' has been paused by the user", job->id); +return -EBUSY; +} + +/* Similarly, if the job is still drained, waiting will not help either */ +if (job->pause_count > 0) { +error_setg(errp, "Job '%s' is blocked and cannot be unpaused", job->id); +return -EBUSY; +} + +/* + * This function is specifically for waiting for a job to be + * resumed after a drained section. Ending the drained section + * includes a job_enter(), which schedules the job loop to be run, + * and once it does, job->paused will be cleared. Therefore, we + * do not need to invoke job_enter() here. + */ +AIO_WAIT_WHILE(job->aio_context, job->paused); + +return 0; +} -- 2.29.2
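[For readers skimming the archive: the control flow of job_wait_unpaused() above can be sketched in a few lines of Python. This is an illustrative stand-in only — `FakeJob` and the polling loop are not QEMU API, and AIO_WAIT_WHILE is of course not a busy-wait in the real code:]

```python
import threading
import time

class FakeJob:
    """Illustrative stand-in for the relevant fields of QEMU's Job struct."""
    def __init__(self):
        self.paused = True        # a drained section paused the job
        self.user_paused = False  # True if job-pause was issued by the user
        self.pause_count = 0      # > 0 while the job is still drained

def wait_unpaused(job, timeout=5.0):
    """Mirror job_wait_unpaused(): fail fast when waiting cannot help,
    otherwise wait for the already-scheduled resume to clear job.paused."""
    if not job.paused:
        return 0                  # quick path
    if job.user_paused:
        raise RuntimeError("job has been paused by the user")
    if job.pause_count > 0:
        raise RuntimeError("job is blocked and cannot be unpaused")
    deadline = time.monotonic() + timeout
    while job.paused:             # stand-in for AIO_WAIT_WHILE(ctx, job->paused)
        if time.monotonic() >= deadline:
            raise TimeoutError("job was never resumed")
        time.sleep(0.001)
    return 0

if __name__ == "__main__":
    job = FakeJob()
    # Something else (the scheduled job loop) clears 'paused' shortly after.
    threading.Timer(0.05, lambda: setattr(job, "paused", False)).start()
    assert wait_unpaused(job) == 0
```

The key property, as in the patch, is that the function only waits when a resume is already scheduled; in the user-paused and still-drained cases it errors out immediately instead of hanging.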
[PATCH for-6.0? 2/3] test-blockjob: Test job_wait_unpaused()
Create a job that remains on STANDBY after a drained section, and see that invoking job_wait_unpaused() will get it unstuck. Signed-off-by: Max Reitz --- tests/unit/test-blockjob.c | 140 + 1 file changed, 140 insertions(+) diff --git a/tests/unit/test-blockjob.c b/tests/unit/test-blockjob.c index 7519847912..b7736e298d 100644 --- a/tests/unit/test-blockjob.c +++ b/tests/unit/test-blockjob.c @@ -16,6 +16,7 @@ #include "block/blockjob_int.h" #include "sysemu/block-backend.h" #include "qapi/qmp/qdict.h" +#include "iothread.h" static const BlockJobDriver test_block_job_driver = { .job_driver = { @@ -375,6 +376,144 @@ static void test_cancel_concluded(void) cancel_common(s); } +/* (See test_yielding_driver for the job description) */ +typedef struct YieldingJob { +BlockJob common; +bool should_complete; +} YieldingJob; + +static void yielding_job_complete(Job *job, Error **errp) +{ +YieldingJob *s = container_of(job, YieldingJob, common.job); +s->should_complete = true; +job_enter(job); +} + +static int coroutine_fn yielding_job_run(Job *job, Error **errp) +{ +YieldingJob *s = container_of(job, YieldingJob, common.job); + +job_transition_to_ready(job); + +while (!s->should_complete) { +job_yield(job); +} + +return 0; +} + +/* + * This job transitions immediately to the READY state, and then + * yields until it is to complete. + */ +static const BlockJobDriver test_yielding_driver = { +.job_driver = { +.instance_size = sizeof(YieldingJob), +.free = block_job_free, +.user_resume= block_job_user_resume, +.run= yielding_job_run, +.complete = yielding_job_complete, +}, +}; + +/* + * Test that job_wait_unpaused() can get jobs from a paused state to + * a running state so that job_complete() can be applied (assuming the + * pause occurred due to a drain that has already been lifted). + * (This is what QMP's block-job-complete does so it can be executed + * even immediately after some other operation instated and lifted a + * drain.) 
+ * + * To do this, run YieldingJob in an IO thread, get it into the READY + * state, then have a drained section. Before ending the section, + * acquire the context so the job will not be entered and will thus + * remain on STANDBY. + * + * Invoking job_complete() then will fail. + * + * However, job_wait_unpaused() should see the job is to be resumed, + * wait for it to be resumed, and then we can invoke job_complete() + * without error. + * + * Note that on the QMP interface, it is impossible to lock an IO + * thread before a drained section ends. In practice, the + * bdrv_drain_all_end() and the aio_context_acquire() will be + * reversed. However, that makes for worse reproducibility here: + * Sometimes, the job would no longer be in STANDBY then but already + * be started. We cannot prevent that, because the IO thread runs + * concurrently. We can only prevent it by taking the lock before + * ending the drained section, so we do that. + * + * (You can reverse the order of operations and most of the time the + * test will pass, but sometimes the assert(status == STANDBY) will + * fail.) 
+ */ +static void test_complete_in_standby(void) +{ +BlockBackend *blk; +IOThread *iothread; +AioContext *ctx; +Job *job; +BlockJob *bjob; +Error *local_err = NULL; + +/* Create a test drive, move it to an IO thread */ +blk = create_blk(NULL); +iothread = iothread_new(); + +ctx = iothread_get_aio_context(iothread); +blk_set_aio_context(blk, ctx, &error_abort); + +/* Create our test job */ +bjob = mk_job(blk, "job", &test_yielding_driver, true, + JOB_MANUAL_FINALIZE | JOB_MANUAL_DISMISS); +job = &bjob->job; +assert(job->status == JOB_STATUS_CREATED); + +/* Wait for the job to become READY */ +job_start(job); +aio_context_acquire(ctx); +AIO_WAIT_WHILE(ctx, job->status != JOB_STATUS_READY); +aio_context_release(ctx); + +/* Begin the drained section, pausing the job */ +bdrv_drain_all_begin(); +assert(job->status == JOB_STATUS_STANDBY); +/* Lock the IO thread to prevent the job from being run */ +aio_context_acquire(ctx); +/* This will schedule the job to resume it */ +bdrv_drain_all_end(); + +/* But the job cannot run, so it will remain on standby */ +assert(job->status == JOB_STATUS_STANDBY); + +/* A job on standby cannot be completed */ +job_complete(job, &local_err); +assert(local_err != NULL); +error_free(local_err); +local_err = NULL; + +/* + * But waiting for it and then completing it should work. + * (This is what qmp_block_job_complete() does.) + */ +job_wait_unpaused(job, &error_abort); +job_complete(job, &error_abort); + +/* The test is done now, clean up. */ +job_finish_sync(job, NULL, &error_abort); +assert(job->status == JOB_STATUS_PENDING); + 
[PATCH for-6.0? 3/3] iotests/041: block-job-complete on user-paused job
Expand test_pause() to check what happens when issuing block-job-complete on a job that is on STANDBY because it has been paused by the user. (This should be an error, and in particular not hang job_wait_unpaused().) Signed-off-by: Max Reitz --- tests/qemu-iotests/041 | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041 index 5cc02b24fc..d2c9669741 100755 --- a/tests/qemu-iotests/041 +++ b/tests/qemu-iotests/041 @@ -120,7 +120,18 @@ class TestSingleDrive(iotests.QMPTestCase): result = self.vm.qmp('block-job-resume', device='drive0') self.assert_qmp(result, 'return', {}) -self.complete_and_wait() +self.wait_ready() + +# Check that a job on STANDBY cannot be completed +self.pause_job('drive0') +result = self.vm.qmp('block-job-complete', device='drive0') +self.assert_qmp(result, 'error/desc', +"Job 'drive0' has been paused by the user") + +result = self.vm.qmp('block-job-resume', device='drive0') +self.assert_qmp(result, 'return', {}) + +self.complete_and_wait(wait_ready=False) self.vm.shutdown() self.assertTrue(iotests.compare_images(test_img, target_img), 'target image does not match source after mirroring') -- 2.29.2
Re: [PATCH v2 04/10] block/nbd: simplify waking of nbd_co_establish_connection()
On Thu, Apr 08, 2021 at 05:08:21PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Instead of connect_bh, bh_ctx and wait_connect fields we can live with > only one link to waiting coroutine, protected by mutex. > > So new logic is: > > nbd_co_establish_connection() sets wait_co under mutex, release the > mutex and do yield(). Note, that wait_co may be scheduled by thread > immediately after unlocking the mutex. Still, in main thread (or > iothread) we'll not reach the code for entering the coroutine until the > yield() so we are safe. > > Both connect_thread_func() and nbd_co_establish_connection_cancel() do > the following to handle wait_co: > > Under mutex, if thr->wait_co is not NULL, call aio_co_wake() (which > never tries to acquire aio context since previous commit, so we are > safe to do it under thr->mutex) and set thr->wait_co to NULL. > This way we protect ourselves of scheduling it twice. > > Also this commit make nbd_co_establish_connection() less connected to > bs (we have generic pointer to the coroutine, not use s->connection_co > directly). So, we are on the way of splitting connection API out of > nbd.c (which is overcomplicated now). > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > block/nbd.c | 49 + > 1 file changed, 9 insertions(+), 40 deletions(-) Reviewed-by: Roman Kagan
Re: [RFC PATCH v2 04/11] qemu-iotests: delay QMP socket timers
On 08/04/2021 17:40, Paolo Bonzini wrote: On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote: def get_qmp_events_filtered(self, wait=60.0): result = [] - for ev in self.get_qmp_events(wait=wait): + qmp_wait = wait + if qemu_gdb: + qmp_wait = 0.0 + for ev in self.get_qmp_events(wait=qmp_wait): result.append(filter_qmp_event(ev)) return result Should this be handled in get_qmp_events instead, since you're basically changing all the callers? get_qmp_events is in python/machine.py, which as I understand might be used also by some other scripts, so I want to keep the changes there to the minimum. Also, machine.py has no access to qemu_gdb or qemu_valgrind, so passing a boolean or something to delay the timer would still require to add a similar check in all sections. Or do you have a cleaner way to do this? Emanuele
Re: [RFC PATCH v2 03/11] qemu-iotests: add option to attach gdbserver
On 08/04/2021 17:40, Paolo Bonzini wrote: On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote: + self.gdb_qemu = os.getenv('GDB_QEMU') + + if gdb and not self.gdb_qemu: + self.gdb_qemu = 'localhost:12345' + elif self.gdb_qemu and not gdb: + del self.gdb_qemu + del os.environ['GDB_QEMU'] Alternatively: if gdb: self.gdb_qemu = os.environ.get('GDB_QEMU', 'localhost:12345') elif 'GDB_QEMU' in os.environ: del os.environ['GDB_QEMU'] makes sense, thanks. +GDB_QEMU -- "{GDB_QEMU}" Perhaps only include this if gdbserver is actually in use? (Or not at all, since gdbserver anyway prints the port). You forgot that by default all logs go to a log file :) so unless you find the corresponding log, it is not easy to find the GDB port. Thank you, Emanuele Paolo
[PATCH 5/5] blkdebug: protect rules and suspended_reqs with a lock
Co-developed-by: Paolo Bonzini Signed-off-by: Emanuele Giuseppe Esposito --- block/blkdebug.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index dffd869b32..e92a35ccbb 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -54,6 +54,7 @@ typedef struct BDRVBlkdebugState { /* For blkdebug_refresh_filename() */ char *config_file; +QemuMutex lock; QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX]; QSIMPLEQ_HEAD(, BlkdebugRule) active_rules; QLIST_HEAD(, BlkdebugSuspendedReq) suspended_reqs; @@ -245,7 +246,9 @@ static int add_rule(void *opaque, QemuOpts *opts, Error **errp) }; /* Add the rule */ +qemu_mutex_lock(&s->lock); QLIST_INSERT_HEAD(&s->rules[event], rule, next); +qemu_mutex_unlock(&s->lock); return 0; } @@ -468,6 +471,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags, int ret; uint64_t align; +qemu_mutex_init(&s->lock); opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort); if (!qemu_opts_absorb_qdict(opts, options, errp)) { ret = -EINVAL; @@ -568,6 +572,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags, ret = 0; out: if (ret < 0) { +qemu_mutex_destroy(&s->lock); g_free(s->config_file); } qemu_opts_del(opts); @@ -582,6 +587,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes, int error; bool immediately; +qemu_mutex_lock(&s->lock); QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) { uint64_t inject_offset = rule->options.inject.offset; @@ -595,6 +601,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes, } if (!rule || !rule->options.inject.error) { +qemu_mutex_unlock(&s->lock); return 0; } @@ -606,6 +613,7 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes, remove_rule(rule); } +qemu_mutex_unlock(&s->lock); if (!immediately) { aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self()); qemu_coroutine_yield(); @@ -771,8 +779,10 @@ static void blkdebug_close(BlockDriverState *bs) } 
g_free(s->config_file); +qemu_mutex_destroy(&s->lock); } +/* Called with lock held. */ static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule) { BDRVBlkdebugState *s = bs->opaque; @@ -791,6 +801,7 @@ static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule) } } +/* Called with lock held. */ static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule, int *action_count) { @@ -829,17 +840,22 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event) assert((int)event >= 0 && event < BLKDBG__MAX); +qemu_mutex_lock(&s->lock); s->new_state = s->state; QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) { process_rule(bs, rule, actions_count); } +qemu_mutex_unlock(&s->lock); + while (actions_count[ACTION_SUSPEND] > 0) { qemu_coroutine_yield(); actions_count[ACTION_SUSPEND]--; } +qemu_mutex_lock(&s->lock); s->state = s->new_state; +qemu_mutex_unlock(&s->lock); } static int blkdebug_debug_breakpoint(BlockDriverState *bs, const char *event, @@ -862,11 +878,14 @@ static int blkdebug_debug_breakpoint(BlockDriverState *bs, const char *event, .options.suspend.tag = g_strdup(tag), }; +qemu_mutex_lock(&s->lock); QLIST_INSERT_HEAD(&s->rules[blkdebug_event], rule, next); +qemu_mutex_unlock(&s->lock); return 0; } +/* Called with lock held. 
*/ static int resume_req_by_tag(BDRVBlkdebugState *s, const char *tag, bool all) { BlkdebugSuspendedReq *r; @@ -884,7 +903,9 @@ retry: g_free(r->tag); g_free(r); +qemu_mutex_unlock(&s->lock); qemu_coroutine_enter(co); +qemu_mutex_lock(&s->lock); if (all) { goto retry; @@ -898,8 +919,12 @@ retry: static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag) { BDRVBlkdebugState *s = bs->opaque; +int rc; -return resume_req_by_tag(s, tag, false); +qemu_mutex_lock(&s->lock); +rc = resume_req_by_tag(s, tag, false); +qemu_mutex_unlock(&s->lock); +return rc; } static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs, @@ -909,6 +934,7 @@ static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs, BlkdebugRule *rule, *next; int i, ret = -ENOENT; +qemu_mutex_lock(&s->lock); for (i = 0; i < BLKDBG__MAX; i++) { QLIST_FOREACH_SAFE(rule, &s->rules[i], next, next) { if (rule->action == ACTION_SUSPEND && @@ -921,6 +947,7 @@ static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs, if (resume_req_by_tag(s, tag, true) == 0) {
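[The central pattern in resume_req_by_tag() above — dropping the lock around qemu_coroutine_enter() because the entered coroutine may call back into blkdebug and take the lock itself — can be illustrated with a small Python sketch. All names here are illustrative, not QEMU API:]

```python
import threading

class SuspendedReqs:
    """Toy model of blkdebug's suspended_reqs list plus its lock."""
    def __init__(self):
        self.lock = threading.Lock()   # non-reentrant, like QemuMutex
        self.reqs = []                 # list of (tag, wake_callback)

    def suspend(self, tag, wake):
        with self.lock:
            self.reqs.append((tag, wake))

    def resume_by_tag(self, tag, all_matching=False):
        """Wake matching requests. The lock is dropped around the wake
        callback (qemu_coroutine_enter() in the patch), because the
        callback may suspend or resume other requests, i.e. re-enter
        this object and take the lock again."""
        resumed = 0
        with self.lock:
            while True:
                for entry in self.reqs:
                    if entry[0] == tag:
                        self.reqs.remove(entry)
                        self.lock.release()
                        try:
                            entry[1]()         # call out without the lock
                        finally:
                            self.lock.acquire()
                        resumed += 1
                        break                  # list may have changed: rescan
                else:
                    return resumed             # no match left
                if not all_matching:
                    return resumed
```

Because the lock is non-reentrant, a callback that re-enters `suspend()` would deadlock if we still held the lock at the call site — exactly the situation the unlock/lock pair around qemu_coroutine_enter() avoids; and after re-acquiring, the walk restarts because the list may have changed underneath us.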
[PATCH 4/5] blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE
Use actions_count to see how many yields to issue. Co-developed-by: Paolo Bonzini Signed-off-by: Emanuele Giuseppe Esposito --- block/blkdebug.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index 388b5ed615..dffd869b32 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -789,7 +789,6 @@ static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule) if (!qtest_enabled()) { printf("blkdebug: Suspended request '%s'\n", r->tag); } -qemu_coroutine_yield(); } static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule, @@ -834,6 +833,12 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event) QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) { process_rule(bs, rule, actions_count); } + +while (actions_count[ACTION_SUSPEND] > 0) { +qemu_coroutine_yield(); +actions_count[ACTION_SUSPEND]--; +} + s->state = s->new_state; } -- 2.30.2
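[The shape of this change — tally the actions during the walk, defer the yields until after it — fits in a few lines of Python. This is a hedged sketch with illustrative names, not QEMU code:]

```python
ACTION_INJECT_ERROR, ACTION_SET_STATE, ACTION_SUSPEND, ACTION__MAX = range(4)

def run_event(rule_actions, do_yield):
    """First walk every rule, only counting what it asks for; the
    suspend yields are issued after the walk, so the iteration itself
    never gives up control half-way through the list."""
    actions_count = [0] * ACTION__MAX
    for action in rule_actions:
        actions_count[action] += 1
    while actions_count[ACTION_SUSPEND] > 0:
        do_yield()                     # qemu_coroutine_yield() in the patch
        actions_count[ACTION_SUSPEND] -= 1
    return actions_count
```

Deferring the yields is what makes the preceding QLIST_FOREACH_SAFE safe again: no other coroutine can mutate the rule list while the loop is in flight.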
[PATCH 0/5] blkdebug: fix racing condition when iterating on
When qemu_coroutine_enter is executed in a loop (even QLIST_FOREACH_SAFE), the new routine can modify the list, for example removing an element, causing problems when control is given back to the caller that continues iterating on the same list. Patch 1 solves the issue in blkdebug_debug_resume by restarting the list walk after every coroutine_enter if the list has to be fully iterated. Patches 2,3,4 aim to fix blkdebug_debug_event by gathering all actions that the rules make in a counter and invoking the respective coroutine_yield only after processing all requests. Patch 5 adds a lock to protect rules and suspended_reqs. Signed-off-by: Emanuele Giuseppe Esposito Emanuele Giuseppe Esposito (5): blkdebug: refactor removal of a suspended request blkdebug: move post-resume handling to resume_req_by_tag blkdebug: track all actions blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE blkdebug: protect rules and suspended_reqs with a lock block/blkdebug.c | 113 +-- 1 file changed, 79 insertions(+), 34 deletions(-) -- 2.30.2
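[Why FOREACH_SAFE is not enough here is easiest to see in a standalone sketch: "safe" only means the *current* node may be removed, because the next pointer is fetched before the callback runs. The toy linked list below is illustrative Python, not QEMU API; the restarting walk assumes unique tags:]

```python
class Node:
    """Toy singly linked list node standing in for a QLIST entry."""
    def __init__(self, tag):
        self.tag = tag
        self.next = None

def make_list(*tags):
    head = None
    for tag in reversed(tags):
        node = Node(tag)
        node.next = head
        head = node
    return head

def unlink(head, victim):
    """Remove victim and poison its next pointer, as QLIST_REMOVE does."""
    if head is victim:
        head = victim.next
    else:
        prev = head
        while prev.next is not victim:
            prev = prev.next
        prev.next = victim.next
    victim.next = None
    return head

def foreach_safe(head, visit):
    """Model of QLIST_FOREACH_SAFE: next is pre-fetched before the
    callback runs, so only removal of the *current* node is tolerated."""
    visited = []
    cur = head
    while cur is not None:
        nxt = cur.next            # pre-fetched: goes stale if nxt is removed
        visited.append(cur.tag)
        visit(cur)
        cur = nxt
    return visited

def walk_with_restart(state, visit):
    """Patch 1's approach: restart from the head after every callback
    that may have mutated the list ('goto retry' in the patch)."""
    visited = []
    progress = True
    while progress:
        progress = False
        cur = state["head"]
        while cur is not None:
            if cur.tag not in visited:
                visited.append(cur.tag)
                visit(cur)
                progress = True
                break
            cur = cur.next
    return visited
```

With a list a → b → c and a callback that removes "b" while "a" is being visited, `foreach_safe` still visits the dead "b" and never reaches "c", while `walk_with_restart` visits exactly the surviving nodes "a" and "c".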
[PATCH 3/5] blkdebug: track all actions
Add a counter for each action that a rule can trigger. This is mainly used to keep track of how many coroutine_yield() calls we need to perform after processing all rules in the list. Co-developed-by: Paolo Bonzini Signed-off-by: Emanuele Giuseppe Esposito --- block/blkdebug.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index e37f999254..388b5ed615 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -74,6 +74,7 @@ enum { ACTION_INJECT_ERROR, ACTION_SET_STATE, ACTION_SUSPEND, +ACTION__MAX, }; typedef struct BlkdebugRule { @@ -791,22 +792,22 @@ static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule) qemu_coroutine_yield(); } -static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule, -bool injected) +static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule, + int *action_count) { BDRVBlkdebugState *s = bs->opaque; /* Only process rules for the current state */ if (rule->state && rule->state != s->state) { -return injected; +return; } /* Take the action */ +action_count[rule->action]++; switch (rule->action) { case ACTION_INJECT_ERROR: -if (!injected) { +if (action_count[ACTION_INJECT_ERROR] == 1) { QSIMPLEQ_INIT(&s->active_rules); -injected = true; } QSIMPLEQ_INSERT_HEAD(&s->active_rules, rule, active_next); break; @@ -819,21 +820,19 @@ static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule, suspend_request(bs, rule); break; } -return injected; } static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event) { BDRVBlkdebugState *s = bs->opaque; struct BlkdebugRule *rule, *next; -bool injected; +int actions_count[ACTION__MAX] = { 0 }; assert((int)event >= 0 && event < BLKDBG__MAX); -injected = false; s->new_state = s->state; QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) { -injected = process_rule(bs, rule, injected); +process_rule(bs, rule, actions_count); } s->state = s->new_state; } -- 2.30.2
[PATCH 2/5] blkdebug: move post-resume handling to resume_req_by_tag
We want to move qemu_coroutine_yield() after the loop on rules, because QLIST_FOREACH_SAFE is wrong if the rule list is modified while the coroutine has yielded. Therefore move the suspended request to the heap and clean it up from the remove side. All that is left is for blkdebug_debug_event to handle the yielding. Co-developed-by: Paolo Bonzini Signed-off-by: Emanuele Giuseppe Esposito --- block/blkdebug.c | 31 ++- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index 8f19d991fa..e37f999254 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -775,25 +775,20 @@ static void blkdebug_close(BlockDriverState *bs) static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule) { BDRVBlkdebugState *s = bs->opaque; -BlkdebugSuspendedReq r; +BlkdebugSuspendedReq *r; -r = (BlkdebugSuspendedReq) { -.co = qemu_coroutine_self(), -.tag = g_strdup(rule->options.suspend.tag), -}; +r = g_new(BlkdebugSuspendedReq, 1); + +r->co = qemu_coroutine_self(); +r->tag = g_strdup(rule->options.suspend.tag); remove_rule(rule); -QLIST_INSERT_HEAD(&s->suspended_reqs, &r, next); +QLIST_INSERT_HEAD(&s->suspended_reqs, r, next); if (!qtest_enabled()) { -printf("blkdebug: Suspended request '%s'\n", r.tag); +printf("blkdebug: Suspended request '%s'\n", r->tag); } qemu_coroutine_yield(); -if (!qtest_enabled()) { -printf("blkdebug: Resuming request '%s'\n", r.tag); -} - -g_free(r.tag); } static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule, @@ -875,8 +870,18 @@ static int resume_req_by_tag(BDRVBlkdebugState *s, const char *tag, bool all) retry: QLIST_FOREACH(r, &s->suspended_reqs, next) { if (!strcmp(r->tag, tag)) { +Coroutine *co = r->co; + +if (!qtest_enabled()) { +printf("blkdebug: Resuming request '%s'\n", r->tag); +} + QLIST_REMOVE(r, next); -qemu_coroutine_enter(r->co); +g_free(r->tag); +g_free(r); + +qemu_coroutine_enter(co); + if (all) { goto retry; } -- 2.30.2
[PATCH 1/5] blkdebug: refactor removal of a suspended request
Extract to a separate function. Do not rely on FOREACH_SAFE, which is only "safe" if the *current* node is removed---not if another node is removed. Instead, just walk the entire list from the beginning when asked to resume all suspended requests with a given tag. Co-developed-by: Paolo Bonzini Signed-off-by: Emanuele Giuseppe Esposito --- block/blkdebug.c | 28 +--- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/block/blkdebug.c b/block/blkdebug.c index 2c0b9b0ee8..8f19d991fa 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -793,7 +793,6 @@ static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule) printf("blkdebug: Resuming request '%s'\n", r.tag); } -QLIST_REMOVE(&r, next); g_free(r.tag); } @@ -869,25 +868,35 @@ static int blkdebug_debug_breakpoint(BlockDriverState *bs, const char *event, return 0; } -static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag) +static int resume_req_by_tag(BDRVBlkdebugState *s, const char *tag, bool all) { -BDRVBlkdebugState *s = bs->opaque; -BlkdebugSuspendedReq *r, *next; +BlkdebugSuspendedReq *r; -QLIST_FOREACH_SAFE(r, &s->suspended_reqs, next, next) { +retry: +QLIST_FOREACH(r, &s->suspended_reqs, next) { if (!strcmp(r->tag, tag)) { +QLIST_REMOVE(r, next); qemu_coroutine_enter(r->co); +if (all) { +goto retry; +} return 0; } } return -ENOENT; } +static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag) +{ +BDRVBlkdebugState *s = bs->opaque; + +return resume_req_by_tag(s, tag, false); +} + static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs, const char *tag) { BDRVBlkdebugState *s = bs->opaque; -BlkdebugSuspendedReq *r, *r_next; BlkdebugRule *rule, *next; int i, ret = -ENOENT; @@ -900,11 +909,8 @@ static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs, } } } -QLIST_FOREACH_SAFE(r, &s->suspended_reqs, next, r_next) { -if (!strcmp(r->tag, tag)) { -qemu_coroutine_enter(r->co); -ret = 0; -} +if (resume_req_by_tag(s, tag, true) == 0) { +ret = 0; } return ret; } 
-- 2.30.2
Re: [PATCH v2 03/10] util/async: aio_co_enter(): do aio_co_schedule in general case
On Thu, Apr 08, 2021 at 05:08:20PM +0300, Vladimir Sementsov-Ogievskiy wrote: > With the following patch we want to call aio_co_wake() from thread. > And it works bad. > Assume we have no iothreads. > Assume we have a coroutine A, which waits in the yield point for external > aio_co_wake(), and no progress can be done until it happen. > Main thread is in blocking aio_poll() (for example, in blk_read()). > > Now, in a separate thread we do aio_co_wake(). It calls aio_co_enter(), > which goes through last "else" branch and do aio_context_acquire(ctx). > > Now we have a deadlock, as aio_poll() will not release the context lock > until some progress is done, and progress can't be done until > aio_co_wake() wake the coroutine A. And it can't because it wait for > aio_context_acquire(). > > Still, aio_co_schedule() works well in parallel with blocking > aio_poll(). So let's use it in generic case and drop > aio_context_acquire/aio_context_release branch from aio_co_enter(). > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > util/async.c | 11 ++- > 1 file changed, 2 insertions(+), 9 deletions(-) > > diff --git a/util/async.c b/util/async.c > index 674dbefb7c..f05b883a39 100644 > --- a/util/async.c > +++ b/util/async.c > @@ -614,19 +614,12 @@ void aio_co_wake(struct Coroutine *co) > > void aio_co_enter(AioContext *ctx, struct Coroutine *co) > { > -if (ctx != qemu_get_current_aio_context()) { > -aio_co_schedule(ctx, co); > -return; > -} > - > -if (qemu_in_coroutine()) { > +if (ctx == qemu_get_current_aio_context() && qemu_in_coroutine()) { > Coroutine *self = qemu_coroutine_self(); > assert(self != co); > QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next); > } else { > -aio_context_acquire(ctx); > -qemu_aio_coroutine_enter(ctx, co); > -aio_context_release(ctx); > +aio_co_schedule(ctx, co); > } > } I'm fine with the change, but I find the log message to be a bit confusing (although correct). 
AFAICS the problem is that calling aio_co_enter from a thread which has no associated aio_context works differently compared to calling it from a proper iothread: if the target context was qemu_aio_context, an iothread would just schedule the coroutine there, while a "dumb" thread would try to lock the context, potentially resulting in a deadlock. This patch makes "dumb" threads and iothreads behave identically when entering a coroutine on a foreign context. You may want to rephrase the log message to that end. Anyway Reviewed-by: Roman Kagan
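[The deadlock shape described above can be modeled in a few lines of Python — the "main loop" holds the context lock while blocked in its poll, so a lock-based wake from a plain thread can never succeed, while a queue-based (schedule-style) wake needs no lock at all. Everything here is an illustrative stand-in, not QEMU API:]

```python
import queue
import threading

def wake_demo():
    """Model the two wake strategies from the patch discussion. The
    'main loop' thread holds ctx_lock while blocked in its poll (as
    aio_poll() does); a plain thread then tries to wake it both ways."""
    ctx_lock = threading.Lock()
    run_queue = queue.Queue()   # stand-in for aio_co_schedule()'s run queue
    outcome = {}

    def waker():
        # aio_context_acquire()-style wake: needs ctx_lock, which the
        # blocked main loop never drops -> this would deadlock, so we
        # only attempt it with a timeout here.
        outcome["lock_wake_ok"] = ctx_lock.acquire(timeout=0.1)
        if outcome["lock_wake_ok"]:
            ctx_lock.release()
        # aio_co_schedule()-style wake: no context lock needed.
        run_queue.put("wake-co")

    ctx_lock.acquire()          # main loop enters its blocking poll
    t = threading.Thread(target=waker)
    t.start()
    outcome["woken"] = run_queue.get(timeout=2)  # poll sees the scheduled wake
    t.join()
    ctx_lock.release()
    return outcome
```

The lock-based attempt times out (it would block forever without the timeout), while the scheduled wake goes through and lets the blocked loop make progress — which is exactly why the patch funnels the general case through aio_co_schedule().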
Re: [RFC PATCH v2 04/11] qemu-iotests: delay QMP socket timers
On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote: def get_qmp_events_filtered(self, wait=60.0): result = [] -for ev in self.get_qmp_events(wait=wait): +qmp_wait = wait +if qemu_gdb: +qmp_wait = 0.0 +for ev in self.get_qmp_events(wait=qmp_wait): result.append(filter_qmp_event(ev)) return result Should this be handled in get_qmp_events instead, since you're basically changing all the callers? Paolo
Re: [RFC PATCH v2 03/11] qemu-iotests: add option to attach gdbserver
On 07/04/21 15:50, Emanuele Giuseppe Esposito wrote: +self.gdb_qemu = os.getenv('GDB_QEMU') + +if gdb and not self.gdb_qemu: +self.gdb_qemu = 'localhost:12345' +elif self.gdb_qemu and not gdb: +del self.gdb_qemu +del os.environ['GDB_QEMU'] Alternatively: if gdb: self.gdb_qemu = os.environ.get('GDB_QEMU', 'localhost:12345') elif 'GDB_QEMU' in os.environ: del os.environ['GDB_QEMU'] +GDB_QEMU -- "{GDB_QEMU}" Perhaps only include this if gdbserver is actually in use? (Or not at all, since gdbserver anyway prints the port). Paolo
Re: [PATCH v2 02/10] block/nbd: BDRVNBDState: drop unused connect_err and connect_status
On Thu, Apr 08, 2021 at 05:08:19PM +0300, Vladimir Sementsov-Ogievskiy wrote: > These fields are write-only. Drop them. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > block/nbd.c | 12 ++-- > 1 file changed, 2 insertions(+), 10 deletions(-) Reviewed-by: Roman Kagan
Re: [PATCH v2 01/10] block/nbd: introduce NBDConnectThread reference counter
On Thu, Apr 08, 2021 at 05:08:18PM +0300, Vladimir Sementsov-Ogievskiy wrote: > The structure is shared between NBD BDS and connection thread. And it > is possible the connect thread will finish after closing and releasing > for the bs. To handle this we have a concept of > CONNECT_THREAD_RUNNING_DETACHED state and when thread is running and > BDS is going to be closed we don't free the structure, but instead move > it to CONNECT_THREAD_RUNNING_DETACHED state, so that thread will free > it. > > Still more native way to solve the problem is using reference counter > for shared structure. Let's use it. It makes code smaller and more > readable. > > New approach also makes checks in nbd_co_establish_connection() > redundant: now we are sure that s->connect_thread is valid during the > whole life of NBD BDS. > > This also fixes possible use-after-free of s->connect_thread if > nbd_co_establish_connection_cancel() clears it during > nbd_co_establish_connection(), and nbd_co_establish_connection() uses > local copy of s->connect_thread after yield point. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > block/nbd.c | 62 + > 1 file changed, 20 insertions(+), 42 deletions(-) Reviewed-by: Roman Kagan
Re: [ovirt-users] Q: Sparsify/fstrim don't reduce actual disk image size
On Thu, Apr 8, 2021 at 10:04 AM Andrei Verovski wrote: > > Hi, > > Many thanks, its worked ! Actual size shrunk from 584 to 9 GB, now I have > space to backup. > > Is there any guidelines how to format QCOW2 images (for Linux) so they can be > shrunk in an efficient way? > With this NextCloud/Collabora LVM I did in the following order: > swap > ext2 boot > ext4 root > ext4 var (large, for data, all cloud data stored here) > > Ext4 partitions on LVM. > > Or this is not predictable how data will span QCOW2 space? I think there is no way to avoid this issue with ovirt block storage. Regardless of how the data is laid out in the qcow2 image, when we don't have enough space ovirt extend the disk. This happens many times until the disk reaches the virtual size (actually more because of qcow2 metadata). For example we start with 1g image: [xx--] Now you write more and ovirt extend the image: [--] This repeats many times... [xxx---] When you sparsify, some clusters are marked as zero (or discarded), but if they are not at the end of the image we cannot shrink the image. [xxxxx---x-xxxxx-x--xx--xx-] When you copy the image to a new image the discarded or zero parts are skipped and the new image contains only the actual data. If qemu will support a way to defragment an image (best online): [---] ovirt can shrink the logical volume after that. [] Adding qemu-block since this is an interesting use case that may be useful to more qemu users. Nir > > On 7 Apr 2021, at 18:43, Nir Soffer wrote: > > > > On Wed, Apr 7, 2021 at 5:36 PM Andrei Verovski wrote: > >> > >> Hi ! > >> > >> I have VM (under oVirt) with single disk thin provision (~600 GB) > > > > I guess you are using block storage? > > > >> running NextCloud on Debian 9. > >> Right now VM HD is almost empty. Unfortunately, it occupies 584 GB > >> (virtual size = 600 GB) > >> All partition (except swap and boot) are EXT4 with discard option. > > > > You don't need to use discard mount option. fstrim works without it. 
> > > >> in oVirt “enable discard = on”. > >> > >> # fstrim -av runs successfully: > >> /var: 477.6 GiB (512851144704 bytes) trimmed on > >> /dev/mapper/vg--system-lv4--data > >> /boot: 853.8 MiB (895229952 bytes) trimmed on > >> /dev/mapper/vg--system-lv2--boot > >> /: 88.4 GiB (94888611840 bytes) trimmed on /dev/mapper/vg--system-lv3—sys > >> > >> When fstrim runs again, it trims zero. I even run “Sparsify” in oVirt. > >> Unfortunately, actual size is still 584 GB. > >> > >> Here is /etc/fstab > >> /dev/mapper/vg--system-lv3--sys / ext4 > >> discard,noatime,nodiratime,errors=remount-ro 0 1 > >> /dev/mapper/vg--system-lv2--boot /boot ext2defaults0 > >> 2 > >> /dev/mapper/vg--system-lv4--data /varext4 > >> discard,noatime,nodiratime 0 2 > >> /dev/mapper/vg--system-lv1--swap noneswapsw 0 > >> 0 > >> > >> When disk was partitioned/formatted, swap and boot were created first and > >> positioned at the beginning. > >> > >> What is wrong here? Is it possible to fix all this ? > > > > Discarding data mark the areas in the qcow2 image as zero, but it does not > > move > > actual data around (which is very slow). So if the clusters were at > > the end of the > > image they remain there, and the actual file size is the same. > > > > The only way to reclaim the space is: > > 1. sparsify disk - must be done *before* copying the disk. > > 2. move this to another storage domain > > 3. move disk back to the original storage domain > > > > We may have an easier and more efficient way in the future that > > works with single storage domain, but it will have to copy the > > entire disk. If the disk is really mostly empty it should be fast. > > > > Nir > > >
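The behaviour Nir describes — discard punches holes anywhere in the image, a logical volume can only be shrunk from the end, and a copy compacts everything — can be modelled with a toy cluster map (purely illustrative; none of these helpers exist in oVirt or qemu):

```python
# Toy model of a qcow2-on-LV image: 'D' = data cluster, '0' = discarded/zero.
def sparsify(clusters, discarded):
    """fstrim/sparsify marks clusters zero; the LV size does not change."""
    return ['0' if i in discarded else c for i, c in enumerate(clusters)]

def shrink_trailing(clusters):
    """A shrink can only drop zero clusters at the *end* of the image."""
    while clusters and clusters[-1] == '0':
        clusters = clusters[:-1]
    return clusters

def copy_image(clusters):
    """Copying (e.g. to another storage domain) skips zero clusters."""
    return [c for c in clusters if c != '0']

img = sparsify(['D'] * 8, discarded={1, 2, 3, 5, 6})  # holes in the middle
print(len(shrink_trailing(img)))   # 8: last cluster is data, nothing trims
print(len(copy_image(img)))        # 3: only real data survives the copy
```

This is why sparsify alone left the 584 GB allocation in place, while the move-to-another-domain round trip shrank it to 9 GB.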
[PATCH v2 03/10] util/async: aio_co_enter(): do aio_co_schedule in general case
With the following patch we want to call aio_co_wake() from a thread, and
currently that works badly. Assume we have no iothreads. Assume we have a
coroutine A, which waits in a yield point for an external aio_co_wake(),
and no progress can be made until that happens. The main thread is in a
blocking aio_poll() (for example, in blk_read()).

Now, in a separate thread we do aio_co_wake(). It calls aio_co_enter(),
which goes through the last "else" branch and does aio_context_acquire(ctx).

Now we have a deadlock: aio_poll() will not release the context lock until
some progress is made, and progress can't be made until aio_co_wake()
wakes coroutine A. And it can't, because it waits on aio_context_acquire().

Still, aio_co_schedule() works fine in parallel with a blocking aio_poll().
So let's use it in the general case and drop the
aio_context_acquire/aio_context_release branch from aio_co_enter().

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 util/async.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/util/async.c b/util/async.c
index 674dbefb7c..f05b883a39 100644
--- a/util/async.c
+++ b/util/async.c
@@ -614,19 +614,12 @@ void aio_co_wake(struct Coroutine *co)
 void aio_co_enter(AioContext *ctx, struct Coroutine *co)
 {
-    if (ctx != qemu_get_current_aio_context()) {
-        aio_co_schedule(ctx, co);
-        return;
-    }
-
-    if (qemu_in_coroutine()) {
+    if (ctx == qemu_get_current_aio_context() && qemu_in_coroutine()) {
         Coroutine *self = qemu_coroutine_self();
         assert(self != co);
         QSIMPLEQ_INSERT_TAIL(&self->co_queue_wakeup, co, co_queue_next);
     } else {
-        aio_context_acquire(ctx);
-        qemu_aio_coroutine_enter(ctx, co);
-        aio_context_release(ctx);
+        aio_co_schedule(ctx, co);
     }
 }
-- 
2.29.2
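The deadlock and its fix can be illustrated with a plain-threads analogy (a sketch only — this is not QEMU's AioContext code, and all names here are invented): the poller blocks while holding the context lock, so a waker that tried to take that lock would deadlock, whereas merely enqueuing the wake-up — the aio_co_schedule() analogue — always succeeds:

```python
import queue
import threading

# The "AioContext": a lock held for the whole poll, plus a work queue.
ctx_lock = threading.Lock()
work = queue.Queue()
woken = threading.Event()

def poll_loop():
    """aio_poll() analogue: blocks for scheduled work while holding the lock."""
    with ctx_lock:
        fn = work.get(timeout=5)   # no progress until someone schedules work
        fn()

def wake_coroutine():
    """Stands in for re-entering the waiting coroutine A."""
    woken.set()

t = threading.Thread(target=poll_loop)
t.start()
# The aio_co_schedule() analogue: enqueue without touching ctx_lock.
work.put(wake_coroutine)
t.join()
print(woken.is_set())  # True
```

Had the waking thread instead done `with ctx_lock: wake_coroutine()` — the aio_context_acquire() path the patch removes — it would block on `ctx_lock` forever, exactly the deadlock the commit message describes.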
[PATCH v2 00/10] block/nbd: move connection code to separate file
Hi all! This substitutes "[PATCH 00/14] nbd: move reconnect-thread to separate file" Supersedes: <20210407104637.36033-1-vsement...@virtuozzo.com> I want to simplify block/nbd.c which is overcomplicated now. First step is splitting out what could be split. These series creates new file nbd/client-connection.c and part of block/nbd.c is refactored and moved. v2 is mostly rewritten. I decided move larger part, otherwise it doesn't make real sense. Note also that v2 is based on master. Patch 01 actually solves same problem as "[PATCH for-6.0] block/nbd: fix possible use after free of s->connect_thread" [*] in a smarter way. So, if [*] goes first, this will be rebased to undo [*]. Vladimir Sementsov-Ogievskiy (10): block/nbd: introduce NBDConnectThread reference counter block/nbd: BDRVNBDState: drop unused connect_err and connect_status util/async: aio_co_enter(): do aio_co_schedule in general case block/nbd: simplify waking of nbd_co_establish_connection() block/nbd: drop thr->state block/nbd: bs-independent interface for nbd_co_establish_connection() block/nbd: make nbd_co_establish_connection_cancel() bs-independent block/nbd: rename NBDConnectThread to NBDClientConnection block/nbd: introduce nbd_client_connection_new() nbd: move connection code from block/nbd to nbd/client-connection include/block/nbd.h | 11 ++ block/nbd.c | 288 ++-- nbd/client-connection.c | 192 +++ util/async.c| 11 +- nbd/meson.build | 1 + 5 files changed, 218 insertions(+), 285 deletions(-) create mode 100644 nbd/client-connection.c -- 2.29.2
[PATCH v2 06/10] block/nbd: bs-independent interface for nbd_co_establish_connection()
We are going to split connection code to separate file. Now we are ready to give nbd_co_establish_connection() clean and bs-independent interface. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 49 +++-- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 85c20e7810..a487fd1e68 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -122,7 +122,8 @@ typedef struct BDRVNBDState { static void connect_thread_state_unref(NBDConnectThread *thr); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); -static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp); +static coroutine_fn QIOChannelSocket * +nbd_co_establish_connection(NBDConnectThread *thr, Error **errp); static void nbd_co_establish_connection_cancel(BlockDriverState *bs); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); @@ -390,22 +391,36 @@ static void *connect_thread_func(void *opaque) return NULL; } -static int coroutine_fn -nbd_co_establish_connection(BlockDriverState *bs, Error **errp) +/* + * Get a new connection in context of @thr: + * if thread is running, wait for completion + * if thread is already succeeded in background, and user didn't get the + * result, just return it now + * otherwise if thread is not running, start a thread and wait for completion + */ +static coroutine_fn QIOChannelSocket * +nbd_co_establish_connection(NBDConnectThread *thr, Error **errp) { +QIOChannelSocket *sioc = NULL; QemuThread thread; -BDRVNBDState *s = bs->opaque; -NBDConnectThread *thr = s->connect_thread; - -assert(!s->sioc); qemu_mutex_lock(>mutex); +/* + * Don't call nbd_co_establish_connection() in several coroutines in + * parallel. Only one call at once is supported. 
+ */ +assert(!thr->wait_co); + if (!thr->running) { if (thr->sioc) { /* Previous attempt finally succeeded in background */ -goto out; +sioc = g_steal_pointer(>sioc); +qemu_mutex_unlock(>mutex); + +return sioc; } + thr->running = true; error_free(thr->err); thr->err = NULL; @@ -420,13 +435,12 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) /* * We are going to wait for connect-thread finish, but - * nbd_client_co_drain_begin() can interrupt. + * nbd_co_establish_connection_cancel() can interrupt. */ qemu_coroutine_yield(); qemu_mutex_lock(>mutex); -out: if (thr->running) { /* * Obviously, drained section wants to start. Report the attempt as @@ -437,17 +451,12 @@ out: } else { error_propagate(errp, thr->err); thr->err = NULL; -s->sioc = thr->sioc; -thr->sioc = NULL; -if (s->sioc) { -yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), - nbd_yank, bs); -} +sioc = g_steal_pointer(>sioc); } qemu_mutex_unlock(>mutex); -return s->sioc ? 0 : -1; +return sioc; } /* @@ -514,11 +523,15 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s) s->ioc = NULL; } -if (nbd_co_establish_connection(s->bs, NULL) < 0) { +s->sioc = nbd_co_establish_connection(s->connect_thread, NULL); +if (!s->sioc) { ret = -ECONNREFUSED; goto out; } +yank_register_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name), nbd_yank, + s->bs); + bdrv_dec_in_flight(s->bs); ret = nbd_client_handshake(s->bs, NULL); -- 2.29.2
[PATCH v2 10/10] nbd: move connection code from block/nbd to nbd/client-connection
We now have bs-independent connection API, which consists of four functions: nbd_client_connection_new() nbd_client_connection_unref() nbd_co_establish_connection() nbd_co_establish_connection_cancel() Move them to a separate file together with NBDClientConnection structure which becomes private to the new API. Signed-off-by: Vladimir Sementsov-Ogievskiy --- Hmm. I keep only Virtuozzo's copyright in a new file, as actually I've developed nbd-reconnection code. Still probably safer to save all copyrights. Let me now if you think so and I'll add them. include/block/nbd.h | 11 +++ block/nbd.c | 167 -- nbd/client-connection.c | 192 nbd/meson.build | 1 + 4 files changed, 204 insertions(+), 167 deletions(-) create mode 100644 nbd/client-connection.c diff --git a/include/block/nbd.h b/include/block/nbd.h index 5f34d23bb0..d4666b105e 100644 --- a/include/block/nbd.h +++ b/include/block/nbd.h @@ -406,4 +406,15 @@ const char *nbd_info_lookup(uint16_t info); const char *nbd_cmd_lookup(uint16_t info); const char *nbd_err_lookup(int err); +/* nbd/client-connection.c */ +typedef struct NBDClientConnection NBDClientConnection; + +NBDClientConnection *nbd_client_connection_new(const SocketAddress *saddr); +void nbd_client_connection_unref(NBDClientConnection *conn); + +QIOChannelSocket *coroutine_fn +nbd_co_establish_connection(NBDClientConnection *conn, Error **errp); + +void coroutine_fn nbd_co_establish_connection_cancel(NBDClientConnection *conn); + #endif diff --git a/block/nbd.c b/block/nbd.c index 376ab9f92d..1db86b7340 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -66,25 +66,6 @@ typedef enum NBDClientState { NBD_CLIENT_QUIT } NBDClientState; -typedef struct NBDClientConnection { -/* Initialization constants */ -SocketAddress *saddr; /* address to connect to */ - -/* - * Result of last attempt. Valid in FAIL and SUCCESS states. - * If you want to steal error, don't forget to set pointer to NULL. 
- */ -QIOChannelSocket *sioc; -Error *err; - -int refcnt; /* atomic access */ - -QemuMutex mutex; -/* All further fields are protected by mutex */ -bool running; /* thread is running now */ -Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ -} NBDClientConnection; - typedef struct BDRVNBDState { QIOChannelSocket *sioc; /* The master data channel */ QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */ @@ -119,12 +100,8 @@ typedef struct BDRVNBDState { NBDClientConnection *conn; } BDRVNBDState; -static void nbd_client_connection_unref(NBDClientConnection *conn); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); -static coroutine_fn QIOChannelSocket * -nbd_co_establish_connection(NBDClientConnection *conn, Error **errp); -static void nbd_co_establish_connection_cancel(NBDClientConnection *conn); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); @@ -336,150 +313,6 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s) return qatomic_load_acquire(>state) == NBD_CLIENT_CONNECTING_WAIT; } -static NBDClientConnection * -nbd_client_connection_new(const SocketAddress *saddr) -{ -NBDClientConnection *conn = g_new(NBDClientConnection, 1); - -*conn = (NBDClientConnection) { -.saddr = QAPI_CLONE(SocketAddress, saddr), -.refcnt = 1, -}; - -qemu_mutex_init(>mutex); - -return conn; -} - -static void nbd_client_connection_unref(NBDClientConnection *conn) -{ -if (qatomic_dec_fetch(>refcnt) == 0) { -if (conn->sioc) { -qio_channel_close(QIO_CHANNEL(conn->sioc), NULL); -} -error_free(conn->err); -qapi_free_SocketAddress(conn->saddr); -g_free(conn); -} -} - -static void *connect_thread_func(void *opaque) -{ -NBDClientConnection *conn = opaque; -int ret; - -conn->sioc = qio_channel_socket_new(); - -error_free(conn->err); -conn->err = NULL; -ret = qio_channel_socket_connect_sync(conn->sioc, conn->saddr, >err); -if (ret < 0) { 
-object_unref(OBJECT(conn->sioc)); -conn->sioc = NULL; -} - -qemu_mutex_lock(>mutex); - -assert(conn->running); -conn->running = false; -if (conn->wait_co) { -aio_co_wake(conn->wait_co); -conn->wait_co = NULL; -} - -qemu_mutex_unlock(>mutex); - -nbd_client_connection_unref(conn); - -return NULL; -} - -/* - * Get a new connection in context of @conn: - * if thread is running, wait for completion - * if thread is already succeeded in background, and user didn't get the - * result, just return it now - * otherwise if thread is not running, start a thread and wait for completion - */ -static coroutine_fn QIOChannelSocket *
[PATCH v2 07/10] block/nbd: make nbd_co_establish_connection_cancel() bs-independent
nbd_co_establish_connection_cancel() actually needs only pointer to NBDConnectThread. So, make it clean. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index a487fd1e68..ebbb0bec6a 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -124,7 +124,7 @@ static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); static coroutine_fn QIOChannelSocket * nbd_co_establish_connection(NBDConnectThread *thr, Error **errp); -static void nbd_co_establish_connection_cancel(BlockDriverState *bs); +static void nbd_co_establish_connection_cancel(NBDConnectThread *thr); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); @@ -271,7 +271,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(bs); +nbd_co_establish_connection_cancel(s->connect_thread); reconnect_delay_timer_del(s); @@ -311,7 +311,7 @@ static void nbd_teardown_connection(BlockDriverState *bs) if (s->connection_co_sleep_ns_state) { qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(bs); +nbd_co_establish_connection_cancel(s->connect_thread); } if (qemu_in_coroutine()) { s->teardown_co = qemu_coroutine_self(); @@ -461,14 +461,12 @@ nbd_co_establish_connection(NBDConnectThread *thr, Error **errp) /* * nbd_co_establish_connection_cancel - * Cancel nbd_co_establish_connection asynchronously: it will finish soon, to - * allow drained section to begin. + * Cancel nbd_co_establish_connection() asynchronously. Note, that it doesn't + * stop the thread itself neither close the socket. It just safely wakes + * nbd_co_establish_connection() sleeping in the yield(). 
*/ -static void nbd_co_establish_connection_cancel(BlockDriverState *bs) +static void nbd_co_establish_connection_cancel(NBDConnectThread *thr) { -BDRVNBDState *s = bs->opaque; -NBDConnectThread *thr = s->connect_thread; - qemu_mutex_lock(>mutex); if (thr->wait_co) { -- 2.29.2
[PATCH v2 09/10] block/nbd: introduce nbd_client_connection_new()
This is the last step of creating a bs-independent nbd connection
interface. With the next commit we can finally move it to a separate file.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/nbd.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index ab3ef13366..376ab9f92d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -336,16 +336,19 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s)
     return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
 }
 
-static void nbd_init_connect_thread(BDRVNBDState *s)
+static NBDClientConnection *
+nbd_client_connection_new(const SocketAddress *saddr)
 {
-    s->conn = g_new(NBDClientConnection, 1);
+    NBDClientConnection *conn = g_new(NBDClientConnection, 1);
 
-    *s->conn = (NBDClientConnection) {
-        .saddr = QAPI_CLONE(SocketAddress, s->saddr),
+    *conn = (NBDClientConnection) {
+        .saddr = QAPI_CLONE(SocketAddress, saddr),
         .refcnt = 1,
     };
 
-    qemu_mutex_init(&s->conn->mutex);
+    qemu_mutex_init(&conn->mutex);
+
+    return conn;
 }
 
 static void nbd_client_connection_unref(NBDClientConnection *conn)
@@ -2211,7 +2214,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
     /* successfully connected */
     s->state = NBD_CLIENT_CONNECTED;
 
-    nbd_init_connect_thread(s);
+    s->conn = nbd_client_connection_new(s->saddr);
 
     s->connection_co = qemu_coroutine_create(nbd_connection_entry, s);
     bdrv_inc_in_flight(bs);
-- 
2.29.2
[PATCH v2 02/10] block/nbd: BDRVNBDState: drop unused connect_err and connect_status
These fields are write-only. Drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/nbd.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 64b2977dc8..b0bbde741a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -117,8 +117,6 @@ typedef struct BDRVNBDState {
     bool wait_drained_end;
     int in_flight;
     NBDClientState state;
-    int connect_status;
-    Error *connect_err;
     bool wait_in_flight;
 
     QEMUTimer *reconnect_delay_timer;
@@ -556,7 +554,6 @@ static void nbd_co_establish_connection_cancel(BlockDriverState *bs)
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
     int ret;
-    Error *local_err = NULL;
 
     if (!nbd_client_connecting(s)) {
         return;
@@ -597,14 +594,14 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
         s->ioc = NULL;
     }
 
-    if (nbd_co_establish_connection(s->bs, &local_err) < 0) {
+    if (nbd_co_establish_connection(s->bs, NULL) < 0) {
         ret = -ECONNREFUSED;
         goto out;
     }
 
     bdrv_dec_in_flight(s->bs);
 
-    ret = nbd_client_handshake(s->bs, &local_err);
+    ret = nbd_client_handshake(s->bs, NULL);
 
     if (s->drained) {
         s->wait_drained_end = true;
@@ -619,11 +616,6 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
     bdrv_inc_in_flight(s->bs);
 
 out:
-    s->connect_status = ret;
-    error_free(s->connect_err);
-    s->connect_err = NULL;
-    error_propagate(&s->connect_err, local_err);
-
     if (ret >= 0) {
         /* successfully connected */
         s->state = NBD_CLIENT_CONNECTED;
-- 
2.29.2
[PATCH v2 05/10] block/nbd: drop thr->state
Actually, the only bit of information we need is "is thread running or not". We don't need all these states. So, instead of thr->state add boolean variable thr->running and refactor the code. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 103 ++-- 1 file changed, 27 insertions(+), 76 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index 4e28982e53..85c20e7810 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -66,18 +66,6 @@ typedef enum NBDClientState { NBD_CLIENT_QUIT } NBDClientState; -typedef enum NBDConnectThreadState { -/* No thread, no pending results */ -CONNECT_THREAD_NONE, - -/* Thread is running, no results for now */ -CONNECT_THREAD_RUNNING, - -/* Thread finished, results are stored in a state */ -CONNECT_THREAD_FAIL, -CONNECT_THREAD_SUCCESS -} NBDConnectThreadState; - typedef struct NBDConnectThread { /* Initialization constants */ SocketAddress *saddr; /* address to connect to */ @@ -93,7 +81,7 @@ typedef struct NBDConnectThread { QemuMutex mutex; /* All further fields are protected by mutex */ -NBDConnectThreadState state; /* current state of the thread */ +bool running; /* thread is running now */ Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ } NBDConnectThread; @@ -353,7 +341,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s) *s->connect_thread = (NBDConnectThread) { .saddr = QAPI_CLONE(SocketAddress, s->saddr), -.state = CONNECT_THREAD_NONE, .refcnt = 1, }; @@ -389,16 +376,11 @@ static void *connect_thread_func(void *opaque) qemu_mutex_lock(>mutex); -switch (thr->state) { -case CONNECT_THREAD_RUNNING: -thr->state = ret < 0 ? 
CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS; -if (thr->wait_co) { -aio_co_wake(thr->wait_co); -thr->wait_co = NULL; -} -break; -default: -abort(); +assert(thr->running); +thr->running = false; +if (thr->wait_co) { +aio_co_wake(thr->wait_co); +thr->wait_co = NULL; } qemu_mutex_unlock(>mutex); @@ -411,37 +393,25 @@ static void *connect_thread_func(void *opaque) static int coroutine_fn nbd_co_establish_connection(BlockDriverState *bs, Error **errp) { -int ret; QemuThread thread; BDRVNBDState *s = bs->opaque; NBDConnectThread *thr = s->connect_thread; +assert(!s->sioc); + qemu_mutex_lock(>mutex); -switch (thr->state) { -case CONNECT_THREAD_FAIL: -case CONNECT_THREAD_NONE: +if (!thr->running) { +if (thr->sioc) { +/* Previous attempt finally succeeded in background */ +goto out; +} +thr->running = true; error_free(thr->err); thr->err = NULL; -thr->state = CONNECT_THREAD_RUNNING; qatomic_inc(>refcnt); /* for thread */ qemu_thread_create(, "nbd-connect", connect_thread_func, thr, QEMU_THREAD_DETACHED); -break; -case CONNECT_THREAD_SUCCESS: -/* Previous attempt finally succeeded in background */ -thr->state = CONNECT_THREAD_NONE; -s->sioc = thr->sioc; -thr->sioc = NULL; -yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), - nbd_yank, bs); -qemu_mutex_unlock(>mutex); -return 0; -case CONNECT_THREAD_RUNNING: -/* Already running, will wait */ -break; -default: -abort(); } thr->wait_co = qemu_coroutine_self(); @@ -456,10 +426,15 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) qemu_mutex_lock(>mutex); -switch (thr->state) { -case CONNECT_THREAD_SUCCESS: -case CONNECT_THREAD_FAIL: -thr->state = CONNECT_THREAD_NONE; +out: +if (thr->running) { +/* + * Obviously, drained section wants to start. Report the attempt as + * failed. Still connect thread is executing in background, and its + * result may be used for next connection attempt. 
+ */ +error_setg(errp, "Connection attempt cancelled by other operation"); +} else { error_propagate(errp, thr->err); thr->err = NULL; s->sioc = thr->sioc; @@ -468,32 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) yank_register_function(BLOCKDEV_YANK_INSTANCE(bs->node_name), nbd_yank, bs); } -ret = (s->sioc ? 0 : -1); -break; -case CONNECT_THREAD_RUNNING: -/* - * Obviously, drained section wants to start. Report the attempt as - * failed. Still connect thread is executing in background, and its - * result may be used for next connection attempt. - */ -ret = -1; -error_setg(errp, "Connection attempt cancelled by
[PATCH v2 01/10] block/nbd: introduce NBDConnectThread reference counter
The structure is shared between NBD BDS and connection thread. And it is possible the connect thread will finish after closing and releasing for the bs. To handle this we have a concept of CONNECT_THREAD_RUNNING_DETACHED state and when thread is running and BDS is going to be closed we don't free the structure, but instead move it to CONNECT_THREAD_RUNNING_DETACHED state, so that thread will free it. Still more native way to solve the problem is using reference counter for shared structure. Let's use it. It makes code smaller and more readable. New approach also makes checks in nbd_co_establish_connection() redundant: now we are sure that s->connect_thread is valid during the whole life of NBD BDS. This also fixes possible use-after-free of s->connect_thread if nbd_co_establish_connection_cancel() clears it during nbd_co_establish_connection(), and nbd_co_establish_connection() uses local copy of s->connect_thread after yield point. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 62 + 1 file changed, 20 insertions(+), 42 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index c26dc5a54f..64b2977dc8 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -73,12 +73,6 @@ typedef enum NBDConnectThreadState { /* Thread is running, no results for now */ CONNECT_THREAD_RUNNING, -/* - * Thread is running, but requestor exited. Thread should close - * the new socket and free the connect state on exit. 
- */ -CONNECT_THREAD_RUNNING_DETACHED, - /* Thread finished, results are stored in a state */ CONNECT_THREAD_FAIL, CONNECT_THREAD_SUCCESS @@ -101,6 +95,8 @@ typedef struct NBDConnectThread { QIOChannelSocket *sioc; Error *err; +int refcnt; /* atomic access */ + /* state and bh_ctx are protected by mutex */ QemuMutex mutex; NBDConnectThreadState state; /* current state of the thread */ @@ -144,16 +140,18 @@ typedef struct BDRVNBDState { NBDConnectThread *connect_thread; } BDRVNBDState; +static void connect_thread_state_unref(NBDConnectThread *thr); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); static int nbd_co_establish_connection(BlockDriverState *bs, Error **errp); -static void nbd_co_establish_connection_cancel(BlockDriverState *bs, - bool detach); +static void nbd_co_establish_connection_cancel(BlockDriverState *bs); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void *opaque); static void nbd_clear_bdrvstate(BDRVNBDState *s) { +connect_thread_state_unref(s->connect_thread); +s->connect_thread = NULL; object_unref(OBJECT(s->tlscreds)); qapi_free_SocketAddress(s->saddr); s->saddr = NULL; @@ -293,7 +291,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(bs, false); +nbd_co_establish_connection_cancel(bs); reconnect_delay_timer_del(s); @@ -333,7 +331,7 @@ static void nbd_teardown_connection(BlockDriverState *bs) if (s->connection_co_sleep_ns_state) { qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(bs, true); +nbd_co_establish_connection_cancel(bs); } if (qemu_in_coroutine()) { s->teardown_co = qemu_coroutine_self(); @@ -376,26 +374,28 @@ static void nbd_init_connect_thread(BDRVNBDState *s) .state = CONNECT_THREAD_NONE, .bh_func = connect_bh, .bh_opaque = s, +.refcnt = 1, }; qemu_mutex_init(>connect_thread->mutex); } 
-static void nbd_free_connect_thread(NBDConnectThread *thr) +static void connect_thread_state_unref(NBDConnectThread *thr) { -if (thr->sioc) { -qio_channel_close(QIO_CHANNEL(thr->sioc), NULL); +if (qatomic_dec_fetch(>refcnt) == 0) { +if (thr->sioc) { +qio_channel_close(QIO_CHANNEL(thr->sioc), NULL); +} +error_free(thr->err); +qapi_free_SocketAddress(thr->saddr); +g_free(thr); } -error_free(thr->err); -qapi_free_SocketAddress(thr->saddr); -g_free(thr); } static void *connect_thread_func(void *opaque) { NBDConnectThread *thr = opaque; int ret; -bool do_free = false; thr->sioc = qio_channel_socket_new(); @@ -419,18 +419,13 @@ static void *connect_thread_func(void *opaque) thr->bh_ctx = NULL; } break; -case CONNECT_THREAD_RUNNING_DETACHED: -do_free = true; -break; default: abort(); } qemu_mutex_unlock(>mutex); -if (do_free) { -nbd_free_connect_thread(thr); -} +connect_thread_state_unref(thr); return NULL; } @@ -451,6 +446,7 @@
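The lifetime rule this patch introduces — the BDS and the (possibly detached) connect thread each hold a reference, and whichever side drops the count to zero frees the shared state — can be sketched with a toy refcount in Python (illustrative only; the class and helper names are invented, and a lock stands in for qatomic_dec_fetch()):

```python
import threading

class SharedConnState:
    """Hypothetical mirror of NBDConnectThread's refcounted lifetime."""
    def __init__(self):
        self.refcnt = 1                 # the BDS's initial reference
        self._lock = threading.Lock()   # stands in for qatomic_dec_fetch()
        self.freed = False

    def ref(self):
        with self._lock:
            self.refcnt += 1

    def unref(self):
        with self._lock:
            self.refcnt -= 1
            last = self.refcnt == 0
        if last:
            self.freed = True           # here the C code closes sioc and frees

state = SharedConnState()
state.ref()                             # taken before spawning the thread

def connect_thread(st):
    st.unref()                          # thread drops its reference on exit

t = threading.Thread(target=connect_thread, args=(state,))
t.start(); t.join()
print(state.freed)                      # False: the BDS still holds a reference
state.unref()                           # nbd_clear_bdrvstate() analogue
print(state.freed)                      # True: last reference gone
```

The point of the pattern is that neither side needs to know whether the other has already gone away, which is exactly what the CONNECT_THREAD_RUNNING_DETACHED state machinery was simulating by hand.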
[PATCH v2 08/10] block/nbd: rename NBDConnectThread to NBDClientConnection
We are going to move connection code to own file and want clear names and APIs. The structure is shared between user and (possibly) several runs of connect-thread. So it's wrong to call it "thread". Let's rename to something more generic. Appropriately rename connect_thread and thr variables to conn. connect_thread_state_unref() function gets new appropriate name too Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 127 ++-- 1 file changed, 63 insertions(+), 64 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index ebbb0bec6a..ab3ef13366 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -66,7 +66,7 @@ typedef enum NBDClientState { NBD_CLIENT_QUIT } NBDClientState; -typedef struct NBDConnectThread { +typedef struct NBDClientConnection { /* Initialization constants */ SocketAddress *saddr; /* address to connect to */ @@ -83,7 +83,7 @@ typedef struct NBDConnectThread { /* All further fields are protected by mutex */ bool running; /* thread is running now */ Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ -} NBDConnectThread; +} NBDClientConnection; typedef struct BDRVNBDState { QIOChannelSocket *sioc; /* The master data channel */ @@ -116,22 +116,22 @@ typedef struct BDRVNBDState { char *x_dirty_bitmap; bool alloc_depth; -NBDConnectThread *connect_thread; +NBDClientConnection *conn; } BDRVNBDState; -static void connect_thread_state_unref(NBDConnectThread *thr); +static void nbd_client_connection_unref(NBDClientConnection *conn); static int nbd_establish_connection(BlockDriverState *bs, SocketAddress *saddr, Error **errp); static coroutine_fn QIOChannelSocket * -nbd_co_establish_connection(NBDConnectThread *thr, Error **errp); -static void nbd_co_establish_connection_cancel(NBDConnectThread *thr); +nbd_co_establish_connection(NBDClientConnection *conn, Error **errp); +static void nbd_co_establish_connection_cancel(NBDClientConnection *conn); static int nbd_client_handshake(BlockDriverState *bs, Error **errp); static void nbd_yank(void 
*opaque); static void nbd_clear_bdrvstate(BDRVNBDState *s) { -connect_thread_state_unref(s->connect_thread); -s->connect_thread = NULL; +nbd_client_connection_unref(s->conn); +s->conn = NULL; object_unref(OBJECT(s->tlscreds)); qapi_free_SocketAddress(s->saddr); s->saddr = NULL; @@ -271,7 +271,7 @@ static void coroutine_fn nbd_client_co_drain_begin(BlockDriverState *bs) qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(s->connect_thread); +nbd_co_establish_connection_cancel(s->conn); reconnect_delay_timer_del(s); @@ -311,7 +311,7 @@ static void nbd_teardown_connection(BlockDriverState *bs) if (s->connection_co_sleep_ns_state) { qemu_co_sleep_wake(s->connection_co_sleep_ns_state); } -nbd_co_establish_connection_cancel(s->connect_thread); +nbd_co_establish_connection_cancel(s->conn); } if (qemu_in_coroutine()) { s->teardown_co = qemu_coroutine_self(); @@ -338,100 +338,100 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s) static void nbd_init_connect_thread(BDRVNBDState *s) { -s->connect_thread = g_new(NBDConnectThread, 1); +s->conn = g_new(NBDClientConnection, 1); -*s->connect_thread = (NBDConnectThread) { +*s->conn = (NBDClientConnection) { .saddr = QAPI_CLONE(SocketAddress, s->saddr), .refcnt = 1, }; -qemu_mutex_init(>connect_thread->mutex); +qemu_mutex_init(>conn->mutex); } -static void connect_thread_state_unref(NBDConnectThread *thr) +static void nbd_client_connection_unref(NBDClientConnection *conn) { -if (qatomic_dec_fetch(>refcnt) == 0) { -if (thr->sioc) { -qio_channel_close(QIO_CHANNEL(thr->sioc), NULL); +if (qatomic_dec_fetch(>refcnt) == 0) { +if (conn->sioc) { +qio_channel_close(QIO_CHANNEL(conn->sioc), NULL); } -error_free(thr->err); -qapi_free_SocketAddress(thr->saddr); -g_free(thr); +error_free(conn->err); +qapi_free_SocketAddress(conn->saddr); +g_free(conn); } } static void *connect_thread_func(void *opaque) { -NBDConnectThread *thr = opaque; +NBDClientConnection *conn = opaque; int ret; -thr->sioc = 
qio_channel_socket_new(); +conn->sioc = qio_channel_socket_new(); -error_free(thr->err); -thr->err = NULL; -ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, &thr->err); if (ret < 0) { -object_unref(OBJECT(thr->sioc)); -thr->sioc = NULL; +object_unref(OBJECT(conn->sioc)); +conn->sioc = NULL; } -
[PATCH v2 04/10] block/nbd: simplify waking of nbd_co_establish_connection()
Instead of the connect_bh, bh_ctx and wait_connect fields we can live with only one link to the waiting coroutine, protected by the mutex. So the new logic is: nbd_co_establish_connection() sets wait_co under the mutex, releases the mutex and yields. Note that wait_co may be scheduled by the thread immediately after unlocking the mutex. Still, in the main thread (or iothread) we will not reach the code for entering the coroutine until the yield(), so we are safe. Both connect_thread_func() and nbd_co_establish_connection_cancel() do the following to handle wait_co: under the mutex, if thr->wait_co is not NULL, call aio_co_wake() (which never tries to acquire the aio context since the previous commit, so we are safe to do it under thr->mutex) and set thr->wait_co to NULL. This way we protect ourselves from scheduling it twice. Also, this commit makes nbd_co_establish_connection() less tied to bs (we keep a generic pointer to the coroutine instead of using s->connection_co directly). So, we are on the way to splitting the connection API out of nbd.c (which is overcomplicated now). Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/nbd.c | 49 + 1 file changed, 9 insertions(+), 40 deletions(-) diff --git a/block/nbd.c b/block/nbd.c index b0bbde741a..4e28982e53 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -81,12 +81,6 @@ typedef enum NBDConnectThreadState { typedef struct NBDConnectThread { /* Initialization constants */ SocketAddress *saddr; /* address to connect to */ -/* - * Bottom half to schedule on completion. Scheduled only if bh_ctx is not - * NULL - */ -QEMUBHFunc *bh_func; -void *bh_opaque; /* * Result of last attempt. Valid in FAIL and SUCCESS states.
@@ -97,10 +91,10 @@ typedef struct NBDConnectThread { int refcnt; /* atomic access */ -/* state and bh_ctx are protected by mutex */ QemuMutex mutex; +/* All further fields are protected by mutex */ NBDConnectThreadState state; /* current state of the thread */ -AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) */ +Coroutine *wait_co; /* nbd_co_establish_connection() wait in yield() */ } NBDConnectThread; typedef struct BDRVNBDState { @@ -134,7 +128,6 @@ typedef struct BDRVNBDState { char *x_dirty_bitmap; bool alloc_depth; -bool wait_connect; NBDConnectThread *connect_thread; } BDRVNBDState; @@ -354,15 +347,6 @@ static bool nbd_client_connecting_wait(BDRVNBDState *s) return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT; } -static void connect_bh(void *opaque) -{ -BDRVNBDState *state = opaque; - -assert(state->wait_connect); -state->wait_connect = false; -aio_co_wake(state->connection_co); -} - static void nbd_init_connect_thread(BDRVNBDState *s) { s->connect_thread = g_new(NBDConnectThread, 1); @@ -370,8 +354,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s) *s->connect_thread = (NBDConnectThread) { .saddr = QAPI_CLONE(SocketAddress, s->saddr), .state = CONNECT_THREAD_NONE, -.bh_func = connect_bh, -.bh_opaque = s, .refcnt = 1, }; @@ -410,11 +392,9 @@ static void *connect_thread_func(void *opaque) switch (thr->state) { case CONNECT_THREAD_RUNNING: thr->state = ret < 0 ?
CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS; -if (thr->bh_ctx) { -aio_bh_schedule_oneshot(thr->bh_ctx, thr->bh_func, thr->bh_opaque); - -/* play safe, don't reuse bh_ctx on further connection attempts */ -thr->bh_ctx = NULL; +if (thr->wait_co) { +aio_co_wake(thr->wait_co); +thr->wait_co = NULL; } break; default: @@ -464,20 +444,14 @@ nbd_co_establish_connection(BlockDriverState *bs, Error **errp) abort(); } -thr->bh_ctx = qemu_get_current_aio_context(); +thr->wait_co = qemu_coroutine_self(); qemu_mutex_unlock(&thr->mutex); - /* * We are going to wait for connect-thread finish, but * nbd_client_co_drain_begin() can interrupt. - * - * Note that wait_connect variable is not visible for connect-thread. It - * doesn't need mutex protection, it used only inside home aio context of - * bs. */ -s->wait_connect = true; qemu_coroutine_yield(); qemu_mutex_lock(&thr->mutex); @@ -531,24 +505,19 @@ static void nbd_co_establish_connection_cancel(BlockDriverState *bs) { BDRVNBDState *s = bs->opaque; NBDConnectThread *thr = s->connect_thread; -bool wake = false; qemu_mutex_lock(&thr->mutex); if (thr->state == CONNECT_THREAD_RUNNING) { /* We can cancel only in running state, when bh is not yet scheduled */ -thr->bh_ctx = NULL; -if (s->wait_connect) { -s->wait_connect = false; -wake = true; +if (thr->wait_co) { +aio_co_wake(thr->wait_co); +thr->wait_co = NULL;
Re: [RFC PATCH v2 00/11] qemu_iotests: improve debugging options
On 08/04/2021 14:39, Markus Armbruster wrote: Emanuele Giuseppe Esposito writes: On 08/04/2021 10:26, Markus Armbruster wrote: Emanuele Giuseppe Esposito writes: This series adds the option to attach gdbserver and valgrind to the QEMU binary running in qemu_iotests. It also allows to redirect QEMU binaries output of the python tests to the stdout, instead of a log file. Patches 1-6 introduce the -gdb option to both python and bash tests, 7-10 extend the already existing -valgrind flag to work also on python tests, and patch 11 introduces -p to enable logging to stdout. In particular, patches 1,2,4,8 focus on extending the QMP socket timers when using gdb/valgrind, otherwise the python tests will fail due to delays in the QMP responses. This series is tested on the previous series "qemu-iotests: quality of life improvements" but independent from it, so it can be applied separately. Signed-off-by: Emanuele Giuseppe Esposito How discoverable are these goodies for developers with only superficial knowledge of iotests? Not really sure what you mean, but ./check --help now shows: -p enable prints -gdb start gdbserver with $GDB_QEMU options. Default is localhost:12345 Which I guess should be clear enough? Btw two-three weeks ago I didn't know anything about these tests either. I agree I can make -p more clear, saying "enable qemu binary prints to stdout", and move -valgrind to the "optional arguments" instead of "bash-only" Yes, please (this is not a demand). docs/devel/testing.rst section "QEMU iotests" is the place to explain things in more detail than --help, if that's useful. Right now it simply points to check -h. Ok, I will add a new section in testing.rst explaining the new flags. Thank you, Emanuele
Re: [RFC PATCH v2 00/11] qemu_iotests: improve debugging options
Emanuele Giuseppe Esposito writes: > On 08/04/2021 10:26, Markus Armbruster wrote: >> Emanuele Giuseppe Esposito writes: >> >>> This series adds the option to attach gdbserver and valgrind >>> to the QEMU binary running in qemu_iotests. >>> It also allows to redirect QEMU binaries output of the python tests >>> to the stdout, instead of a log file. >>> >>> Patches 1-6 introduce the -gdb option to both python and bash tests, >>> 7-10 extend the already existing -valgrind flag to work also on >>> python tests, and patch 11 introduces -p to enable logging to stdout. >>> >>> In particular, patches 1,2,4,8 focus on extending the QMP socket timers >>> when using gdb/valgrind, otherwise the python tests will fail due to >>> delays in the QMP responses. >>> >>> This series is tested on the previous series >>> "qemu-iotests: quality of life improvements" >>> but independent from it, so it can be applied separately. >>> >>> Signed-off-by: Emanuele Giuseppe Esposito >> >> How discoverable are these goodies for developers with only superficial >> knowledge of iotests? >> > > Not really sure what you mean, but > > ./check --help now shows: > >> -p enable prints >> -gdb start gdbserver with $GDB_QEMU options. Default is localhost:12345 > > Which I guess should be clear enough? Btw two-three weeks ago I didn't > know anything about these tests either. > > I agree I can make -p more clear, saying "enable qemu binary prints to > stdout", and move -valgrind to the "optional arguments" instead of > "bash-only" Yes, please (this is not a demand). docs/devel/testing.rst section "QEMU iotests" is the place to explain things in more detail than --help, if that's useful. Right now it simply points to check -h.
[PATCH v2] docs: add nvme emulation documentation
From: Klaus Jensen Remove the docs/specs/nvme.txt and replace it with proper documentation in docs/system/nvme.rst. Signed-off-by: Klaus Jensen Reviewed-by: Philippe Mathieu-Daudé --- v2: - add descriptions of the simple copy `mssrl`, `mcl` and `msrc` parameters. (philmd) docs/specs/nvme.txt | 23 - docs/system/index.rst | 1 + docs/system/nvme.rst | 225 ++ MAINTAINERS | 2 +- 4 files changed, 227 insertions(+), 24 deletions(-) delete mode 100644 docs/specs/nvme.txt create mode 100644 docs/system/nvme.rst diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt deleted file mode 100644 index 56d393884e7a.. --- a/docs/specs/nvme.txt +++ /dev/null @@ -1,23 +0,0 @@ -NVM Express Controller -== - -The nvme device (-device nvme) emulates an NVM Express Controller. - - -Reference Specifications - - -The device currently implements most mandatory features of NVMe v1.3d, see - - https://nvmexpress.org/resources/specifications/ - -for the specification. - - -Known issues - - -* The accounting numbers in the SMART/Health are reset across power cycles - -* Interrupt Coalescing is not supported and is disabled by default in volation - of the specification. diff --git a/docs/system/index.rst b/docs/system/index.rst index 02d07071810f..b05af716a973 100644 --- a/docs/system/index.rst +++ b/docs/system/index.rst @@ -23,6 +23,7 @@ Contents: net virtio-net-failover usb + nvme ivshmem linuxboot generic-loader diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst new file mode 100644 index ..f7f63d6bf615 --- /dev/null +++ b/docs/system/nvme.rst @@ -0,0 +1,225 @@ +== +NVMe Emulation +== + +QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and +``nvme-subsys`` devices. + +See the following sections for specific information on + + * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_. 
+ * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_, +`Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data +Protection`_, + +Adding NVMe Devices +=== + +Controller Emulation + + +The QEMU emulated NVMe controller implements version 1.4 of the NVM Express +specification. All mandatory features are implemented with a couple of exceptions +and limitations: + + * Accounting numbers in the SMART/Health log page are reset when the device +is power cycled. + * Interrupt Coalescing is not supported and is disabled by default. + +The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the +following parameters: + +.. code-block:: console + +-drive file=nvm.img,if=none,id=nvm +-device nvme,serial=deadbeef,drive=nvm + +There are a number of optional general parameters for the ``nvme`` device. Some +are mentioned here, but see ``-device nvme,help`` to list all possible +parameters. + +``max_ioqpairs=UINT32`` (default: ``64``) + Set the maximum number of allowed I/O queue pairs. This replaces the + deprecated ``num_queues`` parameter. + +``msix_qsize=UINT16`` (default: ``65``) + The number of MSI-X vectors that the device should support. + +``mdts=UINT8`` (default: ``7``) + Set the Maximum Data Transfer Size of the device. + +``use-intel-id`` (default: ``off``) + Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and + Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID + previously used. + +Additional Namespaces +- + +In the simplest possible invocation sketched above, the device only supports a +single namespace with the namespace identifier ``1``. To support multiple +namespaces and additional features, the ``nvme-ns`` device must be used. + +..
code-block:: console + + -device nvme,id=nvme-ctrl-0,serial=deadbeef + -drive file=nvm-1.img,if=none,id=nvm-1 + -device nvme-ns,drive=nvm-1 + -drive file=nvm-2.img,if=none,id=nvm-2 + -device nvme-ns,drive=nvm-2 + +The namespaces defined by the ``nvme-ns`` device will attach to the most +recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace +identifiers are allocated automatically, starting from ``1``. + +There are a number of parameters available: + +``nsid`` (default: ``0``) + Explicitly set the namespace identifier. + +``uuid`` (default: *autogenerated*) + Set the UUID of the namespace. This will be reported as a "Namespace UUID" + descriptor in the Namespace Identification Descriptor List. + +``bus`` + If there are more ``nvme`` devices defined, this parameter may be used to + attach the namespace to a specific ``nvme`` device (identified by an ``id`` + parameter on the controller device). + +NVM Subsystems +-- + +Additional features become available if the controller device (``nvme``) is +linked to an NVM Subsystem device (``nvme-subsys``). +
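As a concrete (hypothetical) example of the subsystem linkage the section above introduces, an invocation might look like the sketch below. The exact parameter names (``nqn`` on ``nvme-subsys``, ``subsys`` on ``nvme``) are assumptions here and should be checked against ``-device nvme-subsys,help`` and ``-device nvme,help`` on the QEMU build at hand:

```shell
# Sketch: one NVM subsystem, one controller linked to it, one namespace.
qemu-system-x86_64 \
    -device nvme-subsys,id=nvme-subsys-0,nqn=subsys0 \
    -device nvme,serial=deadbeef,subsys=nvme-subsys-0 \
    -drive file=nvm-1.img,if=none,id=nvm-1 \
    -device nvme-ns,drive=nvm-1
```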
Re: [PATCH] docs: add nvme emulation documentation
On Apr 8 12:17, Philippe Mathieu-Daudé wrote: On 4/8/21 11:50 AM, Klaus Jensen wrote: From: Klaus Jensen Remove the docs/specs/nvme.txt and replace it with proper documentation in docs/system/nvme.rst. Signed-off-by: Klaus Jensen --- docs/specs/nvme.txt | 23 - docs/system/index.rst | 1 + docs/system/nvme.rst | 212 ++ MAINTAINERS | 2 +- 4 files changed, 214 insertions(+), 24 deletions(-) delete mode 100644 docs/specs/nvme.txt create mode 100644 docs/system/nvme.rst +Simple Copy +--- + +The device includes support for TP 4065 ("Simple Copy Command"). Copy command +limits may be set with the ``mssrl``, ``mcl`` and ``msrc=UINT8`` ``nvme-ns`` +device parameters. All parameters are described except mssrl/mcl/msrc. Any particular reason? Not really, missed them. I'll add them :) Otherwise: Reviewed-by: Philippe Mathieu-Daudé Thanks!
Re: [RFC PATCH v2 00/11] qemu_iotests: improve debugging options
On 08/04/2021 10:26, Markus Armbruster wrote: Emanuele Giuseppe Esposito writes: This series adds the option to attach gdbserver and valgrind to the QEMU binary running in qemu_iotests. It also allows to redirect QEMU binaries output of the python tests to the stdout, instead of a log file. Patches 1-6 introduce the -gdb option to both python and bash tests, 7-10 extend the already existing -valgrind flag to work also on python tests, and patch 11 introduces -p to enable logging to stdout. In particular, patches 1,2,4,8 focus on extending the QMP socket timers when using gdb/valgrind, otherwise the python tests will fail due to delays in the QMP responses. This series is tested on the previous series "qemu-iotests: quality of life improvements" but independent from it, so it can be applied separately. Signed-off-by: Emanuele Giuseppe Esposito How discoverable are these goodies for developers with only superficial knowledge of iotests? Not really sure what you mean, but ./check --help now shows: -p enable prints -gdb start gdbserver with $GDB_QEMU options. Default is localhost:12345 Which I guess should be clear enough? Btw two-three weeks ago I didn't know anything about these tests either. I agree I can make -p more clear, saying "enable qemu binary prints to stdout", and move -valgrind to the "optional arguments" instead of "bash-only" Emanuele
Re: [PATCH-for-6.0?] hw/block/fdc: Fix 'fallback' property on sysbus floppy disk controllers
Am 07.04.2021 um 15:37 hat Philippe Mathieu-Daudé geschrieben: > Setting the 'fallback' property corrupts the QOM instance state > (FDCtrlSysBus) because it accesses an incorrect offset (it uses > the offset of the FDCtrlISABus state). > > Fixes: a73275dd6fc ("fdc: Add fallback option") > Signed-off-by: Philippe Mathieu-Daudé Thanks, applied to the block branch. Kevin
Re: [PATCH 06/14] block/nbd: further segregation of connect-thread
On Wed, Apr 07, 2021 at 01:46:29PM +0300, Vladimir Sementsov-Ogievskiy wrote: > Add personal state NBDConnectThread for connect-thread and > nbd_connect_thread_start() function. Next step would be moving > connect-thread to a separate file. > > Note that we stop publishing thr->sioc during > qio_channel_socket_connect_sync(). Which probably means that we can't > early-cancel qio_channel_socket_connect_sync() call in > nbd_free_connect_thread(). Still, when thread is detached it doesn't > make big sense, and we have no guarantee anyway. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > block/nbd.c | 57 - > 1 file changed, 39 insertions(+), 18 deletions(-) > > diff --git a/block/nbd.c b/block/nbd.c > index e16c6d636a..23eb8adab1 100644 > --- a/block/nbd.c > +++ b/block/nbd.c > @@ -85,8 +85,6 @@ typedef enum NBDConnectThreadState { > } NBDConnectThreadState; > > typedef struct NBDConnectCB { > -/* Initialization constants */ > -SocketAddress *saddr; /* address to connect to */ > /* > * Bottom half to schedule on completion. 
Scheduled only if bh_ctx is not > * NULL > @@ -103,6 +101,15 @@ typedef struct NBDConnectCB { > AioContext *bh_ctx; /* where to schedule bh (NULL means don't schedule) > */ > } NBDConnectCB; > > +typedef void (*NBDConnectThreadCallback)(QIOChannelSocket *sioc, int ret, > + void *opaque); > + > +typedef struct NBDConnectThread { > +SocketAddress *saddr; /* address to connect to */ > +NBDConnectThreadCallback cb; > +void *cb_opaque; > +} NBDConnectThread; > + > typedef struct BDRVNBDState { > QIOChannelSocket *sioc; /* The master data channel */ > QIOChannel *ioc; /* The current I/O channel which may differ (eg TLS) */ > @@ -367,7 +374,6 @@ static void nbd_init_connect_thread(BDRVNBDState *s) > s->connect_thread = g_new(NBDConnectCB, 1); > > *s->connect_thread = (NBDConnectCB) { > -.saddr = QAPI_CLONE(SocketAddress, s->saddr), > .state = CONNECT_THREAD_NONE, > .bh_func = connect_bh, > .bh_opaque = s, > @@ -378,20 +384,18 @@ static void nbd_init_connect_thread(BDRVNBDState *s) > > static void nbd_free_connect_thread(NBDConnectCB *thr) > { > -if (thr->sioc) { > -qio_channel_close(QIO_CHANNEL(thr->sioc), NULL); > -} Doesn't this result in an open channel leak? (The original code here seems to be missing an unref, too.) > -qapi_free_SocketAddress(thr->saddr); > g_free(thr); > } > > -static void connect_thread_cb(int ret, void *opaque) > +static void connect_thread_cb(QIOChannelSocket *sioc, int ret, void *opaque) > { > NBDConnectCB *thr = opaque; > bool do_free = false; > > qemu_mutex_lock(&thr->mutex); > > +thr->sioc = sioc; > + > switch (thr->state) {
CONNECT_THREAD_FAIL : CONNECT_THREAD_SUCCESS; > @@ -418,27 +422,45 @@ static void connect_thread_cb(int ret, void *opaque) > > static void *connect_thread_func(void *opaque) > { > -NBDConnectCB *thr = opaque; > +NBDConnectThread *thr = opaque; > int ret; > +QIOChannelSocket *sioc = qio_channel_socket_new(); > > -thr->sioc = qio_channel_socket_new(); > - > -ret = qio_channel_socket_connect_sync(thr->sioc, thr->saddr, NULL); > +ret = qio_channel_socket_connect_sync(sioc, thr->saddr, NULL); > if (ret < 0) { > -object_unref(OBJECT(thr->sioc)); > -thr->sioc = NULL; > +object_unref(OBJECT(sioc)); > +sioc = NULL; > } > > -connect_thread_cb(ret, thr); > +thr->cb(sioc, ret, thr->cb_opaque); > + > +qapi_free_SocketAddress(thr->saddr); > +g_free(thr); > > return NULL; > } > > +static void nbd_connect_thread_start(const SocketAddress *saddr, > + NBDConnectThreadCallback cb, > + void *cb_opaque) > +{ > +QemuThread thread; > +NBDConnectThread *thr = g_new(NBDConnectThread, 1); > + > +*thr = (NBDConnectThread) { > +.saddr = QAPI_CLONE(SocketAddress, saddr), > +.cb = cb, > +.cb_opaque = cb_opaque, > +}; > + > +qemu_thread_create(&thread, "nbd-connect", > + connect_thread_func, thr, QEMU_THREAD_DETACHED); > +} > + > static int coroutine_fn > nbd_co_establish_connection(BlockDriverState *bs) > { > int ret; > -QemuThread thread; > BDRVNBDState *s = bs->opaque; > NBDConnectCB *thr = s->connect_thread; > > @@ -448,8 +470,7 @@ nbd_co_establish_connection(BlockDriverState *bs) > case CONNECT_THREAD_FAIL: > case CONNECT_THREAD_NONE: > thr->state = CONNECT_THREAD_RUNNING; > -qemu_thread_create(&thread, "nbd-connect", > - connect_thread_func, thr, QEMU_THREAD_DETACHED); > +nbd_connect_thread_start(s->saddr, connect_thread_cb,
Re: [PULL for-6.0 v3 00/10] emulated nvme fixes for -rc3
On Wed, 7 Apr 2021 at 17:53, Klaus Jensen wrote: > > From: Klaus Jensen > > Hi Peter, > > Gitlab CI finally got back in business :) > > Note: only cover letter sent for this v3. > > > The following changes since commit d0d3dd401b70168a353450e031727affee828527: > > Update version for v6.0.0-rc2 release (2021-04-06 18:34:34 +0100) > > are available in the Git repository at: > > git://git.infradead.org/qemu-nvme.git tags/nvme-fixes-20210407-pull-request > > for you to fetch changes up to 7645f21f409b67eb9aad9feef6283c2e186e3703: > > hw/block/nvme: fix out-of-bounds read in nvme_subsys_ctrl (2021-04-07 > 10:48:33 +0200) > > > emulated nvme fixes for -rc3 > > v3: > - removed unnecessary deprecation warning > > v2: > - added missing patches > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/6.0 for any user-visible changes. -- PMM
Re: [PATCH] docs: add nvme emulation documentation
On 4/8/21 11:50 AM, Klaus Jensen wrote: > From: Klaus Jensen > > Remove the docs/specs/nvme.txt and replace it with proper documentation > in docs/system/nvme.rst. > > Signed-off-by: Klaus Jensen > --- > docs/specs/nvme.txt | 23 - > docs/system/index.rst | 1 + > docs/system/nvme.rst | 212 ++ > MAINTAINERS | 2 +- > 4 files changed, 214 insertions(+), 24 deletions(-) > delete mode 100644 docs/specs/nvme.txt > create mode 100644 docs/system/nvme.rst > +Simple Copy > +--- > + > +The device includes support for TP 4065 ("Simple Copy Command"). Copy command > +limits may be set with the ``mssrl``, ``mcl`` and ``msrc=UINT8`` ``nvme-ns`` > +device parameters. All parameters are described except mssrl/mcl/msrc. Any particular reason? Otherwise: Reviewed-by: Philippe Mathieu-Daudé
DROP THIS Re: [PATCH 00/14] nbd: move reconnect-thread to separate file
07.04.2021 13:46, Vladimir Sementsov-Ogievskiy wrote: Hi all! There are problems with the nbd driver: - nbd reconnect is cancelled on drain, which is bad, as Roman describes in his "[PATCH 0/7] block/nbd: decouple reconnect from drain" - the nbd driver is too complicated around drained sections and aio context switches. It's nearly impossible to follow all the logic, including the abuse of bs->in_flight, which is temporarily decreased in some places (like nbd_read_eof()). The additional reconnect thread and two different state machines (we have BDRVNBDState::state and BDRVNBDState::connect_thread->state) don't make things simpler :) So, I have a plan: 1. Move nbd negotiation to connect_thread 2. Receive NBD replies in request coroutines, not in connection_co At this point we can drop connection_co, and when we don't have an endlessly running coroutine, the NBD driver becomes a usual block driver, and we can drop the abuse of bs->in_flight, and probably drop most of the complicated logic around drained sections and aio context switches in the nbd driver. 3. Still, as Roman describes, with [2] we lose the possibility to reconnect immediately when the connection breaks (with the current code we have an endless read in reconnect_co, but for this to actually work keep-alive should be set up correctly). So, we'll need to reinvent it, checking the connection periodically by timeout, with the help of getsockopt, or just sending a kind of PING request (a zero-length READ or something like this). And this series is a kind of preparation. The main point of it is moving the connect-thread to a separate file. Finally, I don't like it. I'm in the process of creating a new series to substitute this one, so don't waste your time on review. -- Best regards, Vladimir
Re: [PATCH-for-6.0?] hw/block/fdc: Fix 'fallback' property on sysbus floppy disk controllers
On 4/8/21 11:38 AM, Markus Armbruster wrote: > Philippe Mathieu-Daudé writes: > >> Setting the 'fallback' property corrupts the QOM instance state >> (FDCtrlSysBus) because it accesses an incorrect offset (it uses >> the offset of the FDCtrlISABus state). >> >> Fixes: a73275dd6fc ("fdc: Add fallback option") >> Signed-off-by: Philippe Mathieu-Daudé >> --- >> hw/block/fdc.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/hw/block/fdc.c b/hw/block/fdc.c >> index 82afda7f3a7..a825c2acbae 100644 >> --- a/hw/block/fdc.c >> +++ b/hw/block/fdc.c >> @@ -2893,7 +2893,7 @@ static Property sysbus_fdc_properties[] = { >> DEFINE_PROP_SIGNED("fdtypeB", FDCtrlSysBus, >> state.qdev_for_drives[1].type, >> FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type, >> FloppyDriveType), >> -DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback, >> +DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback, >> FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type, >> FloppyDriveType), >> DEFINE_PROP_END_OF_LIST(), >> @@ -2918,7 +2918,7 @@ static Property sun4m_fdc_properties[] = { >> DEFINE_PROP_SIGNED("fdtype", FDCtrlSysBus, >> state.qdev_for_drives[0].type, >> FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type, >> FloppyDriveType), >> -DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback, >> +DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback, >> FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type, >> FloppyDriveType), >> DEFINE_PROP_END_OF_LIST(), > > Reviewed-by: Markus Armbruster > > On whether to pick this into 6.0... > > The patch has no effect unless someone or something uses "fallback" with > a non-ISA FDC. There it fixes a bug. The bug's exact impact is > unknown. I figure I could find out, but it doesn't seem to be worth the > bother. non-ISA FDC is only used on MIPS/SPARC. > Commit a73275dd6fc: > > Currently, QEMU chooses a drive type automatically based on the inserted > media. If there is no disk inserted, it chooses a 1.44MB drive type. 
> > Change this behavior to be configurable, but leave it defaulted to 1.44. > > This is not earnestly intended to be used by a user or a management > library, but rather exists so that pre-2.6 board types can configure it > to be a legacy value. > > We do so only for "isa-fdc", in hw/core/machine.c. > > I don't understand why we don't for the other devices, but that's > outside this patch's scope. > > Downstreams could do it, but it wouldn't work. They need this commit to > make it work. > > Users (human or management application) should not use it, but of course > they might anyway. This commit makes such (unadvisable) usage safe. > > The reward is low, but so is the risk. If I was the maintainer, I'd be > tempted to take it up to rc3. Thanks for the impact analysis. The fix seems harmless to me, but I'm fine having it fixed in 6.1 (this is an old bug, so not critical to have it fixed for 6.0). Phil.
Re: [PATCH] docs: add nvme emulation documentation
On Thu, 8 Apr 2021 at 10:54, Klaus Jensen wrote: > > Hi Peter, > > Are documentation updates acceptable for -rc3? Yes; they're safe changes, generally. thanks -- PMM
Re: [PATCH] docs: add nvme emulation documentation
Hi Peter, Are documentation updates acceptable for -rc3? On Apr 8 11:50, Klaus Jensen wrote: From: Klaus Jensen Remove the docs/specs/nvme.txt and replace it with proper documentation in docs/system/nvme.rst. Signed-off-by: Klaus Jensen --- docs/specs/nvme.txt | 23 - docs/system/index.rst | 1 + docs/system/nvme.rst | 212 ++ MAINTAINERS | 2 +- 4 files changed, 214 insertions(+), 24 deletions(-) delete mode 100644 docs/specs/nvme.txt create mode 100644 docs/system/nvme.rst diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt deleted file mode 100644 index 56d393884e7a.. --- a/docs/specs/nvme.txt +++ /dev/null @@ -1,23 +0,0 @@ -NVM Express Controller -== - -The nvme device (-device nvme) emulates an NVM Express Controller. - - -Reference Specifications - - -The device currently implements most mandatory features of NVMe v1.3d, see - - https://nvmexpress.org/resources/specifications/ - -for the specification. - - -Known issues - - -* The accounting numbers in the SMART/Health are reset across power cycles - -* Interrupt Coalescing is not supported and is disabled by default in volation - of the specification. diff --git a/docs/system/index.rst b/docs/system/index.rst index 02d07071810f..b05af716a973 100644 --- a/docs/system/index.rst +++ b/docs/system/index.rst @@ -23,6 +23,7 @@ Contents: net virtio-net-failover usb + nvme ivshmem linuxboot generic-loader diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst new file mode 100644 index ..ab760d9888f9 --- /dev/null +++ b/docs/system/nvme.rst @@ -0,0 +1,212 @@ +== +NVMe Emulation +== + +QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and +``nvme-subsys`` devices. + +See the following sections for specific information on + + * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_. 
+ * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_, +`Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data +Protection`_, + +Adding NVMe Devices +=== + +Controller Emulation + + +The QEMU emulated NVMe controller implements version 1.4 of the NVM Express +specification. All mandatory features are implement with a couple of exceptions +and limitations: + + * Accounting numbers in the SMART/Health log page are reset when the device +is power cycled. + * Interrupt Coalescing is not supported and is disabled by default. + +The simplest way to attach an NVMe controller on the QEMU PCI bus is to add the +following parameters: + +.. code-block:: console + +-drive file=nvm.img,if=none,id=nvm +-device nvme,serial=deadbeef,drive=nvm + +There are a number of optional general parameters for the ``nvme`` device. Some +are mentioned here, but see ``-device nvme,help`` to list all possible +parameters. + +``max_ioqpairs=UINT32`` (default: ``64``) + Set the maximum number of allowed I/O queue pairs. This replaces the + deprecated ``num_queues`` parameter. + +``msix_qsize=UINT16`` (default: ``65``) + The number of MSI-X vectors that the device should support. + +``mdts=UINT8`` (default: ``7``) + Set the Maximum Data Transfer Size of the device. + +``use-intel-id`` (default: ``off``) + Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and + Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID + previously used. + +Additional Namespaces +- + +In the simplest possible invocation sketched above, the device only support a +single namespace with the namespace identifier ``1``. To support multiple +namespaces and additional features, the ``nvme-ns`` device must be used. + +.. 
code-block:: console + + -device nvme,id=nvme-ctrl-0,serial=deadbeef + -drive file=nvm-1.img,if=none,id=nvm-1 + -device nvme-ns,drive=nvm-1 + -drive file=nvm-2.img,if=none,id=nvm-2 + -device nvme-ns,drive=nvm-2 + +The namespaces defined by the ``nvme-ns`` device will attach to the most +recently defined ``nvme-bus`` that is created by the ``nvme`` device. Namespace +identifers are allocated automatically, starting from ``1``. + +There are a number of parameters available: + +``nsid`` (default: ``0``) + Explicitly set the namespace identifier. + +``uuid`` (default: *autogenerated*) + Set the UUID of the namespace. This will be reported as a "Namespace UUID" + descriptor in the Namespace Identification Descriptor List. + +``bus`` + If there are more ``nvme`` devices defined, this parameter may be used to + attach the namespace to a specific ``nvme`` device (identified by an ``id`` + parameter on the controller device). + +NVM Subsystems +-- + +Additional features becomes available if the controller device (``nvme``) is +linked to an NVM Subsystem device (``nvme-subsys``). + +The NVM Subsystem emulation allows features
[PATCH] docs: add nvme emulation documentation
From: Klaus Jensen

Remove docs/specs/nvme.txt and replace it with proper documentation in
docs/system/nvme.rst.

Signed-off-by: Klaus Jensen
---
 docs/specs/nvme.txt   |  23 ------------------
 docs/system/index.rst |   1 +
 docs/system/nvme.rst  | 212 ++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS           |   2 +-
 4 files changed, 214 insertions(+), 24 deletions(-)
 delete mode 100644 docs/specs/nvme.txt
 create mode 100644 docs/system/nvme.rst

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
deleted file mode 100644
index 56d393884e7a..
--- a/docs/specs/nvme.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-NVM Express Controller
-======================
-
-The nvme device (-device nvme) emulates an NVM Express Controller.
-
-
-Reference Specifications
-------------------------
-
-The device currently implements most mandatory features of NVMe v1.3d, see
-
-  https://nvmexpress.org/resources/specifications/
-
-for the specification.
-
-
-Known issues
-------------
-
-* The accounting numbers in the SMART/Health are reset across power cycles
-
-* Interrupt Coalescing is not supported and is disabled by default in
-  violation of the specification.
diff --git a/docs/system/index.rst b/docs/system/index.rst
index 02d07071810f..b05af716a973 100644
--- a/docs/system/index.rst
+++ b/docs/system/index.rst
@@ -23,6 +23,7 @@ Contents:
    net
    virtio-net-failover
    usb
+   nvme
    ivshmem
    linuxboot
    generic-loader
diff --git a/docs/system/nvme.rst b/docs/system/nvme.rst
new file mode 100644
index ..ab760d9888f9
--- /dev/null
+++ b/docs/system/nvme.rst
@@ -0,0 +1,212 @@
+==============
+NVMe Emulation
+==============
+
+QEMU provides NVMe emulation through the ``nvme``, ``nvme-ns`` and
+``nvme-subsys`` devices.
+
+See the following sections for specific information on
+
+  * `Adding NVMe Devices`_, `additional namespaces`_ and `NVM subsystems`_.
+  * Configuration of `Optional Features`_ such as `Controller Memory Buffer`_,
+    `Simple Copy`_, `Zoned Namespaces`_, `metadata`_ and `End-to-End Data
+    Protection`_.
+
+Adding NVMe Devices
+===================
+
+Controller Emulation
+--------------------
+
+The QEMU emulated NVMe controller implements version 1.4 of the NVM Express
+specification. All mandatory features are implemented, with a couple of
+exceptions and limitations:
+
+  * Accounting numbers in the SMART/Health log page are reset when the device
+    is power cycled.
+  * Interrupt Coalescing is not supported and is disabled by default.
+
+The simplest way to attach an NVMe controller on the QEMU PCI bus is to add
+the following parameters:
+
+.. code-block:: console
+
+    -drive file=nvm.img,if=none,id=nvm
+    -device nvme,serial=deadbeef,drive=nvm
+
+There are a number of optional general parameters for the ``nvme`` device.
+Some are mentioned here, but see ``-device nvme,help`` to list all possible
+parameters.
+
+``max_ioqpairs=UINT32`` (default: ``64``)
+  Set the maximum number of allowed I/O queue pairs. This replaces the
+  deprecated ``num_queues`` parameter.
+
+``msix_qsize=UINT16`` (default: ``65``)
+  The number of MSI-X vectors that the device should support.
+
+``mdts=UINT8`` (default: ``7``)
+  Set the Maximum Data Transfer Size of the device.
+
+``use-intel-id`` (default: ``off``)
+  Since QEMU 5.2, the device uses a QEMU allocated "Red Hat" PCI Device and
+  Vendor ID. Set this to ``on`` to revert to the unallocated Intel ID
+  previously used.
+
+Additional Namespaces
+---------------------
+
+In the simplest possible invocation sketched above, the device only supports
+a single namespace with the namespace identifier ``1``. To support multiple
+namespaces and additional features, the ``nvme-ns`` device must be used.
+
+.. code-block:: console
+
+    -device nvme,id=nvme-ctrl-0,serial=deadbeef
+    -drive file=nvm-1.img,if=none,id=nvm-1
+    -device nvme-ns,drive=nvm-1
+    -drive file=nvm-2.img,if=none,id=nvm-2
+    -device nvme-ns,drive=nvm-2
+
+The namespaces defined by the ``nvme-ns`` device will attach to the most
+recently defined ``nvme-bus`` that is created by the ``nvme`` device.
+Namespace identifiers are allocated automatically, starting from ``1``.
+
+There are a number of parameters available:
+
+``nsid`` (default: ``0``)
+  Explicitly set the namespace identifier.
+
+``uuid`` (default: *autogenerated*)
+  Set the UUID of the namespace. This will be reported as a "Namespace UUID"
+  descriptor in the Namespace Identification Descriptor List.
+
+``bus``
+  If there are more ``nvme`` devices defined, this parameter may be used to
+  attach the namespace to a specific ``nvme`` device (identified by an ``id``
+  parameter on the controller device).
+
+NVM Subsystems
+--------------
+
+Additional features become available if the controller device (``nvme``) is
+linked to an NVM Subsystem device (``nvme-subsys``).
+
+The NVM Subsystem emulation allows features such as shared namespaces and
+multipath I/O.
+
+.. code-block:: console
+
+    -device
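Linking two controllers to the same subsystem sketches as follows; the ``id``, ``nqn`` and ``subsys`` parameter values here are illustrative assumptions, not taken verbatim from the patch:

```console
-device nvme-subsys,id=nvme-subsys-0,nqn=subsys0
-device nvme,serial=deadbeef,subsys=nvme-subsys-0
-device nvme,serial=deadbeef,subsys=nvme-subsys-0
```

With both controllers attached to the same subsystem, namespaces can be shared between them, which is what enables multipath I/O in the guest.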
Re: [PATCH-for-6.0?] hw/block/fdc: Fix 'fallback' property on sysbus floppy disk controllers
Philippe Mathieu-Daudé writes:

> Setting the 'fallback' property corrupts the QOM instance state
> (FDCtrlSysBus) because it accesses an incorrect offset (it uses
> the offset of the FDCtrlISABus state).
>
> Fixes: a73275dd6fc ("fdc: Add fallback option")
> Signed-off-by: Philippe Mathieu-Daudé
> ---
>  hw/block/fdc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/block/fdc.c b/hw/block/fdc.c
> index 82afda7f3a7..a825c2acbae 100644
> --- a/hw/block/fdc.c
> +++ b/hw/block/fdc.c
> @@ -2893,7 +2893,7 @@ static Property sysbus_fdc_properties[] = {
>      DEFINE_PROP_SIGNED("fdtypeB", FDCtrlSysBus,
>                         state.qdev_for_drives[1].type,
>                         FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type,
>                         FloppyDriveType),
> -    DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback,
> +    DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
>                        FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
>                        FloppyDriveType),
>      DEFINE_PROP_END_OF_LIST(),
> @@ -2918,7 +2918,7 @@ static Property sun4m_fdc_properties[] = {
>      DEFINE_PROP_SIGNED("fdtype", FDCtrlSysBus, state.qdev_for_drives[0].type,
>                         FLOPPY_DRIVE_TYPE_AUTO, qdev_prop_fdc_drive_type,
>                         FloppyDriveType),
> -    DEFINE_PROP_SIGNED("fallback", FDCtrlISABus, state.fallback,
> +    DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
>                        FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
>                        FloppyDriveType),
>      DEFINE_PROP_END_OF_LIST(),

Reviewed-by: Markus Armbruster

On whether to pick this into 6.0...

The patch has no effect unless someone or something uses "fallback" with
a non-ISA FDC. There it fixes a bug. The bug's exact impact is unknown.
I figure I could find out, but it doesn't seem to be worth the bother.

Commit a73275dd6fc:

    Currently, QEMU chooses a drive type automatically based on the
    inserted media. If there is no disk inserted, it chooses a 1.44MB
    drive type.

    Change this behavior to be configurable, but leave it defaulted to
    1.44.

    This is not earnestly intended to be used by a user or a management
    library, but rather exists so that pre-2.6 board types can configure
    it to be a legacy value.

We do so only for "isa-fdc", in hw/core/machine.c. I don't understand
why we don't for the other devices, but that's outside this patch's
scope. Downstreams could do it, but it wouldn't work. They need this
commit to make it work.

Users (human or management application) should not use it, but of
course they might anyway. This commit makes such (unadvisable) usage
safe.

The reward is low, but so is the risk. If I were the maintainer, I'd be
tempted to take it up to rc3.
Re: [RFC PATCH v2 00/11] qemu_iotests: improve debugging options
Emanuele Giuseppe Esposito writes:

> This series adds the option to attach gdbserver and valgrind
> to the QEMU binary running in qemu_iotests.
> It also allows redirecting the output of the QEMU binaries in the
> python tests to stdout, instead of a log file.
>
> Patches 1-6 introduce the -gdb option to both python and bash tests,
> 7-10 extend the already existing -valgrind flag to work also on
> python tests, and patch 11 introduces -p to enable logging to stdout.
>
> In particular, patches 1,2,4,8 focus on extending the QMP socket timers
> when using gdb/valgrind, otherwise the python tests will fail due to
> delays in the QMP responses.
>
> This series is tested on the previous series
> "qemu-iotests: quality of life improvements"
> but is independent from it, so it can be applied separately.
>
> Signed-off-by: Emanuele Giuseppe Esposito

How discoverable are these goodies for developers with only superficial
knowledge of iotests?
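The timer extension in patches 1,2,4,8 can be sketched as a small policy function: when the QEMU binary runs under a debugger, QMP replies may take arbitrarily long, so the socket timeout must be disabled or scaled. The function name and scaling factors below are hypothetical, not the series' actual code:

```python
from typing import Optional, Sequence


def qmp_timeout(base: float, wrapper: Sequence[str]) -> Optional[float]:
    """Pick a QMP socket timeout for a QEMU run under `wrapper`.

    Hypothetical sketch: under gdb the test may be paused at a
    breakpoint indefinitely, so return None (no timeout); under
    valgrind execution is slow but bounded, so scale the base value.
    """
    if any("gdb" in w for w in wrapper):
        return None           # interactive debugger: never time out
    if any("valgrind" in w for w in wrapper):
        return base * 10      # slow but bounded: scale up
    return base               # normal run: keep the default
```

A caller would thread the chosen value into the QMP socket's `settimeout()`; the point is that the policy depends only on the wrapper command line, which is why the wrapper field has to be plumbed through QEMUQtestMachine first.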
Re: [PATCH 0/2] block/rbd: fix memory leaks
On Wed, Apr 07, 2021 at 11:38:17AM +0200, Markus Armbruster wrote:
> Max Reitz writes:
>
> > On 29.03.21 17:01, Stefano Garzarella wrote:
> > > This series fixes two memory leaks, found through valgrind, in the
> > > rbd driver.
> > >
> > > Stefano Garzarella (2):
> > >   block/rbd: fix memory leak in qemu_rbd_connect()
> > >   block/rbd: fix memory leak in qemu_rbd_co_create_opts()
> > >
> > >  block/rbd.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > Reviewed-by: Max Reitz
> >
> > I’m not quite sure whether this is fit for 6.0... I think it’s too
> > late for rc2, so I don’t know.
>
> This is the maintainers' call to make.
>
> * PATCH 1:
>   CON: Old bug, probably 2.9, i.e. four years
>   PRO: The fix is straightforward
>
> * PATCH 2:
>   NEUTRAL: Not recent from upstream's point of view (5.0), but
>   downstreams may have different ideas
>   PRO: The fix is trivial
>
> I encourage you to take at least PATCH 2.

Kevin queued them up, thank you both for the review,
Stefano
Re: [PATCH 1/2] block/rbd: fix memory leak in qemu_rbd_connect()
On Tue, Apr 06, 2021 at 10:22:30AM +0200, Markus Armbruster wrote:
> Stefano Garzarella writes:
>
> > In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host()
> > using g_strjoinv(), but it's only freed in the error path, leaking
> > memory in the success path as reported by valgrind:
> >
> >   80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516
> >      at 0x4839809: malloc (vg_replace_malloc.c:307)
> >      by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
> >      by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8)
> >      by 0x87D07E: qemu_rbd_mon_host (rbd.c:538)
> >      by 0x87D07E: qemu_rbd_connect (rbd.c:562)
> >      by 0x87E1CE: qemu_rbd_open (rbd.c:740)
> >      by 0x840EB1: bdrv_open_driver (block.c:1528)
> >      by 0x8453A9: bdrv_open_common (block.c:1802)
> >      by 0x8453A9: bdrv_open_inherit (block.c:3444)
> >      by 0x8464C2: bdrv_open (block.c:3537)
> >      by 0x8108CD: qmp_blockdev_add (blockdev.c:3569)
> >      by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086)
> >      by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131)
> >      by 0x907EA4: aio_bh_poll (async.c:164)
> >
> > Fix by freeing 'mon_host' also when qemu_rbd_connect() ends correctly.
> >
> > Signed-off-by: Stefano Garzarella
>
> I believe this
> Fixes: 0a55679b4a5061f4d74bdb1a0e81611ba3390b00

Yep :-)

> Reviewed-by: Markus Armbruster

Thanks,
Stefano
Re: [PATCH 0/6] Add debug interface to kick/call on purpose
On 2021/4/8 1:51 PM, Dongli Zhang wrote:
> On 4/6/21 7:20 PM, Jason Wang wrote:
> > On 2021/4/7 7:27 AM, Dongli Zhang wrote:
> > > This will answer your question that "Can it bypass the masking?".
> > >
> > > For vhost-scsi, virtio-blk, virtio-scsi and virtio-net, writing to
> > > the eventfd is not able to bypass masking because masking
> > > unregisters the eventfd; writing to it does not take effect.
> > >
> > > However, it is possible to bypass masking for vhost-net because
> > > vhost-net registers a specific masked_notifier eventfd in order to
> > > mask the irq. Writing to the original eventfd still takes effect.
> > >
> > > We may leave it to the user to decide whether to write to
> > > 'masked_notifier' or the original 'guest_notifier' for vhost-net.
>
> My fault here. Writing to masked_notifier will always be masked :(

Only when there's no bug in the qemu.

> If it is at the EventNotifier level, we will not care whether the
> EventNotifier is masked or not. It just provides an interface to write
> to the EventNotifier.

Yes.

> > > Dumping the MSI-X table for both virtio and vfio will help confirm
> > > if the vector is masked.
> >
> > That would be helpful as well. It's probably better to extend the
> > "info pci" command.
>
> I will try whether to add this to "info pci" (introducing a new arg
> option for "info pci"), or to introduce a new command.

It's better to just reuse "info pci".

> About the EventNotifier, I will classify them as guest notifier or
> host notifier so that it will be much easier for the user to tell if
> the eventfd is for injecting an IRQ or kicking the doorbell.

Sounds good.

> Thank you very much for all suggestions!
>
> Dongli Zhang

Thanks