Re: [PATCH v3 3/3] qapi: deprecate drive-backup

2021-11-04 Thread Eric Blake
On Wed, Nov 03, 2021 at 02:29:12PM +0100, Vladimir Sementsov-Ogievskiy wrote:
> The modern way is to use blockdev-add + blockdev-backup, which provides a
> lot more control over how the target is opened.
> 

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Kashyap Chamarthy 
> ---
>  docs/about/deprecated.rst  | 11 ++
>  docs/interop/live-block-operations.rst | 47 +-
>  qapi/block-core.json   |  5 ++-
>  3 files changed, 46 insertions(+), 17 deletions(-)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 25b7ec8d92..4a4910143f 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -234,6 +234,17 @@ single ``bitmap``, the new ``block-export-add`` uses a list of ``bitmaps``.
>  Member ``values`` in return value elements with meta-type ``enum`` is
>  deprecated.  Use ``members`` instead.
>  
> +``drive-backup`` (since 6.2)
> +''''''''''''''''''''''''''''

Correct here,...

> +++ b/docs/interop/live-block-operations.rst

>  
>  QMP invocation for ``drive-backup``
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> +Note that ``drive-backup`` command is deprecated since QEMU 6.1 and
> +will be removed in future.

...but stale here (this patch has been around for a while).

But even though we're in soft freeze, I lean toward this going in
rather than wait yet another release to do the deprecation.
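
For illustration, a hedged sketch of the replacement invocation the commit message refers to, written as Python dicts holding the QMP command payloads (the node names ``target0``/``drive0`` and the qcow2 target file are made-up examples, not from the patch):

```python
# Deprecated form: drive-backup opened the target itself.
# Modern form: attach the target explicitly with blockdev-add, then
# start the backup with blockdev-backup referencing that node.
blockdev_add = {
    "execute": "blockdev-add",
    "arguments": {
        "driver": "qcow2",
        "node-name": "target0",  # hypothetical node name
        "file": {"driver": "file", "filename": "target.qcow2"},
    },
}

blockdev_backup = {
    "execute": "blockdev-backup",
    "arguments": {
        "device": "drive0",      # hypothetical source device/node
        "target": "target0",     # must match the node-name added above
        "sync": "full",
    },
}
```

The extra control comes from blockdev-add: the caller chooses the target's driver, options, and node name instead of letting drive-backup open it behind the scenes.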

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers

2021-11-04 Thread Łukasz Gieryk
On Wed, Nov 03, 2021 at 01:07:31PM +0100, Klaus Jensen wrote:
> On Oct  7 18:24, Lukasz Maniak wrote:
> > From: Łukasz Gieryk 
> > 
> > With two new properties (sriov_max_vi_per_vf, sriov_max_vq_per_vf) one
> > can configure the maximum number of virtual queues and interrupts
> > assignable to a single virtual device. The primary and secondary
> > controller capability structures are initialized accordingly.
> > 
> > Since the number of available queues (interrupts) now varies between
> > VF/PF, BAR size calculation is also adjusted.
> > 
> 
> While this patch allows configuring the VQFRSM and VIFRSM fields, it
> implicitly sets VQFRT and VIFRT (i.e. by setting them to the product of
> sriov_max_vi_per_vf and max_vfs). Which is just setting it to an upper
> bound and this removes a testable case for host software (e.g.
> requesting more flexible resources than what is currently available).
> 
> This patch also requires that these parameters are set if sriov_max_vfs
> is. I think we can provide better defaults.
> 

Originally I considered more params, but ended up coding the simplest,
user-friendly solution, because I did not like the mess with so many
parameters, and the flexibility wasn't needed for my use cases. But I do
agree: others may need the flexibility. The case (FRT < max_vfs * FRSM) is
valid and resembles an actual device.

> How about,
> 
> 1. if only sriov_max_vfs is set, then all VFs get private resources
>equal to max_ioqpairs. Like before this patch. This limits the number
>of parameters required to get a basic setup going.
> 
> 2. if sriov_v{q,i}_private is set (I suggested this parameter in patch
>10), the difference between that and max_ioqpairs become flexible
>resources. Also, I'd be just fine with having sriov_v{q,i}_flexible
>instead and just make the difference become private resources.
>Potato/potato.
> 
>a. in the absence of sriov_max_v{q,i}_per_vf, set them to the number
>   of calculated flexible resources.
> 
> This probably smells a bit like bikeshedding, but I think this gives
> more flexibility and better defaults, which helps with verifying host
> software.
> 
> If we can't agree on this now, I suggest we could go ahead and merge the
> base functionality (i.e. private resources only) and ruminate some more
> about these parameters.

The problem is that the spec allows VFs to support either only private,
or only flexible resources.

At this point I have to admit that since my use cases for
QEMU/Nvme/SRIOV require flexible resources, I haven’t paid much
attention to the case of VFs having private resources. So this SR/IOV
implementation doesn’t even support such a case (max_vX_per_vf != 0).

Let me summarize the possible config space, and how the current
parameters (could) map to these (interrupt-related ones omitted):

Flexible resources not supported (not implemented):
 - Private resources for PF = max_ioqpairs
 - Private resources per VF = ?
 - (error if flexible resources are configured)

With flexible resources:
 - VQPRT, private resources for PF  = max_ioqpairs
 - VQFRT, total flexible resources  = max_vq_per_vf * num_vfs
 - VQFRSM, maximum assignable per VF= max_vq_per_vf
 - VQGRAN, granularity  = #define constant
 - (error if private resources per VF are configured)
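
The flexible-resources mapping above can be sketched in a few lines of Python; the names follow this thread's working parameter names, not actual QEMU properties, and the numeric values are arbitrary:

```python
def sriov_caps(max_ioqpairs, max_vq_per_vf, num_vfs):
    # Map the proposed device parameters to the primary controller
    # capability fields, per the summary in this mail.
    VQGRAN = 1  # granularity: a compile-time constant in the proposal
    return {
        "VQPRT": max_ioqpairs,             # private resources for the PF
        "VQFRT": max_vq_per_vf * num_vfs,  # total flexible resources
        "VQFRSM": max_vq_per_vf,           # maximum assignable per VF
        "VQGRAN": VQGRAN,
    }

caps = sriov_caps(max_ioqpairs=64, max_vq_per_vf=8, num_vfs=4)

# Note Klaus's objection: with this mapping VQFRT is always exactly
# num_vfs * VQFRSM, so host software can never be exercised against
# the valid configuration FRT < num_vfs * FRSM.
assert caps["VQFRT"] == 4 * caps["VQFRSM"]
```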

Since I don’t want to misunderstand your suggestion: could you provide a
similar map with your parameters, formulas, and explain how to determine
if flexible resources are active? I want to be sure we are on the
same page.

-- 
Regards,
Łukasz



Re: [PATCH v8 4/8] qmp: add QMP command x-debug-virtio-status

2021-11-04 Thread Markus Armbruster
Jonah Palmer  writes:

> From: Laurent Vivier 
>
> This new command shows the status of a VirtIODevice, including
> its corresponding vhost device status (if active).
>
> Next patch will improve output by decoding feature bits, including
> vhost device's feature bits (backend, protocol, acked, and features).
> Also will decode status bits of a VirtIODevice.
>
> Next patch will also suppress the vhost device field from displaying
> if no vhost device is active for a given VirtIODevice.
>
> Signed-off-by: Jonah Palmer 
> ---

[...]

> diff --git a/qapi/virtio.json b/qapi/virtio.json
> index 4490c2c..656a26f 100644
> --- a/qapi/virtio.json
> +++ b/qapi/virtio.json
> @@ -65,3 +65,258 @@
>  ##
>  
>  { 'command': 'x-debug-query-virtio', 'returns': ['VirtioInfo'] }
> +
> +##
> +# @VirtioStatusEndianness:
> +#
> +# Enumeration of endianness for VirtioDevice
> +#
> +# Since: 6.2
> +##
> +
> +{ 'enum': 'VirtioStatusEndianness',
> +  'data': [ 'unknown', 'little', 'big' ]
> +}
> +
> +##
> +# @VhostStatus:
> +#
> +# Information about a vhost device. This information will only be
> +# displayed if the vhost device is active.
> +#
> +# @n-mem-sections: vhost_dev n_mem_sections
> +#
> +# @n-tmp-sections: vhost_dev n_tmp_sections
> +#
> +# @nvqs: vhost_dev nvqs. This is the number of virtqueues being used
> +#by the vhost device.
> +#
> +# @vq-index: vhost_dev vq_index
> +#
> +# @features: vhost_dev features
> +#
> +# @acked-features: vhost_dev acked_features
> +#
> +# @backend-features: vhost_dev backend_features
> +#
> +# @protocol-features: vhost_dev protocol_features
> +#
> +# @max-queues: vhost_dev max_queues
> +#
> +# @backend-cap: vhost_dev backend_cap
> +#
> +# @log-enabled: vhost_dev log_enabled flag
> +#
> +# @log-size: vhost_dev log_size
> +#
> +# Since: 6.2
> +#
> +##
> +
> +{ 'struct': 'VhostStatus',
> +'data': {
> +'n-mem-sections': 'int',
> +'n-tmp-sections': 'int',

Odd indentation.  Better

   { 'struct': 'VhostStatus',
 'data': {
 'n-mem-sections': 'int',
 'n-tmp-sections': 'int',

or

   { 'struct': 'VhostStatus',
 'data': { 'n-mem-sections': 'int',
   'n-tmp-sections': 'int',

More of the same below, and possibly in other patches.  I'm not going to
point it out again.

> +'nvqs': 'uint32',
> +'vq-index': 'int',
> +'features': 'uint64',
> +'acked-features': 'uint64',
> +'backend-features': 'uint64',
> +'protocol-features': 'uint64',
> +'max-queues': 'uint64',
> +'backend-cap': 'uint64',
> +'log-enabled': 'bool',
> +'log-size': 'uint64'
> +}
> +}

I can't tell whether these are all needed.  Got to trust virtio experts
there.

I'm not checking the schema types match the data sources' C types.  I
hope you did :)

More of the same below, and possibly in other patches.  I'm not going to
point it out again.

> +
> +##
> +# @VirtioStatus:
> +#
> +# Full status of the virtio device with most VirtIODevice members.
> +# Also includes the full status of the corresponding vhost device
> +# if the vhost device is active.
> +#
> +# @name: VirtIODevice name
> +#
> +# @device-id: VirtIODevice ID
> +#
> +# @vhost-started: VirtIODevice vhost_started flag
> +#
> +# @guest-features: VirtIODevice guest_features
> +#
> +# @host-features: VirtIODevice host_features
> +#
> +# @backend-features: VirtIODevice backend_features
> +#
> +# @device-endian: VirtIODevice device_endian
> +#
> +# @num-vqs: VirtIODevice virtqueue count. This is the number of active
> +#   virtqueues being used by the VirtIODevice.
> +#
> +# @status: VirtIODevice configuration status (e.g. DRIVER_OK,
> +#  FEATURES_OK, DRIVER, etc.)
> +#
> +# @isr: VirtIODevice ISR
> +#
> +# @queue-sel: VirtIODevice queue_sel
> +#
> +# @vm-running: VirtIODevice vm_running flag
> +#
> +# @broken: VirtIODevice broken flag
> +#
> +# @disabled: VirtIODevice disabled flag
> +#
> +# @use-started: VirtIODevice use_started flag
> +#
> +# @started: VirtIODevice started flag
> +#
> +# @start-on-kick: VirtIODevice start_on_kick flag
> +#
> +# @disable-legacy-check: VirtIODevice disabled_legacy_check flag
> +#
> +# @bus-name: VirtIODevice bus_name
> +#
> +# @use-guest-notifier-mask: VirtIODevice use_guest_notifier_mask flag
> +#
> +# @vhost-dev: corresponding vhost device info for a given VirtIODevice
> +#
> +# Since: 6.2
> +#
> +##
> +
> +{ 'struct': 'VirtioStatus',
> +'data': {
> +'name': 'str',
> +'device-id': 'uint16',
> +'vhost-started': 'bool',
> +'guest-features': 'uint64',
> +'host-features': 'uint64',
> +'backend-features': 'uint64',
> +'device-endian': 'VirtioStatusEndianness',
> +'num-vqs': 'int',
> +'status': 'uint8',
> +'isr': 'uint8',
> +'queue-sel': 'uint16',
> +'vm-running': 'bool',
> +'broken': 'bool',
> +'disabled': 'bool',
> +'use-started': 'bool',
> +'started': 'bool',
> + 

[RFC PATCH 3/3] jobs: add job-driver.h

2021-11-04 Thread Emanuele Giuseppe Esposito
job-driver.h contains all functions of job.h that are used by
the drivers (JobDriver, BlockJobDriver).

These functions are unaware of the job_mutex,
so they all take and release the lock internally.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job-driver.h  | 173 +
 include/qemu/job-monitor.h |   4 +-
 include/qemu/job.h | 129 +--
 job.c  |   3 +-
 4 files changed, 179 insertions(+), 130 deletions(-)
 create mode 100644 include/qemu/job-driver.h

diff --git a/include/qemu/job-driver.h b/include/qemu/job-driver.h
new file mode 100644
index 00..19ae5ce8f0
--- /dev/null
+++ b/include/qemu/job-driver.h
@@ -0,0 +1,173 @@
+/*
+ * Declarations for background jobs
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012, 2018 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef JOB_DRIVER_H
+#define JOB_DRIVER_H
+
+#include "job-common.h"
+
+/*
+ * Job driver API.
+ *
+ * These functions are used by job drivers like mirror,
+ * stream, commit, etc. The driver is not aware of the job_mutex
+ * presence, so these functions use it internally to protect
+ * job fields (see job-common.h).
+ *
+ * Therefore, each function in this API that requires protection
+ * must have the comment
+ * "Called with job_mutex *not* held"
+ * in job.c
+ */
+
+/**
+ * Create a new long-running job and return it.
+ *
+ * @job_id: The id of the newly-created job, or %NULL for internal jobs
+ * @driver: The class object for the newly-created job.
+ * @txn: The transaction this job belongs to, if any. %NULL otherwise.
+ * @ctx: The AioContext to run the job coroutine in.
+ * @flags: Creation flags for the job. See @JobCreateFlags.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @errp: Error object.
+ */
+void *job_create(const char *job_id, const JobDriver *driver, JobTxn *txn,
+ AioContext *ctx, int flags, BlockCompletionFunc *cb,
+ void *opaque, Error **errp);
+
+/**
+ * @job: The job that has made progress
+ * @done: How much progress the job made since the last call
+ *
+ * Updates the progress counter of the job.
+ */
+void job_progress_update(Job *job, uint64_t done);
+
+/**
+ * @job: The job whose expected progress end value is set
+ * @remaining: Missing progress (on top of the current progress counter value)
+ * until the new expected end value is reached
+ *
+ * Sets the expected end value of the progress counter of a job so that a
+ * completion percentage can be calculated when the progress is updated.
+ */
+void job_progress_set_remaining(Job *job, uint64_t remaining);
+
+/**
+ * @job: The job whose expected progress end value is updated
+ * @delta: Value which is to be added to the current expected end
+ * value
+ *
+ * Increases the expected end value of the progress counter of a job.
+ * This is useful for parenthesis operations: If a job has to
+ * conditionally perform a high-priority operation as part of its
+ * progress, it calls this function with the expected operation's
+ * length before, and job_progress_update() afterwards.
+ * (So the operation acts as a parenthesis in regards to the main job
+ * operation running in background.)
+ */
+void job_progress_increase_remaining(Job *job, uint64_t delta);
+
+/**
+ * @job: A job that has not yet been started.
+ *
+ * Begins execution of a job.
+ * Takes ownership of one reference to the job object.
+ */
+void job_start(Job *job);
+
+/**
+ * @job: The job to enter.
+ *
+ * Continue the specified job by entering the coroutine.
+ */
+void job_enter(Job *job);
+
+/**
+ * @job: The job that is ready to pause.
+ *
+ * Pause now if job_pause() has been called. Jobs that perform lots of I/O
+ * must call this between requests so that the job can be paused.
+ */
+void coroutine_fn job

[RFC PATCH 0/3] job: split job API in driver and monitor

2021-11-04 Thread Emanuele Giuseppe Esposito
In this series, we split the job API into two headers:
job-driver.h and job-monitor.h.
As explained in job.c, job-monitor contains the functions mainly used
by the monitor, which require consistency between the search for
a specific job (job_get) and the actual operation/action on it
(e.g. job_user_cancel). Therefore, the job-monitor API assumes that
the job mutex lock is always held by the caller.

job-driver, on the other hand, is the collection of functions
that are used by the job drivers or the core block layer. These
functions are not aware of the job mutex, and instead take and
release the lock internally.

We also have job-common.h, which contains the job struct definition
and common functions that are not part of the monitor or driver APIs.
job.h is kept for legacy reasons and to avoid changing all files that
include it.
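
The two calling conventions can be sketched like this (a minimal Python analogy using a plain threading.Lock for job_mutex; the function names mirror the split, they are not the real C prototypes):

```python
import threading

job_mutex = threading.Lock()

class Job:
    def __init__(self):
        self.pause_count = 0  # field protected by job_mutex

# job-monitor style: the caller must already hold job_mutex, so a
# state check and the following action stay atomic (no TOC/TOU window).
def job_pause_locked(job):
    assert job_mutex.locked(), "called between job_lock and job_unlock"
    job.pause_count += 1

# job-driver style: unaware of the lock; takes and releases it
# internally around the protected access.
def job_pause(job):
    with job_mutex:
        job_pause_locked(job)

job = Job()
job_pause(job)           # driver path: locks internally
with job_mutex:          # monitor path: caller holds the lock
    job_pause_locked(job)
assert job.pause_count == 2
```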

RFC: alternative names for the headers: job-locked (was job-monitor)
and job-self-locking (was job-driver). I am open to suggestions for
alternative names.

This series is based on my previous series "job: replace AioContext
lock with job_mutex".

Based-on: <20211104145334.1346363-1-eespo...@redhat.com>

Emanuele Giuseppe Esposito (3):
  jobs: add job-common.h
  jobs: add job-monitor.h
  jobs: add job-driver.h

 include/qemu/job-common.h  | 336 +++
 include/qemu/job-driver.h  | 173 ++
 include/qemu/job-monitor.h | 282 
 include/qemu/job.h | 662 +
 job.c  |   4 +-
 5 files changed, 797 insertions(+), 660 deletions(-)
 create mode 100644 include/qemu/job-common.h
 create mode 100644 include/qemu/job-driver.h
 create mode 100644 include/qemu/job-monitor.h

-- 
2.27.0




[RFC PATCH 1/3] jobs: add job-common.h

2021-11-04 Thread Emanuele Giuseppe Esposito
job-common.h contains all structs and common functions that currently
are in job.h and will be shared by job-monitor and job-driver in
the next commits.

Also move job_type() and job_type_str() there, as they are
common helper functions.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job-common.h | 336 ++
 include/qemu/job.h| 307 +-
 2 files changed, 337 insertions(+), 306 deletions(-)
 create mode 100644 include/qemu/job-common.h

diff --git a/include/qemu/job-common.h b/include/qemu/job-common.h
new file mode 100644
index 00..d2968a848e
--- /dev/null
+++ b/include/qemu/job-common.h
@@ -0,0 +1,336 @@
+/*
+ * Declarations for background jobs
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012, 2018 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef JOB_COMMON_H
+#define JOB_COMMON_H
+
+#include "qapi/qapi-types-job.h"
+#include "qemu/queue.h"
+#include "qemu/progress_meter.h"
+#include "qemu/coroutine.h"
+#include "block/aio.h"
+
+typedef struct JobDriver JobDriver;
+typedef struct JobTxn JobTxn;
+
+
+/**
+ * Long-running operation.
+ */
+typedef struct Job {
+
+/* Fields set at initialization (job_create), and never modified */
+
+/** The ID of the job. May be NULL for internal jobs. */
+char *id;
+
+/**
+ * The type of this job.
+ * All callbacks are called with job_mutex *not* held.
+ */
+const JobDriver *driver;
+
+/** AioContext to run the job coroutine in */
+AioContext *aio_context;
+
+/**
+ * The coroutine that executes the job.  If not NULL, it is reentered when
+ * busy is false and the job is cancelled.
+ * Initialized in job_start()
+ */
+Coroutine *co;
+
+/** True if this job should automatically finalize itself */
+bool auto_finalize;
+
+/** True if this job should automatically dismiss itself */
+bool auto_dismiss;
+
+/** The completion function that will be called when the job completes.  */
+BlockCompletionFunc *cb;
+
+/** The opaque value that is passed to the completion function.  */
+void *opaque;
+
+/* ProgressMeter API is thread-safe */
+ProgressMeter progress;
+
+
+/** Protected by job_mutex */
+
+/** Reference count of the block job */
+int refcnt;
+
+/** Current state; See @JobStatus for details. */
+JobStatus status;
+
+/**
+ * Timer that is used by @job_sleep_ns. Accessed under job_mutex (in
+ * job.c).
+ */
+QEMUTimer sleep_timer;
+
+/**
+ * Counter for pause request. If non-zero, the block job is either paused,
+ * or if busy == true will pause itself as soon as possible.
+ */
+int pause_count;
+
+/**
+ * Set to false by the job while the coroutine has yielded and may be
+ * re-entered by job_enter(). There may still be I/O or event loop activity
+ * pending. Accessed under job_mutex.
+ *
+ * When the job is deferred to the main loop, busy is true as long as the
+ * bottom half is still pending.
+ */
+bool busy;
+
+/**
+ * Set to true by the job while it is in a quiescent state, where
+ * no I/O or event loop activity is pending.
+ */
+bool paused;
+
+/**
+ * Set to true if the job is paused by user.  Can be unpaused with the
+ * block-job-resume QMP command.
+ */
+bool user_paused;
+
+/**
+ * Set to true if the job should cancel itself.  The flag must
+ * always be tested just before toggling the busy flag from false
+ * to true.  After a job has been cancelled, it should only yield
+ * if #aio_poll will ("sooner or later") reenter the coroutine.
+ */
+bool cancelled;
+
+/**
+ * Set to true if the job should abort immediately without waiting
+ * for data to be in sync.
+ */
+bool force_cancel;
+
+/** Se

[RFC PATCH 2/3] jobs: add job-monitor.h

2021-11-04 Thread Emanuele Giuseppe Esposito
job-monitor.h contains all functions of job.h that are used by the
monitor and essentially all functions that do not define a
JobDriver/Blockdriver.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job-monitor.h | 282 +
 include/qemu/job.h | 228 +-
 job.c  |   1 +
 3 files changed, 284 insertions(+), 227 deletions(-)
 create mode 100644 include/qemu/job-monitor.h

diff --git a/include/qemu/job-monitor.h b/include/qemu/job-monitor.h
new file mode 100644
index 00..7189cdafef
--- /dev/null
+++ b/include/qemu/job-monitor.h
@@ -0,0 +1,282 @@
+/*
+ * Declarations for background jobs
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012, 2018 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef JOB_MONITOR_H
+#define JOB_MONITOR_H
+
+#include "job-common.h"
+
+/*
+ * Job monitor API.
+ *
+ * These functions are used by the QEMU monitor, for example
+ * to execute QMP commands. The monitor is aware of the job_mutex
+ * presence, so these functions assume it is held by the caller
+ * to protect job fields (see job-common.h).
+ * This prevents TOC/TOU bugs, allowing the caller to hold the
+ * lock between a check in the job state and the actual action.
+ *
+ * Therefore, each function in this API that needs protection
+ * must have the comment
+ * "Called between job_lock and job_unlock."
+ */
+
+/**
+ * Allocate and return a new job transaction. Jobs can be added to the
+ * transaction using job_txn_add_job().
+ *
+ * The transaction is automatically freed when the last job completes or is
+ * cancelled.
+ *
+ * All jobs in the transaction either complete successfully or fail/cancel as a
+ * group.  Jobs wait for each other before completing.  Cancelling one job
+ * cancels all jobs in the transaction.
+ */
+JobTxn *job_txn_new(void);
+
+/**
+ * Release a reference that was previously acquired with job_txn_add_job or
+ * job_txn_new. If it's the last reference to the object, it will be freed.
+ */
+void job_txn_unref(JobTxn *txn);
+
+/**
+ * @txn: The transaction (may be NULL)
+ * @job: Job to add to the transaction
+ *
+ * Add @job to the transaction.  The @job must not already be in a transaction.
+ * The caller must call either job_txn_unref() or job_completed() to release
+ * the reference that is automatically grabbed here.
+ *
+ * If @txn is NULL, the function does nothing.
+ *
+ * Called between job_lock and job_unlock.
+ */
+void job_txn_add_job(JobTxn *txn, Job *job);
+
+/**
+ * Add a reference to the Job's refcnt; it will be decreased with job_unref(),
+ * and the job will be freed when the last reference is released.
+ *
+ * Called between job_lock and job_unlock.
+ */
+void job_ref(Job *job);
+
+/**
+ * Release a reference that was previously acquired with job_ref() or
+ * job_create(). If it's the last reference to the object, it will be freed.
+ *
+ * Called between job_lock and job_unlock.
+ */
+void job_unref(Job *job);
+
+/**
+ * Conditionally enter the job coroutine if the job is ready to run, not
+ * already busy and fn() returns true. fn() is called while under the job_lock
+ * critical section.
+ *
+ * Called between job_lock and job_unlock, but it releases the lock temporarily.
+ */
+void job_enter_cond(Job *job, bool(*fn)(Job *job));
+
+/**
+ * Returns true if the job should not be visible to the management layer.
+ */
+bool job_is_internal(Job *job);
+
+/**
+ * Returns whether the job is in a completed state.
+ * Called between job_lock and job_unlock.
+ */
+bool job_is_completed(Job *job);
+
+/**
+ * Request @job to pause at the next pause point. Must be paired with
+ * job_resume(). If the job is supposed to be resumed by user action, call
+ * job_user_pause() instead.
+ *
+ * Called between job_lock and job_unlock.
+ */
+void job_pause(Job *job);
+
+/**
+ * Resumes a @job paused with job_pause.
+ * Ca

Re: [PATCH v8 3/8] qmp: add QMP command x-debug-query-virtio

2021-11-04 Thread Markus Armbruster
Jonah Palmer  writes:

> From: Laurent Vivier 
>
> This new command lists all the instances of VirtIODevice with
> their QOM paths and virtio type/name.
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/meson.build  |  2 ++
>  hw/virtio/virtio-stub.c| 14 ++
>  hw/virtio/virtio.c | 27 +++
>  include/hw/virtio/virtio.h |  1 +
>  qapi/meson.build   |  1 +
>  qapi/qapi-schema.json  |  1 +
>  qapi/virtio.json   | 67 ++
>  tests/qtest/qmp-cmd-test.c |  1 +
>  8 files changed, 114 insertions(+)
>  create mode 100644 hw/virtio/virtio-stub.c
>  create mode 100644 qapi/virtio.json
>
> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> index 521f7d6..d893f5f 100644
> --- a/hw/virtio/meson.build
> +++ b/hw/virtio/meson.build
> @@ -6,8 +6,10 @@ softmmu_virtio_ss.add(when: 'CONFIG_VHOST', if_false: 
> files('vhost-stub.c'))
>  
>  softmmu_ss.add_all(when: 'CONFIG_VIRTIO', if_true: softmmu_virtio_ss)
>  softmmu_ss.add(when: 'CONFIG_VIRTIO', if_false: files('vhost-stub.c'))
> +softmmu_ss.add(when: 'CONFIG_VIRTIO', if_false: files('virtio-stub.c'))
>  
>  softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
> +softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('virtio-stub.c'))
>  
>  virtio_ss = ss.source_set()
>  virtio_ss.add(files('virtio.c'))
> diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
> new file mode 100644
> index 000..d4a88f5
> --- /dev/null
> +++ b/hw/virtio/virtio-stub.c
> @@ -0,0 +1,14 @@
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-commands-virtio.h"
> +
> +static void *qmp_virtio_unsupported(Error **errp)
> +{
> +error_setg(errp, "Virtio is disabled");
> +return NULL;
> +}
> +
> +VirtioInfoList *qmp_x_debug_query_virtio(Error **errp)
> +{
> +return qmp_virtio_unsupported(errp);
> +}
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 7050bd5..ad17be7 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -13,6 +13,8 @@
>  
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-commands-virtio.h"
> +#include "qapi/qapi-visit-virtio.h"
>  #include "cpu.h"
>  #include "trace.h"
>  #include "qemu/error-report.h"
> @@ -29,6 +31,9 @@
>  #include "sysemu/runstate.h"
>  #include "standard-headers/linux/virtio_ids.h"
>  
> +/* QAPI list of VirtIODevices */
> +static QTAILQ_HEAD(, VirtIODevice) virtio_list;
> +
>  /*
>   * The alignment to use between consumer and producer parts of vring.
>   * x86 pagesize again. This is the default, used by transports like PCI
> @@ -3709,6 +3714,7 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
>  vdev->listener.commit = virtio_memory_listener_commit;
>  vdev->listener.name = "virtio";
>  memory_listener_register(&vdev->listener, vdev->dma_as);
> +QTAILQ_INSERT_TAIL(&virtio_list, vdev, next);
>  }
>  
>  static void virtio_device_unrealize(DeviceState *dev)
> @@ -3723,6 +3729,7 @@ static void virtio_device_unrealize(DeviceState *dev)
>  vdc->unrealize(dev);
>  }
>  
> +QTAILQ_REMOVE(&virtio_list, vdev, next);
>  g_free(vdev->bus_name);
>  vdev->bus_name = NULL;
>  }
> @@ -3896,6 +3903,8 @@ static void virtio_device_class_init(ObjectClass *klass, void *data)
>  vdc->stop_ioeventfd = virtio_device_stop_ioeventfd_impl;
>  
>  vdc->legacy_features |= VIRTIO_LEGACY_FEATURES;
> +
> +QTAILQ_INIT(&virtio_list);
>  }
>  
>  bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
> @@ -3906,6 +3915,24 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
>  return virtio_bus_ioeventfd_enabled(vbus);
>  }
>  
> +VirtioInfoList *qmp_x_debug_query_virtio(Error **errp)
> +{
> +VirtioInfoList *list = NULL;
> +VirtioInfoList *node;
> +VirtIODevice *vdev;
> +
> +QTAILQ_FOREACH(vdev, &virtio_list, next) {
> +DeviceState *dev = DEVICE(vdev);
> +node = g_new0(VirtioInfoList, 1);
> +node->value = g_new(VirtioInfo, 1);
> +node->value->path = g_strdup(dev->canonical_path);
> +node->value->type = g_strdup(vdev->name);
> +QAPI_LIST_PREPEND(list, node->value);
> +}
> +
> +return list;
> +}
> +
>  static const TypeInfo virtio_device_info = {
>  .name = TYPE_VIRTIO_DEVICE,
>  .parent = TYPE_DEVICE,
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 105b98c..eceaafc 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -110,6 +110,7 @@ struct VirtIODevice
>  bool use_guest_notifier_mask;
>  AddressSpace *dma_as;
>  QLIST_HEAD(, VirtQueue) *vector_queues;
> +QTAILQ_ENTRY(VirtIODevice) next;
>  };
>  
>  struct VirtioDeviceClass {
> diff --git a/qapi/meson.build b/qapi/meson.build
> index c356a38..df5662e 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -45,6 +45,7 @@ qapi_all_modules = [
>'sockets',
>'trace',
>  

[RFC PATCH v2 12/14] jobs: use job locks and helpers also in the unit tests

2021-11-04 Thread Emanuele Giuseppe Esposito
Add missing job synchronization in the unit tests, with
both explicit locks and helpers.

Note: at this stage, job_{lock/unlock} and the job lock guard macros
are *no-ops*.
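
The helper pattern the diff below switches the tests to can be sketched as follows (a Python analogy; job_get_pause_count() mirrors the C helpers introduced by the series, with a plain threading.Lock standing in for job_mutex):

```python
import threading

job_mutex = threading.Lock()  # currently a no-op in the series, real later

class Job:
    def __init__(self):
        self.pause_count = 0
        self.paused = False

# Instead of reading job.pause_count directly (as the tests did before),
# a getter takes job_mutex internally, so test code stays correct once
# the lock becomes a real mutex.
def job_get_pause_count(job):
    with job_mutex:
        return job.pause_count

def job_get_paused(job):
    with job_mutex:
        return job.paused

job = Job()
assert job_get_pause_count(job) == 0
assert not job_get_paused(job)
```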

Signed-off-by: Emanuele Giuseppe Esposito 
---
 tests/unit/test-bdrv-drain.c | 40 +++---
 tests/unit/test-block-iothread.c |  4 +++
 tests/unit/test-blockjob-txn.c   | 10 ++
 tests/unit/test-blockjob.c   | 57 +---
 4 files changed, 72 insertions(+), 39 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 2d3c17e566..535c39b5a8 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -941,61 +941,63 @@ static void test_blockjob_common_drain_node(enum drain_type drain_type,
 }
 }
 
-g_assert_cmpint(job->job.pause_count, ==, 0);
-g_assert_false(job->job.paused);
+g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+g_assert_false(job_get_paused(&job->job));
 g_assert_true(tjob->running);
-g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
 do_drain_begin_unlocked(drain_type, drain_bs);
 
 if (drain_type == BDRV_DRAIN_ALL) {
 /* bdrv_drain_all() drains both src and target */
-g_assert_cmpint(job->job.pause_count, ==, 2);
+g_assert_cmpint(job_get_pause_count(&job->job), ==, 2);
 } else {
-g_assert_cmpint(job->job.pause_count, ==, 1);
+g_assert_cmpint(job_get_pause_count(&job->job), ==, 1);
 }
-g_assert_true(job->job.paused);
-g_assert_false(job->job.busy); /* The job is paused */
+g_assert_true(job_get_paused(&job->job));
+g_assert_false(job_get_busy(&job->job)); /* The job is paused */
 
 do_drain_end_unlocked(drain_type, drain_bs);
 
 if (use_iothread) {
 /* paused is reset in the I/O thread, wait for it */
-while (job->job.paused) {
+while (job_get_paused(&job->job)) {
 aio_poll(qemu_get_aio_context(), false);
 }
 }
 
-g_assert_cmpint(job->job.pause_count, ==, 0);
-g_assert_false(job->job.paused);
-g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+g_assert_false(job_get_paused(&job->job));
+g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
 do_drain_begin_unlocked(drain_type, target);
 
 if (drain_type == BDRV_DRAIN_ALL) {
 /* bdrv_drain_all() drains both src and target */
-g_assert_cmpint(job->job.pause_count, ==, 2);
+g_assert_cmpint(job_get_pause_count(&job->job), ==, 2);
 } else {
-g_assert_cmpint(job->job.pause_count, ==, 1);
+g_assert_cmpint(job_get_pause_count(&job->job), ==, 1);
 }
-g_assert_true(job->job.paused);
-g_assert_false(job->job.busy); /* The job is paused */
+g_assert_true(job_get_paused(&job->job));
+g_assert_false(job_get_busy(&job->job)); /* The job is paused */
 
 do_drain_end_unlocked(drain_type, target);
 
 if (use_iothread) {
 /* paused is reset in the I/O thread, wait for it */
-while (job->job.paused) {
+while (job_get_paused(&job->job)) {
 aio_poll(qemu_get_aio_context(), false);
 }
 }
 
-g_assert_cmpint(job->job.pause_count, ==, 0);
-g_assert_false(job->job.paused);
-g_assert_true(job->job.busy); /* We're in qemu_co_sleep_ns() */
+g_assert_cmpint(job_get_pause_count(&job->job), ==, 0);
+g_assert_false(job_get_paused(&job->job));
+g_assert_true(job_get_busy(&job->job)); /* We're in qemu_co_sleep_ns() */
 
 aio_context_acquire(ctx);
+job_lock();
 ret = job_complete_sync(&job->job, &error_abort);
+job_unlock();
 g_assert_cmpint(ret, ==, (result == TEST_JOB_SUCCESS ? 0 : -EIO));
 
 if (use_iothread) {
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index aea660aeed..f39cb8b7ef 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -456,7 +456,9 @@ static void test_attach_blockjob(void)
 }
 
 aio_context_acquire(ctx);
+job_lock();
 job_complete_sync(&tjob->common.job, &error_abort);
+job_unlock();
 blk_set_aio_context(blk, qemu_get_aio_context(), &error_abort);
 aio_context_release(ctx);
 
@@ -630,7 +632,9 @@ static void test_propagate_mirror(void)
  BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
  &error_abort);
+job_lock();
 job = job_get("job0");
+job_unlock();
 filter = bdrv_find_node("filter_node");
 
 /* Change the AioContext of src */
diff --git a/tests/unit/test-blockjob-txn.c b/tests/unit/test-blockjob-txn.c
index 8bd13b9949..1ae3a9d443 100644
--- a/tests/unit/test-blockjob-txn.c
+++ b/tests/unit/test-blockjob-txn.c
@@ -124,16 +

[RFC PATCH v2 14/14] job.c: enable job lock/unlock and remove Aiocontext locks

2021-11-04 Thread Emanuele Giuseppe Esposito
Change job_{lock/unlock} and the job lock guard macros to use job_mutex.

Now that they are no longer nops, remove the AioContext locks
to avoid deadlocks.

Therefore:
- when possible, remove the AioContext lock/unlock pair completely
- if it is also used by other functions, reduce the locking
section as much as possible, leaving the job API outside.

There is only one JobDriver callback, ->free(), that assumes the
AioContext lock is held (because it calls bdrv_unref), so for
now keep that one under the AioContext lock.

Also remove _job_{lock/unlock}, as they are replaced by the
public functions.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job.h   |  7 ---
 block/replication.c  |  2 +
 blockdev.c   | 62 
 job-qmp.c| 38 ---
 job.c| 81 
 tests/unit/test-bdrv-drain.c |  4 +-
 tests/unit/test-block-iothread.c |  2 +-
 tests/unit/test-blockjob.c   | 13 ++---
 8 files changed, 34 insertions(+), 175 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 58b3af47e3..d417e1b601 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -608,8 +608,6 @@ void job_user_cancel(Job *job, bool force, Error **errp);
  *
  * Returns the return value from the job if the job actually completed
  * during the call, or -ECANCELED if it was canceled.
- *
- * Callers must hold the AioContext lock of job->aio_context.
  */
 int job_cancel_sync(Job *job, bool force);
 
@@ -633,9 +631,6 @@ void job_cancel_sync_all(void);
  * function).
  *
  * Returns the return value from the job.
- *
- * Callers must hold the AioContext lock of job->aio_context.
- *
  * Called between job_lock and job_unlock.
  */
 int job_complete_sync(Job *job, Error **errp);
@@ -667,8 +662,6 @@ void job_dismiss(Job **job, Error **errp);
  * Returns 0 if the job is successfully completed, -ECANCELED if the job was
  * cancelled before completing, and -errno in other error cases.
  *
- * Callers must hold the AioContext lock of job->aio_context.
- *
  * Called between job_lock and job_unlock.
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error 
**errp);
diff --git a/block/replication.c b/block/replication.c
index 0f487cc215..6a60c1af1a 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -728,9 +728,11 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)
  * disk, secondary disk in backup_job_completed().
  */
 if (s->backup_job) {
+aio_context_release(aio_context);
 job_lock();
 job_cancel_sync(&s->backup_job->job, true);
 job_unlock();
+aio_context_acquire(aio_context);
 }
 
 if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index 0bd79757fc..dfc73ef1bf 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -154,12 +154,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 
 for (job = block_job_next(NULL); job; job = block_job_next(job)) {
 if (block_job_has_bdrv(job, blk_bs(blk))) {
-AioContext *aio_context = job->job.aio_context;
-aio_context_acquire(aio_context);
-
 job_cancel(&job->job, false);
-
-aio_context_release(aio_context);
 }
 }
 
@@ -1843,16 +1838,9 @@ static void drive_backup_abort(BlkActionState *common)
 DriveBackupState *state = DO_UPCAST(DriveBackupState, common, common);
 
 if (state->job) {
-AioContext *aio_context;
-
-aio_context = bdrv_get_aio_context(state->bs);
-aio_context_acquire(aio_context);
-
 job_lock();
 job_cancel_sync(&state->job->job, true);
 job_unlock();
-
-aio_context_release(aio_context);
 }
 }
 
@@ -1946,16 +1934,9 @@ static void blockdev_backup_abort(BlkActionState *common)
 BlockdevBackupState *state = DO_UPCAST(BlockdevBackupState, common, 
common);
 
 if (state->job) {
-AioContext *aio_context;
-
-aio_context = bdrv_get_aio_context(state->bs);
-aio_context_acquire(aio_context);
-
 job_lock();
 job_cancel_sync(&state->job->job, true);
 job_unlock();
-
-aio_context_release(aio_context);
 }
 }
 
@@ -3318,15 +3299,13 @@ out:
 aio_context_release(aio_context);
 }
 
-/* Get a block job using its ID and acquire its AioContext */
-static BlockJob *find_block_job(const char *id, AioContext **aio_context,
-Error **errp)
+/* Get a block job using its ID. Returns with job_lock held on success */
+static BlockJob *find_block_job(const char *id, Error **errp)
 {
 BlockJob *job;
 
 assert(id != NULL);
 
-*aio_context = NULL;
 job_lock();
 
 job = block_job_get(id);
@@ -3338,31 +3317,25 @@ static BlockJob *find_block_job(const char *id, 
AioContext **aio_context,
 return NULL;
 }
 
-*aio_context = blk_get_aio_contex

[RFC PATCH v2 07/14] job.c: move inner aiocontext lock in callbacks

2021-11-04 Thread Emanuele Giuseppe Esposito
Instead of taking the lock in job_txn_apply(), move it into the
callbacks themselves. This will be helpful in later commits, when
we introduce job_lock/unlock pairs.

job_transition_to_pending() and job_needs_finalize() do not
need to be protected by the aiocontext lock.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 job.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/job.c b/job.c
index ce5066522f..7856fa734b 100644
--- a/job.c
+++ b/job.c
@@ -170,7 +170,6 @@ static void job_txn_del_job(Job *job)
 
 static int job_txn_apply(Job *job, int fn(Job *))
 {
-AioContext *inner_ctx;
 Job *other_job, *next;
 JobTxn *txn = job->txn;
 int rc = 0;
@@ -185,10 +184,7 @@ static int job_txn_apply(Job *job, int fn(Job *))
 aio_context_release(job->aio_context);
 
 QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
-inner_ctx = other_job->aio_context;
-aio_context_acquire(inner_ctx);
 rc = fn(other_job);
-aio_context_release(inner_ctx);
 if (rc) {
 break;
 }
@@ -820,11 +816,15 @@ static void job_clean(Job *job)
 
 static int job_finalize_single(Job *job)
 {
+AioContext *ctx = job->aio_context;
+
 assert(job_is_completed(job));
 
 /* Ensure abort is called for late-transactional failures */
 job_update_rc(job);
 
+aio_context_acquire(ctx);
+
 if (!job->ret) {
 job_commit(job);
 } else {
@@ -832,6 +832,8 @@ static int job_finalize_single(Job *job)
 }
 job_clean(job);
 
+aio_context_release(ctx);
+
 if (job->cb) {
 job->cb(job->opaque, job->ret);
 }
@@ -952,11 +954,16 @@ static void job_completed_txn_abort(Job *job)
 
 static int job_prepare(Job *job)
 {
+AioContext *ctx = job->aio_context;
 assert(qemu_in_main_thread());
+
 if (job->ret == 0 && job->driver->prepare) {
+aio_context_acquire(ctx);
 job->ret = job->driver->prepare(job);
+aio_context_release(ctx);
 job_update_rc(job);
 }
+
 return job->ret;
 }
 
-- 
2.27.0




[RFC PATCH v2 10/14] jobs: protect jobs with job_lock/unlock

2021-11-04 Thread Emanuele Giuseppe Esposito
Introduce the job locking mechanism throughout the whole job API,
following the comments and requirements of job-monitor (which
assumes the lock is held) and job-driver (which assumes it is not).

job_{lock/unlock} is independent from _job_{lock/unlock}.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c |   6 ++
 block/replication.c |   4 +
 blockdev.c  |  13 +++
 blockjob.c  |  37 +++-
 job-qmp.c   |   4 +
 job.c   | 201 ++--
 monitor/qmp-cmds.c  |   2 +
 qemu-img.c  |   8 +-
 8 files changed, 229 insertions(+), 46 deletions(-)

diff --git a/block.c b/block.c
index da80e89ad4..a6dcd9eb36 100644
--- a/block.c
+++ b/block.c
@@ -4826,7 +4826,9 @@ static void bdrv_close(BlockDriverState *bs)
 
 void bdrv_close_all(void)
 {
+job_lock();
 assert(job_next(NULL) == NULL);
+job_unlock();
 assert(qemu_in_main_thread());
 
 /* Drop references from requests still in flight, such as canceled block
@@ -5965,6 +5967,8 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
 }
 }
 
+job_lock();
+
 for (job = block_job_next(NULL); job; job = block_job_next(job)) {
 GSList *el;
 
@@ -5975,6 +5979,8 @@ XDbgBlockGraph *bdrv_get_xdbg_block_graph(Error **errp)
 }
 }
 
+job_unlock();
+
 QTAILQ_FOREACH(bs, &graph_bdrv_states, node_list) {
 xdbg_graph_add_node(gr, bs, X_DBG_BLOCK_GRAPH_NODE_TYPE_BLOCK_DRIVER,
bs->node_name);
diff --git a/block/replication.c b/block/replication.c
index 55c8f894aa..0f487cc215 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
 if (s->stage == BLOCK_REPLICATION_FAILOVER) {
 commit_job = &s->commit_job->job;
 assert(commit_job->aio_context == qemu_get_current_aio_context());
+job_lock();
 job_cancel_sync(commit_job, false);
+job_unlock();
 }
 
 if (s->mode == REPLICATION_MODE_SECONDARY) {
@@ -726,7 +728,9 @@ static void replication_stop(ReplicationState *rs, bool 
failover, Error **errp)
  * disk, secondary disk in backup_job_completed().
  */
 if (s->backup_job) {
+job_lock();
 job_cancel_sync(&s->backup_job->job, true);
+job_unlock();
 }
 
 if (!failover) {
diff --git a/blockdev.c b/blockdev.c
index 67b55eec11..c5a835d9ed 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -150,6 +150,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 return;
 }
 
+job_lock();
+
 for (job = block_job_next(NULL); job; job = block_job_next(job)) {
 if (block_job_has_bdrv(job, blk_bs(blk))) {
 AioContext *aio_context = job->job.aio_context;
@@ -161,6 +163,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 }
 }
 
+job_unlock();
+
 dinfo->auto_del = 1;
 }
 
@@ -1844,7 +1848,9 @@ static void drive_backup_abort(BlkActionState *common)
 aio_context = bdrv_get_aio_context(state->bs);
 aio_context_acquire(aio_context);
 
+job_lock();
 job_cancel_sync(&state->job->job, true);
+job_unlock();
 
 aio_context_release(aio_context);
 }
@@ -1945,7 +1951,9 @@ static void blockdev_backup_abort(BlkActionState *common)
 aio_context = bdrv_get_aio_context(state->bs);
 aio_context_acquire(aio_context);
 
+job_lock();
 job_cancel_sync(&state->job->job, true);
+job_unlock();
 
 aio_context_release(aio_context);
 }
@@ -2394,7 +2402,9 @@ exit:
 if (!has_props) {
 qapi_free_TransactionProperties(props);
 }
+job_lock();
 job_txn_unref(block_job_txn);
+job_unlock();
 }
 
 BlockDirtyBitmapSha256 *qmp_x_debug_block_dirty_bitmap_sha256(const char *node,
@@ -3717,6 +3727,7 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 BlockJobInfoList *head = NULL, **tail = &head;
 BlockJob *job;
 
+job_lock();
 for (job = block_job_next(NULL); job; job = block_job_next(job)) {
 BlockJobInfo *value;
 
@@ -3726,10 +3737,12 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 value = block_job_query(job, errp);
 if (!value) {
 qapi_free_BlockJobInfoList(head);
+job_unlock();
 return NULL;
 }
 QAPI_LIST_APPEND(tail, value);
 }
+job_unlock();
 
 return head;
 }
diff --git a/blockjob.c b/blockjob.c
index 53c1e9c406..dcc13dc336 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -88,19 +88,25 @@ static char *child_job_get_parent_desc(BdrvChild *c)
 static void child_job_drained_begin(BdrvChild *c)
 {
 BlockJob *job = c->opaque;
+job_lock();
 job_pause(&job->job);
+job_unlock();
 }
 
 static bool child_job_drained_poll(BdrvChild *c)
 {
 BlockJob *bjob = c->opaque;
 Job *job = &bjob->job;

[RFC PATCH v2 11/14] block_job_query: remove atomic read

2021-11-04 Thread Emanuele Giuseppe Esposito
It is not clear what the atomic here was supposed to do, since
job.busy is protected by the job lock. As the whole function will
be called under job_mutex, just remove the atomic.
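An editor's sketch of the point being made, with pthread stand-ins rather than the real QEMU types (the names `field_mutex`, `set_busy`, `query_busy` are illustrative, not QEMU API): once every writer and reader of a field takes the same mutex, a plain read under that mutex is already ordered correctly, so an atomic read adds nothing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t field_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool busy; /* protected by field_mutex, like job.busy by job_mutex */

static void set_busy(bool b)
{
    pthread_mutex_lock(&field_mutex);
    busy = b;
    pthread_mutex_unlock(&field_mutex);
}

static bool query_busy(void)
{
    bool b;

    pthread_mutex_lock(&field_mutex);
    b = busy; /* plain read: no qatomic_read() needed under the lock */
    pthread_mutex_unlock(&field_mutex);
    return b;
}
```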

Signed-off-by: Emanuele Giuseppe Esposito 
---
 blockjob.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index dcc13dc336..426dcddcc1 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -314,6 +314,7 @@ int64_t block_job_ratelimit_get_delay(BlockJob *job, 
uint64_t n)
 return ratelimit_calculate_delay(&job->limit, n);
 }
 
+/* Called with job_mutex held */
 BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 {
 BlockJobInfo *info;
@@ -332,13 +333,13 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 info = g_new0(BlockJobInfo, 1);
 info->type  = g_strdup(job_type_str(&job->job));
 info->device= g_strdup(job->job.id);
-info->busy  = qatomic_read(&job->job.busy);
+info->busy  = job->job.busy;
 info->paused= job->job.pause_count > 0;
 info->offset= progress_current;
 info->len   = progress_total;
 info->speed = job->speed;
 info->io_status = job->iostatus;
-info->ready = job_is_ready(&job->job),
+info->ready = job_is_ready_locked(&job->job),
 info->status= job->job.status;
 info->auto_finalize = job->job.auto_finalize;
 info->auto_dismiss  = job->job.auto_dismiss;
-- 
2.27.0




[RFC PATCH v2 09/14] jobs: remove aiocontext locks since the functions are under BQL

2021-11-04 Thread Emanuele Giuseppe Esposito
In preparation for the job_lock/unlock patch, remove these
AioContext locks.
The main reason these two locks are removed here is that they
sit inside a loop iterating over the jobs list. Once the
job_lock is added, it will have to protect the whole loop,
wrapping the AioContext acquire/release as well.

We don't want this, as job_lock can only be *wrapped by*
the aiocontext lock, and not vice-versa, to avoid deadlocks.
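A sketch of the lock-ordering rule stated above, modeled with plain pthread mutexes (the names `aio_ctx_lock`, `aio_acquire`, `job_lock_` are illustrative stand-ins, not the QEMU functions): the job lock may only be taken while the AioContext lock is already held or not held at all, never the other way around, so the outer acquire asserts that the inner lock is not held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t aio_ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t job_mtx = PTHREAD_MUTEX_INITIALIZER;
static bool job_held;

static void aio_acquire(void)
{
    /* Taking the outer lock while holding the inner one would invert
     * the order and risk deadlock, so forbid it. */
    assert(!job_held);
    pthread_mutex_lock(&aio_ctx_lock);
}

static void aio_release(void)
{
    pthread_mutex_unlock(&aio_ctx_lock);
}

static void job_lock_(void)
{
    pthread_mutex_lock(&job_mtx);
    job_held = true;
}

static void job_unlock_(void)
{
    job_held = false;
    pthread_mutex_unlock(&job_mtx);
}

/* Correct nesting: the AioContext lock wraps the job lock. */
static int correct_nesting(void)
{
    aio_acquire();
    job_lock_();
    job_unlock_();
    aio_release();
    return 0;
}
```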

Signed-off-by: Emanuele Giuseppe Esposito 
---
 blockdev.c | 4 
 job-qmp.c  | 4 
 2 files changed, 8 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d9bf842a81..67b55eec11 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3719,15 +3719,11 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 
 for (job = block_job_next(NULL); job; job = block_job_next(job)) {
 BlockJobInfo *value;
-AioContext *aio_context;
 
 if (block_job_is_internal(job)) {
 continue;
 }
-aio_context = blk_get_aio_context(job->blk);
-aio_context_acquire(aio_context);
 value = block_job_query(job, errp);
-aio_context_release(aio_context);
 if (!value) {
 qapi_free_BlockJobInfoList(head);
 return NULL;
diff --git a/job-qmp.c b/job-qmp.c
index 829a28aa70..a67745 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -173,15 +173,11 @@ JobInfoList *qmp_query_jobs(Error **errp)
 
 for (job = job_next(NULL); job; job = job_next(job)) {
 JobInfo *value;
-AioContext *aio_context;
 
 if (job_is_internal(job)) {
 continue;
 }
-aio_context = job->aio_context;
-aio_context_acquire(aio_context);
 value = job_query_single(job, errp);
-aio_context_release(aio_context);
 if (!value) {
 qapi_free_JobInfoList(head);
 return NULL;
-- 
2.27.0




[RFC PATCH v2 05/14] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU

2021-11-04 Thread Emanuele Giuseppe Esposito
Once the job lock is used and the AioContext lock is removed,
mirror has to perform job operations under the same critical
section, using the helpers prepared in the previous commit.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/mirror.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 00089e519b..f22fa7da6e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -653,7 +653,7 @@ static int mirror_exit_common(Job *job)
 BlockDriverState *target_bs;
 BlockDriverState *mirror_top_bs;
 Error *local_err = NULL;
-bool abort = job->ret < 0;
+bool abort = job_has_failed(job);
 int ret = 0;
 
 if (s->prepared) {
@@ -1161,9 +1161,7 @@ static void mirror_complete(Job *job, Error **errp)
 s->should_complete = true;
 
 /* If the job is paused, it will be re-entered when it is resumed */
-if (!job->paused) {
-job_enter(job);
-}
+job_enter_not_paused(job);
 }
 
 static void coroutine_fn mirror_pause(Job *job)
@@ -1182,7 +1180,7 @@ static bool mirror_drained_poll(BlockJob *job)
  * from one of our own drain sections, to avoid a deadlock waiting for
  * ourselves.
  */
-if (!s->common.job.paused && !job_is_cancelled(&job->job) && !s->in_drain) 
{
+if (job_not_paused_nor_cancelled(&s->common.job) && !s->in_drain) {
 return true;
 }
 
-- 
2.27.0




[RFC PATCH v2 08/14] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED

2021-11-04 Thread Emanuele Giuseppe Esposito
Same as the AIO_WAIT_WHILE macro, but when running in the main loop
do not release and then re-acquire ctx_'s AioContext lock.

Once all AioContext locks go away, this macro will replace
AIO_WAIT_WHILE.
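A minimal sketch of the refactoring idea, not the real macros: one underlying wait loop takes an `unlock` flag, and the two public variants differ only in whether the context lock is dropped around each poll. All names here (`wait_while`, `poll_once`, `locked`) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

static int poll_calls;
static bool locked = true;

/* Stands in for the polled condition; true until three polls happen. */
static bool condition(void)
{
    return poll_calls < 3;
}

static void poll_once(bool unlock)
{
    if (unlock) {
        locked = false;   /* models aio_context_release(ctx_) */
    }
    poll_calls++;         /* models aio_poll(qemu_get_aio_context(), true) */
    if (unlock) {
        locked = true;    /* models aio_context_acquire(ctx_) */
    }
}

/* Common loop; the AIO_WAIT_WHILE / AIO_WAIT_WHILE_UNLOCKED pair
 * would expand to this with unlock == true / false respectively. */
static int wait_while(bool unlock)
{
    int waited = 0;

    while (condition()) {
        poll_once(unlock);
        waited = 1;
    }
    return waited;
}
```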

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/block/aio-wait.h | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index b39eefb38d..ff27fe4eab 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -59,10 +59,11 @@ typedef struct {
 extern AioWait global_aio_wait;
 
 /**
- * AIO_WAIT_WHILE:
+ * _AIO_WAIT_WHILE:
  * @ctx: the aio context, or NULL if multiple aio contexts (for which the
  *   caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
+ * @unlock: whether to unlock and then lock again @ctx
  *
  * Wait while a condition is true.  Use this to implement synchronous
  * operations that require event loop activity.
@@ -75,7 +76,7 @@ extern AioWait global_aio_wait;
  * wait on conditions between two IOThreads since that could lead to deadlock,
  * go via the main loop instead.
  */
-#define AIO_WAIT_WHILE(ctx, cond) ({   \
+#define _AIO_WAIT_WHILE(ctx, cond, unlock) ({  \
 bool waited_ = false;  \
 AioWait *wait_ = &global_aio_wait; \
 AioContext *ctx_ = (ctx);  \
@@ -90,11 +91,11 @@ extern AioWait global_aio_wait;
 assert(qemu_get_current_aio_context() ==   \
qemu_get_aio_context());\
 while ((cond)) {   \
-if (ctx_) {\
+if (unlock && ctx_) {  \
 aio_context_release(ctx_); \
 }  \
 aio_poll(qemu_get_aio_context(), true);\
-if (ctx_) {\
+if (unlock && ctx_) {  \
 aio_context_acquire(ctx_); \
 }  \
 waited_ = true;\
@@ -103,6 +104,12 @@ extern AioWait global_aio_wait;
 qatomic_dec(&wait_->num_waiters);  \
 waited_; })
 
+#define AIO_WAIT_WHILE(ctx, cond)  \
+_AIO_WAIT_WHILE(ctx, cond, true)
+
+#define AIO_WAIT_WHILE_UNLOCKED(ctx, cond) \
+_AIO_WAIT_WHILE(ctx, cond, false)
+
 /**
  * aio_wait_kick:
  * Wake up the main thread if it is waiting on AIO_WAIT_WHILE().  During
-- 
2.27.0




[RFC PATCH v2 13/14] jobs: add job lock in find_* functions

2021-11-04 Thread Emanuele Giuseppe Esposito
Both blockdev.c and job-qmp.c have TOCTOU (time-of-check to
time-of-use) conditions, because they first search for the job and
then perform an action on it. Therefore, we need to do the search +
action under the same job mutex critical section.

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
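The TOCTOU fix described above can be sketched in isolation: lookup and action must share one critical section, otherwise the job could be dismissed between the "check" (find) and the "use" (cancel, pause, ...). The types and helpers below (`Job`, `job_get_locked`, `cancel_job`) are simplified stand-ins for `job_get()`/`job_user_cancel()`, not the QEMU code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

typedef struct Job {
    const char *id;
    int cancelled;
} Job;

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static Job the_job = { "job0", 0 };
static Job *job_list = &the_job; /* one-element "list" for the sketch */

/* Caller must hold job_mutex. */
static Job *job_get_locked(const char *id)
{
    return (job_list && strcmp(job_list->id, id) == 0) ? job_list : NULL;
}

/* find + act under the same job_mutex critical section */
static int cancel_job(const char *id)
{
    int ret = -1;
    Job *job;

    pthread_mutex_lock(&job_mutex);
    job = job_get_locked(id);
    if (job) {
        job->cancelled = 1; /* "use" happens while "check" still holds */
        ret = 0;
    }
    pthread_mutex_unlock(&job_mutex);
    return ret;
}
```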

Signed-off-by: Emanuele Giuseppe Esposito 
---
 blockdev.c | 9 +
 job-qmp.c  | 8 
 2 files changed, 17 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index c5a835d9ed..0bd79757fc 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3327,12 +3327,14 @@ static BlockJob *find_block_job(const char *id, 
AioContext **aio_context,
 assert(id != NULL);
 
 *aio_context = NULL;
+job_lock();
 
 job = block_job_get(id);
 
 if (!job) {
 error_set(errp, ERROR_CLASS_DEVICE_NOT_ACTIVE,
   "Block job '%s' not found", id);
+job_unlock();
 return NULL;
 }
 
@@ -3353,6 +3355,7 @@ void qmp_block_job_set_speed(const char *device, int64_t 
speed, Error **errp)
 
 block_job_set_speed(job, speed, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_block_job_cancel(const char *device,
@@ -3379,6 +3382,7 @@ void qmp_block_job_cancel(const char *device,
 job_user_cancel(&job->job, force, errp);
 out:
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_block_job_pause(const char *device, Error **errp)
@@ -3393,6 +3397,7 @@ void qmp_block_job_pause(const char *device, Error **errp)
 trace_qmp_block_job_pause(job);
 job_user_pause(&job->job, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_block_job_resume(const char *device, Error **errp)
@@ -3407,6 +3412,7 @@ void qmp_block_job_resume(const char *device, Error 
**errp)
 trace_qmp_block_job_resume(job);
 job_user_resume(&job->job, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_block_job_complete(const char *device, Error **errp)
@@ -3421,6 +3427,7 @@ void qmp_block_job_complete(const char *device, Error 
**errp)
 trace_qmp_block_job_complete(job);
 job_complete(&job->job, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_block_job_finalize(const char *id, Error **errp)
@@ -3444,6 +3451,7 @@ void qmp_block_job_finalize(const char *id, Error **errp)
 aio_context = blk_get_aio_context(job->blk);
 job_unref(&job->job);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_block_job_dismiss(const char *id, Error **errp)
@@ -3460,6 +3468,7 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
 job = &bjob->job;
 job_dismiss(&job, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_change_backing_file(const char *device,
diff --git a/job-qmp.c b/job-qmp.c
index a355dc2954..8f07c51db8 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -35,10 +35,12 @@ static Job *find_job(const char *id, AioContext 
**aio_context, Error **errp)
 Job *job;
 
 *aio_context = NULL;
+job_lock();
 
 job = job_get(id);
 if (!job) {
 error_setg(errp, "Job not found");
+job_unlock();
 return NULL;
 }
 
@@ -60,6 +62,7 @@ void qmp_job_cancel(const char *id, Error **errp)
 trace_qmp_job_cancel(job);
 job_user_cancel(job, true, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_job_pause(const char *id, Error **errp)
@@ -74,6 +77,7 @@ void qmp_job_pause(const char *id, Error **errp)
 trace_qmp_job_pause(job);
 job_user_pause(job, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_job_resume(const char *id, Error **errp)
@@ -88,6 +92,7 @@ void qmp_job_resume(const char *id, Error **errp)
 trace_qmp_job_resume(job);
 job_user_resume(job, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_job_complete(const char *id, Error **errp)
@@ -102,6 +107,7 @@ void qmp_job_complete(const char *id, Error **errp)
 trace_qmp_job_complete(job);
 job_complete(job, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_job_finalize(const char *id, Error **errp)
@@ -125,6 +131,7 @@ void qmp_job_finalize(const char *id, Error **errp)
 aio_context = job->aio_context;
 job_unref(job);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 void qmp_job_dismiss(const char *id, Error **errp)
@@ -139,6 +146,7 @@ void qmp_job_dismiss(const char *id, Error **errp)
 trace_qmp_job_dismiss(job);
 job_dismiss(&job, errp);
 aio_context_release(aio_context);
+job_unlock();
 }
 
 static JobInfo *job_query_single(Job *job, Error **errp)
-- 
2.27.0




[RFC PATCH v2 06/14] job.c: make job_event_* functions static

2021-11-04 Thread Emanuele Giuseppe Esposito
job_event_* functions can all be static, as they are not used
outside job.c.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job.h |  6 --
 job.c  | 12 ++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index d34c55dad0..58b3af47e3 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -425,12 +425,6 @@ void job_progress_set_remaining(Job *job, uint64_t 
remaining);
  */
 void job_progress_increase_remaining(Job *job, uint64_t delta);
 
-/** To be called when a cancelled job is finalised. */
-void job_event_cancelled(Job *job);
-
-/** To be called when a successfully completed job is finalised. */
-void job_event_completed(Job *job);
-
 /**
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
diff --git a/job.c b/job.c
index bd36207021..ce5066522f 100644
--- a/job.c
+++ b/job.c
@@ -514,12 +514,20 @@ void job_progress_increase_remaining(Job *job, uint64_t 
delta)
 progress_increase_remaining(&job->progress, delta);
 }
 
-void job_event_cancelled(Job *job)
+/**
+ * To be called when a cancelled job is finalised.
+ * Called with job_mutex held.
+ */
+static void job_event_cancelled(Job *job)
 {
 notifier_list_notify(&job->on_finalize_cancelled, job);
 }
 
-void job_event_completed(Job *job)
+/**
+ * To be called when a successfully completed job is finalised.
+ * Called with job_mutex held.
+ */
+static void job_event_completed(Job *job)
 {
 notifier_list_notify(&job->on_finalize_completed, job);
 }
-- 
2.27.0




[RFC PATCH v2 04/14] job.h: define unlocked functions

2021-11-04 Thread Emanuele Giuseppe Esposito
All these functions assume that the lock is not held, and acquire
it internally.

These functions will be useful when job_lock is globally applied,
as they will allow callers to access the job struct fields
without worrying about the job lock.

Also update the comments in blockjob.c (and move them into job.c).

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job.h | 21 +++
 blockjob.c | 20 ---
 job.c  | 88 --
 3 files changed, 107 insertions(+), 22 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index c7a6bcad1b..d34c55dad0 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -679,4 +679,25 @@ void job_dismiss(Job **job, Error **errp);
  */
 int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error 
**errp);
 
+/** Enters the @job if it is not paused */
+void job_enter_not_paused(Job *job);
+
+/** returns @job->ret */
+bool job_has_failed(Job *job);
+
+/** Returns the @job->status */
+JobStatus job_get_status(Job *job);
+
+/** Returns the @job->pause_count */
+int job_get_pause_count(Job *job);
+
+/** Returns @job->paused */
+bool job_get_paused(Job *job);
+
+/** Returns @job->busy */
+bool job_get_busy(Job *job);
+
+/** Return true if @job not paused and not cancelled */
+bool job_not_paused_nor_cancelled(Job *job);
+
 #endif
diff --git a/blockjob.c b/blockjob.c
index 4982f6a2b5..53c1e9c406 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -36,21 +36,6 @@
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 
-/*
- * The block job API is composed of two categories of functions.
- *
- * The first includes functions used by the monitor.  The monitor is
- * peculiar in that it accesses the block job list with block_job_get, and
- * therefore needs consistency across block_job_get and the actual operation
- * (e.g. block_job_set_speed).  The consistency is achieved with
- * aio_context_acquire/release.  These functions are declared in blockjob.h.
- *
- * The second includes functions used by the block job drivers and sometimes
- * by the core block layer.  These do not care about locking, because the
- * whole coroutine runs under the AioContext lock, and are declared in
- * blockjob_int.h.
- */
-
 static bool is_block_job(Job *job)
 {
 return job_type(job) == JOB_TYPE_BACKUP ||
@@ -433,11 +418,6 @@ static void block_job_event_ready(Notifier *n, void 
*opaque)
 }
 
 
-/*
- * API for block job drivers and the block layer.  These functions are
- * declared in blockjob_int.h.
- */
-
 void *block_job_create(const char *job_id, const BlockJobDriver *driver,
JobTxn *txn, BlockDriverState *bs, uint64_t perm,
uint64_t shared_perm, int64_t speed, int flags,
diff --git a/job.c b/job.c
index e393c1222f..bd36207021 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,23 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * The job API is composed of two categories of functions.
+ *
+ * The first includes functions used by the monitor.  The monitor is
+ * peculiar in that it accesses the block job list with job_get, and
+ * therefore needs consistency across job_get and the actual operation
+ * (e.g. job_user_cancel). To achieve this consistency, the caller
+ * calls job_lock/job_unlock itself around the whole operation.
+ * These functions are declared in job-monitor.h.
+ *
+ *
+ * The second includes functions used by the block job drivers and sometimes
+ * by the core block layer. These delegate the locking to the callee instead,
+ * and are declared in job-driver.h.
+ */
+
+
 /*
  * job_mutex protects the jobs list, but also makes the
  * struct job fields thread-safe.
@@ -230,18 +247,70 @@ const char *job_type_str(const Job *job)
 return JobType_str(job_type(job));
 }
 
-bool job_is_cancelled(Job *job)
+JobStatus job_get_status(Job *job)
+{
+JOB_LOCK_GUARD();
+return job->status;
+}
+
+int job_get_pause_count(Job *job)
+{
+JOB_LOCK_GUARD();
+return job->pause_count;
+}
+
+bool job_get_paused(Job *job)
+{
+JOB_LOCK_GUARD();
+return job->paused;
+}
+
+bool job_get_busy(Job *job)
+{
+JOB_LOCK_GUARD();
+return job->busy;
+}
+
+bool job_has_failed(Job *job)
+{
+JOB_LOCK_GUARD();
+return job->ret < 0;
+}
+
+/* Called with job_mutex held. */
+static bool job_is_cancelled_locked(Job *job)
 {
 /* force_cancel may be true only if cancelled is true, too */
 assert(job->cancelled || !job->force_cancel);
 return job->force_cancel;
 }
 
-bool job_cancel_requested(Job *job)
+/* Called with job_mutex *not* held. */
+bool job_is_cancelled(Job *job)
+{
+JOB_LOCK_GUARD();
+return job_is_cancelled_locked(job);
+}
+
+bool job_not_paused_nor_cancelled(Job *job)
+{
+JOB_LOCK_GUARD();
+return !job->paused && !job_is_cancelled_locked(job);
+}
+
+/* Called with job_mutex held. */
+static bool job_cancel_r
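As a side note on the accessor shape above (job_get_status() and friends all take the lock just to read one field): a JOB_LOCK_GUARD()-style macro can be approximated with the GCC/clang cleanup attribute. This is only a sketch using pthreads; the helper names are mine, and QEMU's real QEMU_LOCK_GUARD is built on its own lockable abstraction rather than this exact construct.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for QEMU's job_mutex. */
static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs automatically when the guard variable
 * goes out of scope, releasing the mutex. */
static void job_guard_cleanup(pthread_mutex_t **m)
{
    pthread_mutex_unlock(*m);
}

/* Lock on declaration, unlock on scope exit. */
#define JOB_LOCK_GUARD() \
    __attribute__((cleanup(job_guard_cleanup), unused)) \
    pthread_mutex_t *job_guard_ = \
        (pthread_mutex_lock(&job_mutex), &job_mutex)

static int job_status = 3; /* stands in for job->status */

int job_get_status(void)
{
    JOB_LOCK_GUARD(); /* mutex is released automatically on return */
    return job_status;
}
```

The return value is computed before the cleanup runs, so reading the field and releasing the lock stay correctly ordered.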

[RFC PATCH v2 01/14] job.c: make job_lock/unlock public

2021-11-04 Thread Emanuele Giuseppe Esposito
job mutex will be used to protect the job struct elements and list,
replacing AioContext locks.

Right now use a shared lock for all jobs, in order to keep things
simple. Once the AioContext lock is gone, we can introduce per-job
locks.

To simplify the switch from aiocontext to job lock, introduce
*nop* lock/unlock functions and macros. Once everything is protected
by the job lock, we can add the mutex and remove the aiocontext.
Since job_mutex is already being used, add static
_job_{lock/unlock}.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
 include/qemu/job.h | 18 ++
 job.c  | 39 +++
 2 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 7e9e59f4b8..ccf7826426 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -297,6 +297,24 @@ typedef enum JobCreateFlags {
 JOB_MANUAL_DISMISS = 0x04,
 } JobCreateFlags;
 
+/**
+ * job_lock:
+ *
+ * Take the mutex protecting the list of jobs and their status.
+ * Most functions called by the monitor need to call job_lock
+ * and job_unlock manually.  On the other hand, functions called
+ * by the block jobs themselves and by the block layer will take the
+ * lock for you.
+ */
+void job_lock(void);
+
+/**
+ * job_unlock:
+ *
+ * Release the mutex protecting the list of jobs and their status.
+ */
+void job_unlock(void);
+
 /**
  * Allocate and return a new job transaction. Jobs can be added to the
  * transaction using job_txn_add_job().
diff --git a/job.c b/job.c
index 94b142684f..0e4dacf028 100644
--- a/job.c
+++ b/job.c
@@ -32,6 +32,12 @@
 #include "trace/trace-root.h"
 #include "qapi/qapi-events-job.h"
 
+/*
+ * job_mutex protects the jobs list, but also makes the
+ * struct job fields thread-safe.
+ */
+static QemuMutex job_mutex;
+
 static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
 
 /* Job State Transition Table */
@@ -74,17 +80,26 @@ struct JobTxn {
 int refcnt;
 };
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy
- * and job->sleep_timer, such as concurrent calls to job_do_yield and
- * job_enter. */
-static QemuMutex job_mutex;
+#define JOB_LOCK_GUARD() /* QEMU_LOCK_GUARD(&job_mutex) */
+
+#define WITH_JOB_LOCK_GUARD() /* WITH_QEMU_LOCK_GUARD(&job_mutex) */
+
+void job_lock(void)
+{
+/* nop */
+}
+
+void job_unlock(void)
+{
+/* nop */
+}
 
-static void job_lock(void)
+static void _job_lock(void)
 {
 qemu_mutex_lock(&job_mutex);
 }
 
-static void job_unlock(void)
+static void _job_unlock(void)
 {
 qemu_mutex_unlock(&job_mutex);
 }
@@ -449,21 +464,21 @@ void job_enter_cond(Job *job, bool(*fn)(Job *job))
 return;
 }
 
-job_lock();
+_job_lock();
 if (job->busy) {
-job_unlock();
+_job_unlock();
 return;
 }
 
 if (fn && !fn(job)) {
-job_unlock();
+_job_unlock();
 return;
 }
 
 assert(!job->deferred_to_main_loop);
 timer_del(&job->sleep_timer);
 job->busy = true;
-job_unlock();
+_job_unlock();
 aio_co_enter(job->aio_context, job->co);
 }
 
@@ -480,13 +495,13 @@ void job_enter(Job *job)
  * called explicitly. */
 static void coroutine_fn job_do_yield(Job *job, uint64_t ns)
 {
-job_lock();
+_job_lock();
 if (ns != -1) {
 timer_mod(&job->sleep_timer, ns);
 }
 job->busy = false;
 job_event_idle(job);
-job_unlock();
+_job_unlock();
 qemu_coroutine_yield();
 
 /* Set by job_enter_cond() before re-entering the coroutine.  */
-- 
2.27.0
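The staged conversion this patch describes (public no-op wrappers now, real locking only in the final patch) can be illustrated outside QEMU roughly as follows. This is a sketch with pthreads; all names are mine, not QEMU's.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER;
static int busy_count;

/* Public API: no-ops for now, exactly like the patch.  Callers can be
 * converted to job_lock()/job_unlock() everywhere without risking
 * deadlocks against the AioContext lock; only a later commit flips
 * these to the real mutex. */
void job_lock(void)   { /* nop */ }
void job_unlock(void) { /* nop */ }

/* Internal helpers keep taking the real mutex in the meantime,
 * mirroring _job_lock()/_job_unlock() in the patch. */
static void _job_lock(void)   { pthread_mutex_lock(&job_mutex); }
static void _job_unlock(void) { pthread_mutex_unlock(&job_mutex); }

void job_enter_sketch(void)
{
    job_lock();   /* nop today, real lock after the final patch */
    _job_lock();  /* would deadlock here if job_lock() already locked,
                   * which is why the conversion is staged */
    busy_count++;
    _job_unlock();
    job_unlock();
}
```

The point of the no-op stage is exactly the comment in the middle: nesting the public wrapper around the private lock is harmless while the wrapper is empty, and by the time it becomes real, the private calls have been removed.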




[RFC PATCH v2 02/14] job.h: categorize fields in struct Job

2021-11-04 Thread Emanuele Giuseppe Esposito
Categorize the fields in struct Job to understand which ones
need to be protected by the job mutex and which don't.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job.h | 57 +++---
 1 file changed, 34 insertions(+), 23 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index ccf7826426..f7036ac6b3 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -40,27 +40,52 @@ typedef struct JobTxn JobTxn;
  * Long-running operation.
  */
 typedef struct Job {
+
+/* Fields set at initialization (job_create), and never modified */
+
 /** The ID of the job. May be NULL for internal jobs. */
 char *id;
 
-/** The type of this job. */
+/**
+ * The type of this job.
+ * All callbacks are called with job_mutex *not* held.
+ */
 const JobDriver *driver;
 
-/** Reference count of the block job */
-int refcnt;
-
-/** Current state; See @JobStatus for details. */
-JobStatus status;
-
 /** AioContext to run the job coroutine in */
 AioContext *aio_context;
 
 /**
  * The coroutine that executes the job.  If not NULL, it is reentered when
  * busy is false and the job is cancelled.
+ * Initialized in job_start()
  */
 Coroutine *co;
 
+/** True if this job should automatically finalize itself */
+bool auto_finalize;
+
+/** True if this job should automatically dismiss itself */
+bool auto_dismiss;
+
+/** The completion function that will be called when the job completes.  */
+BlockCompletionFunc *cb;
+
+/** The opaque value that is passed to the completion function.  */
+void *opaque;
+
+/* ProgressMeter API is thread-safe */
+ProgressMeter progress;
+
+
+/** Protected by job_mutex */
+
+/** Reference count of the block job */
+int refcnt;
+
+/** Current state; See @JobStatus for details. */
+JobStatus status;
+
 /**
  * Timer that is used by @job_sleep_ns. Accessed under job_mutex (in
  * job.c).
@@ -76,7 +101,7 @@ typedef struct Job {
 /**
  * Set to false by the job while the coroutine has yielded and may be
  * re-entered by job_enter(). There may still be I/O or event loop activity
- * pending. Accessed under block_job_mutex (in blockjob.c).
+ * pending. Accessed under job_mutex.
  *
  * When the job is deferred to the main loop, busy is true as long as the
  * bottom half is still pending.
@@ -112,14 +137,6 @@ typedef struct Job {
 /** Set to true when the job has deferred work to the main loop. */
 bool deferred_to_main_loop;
 
-/** True if this job should automatically finalize itself */
-bool auto_finalize;
-
-/** True if this job should automatically dismiss itself */
-bool auto_dismiss;
-
-ProgressMeter progress;
-
 /**
  * Return code from @run and/or @prepare callback(s).
  * Not final until the job has reached the CONCLUDED status.
@@ -134,12 +151,6 @@ typedef struct Job {
  */
 Error *err;
 
-/** The completion function that will be called when the job completes.  */
-BlockCompletionFunc *cb;
-
-/** The opaque value that is passed to the completion function.  */
-void *opaque;
-
 /** Notifiers called when a cancelled job is finalised */
 NotifierList on_finalize_cancelled;
 
@@ -167,6 +178,7 @@ typedef struct Job {
 
 /**
  * Callbacks and other information about a Job driver.
+ * All callbacks are invoked with job_mutex *not* held.
  */
 struct JobDriver {
 
@@ -460,7 +472,6 @@ void job_yield(Job *job);
  */
 void coroutine_fn job_sleep_ns(Job *job, int64_t ns);
 
-
 /** Returns the JobType of a given Job. */
 JobType job_type(const Job *job);
 
-- 
2.27.0




[RFC PATCH v2 00/14] job: replace AioContext lock with job_mutex

2021-11-04 Thread Emanuele Giuseppe Esposito
In this series, we want to remove the AioContext lock and instead
use the already existent job_mutex to protect the job structures
and list. This is part of the work to get rid of AioContext lock
usage in favour of smaller granularity locks.

In order to simplify the reviewer's job, job lock/unlock functions and
macros are added as empty prototypes (nop) in patch 1.
They are converted to use the actual job mutex only in the last
patch, 14. In this way we can freely create locking sections
without worrying about deadlocks with the aiocontext lock.

Patch 2 defines what fields in the job structure need protection,
and patches 3-4 categorize respectively locked and unlocked
functions in the job API.

Patches 5-9 are in preparation for the job locks: they reduce
the aiocontext critical sections and contain other minor fixes.

Patches 10-13 introduce the (nop) job lock into the job API and
its users, following the comments and categorizations done in
patches 2-4.

Patch 14 makes the prototypes in patch 1 use the job_mutex and
removes all aiocontext locks at the same time.

Tested this series by running unit tests, qemu-iotests and qtests
(x86_64).

This series is based on my previous series "block layer: split
block APIs in global state and I/O".

Based-on: <20211025101735.2060852-1-eespo...@redhat.com>
---
RFC v2:
* use JOB_LOCK_GUARD and WITH_JOB_LOCK_GUARD
* multiple typos in commit messages
* job API split patches are sent separately in another series
* use of empty job_{lock/unlock} and JOB_LOCK_GUARD/WITH_JOB_LOCK_GUARD
  to avoid deadlocks and simplify the reviewer job

Emanuele Giuseppe Esposito (14):
  job.c: make job_lock/unlock public
  job.h: categorize fields in struct Job
  job.h: define locked functions
  job.h: define unlocked functions
  block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
  job.c: make job_event_* functions static
  job.c: move inner aiocontext lock in callbacks
  aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED
  jobs: remove aiocontext locks since the functions are under BQL
  jobs: protect jobs with job_lock/unlock
  block_job_query: remove atomic read
  jobs: use job locks and helpers also in the unit tests
  jobs: add job lock in find_* functions
  job.c: enable job lock/unlock and remove Aiocontext locks

 include/block/aio-wait.h |  15 +-
 include/qemu/job.h   | 171 ++---
 block.c  |   6 +
 block/mirror.c   |   8 +-
 block/replication.c  |   6 +
 blockdev.c   |  88 +++
 blockjob.c   |  62 +++--
 job-qmp.c|  54 ++--
 job.c| 410 ++-
 monitor/qmp-cmds.c   |   2 +
 qemu-img.c   |   8 +-
 tests/unit/test-bdrv-drain.c |  44 ++--
 tests/unit/test-block-iothread.c |   6 +-
 tests/unit/test-blockjob-txn.c   |  10 +
 tests/unit/test-blockjob.c   |  68 ++---
 15 files changed, 628 insertions(+), 330 deletions(-)

-- 
2.27.0




[RFC PATCH v2 03/14] job.h: define locked functions

2021-11-04 Thread Emanuele Giuseppe Esposito
These functions assume that the job lock is held by the
caller, to avoid TOC/TOU conditions.

Also introduce additional helpers that define _locked
functions (useful when the job_mutex is globally applied).

Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.
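To illustrate the TOC/TOU hazard that the _locked convention avoids, here is a hedged sketch; the structs are simplified and the function names invented, not QEMU's actual code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified Job; field names mimic QEMU's but the struct is mine. */
typedef struct Job {
    pthread_mutex_t lock;
    bool paused;
    int pause_count;
} Job;

/* TOC/TOU-prone: the state checked here may change between the
 * unlock and whatever the caller does with the answer. */
bool job_user_paused_racy(Job *job)
{
    pthread_mutex_lock(&job->lock);
    bool paused = job->paused;    /* time of check */
    pthread_mutex_unlock(&job->lock);
    return paused;                /* time of use is later, unlocked */
}

/* _locked convention: caller holds job->lock across check *and* use. */
bool job_user_paused_locked(Job *job)
{
    return job->paused;
}

void job_user_resume_locked(Job *job)
{
    if (job_user_paused_locked(job)) {   /* check... */
        job->pause_count--;              /* ...and act, atomically */
        job->paused = job->pause_count > 0;
    }
}
```

With the racy variant, a concurrent resume could run between the check and the action; the _locked variants make the caller responsible for keeping both in one critical section.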

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job.h | 66 ++
 job.c  | 18 +++--
 2 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index f7036ac6b3..c7a6bcad1b 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -355,6 +355,8 @@ void job_txn_unref(JobTxn *txn);
  * the reference that is automatically grabbed here.
  *
  * If @txn is NULL, the function does nothing.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_txn_add_job(JobTxn *txn, Job *job);
 
@@ -377,12 +379,16 @@ void *job_create(const char *job_id, const JobDriver 
*driver, JobTxn *txn,
 /**
  * Add a reference to Job refcnt, it will be decreased with job_unref, and then
  * be freed if it comes to be the last reference.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_ref(Job *job);
 
 /**
  * Release a reference that was previously acquired with job_ref() or
  * job_create(). If it's the last reference to the object, it will be freed.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_unref(Job *job);
 
@@ -429,6 +435,8 @@ void job_event_completed(Job *job);
  * Conditionally enter the job coroutine if the job is ready to run, not
  * already busy and fn() returns true. fn() is called while under the job_lock
  * critical section.
+ *
+ * Called between job_lock and job_unlock, but it releases the lock temporarily.
  */
 void job_enter_cond(Job *job, bool(*fn)(Job *job));
 
@@ -490,34 +498,52 @@ bool job_is_cancelled(Job *job);
  */
 bool job_cancel_requested(Job *job);
 
-/** Returns whether the job is in a completed state. */
+/**
+ * Returns whether the job is in a completed state.
+ * Called between job_lock and job_unlock.
+ */
 bool job_is_completed(Job *job);
 
 /** Returns whether the job is ready to be completed. */
 bool job_is_ready(Job *job);
 
+/** Same as job_is_ready(), but assumes job_lock is held. */
+bool job_is_ready_locked(Job *job);
+
 /**
  * Request @job to pause at the next pause point. Must be paired with
  * job_resume(). If the job is supposed to be resumed by user action, call
  * job_user_pause() instead.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_pause(Job *job);
 
-/** Resumes a @job paused with job_pause. */
+/**
+ * Resumes a @job paused with job_pause.
+ * Called between job_lock and job_unlock.
+ */
 void job_resume(Job *job);
 
 /**
  * Asynchronously pause the specified @job.
  * Do not allow a resume until a matching call to job_user_resume.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_user_pause(Job *job, Error **errp);
 
-/** Returns true if the job is user-paused. */
+/**
+ * Returns true if the job is user-paused.
+ * Called between job_lock and job_unlock.
+ */
 bool job_user_paused(Job *job);
 
 /**
  * Resume the specified @job.
  * Must be paired with a preceding job_user_pause.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_user_resume(Job *job, Error **errp);
 
@@ -526,6 +552,8 @@ void job_user_resume(Job *job, Error **errp);
  * first one if @job is %NULL.
  *
  * Returns the requested job, or %NULL if there are no more jobs left.
+ *
+ * Called between job_lock and job_unlock.
  */
 Job *job_next(Job *job);
 
@@ -533,6 +561,8 @@ Job *job_next(Job *job);
  * Get the job identified by @id (which must not be %NULL).
  *
  * Returns the requested job, or %NULL if it doesn't exist.
+ *
+ * Called between job_lock and job_unlock.
  */
 Job *job_get(const char *id);
 
@@ -540,27 +570,39 @@ Job *job_get(const char *id);
  * Check whether the verb @verb can be applied to @job in its current state.
  * Returns 0 if the verb can be applied; otherwise errp is set and -EPERM
  * returned.
+ *
+ * Called between job_lock and job_unlock.
  */
 int job_apply_verb(Job *job, JobVerb verb, Error **errp);
 
 /** The @job could not be started, free it. */
 void job_early_fail(Job *job);
 
+/** Same as job_early_fail(), but assumes job_lock is held. */
+void job_early_fail_locked(Job *job);
+
 /** Moves the @job from RUNNING to READY */
 void job_transition_to_ready(Job *job);
 
-/** Asynchronously complete the specified @job. */
+/**
+ * Asynchronously complete the specified @job.
+ * Called between job_lock and job_unlock, but it releases the lock temporarily.
+ */
 void job_complete(Job *job, Error **errp);
 
 /**
  * Asynchronously cancel the specified @job. If @force is true, the job should
  * be cancelled immediately without waiting for a consistent state.
+ *
+ * Called between job_lock and job_unlock.
  */
 void job_cancel(Job *job, bool force);
 
 /**
  * Cancels the specified job like job_cancel(), 

Re: [PATCH] file-posix: Fix alignment after reopen changing O_DIRECT

2021-11-04 Thread Kevin Wolf
On 04.11.2021 at 15:20, Stefano Garzarella wrote:
> On Thu, Nov 04, 2021 at 12:31:09PM +0100, Kevin Wolf wrote:
> > At the end of a reopen, we already call bdrv_refresh_limits(), which
> > should update bs->request_alignment according to the new file
> > descriptor. However, raw_probe_alignment() relies on s->needs_alignment
> > and just uses 1 if it isn't set. We neglected to update this field, so
> > starting with cache=writeback and then reopening with cache=none means
> > that we get an incorrect bs->request_alignment == 1 and unaligned
> > requests fail instead of being automatically aligned.
> > 
> > Fix this by recalculating s->needs_alignment in raw_refresh_limits()
> > before calling raw_probe_alignment().
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> > block/file-posix.c | 20 
> > tests/qemu-iotests/142 | 22 ++
> > tests/qemu-iotests/142.out | 15 +++
> > 3 files changed, 53 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 7a27c83060..3f14e47096 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -167,6 +167,7 @@ typedef struct BDRVRawState {
> > int page_cache_inconsistent; /* errno from fdatasync failure */
> > bool has_fallocate;
> > bool needs_alignment;
> > +bool force_alignment;
> > bool drop_cache;
> > bool check_cache_dropped;
> > struct {
> > @@ -351,6 +352,17 @@ static bool dio_byte_aligned(int fd)
> > return false;
> > }
> > 
> > +static int raw_needs_alignment(BlockDriverState *bs)
> 
> If you need to respin, maybe it's better to use `bool` as return type.

Yes, that's what it should be. Can be fixed up while applying. I had a
0/1/-errno return value in an intermediate version, which is how it
became 'int', but it's not necessary any more in this version.

> In both cases:
> Reviewed-by: Stefano Garzarella 

Thanks!

Kevin




Re: [PATCH v2 0/3] virtio: increase VIRTQUEUE_MAX_SIZE to 32k

2021-11-04 Thread Christian Schoenebeck
On Wednesday, 3 November 2021 12:33:33 CET Stefan Hajnoczi wrote:
> On Mon, Nov 01, 2021 at 09:29:26PM +0100, Christian Schoenebeck wrote:
> > On Thursday, 28 October 2021 11:00:48 CET Stefan Hajnoczi wrote:
> > > On Mon, Oct 25, 2021 at 05:03:25PM +0200, Christian Schoenebeck wrote:
> > > > On Monday, 25 October 2021 12:30:41 CEST Stefan Hajnoczi wrote:
> > > > > On Thu, Oct 21, 2021 at 05:39:28PM +0200, Christian Schoenebeck wrote:
> > > > > > On Friday, 8 October 2021 18:08:48 CEST Christian Schoenebeck 
> > > > > > wrote:
> > > > > > > On Friday, 8 October 2021 16:24:42 CEST Christian Schoenebeck 
> > > > > > > wrote:
> > > > > > > > On Friday, 8 October 2021 09:25:33 CEST Greg Kurz wrote:
> > > > > > > > > On Thu, 7 Oct 2021 16:42:49 +0100
> > > > > > > > > 
> > > > > > > > > Stefan Hajnoczi  wrote:
> > > > > > > > > > On Thu, Oct 07, 2021 at 02:51:55PM +0200, Christian 
> > > > > > > > > > Schoenebeck wrote:
> > > > > > > > > > > On Thursday, 7 October 2021 07:23:59 CEST Stefan 
> > > > > > > > > > > Hajnoczi wrote:
> > > > > > > > > > > > On Mon, Oct 04, 2021 at 09:38:00PM +0200, Christian
> > > > > > > > > > > > Schoenebeck
> > > > > > > 
> > > > > > > wrote:
> > > > > > > > > > > > > At the moment the maximum transfer size with virtio
> > > > > > > > > > > > > is
> > > > > > > > > > > > > limited
> > > > > > > > > > > > > to
> > > > > > > > > > > > > 4M
> > > > > > > > > > > > > (1024 * PAGE_SIZE). This series raises this limit to
> > > > > > > > > > > > > its
> > > > > > > > > > > > > maximum
> > > > > > > > > > > > > theoretical possible transfer size of 128M (32k
> > > > > > > > > > > > > pages)
> > > > > > > > > > > > > according
> > > > > > > > > > > > > to
> > > > > > > > > > > > > the
> > > > > > > > > > > > > virtio specs:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/
> > > > > > > > > > > > > virt
> > > > > > > > > > > > > io-v
> > > > > > > > > > > > > 1.1-
> > > > > > > > > > > > > cs
> > > > > > > > > > > > > 01
> > > > > > > > > > > > > .html#
> > > > > > > > > > > > > x1-240006
> > > > > > > > > > > > 
> > > > > > > > > > > > Hi Christian,
> > > > > > > > > 
> > > > > > > > > > > > I took a quick look at the code:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > Thanks Stefan for sharing virtio expertise and helping
> > > > > > > > > Christian
> > > > > > > > > !
> > > > > > > > > 
> > > > > > > > > > > > - The Linux 9p driver restricts descriptor chains to
> > > > > > > > > > > > 128
> > > > > > > > > > > > elements
> > > > > > > > > > > > 
> > > > > > > > > > > >   (net/9p/trans_virtio.c:VIRTQUEUE_NUM)
> > > > > > > > > > > 
> > > > > > > > > > > Yes, that's the limitation that I am about to remove
> > > > > > > > > > > (WIP);
> > > > > > > > > > > current
> > > > > > > > > > > kernel
> > > > > > > > > > > patches:
> > > > > > > > > > > https://lore.kernel.org/netdev/cover.1632327421.git.linu
> > > > > > > > > > > x_os
> > > > > > > > > > > s@cr
> > > > > > > > > > > udeb
> > > > > > > > > > > yt
> > > > > > > > > > > e.
> > > > > > > > > > > com/>
> > > > > > > > > > 
> > > > > > > > > > I haven't read the patches yet but I'm concerned that
> > > > > > > > > > today
> > > > > > > > > > the
> > > > > > > > > > driver
> > > > > > > > > > is pretty well-behaved and this new patch series
> > > > > > > > > > introduces a
> > > > > > > > > > spec
> > > > > > > > > > violation. Not fixing existing spec violations is okay,
> > > > > > > > > > but
> > > > > > > > > > adding
> > > > > > > > > > new
> > > > > > > > > > ones is a red flag. I think we need to figure out a clean
> > > > > > > > > > solution.
> > > > > > > > 
> > > > > > > > Nobody has reviewed the kernel patches yet. My main concern
> > > > > > > > therefore
> > > > > > > > actually is that the kernel patches are already too complex,
> > > > > > > > because
> > > > > > > > the
> > > > > > > > current situation is that only Dominique is handling 9p
> > > > > > > > patches on
> > > > > > > > kernel
> > > > > > > > side, and he barely has time for 9p anymore.
> > > > > > > > 
> > > > > > > > Another reason for me to catch up on reading current kernel
> > > > > > > > code
> > > > > > > > and
> > > > > > > > stepping in as reviewer of 9p on kernel side ASAP, independent
> > > > > > > > of
> > > > > > > > this
> > > > > > > > issue.
> > > > > > > > 
> > > > > > > > As for current kernel patches' complexity: I can certainly
> > > > > > > > drop
> > > > > > > > patch
> > > > > > > > 7
> > > > > > > > entirely as it is probably just overkill. Patch 4 is then the
> > > > > > > > biggest
> > > > > > > > chunk, I have to see if I can simplify it, and whether it
> > > > > > > > would
> > > > > > > > make
> > > > > > > > sense to squash with patch 3.
> > > > > > > > 
> > > > > > > > > > > > - The QEMU 9pfs code passes iovecs directly to
> > > > > > > > > > > > preadv(2)
> > > > > > > > > > > > and
> > > > > > > > > > > > will
> > > > > > > > > > > > fail
> > >
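For reference, the transfer-size figures quoted throughout this thread follow directly from the page counts, assuming 4 KiB pages (1024 pages gives the current 4 MiB limit, 32k pages the proposed 128 MiB maximum):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed 4 KiB pages, as on x86 */

/* Maximum transfer size for a descriptor chain of the given number
 * of page-sized elements. */
uint64_t max_transfer_bytes(uint64_t pages)
{
    return pages * PAGE_SIZE;
}
```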

Re: [PATCH] file-posix: Fix alignment after reopen changing O_DIRECT

2021-11-04 Thread Stefano Garzarella

On Thu, Nov 04, 2021 at 12:31:09PM +0100, Kevin Wolf wrote:

At the end of a reopen, we already call bdrv_refresh_limits(), which
should update bs->request_alignment according to the new file
descriptor. However, raw_probe_alignment() relies on s->needs_alignment
and just uses 1 if it isn't set. We neglected to update this field, so
starting with cache=writeback and then reopening with cache=none means
that we get an incorrect bs->request_alignment == 1 and unaligned
requests fail instead of being automatically aligned.

Fix this by recalculating s->needs_alignment in raw_refresh_limits()
before calling raw_probe_alignment().

Signed-off-by: Kevin Wolf 
---
block/file-posix.c | 20 
tests/qemu-iotests/142 | 22 ++
tests/qemu-iotests/142.out | 15 +++
3 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 7a27c83060..3f14e47096 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -167,6 +167,7 @@ typedef struct BDRVRawState {
int page_cache_inconsistent; /* errno from fdatasync failure */
bool has_fallocate;
bool needs_alignment;
+bool force_alignment;
bool drop_cache;
bool check_cache_dropped;
struct {
@@ -351,6 +352,17 @@ static bool dio_byte_aligned(int fd)
return false;
}

+static int raw_needs_alignment(BlockDriverState *bs)


If you need to respin, maybe it's better to use `bool` as return type.

In both cases:
Reviewed-by: Stefano Garzarella 

Stefano




Re: [PATCH 05/15] hw/nvme: Add support for SR-IOV

2021-11-04 Thread Lukasz Maniak
On Tue, Nov 02, 2021 at 06:33:31PM +0100, Lukasz Maniak wrote:
> On Tue, Nov 02, 2021 at 03:33:15PM +0100, Klaus Jensen wrote:
> > On Oct  7 18:23, Lukasz Maniak wrote:
> > > This patch implements initial support for Single Root I/O Virtualization
> > > on an NVMe device.
> > > 
> > > Essentially, it allows one to define the maximum number of virtual functions
> > > supported by the NVMe controller via sriov_max_vfs parameter.
> > > 
> > > Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
> > > capability by a physical controller and ARI capability by both the
> > > physical and virtual function devices.
> > > 
> > > NVMe controllers created via virtual functions mirror functionally
> > > the physical controller, which may not entirely be the case, thus
> > > consideration would be needed on how to limit the capabilities of
> > > the VF.
> > > 
> > > NVMe subsystem is required for the use of SR-IOV.
> > > 
> > > Signed-off-by: Lukasz Maniak 
> > > ---
> > >  hw/nvme/ctrl.c   | 74 ++--
> > >  hw/nvme/nvme.h   |  1 +
> > >  include/hw/pci/pci_ids.h |  1 +
> > >  3 files changed, 73 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > > index 6a571d18cf..ad79ff0c00 100644
> > > --- a/hw/nvme/ctrl.c
> > > +++ b/hw/nvme/ctrl.c
> > > @@ -6361,8 +6406,12 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
> > > *pci_dev, Error **errp)
> > >n->reg_size);
> > >  memory_region_add_subregion(&n->bar0, 0, &n->iomem);
> > >  
> > > -pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
> > > - PCI_BASE_ADDRESS_MEM_TYPE_64, &n->bar0);
> > > +if (pci_is_vf(pci_dev)) {
> > > +pcie_sriov_vf_register_bar(pci_dev, 0, &n->bar0);
> > > +} else {
> > > +pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
> > > + PCI_BASE_ADDRESS_MEM_TYPE_64, &n->bar0);
> > > +}
> > 
> > I assume that the assert we are seeing means that the pci_register_bars
> > in nvme_init_cmb and nvme_init_pmr must be changed similarly to this.
> 
> The assert will only arise for CMB, as VF params are initialized with PF
> params.
> 
> @@ -6532,6 +6585,15 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> **errp)
>  NvmeCtrl *n = NVME(pci_dev);
>  NvmeNamespace *ns;
>  Error *local_err = NULL;
> +NvmeCtrl *pn = NVME(pcie_sriov_get_pf(pci_dev));
> +
> +if (pci_is_vf(pci_dev)) {
> +/* VFs derive settings from the parent. PF's lifespan exceeds
> + * that of VF's, so it's safe to share params.serial.
> + */
> +memcpy(&n->params, &pn->params, sizeof(NvmeParams));
> +n->subsys = pn->subsys;
> +}
>  
>  nvme_check_constraints(n, &local_err);
>  if (local_err) {
> 
> The following simple fix will both fix assert and also allow
> each VF to have its own CMB of the size defined for PF.
> 
> ---
>  hw/nvme/ctrl.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 19b32dd4da..99daa6290c 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -6837,10 +6837,15 @@ static void nvme_init_cmb(NvmeCtrl *n, PCIDevice 
> *pci_dev)
>  n->cmb.buf = g_malloc0(cmb_size);
>  memory_region_init_io(&n->cmb.mem, OBJECT(n), &nvme_cmb_ops, n,
>"nvme-cmb", cmb_size);
> -pci_register_bar(pci_dev, NVME_CMB_BIR,
> - PCI_BASE_ADDRESS_SPACE_MEMORY |
> - PCI_BASE_ADDRESS_MEM_TYPE_64 |
> - PCI_BASE_ADDRESS_MEM_PREFETCH, &n->cmb.mem);
> +
> +if (pci_is_vf(pci_dev)) {
> +pcie_sriov_vf_register_bar(pci_dev, NVME_CMB_BIR, &n->cmb.mem);
> +} else {
> +pci_register_bar(pci_dev, NVME_CMB_BIR,
> +PCI_BASE_ADDRESS_SPACE_MEMORY |
> +PCI_BASE_ADDRESS_MEM_TYPE_64 |
> +PCI_BASE_ADDRESS_MEM_PREFETCH, &n->cmb.mem);
> +}
>  
>  NVME_CAP_SET_CMBS(cap, 1);
>  stq_le_p(&n->bar.cap, cap);
> 
> As for PMR, it is currently only available on PF, as only PF is capable
> of specifying the memory-backend-file object to use with PMR.
> Otherwise, either VFs would have to share the PMR with its PF, or there
> would be a requirement to define a memory-backend-file object for each VF.

Hi Klaus,

After some discussion, we decided to prohibit in V2 the use of CMB and
PMR in combination with SR-IOV.

While the implementation of CMB with SR-IOV is relatively
straightforward, PMR is not. We are committed to consistency in CMB and
PMR design in association with SR-IOV. So we considered it best to
disable both features and implement them in separate patches.

Kind regards,
Lukasz
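The V2 rule described above (reject SR-IOV combined with CMB or PMR) could be expressed as a simple constraint check. The sketch below uses invented field and function names, not necessarily those of hw/nvme:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative parameter struct; the field names are stand-ins. */
typedef struct NvmeParams {
    int  sriov_max_vfs;   /* non-zero enables SR-IOV */
    int  cmb_size_mb;     /* non-zero enables CMB */
    bool has_pmrdev;      /* a PMR memory backend was given */
} NvmeParams;

/* Returns false and sets *errp when SR-IOV is combined with CMB/PMR. */
bool nvme_check_sriov_constraints(const NvmeParams *p, const char **errp)
{
    if (p->sriov_max_vfs > 0 && (p->cmb_size_mb > 0 || p->has_pmrdev)) {
        *errp = "CMB and PMR are not supported together with SR-IOV";
        return false;
    }
    *errp = NULL;
    return true;
}
```

Keeping the rule in one predicate makes it easy to lift the restriction later, feature by feature, once the per-VF CMB and PMR designs are settled.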



Re: [PATCH 0/7] block: Attempt on fixing 030-reported errors

2021-11-04 Thread Hanna Reitz

On 04.11.21 12:58, Kevin Wolf wrote:

On 04.11.2021 at 11:38, Hanna Reitz wrote:

(2A) bdrv_replace_child_noperm() should immediately set bs->file or
  bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
  It should also immediately remove any BdrvChild with .bs == NULL
  from the parent’s BDS.children list.
  Implemented in patches 2 through 6.

(2B) Alternatively, we could always keep the whole subgraph drained
  while we manipulate it.  Then, the bdrv_parent_drained_end_single()
  in bdrv_replace_child_noperm() wouldn’t do anything.
  To fix 030, we would need to add a drained section to
  stream_prepare(): Namely we’d need to drain the subgraph below the
  COR filter node.
  This would be a much simpler solution, but I don’t feel like it’s
  the right one.
As you can see, I’m not sure which of 2A or 2B is the right solution.  I
decided to investigate both: 2A was much more complicated, but seemed
like the right thing to do; 2B is much simpler, but doesn’t feel as
right.  Therefore, I decided to go with 2A in this first version of this
series.

I haven't looked at the patches yet, but if I understand correctly the
choice you're presenting here is between protecting code from accessing
invalid state and not creating the invalid state in the first place.


Yes, that’s right.


I agree that the latter is preferable as long as it doesn't make things
so complicated that we would be willing to accept the higher risk of
breakage in the former.


No, I don’t think it’s too complicated.  Just not as simple as a 
drained_begin + drained_end.



If it's doable in five patches, it's probably
not complicated enough to make such compromises.


Without the clean-up patches that are patches 3 and 4, it would be 
doable in even fewer patches. :)


Hanna
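A minimal sketch of what option 2A amounts to: clear the child's bs pointer, the parent's field and the children-list entry in one step, so no caller ever observes a BdrvChild with bs == NULL. The structs below are heavily simplified stand-ins for QEMU's (which use QLIST and carry much more state):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BdrvChild {
    struct BlockDriverState *bs;
    struct BdrvChild *next;
} BdrvChild;

typedef struct BlockDriverState {
    BdrvChild *children;   /* list of child edges */
    BdrvChild *file;       /* may point into that list */
} BlockDriverState;

/* Option 2A: nulling child->bs, nulling parent->file and unlinking
 * the edge from the children list all happen together. */
void bdrv_detach_child_sketch(BlockDriverState *parent, BdrvChild *child)
{
    child->bs = NULL;
    if (parent->file == child) {
        parent->file = NULL;
    }
    for (BdrvChild **p = &parent->children; *p; p = &(*p)->next) {
        if (*p == child) {
            *p = child->next;   /* unlink */
            break;
        }
    }
    free(child);
}
```

Under option 2B, by contrast, the child would temporarily stay in the list with bs == NULL, and correctness would rely on the whole subgraph being drained while it does.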




Re: [PATCH] file-posix: Fix alignment after reopen changing O_DIRECT

2021-11-04 Thread Hanna Reitz

On 04.11.21 12:31, Kevin Wolf wrote:

At the end of a reopen, we already call bdrv_refresh_limits(), which
should update bs->request_alignment according to the new file
descriptor. However, raw_probe_alignment() relies on s->needs_alignment
and just uses 1 if it isn't set. We neglected to update this field, so
starting with cache=writeback and then reopening with cache=none means
that we get an incorrect bs->request_alignment == 1 and unaligned
requests fail instead of being automatically aligned.

Fix this by recalculating s->needs_alignment in raw_refresh_limits()
before calling raw_probe_alignment().

Signed-off-by: Kevin Wolf 
---
  block/file-posix.c | 20 
  tests/qemu-iotests/142 | 22 ++
  tests/qemu-iotests/142.out | 15 +++
  3 files changed, 53 insertions(+), 4 deletions(-)


Reviewed-by: Hanna Reitz 




Re: [PATCH 0/7] block: Attempt on fixing 030-reported errors

2021-11-04 Thread Kevin Wolf
On 04.11.2021 at 11:38, Hanna Reitz wrote:
> (2A) bdrv_replace_child_noperm() should immediately set bs->file or
>  bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
>  It should also immediately remove any BdrvChild with .bs == NULL
>  from the parent’s BDS.children list.
>  Implemented in patches 2 through 6.
> 
> (2B) Alternatively, we could always keep the whole subgraph drained
>  while we manipulate it.  Then, the bdrv_parent_drained_end_single()
>  in bdrv_replace_child_noperm() wouldn’t do anything.
>  To fix 030, we would need to add a drained section to
>  stream_prepare(): Namely we’d need to drain the subgraph below the
>  COR filter node.
>  This would be a much simpler solution, but I don’t feel like it’s
>  the right one.

> As you can see, I’m not sure which of 2A or 2B is the right solution.  I
> decided to investigate both: 2A was much more complicated, but seemed
> like the right thing to do; 2B is much simpler, but doesn’t feel as
> right.  Therefore, I decided to go with 2A in this first version of this
> series.

I haven't looked at the patches yet, but if I understand correctly the
choice you're presenting here is between protecting code from accessing
invalid state and not creating the invalid state in the first place. I
agree that the latter is preferable as long as it doesn't make things so
complicated that we would be willing to accept the higher risk of
breakage in the former. If it's doable in five patches, it's probably
not complicated enough to make such compromises.

Kevin




[PATCH] file-posix: Fix alignment after reopen changing O_DIRECT

2021-11-04 Thread Kevin Wolf
At the end of a reopen, we already call bdrv_refresh_limits(), which
should update bs->request_alignment according to the new file
descriptor. However, raw_probe_alignment() relies on s->needs_alignment
and just uses 1 if it isn't set. We neglected to update this field, so
starting with cache=writeback and then reopening with cache=none means
that we get an incorrect bs->request_alignment == 1 and unaligned
requests fail instead of being automatically aligned.

Fix this by recalculating s->needs_alignment in raw_refresh_limits()
before calling raw_probe_alignment().
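[Editorial aside: the stale-flag bug described above can be sketched as a toy
model. This is plain Python, not QEMU code; the flag value and the 512-byte
alignment are made-up stand-ins.]

```python
# Toy model: needs_alignment was computed once at open and not recomputed
# when a reopen toggled O_DIRECT (modelled here as BDRV_O_NOCACHE).
BDRV_O_NOCACHE = 0x1

def raw_needs_alignment(open_flags, dio_byte_aligned=False):
    # Mirrors the new helper: O_DIRECT without byte-aligned direct I/O
    return bool(open_flags & BDRV_O_NOCACHE) and not dio_byte_aligned

class RawState:
    def __init__(self, open_flags):
        self.open_flags = open_flags
        self.needs_alignment = raw_needs_alignment(open_flags)
        self.refresh_limits(recompute=False)

    def refresh_limits(self, recompute=True):
        if recompute:  # the fix: recompute before probing alignment
            self.needs_alignment = raw_needs_alignment(self.open_flags)
        self.request_alignment = 512 if self.needs_alignment else 1

    def reopen(self, open_flags, fixed):
        self.open_flags = open_flags
        self.refresh_limits(recompute=fixed)

s = RawState(open_flags=0)             # cache=writeback
s.reopen(BDRV_O_NOCACHE, fixed=False)  # pre-fix: flag stays stale
assert s.request_alignment == 1        # unaligned requests then fail at I/O
s.reopen(BDRV_O_NOCACHE, fixed=True)   # with this patch
assert s.request_alignment == 512
```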

Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 20 
 tests/qemu-iotests/142 | 22 ++
 tests/qemu-iotests/142.out | 15 +++
 3 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 7a27c83060..3f14e47096 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -167,6 +167,7 @@ typedef struct BDRVRawState {
 int page_cache_inconsistent; /* errno from fdatasync failure */
 bool has_fallocate;
 bool needs_alignment;
+bool force_alignment;
 bool drop_cache;
 bool check_cache_dropped;
 struct {
@@ -351,6 +352,17 @@ static bool dio_byte_aligned(int fd)
 return false;
 }
 
+static int raw_needs_alignment(BlockDriverState *bs)
+{
+BDRVRawState *s = bs->opaque;
+
+if ((bs->open_flags & BDRV_O_NOCACHE) != 0 && !dio_byte_aligned(s->fd)) {
+return true;
+}
+
+return s->force_alignment;
+}
+
 /* Check if read is allowed with given memory buffer and length.
  *
  * This function is used to check O_DIRECT memory buffer and request alignment.
@@ -728,9 +740,6 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 
 s->has_discard = true;
 s->has_write_zeroes = true;
-if ((bs->open_flags & BDRV_O_NOCACHE) != 0 && !dio_byte_aligned(s->fd)) {
-s->needs_alignment = true;
-}
 
 if (fstat(s->fd, &st) < 0) {
 ret = -errno;
@@ -784,9 +793,10 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
  * so QEMU makes sure all IO operations on the device are aligned
  * to sector size, or else FreeBSD will reject them with EINVAL.
  */
-s->needs_alignment = true;
+s->force_alignment = true;
 }
 #endif
+s->needs_alignment = raw_needs_alignment(bs);
 
 #ifdef CONFIG_XFS
 if (platform_test_xfs_fd(s->fd)) {
@@ -1251,7 +1261,9 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 BDRVRawState *s = bs->opaque;
 struct stat st;
 
+s->needs_alignment = raw_needs_alignment(bs);
 raw_probe_alignment(bs, s->fd, errp);
+
 bs->bl.min_mem_alignment = s->buf_align;
 bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
 
diff --git a/tests/qemu-iotests/142 b/tests/qemu-iotests/142
index 69fd10ef51..07003c597a 100755
--- a/tests/qemu-iotests/142
+++ b/tests/qemu-iotests/142
@@ -350,6 +350,28 @@ info block backing-file"
 
 echo "$hmp_cmds" | run_qemu -drive "$files","$ids" | grep "Cache"
 
+echo
+echo "--- Alignment after changing O_DIRECT ---"
+echo
+
+# Directly test the protocol level: Can unaligned requests succeed even if
+# O_DIRECT was only enabled through a reopen and vice versa?
+
+$QEMU_IO --cache=writeback -f file $TEST_IMG <

[PATCH 6/7] block: Let replace_child_noperm free children

2021-11-04 Thread Hanna Reitz
In most of the block layer, especially when traversing down from other
BlockDriverStates, we assume that BdrvChild.bs can never be NULL.  When
it becomes NULL, it is expected that the corresponding BdrvChild pointer
also becomes NULL and the BdrvChild object is freed.

Therefore, once bdrv_replace_child_noperm() sets the BdrvChild.bs
pointer to NULL, it should also immediately set the corresponding
BdrvChild pointer (like bs->file or bs->backing) to NULL.

In that context, it also makes sense for this function to free the
child.  Sometimes we cannot do so, though, because it is called in a
transactional context where the caller might still want to reinstate the
child in the abort branch (and free it only on commit), so this behavior
has to remain optional.

In bdrv_replace_child_tran()'s abort handler, we now rely on the fact
that the BdrvChild passed to bdrv_replace_child_tran() must have had a
non-NULL .bs pointer initially.  Make a note of that and assert it.
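[Editorial aside: for readers unfamiliar with the block layer's transaction
helpers, the commit/abort pattern relied on here can be sketched as a toy
model. Plain Python with simplified names; not QEMU's actual API.]

```python
class Transaction:
    """Collects commit/abort callbacks, like util/transactions.c's tran_*."""
    def __init__(self):
        self.actions = []
    def add(self, commit, abort):
        self.actions.append((commit, abort))
    def commit(self):
        for commit_cb, _ in self.actions:
            commit_cb()
    def abort(self):
        for _, abort_cb in reversed(self.actions):  # undo in reverse order
            abort_cb()

log = []
def replace_child_tran(child, new_bs, tran):
    old_bs = child["bs"]
    child["bs"] = new_bs                              # eager graph change
    tran.add(commit=lambda: log.append(f"unref {old_bs}"),
             abort=lambda: child.update(bs=old_bs))   # rolled back on abort

child = {"bs": "old_bs"}
tran = Transaction()
replace_child_tran(child, "new_bs", tran)
tran.abort()
assert child["bs"] == "old_bs" and log == []          # change undone

tran = Transaction()
replace_child_tran(child, "new_bs", tran)
tran.commit()
assert child["bs"] == "new_bs" and log == ["unref old_bs"]
```

The real code additionally has to decide, at commit time, whether the
now-empty BdrvChild may be freed, which is what the free_empty_child flag
controls.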

Signed-off-by: Hanna Reitz 
---
 block.c | 102 +++-
 1 file changed, 86 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index ff45447686..0a85ede8dc 100644
--- a/block.c
+++ b/block.c
@@ -87,8 +87,10 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
BlockDriverState *child);
 
+static void bdrv_child_free(BdrvChild *child);
 static void bdrv_replace_child_noperm(BdrvChild **child,
-  BlockDriverState *new_bs);
+  BlockDriverState *new_bs,
+  bool free_empty_child);
 static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
   BdrvChild *child,
   Transaction *tran);
@@ -2256,12 +2258,16 @@ typedef struct BdrvReplaceChildState {
 BdrvChild *child;
 BdrvChild **childp;
 BlockDriverState *old_bs;
+bool free_empty_child;
 } BdrvReplaceChildState;
 
 static void bdrv_replace_child_commit(void *opaque)
 {
 BdrvReplaceChildState *s = opaque;
 
+if (s->free_empty_child && !s->child->bs) {
+bdrv_child_free(s->child);
+}
 bdrv_unref(s->old_bs);
 }
 
@@ -2270,8 +2276,23 @@ static void bdrv_replace_child_abort(void *opaque)
 BdrvReplaceChildState *s = opaque;
 BlockDriverState *new_bs = s->child->bs;
 
-/* old_bs reference is transparently moved from @s to *s->childp */
-bdrv_replace_child_noperm(s->childp, s->old_bs);
+/*
+ * old_bs reference is transparently moved from @s to s->child;
+ * pass &s->child here instead of s->childp, because *s->childp
+ * will have been cleared by bdrv_replace_child_tran()'s
+ * bdrv_replace_child_noperm() call if new_bs is NULL, and we must
+ * not pass a NULL *s->childp here.
+ */
+bdrv_replace_child_noperm(&s->child, s->old_bs, true);
+/*
+ * The child was pre-existing, so s->old_bs must be non-NULL, and
+ * s->child thus must not have been freed
+ */
+assert(s->child != NULL);
+if (!new_bs) {
+/* As described above, *s->childp was cleared, so restore it */
+*s->childp = s->child;
+}
 bdrv_unref(new_bs);
 }
 
@@ -2287,23 +2308,40 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  * Note: real unref of old_bs is done only on commit.
  *
  * The function doesn't update permissions, caller is responsible for this.
+ *
+ * (*childp)->bs must not be NULL.
+ *
+ * If @free_empty_child is true and @new_bs is NULL, the BdrvChild is
+ * freed (on commit).  @free_empty_child should only be false if the
+ * caller will free the BDrvChild themselves (which may be important
+ * if this is in turn called in another transactional context).
  */
 static void bdrv_replace_child_tran(BdrvChild **childp,
 BlockDriverState *new_bs,
-Transaction *tran)
+Transaction *tran,
+bool free_empty_child)
 {
 BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
 *s = (BdrvReplaceChildState) {
 .child = *childp,
 .childp = childp,
 .old_bs = (*childp)->bs,
+.free_empty_child = free_empty_child,
 };
 tran_add(tran, &bdrv_replace_child_drv, s);
 
+/* The abort handler relies on this */
+assert(s->old_bs != NULL);
+
 if (new_bs) {
 bdrv_ref(new_bs);
 }
-bdrv_replace_child_noperm(childp, new_bs);
+/*
+ * Pass free_empty_child=false, we will free the child (if
+ * necessary) in bdrv_replace_child_commit() (if our
+ * @free_empty_child parameter was true).
+ */
+bdrv_replace_child_noperm(childp, new_bs, false);
 /* old_bs reference is transparently moved from *childp to @s */
 }
 
@@ -2675,

[PATCH 4/7] block: Drop detached child from ignore list

2021-11-04 Thread Hanna Reitz
bdrv_attach_child_common_abort() restores the parent's AioContext.  To
do so, the child (which was supposed to be attached, but is now detached
again by this abort handler) is added to the ignore list for the
AioContext changing functions.

However, since we modify a BDS's children list in the BdrvChildClass's
.attach and .detach handlers, the child is already effectively detached
from the parent by this point.  We do not need to put it into the ignore
list.

Use this opportunity to clean up the empty line structure: Keep setting
the ignore list, invoking the AioContext function, and freeing the
ignore list in blocks separated by empty lines.

Signed-off-by: Hanna Reitz 
---
 block.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index b95f8dcf8f..6d230ad3d1 100644
--- a/block.c
+++ b/block.c
@@ -2774,14 +2774,16 @@ static void bdrv_attach_child_common_abort(void *opaque)
 }
 
 if (bdrv_child_get_parent_aio_context(child) != s->old_parent_ctx) {
-GSList *ignore = g_slist_prepend(NULL, child);
+GSList *ignore;
 
+/* No need to ignore `child`, because it has been detached already */
+ignore = NULL;
 child->klass->can_set_aio_ctx(child, s->old_parent_ctx, &ignore,
   &error_abort);
 g_slist_free(ignore);
-ignore = g_slist_prepend(NULL, child);
-child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
 
+ignore = NULL;
+child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
 g_slist_free(ignore);
 }
 
-- 
2.33.1




[PATCH 5/7] block: Pass BdrvChild ** to replace_child_noperm

2021-11-04 Thread Hanna Reitz
bdrv_replace_child_noperm() modifies BdrvChild.bs, and can potentially
set it to NULL.  That is dangerous, because BDS parents generally assume
that their children's .bs pointer is never NULL.  We therefore want to
let bdrv_replace_child_noperm() set the corresponding BdrvChild pointer
to NULL, too.

This patch lays the foundation for it by passing a BdrvChild ** pointer
to bdrv_replace_child_noperm() so that it can later use it to NULL the
BdrvChild pointer immediately after setting BdrvChild.bs to NULL.

Signed-off-by: Hanna Reitz 
---
 block.c | 59 -
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index 6d230ad3d1..ff45447686 100644
--- a/block.c
+++ b/block.c
@@ -87,7 +87,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
BlockDriverState *child);
 
-static void bdrv_replace_child_noperm(BdrvChild *child,
+static void bdrv_replace_child_noperm(BdrvChild **child,
   BlockDriverState *new_bs);
 static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
   BdrvChild *child,
@@ -2254,6 +2254,7 @@ static int bdrv_drv_set_perm(BlockDriverState *bs, 
uint64_t perm,
 
 typedef struct BdrvReplaceChildState {
 BdrvChild *child;
+BdrvChild **childp;
 BlockDriverState *old_bs;
 } BdrvReplaceChildState;
 
@@ -2269,8 +2270,8 @@ static void bdrv_replace_child_abort(void *opaque)
 BdrvReplaceChildState *s = opaque;
 BlockDriverState *new_bs = s->child->bs;
 
-/* old_bs reference is transparently moved from @s to @s->child */
-bdrv_replace_child_noperm(s->child, s->old_bs);
+/* old_bs reference is transparently moved from @s to *s->childp */
+bdrv_replace_child_noperm(s->childp, s->old_bs);
 bdrv_unref(new_bs);
 }
 
@@ -2287,21 +2288,23 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  *
  * The function doesn't update permissions, caller is responsible for this.
  */
-static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
+static void bdrv_replace_child_tran(BdrvChild **childp,
+BlockDriverState *new_bs,
 Transaction *tran)
 {
 BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
 *s = (BdrvReplaceChildState) {
-.child = child,
-.old_bs = child->bs,
+.child = *childp,
+.childp = childp,
+.old_bs = (*childp)->bs,
 };
 tran_add(tran, &bdrv_replace_child_drv, s);
 
 if (new_bs) {
 bdrv_ref(new_bs);
 }
-bdrv_replace_child_noperm(child, new_bs);
-/* old_bs reference is transparently moved from @child to @s */
+bdrv_replace_child_noperm(childp, new_bs);
+/* old_bs reference is transparently moved from *childp to @s */
 }
 
 /*
@@ -2672,9 +2675,10 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission 
qapi_perm)
 return permissions[qapi_perm];
 }
 
-static void bdrv_replace_child_noperm(BdrvChild *child,
+static void bdrv_replace_child_noperm(BdrvChild **childp,
   BlockDriverState *new_bs)
 {
+BdrvChild *child = *childp;
 BlockDriverState *old_bs = child->bs;
 int new_bs_quiesce_counter;
 int drain_saldo;
@@ -2767,7 +2771,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
 BdrvChild *child = *s->child;
 BlockDriverState *bs = child->bs;
 
-bdrv_replace_child_noperm(child, NULL);
+bdrv_replace_child_noperm(s->child, NULL);
 
 if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
 bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
@@ -2867,7 +2871,7 @@ static int bdrv_attach_child_common(BlockDriverState 
*child_bs,
 }
 
 bdrv_ref(child_bs);
-bdrv_replace_child_noperm(new_child, child_bs);
+bdrv_replace_child_noperm(&new_child, child_bs);
 
 *child = new_child;
 
@@ -2927,12 +2931,12 @@ static int bdrv_attach_child_noperm(BlockDriverState 
*parent_bs,
 return 0;
 }
 
-static void bdrv_detach_child(BdrvChild *child)
+static void bdrv_detach_child(BdrvChild **childp)
 {
-BlockDriverState *old_bs = child->bs;
+BlockDriverState *old_bs = (*childp)->bs;
 
-bdrv_replace_child_noperm(child, NULL);
-bdrv_child_free(child);
+bdrv_replace_child_noperm(childp, NULL);
+bdrv_child_free(*childp);
 
 if (old_bs) {
 /*
@@ -3038,7 +3042,7 @@ void bdrv_root_unref_child(BdrvChild *child)
 BlockDriverState *child_bs;
 
 child_bs = child->bs;
-bdrv_detach_child(child);
+bdrv_detach_child(&child);
 bdrv_unref(child_bs);
 }
 
@@ -4891,30 +4895,33 @@ static void 
bdrv_remove_file_or_backing_child(BlockDriverState *bs,
   BdrvChild *child,
   Transaction *t

[PATCH 7/7] iotests/030: Unthrottle parallel jobs in reverse

2021-11-04 Thread Hanna Reitz
See the comment for why this is necessary.

Signed-off-by: Hanna Reitz 
---
 tests/qemu-iotests/030 | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 5fb65b4bef..567bf1da67 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -251,7 +251,16 @@ class TestParallelOps(iotests.QMPTestCase):
  speed=1024)
 self.assert_qmp(result, 'return', {})
 
-for job in pending_jobs:
+# Do this in reverse: After unthrottling them, some jobs may finish
+# before we have unthrottled all of them.  This will drain their
+# subgraph, and this will make jobs above them advance (despite those
+# jobs on top being throttled).  In the worst case, all jobs below the
+# top one are finished before we can unthrottle it, and this makes it
+# advance so far that it completes before we can unthrottle it - which
+# results in an error.
+# Starting from the top (i.e. in reverse) does not have this problem:
+# When a job finishes, the ones below it are not advanced.
+for job in reversed(pending_jobs):
 result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
 self.assert_qmp(result, 'return', {})
 
-- 
2.33.1




[PATCH 3/7] block: Unite remove_empty_child and child_free

2021-11-04 Thread Hanna Reitz
Now that bdrv_remove_empty_child() no longer removes the child from the
parent's children list but only checks that it is not in such a list, it
is only a wrapper around bdrv_child_free() that checks that the child is
empty and unused.  That should apply to all children that we free, so
put those checks into bdrv_child_free() and drop
bdrv_remove_empty_child().

Signed-off-by: Hanna Reitz 
---
 block.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 243ae206b5..b95f8dcf8f 100644
--- a/block.c
+++ b/block.c
@@ -2740,19 +2740,19 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 }
 }
 
-static void bdrv_child_free(void *opaque)
-{
-BdrvChild *c = opaque;
-
-g_free(c->name);
-g_free(c);
-}
-
-static void bdrv_remove_empty_child(BdrvChild *child)
+/**
+ * Free the given @child.
+ *
+ * The child must be empty (i.e. `child->bs == NULL`) and it must be
+ * unused (i.e. not in a children list).
+ */
+static void bdrv_child_free(BdrvChild *child)
 {
 assert(!child->bs);
 assert(!child->next.le_prev); /* not in children list */
-bdrv_child_free(child);
+
+g_free(child->name);
+g_free(child);
 }
 
 typedef struct BdrvAttachChildCommonState {
@@ -2786,7 +2786,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
 }
 
 bdrv_unref(bs);
-bdrv_remove_empty_child(child);
+bdrv_child_free(child);
 *s->child = NULL;
 }
 
@@ -2859,7 +2859,7 @@ static int bdrv_attach_child_common(BlockDriverState 
*child_bs,
 
 if (ret < 0) {
 error_propagate(errp, local_err);
-bdrv_remove_empty_child(new_child);
+bdrv_child_free(new_child);
 return ret;
 }
 }
@@ -2930,7 +2930,7 @@ static void bdrv_detach_child(BdrvChild *child)
 BlockDriverState *old_bs = child->bs;
 
 bdrv_replace_child_noperm(child, NULL);
-bdrv_remove_empty_child(child);
+bdrv_child_free(child);
 
 if (old_bs) {
 /*
-- 
2.33.1




[PATCH 2/7] block: Manipulate children list in .attach/.detach

2021-11-04 Thread Hanna Reitz
The children list is specific to BDS parents.  We should not modify it
in the general children modification code, but let BDS parents deal with
it in their .attach() and .detach() methods.

This also has the advantage that a BdrvChild is removed from the
children list before its .bs pointer can become NULL.  BDS parents
generally assume that their children's .bs pointer is never NULL, so
this is actually a bug fix.
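[Editorial aside: the invariant this patch establishes can be sketched as a
toy model (plain Python, not QEMU code): a child leaves the parent's
children list *before* its .bs becomes NULL, so no list traversal ever sees
a child with a NULL .bs.]

```python
class Child:
    def __init__(self, bs):
        self.bs = bs

class Parent:
    def __init__(self):
        self.children = []
    def attach(self, child):
        self.children.append(child)   # .attach(): list insertion happens here
    def detach(self, child):
        self.children.remove(child)   # .detach(): removed while .bs is set
        child.bs = None               # only now may .bs be cleared

p = Parent()
c = Child("node0")
p.attach(c)
p.detach(c)
assert all(ch.bs is not None for ch in p.children)  # invariant holds
assert c.bs is None and p.children == []
```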

Signed-off-by: Hanna Reitz 
---
 block.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 580cb77a70..243ae206b5 100644
--- a/block.c
+++ b/block.c
@@ -1387,6 +1387,8 @@ static void bdrv_child_cb_attach(BdrvChild *child)
 {
 BlockDriverState *bs = child->opaque;
 
+QLIST_INSERT_HEAD(&bs->children, child, next);
+
 if (child->role & BDRV_CHILD_COW) {
 bdrv_backing_attach(child);
 }
@@ -1403,6 +1405,8 @@ static void bdrv_child_cb_detach(BdrvChild *child)
 }
 
 bdrv_unapply_subtree_drain(child, bs);
+
+QLIST_REMOVE(child, next);
 }
 
 static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
@@ -2747,7 +2751,7 @@ static void bdrv_child_free(void *opaque)
 static void bdrv_remove_empty_child(BdrvChild *child)
 {
 assert(!child->bs);
-QLIST_SAFE_REMOVE(child, next);
+assert(!child->next.le_prev); /* not in children list */
 bdrv_child_free(child);
 }
 
@@ -2913,7 +2917,6 @@ static int bdrv_attach_child_noperm(BlockDriverState 
*parent_bs,
 return ret;
 }
 
-QLIST_INSERT_HEAD(&parent_bs->children, *child, next);
 /*
  * child is removed in bdrv_attach_child_common_abort(), so don't care to
  * abort this change separately.
@@ -4851,7 +4854,6 @@ static void bdrv_remove_filter_or_cow_child_abort(void 
*opaque)
 BdrvRemoveFilterOrCowChild *s = opaque;
 BlockDriverState *parent_bs = s->child->opaque;
 
-QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
 if (s->is_backing) {
 parent_bs->backing = s->child;
 } else {
@@ -4906,7 +4908,6 @@ static void 
bdrv_remove_file_or_backing_child(BlockDriverState *bs,
 };
 tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
 
-QLIST_SAFE_REMOVE(child, next);
 if (s->is_backing) {
 bs->backing = NULL;
 } else {
-- 
2.33.1




[PATCH 0/7] block: Attempt on fixing 030-reported errors

2021-11-04 Thread Hanna Reitz
Hi,

I’ve tried to investigate what causes the iotest 030 to fail.  Here’s
what I found:

(1) stream_prepare() gets the base node by looking up the node below
above_base.  It then invokes bdrv_cor_filter_drop(), before we
actually use the base node.
bdrv_cor_filter_drop() modifies the block graph, which means
draining, which means other parties might modify the graph, too.
Therefore, afterwards, the node below above_base might be completely
different, and the base node we got before might already be gone.

(2) bdrv_replace_child_noperm() can set BdrvChild.bs to NULL.  That’s
problematic, because most of our code cannot deal with BdrvChild
objects whose .bs pointer is NULL.  We assume that such objects are
immediately removed from the BDS.children list, and that they won’t
appear under bs->backing or bs->file (i.e. that those pointers are
immediately NULLed when bs->{backing,file}->bs is NULLed).
After setting BdrvChild.bs to NULL, bdrv_replace_child_noperm() may
invoke bdrv_parent_drained_end_single() on the BdrvChild.
Therefore, other code is run at that point, and it might not be
ready to encounter something like
`bs->backing != NULL && bs->backing->bs == NULL`.

(3) 030 in one case launches four stream jobs concurrently, all with
speed=1024.  It then unthrottles them one after the other, but the
problem is that if one job finishes, the jobs above it will be
advanced by a step (which is actually 512k); so since we unthrottle
bottom to top, it’s possible that all jobs below the top job are
finished before we get to unthrottle the top job.  This will advance
the top job so far (3 * 512k + 512k = 2M) that it actually finishes
despite still being throttled.  Attempting to unthrottle it then
throws an error.
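[Editorial aside: the arithmetic in (3) can be checked with a toy
simulation. Four stacked stream jobs over a 2M image; finishing one job
advances every job above it by one 512k step.]

```python
STEP = 512 * 1024            # one copy step
IMAGE = 4 * STEP             # 2M image: a job completes at this progress

def run(order):
    progress = [STEP] * 4    # each throttled job has done one step already
    for job in order:
        if progress[job] >= IMAGE:
            return "error: job %d finished before unthrottle" % job
        progress[job] = IMAGE            # unthrottled job runs to completion
        for above in range(job + 1, 4):  # its completion drains the subgraph,
            progress[above] += STEP      # advancing every job stacked above
    return "ok"

print(run([0, 1, 2, 3]))   # bottom to top: 512k + 3*512k = 2M -> error
print(run([3, 2, 1, 0]))   # top to bottom (reversed): ok
```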


Here’s how I think we can solve these problems:

(1) Invoke bdrv_cor_filter_drop() first, then get the base node
afterwards, when the graph will no longer change.
Implemented in patch 1.

(2A) bdrv_replace_child_noperm() should immediately set bs->file or
 bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
 It should also immediately remove any BdrvChild with .bs == NULL
 from the parent’s BDS.children list.
 Implemented in patches 2 through 6.

(2B) Alternatively, we could always keep the whole subgraph drained
 while we manipulate it.  Then, the bdrv_parent_drained_end_single()
 in bdrv_replace_child_noperm() wouldn’t do anything.
 To fix 030, we would need to add a drained section to
 stream_prepare(): Namely we’d need to drain the subgraph below the
 COR filter node.
 This would be a much simpler solution, but I don’t feel like it’s
 the right one.

(3) Just unthrottle the jobs from top to bottom instead of bottom to
    top.


As you can see, I’m not sure which of 2A or 2B is the right solution.  I
decided to investigate both: 2A was much more complicated, but seemed
like the right thing to do; 2B is much simpler, but doesn’t feel as
right.  Therefore, I decided to go with 2A in this first version of this
series.


Hanna Reitz (7):
  stream: Traverse graph after modification
  block: Manipulate children list in .attach/.detach
  block: Unite remove_empty_child and child_free
  block: Drop detached child from ignore list
  block: Pass BdrvChild ** to replace_child_noperm
  block: Let replace_child_noperm free children
  iotests/030: Unthrottle parallel jobs in reverse

 block.c| 178 +
 block/stream.c |   7 +-
 tests/qemu-iotests/030 |  11 ++-
 3 files changed, 144 insertions(+), 52 deletions(-)

-- 
2.33.1




[PATCH 1/7] stream: Traverse graph after modification

2021-11-04 Thread Hanna Reitz
bdrv_cor_filter_drop() modifies the block graph.  That means that other
parties can also modify the block graph before it returns.  Therefore,
we cannot assume that the result of a graph traversal we did before
remains valid afterwards.

We should thus fetch `base` and `unfiltered_base` afterwards instead of
before.
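[Editorial aside: the stale-traversal hazard can be sketched as a toy model
(plain Python, not QEMU code; node names made up): a node resolved before a
graph-mutating, draining call may no longer be the right one afterwards.]

```python
chain = ["top", "cor-filter", "mid", "base"]

def node_below(node):
    # stand-in for bdrv_filter_or_cow_bs(): next node down the chain
    return chain[chain.index(node) + 1]

def cor_filter_drop():
    # graph mutation; in QEMU this drains and may run concurrent graph code
    chain.remove("cor-filter")

stale_base = node_below("top")     # resolved *before* the mutation
cor_filter_drop()
fresh_base = node_below("top")     # resolved *after*, as the patch does

assert stale_base == "cor-filter"  # stale: points at the dropped filter
assert fresh_base == "mid"
```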

Signed-off-by: Hanna Reitz 
---
 block/stream.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 97bee482dc..e45113aed6 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -54,8 +54,8 @@ static int stream_prepare(Job *job)
 {
 StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
 BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
-BlockDriverState *base = bdrv_filter_or_cow_bs(s->above_base);
-BlockDriverState *unfiltered_base = bdrv_skip_filters(base);
+BlockDriverState *base;
+BlockDriverState *unfiltered_base;
 Error *local_err = NULL;
 int ret = 0;
 
@@ -63,6 +63,9 @@ static int stream_prepare(Job *job)
 bdrv_cor_filter_drop(s->cor_filter_bs);
 s->cor_filter_bs = NULL;
 
+base = bdrv_filter_or_cow_bs(s->above_base);
+unfiltered_base = bdrv_skip_filters(base);
+
 if (bdrv_cow_child(unfiltered_bs)) {
 const char *base_id = NULL, *base_fmt = NULL;
 if (unfiltered_base) {
-- 
2.33.1




Re: [PATCH v1] job.c: add missing notifier initialization

2021-11-04 Thread Stefan Hajnoczi
On Wed, Nov 03, 2021 at 12:21:55PM -0400, Emanuele Giuseppe Esposito wrote:
> It seems that on_idle list is not properly initialized like
> the other notifiers.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  job.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


[PATCH v4 3/3] qapi: deprecate drive-backup

2021-11-04 Thread Markus Armbruster
From: Vladimir Sementsov-Ogievskiy 

The modern way is to use blockdev-add + blockdev-backup, which provides a
lot more control over how the target is opened.

As an example of drive-backup's problems, consider the following:

A user of drive-backup expects that the target will be opened in the same
cache and aio mode as the source. The corresponding logic is in
drive_backup_prepare(), where we take bs->open_flags of the source.

It works rather badly if the source was added by blockdev-add. Assume the
source is a qcow2 image. On blockdev-add we should specify aio and cache
options for the file child of the qcow2 node. What happens next:

drive_backup_prepare() looks at bs->open_flags of the qcow2 source node.
But neither BDRV_O_NOCACHE nor BDRV_O_NATIVE_AIO is there: BDRV_O_NOCACHE
is placed in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
as file-posix parses its options and simply sets s->use_linux_aio.

The documentation is updated in a minimal way, so that drive-backup is
noted only as a deprecated command, and blockdev-backup is used in most
places.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Kashyap Chamarthy 
Signed-off-by: Markus Armbruster 
---
 docs/about/deprecated.rst  | 11 ++
 docs/interop/live-block-operations.rst | 47 +-
 qapi/block-core.json   |  5 ++-
 qapi/transaction.json  |  6 +++-
 4 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 56f9ad15ab..e3bdb89c65 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -239,6 +239,17 @@ single ``bitmap``, the new ``block-export-add`` uses a 
list of ``bitmaps``.
 Member ``values`` in return value elements with meta-type ``enum`` is
 deprecated.  Use ``members`` instead.
 
+``drive-backup`` (since 6.2)
+''''''''''''''''''''''''''''
+
+Use ``blockdev-backup`` in combination with ``blockdev-add`` instead.
+This change primarily separates the creation/opening process of the backup
+target with explicit, separate steps. ``blockdev-backup`` uses mostly the
+same arguments as ``drive-backup``, except the ``format`` and ``mode``
+options are removed in favor of using explicit ``blockdev-create`` and
+``blockdev-add`` calls. See :doc:`/interop/live-block-operations` for
+details.
+
 System accelerators
 ---
 
diff --git a/docs/interop/live-block-operations.rst 
b/docs/interop/live-block-operations.rst
index 9e3635b233..d403d96f58 100644
--- a/docs/interop/live-block-operations.rst
+++ b/docs/interop/live-block-operations.rst
@@ -116,8 +116,8 @@ QEMU block layer supports.
 (3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running
 disk to another image.
 
-(4) ``drive-backup`` (and ``blockdev-backup``): Point-in-time (live) copy
-of a block device to a destination.
+(4) ``blockdev-backup`` (and the deprecated ``drive-backup``):
+Point-in-time (live) copy of a block device to a destination.
 
 
 .. _`Interacting with a QEMU instance`:
@@ -555,13 +555,14 @@ Currently, there are four different kinds:
 
 (3) ``none`` -- Synchronize only the new writes from this point on.
 
-.. note:: In the case of ``drive-backup`` (or ``blockdev-backup``),
-  the behavior of ``none`` synchronization mode is different.
-  Normally, a ``backup`` job consists of two parts: Anything
-  that is overwritten by the guest is first copied out to
-  the backup, and in the background the whole image is
-  copied from start to end. With ``sync=none``, it's only
-  the first part.
+.. note:: In the case of ``blockdev-backup`` (or deprecated
+  ``drive-backup``), the behavior of ``none``
+  synchronization mode is different.  Normally, a
+  ``backup`` job consists of two parts: Anything that is
+  overwritten by the guest is first copied out to the
+  backup, and in the background the whole image is copied
+  from start to end. With ``sync=none``, it's only the
+  first part.
 
 (4) ``incremental`` -- Synchronize content that is described by the
 dirty bitmap
@@ -928,19 +929,22 @@ Shutdown the guest, by issuing the ``quit`` QMP command::
 }
 
 
-Live disk backup --- ``drive-backup`` and ``blockdev-backup``
--
+Live disk backup --- ``blockdev-backup`` and the deprecated ``drive-backup``
+---
 
-The ``drive-backup`` (and its newer equivalent ``blockdev-backup``) allows
+The ``blockdev-backup`` (and the deprecated ``drive-backup``) allows
 you to create a point-in-time snapshot.
 
-In this case, the point-in-time is when you *start* the ``drive-backup``
-(or its newer equivalent ``blockdev-backup``) command.
+In this case, the point-in-time is when you *start* the
+``blockdev-backup`` (or deprecated ``drive-backup``) command.
 
 
 QMP

[PATCH v4 1/3] docs/block-replication: use blockdev-backup

2021-11-04 Thread Markus Armbruster
From: Vladimir Sementsov-Ogievskiy 

We are going to deprecate drive-backup, so don't mention it here.
Moreover, blockdev-backup seems more correct in the context.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Signed-off-by: Markus Armbruster 
---
 docs/block-replication.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/block-replication.txt b/docs/block-replication.txt
index 108e9166a8..59eb2b33b3 100644
--- a/docs/block-replication.txt
+++ b/docs/block-replication.txt
@@ -79,7 +79,7 @@ Primary | ||  Secondary disk <- 
hidden-disk 5 <-
   ||| |
   ||| |
   ||'-'
-  ||   drive-backup sync=none 6
+  || blockdev-backup sync=none 6
 
 1) The disk on the primary is represented by a block device with two
 children, providing replication between a primary disk and the host that
@@ -101,7 +101,7 @@ should support bdrv_make_empty() and backing file.
 that is modified by the primary VM. It should also start as an empty disk,
 and the driver supports bdrv_make_empty() and backing file.
 
-6) The drive-backup job (sync=none) is run to allow hidden-disk to buffer
+6) The blockdev-backup job (sync=none) is run to allow hidden-disk to buffer
 any state that would otherwise be lost by the speculative write-through
 of the NBD server into the secondary disk. So before block replication,
 the primary disk and secondary disk should contain the same data.
-- 
2.31.1




[PATCH v4 0/3] qapi & doc: deprecate drive-backup

2021-11-04 Thread Markus Armbruster
See 03 commit message for details. 01-02 are preparation docs update.

v4: deprecate drive-backup transaction by squashing
[PATCH v4 5/5] block: Deprecate transaction type drive-backup
Message-Id: <20211025042405.3762351-6-arm...@redhat.com>
into PATCH 3

v3: wording fix-ups and improvements suggested by Kashyap
v2: add a lot of documentation changes
v1 was "[PATCH] qapi: deprecate drive-backup"

Vladimir Sementsov-Ogievskiy (3):
  docs/block-replication: use blockdev-backup
  docs/interop/bitmaps: use blockdev-backup
  qapi: deprecate drive-backup

 docs/about/deprecated.rst  |  11 +
 docs/block-replication.txt |   4 +-
 docs/interop/bitmaps.rst   | 285 +++--
 docs/interop/live-block-operations.rst |  47 ++--
 qapi/block-core.json   |   5 +-
 qapi/transaction.json  |   6 +-
 6 files changed, 268 insertions(+), 90 deletions(-)

-- 
2.31.1




[PATCH v4 2/3] docs/interop/bitmaps: use blockdev-backup

2021-11-04 Thread Markus Armbruster
From: Vladimir Sementsov-Ogievskiy 

We are going to deprecate drive-backup, so use modern interface here.
In examples where target image creation is shown, show blockdev-add as
well. If target creation omitted, omit blockdev-add as well.

Reviewed-by: Kashyap Chamarthy 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Markus Armbruster 
---
 docs/interop/bitmaps.rst | 285 +--
 1 file changed, 215 insertions(+), 70 deletions(-)

diff --git a/docs/interop/bitmaps.rst b/docs/interop/bitmaps.rst
index 059ad67929..1de46febdc 100644
--- a/docs/interop/bitmaps.rst
+++ b/docs/interop/bitmaps.rst
@@ -539,12 +539,11 @@ other partial disk images on top of a base image to reconstruct a full backup
 from the point in time at which the incremental backup was issued.
 
 The "Push Model" here references the fact that QEMU is "pushing" the modified
-blocks out to a destination. We will be using the `drive-backup
-`_ and `blockdev-backup
-`_ QMP commands to create both
+blocks out to a destination. We will be using the `blockdev-backup
+`_ QMP command to create both
 full and incremental backups.
 
-Both of these commands are jobs, which have their own QMP API for querying and
+This command starts a background job, which has its own QMP API for querying and
 management documented in `Background jobs
 `_.
 
@@ -557,6 +556,10 @@ create a new incremental backup chain attached to a drive.
 This example creates a new, full backup of "drive0" and accompanies it with a
 new, empty bitmap that records writes from this point in time forward.
 
+The target can be created with the help of `blockdev-add
+`_ or `blockdev-create
+`_ commands.
+
 .. note:: Any new writes that happen after this command is issued, even while
   the backup job runs, will be written locally and not to the backup
   destination. These writes will be recorded in the bitmap
@@ -576,12 +579,11 @@ new, empty bitmap that records writes from this point in time forward.
  }
},
{
- "type": "drive-backup",
+ "type": "blockdev-backup",
  "data": {
"device": "drive0",
-   "target": "/path/to/drive0.full.qcow2",
-   "sync": "full",
-   "format": "qcow2"
+   "target": "target0",
+   "sync": "full"
  }
}
  ]
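The transaction above can also be assembled programmatically. A sketch
in Python, mirroring the example's "drive0"/"target0" names; the first
action (bitmap creation) follows the pattern described in the
surrounding text, and "target0" must already exist as a block node
(added beforehand, e.g. with blockdev-add):

```python
import json

# Sketch: the transaction pairs bitmap creation with a full backup, so
# the bitmap's starting point matches the backup's point in time.
actions = [
    {"type": "block-dirty-bitmap-add",
     "data": {"node": "drive0", "name": "bitmap0"}},
    {"type": "blockdev-backup",
     "data": {"device": "drive0", "target": "target0", "sync": "full"}},
]
txn = {"execute": "transaction", "arguments": {"actions": actions}}
print(json.dumps(txn))
```

Because both actions are in one transaction, either both take effect or
neither does.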
@@ -664,12 +666,11 @@ use a transaction to reset the bitmap while making a new full backup:
}
  },
  {
-   "type": "drive-backup",
+   "type": "blockdev-backup",
"data": {
  "device": "drive0",
- "target": "/path/to/drive0.new_full.qcow2",
- "sync": "full",
- "format": "qcow2"
+ "target": "target0",
+ "sync": "full"
}
  }
]
@@ -728,19 +729,35 @@ Example: First Incremental Backup
$ qemu-img create -f qcow2 drive0.inc0.qcow2 \
  -b drive0.full.qcow2 -F qcow2
 
+#. Add target block node:
+
+   .. code-block:: QMP
+
+-> {
+ "execute": "blockdev-add",
+ "arguments": {
+   "node-name": "target0",
+   "driver": "qcow2",
+   "file": {
+ "driver": "file",
+ "filename": "drive0.inc0.qcow2"
+   }
+ }
+   }
+
+<- { "return": {} }
+
 #. Issue an incremental backup command:
 
.. code-block:: QMP
 
 -> {
- "execute": "drive-backup",
+ "execute": "blockdev-backup",
  "arguments": {
"device": "drive0",
"bitmap": "bitmap0",
-   "target": "drive0.inc0.qcow2",
-   "format": "qcow2",
-   "sync": "incremental",
-   "mode": "existing"
+   "target": "target0",
+   "sync": "incremental"
  }
}
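The two steps above (blockdev-add the target node, then start the
incremental backup) can be sketched as one small helper. Names follow
the example; in a real session each payload is sent separately over the
QMP monitor and the reply checked before issuing the next one:

```python
import json

# Sketch of the two-step incremental flow: first make the target image
# available as a block node, then back up only the blocks recorded in
# the bitmap since the last backup.
def incremental_backup_cmds(device, bitmap, node_name, filename):
    add_target = {
        "execute": "blockdev-add",
        "arguments": {
            "node-name": node_name,
            "driver": "qcow2",
            "file": {"driver": "file", "filename": filename},
        },
    }
    backup = {
        "execute": "blockdev-backup",
        "arguments": {"device": device, "bitmap": bitmap,
                      "target": node_name, "sync": "incremental"},
    }
    return [add_target, backup]

for cmd in incremental_backup_cmds("drive0", "bitmap0", "target0",
                                   "drive0.inc0.qcow2"):
    print(json.dumps(cmd))
```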
 
@@ -785,20 +802,36 @@ Example: Second Incremental Backup
$ qemu-img create -f qcow2 drive0.inc1.qcow2 \
  -b drive0.inc0.qcow2 -F qcow2
 
+#. Add target block node:
+
+   .. code-block:: QMP
+
+-> {
+ "execute": "blockdev-add",
+ "arguments": {
+   "node-name": "target0",
+   "driver": "qcow2",
+   "file": {
+ "driver": "file",
+ "filename": "drive0.inc1.qcow2"
+   }
+ }
+   }
+
+<- { "return": {} }
+
 #. Issue a new incremental backup command. The only difference here is that we
have changed the target image below.
 
.. code-block:: QMP
 
 -> {
- "execute": "drive-backup",
+ "execute": "blockdev-backup",
  "arguments": {
"device": "drive0",
"bitmap": "bitmap0",
-   "target": "drive0.inc1.qcow2",
-   "format": "qcow2",
-   "sync": "incremental",
-   "mode": "existing"
+   "target": "target0",
+   "sync": "incremental"
  }
}
 
@@ -866,20 +899,36 @@ image: