On Mon, Feb 05, 2018 at 02:28:55PM -0500, John Snow wrote: > > > On 01/31/2018 09:19 PM, Liang Li wrote: > > On Tue, Jan 30, 2018 at 03:18:31PM -0500, John Snow wrote: > >> > >> > >> On 01/30/2018 03:38 AM, Liang Li wrote: > >>> When doing drive mirror to a low speed shared storage, if there was heavy > >>> BLK IO write workload in VM after the 'ready' event, drive mirror block > >>> job > >>> can't be canceled immediately, it would keep running until the heavy BLK > >>> IO > >>> workload stopped in the VM. > >>> > >>> Because libvirt depends on block-job-cancel for block live migration, the > >>> current block-job-cancel has the semantic to make sure data is in sync > >>> after > >>> the 'ready' event. This semantic can't meet some requirement, for > >>> example, > >>> people may use drive mirror for realtime backup while need the ability of > >>> block live migration. If drive mirror can't not be cancelled immediately, > >>> it means block live migration need to wait, because libvirt make use drive > >>> mirror to implement block live migration and only one drive mirror block > >>> job is allowed at the same time for a give block dev. > >>> > >>> We need a new interface for 'force cancel', which could quit block job > >>> immediately if don't care about whether data is in sync or not. > >>> > >>> 'force' is not used by libvirt currently, to make things simple, change > >>> it's semantic slightly, hope it will not break some use case which need > >>> its > >>> original semantic. > >>> > >>> Cc: Paolo Bonzini <pbonz...@redhat.com> > >>> Cc: Jeff Cody <jc...@redhat.com> > >>> Cc: Kevin Wolf <kw...@redhat.com> > >>> Cc: Max Reitz <mre...@redhat.com> > >>> Cc: Eric Blake <ebl...@redhat.com> > >>> Cc: John Snow <js...@redhat.com> > >>> Reported-by: Huaitong Han <huanhuait...@didichuxing.com> > >>> Signed-off-by: Huaitong Han <huanhuait...@didichuxing.com> > >>> Signed-off-by: Liang Li <liliang...@didichuxing.com> > >>> --- > >>> block/mirror.c | 9 +++------ > >>> blockdev.c | 4 ++-- > >>> blockjob.c | 11 ++++++----- > >>> hmp-commands.hx | 3 ++- > >>> include/block/blockjob.h | 9 ++++++++- > >>> qapi/block-core.json | 6 ++++-- > >>> tests/test-blockjob-txn.c | 8 ++++---- > >>> 7 files changed, 29 insertions(+), 21 deletions(-) > >>> > >>> diff --git a/block/mirror.c b/block/mirror.c > >>> index c9badc1..c22dff9 100644 > >>> --- a/block/mirror.c > >>> +++ b/block/mirror.c > >>> @@ -869,11 +869,8 @@ static void coroutine_fn mirror_run(void *opaque) > >>> > >>> ret = 0; > >>> trace_mirror_before_sleep(s, cnt, s->synced, delay_ns); > >>> - if (!s->synced) { > >>> - block_job_sleep_ns(&s->common, delay_ns); > >>> - if (block_job_is_cancelled(&s->common)) { > >>> - break; > >>> - } > >>> + if (block_job_is_cancelled(&s->common) && s->common.force) { > >>> + break; > >> > >> what's the justification for removing the sleep in the case that > >> !s->synced && !block_job_is_cancelled(...) ? > >> > > if !block_job_is_cancelled() satisfied, the code in 'if (!should_complete) > > {}' > > will execute, there is a block_job_sleep_ns there. > > > > block_job_sleep_ns is for rate throttling, if there is no more data to > > sync, > > sleep is not needed, right? > > > >>> } else if (!should_complete) { > >>> delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0); > >>> block_job_sleep_ns(&s->common, delay_ns); > >>> @@ -887,7 +884,7 @@ immediate_exit: > >>> * or it was cancelled prematurely so that we do not guarantee > >>> that > >>> * the target is a copy of the source. > >>> */ > >>> - assert(ret < 0 || (!s->synced && > >>> block_job_is_cancelled(&s->common))); > >>> + assert(ret < 0 || block_job_is_cancelled(&s->common)); > >> > >> This assertion gets weaker in the case where force isn't provided, is > >> that desired? > >> > > yes. if force quit is used, the following condition can be true > > > > (ret >= 0) && (s->synced) && (block_job_is_cancelled(&s->common)) > > > > so the above assert should be changed, or it will be failed. > > > > What I mean is that when we *don't* use force quit, the assertion here > is weakened. You can work the force conditional in here: > > ret < 0 || (block_job_is_cancelled(job) && (job->force || !s->synced)) > > which preserves the stricter assertion when force isn't provided. >
you are right, will change. > >>> assert(need_drain); > >>> mirror_wait_for_all_io(s); > >>> } > >>> diff --git a/blockdev.c b/blockdev.c > >>> index 8e977ee..039f156 100644 > >>> --- a/blockdev.c > >>> +++ b/blockdev.c > >>> @@ -145,7 +145,7 @@ void blockdev_mark_auto_del(BlockBackend *blk) > >>> aio_context_acquire(aio_context); > >>> > >>> if (bs->job) { > >>> - block_job_cancel(bs->job); > >>> + block_job_cancel(bs->job, false); > >>> } > >>> > >>> aio_context_release(aio_context); > >>> @@ -3802,7 +3802,7 @@ void qmp_block_job_cancel(const char *device, > >>> } > >>> > >>> trace_qmp_block_job_cancel(job); > >>> - block_job_cancel(job); > >>> + block_job_cancel(job, force); > >>> out: > >>> aio_context_release(aio_context); > >>> } > >>> diff --git a/blockjob.c b/blockjob.c > >>> index f5cea84..0aacb50 100644 > >>> --- a/blockjob.c > >>> +++ b/blockjob.c > >>> @@ -365,7 +365,7 @@ static void block_job_completed_single(BlockJob *job) > >>> block_job_unref(job); > >>> } > >>> > >>> -static void block_job_cancel_async(BlockJob *job) > >>> +static void block_job_cancel_async(BlockJob *job, bool force) > >>> { > >>> if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) { > >>> block_job_iostatus_reset(job); > >>> @@ -376,6 +376,7 @@ static void block_job_cancel_async(BlockJob *job) > >>> job->pause_count--; > >>> } > >>> job->cancelled = true; > >>> + job->force = force; > >>> } > >>> > >> > >> I suppose this is okay; we're setting a permanent property of the job as > >> part of a limited operation. > >> > >> Granted, the job should disappear afterwards, so it should never > >> conflict, but it made me wonder for a moment. > >> > >> What happens if we cancel with force = true and then immediately cancel > >> again with force = false? What happens? Can it cause issues? > >> > > > > Indeed. It can be fixed by: > > > > if (!job->force) > > job->force = force > > > > it's that ok ? > > > > I think so, yes. or job->force |= force, or your preferred equivalent > for only allowing false-->true here. > > >>> static int block_job_finish_sync(BlockJob *job, > >>> @@ -437,7 +438,7 @@ static void block_job_completed_txn_abort(BlockJob > >>> *job) > >>> * on the caller, so leave it. */ > >>> QLIST_FOREACH(other_job, &txn->jobs, txn_list) { > >>> if (other_job != job) { > >>> - block_job_cancel_async(other_job); > >>> + block_job_cancel_async(other_job, true); > >> > >> What's the rationale for deciding that transactional cancellations are > >> always force=true? > >> > > > > no particular reason, just try to speed up the abort process. :) > > > > I'd prefer we not change the semantics of how transactions fail without > a stronger justification than wanting to make it faster! > OK. I will send the v2, thanks! Liang > >> Granted, this doesn't matter currently because mirror isn't a > >> transactional job, but that makes the decision all the more curious. > >> > >>> } > >>> } > > > > Thanks for your comments. > > > > Liang > >