On 03.08.21 16:34, Kevin Wolf wrote:
> On 26.07.2021 at 16:46, Max Reitz wrote:
>> We must check whether the job is force-cancelled early in our main loop,
>> most importantly before any `continue` statement. For example, we used
>> to have `continue`s before our current checking location that are
>> triggered by `mirror_flush()` failing. So, if `mirror_flush()` kept
>> failing, force-cancelling the job would not terminate it.
>>
>> A job being force-cancelled should be treated the same as the job having
>> failed, so put the check in the same place where we check `s->ret < 0`.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
>> Signed-off-by: Max Reitz <mre...@redhat.com>
>> ---
>>  block/mirror.c | 7 +------
>>  1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/block/mirror.c b/block/mirror.c
>> index 72e02fa34e..46d1a1e5a2 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>              mirror_wait_for_any_operation(s, true);
>>          }
>>  
>> -        if (s->ret < 0) {
>> +        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
>>              ret = s->ret;
>>              goto immediate_exit;
>>          }
>> @@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>              break;
>>          }
>>  
>> -        ret = 0;
>> -
>>          if (job_is_ready(&s->common.job) && !should_complete) {
>>              delay_ns = (s->in_flight == 0 &&
>>                          cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
>> @@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
>>          trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
>>                                    delay_ns);
>>          job_sleep_ns(&s->common.job, delay_ns);
>> -        if (job_is_cancelled(&s->common.job)) {
>> -            break;
>> -        }
>
> I think it was intentional that the check is here because it means
> skipping the job_sleep_ns() and instead cancelling immediately, and we
> probably still want that. Between your check above and here, the
> coroutine can yield, so cancellation could have been newly requested.
I’m afraid I don’t quite understand. If cancel is requested in job_sleep_ns(), then we will go back to the top of the loop, wait for in-flight active requests and then break. Waiting for the in-flight requests seems unnecessary, but does it really make a difference in practice? We don’t start new requests, so it should be legal to wait for existing ones to settle, and also I believe someone will have to wait for those in-flight requests anyway (when the mirror top node is removed). (The only thing we could do is to cancel the in-flight requests, but that is what mirror_cancel() does.)
Looking more at the whole loop, there are a couple of places that can yield. Of course we can check whether the job has been cancelled after every single one of them, but that would be a bit strange. We only really need to check before we initiate new requests or want to change the state. I believe the right place to do the check would be after the job_pause_point().
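To illustrate the placement I have in mind, here is a minimal standalone sketch, not the actual mirror_run() code. SketchJob, sketch_pause_point() and sketch_run() are hypothetical stand-ins invented for this example; the only point is that the cancellation check sits directly after the yield point and before any new request is started.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the job state; this is not QEMU's MirrorBlockJob. */
typedef struct SketchJob {
    bool cancelled;        /* set once a force-cancel has been requested */
    int cancel_at_iter;    /* iteration at which the cancel arrives, -1 = never */
    int requests_started;  /* number of new copy requests issued so far */
} SketchJob;

/*
 * Stand-in for a yield point such as job_pause_point(): while the
 * coroutine is not running, a cancel request may arrive.
 */
static void sketch_pause_point(SketchJob *job, int iter)
{
    if (iter == job->cancel_at_iter) {
        job->cancelled = true;
    }
}

/* Runs at most max_iters iterations and returns how many completed. */
static int sketch_run(SketchJob *job, int max_iters)
{
    int iter;

    assert(max_iters >= 0);

    for (iter = 0; iter < max_iters; iter++) {
        sketch_pause_point(job, iter);

        /* Check right after the yield point, before starting new work. */
        if (job->cancelled) {
            break;
        }

        /* Only reached while the job is not cancelled. */
        job->requests_started++;
    }

    return iter;
}
```

With a cancel arriving at the pause point of iteration 3, the loop issues requests for iterations 0 to 2 and then leaves without starting another one.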
And perhaps the active write functions (bdrv_mirror_top_do_write() and bdrv_mirror_top_pwritev()) should stop copying to the target if the job has been cancelled.
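Roughly what I mean for the active write path, again as a standalone sketch under my own assumptions rather than the real bdrv_mirror_top_do_write(): SketchMirror and sketch_active_write() are made-up names, and the buffers stand in for the source and target nodes. The guest write always reaches the source, but it is no longer copied to the target once the job is cancelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SIZE 16

/* Hypothetical stand-in for the mirror state; not a QEMU structure. */
typedef struct SketchMirror {
    bool cancelled;                     /* job has been force-cancelled */
    unsigned char source[SKETCH_SIZE];  /* stands in for the source node */
    unsigned char target[SKETCH_SIZE];  /* stands in for the target node */
} SketchMirror;

/*
 * Returns -1 if the range is out of bounds, 0 if the write reached only
 * the source (job cancelled), and 1 if it was also copied to the target.
 */
static int sketch_active_write(SketchMirror *m, size_t offset,
                               const unsigned char *buf, size_t len)
{
    if (offset > SKETCH_SIZE || len > SKETCH_SIZE - offset) {
        return -1;
    }

    /* The guest write itself must always be carried out. */
    memcpy(m->source + offset, buf, len);

    /* After cancellation, do not touch the target any more. */
    if (m->cancelled) {
        return 0;
    }

    memcpy(m->target + offset, buf, len);
    return 1;
}
```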
Max
> So have the check in both places, I guess? And a comment to explain why
> neither is redundant.
>
>>          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>>      }
>
> Kevin