Re: [PATCH for-6.1? v2 6/7] mirror: Check job_is_cancelled() earlier

Max Reitz Wed, 04 Aug 2021 01:26:33 -0700

On 03.08.21 16:34, Kevin Wolf wrote:

On 26.07.2021 at 16:46, Max Reitz wrote:

We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.


A job being force-cancelled should be treated the same as the job having
failed, so put the check in the same place where we check `s->ret < 0`.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz <mre...@redhat.com>
---
  block/mirror.c | 7 +------
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 72e02fa34e..46d1a1e5a2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
              mirror_wait_for_any_operation(s, true);
          }

-        if (s->ret < 0) {
+        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
              ret = s->ret;
              goto immediate_exit;
          }
@@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
              break;
          }

-        ret = 0;
-
          if (job_is_ready(&s->common.job) && !should_complete) {
              delay_ns = (s->in_flight == 0 &&
                          cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
@@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
          trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
                                    delay_ns);
          job_sleep_ns(&s->common.job, delay_ns);
-        if (job_is_cancelled(&s->common.job)) {
-            break;
-        }

I think it was intentional that the check is here because it means
skipping the job_sleep_ns() and instead cancelling immediately, and we
probably still want that. Between your check above and here, the
coroutine can yield, so cancellation could have been newly requested.

I’m afraid I don’t quite understand. If cancel is requested in
job_sleep_ns(), then we will go back to the top of the loop, wait for
in-flight active requests and then break. Waiting for the in-flight
requests seems unnecessary, but does it really make a difference in
practice? We don’t start new requests, so it should be legal to wait
for existing ones to settle, and also I believe someone will have to
wait for those in-flight requests anyway (when the mirror top node is
removed). (The only thing we could do is to cancel the in-flight
requests, but that is what mirror_cancel() does.)

Looking more at the whole loop, there are a couple of places that can
yield. Of course we can check whether the job has been cancelled after
every single one of them, but that would be a bit strange. We only
really need to check before we initiate new requests or want to change
the state. I believe the right place to do the check would be after the
job_pause_point().
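
To illustrate the placement argument, here is a toy model of the loop. It
is only a sketch and not QEMU code: ToyJob, toy_yield() and toy_run() are
hypothetical names, and the two toy_yield() calls merely stand in for
job_pause_point() and job_sleep_ns(). It assumes that a cancel request can
only become visible at a yield point.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the job state; NOT QEMU's MirrorBlockJob. */
typedef struct ToyJob {
    bool cancelled;        /* set once the (simulated) cancel request arrives */
    int cancel_at_iter;    /* loop iteration at which the cancel arrives */
    bool flush_fails;      /* models mirror_flush() failing persistently */
    int requests_started;  /* how many new requests the loop initiated */
} ToyJob;

/* Models a yield point: a cancel request may become visible here. */
static void toy_yield(ToyJob *job, int iter)
{
    if (iter >= job->cancel_at_iter) {
        job->cancelled = true;
    }
}

/*
 * Runs the toy main loop for at most max_iters iterations.
 * check_early == true:  test for cancellation right after the first yield,
 *                       before any `continue` and before new requests.
 * check_early == false: test only after the sleep at the end of the loop.
 * Returns the iteration at which the loop stopped, or -1 if it never did.
 */
static int toy_run(ToyJob *job, bool check_early, int max_iters)
{
    for (int iter = 0; iter < max_iters; iter++) {
        toy_yield(job, iter);           /* stands in for job_pause_point() */

        if (check_early && job->cancelled) {
            return iter;
        }

        if (job->flush_fails) {
            continue;                   /* error path skips the late check */
        }

        job->requests_started++;        /* initiate a new request */

        toy_yield(job, iter);           /* stands in for job_sleep_ns() */
        if (!check_early && job->cancelled) {
            return iter;
        }
    }
    return -1;
}
```

In this model, with flush_fails set and only the late check, toy_run()
never stops within max_iters, while the early check stops it in the
iteration where the cancel arrives. Without flush failures, both variants
stop in the same iteration, but the late check starts one more request
after the cancel has already been requested.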

And perhaps the active write functions (bdrv_mirror_top_do_write() and
bdrv_mirror_top_pwritev()) should stop copying to the target if the job
has been cancelled.

Max

So have the check in both places, I guess? And a comment to explain why
neither is redundant.

          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
      }

Kevin
