On 03.08.21 16:34, Kevin Wolf wrote:
On 26.07.2021 at 16:46, Max Reitz wrote:
We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

A job being force-cancelled should be treated the same as the job having
failed, so put the check in the same place where we check `s->ret < 0`.
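
To illustrate the ordering problem described above, here is a minimal toy model in C. It is not QEMU code: `ToyJob`, `toy_run()`, and all of their fields are hypothetical names made up for this sketch, and the "flush" is just a flag that forces a `continue` on every iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the job state; not QEMU's MirrorBlockJob. */
typedef struct ToyJob {
    bool cancelled;      /* set once a force-cancel has been requested */
    int cancel_at_iter;  /* iteration at which the cancel request arrives */
    bool flush_fails;    /* models a flush that fails on every iteration */
} ToyJob;

/*
 * Runs at most max_iters iterations of a toy main loop.
 * Returns the iteration at which the loop noticed the cancel request,
 * or max_iters if it never noticed it.
 */
static int toy_run(ToyJob *job, bool check_early, int max_iters)
{
    assert(max_iters > 0);

    for (int i = 0; i < max_iters; i++) {
        if (i == job->cancel_at_iter) {
            job->cancelled = true;
        }

        /* Early check: placed before any continue statement. */
        if (check_early && job->cancelled) {
            return i;
        }

        if (job->flush_fails) {
            continue; /* skips everything below, including the late check */
        }

        /* Late check: unreachable for as long as the flush keeps failing. */
        if (!check_early && job->cancelled) {
            return i;
        }
    }

    return max_iters;
}
```

With `flush_fails` set, the late check never runs and the loop only ends when the iteration budget is exhausted, whereas the early check exits on the iteration in which the cancel arrives.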

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462
Signed-off-by: Max Reitz <mre...@redhat.com>
---
  block/mirror.c | 7 +------
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 72e02fa34e..46d1a1e5a2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -993,7 +993,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
              mirror_wait_for_any_operation(s, true);
          }
-        if (s->ret < 0) {
+        if (s->ret < 0 || job_is_cancelled(&s->common.job)) {
              ret = s->ret;
              goto immediate_exit;
          }
@@ -1078,8 +1078,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
              break;
          }
-        ret = 0;
-
          if (job_is_ready(&s->common.job) && !should_complete) {
              delay_ns = (s->in_flight == 0 &&
                          cnt == 0 ? BLOCK_JOB_SLICE_TIME : 0);
@@ -1087,9 +1085,6 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
          trace_mirror_before_sleep(s, cnt, job_is_ready(&s->common.job),
                                    delay_ns);
          job_sleep_ns(&s->common.job, delay_ns);
-        if (job_is_cancelled(&s->common.job)) {
-            break;
-        }
I think it was intentional that the check is here because it means
skipping the job_sleep_ns() and instead cancelling immediately, and we
probably still want that. Between your check above and here, the
coroutine can yield, so cancellation could have been newly requested.

I’m afraid I don’t quite understand.  If cancel is requested in job_sleep_ns(), then we will go back to the top of the loop, wait for in-flight active requests and then break.  Waiting for the in-flight requests seems unnecessary, but does it really make a difference in practice?  We don’t start new requests, so it should be legal to wait for existing ones to settle, and also I believe someone will have to wait for those in-flight requests anyway (when the mirror top node is removed).  (The only thing we could do is to cancel the in-flight requests, but that is what mirror_cancel() does.)

Looking more at the whole loop, there are a couple of places that can yield.  Of course we can check whether the job has been cancelled after every single one of them, but that would be a bit strange.  We only really need to check before we initiate new requests or want to change the state.  I believe the right place to do the check would be after the job_pause_point().

And perhaps the active write functions (bdrv_mirror_top_do_write() and bdrv_mirror_top_pwritev()) should stop copying to the target if the job has been cancelled.
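
To make the difference between the two placements concrete, here is a rough toy model in C. It is only a sketch under simplifying assumptions, not the real mirror_run(): `ToyLoop`, `CheckPolicy`, and `toy_loop_run()` are hypothetical names, each iteration is reduced to "wait for in-flight requests, check, start new requests, sleep", and the cancel request is assumed to arrive during the sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies for where the cancel check is placed. */
typedef enum {
    CHECK_TOP_ONLY,            /* only next to the error check */
    CHECK_TOP_AND_AFTER_SLEEP  /* additionally right after the sleep */
} CheckPolicy;

/* Hypothetical toy state; not QEMU's MirrorBlockJob. */
typedef struct ToyLoop {
    bool cancelled;
    int waits_after_cancel;            /* in-flight waits done after the cancel */
    int requests_started_after_cancel; /* new requests started after the cancel */
} ToyLoop;

/* cancel_iter: iteration during whose sleep the cancel request arrives. */
static void toy_loop_run(ToyLoop *s, CheckPolicy policy, int cancel_iter)
{
    assert(cancel_iter >= 0); /* otherwise this toy loop would never end */

    for (int iter = 0; ; iter++) {
        /* Yield point: wait for in-flight requests to settle. */
        if (s->cancelled) {
            s->waits_after_cancel++;
        }

        /* Check in the same place as the error check. */
        if (s->cancelled) {
            return;
        }

        /* Start new requests; never reached once cancelled. */
        if (s->cancelled) {
            s->requests_started_after_cancel++;
        }

        /* Yield point: sleep; the cancel request may arrive here. */
        if (iter == cancel_iter) {
            s->cancelled = true;
        }

        /* Optional second check right after the sleep. */
        if (policy == CHECK_TOP_AND_AFTER_SLEEP && s->cancelled) {
            return;
        }
    }
}
```

In this model neither policy starts new requests after the cancel; the only difference is that the top-only policy performs one extra wait for in-flight requests before it exits.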

Max

So have the check in both places, I guess? And a comment to explain why
neither is redundant.

          s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
      }
Kevin


