Re: [Qemu-block] backup notifier fail policy

John Snow Mon, 03 Oct 2016 11:08:21 -0700


On 10/03/2016 09:11 AM, Stefan Hajnoczi wrote:

On Fri, Sep 30, 2016 at 09:59:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 30.09.2016 20:11, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

Could somebody please explain why we fail the guest request when an I/O
error occurs in the write notifier? I think guest consistency is more
important than the success of an unfinished backup. Or am I missing something?

I'm talking about this code:

static int coroutine_fn backup_before_write_notify(
        NotifierWithReturn *notifier,
        void *opaque)
{
    BackupBlockJob *job = container_of(notifier, BackupBlockJob,
                                       before_write);
    BdrvTrackedRequest *req = opaque;
    int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
    int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;

    assert(req->bs == blk_bs(job->common.blk));
    assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
    assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);

    return backup_do_cow(job, sector_num, nb_sectors, NULL, true);
}

So, what about something like

ret = backup_do_cow(job, ...
if (ret < 0 && job->notif_ret == 0) {
    job->notif_ret = ret;
}

return 0;

and failing the block job elsewhere in the backup code if notif_ret < 0?


And a second question about notifiers in the backup block job. If the block
job is paused, the notifiers still work and can copy data. Is that OK? The user
thinks the job is paused, so he can do something with the target disk. But
really, this 'something' will race with the write notifiers. So what
assumptions may the user actually make about a paused backup job? Are there any
agreements? Also, on query-block-jobs we will see job.busy = false when a
copy-on-write may actually be in flight.


I agree that the job should fail and the guest continues running.

The backup job cannot do the usual ENOSPC stop/resume error handling
since we lose snapshot consistency once guest writes are allowed to
proceed.  Backup errors need to be fatal, resuming is usually not
possible.  The user will have to retry the backup operation.

Stefan

If we fail and intercept the error for the backup write and HALT at that
point, why would we lose consistency? If the backup write failed before
we allowed the guest write to proceed, that data should still be there
on disk, no?

I guess it is a little messier than the usual STOP case, but it doesn't
seem inherently impossible to me...

Eh, regardless: If we're not using a STOP policy, it seems like the
right thing to do is definitely to just fail the backup instead of
failing the write.
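
To make the "fail the backup, not the write" idea concrete, here is a
minimal standalone sketch. It is not QEMU code: SketchBackupJob,
SketchCowFn and every sketch_* function are hypothetical stand-ins, and
only the notif_ret field name is taken from the proposal quoted above.
The hook records the first copy-on-write error and always returns 0, so
the guest write proceeds; the job is then expected to check the recorded
error and fail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for BackupBlockJob; only the field needed here. */
typedef struct SketchBackupJob {
    int notif_ret;  /* first COW error seen by the notifier, 0 if none */
} SketchBackupJob;

/* Hypothetical stand-in for backup_do_cow(): 0 or a negative errno. */
typedef int (*SketchCowFn)(void *opaque);

/* Before-write hook: remember the first failure, never fail the guest. */
static int sketch_before_write_notify(SketchBackupJob *job,
                                      SketchCowFn cow, void *opaque)
{
    int ret;

    assert(job != NULL && cow != NULL);
    ret = cow(opaque);
    if (ret < 0 && job->notif_ret == 0) {
        job->notif_ret = ret;   /* keep only the first error */
    }
    return 0;                   /* the guest write always proceeds */
}

/* Checked by the job itself: a recorded error is fatal for the backup. */
static int sketch_backup_should_fail(const SketchBackupJob *job)
{
    return job->notif_ret < 0;
}

/* Fake copy-on-write outcomes, for illustration only. */
static int sketch_cow_ok(void *opaque)     { (void)opaque; return 0; }
static int sketch_cow_eio(void *opaque)    { (void)opaque; return -EIO; }
static int sketch_cow_enospc(void *opaque) { (void)opaque; return -ENOSPC; }
```

Where exactly the job would poll the recorded error (main copy loop,
completion path, or both) is left open here, as it is in the proposal.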

As for paused guarantees... good point. If you want to truly pause a
backup job, I think you necessarily begin accruing a backlog of data
that needs to get written back out. Maybe it's not easily possible to
truly pause a backup block job.

I'm not exactly sure what we should do about it, though I do know that
eventually we want to replace write notifiers with block filters, but
even those would likely remain operating during a pause.

'busy' means something very specific within QEMU, but perhaps the query
function can be adjusted to return 'true' for busy as long as either the
job is running OR it has latent portions still running (write notifiers,
block filters, etc.)
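
Roughly what I have in mind for the query side, as a standalone sketch
rather than a patch. SketchJobState, the inflight_cow counter and the
sketch_* helpers are all hypothetical; nothing here claims to match how
QEMU tracks in-flight notifier work today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job state; not a QEMU structure. */
typedef struct SketchJobState {
    bool main_loop_busy;    /* what 'busy' would report on its own */
    unsigned inflight_cow;  /* notifier copy-on-write ops in flight */
} SketchJobState;

/* Called when a write notifier starts copying data. */
static void sketch_cow_begin(SketchJobState *s)
{
    s->inflight_cow++;
}

/* Called when that copy completes, successfully or not. */
static void sketch_cow_end(SketchJobState *s)
{
    assert(s->inflight_cow > 0);
    s->inflight_cow--;
}

/* Busy if the job is running OR any latent copy is still in flight. */
static bool sketch_query_busy(const SketchJobState *s)
{
    return s->main_loop_busy || s->inflight_cow > 0;
}
```

With this, a paused job would still report busy while a notifier copy is
outstanding, and go back to not-busy once the last one finishes.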


--js
