On 10/04/2016 12:34 PM, Kevin Wolf wrote: > Am 04.10.2016 um 11:23 hat Stefan Hajnoczi geschrieben: >> On Mon, Oct 03, 2016 at 02:07:34PM -0400, John Snow wrote: >>> >>> On 10/03/2016 09:11 AM, Stefan Hajnoczi wrote: >>>> On Fri, Sep 30, 2016 at 09:59:16PM +0300, Vladimir Sementsov-Ogievskiy >>>> wrote: >>>>> On 30.09.2016 20:11, Vladimir Sementsov-Ogievskiy wrote: >>>>>> Hi all! >>>>>> >>>>>> Please, can somebody explain me, why we fail guest request in case of io >>>>>> error in write notifier? I think guest consistency is more important >>>>>> than success of unfinished backup. Or, what am I missing? >>>>>> >>>>>> I'm saying about this code: >>>>>> >>>>>> static int coroutine_fn backup_before_write_notify( >>>>>> NotifierWithReturn *notifier, >>>>>> void *opaque) >>>>>> { >>>>>> BackupBlockJob *job = container_of(notifier, BackupBlockJob, >>>>>> before_write); >>>>>> BdrvTrackedRequest *req = opaque; >>>>>> int64_t sector_num = req->offset >> BDRV_SECTOR_BITS; >>>>>> int nb_sectors = req->bytes >> BDRV_SECTOR_BITS; >>>>>> >>>>>> assert(req->bs == blk_bs(job->common.blk)); >>>>>> assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0); >>>>>> assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0); >>>>>> >>>>>> return backup_do_cow(job, sector_num, nb_sectors, NULL, true); >>>>>> } >>>>>> >>>>>> So, what about something like >>>>>> >>>>>> ret = backup_do_cow(job, ... >>>>>> if (ret < 0 && job->notif_ret == 0) { >>>>>> job->notif_ret = ret; >>>>>> } >>>>>> >>>>>> return 0; >>>>>> >>>>>> and fail block job if notif_ret < 0 in other places of backup code? >>>>>> >>>>> And second question about notifiers in backup block job. If block job is >>>>> paused, notifiers still works and can copy data. Is it ok? So, user thinks >>>>> that job is paused, so he can do something with target disk.. But really, >>>>> this 'something' will race with write-notifiers. So, what assumptions may >>>>> user actually have about paused backup job? Is there any agreements? Also, >>>>> on query-block-jobs we will see job.busy = false, when actually >>>>> copy-on-write may be in flight.. >>>> I agree that the job should fail and the guest continues running. >>>> >>>> The backup job cannot do the usual ENOSPC stop/resume error handling >>>> since we lose snapshot consistency once guest writes are allowed to >>>> proceed. Backup errors need to be fatal, resuming is usually not >>>> possible. The user will have to retry the backup operation. >>>> >>>> Stefan >>>> >>> If we fail and intercept the error for the backup write and HALT at that >>> point, why would we lose consistency? If the backup write failed before we >>> allowed the guest write to proceed, that data should still be there on disk, >>> no? >> I missed that there are two separate error handling approaches used in >> block/backup.c: >> >> 1. In the write notifier I/O errors are treated as if the guest write >> failed. >> >> 2. In the backup_run() loop I/O errors affect the block job's error >> status. >> >> I was thinking of case #2 instead of case #1. >> >>> Eh, regardless: If we're not using a STOP policy, it seems like the right >>> thing to do is definitely to just fail the backup instead of failing the >>> write. >> Even with a -drive werror=stop policy the user probably doesn't want >> guest downtime if writing to the backup target fails. > That's a policy decision that ultimately only the user can make. For one > user, it might be preferable to cancel the backup and keep the VM > running, but for another user it may be more important to keep a > consistent snapshot of the point in time when the backup job was started > than keeping the VM running. > > Kevin In this case policy for guest error and policy for backup error should be different policies or I have missed something.
Den