Re: [PATCH 00/33] migration: capture error reports into Error object

Dr. David Alan Gilbert Tue, 16 Feb 2021 11:32:30 -0800

* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Mon, Feb 15, 2021 at 07:01:28PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > On Mon, Feb 15, 2021 at 06:38:05PM +0000, Dr. David Alan Gilbert wrote:
> > > > One thing to check, and I *think* you're OK, but we have one place where
> > > > we actually check the error number:
> > > > 
> > > > migration.c:
> > > > 3414 static MigThrError migration_detect_error(MigrationState *s)
> > > > ...
> > > > 3426     /* Try to detect any file errors */
> > > > 3427     ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
> > > > 3428     if (!ret) {
> > > > 3429         /* Everything is fine */
> > > > 3430         assert(!local_error);
> > > > 3431         return MIG_THR_ERR_NONE;
> > > > 3432     }
> > > > 3433 
> > > > 3434     if (local_error) {
> > > > 3435         migrate_set_error(s, local_error);
> > > > 3436         error_free(local_error);
> > > > 3437     }
> > > > 3438 
> > > > 3439     if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> > > > 3440         /*
> > > > 3441          * For postcopy, we allow the network to be down for a
> > > > 3442          * while. After that, it can be continued by a
> > > > 3443          * recovery phase.
> > > > 3444          */
> > > > 3445         return postcopy_pause(s);
> > > > 3446     } else {
> > > > 
> > > > This is to go into postcopy pause if the network connection broke (but
> > > > not if for example a device moaned about being in an invalid state)
> > > > 
> > > > If I read this correctly, file errors are still being preserved - is
> > > > that correct?
> > > 
> > > Yes, in places where QemuFile is reporting an actual I/O error I've
> > > tried to preserve that. Only removed setting of fake I/O errors. So
> > > if anything, we ought to get more accurate at detecting the recoverable
> > > scenarios once we fully cleanup errors.
> > 
> > OK, good.
> 
> One scenario to possibly check though is that in a few places we used
> error_report_err() but didn't immediately return an error code back to
> the caller, instead carrying on doing other calls. It is possible that
> we thus reported an error about bad data, and then later hit the EIO
> check for QemuFile.


That's generally OK; it gets pretty painful to do the qemu file checks
after every read.

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH 00/33] migration: capture error reports into Error object

Reply via email to