* Daniel P. Berrangé (berra...@redhat.com) wrote: > On Mon, Feb 15, 2021 at 07:01:28PM +0000, Dr. David Alan Gilbert wrote: > > * Daniel P. Berrangé (berra...@redhat.com) wrote: > > > On Mon, Feb 15, 2021 at 06:38:05PM +0000, Dr. David Alan Gilbert wrote: > > > > One thing to check, and I *think* you're OK, but we have one place where > > > > we actually check the error number: > > > > > > > > migration.c: > > > > 3414 static MigThrError migration_detect_error(MigrationState *s) > > > > ... > > > > 3426 /* Try to detect any file errors */ > > > > 3427 ret = qemu_file_get_error_obj(s->to_dst_file, &local_error); > > > > 3428 if (!ret) { > > > > 3429 /* Everything is fine */ > > > > 3430 assert(!local_error); > > > > 3431 return MIG_THR_ERR_NONE; > > > > 3432 } > > > > 3433 > > > > 3434 if (local_error) { > > > > 3435 migrate_set_error(s, local_error); > > > > 3436 error_free(local_error); > > > > 3437 } > > > > 3438 > > > > 3439 if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) { > > > > 3440 /* > > > > 3441 * For postcopy, we allow the network to be down for a > > > > 3442 * while. After that, it can be continued by a > > > > 3443 * recovery phase. > > > > 3444 */ > > > > 3445 return postcopy_pause(s); > > > > 3446 } else { > > > > > > > > This is to go into postcopy pause if the network connection broke (but > > > > not if for example a device moaned about being in an invalid state) > > > > > > > > If I read this correctly, file errors are still being preserved - is > > > > that correct? > > > > > > Yes, in places where QemuFile is reporting an actual I/O error I've > > > tried to preserve that. Only removed setting of fake I/O errors. So > > > if anything, we ought to get more accurate at detecting the recoverable > > > scenarios once we fully cleanup errors. > > > > OK, good. > > One scenario to possibly check though is that in a few places we used > error_report_err() but didn't immediately return an error code back to > the caller, instead carrying on doing other calls. It is possible that > we thus reported an error about bad data, and then later hit the EIO > check for QemuFile.
That's generally OK; it gets pretty painful to do the qemu file checks after every read. Dave > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK