On Tue, May 05, 2020 at 12:37:26PM +0800, Alex Williamson wrote: > On Tue, 5 May 2020 04:49:10 +0530 > Kirti Wankhede <kwankh...@nvidia.com> wrote: > > > On 3/26/2020 2:32 AM, Alex Williamson wrote: > > > On Wed, 25 Mar 2020 02:39:06 +0530 > > > Kirti Wankhede <kwankh...@nvidia.com> wrote: > > > > > >> Define flags to be used as delimeter in migration file stream. > > >> Added .save_setup and .save_cleanup functions. Mapped & unmapped > > >> migration > > >> region from these functions at source during saving or pre-copy phase. > > >> Set VFIO device state depending on VM's state. During live migration, VM > > >> is > > >> running when .save_setup is called, _SAVING | _RUNNING state is set for > > >> VFIO > > >> device. During save-restore, VM is paused, _SAVING state is set for VFIO > > >> device. > > >> > > >> Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com> > > >> Reviewed-by: Neo Jia <c...@nvidia.com> > > >> --- > > >> hw/vfio/migration.c | 76 > > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > >> hw/vfio/trace-events | 2 ++ > > >> 2 files changed, 78 insertions(+) > > >> > > >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > > >> index 22ded9d28cf3..033f76526e49 100644 > > >> --- a/hw/vfio/migration.c > > >> +++ b/hw/vfio/migration.c > > >> @@ -8,6 +8,7 @@ > > >> */ > > >> > > >> #include "qemu/osdep.h" > > >> +#include "qemu/main-loop.h" > > >> #include <linux/vfio.h> > > >> > > >> #include "sysemu/runstate.h" > > >> @@ -24,6 +25,17 @@ > > >> #include "pci.h" > > >> #include "trace.h" > > >> > > >> +/* > > >> + * Flags used as delimiter: > > >> + * 0xffffffff => MSB 32-bit all 1s > > >> + * 0xef10 => emulated (virtual) function IO > > >> + * 0x0000 => 16-bits reserved for flags > > >> + */ > > >> +#define VFIO_MIG_FLAG_END_OF_STATE (0xffffffffef100001ULL) > > >> +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE (0xffffffffef100002ULL) > > >> +#define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) > > >> +#define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) > > >> + > > >> static void vfio_migration_region_exit(VFIODevice *vbasedev) > > >> { > > >> VFIOMigration *migration = vbasedev->migration; > > >> @@ -126,6 +138,69 @@ static int vfio_migration_set_state(VFIODevice > > >> *vbasedev, uint32_t mask, > > >> return 0; > > >> } > > >> > > >> +/* > > >> ---------------------------------------------------------------------- */ > > >> + > > >> +static int vfio_save_setup(QEMUFile *f, void *opaque) > > >> +{ > > >> + VFIODevice *vbasedev = opaque; > > >> + VFIOMigration *migration = vbasedev->migration; > > >> + int ret; > > >> + > > >> + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); > > >> + > > >> + if (migration->region.mmaps) { > > >> + qemu_mutex_lock_iothread(); > > >> + ret = vfio_region_mmap(&migration->region); > > >> + qemu_mutex_unlock_iothread(); > > >> + if (ret) { > > >> + error_report("%s: Failed to mmap VFIO migration region %d: > > >> %s", > > >> + vbasedev->name, migration->region.index, > > >> + strerror(-ret)); > > >> + return ret; > > >> + } > > >> + } > > >> + > > >> + ret = vfio_migration_set_state(vbasedev, ~0, > > >> VFIO_DEVICE_STATE_SAVING); > > >> + if (ret) { > > >> + error_report("%s: Failed to set state SAVING", vbasedev->name); > > >> + return ret; > > >> + } > > >> + > > >> + /* > > >> + * Save migration region size. This is used to verify migration > > >> region size > > >> + * is greater than or equal to migration region size at destination > > >> + */ > > >> + qemu_put_be64(f, migration->region.size); > > > > > > Is this requirement supported by the uapi? > > > > Yes, on UAPI thread we discussed this: > > > > * For the user application, data is opaque. The user application > > should write > > * data in the same order as the data is received and the data should be of > > * same transaction size at the source. > > > > data should be same transaction size, so migration region size should be > > greater than or equal to the size at source when verifying at destination. > > We are that user application for which the data is opaque, therefore we > should make no assumptions about how the vendor driver makes use of > their region. If we get a transaction that exceeds the end of the > region, I agree, that would be an error. But we have no business > predicting that such a transaction might occur if the vendor driver > indicates it can support the migration. > > > > The vendor driver operates > > > within the migration region, but it has no requirement to use the full > > > extent of the region. Shouldn't we instead insert the version string > > > from versioning API Yan proposed? Is this were we might choose to use > > > an interface via the vfio API rather than sysfs if we had one? > > > > > > > VFIO API cannot be used by libvirt or management tool stack. We need > > sysfs as Yan proposed to be used by libvirt or management tool stack. > > It's been a long time, but that doesn't seem like what I was asking. > The sysfs version checking is used to select a target that is likely to > succeed, but the migration stream is still generated by a user and the > vendor driver is still ultimately responsible for validating that > stream. I would hope that a vendor migration stream therefore starts > with information similar to that found in the sysfs interface, allowing > the receiving vendor driver to validate the source device and vendor > software version, such that we can fail an incoming migration that the > vendor driver deems incompatible. Ideally the vendor driver might also > include consistency and sequence checking throughout the stream to > prevent a malicious user from exploiting the internal operation of the > vendor driver. Thanks, > maybe we can add a rw field migration_version in struct vfio_device_migration_info besides sysfs interface ?
when reading it in src, it gets the same string as that from sysfs; when writing it in target, it returns success or not to check compatibility and fails the migration early in setup phase. Thanks Yan.