On 7/23/2019 3:20 AM, Yan Zhao wrote:
> On Tue, Jul 23, 2019 at 03:07:13AM +0800, Alex Williamson wrote:
>> On Sun, 21 Jul 2019 23:20:28 -0400
>> Yan Zhao <yan.y.z...@intel.com> wrote:
>>
>>> On Fri, Jul 19, 2019 at 03:00:13AM +0800, Kirti Wankhede wrote:
>>>>
>>>>
>>>> On 7/12/2019 8:22 AM, Yan Zhao wrote:
>>>>> On Tue, Jul 09, 2019 at 05:49:17PM +0800, Kirti Wankhede wrote:
>>>>>> Flow during _RESUMING device state:
>>>>>> - If Vendor driver defines mappable region, mmap migration region.
>>>>>> - Load config state.
>>>>>> - For data packet, till VFIO_MIG_FLAG_END_OF_STATE is not reached
>>>>>>     - read data_size from packet, read buffer of data_size
>>>>>>     - read data_offset from where QEMU should write data.
>>>>>>         if region is mmaped, write data of data_size to mmaped region.
>>>>>>     - write data_size.
>>>>>>         In case of mmapped region, write to data_size indicates kernel
>>>>>>         driver that data is written in staging buffer.
>>>>>>     - if region is trapped, pwrite() data of data_size from data_offset.
>>>>>> - Repeat above until VFIO_MIG_FLAG_END_OF_STATE.
>>>>>> - Unmap migration region.
>>>>>>
>>>>>> For user, data is opaque. User should write data in the same order as
>>>>>> received.
>>>>>>
>>>>>> Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com>
>>>>>> Reviewed-by: Neo Jia <c...@nvidia.com>
>>>>>> ---
>>>>>>  hw/vfio/migration.c  | 162 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  hw/vfio/trace-events |   3 +
>>>>>>  2 files changed, 165 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>>>>> index 4e9b4cce230b..5fb4c5329ede 100644
>>>>>> --- a/hw/vfio/migration.c
>>>>>> +++ b/hw/vfio/migration.c
>>>>>> @@ -249,6 +249,26 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
>>>>>>      return qemu_file_get_error(f);
>>>>>>  }
>>>>>>
>>>>>> +static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
>>>>>> +{
>>>>>> +    VFIODevice *vbasedev = opaque;
>>>>>> +    uint64_t data;
>>>>>> +
>>>>>> +    if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
>>>>>> +        vbasedev->ops->vfio_load_config(vbasedev, f);
>>>>>> +    }
>>>>>> +
>>>>>> +    data = qemu_get_be64(f);
>>>>>> +    if (data != VFIO_MIG_FLAG_END_OF_STATE) {
>>>>>> +        error_report("%s: Failed loading device config space, "
>>>>>> +                     "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
>>>>>> +        return -EINVAL;
>>>>>> +    }
>>>>>> +
>>>>>> +    trace_vfio_load_device_config_state(vbasedev->name);
>>>>>> +    return qemu_file_get_error(f);
>>>>>> +}
>>>>>> +
>>>>>>  /* ---------------------------------------------------------------------- */
>>>>>>
>>>>>>  static int vfio_save_setup(QEMUFile *f, void *opaque)
>>>>>> @@ -421,12 +441,154 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>>>>>      return ret;
>>>>>>  }
>>>>>>
>>>>>> +static int vfio_load_setup(QEMUFile *f, void *opaque)
>>>>>> +{
>>>>>> +    VFIODevice *vbasedev = opaque;
>>>>>> +    VFIOMigration *migration = vbasedev->migration;
>>>>>> +    int ret = 0;
>>>>>> +
>>>>>> +    if (migration->region.buffer.mmaps) {
>>>>>> +        ret = vfio_region_mmap(&migration->region.buffer);
>>>>>> +        if (ret) {
>>>>>> +            error_report("%s: Failed to mmap VFIO migration region %d: %s",
>>>>>> +                         vbasedev->name, migration->region.index,
>>>>>> +                         strerror(-ret));
>>>>>> +            return ret;
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>> +    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING);
>>>>>> +    if (ret) {
>>>>>> +        error_report("%s: Failed to set state RESUMING", vbasedev->name);
>>>>>> +    }
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int vfio_load_cleanup(void *opaque)
>>>>>> +{
>>>>>> +    vfio_save_cleanup(opaque);
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
>>>>>> +{
>>>>>> +    VFIODevice *vbasedev = opaque;
>>>>>> +    VFIOMigration *migration = vbasedev->migration;
>>>>>> +    int ret = 0;
>>>>>> +    uint64_t data, data_size;
>>>>>> +
>>>>> I think checking of version_id is still needed.
>>>>>
>>>>
>>>> Checking version_id with what value?
>>>>
>>> this version_id passed-in is the source VFIO software interface id.
>>> need to check it with the value in target side, right?
>>>
>>> Though we previously discussed the sysfs node interface to check live
>>> migration version even before launching live migration, I think we still
>>> need this runtime software version check in qemu to ensure software
>>> interfaces in QEMU VFIO are compatible.
>>
>> Do we want QEMU to interact directly with sysfs for that, which would
>> require write privileges to sysfs, or do we want to suggest that vendor
>> drivers should include equivalent information early in their migration
>> data stream to force a migration failure as early as possible for
>> incompatible data?  I think we need the latter regardless because the
>> vendor driver should never trust userspace like that, but does that
>> make any QEMU use of the sysfs version test itself redundant?  Thanks,
>>
>> Alex
>
> hi Alex
> I think QEMU needs to check at least the code version of software interface in
> QEMU, like format of migration region, details of migration protocol,
> IOW, the software version QEMU interacts with vendor driver.
> This information should not be known to vendor driver until migration
> running to certain phase.
> e.g. if saving flow or format in source qemu is changed a little as a result
> of software upgrading, target qemu has to detect that from this
> version_id check, as vendor driver has no knowledge of that.
> Does that make sense?
>

That is already done in qemu_loadvm_section_start_full()

    /* Validate version */
    if (version_id > se->version_id) {
        error_report("savevm: unsupported version %d for '%s' v%d",
                     version_id, idstr, se->version_id);
        return -EINVAL;
    }
    se->load_version_id = version_id;

Thanks,
Kirti
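
To make the behaviour of that check concrete, here is a minimal standalone sketch of the same comparison outside QEMU. It is only an illustration: struct save_state_entry and validate_section_version() are hypothetical stand-ins for QEMU's SaveStateEntry and the inline check quoted above, and fprintf() replaces error_report().

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for QEMU's SaveStateEntry; only the fields used here. */
struct save_state_entry {
    const char *idstr;      /* section name, used in the error message */
    int version_id;         /* newest version the destination understands */
    int load_version_id;    /* version announced by the source stream */
};

/*
 * Hypothetical helper mirroring the quoted check: reject a stream whose
 * version is newer than the destination supports, otherwise record the
 * source version so the load handler can see it later.
 */
static int validate_section_version(struct save_state_entry *se, int version_id)
{
    if (version_id > se->version_id) {
        fprintf(stderr, "savevm: unsupported version %d for '%s' v%d\n",
                version_id, se->idstr, se->version_id);
        return -EINVAL;
    }
    se->load_version_id = version_id;
    return 0;
}
```

Note that this comparison only rejects a source that is newer than the destination. An older or equal version is accepted and recorded, so any finer-grained compatibility decision would still be left to the load handler that receives version_id.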