On Tue, 27 Jun 2023 11:00:46 +0300 Avihai Horon <avih...@nvidia.com> wrote:
> On 26/06/2023 20:27, Alex Williamson wrote: > > External email: Use caution opening links or attachments > > > > > > On Mon, 26 Jun 2023 17:26:42 +0200 > > Cédric Le Goater <c...@redhat.com> wrote: > > > >> On 6/26/23 15:40, Joao Martins wrote: > >>> On 26/06/2023 14:20, Cédric Le Goater wrote: > >>>> Hello Avihai, > >>>> > >>>> On 6/26/23 10:23, Avihai Horon wrote: > >>>>> The major parts of VFIO migration are supported today in QEMU. This > >>>>> includes basic VFIO migration, device dirty page tracking and precopy > >>>>> support. > >>>>> > >>>>> Thus, at this point in time, it seems appropriate to make VFIO migration > >>>>> non-experimental: remove the x prefix from enable_migration property, > >>>>> change it to ON_OFF_AUTO and let the default value be AUTO. > >>>>> > >>>>> In addition, make the following adjustments: > >>>>> 1. Require device dirty tracking support when enable_migration is AUTO > >>>>> (i.e., not explicitly enabled). This is because device dirty > >>>>> tracking > >>>>> is currently the only method to do dirty page tracking, which is > >>>>> essential for migrating in a reasonable downtime. > >>>> hmm, I don't think QEMU should decide to disable a feature for all > >>>> devices supposedly because it could be slow for some. That's too > >>>> restrictive. What about devices with have small states ? for which > >>>> the downtime would be reasonable even without device dirty tracking > >>>> support. > >>>> > >>> device dirty tracking refers to the ability to tracking dirty IOVA used > >>> by the > >>> device which will DMA into RAM. It is required because the > >>> consequence/alternative is to transfer all RAM in stop copy phase. Device > >>> state > >>> size at that point is the least of our problems downtime wise. > >> Arg. thanks for reminding me. I tend to take this for granted ... > > My initial reaction was the same, for instance we could have a non-DMA > > device (ex. PCI serial card) that supports migration w/o dirty > > tracking, but QEMU cannot assume the device doesn't do DMA, nor is it > > worth our time to monitor whether bus-master is ever enabled for the > > device, so QEMU needs to assume all memory that's DMA accessible for > > the device is perpetually dirty. Also, if such a corner case existed > > for a non-DMA migratable device, the easier path is simply to require > > dirty tracking stubs in the variant driver to report no memory dirtied. > > > >>> I can imagine that allowing without dirty tracking is useful for developer > >>> testing of the suspend/device-state flows, but as real default (auto) is > >>> very > >>> questionable to let it go through without dirty tracking. When we have > >>> IOMMUFD > >>> dirty tracking that's when we can relieve this restriction as a default. > >>> > >>> But then note that (...) > >>> > >>>>> Setting > >>>>> enable_migration to ON will not require device dirty tracking. > >>> (...) this lets it ignore dirty tracking as you would like. > >>> > >>> > >>>>> 2. Make migration blocker messages more elaborate. > >>>>> 3. Remove error prints in vfio_migration_query_flags(). > >>>>> 4. Remove a redundant assignment in vfio_migration_realize(). > >>>>> > >>>>> Signed-off-by: Avihai Horon <avih...@nvidia.com> > >>>>> --- > >>>>> include/hw/vfio/vfio-common.h | 2 +- > >>>>> hw/vfio/migration.c | 29 ++++++++++++++++------------- > >>>>> hw/vfio/pci.c | 4 ++-- > >>>>> 3 files changed, 19 insertions(+), 16 deletions(-) > >>>>> > >>>>> diff --git a/include/hw/vfio/vfio-common.h > >>>>> b/include/hw/vfio/vfio-common.h > >>>>> index b4c28f318f..387eabde60 100644 > >>>>> --- a/include/hw/vfio/vfio-common.h > >>>>> +++ b/include/hw/vfio/vfio-common.h > >>>>> @@ -139,7 +139,7 @@ typedef struct VFIODevice { > >>>>> bool needs_reset; > >>>>> bool no_mmap; > >>>>> bool ram_block_discard_allowed; > >>>>> - bool enable_migration; > >>>>> + OnOffAuto enable_migration; > >>>>> VFIODeviceOps *ops; > >>>>> unsigned int num_irqs; > >>>>> unsigned int num_regions; > >>>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > >>>>> index 79eb81dfd7..d8e0848635 100644 > >>>>> --- a/hw/vfio/migration.c > >>>>> +++ b/hw/vfio/migration.c > >>>>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice > >>>>> *vbasedev, uint64_t *mig_flags) > >>>>> feature->argsz = sizeof(buf); > >>>>> feature->flags = VFIO_DEVICE_FEATURE_GET | > >>>>> VFIO_DEVICE_FEATURE_MIGRATION; > >>>>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { > >>>>> - if (errno == ENOTTY) { > >>>>> - error_report("%s: VFIO migration is not supported in > >>>>> kernel", > >>>>> - vbasedev->name); > >>>>> - } else { > >>>>> - error_report("%s: Failed to query VFIO migration support, > >>>>> err: %s", > >>>>> - vbasedev->name, strerror(errno)); > >>>>> - } > >>>>> - > >>>>> return -errno; > >>>>> } > >>>>> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void) > >>>>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp) > >>>>> { > >>>>> - int ret = -ENOTSUP; > >>>>> + int ret; > >>>>> - if (!vbasedev->enable_migration) { > >>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) { > >>>>> + error_setg(&vbasedev->migration_blocker, > >>>>> + "%s: Migration is disabled for VFIO device", > >>>>> vbasedev->name); > >>>>> goto add_blocker; > >>>>> } > >>>>> ret = vfio_migration_init(vbasedev); > >>>>> if (ret) { > >>>> It would be good to keep the message for 'errno == ENOTTY' as it was in > >>>> vfio_migration_query_flags(). When migration fails, it is an important > >>>> information to know that it is because the VFIO PCI host device driver > >>>> doesn't support the feature. The root cause could be deep below in FW or > >>>> how the VF was set up. > >>>> > >>> +1 As I have been in this rabbit hole > > Sure, I will add it. > > >>> > >>>>> + error_setg(&vbasedev->migration_blocker, > >>>>> + "%s: Migration couldn't be initialized for VFIO > >>>>> device, " > >>>>> + "err: %d (%s)", > >>>>> + vbasedev->name, ret, strerror(-ret)); > >>>>> + goto add_blocker; > >>>>> + } > >>>>> + > >>>>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO && > >>>>> + !vbasedev->dirty_pages_supported) { > >>>> I don't agree with this test. > >>>> > >>> The alternative right now is perceptual dirty tracking. How is that OK as > >>> a > >>> default? It's not like we have even an option :( > >>> > >>> Maybe perhaps you refer to avoid strongly enforcing *always* it to allow > >>> testing > >>> of the non dirty tracking parts? Maybe when you 'force' enabling with > >>> enable-migration=on is when you ignore the dirty tracking which is what > >>> this is > >>> doing. > >> > >> I see ON_OFF_AUTO_ON as a way to abort the machine startup while > >> ON_OFF_AUTO_AUTO would let it run but block migration. > > Agreed. There's a little bit of redundancy between the device-level > > "enable-migration=on" option and the global "-only-migratable" option > > relative to preventing machine startup, but it also doesn't make sense > > to me if the device-level option let realize complete successfully if > > the device doesn't support or fails migration setup. So I think we'd > > generally rely on using the -only-migratable option with the default > > ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable > > dis-recommended support, and live with the redundancy that it should > > also cause the device realize to fail if migration is not supported. > > Thanks, > > > > Alex > > OK. > > When enable_migration=AUTO we allow blockers. > When enable_migration=ON we don't allow blockers and instead prevent > realization of VFIO device. > > Regarding device dirty tracking, we keep current behavior, right? > That is: > When enable_migration=AUTO we block migration if device dirty tracking > is not supported. > When enable_migration=ON we allow migration even if device dirty > tracking is not supported (in which case DMA-able memory will be > perpetually dirtied). Yes, and I think we'd be justified in logging a warning when migration is enabled without any dirty page tracking support. Thanks, Alex