On 08/09/2023 13:05, Joao Martins wrote:
> Add an option 'x-migration-iommu-pt' to VFIO that relaxes whether
> vIOMMU usage blocks migration. The current behaviour is kept, and we
> block migration in the following conditions:
>
> * By default, if the guest does try to use the vIOMMU, migration is
> blocked when migration is attempted, just like having the migration
> blocker in the first place [current behaviour].
>
> * Migration starts with no vIOMMU mappings, but the guest kexec's
> itself with the IOMMU on ('iommu=on intel_iommu=on') and ends up
> using the vIOMMU. Here we cancel the migration with an error message
> [added behaviour].
>
> This is meant to be used for older VM (< 5.10) cases where we can
> relax the usage: the IOMMU is passed in for the sole need of
> interrupt remapping, and the guest is old enough not to check for DMA
> translation services while probing its IOMMU devices [0]. The option
> is useful for managed VMs where you *steer* some of the guest
> behaviour and you know you won't use it for more than interrupt
> remapping.
>
> [0]
> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.mart...@oracle.com/
>
> The default is 'disabled' for this option, given that the second
> bullet point above depends on guest behaviour (thus
> non-deterministic). But let the user enable it if they can tolerate
> migration failures.
>
> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com>
> ---
> Followup from the discussion here:
> https://lore.kernel.org/qemu-devel/d5d30f58-31f0-1103-6956-377de34a7...@redhat.com/
>
> This is a smaller (and simpler) take than [0], but is likely the only
> option when thinking of old guests, or managed guests that only want
> to use the vIOMMU for interrupt remapping. The work in [0] has
> stronger 'migration will work' guarantees (except, of course, for the
> usual non-convergence or network failures that are agnostic to the
> vIOMMU), and is a bit better at limiting what the guest can do. But
> it also depends on slightly recent guests.
> I think both are useful.
>
> About the patch itself:
>
> * Cancelling migration was done via vfio_set_migration_error(), but
> I can always use migrate_cancel() if migration is active, or add
> a migration blocker when it's not active.

Are folks against or in favor of the idea presented here, before I go
and make this small improvement?
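For concreteness, here is roughly how the new property would be used on
the QEMU command line. This is a sketch only: the machine options, vCPU
count, and host BDF are made-up placeholders, with `kernel-irqchip=split`
and `caching-mode=on` reflecting the usual requirements for vfio-pci
behind an intel-iommu with interrupt remapping:

```shell
# Hypothetical invocation: vIOMMU present solely for interrupt remapping
# (e.g. a large vCPU count), with the VFIO device opting into the
# relaxed migration blocker via x-migration-iommu-pt=on.
qemu-system-x86_64 \
    -machine q35,kernel-irqchip=split \
    -smp 288 \
    -device intel-iommu,intremap=on,caching-mode=on \
    -device vfio-pci,host=0000:3b:00.0,x-migration-iommu-pt=on
```

Migration then proceeds as long as the guest never establishes vIOMMU
mappings; if it does (e.g. after a kexec with 'intel_iommu=on'),
migration fails to start or gets cancelled, as described in the commit
message.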
It is the only way I can think of for old guests using the vIOMMU (for
the interrupt remapping case). At the same time, it still
blocks/interrupts migration with the vIOMMU, except that migration is
only really blocked when the guest actually tries to set up a mapping.
Hence why I was thinking of enabling it by default, but having it
optionally on (as is) is great too. The naming could probably be
better, but I couldn't figure out a better name.

> ---
>  include/hw/vfio/vfio-common.h |  2 ++
>  hw/vfio/common.c              | 66 +++++++++++++++++++++++++++++++++++
>  hw/vfio/migration.c           |  7 +++-
>  hw/vfio/pci.c                 |  2 ++
>  4 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index e9b895459534..95ef386af45f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -140,6 +140,7 @@ typedef struct VFIODevice {
>      bool no_mmap;
>      bool ram_block_discard_allowed;
>      OnOffAuto enable_migration;
> +    bool iommu_passthrough;
>      VFIODeviceOps *ops;
>      unsigned int num_irqs;
>      unsigned int num_regions;
> @@ -227,6 +228,7 @@ extern VFIOGroupList vfio_group_list;
>  bool vfio_mig_active(void);
>  int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
>  void vfio_unblock_multiple_devices_migration(void);
> +bool vfio_devices_all_iommu_passthrough(void);
>  bool vfio_viommu_preset(VFIODevice *vbasedev);
>  int64_t vfio_mig_bytes_transferred(void);
>  void vfio_reset_bytes_transferred(void);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 134649226d43..4adf9fec08f1 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -433,6 +433,22 @@ void vfio_unblock_multiple_devices_migration(void)
>      multiple_devices_migration_blocker = NULL;
>  }
>
> +bool vfio_devices_all_iommu_passthrough(void)
> +{
> +    VFIODevice *vbasedev;
> +    VFIOGroup *group;
> +
> +    QLIST_FOREACH(group, &vfio_group_list, next) {
> +        QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +            if (!vbasedev->iommu_passthrough) {
> +                return false;
> +            }
> +        }
> +    }
> +
> +    return true;
> +}
> +
>  bool vfio_viommu_preset(VFIODevice *vbasedev)
>  {
>      return vbasedev->group->container->space->as != &address_space_memory;
> @@ -1194,6 +1210,18 @@ static void vfio_listener_region_add(MemoryListener *listener,
>              goto fail;
>          }
>          QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> +
> +        /*
> +         * Any attempt to make vIOMMU mappings will fail the live
> +         * migration.
> +         */
> +        if (vfio_devices_all_iommu_passthrough()) {
> +            MigrationState *ms = migrate_get_current();
> +
> +            if (migration_is_setup_or_active(ms->state)) {
> +                vfio_set_migration_error(-EOPNOTSUPP);
> +            }
> +        }
> +
>          memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
>
>          return;
> @@ -1628,6 +1656,44 @@ static int vfio_devices_dma_logging_start(VFIOContainer *container)
>      VFIOGroup *group;
>      int ret = 0;
>
> +    /*
> +     * vIOMMU models traditionally define the maximum address space
> +     * width, which is a superset of the effective IOVA addresses being
> +     * used, e.g. intel-iommu defines 39-bit and 48-bit widths, and
> +     * similarly for AMD hardware. The problem is that these limits are
> +     * really big for device dirty trackers, when the IOMMU is passed in
> +     * for the sole usage of interrupt remapping (i.e. >= 256 vCPUs)
> +     * while the IOMMU is kept in passthrough mode.
> +     *
> +     * With x-migration-iommu-pt, assume that a guest being migrated
> +     * won't use the vIOMMU in a non-passthrough manner (throughout
> +     * migration). In that case, try to use the boot memory layout that
> +     * VFIO DMA maps to minimize having to stress high dirty tracking
> +     * limits, and fail on any new gIOMMU mappings, which may:
> +     *
> +     * 1) Prevent the migration from starting, i.e. gIOMMU mappings done
> +     * before migration, which would be no different than having the
> +     * migration blocker. So this behaviour is obviously kept.
> +     *
> +     * 2) Cancel an active migration, i.e. new gIOMMU mappings
> +     * mid-migration. From a Linux guest perspective this means, for
> +     * example, that the guest kexec's with 'iommu=on intel_iommu=on
> +     * amd_iommu=on' etc. and at boot establishes some vIOMMU mappings.
> +     *
> +     * This option should be especially relevant for old guests (< 5.10)
> +     * which don't probe for DMA translation services being off when
> +     * initializing IOMMU devices, thus ending up in crashes when
> +     * dma-translation=off is passed.
> +     */
> +    if (vfio_devices_all_iommu_passthrough() &&
> +        !QLIST_EMPTY(&container->giommu_list)) {
> +        ret = EOPNOTSUPP;
> +        error_report("vIOMMU mappings active, "
> +                     "cannot start dirty tracking, err %d (%s)",
> +                     -ret, strerror(ret));
> +        return -ret;
> +    }
> +
>      vfio_dirty_tracking_init(container, &ranges);
>      feature = vfio_device_feature_dma_logging_start_create(container,
>                                                             &ranges);
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index da43dcd2fe07..21304c8a90bc 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -970,10 +970,15 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
>          goto out_deinit;
>      }
>
> -    if (vfio_viommu_preset(vbasedev)) {
> +    if (vfio_viommu_preset(vbasedev) &&
> +        !vfio_devices_all_iommu_passthrough()) {
>          error_setg(&err, "%s: Migration is currently not supported "
>                     "with vIOMMU enabled", vbasedev->name);
>          goto add_blocker;
> +    } else if (vfio_devices_all_iommu_passthrough()) {
> +        warn_report("%s: Migration may be blocked or cancelled "
> +                    "if vIOMMU is used beyond interrupt remapping",
> +                    vbasedev->name);
>      }
>
>      trace_vfio_migration_realize(vbasedev->name);
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 3c37d036e727..5276a49fca12 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3507,6 +3507,8 @@ static Property vfio_pci_dev_properties[] = {
>                       VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
>      DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice,
>                              vbasedev.enable_migration, ON_OFF_AUTO_AUTO),
> +    DEFINE_PROP_BOOL("x-migration-iommu-pt", VFIOPCIDevice,
> +                     vbasedev.iommu_passthrough, false),
>      DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
>      DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
>                       vbasedev.ram_block_discard_allowed, false),