On Thu, 17 Nov 2022 19:07:10 +0200
Avihai Horon <avih...@nvidia.com> wrote:

> On 16/11/2022 20:29, Alex Williamson wrote:
> > On Thu, 3 Nov 2022 18:16:15 +0200
> > Avihai Horon <avih...@nvidia.com> wrote:
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index e784374453..62afc23a8c 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -44,8 +44,84 @@
> >>  #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
> >>  #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
> >>  
> >> +#define VFIO_MIG_DATA_BUFFER_SIZE (1024 * 1024)
> > Add comment explaining heuristic of this size.
>
> This is an arbitrary size we picked with mlx5 state size in mind.
> Increasing this size to higher values (128M, 1G) didn't improve
> performance in our testing.
>
> How about this comment:
> This is an initial value that doesn't consume much memory and provides
> good performance.
>
> Do you have other suggestion?
I'd lean more towards your description above, ex:

/*
 * This is an arbitrary size based on migration of mlx5 devices, where
 * the worst case total device migration size is on the order of 100s
 * of MB.  Testing with larger values, ex. 128MB and 1GB, did not show
 * a performance improvement.
 */

I think that provides sufficient information for someone who might come
later to have an understanding of the basis if they want to try to
optimize further.

> >> @@ -804,34 +1090,51 @@ static int vfio_migration_init(VFIODevice *vbasedev)
> >>          return -EINVAL;
> >>      }
> >>  
> >> -    ret = vfio_get_dev_region_info(vbasedev,
> >> -                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> >> -                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> >> -                                   &info);
> >> -    if (ret) {
> >> -        return ret;
> >> -    }
> >> +    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
> >> +    if (!ret) {
> >> +        /* Migration v2 */
> >> +        /* Basic migration functionality must be supported */
> >> +        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> >> +            return -EOPNOTSUPP;
> >> +        }
> >> +        vbasedev->migration = g_new0(VFIOMigration, 1);
> >> +        vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
> >> +        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
> >> +        vbasedev->migration->data_buffer =
> >> +            g_malloc0(vbasedev->migration->data_buffer_size);
> > So VFIO_MIG_DATA_BUFFER_SIZE is our chunk size, but why doesn't the
> > later addition of estimated device data size make any changes here?
> > I'd think we'd want to scale the buffer to the minimum of the reported
> > data size and some well documented heuristic for an upper bound.
>
> As I wrote above, increasing this size to higher values (128M, 1G)
> didn't improve performance in our testing.
> We can always change it later on if some other heuristics are proven to
> improve performance.

Note that I'm asking about a minimum buffer size, for example if
hisi_acc reports only 10s of KB for an estimated device size, why would
we still allocate VFIO_MIG_DATA_BUFFER_SIZE here?  Thanks,

Alex
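
[Editor's note: a minimal sketch, not from the thread itself, of the
sizing Alex describes above -- clamping the buffer to the smaller of the
device's reported estimate and the existing 1MB heuristic.
vfio_query_stop_copy_size() is a hypothetical helper standing in for
whatever mechanism the driver uses to report its estimated device data
size; only VFIO_MIG_DATA_BUFFER_SIZE, VFIODevice and MIN() are taken
from the thread/QEMU.]

/*
 * Hypothetical sketch only: vfio_query_stop_copy_size() represents the
 * driver's estimated stop-copy data size, however it is obtained.
 */
static size_t vfio_mig_buffer_size(VFIODevice *vbasedev)
{
    uint64_t stop_copy_size;

    if (vfio_query_stop_copy_size(vbasedev, &stop_copy_size) ||
        !stop_copy_size) {
        /* No usable estimate: fall back to the fixed heuristic. */
        return VFIO_MIG_DATA_BUFFER_SIZE;
    }

    /*
     * Scale to the device's own estimate, capped by the documented
     * heuristic.  A device such as hisi_acc reporting only 10s of KB
     * would then get a buffer of that order rather than a full 1MB
     * allocation.
     */
    return MIN(stop_copy_size, VFIO_MIG_DATA_BUFFER_SIZE);
}

The open question in the thread -- whether a lower bound on the buffer
is worthwhile when the estimate is tiny -- would only change the MIN()
clamp above; the upper bound stays the documented heuristic either way.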