On Thu, 17 Nov 2022 19:07:10 +0200
Avihai Horon <avih...@nvidia.com> wrote:

> On 16/11/2022 20:29, Alex Williamson wrote:
> > On Thu, 3 Nov 2022 18:16:15 +0200
> > Avihai Horon <avih...@nvidia.com> wrote:
> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >> index e784374453..62afc23a8c 100644
> >> --- a/hw/vfio/migration.c
> >> +++ b/hw/vfio/migration.c
> >> @@ -44,8 +44,84 @@
> >>  #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
> >>  #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
> >>  
> >> +#define VFIO_MIG_DATA_BUFFER_SIZE (1024 * 1024)
> > Add comment explaining heuristic of this size.
>
> This is an arbitrary size we picked with mlx5 state size in mind.
> Increasing this size to higher values (128M, 1G) didn't improve
> performance in our testing.
>
> How about this comment:
> This is an initial value that doesn't consume much memory and provides
> good performance.
>
> Do you have other suggestion?
I'd lean more towards your description above, ex:

/*
 * This is an arbitrary size based on migration of mlx5 devices, where
 * the worst case total device migration size is on the order of 100s
 * of MB.  Testing with larger values, ex. 128MB and 1GB, did not show
 * a performance improvement.
 */

I think that provides sufficient information for someone who might come
later to have an understanding of the basis if they want to try to
optimize further.

> >> @@ -804,34 +1090,51 @@ static int vfio_migration_init(VFIODevice *vbasedev)
> >>          return -EINVAL;
> >>      }
> >>  
> >> -    ret = vfio_get_dev_region_info(vbasedev,
> >> -                                   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
> >> -                                   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> >> -                                   &info);
> >> -    if (ret) {
> >> -        return ret;
> >> -    }
> >> +    ret = vfio_migration_query_flags(vbasedev, &mig_flags);
> >> +    if (!ret) {
> >> +        /* Migration v2 */
> >> +        /* Basic migration functionality must be supported */
> >> +        if (!(mig_flags & VFIO_MIGRATION_STOP_COPY)) {
> >> +            return -EOPNOTSUPP;
> >> +        }
> >> +        vbasedev->migration = g_new0(VFIOMigration, 1);
> >> +        vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
> >> +        vbasedev->migration->data_buffer_size = VFIO_MIG_DATA_BUFFER_SIZE;
> >> +        vbasedev->migration->data_buffer =
> >> +            g_malloc0(vbasedev->migration->data_buffer_size);
> > So VFIO_MIG_DATA_BUFFER_SIZE is our chunk size, but why doesn't the
> > later addition of estimated device data size make any changes here?
> > I'd think we'd want to scale the buffer to the minimum of the reported
> > data size and some well documented heuristic for an upper bound.
>
> As I wrote above, increasing this size to higher values (128M, 1G)
> didn't improve performance in our testing.
> We can always change it later on if some other heuristics are proven to
> improve performance.

Note that I'm asking about a minimum buffer size, for example if
hisi_acc reports only 10s of KB for an estimated device size, why would
we still allocate VFIO_MIG_DATA_BUFFER_SIZE here?  Thanks,

Alex
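
[Editor's note: a minimal sketch, not from the thread itself, of the
sizing Alex describes above -- clamping the buffer to the smaller of the
device's reported estimate and the existing 1MB heuristic.
vfio_query_stop_copy_size() is a hypothetical helper standing in for
whatever mechanism the driver uses to report its estimated device data
size; only VFIO_MIG_DATA_BUFFER_SIZE, VFIODevice and MIN() are taken
from the thread/QEMU.]

/*
 * Hypothetical sketch only: vfio_query_stop_copy_size() represents the
 * driver's estimated stop-copy data size, however it is obtained.
 */
static size_t vfio_mig_buffer_size(VFIODevice *vbasedev)
{
    uint64_t stop_copy_size;

    if (vfio_query_stop_copy_size(vbasedev, &stop_copy_size) ||
        !stop_copy_size) {
        /* No usable estimate: fall back to the fixed heuristic. */
        return VFIO_MIG_DATA_BUFFER_SIZE;
    }

    /*
     * Scale to the device's own estimate, capped by the documented
     * heuristic.  A device such as hisi_acc reporting only 10s of KB
     * would then get a buffer of that order rather than a full 1MB
     * allocation.
     */
    return MIN(stop_copy_size, VFIO_MIG_DATA_BUFFER_SIZE);
}

The open question in the thread -- whether a lower bound on the buffer
is worthwhile when the estimate is tiny -- would only change the MIN()
clamp above; the upper bound stays the documented heuristic either way.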