On Wed, 15 Feb 2023 20:23:12 +0200 Avihai Horon <avih...@nvidia.com> wrote:
> On 15/02/2023 15:01, Juan Quintela wrote: > > External email: Use caution opening links or attachments > > > > > > Avihai Horon <avih...@nvidia.com> wrote: > >> Implement the basic mandatory part of VFIO migration protocol v2. > >> This includes all functionality that is necessary to support > >> VFIO_MIGRATION_STOP_COPY part of the v2 protocol. > >> > >> The two protocols, v1 and v2, will co-exist and in the following patches > >> v1 protocol code will be removed. > >> > >> There are several main differences between v1 and v2 protocols: > >> - VFIO device state is now represented as a finite state machine instead > >> of a bitmap. > >> > >> - Migration interface with kernel is now done using VFIO_DEVICE_FEATURE > >> ioctl and normal read() and write() instead of the migration region. > >> > >> - Pre-copy is made optional in v2 protocol. Support for pre-copy will be > >> added later on. > >> > >> Detailed information about VFIO migration protocol v2 and its difference > >> compared to v1 protocol can be found here [1]. > >> > >> [1] > >> https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/ > >> > >> Signed-off-by: Avihai Horon <avih...@nvidia.com> > >> +/* > >> + * Migration size of VFIO devices can be as little as a few KBs or as big > >> as > >> + * many GBs. This value should be big enough to cover the worst case. > >> + */ > >> +#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB) > > Wow O:-) > > > >> + > >> +/* > >> + * Only exact function is implemented and not estimate function. The > >> reason is > >> + * that during pre-copy phase of migration the estimate function is called > >> + * repeatedly while pending RAM size is over the threshold, thus migration > >> + * can't converge and querying the VFIO device pending data size is > >> useless. > >> + */ > > You can do it after this is merge, but I think you can do better than > > this. Something in the lines of: > > > > > > // I put it in a global variable, but it really needs to be in > > VFIODevice to be // able to support several devices. You get the idea > > O:-) > > > > static uint64_t cached_size = -1; > > > > static void vfio_state_pending_exact(void *opaque, uint64_t > > *res_precopy_only, > > uint64_t *res_compatible, > > uint64_t *res_postcopy_only) > > { > > VFIODevice *vbasedev = opaque; > > uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE; > > > > /* > > * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is > > * reported so downtime limit won't be violated. > > */ > > vfio_query_stop_copy_size(vbasedev, &stop_copy_size); > > *res_precopy_only += stop_copy_size; > > cached_size = stop_copy_size; > > > > trace_vfio_state_pending_exact(vbasedev->name, *res_precopy_only, > > *res_postcopy_only, *res_compatible, > > stop_copy_size); > > } > > > > > > static void vfio_state_pending_estimate(void *opaque, uint64_t > > *res_precopy_only, > > uint64_t *res_compatible, > > uint64_t *res_postcopy_only) > > { > > VFIODevice *vbasedev = opaque; > > uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE; > > > > if (cached_size == -1) { > > uint64_t res_precopy; > > uint64_t res_compatible; > > uint64_t res_postcopy; > > vfio_state_pending_exact(opaque, &res_precopy, &res_compatible, > > &res_postcopy); > > } > > *res_precopy_only += cached_size; > > } > > In the next series, which will add pre-copy support to VFIO migration > (v1 was sent [1] but isn't rebased on your pull reqs yet), I am going to > do something similar to what you suggested. > It will be like you did here but with pre-copy data size (data which can > be transferred during pre-copy phase) instead of the stop_copy_size. > > Plus, I don't think caching the stop_copy_size and reporting the cached > value in the estimate handler fits the best here, > because stop_copy_size doesn't decrease by pre-copy iterations as > opposed to RAM pre-copy data, for example. > > So I would rather keep things as they are and add something similar to > your suggestion in the pre-copy series. Assuming Juan is on board with this, please re-base your series on Juan's latest pull request. At the time of writing, this is: https://lore.kernel.org/all/20230215200554.1365-1-quint...@redhat.com/ The kernel headers need an update (you might as well pick up v6.2-rc8 at this point to be as close as possible to what lands in v6.2), Juan's pull request includes qemu_file_get_to_fd(), so we can drop it here, and there are some conflicts that need to be ironed out relative to 24beea4efe6e ("migration: Rename res_{postcopy,precopy}_only"). Thanks, Alex