On Mon, 09 Nov 2020 11:56:02 -0700 Alex Williamson <alex.william...@redhat.com> wrote:
> Per the proposed documentation for vfio device migration:
>
>   Dirty pages are tracked when device is in stop-and-copy phase
>   because if pages are marked dirty during pre-copy phase and
>   content is transferred from source to destination, there is no
>   way to know newly dirtied pages from the point they were copied
>   earlier until device stops. To avoid repeated copy of same
>   content, pinned pages are marked dirty only during
>   stop-and-copy phase.
>
> Essentially, since we don't have hardware dirty page tracking for
> assigned devices at this point, we consider any page that is pinned
> by an mdev vendor driver or pinned and mapped through the IOMMU to
> be perpetually dirty. In the worst case, this may result in all of
> guest memory being considered dirty during every iteration of live
> migration. The current vfio implementation of migration has chosen
> to mask device dirtied pages until the final stages of migration in
> order to avoid this worst case scenario.
>
> Allowing the device to implement a policy decision to prioritize
> reduced migration data like this jeopardizes QEMU's overall ability
> to implement any degree of service level guarantees during migration.
> For example, any estimates towards achieving acceptable downtime
> margins cannot be trusted when such a device is present. The vfio
> device should participate in dirty page tracking to the best of its
> ability throughout migration, even if that means the dirty footprint
> of the device impedes migration progress, allowing both QEMU and
> higher level management tools to decide whether to continue the
> migration or abort due to failure to achieve the desired behavior.
>
> Link: https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg00807.html
> Cc: Kirti Wankhede <kwankh...@nvidia.com>
> Cc: Neo Jia <c...@nvidia.com>
> Cc: Dr. David Alan Gilbert <dgilb...@redhat.com>
> Cc: Juan Quintela <quint...@redhat.com>
> Cc: Philippe Mathieu-Daudé <phi...@redhat.com>
> Cc: Cornelia Huck <coh...@redhat.com>
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> ---
>
> Given that our discussion in the link above seems to be going in
> circles, I'm afraid it seems necessary to both have a contingency
> plan and to raise the visibility of the current behavior to
> determine whether others agree that this is a sufficiently
> troubling behavior to consider migration support experimental
> at this stage. Please voice your opinion or contribute patches
> to resolve this before QEMU 5.2. Thanks,
>
> Alex
>
>  hw/vfio/migration.c           | 2 +-
>  hw/vfio/pci.c                 | 2 ++
>  include/hw/vfio/vfio-common.h | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)

Given the ongoing discussions, I'd be rather more comfortable making
this experimental for the upcoming release and spending some time
getting this into a state that everyone is happy to live with, so

Acked-by: Cornelia Huck <coh...@redhat.com>