On Mon, 13 Jun 2022 14:21:26 +0300 Avihai Horon <avih...@nvidia.com> wrote:
> On 6/8/2022 12:32 AM, Alex Williamson wrote: > > External email: Use caution opening links or attachments > > > > > > On Tue, 7 Jun 2022 20:44:23 +0300 > > Avihai Horon <avih...@nvidia.com> wrote: > > > >> On 5/30/2022 8:07 PM, Avihai Horon wrote: > >>> Hello, > >>> > >>> Following VFIO migration protocol v2 acceptance in kernel, this series > >>> implements VFIO migration according to the new v2 protocol and replaces > >>> the now deprecated v1 implementation. > >>> > >>> The main differences between v1 and v2 migration protocols are: > >>> 1. VFIO device state is represented as a finite state machine instead of > >>> a bitmap. > >>> > >>> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE > >>> ioctl and normal read() and write() instead of the migration region > >>> used in v1. > >>> > >>> 3. Migration protocol v2 currently doesn't support the pre-copy phase of > >>> migration. > >>> > >>> Full description of the v2 protocol and the differences from v1 can be > >>> found here [1]. > >>> > >>> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function > >>> that will be used later. > >>> > >>> Patches 4-6 refactor v1 protocol code to make it easier to add v2 > >>> protocol. > >>> > >>> Patches 7-11 implement v2 protocol and remove v1 protocol. > >>> > >>> Thanks. > >>> > >>> [1] > >>> https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/ > >>> > >>> Changes from v1: > >>> https://lore.kernel.org/all/20220512154320.19697-1-avih...@nvidia.com/ > >>> - Split the big patch that replaced v1 with v2 into several patches as > >>> suggested by Joao, to make review easier. > >>> - Change warn_report to warn_report_once when container doesn't support > >>> dirty tracking. > >>> - Add Reviewed-by tag. > >>> > >>> Avihai Horon (11): > >>> vfio/migration: Fix NULL pointer dereference bug > >>> vfio/migration: Skip pre-copy if dirty page tracking is not supported > >>> migration/qemu-file: Add qemu_file_get_to_fd() > >>> vfio/common: Change vfio_devices_all_running_and_saving() logic to > >>> equivalent one > >>> vfio/migration: Move migration v1 logic to vfio_migration_init() > >>> vfio/migration: Rename functions/structs related to v1 protocol > >>> vfio/migration: Implement VFIO migration protocol v2 > >>> vfio/migration: Remove VFIO migration protocol v1 > >>> vfio/migration: Reset device if setting recover state fails > >>> vfio: Alphabetize migration section of VFIO trace-events file > >>> docs/devel: Align vfio-migration docs to VFIO migration v2 > >>> > >>> docs/devel/vfio-migration.rst | 77 ++-- > >>> hw/vfio/common.c | 21 +- > >>> hw/vfio/migration.c | 640 ++++++++-------------------------- > >>> hw/vfio/trace-events | 25 +- > >>> include/hw/vfio/vfio-common.h | 8 +- > >>> migration/migration.c | 5 + > >>> migration/migration.h | 3 + > >>> migration/qemu-file.c | 34 ++ > >>> migration/qemu-file.h | 1 + > >>> 9 files changed, 252 insertions(+), 562 deletions(-) > >>> > >> Ping. > > Based on the changelog, this seems like a mostly cosmetic spin and I > > don't see that all of the discussion threads from v1 were resolved to > > everyone's satisfaction. I'm certainly still uncomfortable with the > > pre-copy behavior and I thought there were still some action items to > > figure out whether an SLA is present and vet the solution with > > management tools. Thanks, > > Yes. > OK, so let's clear things up and reach an agreement before I prepare the > v3 series. > > There are three topics that came up in previous discussion: > > 1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug. > Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage > outside migration thread. > This code existed before and I fixed a NULL pointer dereference that > I encountered. > I suggested that later we can refactor VMChangeStateHandler to > return error. > I prefer not to do this refactor right now because I am not sure > it's as straightforward change as it might seem - if some notifier > fails and we abort do_vm_stop/vm_prepare_start in the middle, can > this leave the VM in some unstable state? > We plan to leave it as is and not do the refactor as part of this > series. > Are you ok with this? I'll defer to Juan here, it's not 100% clear to me from the last reply if he's looking for that sooner than later. Juan? > 2. [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page > tracking is not supported. > As previously discussed, this patch doesn't consider the configured > downtime limit. > One way to fix it is to allow such migration only when "no SLA" (no > downtime limit) is set. AFAIK today there is no way that one can set > "no SLA". > If we go with this option, we change normal flow of migration > (skipping pre-copy) and might need to change management tools. > > Instead, what about letting QEMU VFIO code mark all pages dirty (instead > of kernel)? > This way we don’t skip pre-copy and we get the same behavior we have now > of perpetual dirtying all RAM, which respects SLA. > If we go with this option, do we need to block migration when IOMMU is > sPAPR TCE? > Until now migration would be blocked because sPAPR TCE doesn't report > dirty_pages_supported cap, but going with this option we will allow > migration even when dirty_pages_supported cap is not set (and let QEMU > dirty all pages). It's ok by me if QEMU vfio is the one that marks all mapped pages dirty if the host interface provides no way to do so. Would we toggle that based on whether the device has bus-master enabled? Regarding SPAPR, I'd tend to think that if we're dirtying in QEMU then nothing prevents us from implementing the same there, but also I'm not going to stand in the way of simply disabling migration for that IOMMU backend unless someone speaks up that they think it deserves parity. > 3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd(). > Juan expressed his concern about the amount of data that will go > through main migration thread. > > This is already the case in v1 protocol - VFIO devices send all their > data in the main migration thread. Note that like in v1 protocol, here > as well the data is sent in small sized chunks, each with a header. > This patch just aims to eliminate an extra copy. > > We plan to leave it as is. Is this ok? I don't think we should lean too heavily on this being a bump from v1 to v2 protocol as v1 was only ever experimental and hasn't been widely used in practice AFAIK. Again, I'll defer to the migration folks for this, it requires their buy-in. Thanks, Alex