On 6/8/2022 12:32 AM, Alex Williamson wrote:
External email: Use caution opening links or attachments
On Tue, 7 Jun 2022 20:44:23 +0300
Avihai Horon <avih...@nvidia.com> wrote:
On 5/30/2022 8:07 PM, Avihai Horon wrote:
Hello,
Following VFIO migration protocol v2 acceptance in kernel, this series
implements VFIO migration according to the new v2 protocol and replaces
the now deprecated v1 implementation.
The main differences between v1 and v2 migration protocols are:
1. VFIO device state is represented as a finite state machine instead of
a bitmap.
2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
ioctl and normal read() and write() instead of the migration region
used in v1.
3. Migration protocol v2 currently doesn't support the pre-copy phase of
migration.
Full description of the v2 protocol and the differences from v1 can be
found here [1].
Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
that will be used later.
Patches 4-6 refactor v1 protocol code to make it easier to add v2
protocol.
Patches 7-11 implement v2 protocol and remove v1 protocol.
Thanks.
[1]
https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/
Changes from v1:
https://lore.kernel.org/all/20220512154320.19697-1-avih...@nvidia.com/
- Split the big patch that replaced v1 with v2 into several patches as
suggested by Joao, to make review easier.
- Change warn_report to warn_report_once when container doesn't support
dirty tracking.
- Add Reviewed-by tag.
Avihai Horon (11):
vfio/migration: Fix NULL pointer dereference bug
vfio/migration: Skip pre-copy if dirty page tracking is not supported
migration/qemu-file: Add qemu_file_get_to_fd()
vfio/common: Change vfio_devices_all_running_and_saving() logic to
equivalent one
vfio/migration: Move migration v1 logic to vfio_migration_init()
vfio/migration: Rename functions/structs related to v1 protocol
vfio/migration: Implement VFIO migration protocol v2
vfio/migration: Remove VFIO migration protocol v1
vfio/migration: Reset device if setting recover state fails
vfio: Alphabetize migration section of VFIO trace-events file
docs/devel: Align vfio-migration docs to VFIO migration v2
docs/devel/vfio-migration.rst | 77 ++--
hw/vfio/common.c | 21 +-
hw/vfio/migration.c | 640 ++++++++--------------------------
hw/vfio/trace-events | 25 +-
include/hw/vfio/vfio-common.h | 8 +-
migration/migration.c | 5 +
migration/migration.h | 3 +
migration/qemu-file.c | 34 ++
migration/qemu-file.h | 1 +
9 files changed, 252 insertions(+), 562 deletions(-)
Ping.
Based on the changelog, this seems like a mostly cosmetic spin and I
don't see that all of the discussion threads from v1 were resolved to
everyone's satisfaction. I'm certainly still uncomfortable with the
pre-copy behavior and I thought there were still some action items to
figure out whether an SLA is present and vet the solution with
management tools. Thanks,
Yes.
OK, so let's clear things up and reach an agreement before I prepare the
v3 series.
There are three topics that came up in previous discussion:
1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug.
Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage
outside migration thread.
This code existed before and I fixed a NULL pointer dereference that
I encountered.
I suggested that later we can refactor VMChangeStateHandler to
return error.
I prefer not to do this refactor right now because I am not sure
it's as straightforward change as it might seem - if some notifier
fails and we abort do_vm_stop/vm_prepare_start in the middle, can
this leave the VM in some unstable state?
We plan to leave it as is and not do the refactor as part of this
series.
Are you ok with this?
2. [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page
tracking is not supported.
As previously discussed, this patch doesn't consider the configured
downtime limit.
One way to fix it is to allow such migration only when "no SLA" (no
downtime limit) is set. AFAIK today there is no way that one can set
"no SLA".
If we go with this option, we change normal flow of migration
(skipping pre-copy) and might need to change management tools.
Instead, what about letting QEMU VFIO code mark all pages dirty (instead
of kernel)?
This way we don’t skip pre-copy and we get the same behavior we have now
of perpetual dirtying all RAM, which respects SLA.
If we go with this option, do we need to block migration when IOMMU is
sPAPR TCE?
Until now migration would be blocked because sPAPR TCE doesn't report
dirty_pages_supported cap, but going with this option we will allow
migration even when dirty_pages_supported cap is not set (and let QEMU
dirty all pages).
3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd().
Juan expressed his concern about the amount of data that will go
through main migration thread.
This is already the case in v1 protocol - VFIO devices send all their
data in the main migration thread. Note that like in v1 protocol, here
as well the data is sent in small sized chunks, each with a header.
This patch just aims to eliminate an extra copy.
We plan to leave it as is. Is this ok?
Thanks.