On 6/18/2022 12:51 AM, Alex Williamson wrote:
External email: Use caution opening links or attachments


On Mon, 13 Jun 2022 14:21:26 +0300
Avihai Horon <avih...@nvidia.com> wrote:

On 6/8/2022 12:32 AM, Alex Williamson wrote:
External email: Use caution opening links or attachments


On Tue, 7 Jun 2022 20:44:23 +0300
Avihai Horon <avih...@nvidia.com> wrote:

On 5/30/2022 8:07 PM, Avihai Horon wrote:
Hello,

Following VFIO migration protocol v2 acceptance in kernel, this series
implements VFIO migration according to the new v2 protocol and replaces
the now deprecated v1 implementation.

The main differences between v1 and v2 migration protocols are:
1. VFIO device state is represented as a finite state machine instead of
      a bitmap.

2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
      ioctl and normal read() and write() instead of the migration region
      used in v1.

3. Migration protocol v2 currently doesn't support the pre-copy phase of
      migration.

Full description of the v2 protocol and the differences from v1 can be
found here [1].

Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
that will be used later.

Patches 4-6 refactor v1 protocol code to make it easier to add v2
protocol.

Patches 7-11 implement v2 protocol and remove v1 protocol.

Thanks.

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/

Changes from v1: 
https://lore.kernel.org/all/20220512154320.19697-1-avih...@nvidia.com/
- Split the big patch that replaced v1 with v2 into several patches as
     suggested by Joao, to make review easier.
- Change warn_report to warn_report_once when container doesn't support
     dirty tracking.
- Add Reviewed-by tag.

Avihai Horon (11):
     vfio/migration: Fix NULL pointer dereference bug
     vfio/migration: Skip pre-copy if dirty page tracking is not supported
     migration/qemu-file: Add qemu_file_get_to_fd()
     vfio/common: Change vfio_devices_all_running_and_saving() logic to
       equivalent one
     vfio/migration: Move migration v1 logic to vfio_migration_init()
     vfio/migration: Rename functions/structs related to v1 protocol
     vfio/migration: Implement VFIO migration protocol v2
     vfio/migration: Remove VFIO migration protocol v1
     vfio/migration: Reset device if setting recover state fails
     vfio: Alphabetize migration section of VFIO trace-events file
     docs/devel: Align vfio-migration docs to VFIO migration v2

    docs/devel/vfio-migration.rst |  77 ++--
    hw/vfio/common.c              |  21 +-
    hw/vfio/migration.c           | 640 ++++++++--------------------------
    hw/vfio/trace-events          |  25 +-
    include/hw/vfio/vfio-common.h |   8 +-
    migration/migration.c         |   5 +
    migration/migration.h         |   3 +
    migration/qemu-file.c         |  34 ++
    migration/qemu-file.h         |   1 +
    9 files changed, 252 insertions(+), 562 deletions(-)

Ping.
Based on the changelog, this seems like a mostly cosmetic spin and I
don't see that all of the discussion threads from v1 were resolved to
everyone's satisfaction.  I'm certainly still uncomfortable with the
pre-copy behavior and I thought there were still some action items to
figure out whether an SLA is present and vet the solution with
management tools.  Thanks,
Yes.
OK, so let's clear things up and reach an agreement before I prepare the
v3 series.

There are three topics that came up in previous discussion:

  1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug.
     Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage
     outside migration thread.
     This code existed before and I fixed a NULL pointer dereference that
     I encountered.
     I suggested that later we can refactor VMChangeStateHandler to
     return error.
     I prefer not to do this refactor right now because I am not sure
     it's as straightforward change as it might seem - if some notifier
     fails and we abort do_vm_stop/vm_prepare_start in the middle, can
     this leave the VM in some unstable state?
     We plan to leave it as is and not do the refactor as part of this
     series.
     Are you ok with this?
I'll defer to Juan here, it's not 100% clear to me from the last reply
if he's looking for that sooner than later.  Juan?
<snip>
  3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd().
     Juan expressed his concern about the amount of data that will go
     through main migration thread.

This is already the case in v1 protocol - VFIO devices send all their
data in the main migration thread. Note that like in v1 protocol, here
as well the data is sent in small sized chunks, each with a header.
This patch just aims to eliminate an extra copy.

We plan to leave it as is. Is this ok?
I don't think we should lean too heavily on this being a bump from v1 to
v2 protocol as v1 was only ever experimental and hasn't been widely
used in practice AFAIK.  Again, I'll defer to the migration folks for
this, it requires their buy-in.  Thanks,

Ping.
Juan, can you respond to items 1 and 3?
Thanks.



Reply via email to