Avihai Horon <avih...@nvidia.com> writes:

> On 17/05/2023 12:17, Markus Armbruster wrote:
>> Avihai Horon <avih...@nvidia.com> writes:
>>
>>> Migration downtime estimation is calculated based on bandwidth and
>>> remaining migration data. This assumes that loading of migration data in
>>> the destination takes a negligible amount of time and that downtime
>>> depends only on network speed.
>>>
>>> While this may be true for RAM, it's not necessarily true for other
>>> migration users. For example, loading the data of a VFIO device in the
>>> destination might require the device to allocate resources, prepare
>>> internal data structures and so on. These operations can take a
>>> significant amount of time which can increase migration downtime.
>>>
>>> This patch adds a new capability "precopy initial data" that allows the
>>> source to send initial precopy data and the destination to ACK that this
>>> data has been loaded. Migration will not attempt to stop the source VM
>>> and complete the migration until this ACK is received.
>>>
>>> This will allow migration users to send initial precopy data which can
>>> be used to reduce downtime (e.g., by pre-allocating resources), while
>>> making sure that the source will stop the VM and complete the migration
>>> only after this initial precopy data is sent and loaded in the
>>> destination so it will have full effect.
>>>
>>> This new capability relies on the return path capability to communicate
>>> from the destination back to the source.
>>>
>>> The actual implementation of the capability will be added in the
>>> following patches.
>>>
>>> Signed-off-by: Avihai Horon <avih...@nvidia.com>
>>> ---
>>>   qapi/migration.json |  9 ++++++++-
>>>   migration/options.h |  1 +
>>>   migration/options.c | 20 ++++++++++++++++++++
>>>   3 files changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 82000adce4..d496148386 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -478,6 +478,13 @@
>>>   #                    should not affect the correctness of postcopy migration.
>>>   #                    (since 7.1)
>>>   #
>>> +# @precopy-initial-data: If enabled, migration will not attempt to stop source
>>> +#                        VM and complete the migration until an ACK is received
>>> +#                        from the destination that initial precopy data has
>>> +#                        been loaded. This can improve downtime if there are
>>> +#                        migration users that support precopy initial data.
>>> +#                        (since 8.1)
>>> +#
>> Please format like
>>
>>     # @precopy-initial-data: If enabled, migration will not attempt to
>>     #     stop source VM and complete the migration until an ACK is
>>     #     received from the destination that initial precopy data has been
>>     #     loaded.  This can improve downtime if there are migration users
>>     #     that support precopy initial data.  (since 8.1)
>>
>> to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
>> to conform to current conventions).
>
> Sure.
>
>>
>> What do you mean by "if there are migration users that support precopy
>> initial data"?
>
> This capability only provides the framework to send precopy initial data and 
> ACK that it was loaded in the destination.
> To actually benefit from it, migration users (such as VFIO devices, RAM, 
> etc.) must implement support for it and use it.
>
> What I wanted to say here is that there is no point in enabling this
> capability if there are no migration users that support it.
> For example, if you are migrating a VM without VFIO devices, then enabling 
> this capability will have no effect.

I see.

Which "migration users" support it now?

Which could support it in the future?

Is the "initial precopy data" feature described in more detail anywhere?
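
Until such a description exists, the behaviour stated in the commit message
can be modelled roughly as below. This is only a toy sketch of that
description, not QEMU code: the class, its fields and the downtime-limit
parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SourceState:
    """Toy model of the migration source; every name here is hypothetical."""

    capability_enabled: bool      # the proposed "precopy initial data" capability
    ack_received: bool = False    # set once the destination's ACK arrives
    remaining_bytes: int = 0      # migration data still to be sent
    bandwidth_bps: float = 1.0    # measured bandwidth, in bytes per second

    def estimated_downtime(self) -> float:
        # Estimate from the commit message: remaining data over bandwidth.
        # It ignores the time the destination needs to load the data.
        return self.remaining_bytes / self.bandwidth_bps

    def may_stop_vm(self, downtime_limit_s: float) -> bool:
        # With the capability enabled, do not attempt to stop the VM
        # until the destination has ACKed the initial precopy data.
        if self.capability_enabled and not self.ack_received:
            return False
        return self.estimated_downtime() <= downtime_limit_s
```

In this model the ACK is an extra precondition on top of the existing
bandwidth-based estimate; it does not replace it.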

>> Do I have to ensure the ACK comes by configuring the destination VM in a
>> certain way, and if yes, how exactly?
>
> In v2 of the series, which I will send later, you will also have to enable
> this capability on the destination.

What happens when you enable it on the source and not on the
destination?
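
For concreteness, the "enabled on both sides" case would presumably mean
sending the same migrate-set-capabilities commands to the source and to the
destination QEMU. A minimal sketch that only builds the QMP messages follows.
It assumes the capability keeps the name "precopy-initial-data" proposed in
this patch; the helper function is hypothetical, and "return-path" is included
because the commit message says the new capability relies on it.

```python
import json


def set_capability_cmd(name: str, enabled: bool) -> str:
    """Build a QMP migrate-set-capabilities command as a JSON string.

    Hypothetical helper for illustration; it does not talk to QEMU.
    """
    return json.dumps({
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [{"capability": name, "state": enabled}],
        },
    })


# Commands that would be sent to each side's QMP monitor.
# "precopy-initial-data" is the name proposed in this patch (assumption:
# it may change in later versions of the series).
COMMANDS = [
    set_capability_cmd("return-path", True),
    set_capability_cmd("precopy-initial-data", True),
]

for cmd in COMMANDS:
    print(cmd)
```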

[...]

