Avihai Horon <avih...@nvidia.com> writes:

> On 17/05/2023 12:17, Markus Armbruster wrote:
>>
>> Avihai Horon <avih...@nvidia.com> writes:
>>
>>> Migration downtime estimation is calculated based on bandwidth and
>>> remaining migration data. This assumes that loading of migration data in
>>> the destination takes a negligible amount of time and that downtime
>>> depends only on network speed.
>>>
>>> While this may be true for RAM, it's not necessarily true for other
>>> migration users. For example, loading the data of a VFIO device in the
>>> destination might require from the device to allocate resources, prepare
>>> internal data structures and so on. These operations can take a
>>> significant amount of time which can increase migration downtime.
>>>
>>> This patch adds a new capability "precopy initial data" that allows the
>>> source to send initial precopy data and the destination to ACK that this
>>> data has been loaded. Migration will not attempt to stop the source VM
>>> and complete the migration until this ACK is received.
>>>
>>> This will allow migration users to send initial precopy data which can
>>> be used to reduce downtime (e.g., by pre-allocating resources), while
>>> making sure that the source will stop the VM and complete the migration
>>> only after this initial precopy data is sent and loaded in the
>>> destination so it will have full effect.
>>>
>>> This new capability relies on the return path capability to communicate
>>> from the destination back to the source.
>>>
>>> The actual implementation of the capability will be added in the
>>> following patches.
>>>
>>> Signed-off-by: Avihai Horon <avih...@nvidia.com>
>>> ---
>>>   qapi/migration.json |  9 ++++++++-
>>>   migration/options.h |  1 +
>>>   migration/options.c | 20 ++++++++++++++++++++
>>>   3 files changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 82000adce4..d496148386 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -478,6 +478,13 @@
>>>  #                    should not affect the correctness of postcopy migration.
>>>  #                    (since 7.1)
>>>  #
>>> +# @precopy-initial-data: If enabled, migration will not attempt to stop source
>>> +#                        VM and complete the migration until an ACK is received
>>> +#                        from the destination that initial precopy data has
>>> +#                        been loaded. This can improve downtime if there are
>>> +#                        migration users that support precopy initial data.
>>> +#                        (since 8.1)
>>> +#
>>
>> Please format like
>>
>> # @precopy-initial-data: If enabled, migration will not attempt to
>> #     stop source VM and complete the migration until an ACK is
>> #     received from the destination that initial precopy data has been
>> #     loaded. This can improve downtime if there are migration users
>> #     that support precopy initial data. (since 8.1)
>>
>> to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
>> to conform to current conventions).
>
> Sure.
>
>>
>> What do you mean by "if there are migration users that support precopy
>> initial data"?
>
> This capability only provides the framework to send precopy initial data and
> ACK that it was loaded in the destination.
> To actually benefit from it, migration users (such as VFIO devices, RAM,
> etc.) must implement support for it and use it.
>
> What I wanted to say here is that there is no point to enable this capability
> if there are no migration users that support it.
> For example, if you are migrating a VM without VFIO devices, then enabling
> this capability will have no effect.

I see.

Which "migration users" support it now? Which could support it in the
future?

Is the "initial precopy data" feature described in more detail anywhere?

>> Do I have to ensure the ACK comes by configuring the destination VM in a
>> certain way, and if yes, how exactly?
>
> In v2 of the series that I will send later you will have to enable this
> capability also in the destination.

What happens when you enable it on the source and not on the destination?

[...]
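
The handshake described in the patch (the source keeps iterating until the destination ACKs that initial precopy data has been loaded, and the ACK travels back over the return path) can be modeled in a few lines of C. This is a minimal sketch, not the code in this series. MigCaps, SrcState, DstState, caps_check(), dst_initial_data_loaded() and may_switchover() are hypothetical names. The return-path message is reduced to setting a flag, and "downtime_ok" stands in for the usual bandwidth-based estimate. The rule that the source skips the wait when no migration user sends initial data is an assumption chosen to match the remark that the capability then has no effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the "precopy initial data" handshake.
 * The names are illustrative only; they are not QEMU's API. */

typedef struct {
    bool return_path;
    bool precopy_initial_data;
} MigCaps;

/* The ACK travels from destination to source, so the capability cannot
 * work without the return path: reject that combination up front. */
static bool caps_check(const MigCaps *caps, const char **err)
{
    if (caps->precopy_initial_data && !caps->return_path) {
        *err = "precopy-initial-data requires return-path";
        return false;
    }
    *err = NULL;
    return true;
}

typedef struct {
    const MigCaps *caps;
    int users_with_initial_data; /* e.g. VFIO devices sending initial data */
    bool ack_received;           /* set once the destination's ACK arrives */
} SrcState;

typedef struct {
    int pending;   /* users whose initial data is not loaded yet */
    bool ack_sent;
} DstState;

/* Destination: called each time one user finishes loading its initial
 * data. When the last one is done, send the ACK (modeled here as setting
 * a flag on the source instead of writing to the return path). */
static void dst_initial_data_loaded(DstState *d, SrcState *src)
{
    assert(d->pending > 0);
    if (--d->pending == 0 && !d->ack_sent) {
        d->ack_sent = true;
        src->ack_received = true;
    }
}

/* Source: may we stop the VM and complete the migration? */
static bool may_switchover(const SrcState *s, bool downtime_ok)
{
    if (!downtime_ok) {
        return false;
    }
    /* Assumption: with no participating users there is nothing to wait
     * for, so enabling the capability has no effect. */
    if (s->caps->precopy_initial_data && s->users_with_initial_data > 0) {
        return s->ack_received;
    }
    return true;
}
```

With two participating users, may_switchover() stays false until the second dst_initial_data_loaded() call produces the ACK; with zero users it returns true as soon as the downtime estimate allows. Putting the return-path dependency in caps_check() makes a misconfiguration fail when the capability is set, rather than partway through a migration.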