Steven Sistare <steven.sist...@oracle.com> writes:

> On 5/2/2024 8:23 AM, Markus Armbruster wrote:
>> Steve Sistare <steven.sist...@oracle.com> writes:
>> 
>>> Add the cpr-exec migration mode.  Usage:
>>>    qemu-system-$arch -machine memfd-alloc=on ...
>>>    migrate_set_parameter mode cpr-exec
>>>    migrate_set_parameter cpr-exec-args \
>>>      <arg1> <arg2> ... -incoming <uri>
>>>    migrate -d <uri>
>>>
>>> The migrate command stops the VM, saves state to the URI,
>>> directly exec's a new version of QEMU on the same host,
>>> replacing the original process while retaining its PID, and
>>> loads state from the URI.  Guest RAM is preserved in place,
>>> albeit with new virtual addresses.
>>>
>>> Arguments for the new QEMU process are taken from the
>>> @cpr-exec-args parameter.  The first argument should be the
>>> path of a new QEMU binary, or a prefix command that exec's the
>>> new QEMU binary.
>>>
>>> Because old QEMU terminates when new QEMU starts, one cannot
>>> stream data between the two, so the URI must be a type, such as
>>> a file, that reads all data before old QEMU exits.
>>>
>>> Memory backend objects must have the share=on attribute, and
>>> must be mmap'able in the new QEMU process.  For example,
>>> memory-backend-file is acceptable, but memory-backend-ram is
>>> not.
>>>
>>> The VM must be started with the '-machine memfd-alloc=on'
>>> option.  This causes implicit ram blocks (those not explicitly
>>> described by a memory-backend object) to be allocated by
>>> mmap'ing a memfd.  Examples include VGA, ROM, and even guest
>>> RAM when it is specified without a memory-backend object.
>>>
>>> The implementation saves precreate vmstate at the end of normal
>>> migration in migrate_fd_cleanup, and tells the main loop to call
>>> cpr_exec.  Incoming qemu loads precreate state early, before objects
>>> are created.  The memfds are kept open across exec by clearing the
>>> close-on-exec flag, their values are saved in precreate vmstate,
>>> and they are mmap'd in new qemu.
>>>
>>> Note that the memfd-alloc option is not related to memory-backend-memfd.
>>> Later patches add support for memory-backend-memfd, and for additional
>>> devices, including vfio, chardev, and more.
>>>
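
For readers unfamiliar with the close-on-exec part, here is a minimal
sketch of clearing the flag on one descriptor with plain POSIX fcntl().
It is not code from this series; the helper name keep_fd_across_exec()
is made up for illustration.

```c
#include <fcntl.h>

/*
 * Hypothetical helper, not QEMU code: clear FD_CLOEXEC on @fd so the
 * descriptor stays open across exec().
 * Returns 0 on success, -1 with errno set on failure.
 */
static int keep_fd_across_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    /* Preserve any other descriptor flags, drop only FD_CLOEXEC. */
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}
```

The descriptor number itself still has to reach the new process some
other way; the commit message says it is saved in precreate vmstate.
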
>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
>> 
>> [...]
>> 
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 49710e7..7c5f45f 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -665,9 +665,37 @@
>>>  #     or COLO.
>>>  #
>>>  #     (since 8.2)
>>> +#
>>> +# @cpr-exec: The migrate command stops the VM, saves state to the URI,

What URI?  I know you mean the migration URI, but will readers know?
Elsewhere, we use "migration URI".

Hmm.  That's no good, either: we may not *have* a migration URI since
commit 074dbce5fcce (migration: New migrate and migrate-incoming
argument 'channels') and its fixup commit 57fd4b4e1075 made command
migrate argument @uri optional and mutually exclusive with @channels.

I think we'd better use more generic terminology here.  Let's have a look
at migrate's documentation for inspiration:

    ##
    # @migrate:
    #
    # Migrates the current running guest to another Virtual Machine.
    #
    # @uri: the Uniform Resource Identifier of the destination VM
    #
    # @channels: list of migration stream channels with each stream in the
    #     list connected to a destination interface endpoint.
    #
    [...]
    # Notes:
    [...]
    #     4. The uri argument should have the Uniform Resource Identifier
    #        of default destination VM. This connection will be bound to
    #        default network.
    #
    #     5. For now, number of migration streams is restricted to one,
    #        i.e. number of items in 'channels' list is just 1.
    #
    #     6. The 'uri' and 'channels' arguments are mutually exclusive;
    #        exactly one of the two should be present.

Perhaps "saves the state to the migration destination"?

>>> +#     directly exec's a new version of QEMU on the same host,
>>> +#     replacing the original process while retaining its PID, and
>>> +#     loads state from the URI.  Guest RAM is preserved in place,

"loads the state from the migration destination"?

We should also fix up existing uses of "migration URI": @mapped-ram,
@cpr-reboot, @tls-hostname.  Not this series' job.  I'll report it
separately.

>>> +#     albeit with new virtual addresses.
>> 
>> Do you mean the virtual addresses of guest RAM may differ between old and
>> new QEMU process?
>
> The VA at which a guest RAM segment is mapped in the QEMU process
> changes.  The end user would not notice or care, so I'll drop that
> detail here.
>
>>> +#
>>> +#     Arguments for the new QEMU process are taken from the
>>> +#     @cpr-exec-args parameter.  The first argument should be the
>>> +#     path of a new QEMU binary, or a prefix command that exec's the
>>> +#     new QEMU binary.
>> 
>> What's a "prefix command"?  A wrapper script, perhaps?
>
> A prefix command is any command of the form:
>    command1 command1-args command2 command2-args
> where command1 performs some set up before exec'ing command2.
> However, I will drop the word "prefix", it adds no meaning here.

Maybe "the command to start the new QEMU process"?

Hmm.  @cpr-exec-args is documented like this:

    # @cpr-exec-args: Arguments passed to new QEMU for @cpr-exec mode.
    #    See @cpr-exec for details.  (Since 9.1)

Is it a good idea to keep the details with @cpr-exec?  Let me try not
to.  Replace the "Arguments for the new QEMU process..." paragraph by

    #     The new QEMU process is started according to migration parameter
    #     @cpr-exec-args.

Then document cpr-exec-args like

    # @cpr-exec-args: Command to start the new QEMU process for MigMode
    # @cpr-exec.  The first list element is the program's filename, the
    # remainder its arguments.

What do you think?

Naming the thing "-args" feels questionable.  It's program and
arguments.

For what it's worth, QGA command guest-exec has them separate:

    # @path: path or executable name to execute
    #
    # @arg: argument list to pass to executable

The name @path is poorly chosen.

qmp_guest_exec() then prepends @path to @arg to make the argv[] for the
execve() wrapper it uses.
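
Roughly like this.  This is a sketch of the idea only, not the actual
QGA code; build_argv() is a hypothetical helper, and the real code
works on QAPI list types rather than a plain array.

```c
#include <stdlib.h>

/*
 * Hypothetical helper, not QEMU code: build a NULL-terminated argv[]
 * with @path as argv[0], followed by the @nargs entries of @args.
 * The strings are borrowed; only the array itself is allocated.
 * Returns NULL on allocation failure.  The caller frees the array.
 */
static char **build_argv(const char *path, char *const args[], size_t nargs)
{
    char **argv = calloc(nargs + 2, sizeof(*argv));
    size_t i;

    if (!argv) {
        return NULL;
    }
    argv[0] = (char *)path;
    for (i = 0; i < nargs; i++) {
        argv[i + 1] = args[i];
    }
    argv[nargs + 1] = NULL;    /* execve() needs the terminator */
    return argv;
}
```

A single list parameter skips this step: the list already is the
argv[], with the program as its first element.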

I figure you'd rather not have them separate, to keep migration
parameters simpler.  Name it @cpr-exec-command?

>>> +#
>>> +#     Because old QEMU terminates when new QEMU starts, one cannot
>>> +#     stream data between the two, so the URI must be a type, such as
>>> +#     a file, that reads all data before old QEMU exits.
>> 
>> What happens when you specify a URI that doesn't?
>
> Old QEMU will quietly block indefinitely writing to the URI.

Worth spelling that out in the doc comment?

>>> +#
>>> +#     Memory backend objects must have the share=on attribute, and
>>> +#     must be mmap'able in the new QEMU process.  For example,
>>> +#     memory-backend-file is acceptable, but memory-backend-ram is
>>> +#     not.
>>> +#
>>> +#     The VM must be started with the '-machine memfd-alloc=on'
>> 
>> What happens when you don't?
>
> If '-only-migratable-modes cpr-exec' is specified, then QEMU will fail
> to start, and print a clear error message.
>
> Otherwise, a blocker is registered and any attempt to cpr-exec will fail
> with a clear error message.

With clear errors, no further documentation is needed.  Good :)

> - Steve
>
>>> +#     option.  This causes implicit ram blocks -- those not explicitly
>>> +#     described by a memory-backend object -- to be allocated by
>>> +#     mmap'ing a memfd.  Examples include VGA, ROM, and even guest
>>> +#     RAM when it is specified without a memory-backend object.
>>> +#
>>> +#     (since 9.1)
>>>   ##
>>>   { 'enum': 'MigMode',
>>> -  'data': [ 'normal', 'cpr-reboot' ] }
>>> +  'data': [ 'normal', 'cpr-reboot', 'cpr-exec' ] }
>>>   
>>>   ##
>>>   # @ZeroPageDetection:
>> 
>> [...]
>> 

