On Wed, 29 Jan 2025 at 16:11, Fabiano Rosas <[email protected]> wrote:
>
> From: Steve Sistare <[email protected]>
>
> Add the cpr-transfer migration mode, which allows the user to transfer
> a guest to a new QEMU instance on the same host with minimal guest pause
> time, by preserving guest RAM in place, albeit with new virtual addresses
> in new QEMU, and by preserving device file descriptors. Pages that were
> locked in memory for DMA in old QEMU remain locked in new QEMU, because the
> descriptor of the device that locked them remains open.
>
> cpr-transfer preserves memory and devices descriptors by sending them to
> new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
> be sent over the normal migration channel, because devices and backends
> are created prior to reading the channel, so this mode sends CPR state
> over a second "cpr" migration channel. New QEMU reads the cpr channel
> prior to creating devices or backends. The user specifies the cpr channel
> in the channel arguments on the outgoing side, and in a second -incoming
> command-line parameter on the incoming side.
>
> The user must start old QEMU with the the '-machine aux-ram-share=on' option,
> which allows anonymous memory to be transferred in place to the new process
> by transferring a memory descriptor for each ram block. Memory-backend
> objects must have the share=on attribute, but memory-backend-epc is not
> supported.
>
> The user starts new QEMU on the same host as old QEMU, with command-line
> arguments to create the same machine, plus the -incoming option for the
> main migration channel, like normal live migration. In addition, the user
> adds a second -incoming option with channel type "cpr". This CPR channel
> must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
> UNIX domain socket.
>
> To initiate CPR, the user issues a migrate command to old QEMU, adding
> a second migration channel of type "cpr" in the channels argument.
> Old QEMU stops the VM, saves state to the migration channels, and enters
> the postmigrate state. New QEMU mmap's memory descriptors, and execution
> resumes.
>
> The implementation splits qmp_migrate into start and finish functions.
> Start sends CPR state to new QEMU, which responds by closing the CPR
> channel. Old QEMU detects the HUP then calls finish, which connects the
> main migration channel.
>
> In summary, the usage is:
>
> qemu-system-$arch -machine aux-ram-share=on ...
>
> start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
>
> Issue commands to old QEMU:
> migrate_set_parameter mode cpr-transfer
>
> {"execute": "migrate", ...
> {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
>
> Signed-off-by: Steve Sistare <[email protected]>
> Reviewed-by: Peter Xu <[email protected]>
> Acked-by: Markus Armbruster <[email protected]>
> Link:
> https://lore.kernel.org/r/[email protected]
> Signed-off-by: Fabiano Rosas <[email protected]>
Hi; this commit includes some code that has confused
Coverity (CID 1590980) and it also confused me, so maybe
it could be usefully made clearer?
> void qmp_migrate(const char *uri, bool has_channels,
> MigrationChannelList *channels, bool has_detach, bool
> detach,
> bool has_resume, bool resume, Error **errp)
> @@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels,
> g_autoptr(MigrationChannel) channel = NULL;
> MigrationAddress *addr = NULL;
> MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
> + MigrationChannel *cpr_channel = NULL;
>
> /*
> * Having preliminary checks for uri and channel
> @@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels,
> }
> channelv[type] = channels->value;
> }
> + cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR];
> addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
> if (!addr) {
> error_setg(errp, "Channel list has no main entry");
> @@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels,
> return;
> }
>
> + if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
> + error_setg(errp, "missing 'cpr' migration channel");
> + return;
> + }
Here in qmp_migrate() we bail out if cpr_channel is NULL,
provided that s->parameters.mode is MIG_MODE_CPR_TRANSFER...
> +
> resume_requested = has_resume && resume;
> if (!migrate_prepare(s, resume_requested, errp)) {
> /* Error detected, put into errp */
> return;
> }
>
> + if (cpr_state_save(cpr_channel, &local_err)) {
...but in cpr_state_save() when we decide whether to dereference
cpr_channel or not, we aren't checking s->parameters.mode,
we call migrate_mode() and check the result of that.
And migrate_mode() isn't completely trivial: it calls
cpr_get_incoming_mode(), so it's not obvious that it's
necessarily going to be the same value as s->parameters.mode.
So Coverity complains that it sees a code path where we
might dereference cpr_channel even when it's NULL.
Could this be made a bit clearer somehow, do you think?
thanks
-- PMM