Branch: refs/heads/staging
Home: https://github.com/qemu/qemu
Commit: 69c1295bc92fd51a1109fce89c032c4808fbe4ad
https://github.com/qemu/qemu/commit/69c1295bc92fd51a1109fce89c032c4808fbe4ad
Author: Shameer Kolothum <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/multifd-uadk.c
Log Message:
-----------
migration/multifd: Fix compile error caused by page_size usage
From commit 90fa121c6c07 ("migration/multifd: Inline page_size and
page_count") onwards, page_size is no longer part of MultiFD*Params; an
inline constant is used instead.
However, that commit missed updating one old usage, causing a compile error.
Fixes: 90fa121c6c07 ("migration/multifd: Inline page_size and page_count")
Signed-off-by: Shameer Kolothum <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
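For illustration, a minimal self-contained sketch of the kind of change
described in the log above; the helper multifd_ram_page_size() and the
uadk_buf_len() wrapper are stand-ins, not the exact QEMU code:

    #include <stddef.h>

    /* Illustrative stand-in for the MultiFD params type. */
    typedef struct MultiFDSendParams { void *opaque; } MultiFDSendParams;

    /* After the inlining, callers fetch the page size from a helper
     * (or constant) instead of from MultiFD*Params. */
    static inline size_t multifd_ram_page_size(void)
    {
        return 4096;   /* placeholder for the inlined constant */
    }

    /* Before (no longer compiles):  len = p->page_size * pages;            */
    /* After:                        len = multifd_ram_page_size() * pages; */
    size_t uadk_buf_len(MultiFDSendParams *p, size_t pages)
    {
        (void)p;
        return multifd_ram_page_size() * pages;
    }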
Commit: fed521c0aaf46be191ba26fa2d7394dedc9236e5
https://github.com/qemu/qemu/commit/fed521c0aaf46be191ba26fa2d7394dedc9236e5
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/ram.c
Log Message:
-----------
migration/multifd: Further remove the SYNC on complete
Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at
ram_save_complete(), because the sync on the destination side is not
needed due to the last iteration of find_dirty_block() having already
done it.
However, that commit overlooked that multifd_ram_flush_and_sync() on the
source side is also not needed at ram_save_complete(), for the same
reason.
Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the
multifd_ram_flush_and_sync() means that currently the recv threads will
hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the
destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is
received.
Luckily, multifd still works fine, because the recv-side cleanup code
(mostly multifd_recv_sync_main()) is smart enough to kick the recv
threads out even if they are stuck at SYNC. And since this is the
completion phase of migration, nothing else will be sent after the
SYNCs.
This needs to be fixed because in the future VFIO will have data to push
after ram_save_complete(), and we don't want the recv threads to be
stuck on the MULTIFD_FLAG_SYNC message.
Remove the unnecessary (and buggy) invocation of
multifd_ram_flush_and_sync().
For very old binaries (multifd_flush_after_each_section==true), the
flush_and_sync is still needed because each EOS received on destination
will enforce all-channel sync once.
Stable branches do not need this patch, as I can't think of a real bug
that would go wrong there; no Fixes tag is attached, to make clear that
a backport is not needed.
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
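As an illustration of the resulting shape of the completion path, a
self-contained sketch; the condition and helper names model the
description above and are not the literal QEMU code:

    #include <stdbool.h>

    /* Stand-ins for the migration helpers mentioned in the log. */
    static bool migrate_multifd(void) { return true; }
    static bool migrate_multifd_flush_after_each_section(void) { return false; }
    static int multifd_ram_flush_and_sync(void) { return 0; }

    /* ram_save_complete(): only very old binaries (per-section flush)
     * still need the flush_and_sync here; the modern path already
     * synced in the last find_dirty_block() round. */
    int ram_save_complete_sync(void)
    {
        if (migrate_multifd() && migrate_multifd_flush_after_each_section()) {
            return multifd_ram_flush_and_sync();
        }
        return 0;
    }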
Commit: 2a9aa801641f1f70380a191f109c61bdd514ef24
https://github.com/qemu/qemu/commit/2a9aa801641f1f70380a191f109c61bdd514ef24
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/multifd-nocomp.c
M migration/multifd.c
M migration/multifd.h
Log Message:
-----------
migration/multifd: Allow to sync with sender threads only
Teach multifd_send_sync_main() to sync with threads only.
We already have such a request: when mapped-ram is enabled with
multifd. In that case, no SYNC messages are pushed to the stream when
multifd syncs the sender threads, because there are no destination
threads waiting for them. The whole point of the sync is to make sure
all threads have finished their jobs.
So fundamentally we have a request to do the sync in different ways:
- Either to sync the threads only,
- Or to sync the threads but also with the destination side.
Mapped-ram already did this, via the use_packet check in the sync
handler of the sender thread. It works.
However, it may stop working when e.g. VFIO starts to reuse multifd
channels to push device state. In that case VFIO has a similar request
for a "thread-only sync", but we can't rely on checking a flag, because
such a sync request can still come from RAM, which needs the on-wire
notifications.
Pave the way for that by allowing the caller of multifd_send_sync_main()
to specify what kind of sync it needs. We can use it for mapped-ram
already.
No functional change intended.
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
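A sketch of what "sync with sender threads only" can look like at the
API level; the MultiFDSyncReq enum and the two helpers below are
illustrative assumptions based on the description above:

    /* Callers say whether they only need the sender threads drained
     * (LOCAL) or also want the on-wire SYNC messages so that the
     * destination syncs as well (ALL). */
    typedef enum {
        MULTIFD_SYNC_LOCAL,   /* e.g. mapped-ram: no recv threads waiting */
        MULTIFD_SYNC_ALL,     /* RAM over a live stream: notify the dest  */
    } MultiFDSyncReq;

    static void multifd_send_sync_threads(void) { /* join/flush senders */ }
    static void multifd_send_sync_wire(void)    { /* emit MULTIFD_FLAG_SYNC */ }

    int multifd_send_sync_main(MultiFDSyncReq req)
    {
        multifd_send_sync_threads();
        if (req == MULTIFD_SYNC_ALL) {
            multifd_send_sync_wire();
        }
        return 0;
    }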
Commit: 7e98c712498639ba3abfc5edecef27d7a26f9074
https://github.com/qemu/qemu/commit/7e98c712498639ba3abfc5edecef27d7a26f9074
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/ram.c
M migration/ram.h
M migration/rdma.h
Log Message:
-----------
migration/ram: Move RAM_SAVE_FLAG* into ram.h
Firstly, we're going to use the multifd flag in multifd code soon, so
keeping it in ram.c isn't going to work.
Secondly, we have a separate RDMA flag dangling around, which is
definitely not obvious. There's one comment that helps, but not much.
Put all the RAM save flags together, so nothing gets overlooked.
Add a section explaining why we can't use bits over 0x200.
Remove RAM_SAVE_FLAG_FULL, as it's no longer used in QEMU, as the
comment explained.
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
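For reference, the gathered flag block looks roughly like this (values
as used by the RAM save stream; treat this as a sketch of the layout
rather than the authoritative header, and see the commit's new comment
for the exact rationale behind the 0x200 limit):

    /* RAM save flags, kept in one place.  They are OR'ed into page
     * addresses on the wire, so only low bits that never collide with a
     * page-aligned address are usable -- nothing above 0x200. */
    #define RAM_SAVE_FLAG_ZERO           0x002
    #define RAM_SAVE_FLAG_MEM_SIZE       0x004
    #define RAM_SAVE_FLAG_PAGE           0x008
    #define RAM_SAVE_FLAG_EOS            0x010
    #define RAM_SAVE_FLAG_CONTINUE       0x020
    #define RAM_SAVE_FLAG_XBZRLE         0x040
    #define RAM_SAVE_FLAG_HOOK           0x080   /* the RDMA flag */
    #define RAM_SAVE_FLAG_COMPRESS_PAGE  0x100   /* old compression */
    #define RAM_SAVE_FLAG_MULTIFD_FLUSH  0x200
    /* 0x001 (RAM_SAVE_FLAG_FULL) was long obsolete and is removed. */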
Commit: 0867922f169bbbb6b6d08510c6b6127f4ebf8e1a
https://github.com/qemu/qemu/commit/0867922f169bbbb6b6d08510c6b6127f4ebf8e1a
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/multifd-nocomp.c
M migration/multifd.h
M migration/ram.c
Log Message:
-----------
migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages
The RAM_SAVE_FLAG_MULTIFD_FLUSH message should always be correlated to
a sync request on the src. Unify the sending of that message into one
place, and only send it when necessary.
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
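A sketch of the unification, with the message emission folded into the
flush-and-sync path; taking a QEMUFile argument here and the exact
condition helper are assumptions, not necessarily the real signature:

    #include <stdint.h>

    #define RAM_SAVE_FLAG_MULTIFD_FLUSH  0x200

    typedef struct QEMUFile QEMUFile;                                  /* opaque here */
    static void qemu_put_be64(QEMUFile *f, uint64_t v) { (void)f; (void)v; } /* stub */
    static int  multifd_sync_sender_threads(void)      { return 0; }         /* stub */
    static int  flush_message_needed(void)             { return 1; }         /* stub */

    /* One place decides whether the on-wire message goes out: a src sync
     * request is paired with at most one RAM_SAVE_FLAG_MULTIFD_FLUSH. */
    int multifd_ram_flush_and_sync(QEMUFile *f)
    {
        int ret = multifd_sync_sender_threads();

        if (ret) {
            return ret;
        }
        if (flush_message_needed()) {
            qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
        }
        return 0;
    }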
Commit: 7ce4737dd80bb35aa859ebdb4d80ae6d6a3e7dd6
https://github.com/qemu/qemu/commit/7ce4737dd80bb35aa859ebdb4d80ae6d6a3e7dd6
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/ram.c
Log Message:
-----------
migration/multifd: Remove sync processing on postcopy
Multifd has never worked with postcopy, at least not so far.
Remove the sync processing there, because it's confusing and those
messages should never appear. Now if RAM_SAVE_FLAG_MULTIFD_FLUSH is
observed, we fail hard instead of trying to invoke multifd code.
Reviewed-by: Fabiano Rosas <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
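The fail-hard behavior described above, as a tiny self-contained sketch
(the function name is illustrative):

    #include <errno.h>
    #include <stdio.h>

    #define RAM_SAVE_FLAG_MULTIFD_FLUSH  0x200

    /* Postcopy load path: multifd sync messages must never appear, so
     * the flag is a stream error rather than a call into multifd code. */
    int postcopy_check_ram_flags(unsigned long flags)
    {
        if (flags & RAM_SAVE_FLAG_MULTIFD_FLUSH) {
            fprintf(stderr, "unexpected MULTIFD_FLUSH flag during postcopy\n");
            return -EINVAL;
        }
        return 0;
    }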
Commit: 049a2aefa3bf89de4bae4da3909577dfc1b82f91
https://github.com/qemu/qemu/commit/049a2aefa3bf89de4bae4da3909577dfc1b82f91
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/multifd-nocomp.c
M migration/multifd.h
M migration/ram.c
Log Message:
-----------
migration/multifd: Cleanup src flushes on condition check
The src flush condition check is overcomplicated, and it will get even
more out of control once postcopy is involved.
In general, we have two modes of doing the sync: the legacy way and the
modern way. Legacy uses a per-section flush, modern uses a per-round
flush.
Mapped-ram always uses the modern, per-round flush.
Introduce two helpers, which greatly simplify the code and hopefully
make it readable again.
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
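A possible shape of the two helpers, modelled on the legacy/modern split
described above; the exact conditions are an approximation, not the
literal implementation:

    #include <stdbool.h>

    /* Stand-ins for the relevant capability/mode checks. */
    static bool migrate_multifd(void)                          { return true; }
    static bool migrate_mapped_ram(void)                       { return false; }
    static bool migrate_multifd_flush_after_each_section(void) { return false; }

    /* Legacy mode: sync once per section (driven by EOS), old machine types. */
    bool multifd_ram_sync_per_section(void)
    {
        return migrate_multifd() && !migrate_mapped_ram() &&
               migrate_multifd_flush_after_each_section();
    }

    /* Modern mode: sync once per dirty-bitmap round; mapped-ram always
     * behaves this way. */
    bool multifd_ram_sync_per_round(void)
    {
        if (!migrate_multifd()) {
            return false;
        }
        return migrate_mapped_ram() || !migrate_multifd_flush_after_each_section();
    }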
Commit: b2e944b92611e1584c254a47b6e34b3032b42f52
https://github.com/qemu/qemu/commit/b2e944b92611e1584c254a47b6e34b3032b42f52
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/ram.c
Log Message:
-----------
migration/multifd: Document the reason to sync for save_setup()
It's not straightforward to see why the src QEMU needs to sync multifd
during the setup() phase; after all, no pages are queued at that point.
For old QEMUs, there's a solid reason: EOS requires it to work. It's far
less obvious on new QEMUs, which do not treat the EOS message as a sync
request. One only figures that out when the sync is conditionally
removed; in fact, the author did try it out. Logically we could still
avoid doing this on new machine types, however that needs a separate
compat field, which would be overkill for the trivial overhead in the
setup() phase.
Let's instead document it thoroughly, to save someone else from trying
this again and doing the debugging one more time, and to spare anyone
confused about why this ever existed.
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
Commit: 686a4ca58cd0c3238c2d59cfda9dd6f6dffebd90
https://github.com/qemu/qemu/commit/686a4ca58cd0c3238c2d59cfda9dd6f6dffebd90
Author: Fabiano Rosas <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/multifd.c
Log Message:
-----------
migration/multifd: Fix compat with QEMU < 9.0
Commit f5f48a7891 ("migration/multifd: Separate SYNC request with
normal jobs") changed the multifd source side to stop sending data
along with the MULTIFD_FLAG_SYNC, effectively introducing the concept
of a SYNC-only packet. Relying on that, commit d7e58f412c
("migration/multifd: Don't send ram data during SYNC") later came
along and skipped reading data from SYNC packets.
In a version timeline like this:
  8.2 -> f5f48a7 -> 9.0 -> 9.1 -> d7e58f41 -> 9.2
The issue arises that QEMUs < 9.0 still send data along with SYNC, but
QEMUs > 9.1 don't gather that data anymore. This leads to various
kinds of migration failures due to desync/missing data.
Stop checking for a SYNC packet on the destination and unconditionally
unfill the packet.
From now on:
old -> new:
  the source sends data + sync, the destination reads normally
new -> new:
  the source sends only sync, the destination reads zeros
new -> old:
  the source sends only sync, the destination reads zeros
CC: [email protected]
Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
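In code terms, the destination-side fix amounts to dropping the SYNC
special case when unfilling a packet; a self-contained sketch with
simplified types and field names:

    typedef struct {
        unsigned int flags;        /* includes MULTIFD_FLAG_SYNC on syncs */
        unsigned int normal_num;   /* number of data pages in the packet  */
    } MultiFDRecvPacketView;

    int multifd_recv_unfill_packet(MultiFDRecvPacketView *p)
    {
        /*
         * Before the fix (broken against QEMU < 9.0 sources):
         *     if (p->flags & MULTIFD_FLAG_SYNC) return 0;   -- data skipped!
         *
         * After the fix: always walk the normal_num entries.  Old sources
         * attach data to SYNC, so it gets read; new sources send
         * normal_num == 0 with SYNC, so the loop reads nothing.
         */
        for (unsigned int i = 0; i < p->normal_num; i++) {
            /* ... read the i-th page offset/data from the packet ... */
        }
        return 0;
    }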
Commit: 96f876d3d140fc6d0035a9ea7465cbfabc5aaca1
https://github.com/qemu/qemu/commit/96f876d3d140fc6d0035a9ea7465cbfabc5aaca1
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/migration.c
Log Message:
-----------
migration: Add helper to get target runstate
In 99% of cases, after QEMU migrates to the dest host, it tries to
detect the target VM runstate using global_state_get_runstate().
There's one outlier so far, which is Xen, that won't send the global
state. That's the major reason why the global_state_received() check was
always there together with global_state_get_runstate().
However, it's utterly confusing why global_state_received() has anything
to do with "whether to start the VM or not".
Provide a helper to explain it; then we have a unified entry point for
getting the target dest QEMU runstate after migration.
Suggested-by: Fabiano Rosas <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
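The helper's intent, as a self-contained sketch (the function name and
the stand-in accessors mirror the log above; exact signatures are
assumptions):

    #include <stdbool.h>

    typedef enum { RUN_STATE_RUNNING, RUN_STATE_PAUSED } RunState;

    /* Stand-ins for the global-state accessors referenced above. */
    static bool global_state_received(void)         { return true; }
    static RunState global_state_get_runstate(void) { return RUN_STATE_PAUSED; }

    /* One entry point answers "what runstate should the dest end up in
     * after migration?", hiding the Xen special case. */
    RunState migration_get_target_runstate(void)
    {
        /* Xen is the known outlier that never sends global state;
         * assume the VM should be running in that case. */
        if (!global_state_received()) {
            return RUN_STATE_RUNNING;
        }
        return global_state_get_runstate();
    }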
Commit: 140e1c8fd45157d6ac793bcbc6e58c8d70cd8371
https://github.com/qemu/qemu/commit/140e1c8fd45157d6ac793bcbc6e58c8d70cd8371
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M monitor/qmp-cmds.c
Log Message:
-----------
qmp/cont: Only activate disks if migration completed
As the comment says, the activation of disks is for the case where
migration has completed, rather than when QEMU is still in the middle of
migration (RUN_STATE_INMIGRATE).
Move the code over to reflect what the comment describes.
Cc: Kevin Wolf <[email protected]>
Cc: Markus Armbruster <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
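A sketch of the reordered logic (stand-in helpers; not the literal
qmp_cont() code): disk activation belongs to the "migration already
completed" branch, not the still-incoming one:

    #include <stdbool.h>

    typedef struct Error Error;

    static bool runstate_is_inmigrate(void)      { return false; }           /* stand-in */
    static bool activate_all_disks(Error **errp) { (void)errp; return true; } /* stand-in */
    static void remember_autostart(void)         { }
    static void vm_start_now(void)               { }

    void qmp_cont_sketch(Error **errp)
    {
        if (runstate_is_inmigrate()) {
            /* Still migrating in: don't touch the disks, just remember
             * to start the VM once migration finishes. */
            remember_autostart();
            return;
        }
        /* Migration completed (or never happened): safe to take disk
         * ownership back before resuming the guest. */
        if (!activate_all_disks(errp)) {
            return;
        }
        vm_start_now();
    }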
Commit: 90d69c53af5362570bbb6b110c97ff3004f813a2
https://github.com/qemu/qemu/commit/90d69c53af5362570bbb6b110c97ff3004f813a2
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/migration.c
Log Message:
-----------
migration/block: Make late-block-active the default
Migration capability 'late-block-active' controls when the block drives
will be activated. If enabled, block drives are only activated once the
VM starts: either right away if the src runstate was "live" (RUNNING or
SUSPENDED), or postponed until qmp_cont().
Let's do this unconditionally. There's no harm in delaying the
activation of block drives. Meanwhile, there's no ABI breakage if the
dest does it, because the src QEMU has nothing to do with it.
IIUC we could have avoided introducing this cap in the first place, but
it's still not too late to just always do it. The cap is now a candidate
for removal, but that is left for later patches.
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
Commit: 6909150a2f364581eaecf1c069f429a529b02605
https://github.com/qemu/qemu/commit/6909150a2f364581eaecf1c069f429a529b02605
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/savevm.c
Log Message:
-----------
migration/block: Apply late-block-active behavior to postcopy
Postcopy never cared about late-block-active. However, there's no
mention in the capability that it doesn't apply to postcopy.
Considering that we _assumed_ late activation is always good, do it for
postcopy unconditionally too, just like precopy. After this patch, the
behavior should be unified across the board.
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
Commit: 66ed08fb72cfd40cc722f0871b66289ae8bf731a
https://github.com/qemu/qemu/commit/66ed08fb72cfd40cc722f0871b66289ae8bf731a
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M migration/migration.c
M migration/savevm.c
Log Message:
-----------
migration/block: Fix possible race with block_inactive
The src QEMU sets block_inactive=true very early, before the
invalidation takes place. That means if something goes wrong after
setting the flag but before reaching
qemu_savevm_state_complete_precopy_non_iterable(), where the
invalidation work is done, the block_inactive flag will be inconsistent.
For example, consider how qemu_savevm_state_complete_precopy_iterable()
can fail: block_inactive would be set to true even though all block
drives are still active.
Fix that by only updating the flag after the invalidation is done.
No Fixes tag for any commit, because it's not an issue as long as
bdrv_activate_all() is re-entrant on all-active disks: a false-positive
block_inactive brings nothing worse than "trying to activate the blocks
but they're already active". However, let's still do it right, to avoid
a flag that is inconsistent with reality.
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
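The ordering fix in miniature (stand-in names; the real code lives in
the precopy completion path):

    #include <stdbool.h>

    typedef struct { bool block_inactive; } MigrationStateView;

    static int bdrv_inactivate_all_stub(void) { return 0; }   /* stand-in */

    /* Flip the flag only after the invalidation actually succeeded, so an
     * earlier failure can no longer leave block_inactive == true while all
     * drives are still active. */
    int inactivate_disks_for_completion(MigrationStateView *s)
    {
        int ret = bdrv_inactivate_all_stub();

        if (ret) {
            return ret;             /* flag untouched on failure */
        }
        s->block_inactive = true;   /* set only after success */
        return 0;
    }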
Commit: 2c9d019fda3bd553d26bd6f78df7f67d5873c374
https://github.com/qemu/qemu/commit/2c9d019fda3bd553d26bd6f78df7f67d5873c374
Author: Peter Xu <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M include/migration/misc.h
A migration/block-active.c
M migration/colo.c
M migration/meson.build
M migration/migration.c
M migration/migration.h
M migration/savevm.c
M migration/trace-events
M monitor/qmp-cmds.c
Log Message:
-----------
migration/block: Rewrite disk activation
This patch proposes a flag to maintain the disk activation status
globally. It mostly rewrites disk activation management for QEMU,
including COLO and the QMP command xen_save_devices_state.
Backgrounds
===========
We have two problems on disk activations, one resolved, one not.
Problem 1: disk activation recover (for switchover interruptions)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When migration is cancelled or fails during switchover, especially
after the disks have been inactivated, QEMU needs to remember to
re-activate the disks before the VM starts.
It used to be done separately in two paths: one in qmp_migrate_cancel(),
the other in the failure path of migration_completion().
It has been fixed in different commits, all over the place in QEMU.
These are the relevant changes I saw; I'm not sure if it's a complete
list:
- In 2016, commit fe904ea824 ("migration: regain control of images when
migration fails to complete")
- In 2017, commit 1d2acc3162 ("migration: re-active images while migration
been canceled after inactive them")
- In 2023, commit 6dab4c93ec ("migration: Attempt disk reactivation in
more failure scenarios")
Now that we have a slightly better picture, maybe we can unify the
reactivation into a single path.
One side benefit of doing so is that we can move the disk operation
outside the QMP command "migrate_cancel". It's possible that in the
future we may want to make "migrate_cancel" OOB-compatible, which
requires that the command not need the BQL in the first place. This
already does that and makes the migrate_cancel command lightweight.
Problem 2: disk invalidation on top of invalidated disks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is an unresolved bug in current QEMU; see the link in "Resolves:"
at the end. It turns out that besides the src switchover phase (problem
1 above), QEMU also needs to remember the block activation state on the
destination.
Consider two migrations in a row, where the VM stays paused the whole
time. In that scenario, the disks are not activated even after migration
completes in the 1st round. When the 2nd round starts, if QEMU doesn't
know the status of the disks, it will try to inactivate the disks again.
Here the issue is that the block layer API bdrv_inactivate_all() will
crash QEMU if invoked on already-inactive disks for the 2nd migration.
For details, see the bug link at the end.
Implementation
==============
This patch proposes to maintain disk activation with a global flag, so
we know:
- If we inactivated the disks for migration, but migration got cancelled
  or failed, QEMU will know it should reactivate the disks.
- On the incoming side, if the disks were never activated but another
  migration is triggered, QEMU will be able to tell that inactivation is
  not needed for the 2nd migration.
We used to have disk_inactive, but it only solves the 1st issue, not
the 2nd. Also, it's handled in completely separate paths, so it's
extremely hard to follow how the flag changes, for how long the flag is
valid, and when we will reactivate the disks.
Convert the existing disk_inactive flag into that global flag (also
invert its naming), and maintain the disk activation status for the
whole lifecycle of QEMU. That includes the incoming QEMU.
Put both error cases of source migration (failure, cancellation)
together into migration_iteration_finish(), which will be invoked in
either scenario. So in that respect QEMU should behave the same as
before. With such global maintenance of the disk activation status, we
not only clean up quite a few temporary paths that tried to track the
disk activation status (e.g. in the postcopy code), but also fix the
crash of problem 2 in one shot.
For a freshly started QEMU, the flag is initialized to TRUE, showing
that QEMU owns the disks by default.
For an incoming migrated QEMU, the flag will be initialized to FALSE
once and for all, showing that the dest QEMU doesn't own the disks until
switchover. That is guaranteed by the "once" variable.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
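A sketch of the global-flag idea described above; the function names
follow the new block-active.c in spirit, but the exact signatures are
assumptions:

    #include <stdbool.h>

    typedef struct Error Error;

    /* Stand-ins for the block-layer entry points. */
    static bool bdrv_activate_all_ok(Error **errp) { (void)errp; return true; }
    static bool bdrv_inactivate_all_ok(void)       { return true; }

    /* TRUE for a freshly started QEMU (it owns the disks); set to FALSE
     * once for an incoming QEMU until switchover. */
    static bool block_active = true;

    bool migration_block_activate(Error **errp)
    {
        if (block_active) {
            return true;            /* already active: nothing to do */
        }
        if (!bdrv_activate_all_ok(errp)) {
            return false;
        }
        block_active = true;
        return true;
    }

    bool migration_block_inactivate(void)
    {
        if (!block_active) {
            return true;            /* already inactive: avoids the crash
                                       from double inactivation */
        }
        if (!bdrv_inactivate_all_ok()) {
            return false;
        }
        block_active = false;
        return true;
    }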
Commit: 65a11ffdfabefd6017e65c3c98649944a4ecda9a
https://github.com/qemu/qemu/commit/65a11ffdfabefd6017e65c3c98649944a4ecda9a
Author: Fabiano Rosas <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M hw/s390x/s390-virtio-ccw.c
Log Message:
-----------
s390x: Fix CSS migration
Commit a55ae46683 ("s390: move css_migration_enabled from machine to
css.c") disabled CSS migration globally instead of doing it
per-instance.
CC: Paolo Bonzini <[email protected]>
CC: [email protected] #9.1
Fixes: a55ae46683 ("s390: move css_migration_enabled from machine to css.c")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2704
Reviewed-by: Thomas Huth <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Commit: 3d266f9702ac116c2e8f2678aedbc7c3661b265b
https://github.com/qemu/qemu/commit/3d266f9702ac116c2e8f2678aedbc7c3661b265b
Author: Fabiano Rosas <[email protected]>
Date: 2025-01-03 (Fri, 03 Jan 2025)
Changed paths:
M scripts/analyze-migration.py
Log Message:
-----------
migration: Add more error handling to analyze-migration.py
The analyze-migration script was seen failing on s390x in mysterious
ways. It seems we're reaching the VMSDFieldStruct constructor without
any fields, which would indicate an empty .subsection entry, a
VMSTATE_STRUCT with no fields, or a vmsd with no fields. We don't have
any of those, at least not without the unmigratable flag set, so this
should never happen.
Add some debug statements so that we can see what's going on the next
time the issue happens.
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Fabiano Rosas <[email protected]>
Message-Id: <[email protected]>
Compare: https://github.com/qemu/qemu/compare/9ee90cfc2574...3d266f9702ac