On 3.02.2025 20:58, Peter Xu wrote:
On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote:
On 2.02.2025 13:45, Dr. David Alan Gilbert wrote:
* Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:
* Maciej S. Szmigiero (m...@maciej.szmigiero.name) wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
postcopy_ram_listen_thread() is a free running thread, so it needs to
take BQL around function calls to migration methods requiring BQL.
qemu_loadvm_state_main() needs BQL held since it ultimately calls
"load_state" SaveVMHandlers.
migration_incoming_state_destroy() needs BQL held since it ultimately calls
"load_cleanup" SaveVMHandlers.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
migration/savevm.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/migration/savevm.c b/migration/savevm.c
index b0b74140daea..0ceea9638cc1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
* in qemu_file, and thus we must be blocking now.
*/
qemu_file_set_blocking(f, true);
+ bql_lock();
load_res = qemu_loadvm_state_main(f, mis);
+ bql_unlock();
Doesn't that leave that held for a heck of a long time?
Yes, and it effectively broke "postcopy recover" test but I
think the reason for that is qemu_loadvm_state_main() and
its children don't drop BQL while waiting for I/O.
I've described this case in more detail in my reply to Fabiano here:
https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd25...@maciej.szmigiero.name/
While it might be the cause in this case, my feeling is it's more fundamental
here - it's the whole reason that postcopy has a separate ram listen
thread. As the destination is running, after it loads it's devices
and as it starts up the destination will be still loading RAM
(and other postcopiable devices) potentially for quite a while.
Holding the bql around the ram listen thread means that the
execution of the destination won't be able to take that lock
until the postcopy load has finished; so while that might apparently
complete, it'll lead to the destination stalling until that's finished
which defeats the whole point of postcopy.
That last one probably won't fail a test but it will lead to a long stall
if you give it a nice big guest with lots of RAM that it's rapidly
changing.
Okay, I understand the postcopy case/flow now.
Thanks for explaining it clearly.
I still think that "load_state" SaveVMHandlers need to be called
with BQL held since implementations apparently expect it that way:
for example, I think PCI device configuration restore calls
address space manipulation methods which abort() if called
without BQL held.
However, the only devices that *should* be arriving on the channel
that the postcopy_ram_listen_thread is reading from are those
that are postcopiable (i.e. RAM and hmm block's dirty_bitmap).
Those load handlers are safe to be run while the other devices
are being changed. Note the *should* - you could add a check
to fail if any other device arrives on that channel.
I think ultimately there should be either an explicit check, or,
as you suggest in the paragraph below, a separate SaveVMHandler
that runs without BQL held.
To me those are bugs happening during postcopy, so those abort()s in
memory.c are indeed for catching these issues too.
Since the current state of just running these SaveVMHandlers
without BQL in this case and hoping that nothing breaks is
clearly sub-optimal.
I have previously even submitted a patch to explicitly document
"load_state" SaveVMHandler as requiring BQL (which was also
included in the previous version of this patch set) and it
received a "Reviewed-by:" tag:
https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigi...@oracle.com/
https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigi...@oracle.com/
https://lore.kernel.org/qemu-devel/87o732bti7....@suse.de/
It happens!
You could make this safer by having a load_state and a load_state_postcopy
member, and only mark the load_state as requiring the lock.
To not digress too much from the subject of this patch set
(multifd VFIO device state transfer) for now I've just updated the
TODO comment around that qemu_loadvm_state_main(), so hopefully this
discussion won't get forgotten:
https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79
The commit message may still need some touch ups, e.g.:
postcopy_ram_listen_thread() is a free running thread, so it needs to
take BQL around function calls to migration methods requiring BQL.
This sentence is still not correct, IMHO. As Dave explained, the ram load
thread is designed to run without BQL at least for the major workloads it
runs.
So what's your proposed wording of this commit then?
I don't worry on src sending something that crashes the dest: if that
happens, that's a bug, we need to fix it.. In that case abort() either in
memory.c or migration/ would be the same.
Yeah, but it would be a bug in the source (or just bit stream corruption for
any reason), yet it's the destination which would abort() or crash.
I think cases like that in principle should be handled more gracefully,
like exiting the destination QEMU with an error.
But that's something outside of the scope of this patch set.
We could add some explicit check
in migration code, but I don't expect it to catch anything real, at least
such never happened since postcopy introduced.. so it's roughly 10 years
without anything like that happens.
Taking BQL for migration_incoming_state_destroy() looks all safe. There's
one qemu_ram_block_writeback() which made me a bit nervous initially, but
then looks like RAM backends should be almost noop (for shmem and
hugetlbfs) but except pmem.
That's the only part where taking BQL is actually necessary for the
functionality of this patch set to work properly, so it's fine to leave
that call to qemu_loadvm_state_main() as-is (without BQL) for time being.
The other alternative is we define load_cleanup() to not rely on BQL (which
I believe is true before this series?), then take it only when VFIO's path
needs it.
I think other paths always call load_cleanup() with BQL so it's probably
safer to have consistent semantics here.
Thanks,
Thanks,
Maciej