On Wed, Mar 05, 2025 at 04:11:30PM +0100, Maciej S. Szmigiero wrote:
> On 5.03.2025 13:34, Peter Xu wrote:
> > On Tue, Mar 04, 2025 at 11:03:34PM +0100, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <[email protected]>
> > >
> > > All callers to migration_incoming_state_destroy() other than
> > > postcopy_ram_listen_thread() do this call with BQL held.
> > >
> > > Since migration_incoming_state_destroy() ultimately calls "load_cleanup"
> > > SaveVMHandlers and it will soon call BQL-sensitive code it makes sense
> > > to always call that function under BQL rather than to have it deal with
> > > both cases (with BQL and without BQL).
> > > Add the necessary bql_lock() and bql_unlock() to
> > > postcopy_ram_listen_thread().
> > >
> > > qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call
> > > "load_state" SaveVMHandlers that are expecting BQL to be held.
> > >
> > > In principle, the only devices that should be arriving on migration
> > > channel serviced by postcopy_ram_listen_thread() are those that are
> > > postcopiable and whose load handlers are safe to be called without BQL
> > > being held.
> > >
> > > But nothing currently prevents the source from sending data for "unsafe"
> > > devices which would cause trouble there.
> > > Add a TODO comment there so it's clear that it would be good to improve
> > > handling of such (erroneous) case in the future.
> > >
> > > Signed-off-by: Maciej S. Szmigiero <[email protected]>
> > > ---
> > > migration/migration.c | 16 ++++++++++++++++
> > > migration/savevm.c | 4 ++++
> > > 2 files changed, 20 insertions(+)
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 9e9db26667f1..6b2a8af4231d 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -402,10 +402,26 @@ void migration_incoming_state_destroy(void)
> > > struct MigrationIncomingState *mis =
> > > migration_incoming_get_current();
> > > multifd_recv_cleanup();
> > > +
> > > /*
> > > * RAM state cleanup needs to happen after multifd cleanup, because
> > > * multifd threads can use some of its states (receivedmap).
> > > + *
> > > + * This call also needs BQL held since it calls all registered
> > > + * load_cleanup SaveVMHandlers and at least the VFIO implementation
> > > is
> > > + * BQL-sensitive.
> > > + *
> > > + * In addition to the above, it also performs cleanup of load threads
> > > + * thread pool.
> > > + * This cleanup operation is BQL-sensitive as it requires unlocking
> > > BQL
> > > + * so a thread possibly waiting for it could get unblocked and
> > > finally
> > > + * exit.
> > > + * The reason why a load thread may need to hold BQL in the first
> > > place
> > > + * is because address space modification operations require it.
> >
> > Hold on...
> >
> > This almost says exactly why load_cleanup() should _not_ take BQL... rather
> > than should..
> >
> > So I had a closer look at the latest code, it's about this:
> >
> > static void vfio_load_cleanup_load_bufs_thread(VFIOMultifd *multifd)
> > {
> > /* The lock order is load_bufs_mutex -> BQL so unlock BQL here first */
> > bql_unlock();
> > WITH_QEMU_LOCK_GUARD(&multifd->load_bufs_mutex) {
> > while (multifd->load_bufs_thread_running) {
> > multifd->load_bufs_thread_want_exit = true;
> >
> > qemu_cond_signal(&multifd->load_bufs_buffer_ready_cond);
> > qemu_cond_signal(&multifd->load_bufs_iter_done_cond);
> > qemu_cond_wait(&multifd->load_bufs_thread_finished_cond,
> > &multifd->load_bufs_mutex);
> > }
> > }
> > bql_lock();
> > }
> >
> > It doesn't make much sense to me to take it only because we want to drop it
> > unconditionally. Can we guarantee the function not taking BQL instead? I
> > had a quick look on pmem's pmem_persist() (from libpmem, qemu_ram_msync <-
> > qemu_ram_block_writeback <- ram_load_cleanup), it looks ok.
> >
> > So the question is, is it safe to unlock BQL in whatever context (in
> > coroutines, or in a bottom half)?
> >
> > If the answer is yes, we could make migration_incoming_state_destroy()
> > always not taking BQL (and assert(!bql_locked()) instead).
>
> All the other callers of migration_incoming_state_destroy() are holding BQL:
> process_incoming_migration_bh(), process_incoming_migration_co() (called on,
> failure path only), load_snapshot() and qmp_xen_load_devices_state().
>
> So AFAIK the safer way is to standardize on holding BQL when calling
> that function.
> > If the answer is no, then vfio_load_cleanup_load_bufs_thread()'s current
> > version may not work either..
>
> I think the reason for BQL is to serialize access to the QEMU internals
> which are not thread-safe.
>
> So as long as these internals aren't touched when not holding BQL then
> we should be safe - I don't see any particular state that's cached
> around these BQL calls and would need explicit reloading after re-gaining
> it.
OK, I checked with misterious force and looks like it's ok.
Would you please rephrase the comment, though? I want to make it crystal
clear that what we're looking for is not holding BQL.. Maybe something like
this:
/*
* The VFIO load_cleanup() implementation is BQL-sensitive. It requires
* BQL must NOT be taken when recycling load threads, so that it won't
* block the load threads from making progress on address space
* modification operations.
*
* To make it work, we could try to not take BQL for all load_cleanup(),
* or conditionally unlock BQL only if bql_locked() in VFIO.
*
* Since most existing call sites take BQL for load_cleanup(), make
* it simple by taking BQL always as the rule, so that VFIO can unlock
* BQL and retake unconditionally.
*/
We may also want to update the subject. Currently:
"migration: postcopy_ram_listen_thread() should take BQL for some calls"
It's not accurate anymore, it could be:
"migration: Always take BQL for migration_incoming_state_destroy()"
If with all above, please feel free to take:
Acked-by: Peter Xu <[email protected]>
I'm OK if it'll be touched up when merge too.
Thanks,
--
Peter Xu