Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler

Peter Xu Thu, 06 Feb 2025 14:17:58 -0800

On Thu, Feb 06, 2025 at 12:41:50PM +0100, Maciej S. Szmigiero wrote:
> On 5.02.2025 16:55, Peter Xu wrote:
> > On Wed, Feb 05, 2025 at 12:53:21PM +0100, Maciej S. Szmigiero wrote:
> > > On 4.02.2025 21:34, Peter Xu wrote:
> > > > On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
> > > > > On 4.02.2025 18:54, Peter Xu wrote:
> > > > > > On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> > > > > > > +static int multifd_device_state_save_thread(void *opaque)
> > > > > > > +{
> > > > > > > +    struct MultiFDDSSaveThreadData *data = opaque;
> > > > > > > +    int ret;
> > > > > > > +
> > > > > > > +    ret = data->hdlr(data->idstr, data->instance_id, 
> > > > > > > &send_threads_abort,
> > > > > > > +                     data->handler_opaque);
> > > > > > 
> > > > > > I thought we discussed somewhere and the plan was we could use 
> > > > > > Error** here
> > > > > > to report errors.  Would that still make sense, or maybe I lost some
> > > > > > context?
> > > > > 
> > > > > That was about *load* threads, here these are *save* threads.
> > > > 
> > > > Ah OK.
> > > > 
> > > > > 
> > > > > Save handlers do not return an Error value, neither 
> > > > > save_live_iterate, nor
> > > > > save_live_complete_precopy or save_state does so.
> > > > 
> > > > Let's try to make new APIs work with Error* if possible.
> > > 
> > > Let's assume that these threads return an Error object.
> > > 
> > > What's qemu_savevm_state_complete_precopy_iterable() supposed to do with 
> > > it?
> > 
> > IIUC it's not about qemu_savevm_state_complete_precopy_iterable() in this
> > context, as the Error* will be used in one of the thread of the pool, not
> > migration thread.
> > 
> > The goal is to be able to set Error* with migrate_set_error(), so that when
> > migration failed, query-migrate can return the error to libvirt, so
> > migration always tries to remember the 1st error hit if ever possible.
> > 
> > It's multifd_device_state_save_thread() to do migrate_set_error(), not in
> > migration thread.  qemu_savevm_state_complete_*() are indeed not ready to
> > pass Errors, but it's not in the discussed stack.
> 
> I understand what are you proposing now - you haven't written about using
> migrate_set_error() for save threads earlier, just about returning an Error
> object.
> 
> While this might work it has tendency to uncover errors in other parts of
> the migration core - much as using it in the load threads case uncovered
> the TLS session error.


Yes, hitting the tls issue is unfortunate, thanks for finding the bug.  I'm
ok if we have any workaround for that that is easily revertable, then we
fix it later.  Or if Fabiano's series can land earlier.  To me it seems
easier you stick with Fabiano's series and focus on the rest of the patches.

If we hit another bug, it's more unfortunate, but imho there's not much we
can do but try to fix all of them..

> 
> (Speaking of which, could you please respond to the issue at the bottom of
> this message from 2 days ago?:
> https://lore.kernel.org/qemu-devel/150a9741-daab-4724-add0-f35257e86...@maciej.szmigiero.name/
> It is blocking rework of the TLS session EOF handling in this patch set.
> Thanks.)
> 
> But I can try this migrate_set_error() approach here and see if something
> breaks.
> 
> (..)
> > > > > 
> > > > > > Meanwhile, I still feel uneasy on having these globals 
> > > > > > (send_threads_abort,
> > > > > > send_threads_ret).  Can we make MultiFDDSSaveThreadData the only 
> > > > > > interface
> > > > > > between migration and the threads impl?  So I wonder if it can be:
> > > > > > 
> > > > > >      ret = data->hdlr(data);
> > > > > > 
> > > > > > With extended struct like this (I added thread_error and 
> > > > > > thread_quit):
> > > > > > 
> > > > > > struct MultiFDDSSaveThreadData {
> > > > > >        SaveLiveCompletePrecopyThreadHandler hdlr;
> > > > > >        char *idstr;
> > > > > >        uint32_t instance_id;
> > > > > >        void *handler_opaque;
> > > > > >        /*
> > > > > >         * Should be NULL when struct passed over to thread, the 
> > > > > > thread should
> > > > > >         * set this if the handler would return false.  It must be 
> > > > > > kept NULL if
> > > > > >         * the handler returned true / success.
> > > > > >         */
> > > > > >        Error *thread_error;
> > > > > 
> > > > > As I mentioned above, these handlers do not generally return Error 
> > > > > type,
> > > > > so this would need to be an *int;
> > > > > 
> > > > > >        /*
> > > > > >         * Migration core would set this when it wants to notify 
> > > > > > thread to
> > > > > >         * quit, for example, when error occured in other threads, 
> > > > > > or migration is
> > > > > >         * cancelled by the user.
> > > > > >         */
> > > > > >        bool thread_quit;
> > > > > 
> > > > >               ^ I guess that was supposed to be a pointer too 
> > > > > (*thread_quit).
> > > > 
> > > > It's my intention to make this bool, to make everything managed 
> > > > per-thread.
> > > 
> > > But that's unnecessary since this flag is common to all these threads.
> > 
> > One bool would be enough, but you'll need to export another API for VFIO to
> > use otherwise.  I suppose that's ok too.
> > 
> > Some context of multifd threads and how that's done there..
> > 
> > We started with one "quit" per thread struct, but then we switched to one
> > bool exactly as you said, see commit 15f3f21d598148.
> > 
> > If you want to stick with one bool, it's okay too, you can export something
> > similar in misc.h, e.g. multifd_device_state_save_thread_quitting(), then
> > we can avoid passing in the "quit" either as handler parameter, or
> > per-thread flag.
> 
> Of course I can "export" this flag via a getter function rather than passing
> it as a parameter to SaveLiveCompletePrecopyThreadHandler.
> 
> > > 
> > > > It's actually what we do with multifd, these are a bunch of extra 
> > > > threads
> > > > to differeciate from the "IO threads" / "multifd threads".
> > > > 
> > > > > 
> > > > > > };
> > > > > > 
> > > > > > Then if any multifd_device_state_save_thread() failed, for example, 
> > > > > > it
> > > > > > should notify all threads to quit by setting thread_quit, instead of
> > > > > > relying on yet another global variable to show migration needs to 
> > > > > > quit.
> > > > > 
> > > > > multifd_abort_device_state_save_threads() needs to access
> > > > > send_threads_abort too.
> > > > 
> > > > This may need to become something like:
> > > > 
> > > >     QLIST_FOREACH() {
> > > >         MultiFDDSSaveThreadData *data = ...;
> > > >         data->thread_quit = true;
> > > >     }
> > > 
> > > At the most basic level that's turning O(1) operation into O(n).
> > > 
> > > Besides, it creates a question now who now owns these 
> > > MultiFDDSSaveThreadData
> > > structures - they could be owned by either thread pool or the
> > > multifd_device_state code.
> > 
> > I think it should be owned by migration, and with this idea it will need to
> > be there until waiting thread pool completing their works, so migration
> > core needs to free them.
> > 
> > > 
> > > Currently the ownership is simple - the multifd_device_state code
> > > allocates such per-thread structure in 
> > > multifd_spawn_device_state_save_thread()
> > > and immediately passes its ownership to the thread pool which
> > > takes care to free it once it no longer needs it.
> > 
> > Right, this is another reason why I think having migration owing these
> > structs is better.  We used to have task dangling issues when we shift
> > ownership of something to mainloop then we lose track of them (e.g. on TLS
> > handshake gsources).  Those are pretty hard to debug when hanged, because
> > migration core has nothing to link to the hanged tasks again anymore.
> > 
> > I think we should start from having migration core being able to reach
> > these thread-based tasks when needed.  Migration also have control of the
> > thread pool, then it would be easier.  Thread pool is so far simple so we
> > may still need to be able to reference to per-task info separately.
> 
> These are separate threads, so they are are pretty easy to identify
> in a debugger or a core dump.
> 
> Also, one can access them via the thread pool pointer if absolutely
> necessary.
> 
> If QMP introspection ever becomes necessary then it could be simply
> built into the generic thread pool itself.
> Then all thread pool consumers will benefit from it.

Having full control of tasks assigned makes sure it'll be manageable even
if num_tasks > num_threads.  And it's manageable when another thread wants
to do anything with a task.  I know it's always tasks==threads for now but
I think we shouldn't assume it like that when designing the api.

If we'll have one bool showing "we need to quit all tasks", then indeed I
don't see a major concern with not maintaining per-task infos.  In
qmp_cancel logically we should set that bool.

To me, I want to avoid any form of unnecessary struggles when e.g. main
thread wants to access the tasks, then UAFs racing with thread pool freeing
a task info struct or things like that.  For now, I'm OK not having it.
The bool should work.  We can leave that for later.

> 
> > > 
> > > Now, with the list implementation if the thread pool were to free
> > > that MultiFDDSSaveThreadData it would also need to release it from
> > > the list.
> > > 
> > > Which in turn would need appropriate locking around this removal
> > > operation and probably also each time the list is iterated over.
> > > 
> > > On the other hand if the multifd_device_state code were to own
> > > that MultiFDDSSaveThreadData then it would linger around until
> > > multifd_device_state_send_cleanup() cleans it up even though its
> > > associated thread might be long gone.
> > 
> > Do you see a problem with it?  It sounds good to me actually.. and pretty
> > easy to understand.
> > 
> > So migration creates these MultiFDDSSaveThreadData, then create threads to
> > enqueue then, then wait for all threads to complete, then free these
> > structs.
> 
> One of the benefits of using a thread pool is that it can abstract
> memory management away by taking ownership of the data pointed to by
> the passed thread opaque pointer (via the passed GDestroyNotify).
> 
> I don't see a benefit of re-implementing this also in the migration
> code (returning an Error object does *not* require such approach).
> 
> > > 
> > > > We may want to double check qmp 'migrate_cancel' will work when save
> > > > threads are running, but this can also be done for later.
> > > 
> > > > > 
> > > > > And multifd_join_device_state_save_threads() needs to access
> > > > > send_threads_ret.
> > > > 
> > > > Then this one becomes:
> > > > 
> > > >     thread_pool_wait(send_threads);
> > > >     QLIST_FOREACH() {
> > > >         MultiFDDSSaveThreadData *data = ...;
> > > >         if (data->thread_error) {
> > > >            return false;
> > > >         }
> > > >     }
> > > >     return true;
> > > 
> > > Same here, having a common error return would save us from having
> > > to iterate over a list (or having a list in the first place).
> > 
> > IMHO perf isn't an issue here. It's slow path, threads num is small, loop
> > is cheap.  I prefer prioritize cleaness in this case.
> > 
> > Otherwise any suggestion we could report an Error* in the threads?
> 
> Using Error doesn't need a list, load threads return an Error object
> just fine without it:
> >     if (!data->function(data->opaque, &mis->load_threads_abort, 
> > &local_err)) {
> >         MigrationState *s = migrate_get_current();
> > 
> >         assert(local_err);
> > 
> >         /*
> >          * In case of multiple load threads failing which thread error
> >          * return we end setting is purely arbitrary.
> >          */
> >         migrate_set_error(s, local_err);
> >     }
> > 
> 
> Same can be done for save threads here (with the caveat of migrate_set_error()
> uncovering possible other errors that I mentioned earlier).
> 
> > > 
> > > > > 
> > > > > These variables ultimately will have to be stored somewhere since
> > > > > there can be multiple save threads and so multiple instances of
> > > > > MultiFDDSSaveThreadData.
> > > > > 
> > > > > So these need to be stored somewhere where
> > > > > multifd_spawn_device_state_save_thread() can reach them to assign
> > > > > their addresses to MultiFDDSSaveThreadData members.
> > > > 
> > > > Then multifd_spawn_device_state_save_thread() will need to manage the
> > > > qlist, making sure migration core remembers what jobs it submitted.  It
> > > > sounds good to have that bookkeeping when I think about it, instead of
> > > > throw the job to the thread pool and forget it..
> > > 
> > > It's not "forgetting" about the job but rather letting thread pool
> > > manage it - I think thread pool was introduced so these details
> > > (thread management) are abstracted from the migration code.
> > > Now they would be effectively duplicated in the migration code.
> > 
> > Migration is still managing those as long as you have send_threads_abort,
> > isn't it?  The thread pool doesn't yet have an API to say "let's quit all
> > the tasks", otherwise I'm OK too to use the pool API instead of having
> > thread_quit.
> 
> The migration code does not manage each thread separately.
> 
> It manages them as a pool, and does each operation (wait, abort)
> on the pool itself (either literally via ThreadPool or by setting
> a variable that's shared by all threads).
> 
> > > 
> > > > > 
> > > > > However, at that point multifd_device_state_save_thread() can
> > > > > access them too so it does not need to have them passed via
> > > > > MultiFDDSSaveThreadData.
> > > > > 
> > > > > However, nothing prevents putting send_threads* variables
> > > > > into a global struct (with internal linkage - "static", just as
> > > > > these separate ones are) if you like such construct more.
> > > > 
> > > > This should be better than the current global vars indeed, but less
> > > > favoured if the per-thread way could work above.
> > > 
> > > You still need that list to be a global variable,
> > > so it's the same amount of global variables as just putting
> > > the existing variables in a struct (which could be even allocated
> > > in multifd_device_state_send_setup() and deallocated in
> > > multifd_device_state_send_cleanup() for extra memory savings).
> > 
> > Yes this works for me.
> > 
> > I think you got me wrong on "not allowing to introduce global variables".
> > I'm OK with it, but please still consider..
> > 
> >    - Put it under some existing global object rather than having separate
> >      global variables all over the places..
> > 
> >    - Having Error reports
> 
> Ok.
> 
> > And I still think we can change:
> > 
> > typedef int (*SaveLiveCompletePrecopyThreadHandler)(char *idstr,
> >                                                      uint32_t instance_id,
> >                                                      bool *abort_flag,
> >                                                      void *opaque);
> > 
> > To:
> > 
> > typedef int 
> > (*SaveLiveCompletePrecopyThreadHandler)(MultiFDDSSaveThreadData*);
> > 
> > No matter what.
> 
> We can do that, although this requires "exporting" the MultiFDDSSaveThreadData
> type.

I'm OK with that.

I think it's fair to export it because that's the interface for
outside-migration.

> 
> > > 
> > > These variables are having internal linkage limited to (relatively
> > > small) multifd-device-state.c, so it's not like they are polluting
> > > namespace in some major migration translation unit.
> > 
> > If someone proposes to introduce 100 global vars in multifd-device-state.c,
> > I'll strongly stop that.
> > 
> > If it's one global var, I'm OK.
> > 
> > What if it's 5?
> > 
> > ===8<===
> > static QemuMutex queue_job_mutex;
> > 
> > static ThreadPool *send_threads;
> > static int send_threads_ret;
> > static bool send_threads_abort;
> > 
> > static MultiFDSendData *device_state_send;
> > ===8<===
> > 
> > I think I should start calling a stop.  That's what happened..
> > 
> > Please consider introducing something like multifd_send_device_state so we
> > can avoid anyone in the future randomly add static global vars.
> 
> As I wrote before, I will pack it all into one global variable,
> could be called multifd_send_device_state as you suggest.
> 
> > > 
> > > Taking into consideration having to manage an extra data structure
> > > (list), needing more code to do so, having worse algorithms I don't
> > > really see a point of using that list.
> > > 
> > > (This is orthogonal to whether the thread return type is changed to
> > > Error which could be easily done on the existing save threads pool
> > > implementation).
> > 
> > My bet is changing to list is as easy (10-20 LOC?).  If not, I can try to
> > provide the diff on top of your patch.
> > 
> > I'm also not strictly asking for a list, but anything that makes the API
> > cleaner (less globals, better error reports, etc.).
> 
> I just think introducing that list is a step back due to reasons I described
> above.
> 
> And its not actually necessary for returning an Error code.

I'm ok with having no list, as long as the rest are addressed.

Thanks,

-- 
Peter Xu

Re: [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler

Reply via email to