On Tue, Feb 04, 2025 at 08:32:15PM +0100, Maciej S. Szmigiero wrote:
> On 4.02.2025 18:54, Peter Xu wrote:
> > On Thu, Jan 30, 2025 at 11:08:40AM +0100, Maciej S. Szmigiero wrote:
> > > +static int multifd_device_state_save_thread(void *opaque)
> > > +{
> > > +    struct MultiFDDSSaveThreadData *data = opaque;
> > > +    int ret;
> > > +
> > > +    ret = data->hdlr(data->idstr, data->instance_id, &send_threads_abort,
> > > +                     data->handler_opaque);
> >
> > I thought we discussed somewhere and the plan was we could use Error** here
> > to report errors. Would that still make sense, or maybe I lost some
> > context?
>
> That was about *load* threads, here these are *save* threads.
Ah OK.
>
> Save handlers do not return an Error value, neither save_live_iterate, nor
> save_live_complete_precopy or save_state does so.
Let's try to make new APIs work with Error* if possible.
>
> > Meanwhile, I still feel uneasy on having these globals (send_threads_abort,
> > send_threads_ret). Can we make MultiFDDSSaveThreadData the only interface
> > between migration and the threads impl? So I wonder if it can be:
> >
> > ret = data->hdlr(data);
> >
> > With extended struct like this (I added thread_error and thread_quit):
> >
> > struct MultiFDDSSaveThreadData {
> >     SaveLiveCompletePrecopyThreadHandler hdlr;
> >     char *idstr;
> >     uint32_t instance_id;
> >     void *handler_opaque;
> >     /*
> >      * Should be NULL when the struct is handed over to the thread; the
> >      * thread should set it if the handler fails, and must keep it NULL
> >      * if the handler returned true / success.
> >      */
> >     Error *thread_error;
>
> As I mentioned above, these handlers do not generally return an Error
> object, so this would need to be an int * instead;
>
> >     /*
> >      * Migration core would set this when it wants to notify the thread
> >      * to quit, for example, when an error occurred in other threads, or
> >      * when migration is cancelled by the user.
> >      */
> >     bool thread_quit;
>
> ^ I guess that was supposed to be a pointer too (*thread_quit).
It's my intention to make this a bool, so that everything is managed
per-thread. That's actually what we do with multifd already; these save
threads are just a bunch of extra threads, to differentiate them from the
"IO threads" / "multifd threads".
>
> > };
> >
> > Then if any multifd_device_state_save_thread() failed, for example, it
> > should notify all threads to quit by setting thread_quit, instead of
> > relying on yet another global variable to show migration needs to quit.
>
> multifd_abort_device_state_save_threads() needs to access
> send_threads_abort too.
This may need to become something like:

    QLIST_FOREACH() {
        MultiFDDSSaveThreadData *data = ...;
        data->thread_quit = true;
    }

We may want to double check that qmp 'migrate_cancel' works while the save
threads are running, but that can also be done later.
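
To illustrate, a minimal sketch of that abort path; the save_threads_list,
the node link field, and the qatomic usage below are only my assumptions
for this discussion, not code from the series:

    /*
     * Hypothetical bookkeeping: keep every submitted job on a list so
     * that abort/join can reach its per-thread flags later on.
     */
    typedef struct MultiFDDSSaveThreadData MultiFDDSSaveThreadData;

    struct MultiFDDSSaveThreadData {
        /* ... fields as in the struct above ... */
        Error *thread_error;
        bool thread_quit;
        QLIST_ENTRY(MultiFDDSSaveThreadData) node;  /* assumed link field */
    };

    static QLIST_HEAD(, MultiFDDSSaveThreadData) save_threads_list =
        QLIST_HEAD_INITIALIZER(save_threads_list);

    void multifd_abort_device_state_save_threads(void)
    {
        MultiFDDSSaveThreadData *data;

        QLIST_FOREACH(data, &save_threads_list, node) {
            /* Per-thread flag replacing the global send_threads_abort. */
            qatomic_set(&data->thread_quit, true);
        }
    }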
>
> And multifd_join_device_state_save_threads() needs to access
> send_threads_ret.
Then this one becomes:

    thread_pool_wait(send_threads);
    QLIST_FOREACH() {
        MultiFDDSSaveThreadData *data = ...;
        if (data->thread_error) {
            return false;
        }
    }
    return true;

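
Spelled out as a full function, reusing the hypothetical save_threads_list
from the sketch above, with thread_pool_wait() as already used here (again
a sketch, not the series' actual code):

    bool multifd_join_device_state_save_threads(void)
    {
        MultiFDDSSaveThreadData *data;

        /* Wait until every submitted save job has finished. */
        thread_pool_wait(send_threads);

        QLIST_FOREACH(data, &save_threads_list, node) {
            if (data->thread_error) {
                return false;
            }
        }

        return true;
    }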
>
> These variables ultimately will have to be stored somewhere since
> there can be multiple save threads and so multiple instances of
> MultiFDDSSaveThreadData.
>
> So these need to be stored somewhere where
> multifd_spawn_device_state_save_thread() can reach them to assign
> their addresses to MultiFDDSSaveThreadData members.
Then multifd_spawn_device_state_save_thread() will need to manage the
qlist, making sure the migration core remembers what jobs it has
submitted. Now that I think about it, having that bookkeeping sounds
good, instead of throwing the job at the thread pool and forgetting
about it.
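
As a sketch of that bookkeeping, under the same assumptions as above (the
submit helper's exact shape is whatever the series' generic thread pool
provides; the list and node names are the hypothetical ones from before):

    void multifd_spawn_device_state_save_thread(SaveLiveCompletePrecopyThreadHandler hdlr,
                                                char *idstr, uint32_t instance_id,
                                                void *opaque)
    {
        MultiFDDSSaveThreadData *data;

        data = g_new0(MultiFDDSSaveThreadData, 1);
        data->hdlr = hdlr;
        data->idstr = g_strdup(idstr);
        data->instance_id = instance_id;
        data->handler_opaque = opaque;

        /*
         * Track the job before handing it to the pool, so that
         * abort/join above can reach its per-thread flags.
         */
        QLIST_INSERT_HEAD(&save_threads_list, data, node);

        thread_pool_submit(send_threads, multifd_device_state_save_thread,
                           data, NULL);
    }

Whatever frees a MultiFDDSSaveThreadData would then also need to unlink
it from the list.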
>
> However, at that point multifd_device_state_save_thread() can
> access them too so it does not need to have them passed via
> MultiFDDSSaveThreadData.
>
> However, nothing prevents putting the send_threads* variables
> into a global struct (with internal linkage - "static", just as
> these separate ones are) if you like such a construct more.
This would indeed be better than the current global vars, but it is less
favoured than the per-thread approach above, if that can be made to work.
Thanks,
--
Peter Xu