On Thu, Jul 11, 2024 at 04:37:34PM -0300, Fabiano Rosas wrote:

[...]

> We also don't flush the iov at once, so f->buf seems redundant to
> me. But of course, if we touch any of that we must ensure we're not
> dropping any major optimization.

Yes, some test results on that would be more persuasive when the time comes.

Per my limited experience in the past few years: memcpy on modern chips is
pretty cheap.  You'll very soon see one more example of that when you start
looking at the qatzip series: that series decided to do one extra memcpy of
all guest pages, so as to build a larger chunk of buffer instead of
submitting the compression tasks in 4k chunks (while I didn't think 4k was
too small by itself).

That case may be more involved, so perhaps not a great example (e.g. the
compression algo can be special here in that it simply prefers larger
buffers), but it's not uncommon to see people trade an extra memcpy for
other benefits, especially with small buffers.

[...]

> Any piece of code that fills an iov with data is prone to be able to
> send that data through multifd. From this perspective, multifd is just a
> way to give work to an iochannel. We don't *need* to use it, but it
> might be simple enough to the point that the benefit of ditching
> QEMUFile can be reached without too much rework.
> 
> Say we provision multifd threads early and leave them waiting for any
> part of the migration code to send some data. We could have n-1 threads
> idle waiting for the bulk of the data and use a single thread for any
> early traffic that does not need to be parallel.
> 
> I'm not suggesting we do any of this right away or even that this is the
> correct way to go, I'm just letting you know some of my ideas and why I
> think ram + device state might not be the only data we put through
> multifd.

We can wait and see whether that turns out to be useful in the future; even
if so, we still have the chance to add more types into the union, I think.
But again, I don't expect it.
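
Just to illustrate what I mean by "the union" (a hedged sketch only; the
exact names in your series may differ):

  /* The payload that a multifd send/recv request carries; adding one
   * more data type would simply mean one more member here, plus its
   * handler.  Member types are borrowed from the series for the sake
   * of the example. */
  typedef union MultiFDPayload {
      MultiFDPages_t ram;                /* guest RAM pages */
      MultiFDDeviceState_t device_state; /* VFIO / device state chunks */
      /* a future type would go here */
  } MultiFDPayload;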

My gut feeling: we shouldn't bother putting any (1) non-huge-chunk, or (2)
non-IO, data onto multifd.  Otherwise I would again ask, "why not the main
channel?"

[...]

> Just to be clear, do you want a thread-pool to replace multifd? Or would
> that be only used for concurrency on the producer side?

Not to replace multifd.  It's just that I was imagining the multifd threads
only managing the IO work, nothing else.

I was indeed wondering whether we could reuse the multifd threads, but then
I found there's a risk of mangling the two concepts: when we do more than
IO in the multifd threads (e.g. talking to the VFIO kernel driver to fetch
data, which can block), we risk blocking the IO even when there's more to
push, so the NICs can go idle again.  There's also the complexity where a
job fetches data from the VFIO kernel and wants to enqueue again; that
means a multifd task can enqueue to itself, and circular enqueuing can be
challenging: imagine 8 concurrent tasks (with a total of 8 multifd threads)
all trying to enqueue at the same time; they starve each other to death.
Things like that.  Then I figured the remaining jobs are really fn(void*)
kinds of things; they deserve their own pool of threads.

So the VFIO threads (which used to be per-device) become migration worker
threads, and we need them on both src/dst: on dst there's still pending
work to apply the continuous stream of VFIO data back to the kernel driver,
and that can't be done by a multifd thread either, for a similar reason.
Those dest-side worker threads can then also run load(), not only for VFIO
but also for other device states if we add more.
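
On the dst side, a worker job could carry something like this (again
hypothetical, just to show the idea):

  #include <stddef.h>

  /* Hypothetical: a chunk of device state received by a multifd recv
   * thread, handed over to a worker thread which applies it (e.g. back
   * into the VFIO kernel driver) so the recv thread can keep reading. */
  typedef struct MigLoadJob {
      void *buf;                            /* received device state */
      size_t len;
      int (*apply)(void *buf, size_t len);  /* device-specific load hook */
  } MigLoadJob;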

So to summarize, we'll have:

  - 1 main thread (send / recv)
  - N multifd threads (IOs only)
  - M worker threads (jobs only)

Of course, postcopy is not involved..  How does that sound?

-- 
Peter Xu

