On 21.06.2024 17:56, Peter Xu wrote:
On Fri, Jun 21, 2024 at 05:31:54PM +0200, Maciej S. Szmigiero wrote:
On 21.06.2024 17:04, Fabiano Rosas wrote:
"Maciej S. Szmigiero" <m...@maciej.szmigiero.name> writes:

Hi Fabiano,

On 20.06.2024 23:21, Fabiano Rosas wrote:
Hi folks,

First of all, apologies for the roughness of the series. I'm off for
the next couple of weeks and wanted to put something together early
for your consideration.

This series is a refactoring (based on an earlier, off-list
attempt[0]), aimed to remove the usage of the MultiFDPages_t type in
the multifd core. If we're going to add support for more data types to
multifd, we first need to clean that up.

This time around this work was prompted by Maciej's series[1]. I see
you're having to add a bunch of is_device_state checks to work around
the rigidity of the code.

Aside from the VFIO work, there is also the intent (coming back from
Juan's ideas) to make multifd the default code path for migration,
which will have to include the vmstate migration and anything else we
put on the stream via QEMUFile.

I have long since been bothered by having 'pages' sprinkled all over
the code, so I might be coming at this with a bit of a narrow focus,
but I believe in order to support more types of payloads in multifd,
we need to first allow the scheduling at multifd_send_pages() to be
independent of MultiFDPages_t. So here it is. Let me know what you
think.

Thanks for the patch set, I quickly glanced at these patches and they
definitely make sense to me.

(..)
(as I said, I'll be off for a couple of weeks, so feel free to
incorporate any of this code if it's useful. Or to ignore it
completely).

I guess you are targeting QEMU 9.2 rather than 9.1 since 9.1 has
feature freeze in about a month, correct?


For general code improvements like this I'm not thinking about QEMU
releases at all. But this series is not super complex, so I could
imagine we merging it in time for 9.1 if we reach an agreement.

Are you thinking your series might miss the target? Or have concerns
over the stability of the refactoring? We can within reason merge code
based on the current framework and improve things on top, we already did
something similar when merging zero-page support. I don't have an issue
with that.

The reason that I asked whether you are targeting 9.1 is because my
patch set is definitely targeting that release.

At the same time my patch set will need to be rebased/refactored on top
of this patch set if it is supposed to be merged for 9.1 too.

If this patch set gets merged quickly that's not really a problem.

On the other hand, if another iteration(s) is/are needed AND you are
not available in the coming weeks to work on them then there's a
question whether we will make the required deadline.

I think it's a bit rush to merge the vfio series in this release.  I'm not
sure it has enough time to be properly reviewed, reposted, retested, etc.

I've already started looking at it, and so far I think I have doubt not
only on agreement with Fabiano on the device_state thing which I prefer to
avoid, but also I'm thinking of any possible way to at least make the
worker threads generic too: a direct impact could be vDPA in the near
future if anyone cared, while I don't want modules to create threads
randomly during migration.

Meanwhile I'm also thinking whether that "the thread needs to dump all
data, and during iteration we can't do that" is the good reason to not
support that during iterations.

I didn't yet reply because I don't think I think all things through, but
I'll get there.

So I'm not saying that the design is problematic, but IMHO it's just not
mature enough to assume it will land in 9.1, considering it's still a large
one, and the first non-rfc version just posted two days ago.


The RFC version was posted more than 2 months ago.

It has received some review comments from multiple people,
all of which were addressed in this patch set version.

I have not received any further comments during these 2 months, so I thought
the overall design is considered okay - if anything, there might be minor
code comments/issues but these can easily be improved/fixed in the 5 weeks
remaining to the soft code freeze for 9.1.


If anything, I think that the VM live phase (non-downtime) transfers
functionality should be deferred until 9.2 because:
* It wasn't a part of the RFC so even if implemented today would get much
less testing overall,

* It's orthogonal to the switchover time device state transfer functionality
introduced by this patch set and could be added on top of that without
changing the wire protocol for switchover time device state transfers,

* It doesn't impact the switchover downtime so in this case 9.1 would
already contain all what's necessary to improve it.


Thanks,
Maciej


Reply via email to