multifd: Device state transfer support - send side

Peter Xu Fri, 13 Sep 2024 07:20:35 -0700

On Fri, Sep 13, 2024 at 10:21:39AM -0300, Fabiano Rosas wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Thu, Sep 12, 2024 at 03:43:39PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <pet...@redhat.com> writes:
> >> 
> >> Hi Peter, sorry if I'm not very enthusiastic by this, I'm sure you
> >> understand the rework is a little frustrating.
> >
> > That's OK.
> >
> > [For some reason my email sync program decided to give up working for
> >  hours.  I got more time looking at a tsc bug, which is good, but then I
> >  miss a lot of emails..]
> >
> >> 
> >> > On Wed, Aug 28, 2024 at 09:41:17PM -0300, Fabiano Rosas wrote:
> >> >> > +size_t multifd_device_state_payload_size(void)
> >> >> > +{
> >> >> > +    return sizeof(MultiFDDeviceState_t);
> >> >> > +}
> >> >> 
> >> >> This will not be necessary because the payload size is the same as the
> >> >> data type. We only need it for the special case where the MultiFDPages_t
> >> >> is smaller than the total ram payload size.
> >> >
> >> > Today I was thinking maybe we should really clean this up, as the current
> >> > multifd_send_data_alloc() is indeed too tricky (blame me.. who requested
> >> > that more or less).  Knowing that VFIO can use dynamic buffers with 
> >> > ->idstr
> >> > and ->buf (I was thinking it could be buf[1M].. but I was wrong...) made
> >> > that feeling stronger.
> >> 
> >> If we're going to commit bad code and then rewrite it a week later, we
> >> could have just let the original series from Maciej merge without any of
> >> this.
> >
> > Why it's "bad code"?
> >
> > It runs pretty well, I don't think it's bad code.  You wrote it, and I
> > don't think it's bad at all.
> 
> Code that forces us to do arithmetic in order to properly allocate
> memory and comes with a big comment explaining how we're dodging
> compiler warnings is bad code in my book.
> 
> >
> > But now we're rethinking after reading Maciej's new series.
> >Personally I don't think it's a major problem.
> >
> > Note that we're not changing the design back: what was initially proposed
> > was the client submitting an array of multifd objects.  I still don't think
> > that's right.
> >
> > What the change goes so far is we make the union a struct, however that's
> > still N+2 objects not 2*N, where 2 came from RAM+VFIO.  I think the
> > important bits are still there (from your previous refactor series).
> >
> 
> You fail to appreciate that before the RFC series, multifd already
> allocated N for the pages.


It depends on how you see it, IMHO.  I think it allocates N not for the
"pages" but for the "threads", because the threads can be busy with those
buffers, no matter if it's "page" or "device data".

> The device state adds another client, so that
> would be another N anyway. The problem the RFC tried to solve was that
> multifd channels owned that 2N, so the array was added to move the
> memory into the client's ownership. IOW, it wasn't even the RFC series
> that made it 2N, that was the multifd design all along. Now in hindsight
> I don't think we should have went with the memory saving discussion.

I think I could have made that feeling that I only wanted to save memory,
if so, I'm sorry.  But do you still remember I mentioned "we can make it a
struct, too" before your series landed?  Then you think it's ok to keep
using union, and I'm ok too! At least at that time.  I don't think that's a
huge deal.  I don't think each route we go must be perfect, but we try the
best to make it as good.

I don't think any discussion must not happen.  I agree memory consumption
is not the 1st thing to worry, but I don't see why it can't be discussed.

Note that I never said we can't save those memory either - I plan to have
follow up patches (for this, after Maciej's series land.. that's why I even
didn't yet mention) to allow modules report device state buffer size.  I
just didn't say, yet, and don't plan to worry before vfio series lands.
When with that, we'll save 1M*N when no vfio device attached (but we'll
need to handle hotplug).  So I think we don't need to lose any finally.

However I think we need to get straight with the base design on how vfio
should use multifd, because it's important bit and I don't want to rework
important bits after a huge feature, if we know a better directions.

I don't even think what I proposed patch 1-3 here is a must to me, I should
be clear again here just in case we have similar discussions
afterwards.. that I'm ok with below be done after Maciej's:

  - Avoid memory allocations per-packet (done in patch 2-3)
  - Avoid unnecessary data copy (done in patch 2-3)
  - Avoid allocate device buffers when no device will use (not proposed)

But I'm not ok building everything on top of the idea of not using multifd
buffers in the current way, because that can involve a lot of changes:
that's where buffer passes from up to down or backwards, and the interface
needs change a lot too.  We already have that in master so it's not a
problem now.

> 
> >> I already suggested it a couple of times, we shouldn't be doing
> >> core refactorings underneath contributors' patches, this is too
> >> fragile. Just let people contribute their code and we can change it
> >> later.
> >
> > I sincerely don't think a lot needs changing... only patch 1.  Basically
> > patch 1 on top of your previous rework series will be at least what I want,
> > but I'm open to comments from you guys.
> 
> Don't get me wrong, I'm very much in favor of what you're doing
> here. However, I don't think it's ok to be backtracking on our design
> while other people have series in flight that depend on it. You
> certainly know the feeling of trying to merge a feature and having
> maintainers ask you to rewrite a bunch code just to be able to start
> working. That's not ideal.

I as a patch writer always like to do that when it's essential.  Normally
the case is I don't have enough reviewer resources to help me get a better
design, or discuss about it.

When vfio is the new user of multifd vfio needs to do the heavy lifting to
draft the api.

> 
> I tried to quickly insert the RFC series before the device state series
> progressed too much, but it's 3 months later and we're still discussing
> it, maybe we don't need to do it this way.

Can I get that of your feeling from when you were working on mapped-ram?
That series does take long enough, I agree.  Not so bad yet with the VFIO
series - it's good to have you around because you provide great reviews.
I'm also trying the best to not let a series dangle for more than a year.
I don't think 3 months is long with this feature: this is the 1st multifd
extrenal user (and file mapping is also in another angle), it can take some
time.

Sorry if it's so, but sorry again I don't think I get convinced: I think we
need to go this way to build blocks one by one, and we need to make sure
lower blocks are hopefully solid enough to take the upper ones.  Again I'm
ok with small things that go against it, but not major designs.  We
shouldn't go rewrite major designs if we seem to know a better one.

> 
> And ok, let's consider the current situation a special case. But I would
> like to avoid in the future this kind of uncertainty. 
> 
> >
> > Note that patch 2-3 will be on top of Maciej's changes and they're totally
> > not relevant to what we merged so far.  Hence, nothing relevant there to
> > what you worked.  And this is the diff of patch 1:
> >
> >  migration/multifd.h              | 16 +++++++++++-----
> >  migration/multifd-device-state.c |  8 ++++++--
> >  migration/multifd-nocomp.c       | 13 ++++++-------
> >  migration/multifd.c              | 25 ++++++-------------------
> >  4 files changed, 29 insertions(+), 33 deletions(-)
> >
> > It's only 33 lines removed (many of which are comments..), it's not a huge
> > lot.  I don't know why you feel so bad at this...
> >
> > It's probably because we maintain migration together, or we can keep our
> > own way of work.  I don't think we did anything wrong yet so far.
> >
> > We can definitely talk about this in next 1:1.
> >
> >> 
> >> This is also why I've been trying hard to separate core multifd
> >> functionality from migration code that uses multifd to transmit their
> >> data.
> >> 
> >> My original RFC plus the suggestion to extend multifd_ops for device
> >> state would have (almost) made it so that no client code would be left
> >> in multifd. We could have been turning this thing upside down and it
> >> wouldn't affect anyone in terms of code conflicts.
> >
> > Do you mean you preferred the 2*N approach?
> >
> 
> 2*N, where N is usually not larger than 32 and the payload size is
> 1k. Yes, I'd trade that off no problem.

I think it's a problem.

Vdpa when involved with exactly the same pattern of how vfio uses it (as
they're really alike underneath) then vdpa will need its own array of
buffers, or it'll need to take the same vfio lock which doesn't make sense
to me.

N+2, or, N+M (M is the user) is the minimum buffers we need.  N because
multifd can be worst case 100% busy on all threads occupying the buffers.
M because M users can be worst case 100% pre-filling.  It's either about
memory consumption, or about logical sensibility.

> 
> >> 
> >> The ship has already sailed, so your patches below are fine, I have just
> >> some small comments.
> >
> > I'm not sure what you meant about "ship sailed", but we should merge code
> > whenever we think is the most correct.
> 
> As you put above, I agree that the important bits of the original series
> have been preserved, but other secondary goals were lost, such as the
> more abstract separation between multifd & client code and that is the
> ship that has sailed.
> 
> That series was not: "introduce this array for no reason", we also lost
> the ability to abstract the payload from the multifd threads when we
> dropped the .alloc_fn callback for instance. The last patch you posted

I don't remember the details there, but my memory was that it was too
flexible while we seem to reach the consensus that we only process either
RAM or device, nothing else.

> here now adds multifd_device_state_prepare, somewhat ignoring that the
> ram code also has the same pattern and it could be made to use the same
> API.

I need some further elaborations to understand.

multifd_device_state_prepare currently does a few things: taking ownership
of the temp device state object, fill in idstr / instance_id, taking the
lock (so far is needed because we only have one device state object).  None
of them seems to be needed for RAM yet.

Feel free to send a rfc patch if that helps.

> 
> I did accept your premise that ram+compression is one thing while
> device_state is another, so I'm not asking it to be changed, just
> pointing out that the RFC series also addressed those issues. I might
> not have made that clear back then.
> 
> > I hope you meant after below all things look the best, or please shoot.
> > That's exactly what I'm requesting for as comments.
> 
> What you have here is certainly an improvement from the current
> state. I'm just ranting about the path we took here.

-- 
Peter Xu

Re: [PATCH v2 12/17] migration/multifd: Device state transfer support - send side

Reply via email to