On Tue, Jul 23, 2024 at 02:48:48PM -0300, Fabiano Rosas wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Mon, Jul 22, 2024 at 06:20:28PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <pet...@redhat.com> writes:
> >> 
> >> > On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
> >> >> Peter Xu <pet...@redhat.com> writes:
> >> >> 
> >> >> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
> >> >> >> Hi,
> >> >> >> 
> >> >> >> In this v2 I took Peter's suggestion of keeping the channels' 
> >> >> >> pointers
> >> >> >> and moving only the extra slot. The major changes are in patches 5 
> >> >> >> and
> >> >> >> 9. Patch 3 introduces the structure:
> >> >> >> 
> >> >> >> typedef enum {
> >> >> >>     MULTIFD_PAYLOAD_NONE,
> >> >> >>     MULTIFD_PAYLOAD_RAM,
> >> >> >> } MultiFDPayloadType;
> >> >> >> 
> >> >> >> struct MultiFDSendData {
> >> >> >>     MultiFDPayloadType type;
> >> >> >>     union {
> >> >> >>         MultiFDPages_t ram;
> >> >> >>     } u;
> >> >> >> };
> >> >> >> 
> >> >> >> I added a NONE type so we can use it to tell when the channel has
> >> >> >> finished sending a packet, since we'll need to switch types between
> >> >> >> clients anyway. This avoids having to introduce a 'size', or 'free'
> >> >> >> variable.
> >> >> >
> >> >> > This at least looks better to me, thanks.
> >> >> >
> >> >> >> 
> >> >> >> WHAT'S MISSING:
> >> >> >> 
> >> >> >> - The support for calling multifd_send() concurrently. Maciej has 
> >> >> >> this
> >> >> >>   in his series so I didn't touch it.
> >> >> >> 
> >> >> >> - A way of adding methods for the new payload type. Currently, the
> >> >> >>   compression methods are somewhat coupled with ram migration, so I'm
> >> >> >>   not sure how to proceed.
> >> >> >
> >> >> > What is this one?  Why compression methods need new payload?  Aren't 
> >> >> > they
> >> >> > ram-typed?
> >> >> 
> >> >> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
> >> >> either nocomp, or the compression-specific methods
> >> >> (e.g. zlib_send_prepare).
> >> >> 
> >> >> How do we add methods for the upcoming new payload types? I don't expect
> >> >> us to continue using nocomp and then do "if (ram)... else if
> >> >> (device_state) ..." inside of them. I would expect us to rename
> >> >> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
> >> >> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
> >> >> 
> >> >> multifd_nocomp_ops -> multifd_ram_ops // rename
> >> >> multifd_zlib_ops   // existing
> >> >> multifd_device_ops // new
> >> >> 
> >> >> The challenge here is that the current framework is nocomp
> >> >> vs. compression. It needs to become ram + compression vs. other types.
> >> >
> >> > IMHO we can keep multifd_ops[] only for RAM.  There's only send_prepare()
> >> > that device state will need, and so far it's only (referring Maciej's
> >> > code):
> >> >
> >> > static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
> >> >                                             Error **errp)
> >> > {
> >> >     multifd_send_prepare_header_device_state(p);
> >> >
> >> >     assert(!(p->flags & MULTIFD_FLAG_SYNC));
> >> >
> >> >     p->next_packet_size = p->device_state->buf_len;
> >> >     if (p->next_packet_size > 0) {
> >> >         p->iov[p->iovs_num].iov_base = p->device_state->buf;
> >> >         p->iov[p->iovs_num].iov_len = p->next_packet_size;
> >> >         p->iovs_num++;
> >> >     }
> >> >
> >> >     p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
> >> >
> >> >     multifd_send_fill_packet_device_state(p);
> >> >
> >> >     return 0;
> >> > }
> >> >
> >> > None of other multifd_ops are used.
> >> 
> >> There's also a conditional around device_state when calling
> >> ->recv(). That could seems like it could go to a hook.
> >> 
> >> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigi...@oracle.com
> >
> > Actually that's exactly what I think is right.. it looks to me now that we
> > could bypass anything in MultifdOps (including recv()) but let device state
> > be a parallel layer of MultifdOps itself, leaving MultifdOps only for
> > compressors.
> >
> > And yeah, I still remember you just renamed it from recv_pages() to
> > recv()..  it's just that now when think it again it looks like cleaner to
> > make it only about pages..
> >
> >> 
> >> >
> >> > I think we can directly invoke this part of device state code in
> >> > multifd_send_thread() for now.  So far I think it should be ok.
> >> 
> >> It's not just that. There's also a check for "if (ram)" at every call to
> >> multifd_ops to avoid calling the ram code when doing the device
> >> migration. It would be way easier to just set noop functions for those.
> >> 
> >> static MultiFDMethods multifd_devstate_ops = {
> >>     .send_setup = noop_send_setup,
> >>     .send_cleanup = noop_send_cleanup,
> >>     .send_prepare = devstate_send_prepare,
> >>     .recv_setup = noop_recv_setup,
> >>     .recv_cleanup = noop_recv_cleanup,
> >>     .recv = devstate_recv
> >> };
> >> 
> >> I'm not saying this needs to be done in this series though. But I do
> >> think that's the correct design choice for the long term.
> >
> > Yes it should be separate.
> >
> > And what I meant is we don't need all these noops, but recv() keeps being
> > ignored just like above, then for sender side, right now it's:
> >
> >             ret = multifd_send_state->ops->send_prepare(p, &local_err);
> >             if (migrate_mapped_ram()) {
> >                 file_write_ramblock_iov();
> >             } else {
> >                 ret = qio_channel_writev_full_all();
> >             }
> >
> > VFIO can process device state in parallel, so:
> >
> >     if (ram) {
> >         ret = multifd_send_state->ops->send_prepare(p, &local_err);
> >         if (migrate_mapped_ram()) {
> >                 file_write_ramblock_iov();
> >         } else {
> >                 qio_channel_writev_full_all();
> >         }
> >     } else {
> >         // device state handling
> >         multifd_send_device_prepare(...);
> >         ...
> >         qio_channel_writev_full_all();
> >     }
> >
> > Then MultifdOps doesn't apply to device states.
> 
> To avoid getting into bikeshed territory, I think we should postpone
> this discussion until after Maciej's series is merged, so we can speak
> more concretely about the implications. It's easy enough to go from your
> suggestion to mine than the other way around, so let's leave at that.
> 
> I had it already written, so more of my reasoning below, if you're
> interested.

I never thought this is bikeshedding.. What we're discussing now is exactly
what should appear in Maciej's code, am I right?  I thought we should
figure it out before it's merged, if that's the case..

And whose suggestion isn't that important, IMO.  We simply try to discuss
this technically and reach a consensus.. no matter who proposed what.

> ======
> 
> We already have the send/recv threads structured in relation to what we
> do inside the hooks. You're just defining a function that's not a hook,
> but it has the same signature and responsibilities and needs to be
> called at the same moment.
> 
> I think the dissonance here is that you don't see the multifd thread
> code and the payloads (ram, device) as separate layers. Payload-specific
> code should not be at top level. Otherwise, it breaks any semblance of
> proper layering:
> 
> - payload code will have access to MultiFD*Params, which has a bunch of
>   control variables for the loop, the semaphores, etc. that should not
>   be touched;
> 
> - payload code ends up influencing the flow of the thread
>   function. E.g. when zero_copy_send used to dictate whether we'd have
>   separate IO for the packet or not.
> 
> - temporary variables needed by the payload code will have to be
>   declared inside the thread funcion, which makes tempting to use them
>   across payload types and also in the thread code itself;
> 
> - it creates doubt as to whether new changes go inside the hooks, in the
>   if/else or outside of it;
> 
> Think about how easy it has has been to review and merge the various
> compression features we had. It doesn't matter how much they mess up
> inside the hooks, it will never cause the dreaded "Memory content
> inconsistency at ..." error from check_guest_ram(). At least not in a
> way that affects other people. Now compare that with for instance the
> zero-page work, or even mapped-ram, that required a bunch of changes to
> the multifd control flow itself (e.g. all of the sync changes w/
> mapped-ram).

I think there's one issue where we only support one MultiFDMethods as of
now to be active, while the "clients" of multifd can be >1 from payload
POV.  It means I'm not sure how VFIO can provide a MultiFDMethods if it
will overwrite what should be there to define how to process RAM..

Then, we should logically allow VFIO migration to happen with RAM being
compressed with ZSTD/ZLIB/whatever, right?  The question is which
MultiFDMethods we should assign if they're the same layer in this case..

-- 
Peter Xu


Reply via email to