multifd: Move payload storage out of the channel parameters

Peter Xu Fri, 19 Jul 2024 10:59:15 -0700

On Fri, Jul 19, 2024 at 01:54:37PM -0300, Fabiano Rosas wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Thu, Jul 18, 2024 at 07:32:05PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <pet...@redhat.com> writes:
> >> 
> >> > On Thu, Jul 18, 2024 at 06:27:32PM -0300, Fabiano Rosas wrote:
> >> >> Peter Xu <pet...@redhat.com> writes:
> >> >> 
> >> >> > On Thu, Jul 18, 2024 at 04:39:00PM -0300, Fabiano Rosas wrote:
> >> >> >> v2 is ready, but unfortunately this approach doesn't work. When 
> >> >> >> client A
> >> >> >> takes the payload, it fills it with it's data, which may include
> >> >> >> allocating memory. MultiFDPages_t does that for the offset. This 
> >> >> >> means
> >> >> >> we need a round of free/malloc at every packet sent. For every client
> >> >> >> and every allocation they decide to do.
> >> >> >
> >> >> > Shouldn't be a blocker?  E.g. one option is:
> >> >> >
> >> >> >     /* Allocate both the pages + offset[] */
> >> >> >     MultiFDPages_t *pages = g_malloc0(sizeof(MultiFDPages_t) +
> >> >> >                                       sizeof(ram_addr_t) * n, 1);
> >> >> >     pages->allocated = n;
> >> >> >     pages->offset = &pages[1];
> >> >> >
> >> >> > Or.. we can also make offset[] dynamic size, if that looks less 
> >> >> > tricky:
> >> >> >
> >> >> > typedef struct {
> >> >> >     /* number of used pages */
> >> >> >     uint32_t num;
> >> >> >     /* number of normal pages */
> >> >> >     uint32_t normal_num;
> >> >> >     /* number of allocated pages */
> >> >> >     uint32_t allocated;
> >> >> >     RAMBlock *block;
> >> >> >     /* offset of each page */
> >> >> >     ram_addr_t offset[0];
> >> >> > } MultiFDPages_t;
> >> >> 
> >> >> I think you missed the point. If we hold a pointer inside the payload,
> >> >> we lose the reference when the other client takes the structure and puts
> >> >> its own data there. So we'll need to alloc/free everytime we send a
> >> >> packet.
> >> >
> >> > For option 1: when the buffer switch happens, MultiFDPages_t will switch 
> >> > as
> >> > a whole, including its offset[], because its offset[] always belong to 
> >> > this
> >> > MultiFDPages_t.  So yes, we want to lose that *offset reference together
> >> > with MultiFDPages_t here, so the offset[] always belongs to one single
> >> > MultiFDPages_t object for its lifetime.
> >> 
> >> MultiFDPages_t is part of MultiFDSendData, it doesn't get allocated
> >> individually:
> >> 
> >> struct MultiFDSendData {
> >>     MultiFDPayloadType type;
> >>     union {
> >>         MultiFDPages_t ram_payload;
> >>     } u;
> >> };
> >> 
> >> (and even if it did, then we'd lose the pointer to ram_payload anyway -
> >> or require multiple free/alloc)
> >
> > IMHO it's the same.
> >
> > The core idea is we allocate a buffer to put MultiFDSendData which may
> > contain either Pages_t or DeviceState_t, and the size of the buffer should
> > be MAX(A, B).
> >
> 
> Right, but with your zero-length array proposals we need to have a
> separate allocation for MultiFDPages_t because to expand the array we
> need to include the number of pages.


We need to fetch the max size we need and allocate one object covers all
the sizes we need.  I sincerely don't understand why it's an issue..

> 
> Also, don't think only about MultiFDPages_t. With this approach we
> cannot have pointers to memory allocated by the client at all anywhere
> inside the union. Every pointer needs to have another reference
> somewhere else to ensure we don't leak it. That's an unnecessary
> restriction.

So even if there can be multiple pointers we can definitely play the same
trick that we allocate object A+B+C+D in the same chunk and let A->b points
to B, A->c points to C, and so on.

Before that, my question is do we really need that.

For device states, AFAIU it'll always be an opaque buffer..  VFIO needs
that, vDPA probably the same, and for VMSDs it'll be a temp buffer to put
the VMSD dump.

For multifd, I used offset[0] just to make sure things like "dynamic sized
multifd buffers" will easily work without much changes.  Or even we could
have this, afaict:

#define MULTIFD_PAGES_PER_PACKET (128)

typedef struct {
    /* number of used pages */
    uint32_t num;
    /* number of normal pages */
    uint32_t normal_num;
    /* number of allocated pages */
    uint32_t allocated;
    RAMBlock *block;
    /* offset of each page */
    ram_addr_t offset[MULTIFD_PAGES_PER_PACKET];
} MultiFDPages_t;

It might change perf on a few archs where psize is not 4K, but I don't see
it a huge deal, personally.

Then everything will have no pointers, and it can be even slightly faster
because we use 64B cachelines in most systems nowadays, and one indirect
pointer may always need a load on a new cacheline otherwise..

This whole cacheline thing is trivial.  What I worried that you worry too
much on that flexibility that we may never need.

And even with that flexibilty I don't understand why you don't like
allocating an object that's larger than how the union is defined: I really
don't see it a problem..  It'll need care on alloc/free, true, but it
should be pretty manageable in this case to me.

> 
> >> 
> >> >
> >> > For option 2: I meant MultiFDPages_t will have no offset[] pointer 
> >> > anymore,
> >> > but make it part of the struct (MultiFDPages_t.offset[]).  Logically it's
> >> > the same as option 1 but maybe slight cleaner.  We just need to make it
> >> > sized 0 so as to be dynamic in size.
> >> 
> >> Seems like an undefined behavior magnet. If I sent this as the first
> >> version, you'd NACK me right away.
> >> 
> >> Besides, it's an unnecessary restriction to impose in the client
> >> code. And like above, we don't allocate the struct directly, it's part
> >> of MultiFDSendData, that's an advantage of using the union.
> >> 
> >> I think we've reached the point where I'd like to hear more concrete
> >> reasons for not going with the current proposal, except for the
> >> simplicity argument you already put. I like the union idea, but OTOH we
> >> already have a working solution right here.
> >
> > I think the issue with current proposal is each client will need to
> > allocate (N+1)*buffer, so more user using it the more buffers we'll need (M
> > users, then M*(N+1)*buffer).  Currently it seems to me we will have 3 users
> > at least: RAM, VFIO, and some other VMSD devices TBD in mid-long futures;
> > the latter two will share the same DeviceState_t.  Maybe vDPA as well at
> > some point?  Then 4.
> 
> You used the opposite argument earlier in this thread to argue in favor
> of the union: We'll only have 2 clients. I'm confused.

Maybe I meant "2 types of clients"?  VDPA will also use the same device
state buffer.

> 
> Although, granted, this RFC does use more memory.

IMHO it's also easier to understand, where any user always has a free
SendData buffer to manipulate, and multifd always has one buffer for each
channel (free or busy).  That is compared to each client needs to allocate
N buffers and we're actually at least leaking "number of multifd channels"
into the client which may not be wanted.

IOW, I wonder whether you're happy with below to drop the union idea:

     struct MultiFDSendData {
         MultiFDPayloadType type;
         MultiFDPages_t ram_payload;
         MultiFDDeviceState_t device_payload;
     };

Then we keep the "(M+N)" usage model, but don't use union and simply forget
about the memory consumption (similar to your original memory consumption
with this, but will be better as long as anything else joins, e.g. vDPA,
because then vDPA will at least share that same buffer with VFIO).

Do you think you would accept this?

> 
> > I'd agree with this approach only if multifd is flexible enough to not even
> > know what's the buffers, but it's not the case, and we seem only care about
> > two:
> >
> >   if (type==RAM)
> >      ...
> >   else
> >      assert(type==DEVICE);
> >      ...
> 
> I don't understand: "not even know what's the buffers" is exactly what
> this series is about. It doesn't have any such conditional on "type".
> 
> >
> > In this case I think it's easier we have multifd manage all the buffers
> > (after all, it knows them well...).  Then the consumption is not
> > M*(N+1)*buffer, but (M+N)*buffer.
> 
> Fine. As I said, I like the union approach. It's just that it doesn't
> work if the client wants to have a pointer in there.
> 
> Again, this is client data that multifd holds, it's not multifd
> data. MultiFDPages_t or DeviceState_t have nothing to do with
> multifd. It should be ok to have:
> 
> DeviceState_t *devstate = &p->data->u.device;
> devstate->foo = g_new0(...);
> devstate->bar = g_new0(...);
> 
> just like we have:
> 
> MultiFDPages_t *pages = &p->data->u.ram;
> pages->offset = g_new0(ram_addr_t, page_count);
> 
> >
> > Perhaps push your tree somewhere so we can have a quick look?
> 
> https://gitlab.com/farosas/qemu/-/commits/multifd-pages-decouple
> 
> > I'm totally
> > lost when you said I'll nack it.. so maybe I didn't really get what you
> > meant.  Codes may clarify that.
> 
> I'm conjecturing that any contributor adding a zero-length array (a[0])
> would probably be given a hard time on the mailing list. There's 10
> instances of it in the code base. The proper way to grow an array is to
> use a flexible array (a[]) instead.

I'm not familiar with flexible array.  What's the difference between:

struct {
    int a[];
};

v.s.

struct {
    int a[0];
};

?

If that works for you, it should work for me.

Or if you really hate the union / zero-sized array thing, it'll still be
nice to me to drop the union but keep using M+N objects model.

Thanks.

-- 
Peter Xu

Re: [RFC PATCH 6/7] migration/multifd: Move payload storage out of the channel parameters

Reply via email to