Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API

Peter Xu Mon, 30 Sep 2024 13:13:17 -0700

On Mon, Sep 30, 2024 at 07:20:56PM +0000, Sean Hefty wrote:
> > > I'm sure rsocket has its place with much smaller transfer sizes, but
> > > this is very different.
> > 
> > Is it possible to make rsocket be friendly with large buffers (>4GB) like the VM
> > use case?
> 
> If you can perform large VM migrations using streaming sockets, rsockets is 
> likely usable, but it will involve data copies.  The problem is the socket 
> API semantics.
> 
> There are rsocket API extensions (riowrite, riomap) to support RDMA write 
> operations.  This avoids the data copy at the target, but not the sender.   
> (riowrite follows the socket send semantics on buffer ownership.)
> 
> It may be possible to enhance rsockets with MSG_ZEROCOPY or io_uring 
> extensions to enable zero-copy for large transfers, but that's not something 
> I've looked at.  True zero copy may require combining MSG_ZEROCOPY with 
> riowrite, but then that moves further away from using traditional socket 
> calls.


Thanks, Sean.

One thing to mention is that QEMU has QIO_CHANNEL_WRITE_FLAG_ZERO_COPY,
which already supports MSG_ZEROCOPY, but only on the sender side and only
when multifd is enabled. That is because it requires page pinning and
alignment, and it's more challenging to pin an arbitrary buffer than a
guest page.
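To make the sender-side constraint concrete, here is a minimal sketch of
plain Linux MSG_ZEROCOPY over a TCP socket. It is not QEMU's
implementation, and the helpers zc_send_all() and zc_wait_completions()
are hypothetical names. It shows why pinning matters: after send()
returns, the kernel still references the buffer until a completion
notification arrives on the socket error queue, so the caller must not
reuse or free it before then.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>            /* struct timespec, needed by linux/errqueue.h */
#include <linux/errqueue.h>  /* struct sock_extended_err */

/* Older headers may lack these; values are from the Linux UAPI. */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif
#ifndef SO_EE_ORIGIN_ZEROCOPY
#define SO_EE_ORIGIN_ZEROCOPY 5
#endif

/* Wait until the kernel has reported completion of `expected` zero-copy
 * send() calls on fd. Returns 0 on success, -1 on error or timeout. */
static int zc_wait_completions(int fd, uint32_t expected)
{
    uint32_t done = 0;

    while (done < expected) {
        /* A non-empty error queue shows up as POLLERR, which poll()
         * reports even though no events are requested. */
        struct pollfd pfd = { .fd = fd, .events = 0 };
        int pr = poll(&pfd, 1, 2000);
        if (pr < 0 && errno == EINTR)
            continue;
        if (pr <= 0 || !(pfd.revents & POLLERR))
            return -1;

        char control[128];
        struct msghdr msg;
        memset(&msg, 0, sizeof(msg));
        msg.msg_control = control;
        msg.msg_controllen = sizeof(control);
        if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0) {
            if (errno == EINTR)
                continue;
            return -1;  /* POLLERR was a real socket error, not a completion */
        }

        for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm != NULL;
             cm = CMSG_NXTHDR(&msg, cm)) {
            struct sock_extended_err serr;
            if (cm->cmsg_len < CMSG_LEN(sizeof(serr)))
                continue;
            memcpy(&serr, CMSG_DATA(cm), sizeof(serr));
            if (serr.ee_origin != SO_EE_ORIGIN_ZEROCOPY || serr.ee_errno != 0)
                continue;
            /* One notification covers the inclusive range of send() call
             * IDs [ee_info, ee_data]. */
            done += serr.ee_data - serr.ee_info + 1;
        }
    }
    return 0;
}

/* Send all of buf, using MSG_ZEROCOPY if the socket accepts SO_ZEROCOPY
 * and a plain copying send() otherwise. *zerocopy reports whether the
 * zero-copy path was enabled. Returns len on success, -1 on error. */
static ssize_t zc_send_all(int fd, const void *buf, size_t len, bool *zerocopy)
{
    int one = 1;
    *zerocopy = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)) == 0;
    int flags = MSG_NOSIGNAL | (*zerocopy ? MSG_ZEROCOPY : 0);

    const char *p = buf;
    size_t off = 0;
    uint32_t zc_calls = 0;  /* successful MSG_ZEROCOPY send() calls */

    while (off < len) {
        ssize_t n = send(fd, p + off, len - off, flags);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == ENOBUFS && (flags & MSG_ZEROCOPY)) {
                flags &= ~MSG_ZEROCOPY;  /* pinning limit hit: copy the rest */
                continue;
            }
            return -1;
        }
        off += (size_t)n;
        if (flags & MSG_ZEROCOPY)
            zc_calls++;
    }

    /* The kernel may still reference the pages until completions arrive,
     * so only after this point may the caller reuse the buffer. */
    if (zc_calls > 0 && zc_wait_completions(fd, zc_calls) < 0)
        return -1;
    return (ssize_t)off;
}
```

Over loopback the kernel falls back to copying and flags the completion
with SO_EE_CODE_ZEROCOPY_COPIED, so a real benefit needs a NIC path. The
kernel documentation also notes that MSG_ZEROCOPY generally only pays off
for larger writes (on the order of 10 KB and up).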

Nobody has worked on zerocopy receive for TCP yet; there might be similar
challenges, in that normal socket APIs may not fit easily on top of the
current iochannel design, but I don't know enough to say.

I'm not sure whether this means there could be a shared goal of QEMU
ultimately supporting better zerocopy via either TCP or RDMA.  If so, maybe
there's a chance we can move towards rsocket with all the above facilities,
while RDMA can, ideally, run much like TCP on the same (to be enhanced)
iochannel API, so that it can do zerocopy on both sides with either
transport.

-- 
Peter Xu
