On Wed, 15 Mar 2023 at 11:56, Hanna Czenczek <hre...@redhat.com> wrote:
>
> On 15.03.23 14:58, Stefan Hajnoczi wrote:
> > On Mon, Mar 13, 2023 at 06:48:32PM +0100, Hanna Czenczek wrote:
> >> Add a virtio-fs-specific vhost-user interface to facilitate migrating
> >> back-end-internal state. We plan to migrate the internal state simply
> > Luckily the interface does not need to be virtiofs-specific since it
> > only transfers opaque data. Any stateful device can use this for
> > migration. Please make it generic both at the vhost-user protocol
> > message level and at the QEMU vhost API level.
>
> OK, sure.
>
> >> as a binary blob after the streaming phase, so all we need is a way to
> >> transfer such a blob from and to the back-end. We do so by using a
> >> dedicated area of shared memory through which the blob is transferred in
> >> chunks.
> > Keeping the migration data transfer separate from the vhost-user UNIX
> > domain socket is a good idea since the amount of data could be large and
> > may congest the UNIX domain socket. The shared memory interface solves
> > this.
> >
> > Where I get lost is why it needs to be shared memory instead of simply
> > an fd? On the source, the front-end could read the fd until EOF and
> > transfer the opaque data. On the destination, the front-end could write
> > to the fd and then close it. I think that would be simpler than the
> > shared memory interface and could potentially support zero-copy via
> > splice(2) (QEMU doesn't need to look at the data being transferred!).
> >
> > Here is an outline of an fd-based interface:
> >
> > - SET_DEVICE_STATE_FD: The front-end passes a file descriptor for
> >   transferring device state.
> >
> >   The @direction argument:
> >   - SAVE: the back-end transfers an outgoing device state over the fd.
> >   - LOAD: the back-end transfers an incoming device state over the fd.
> >
> >   The @phase argument:
> >   - STOPPED: the device is stopped.
> >   - PRE_COPY: reserved for future use.
> >   - POST_COPY: reserved for future use.
> >
> > The back-end transfers data over the fd according to @direction and
> > @phase upon receiving the SET_DEVICE_STATE_FD message.
> >
> > There are loose ends like how the message interacts with the virtqueue
> > enabled state, what happens if multiple SET_DEVICE_STATE_FD messages are
> > sent, etc. I have ignored them for now.
> >
> > What I wanted to mention about the fd-based interface is:
> >
> > - It's just one message. The I/O activity happens via the fd and does
> >   not involve GET_STATE/SET_STATE messages over the vhost-user domain
> >   socket.
> >
> > - Buffer management is up to the front-end and back-end implementations
> >   and a bit simpler than the shared memory interface.
> >
> > Did you choose the shared memory approach because it has certain
> > advantages?
>
> I simply chose it because I didn’t think of anything else. :)
>
> Using just an FD for a pipe-like interface sounds perfect to me. I
> expect that to make the code simpler and, as you point out, it’s just
> better in general. Thanks!
The Linux VFIO Migration v2 API could be interesting to look at too:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/vfio.h#n814

It has a state machine that puts the device into
pre-copy/saving/loading/etc states.

> > What is the rationale for waiting to receive the entire incoming state
> > before parsing it rather than parsing it in a streaming fashion? Can
> > this be left as an implementation detail of the vhost-user back-end so
> > that there's freedom in choosing either approach?
>
> The rationale was that when using the shared memory approach, you need
> to specify the offset into the state of the chunk that you’re currently
> transferring. So to allow streaming, you’d need to make the front-end
> transfer the chunks in a streaming fashion, so that these offsets are
> continuously increasing. Definitely possible, and reasonable, I just
> thought it’d be easier not to define it at this point and just state
> that decoding at the end is always safe.
>
> When using a pipe/splicing, however, that won’t be a concern anymore, so
> yes, then we can definitely allow the back-end to decode its state while
> it’s still being received.

I see.

Stefan