Hi

On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu <yuanhan....@linux.intel.com> wrote:
>> > The following series starts from the idea that the slave can request a
>> > "managed" shutdown instead and later recover (I guess the use case for
>> > this is to allow for example to update static dispatching/filter rules
>> > etc)
>
> What if the backend crashes, that no such request will be sent? And
> I'm wondering why this request is needed, as we are able to detect
> the disconnect now (with your patches).
I don't think trying to handle backend crashes is really a thing we
need to take care of. If the backend is bad enough to crash, it may as
well corrupt the guest memory (mst: my understanding of vhost-user is
that the backend must be trusted, or it could just throw garbage in
the queue descriptors, or elsewhere in the guest memory actually, with
surprising consequences, right?).

> BTW, you meant to let QEMU as the server and the backend as the client
> here, right? Honestly, that's what we've thought of, too, in the first
> time.
>
> However, I'm wondering could we still go with the QEMU as the client
> and the backend as the server (the default and the only way DPDK
> supports), and let QEMU to try to reconnect when the backend crashes
> and restarts. In such case, we need enable the "reconnect" option
> for vhost-user, and once I have done that, it basically works in my
> test:

Conceptually, I think that if we allow the backend to disconnect, it
makes sense for qemu to actually be the socket server. But it doesn't
matter much; it's simple to teach qemu to reconnect with a timer, so
we should probably allow both cases anyway.

> - start DPDK vhost-switch example
>
> - start QEMU, which will connect to DPDK vhost-user
>
>   link is good now.
>
> - kill DPDK vhost-switch
>
>   link is broken at this stage
>
> - start DPDK vhost-switch again
>
>   you will find that the link is back again.
>
> Will that makes sense to you? If so, we may need do nothing (or just
> very few) changes at all to DPDK to get the reconnect work.

The main issue with handling crashes (the backend can be gone at any
time) is that the backend may not have had time to sync the used idx
(at the least). It may already have processed incoming packets, so on
reconnect it may duplicate the receiving/dispatching work. Similarly,
on the backend's receiving end, some packets may be lost, never
received by the VM, and later overwritten by the backend after
reconnect (for the same used idx update reason).
This may not be a big deal for unreliable protocols, but I am not
familiar enough with network usage to know if that's fine in all
cases. It may be fine for some packets, such as UDP. However, in
general, vhost-user should not be specific to network transmission,
and it would be nice to have a reliable way for the backend to
reconnect. That's what I try to do in this series. I'll repost it
after I have done more testing.

thanks

--
Marc-André Lureau