On 2016/03/26 3:00, Marc-André Lureau wrote:
> Hi
>
> On Thu, Mar 24, 2016 at 8:10 AM, Yuanhan Liu
> <yuanhan....@linux.intel.com> wrote:
>>>> The following series starts from the idea that the slave can request a
>>>> "managed" shutdown instead and later recover (I guess the use case for
>>>> this is to allow for example to update static dispatching/filter rules
>>>> etc)
>> What if the backend crashes, so that no such request will be sent? And
>> I'm wondering why this request is needed, as we are able to detect
>> the disconnect now (with your patches).
> I don't think trying to handle backend crashes is really a thing we
> need to take care of. If the backend is bad enough to crash, it may as
> well corrupt the guest memory (mst: my understanding of vhost-user is
> that the backend must be trusted, or it could just throw garbage in the
> queue descriptors with surprising consequences or elsewhere in the
> guest memory actually, right?).
>
>> BTW, you meant to have QEMU as the server and the backend as the client
>> here, right? Honestly, that's what we thought of, too, at first.
>> However, I'm wondering whether we could still go with QEMU as the client
>> and the backend as the server (the default and the only way DPDK
>> supports), and let QEMU try to reconnect when the backend crashes
>> and restarts. In such a case, we need to enable the "reconnect" option
>> for vhost-user, and once I have done that, it basically works in my
>> test:
>>
> Conceptually, I think if we allow the backend to disconnect, it makes
> sense that qemu is actually the socket server. But it doesn't matter
> much, it's simple to teach qemu to reconnect with a timer... So we should
> probably allow both cases anyway.
>
>> - start DPDK vhost-switch example
>>
>> - start QEMU, which will connect to DPDK vhost-user
>>
>>   link is good now.
>>
>> - kill DPDK vhost-switch
>>
>>   link is broken at this stage
>>
>> - start DPDK vhost-switch again
>>
>>   you will find that the link is back again.
>>
>>
>> Does that make sense to you? If so, we may need to make no (or just
>> very few) changes at all to DPDK to get the reconnect to work.
> The main issue with handling crashes (gone at any time) is that the
> backend may not have time to sync the used idx (at the least). It may
> already have processed incoming packets, so on reconnect, it may
> duplicate the receiving/dispatching work. Similarly, on the backend
> receiving end, some packets may be lost, never received by the VM, and
> later overwritten by the backend after reconnect (for the same used
> idx update reason). This may not be a big deal for unreliable
> protocols, but I am not familiar enough with network usage to know if
> that's fine in all cases. It may be fine for some packets, such as
> udp.
>
> However, in general, vhost-user should not be specific to network
> transmission, and it would be nice to have a reliable way for the
> backend to reconnect. That's what I try to do in this series. I'll
> repost it after I have done more testing.
>
> thanks
>
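To make the used idx concern quoted above concrete, here is a minimal sketch of what can happen when a backend dies between handling a buffer and publishing its progress. It is a toy model, not vhost-user code: `SharedRing`, `run_backend`, `simulate_crash` and the `wire` list are hypothetical names, and the only assumption is that a restarted backend resumes from the last used idx it published in guest memory.

```python
from dataclasses import dataclass


@dataclass
class SharedRing:
    """Toy stand-in for ring state in guest memory; survives a backend restart."""
    avail: list        # buffers the guest has made available, in order
    used_idx: int = 0  # last used index the backend actually published


def run_backend(ring, wire, count, publish):
    """Handle up to `count` buffers, starting from the published used idx.

    `wire` models the outside world (packets already sent), so it also
    survives a crash. With publish=False the backend "crashes" before
    syncing used_idx, and its private progress is lost.
    """
    next_idx = ring.used_idx
    end = min(next_idx + count, len(ring.avail))
    while next_idx < end:
        wire.append(ring.avail[next_idx])
        next_idx += 1
    if publish:
        ring.used_idx = next_idx


def simulate_crash():
    ring = SharedRing(avail=["p0", "p1", "p2", "p3"])
    wire = []
    run_backend(ring, wire, 2, publish=True)   # p0, p1 handled and synced
    run_backend(ring, wire, 1, publish=False)  # p2 handled, crash before sync
    run_backend(ring, wire, 2, publish=True)   # restarted backend resumes at p2
    return ring, wire


if __name__ == "__main__":
    ring, wire = simulate_crash()
    print(wire, ring.used_idx)
```

In this model `p2` ends up on the wire twice, which is the duplication described above. A "managed" shutdown corresponds to always leaving with `publish=True`.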
Hi Yuanhan,

We probably have two options here.

One is to use DEVICE_NEEDS_RESET, or to add a new status such as
QUEUE_NEEDS_RESET to the virtio specification. In this case, we would
need to fix the virtio-net drivers and the virtio-net device in QEMU,
so it might require a lot of code changes, but we could handle an
unexpected shutdown of the vhost-user backend.

The other option is Marc's simple solution. In this case, we don't need
to change the virtio-net drivers, but we cannot handle an unexpected
shutdown.

Thanks,
Tetsuya
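
As a rough illustration of the first option, the sketch below models a device that flags DEVICE_NEEDS_RESET when its backend disappears, and a driver that notices the bit, resets the device and re-initializes it. The status bit values are the ones defined in the virtio specification; everything else (`Device`, `backend_lost`, `driver_init`, `driver_poll`) is a hypothetical name for illustration and not QEMU or virtio-net driver code. QUEUE_NEEDS_RESET is not modeled, since it does not exist in the specification.

```python
# Device status bits as defined in the virtio specification.
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8
DEVICE_NEEDS_RESET = 64
FAILED = 128


class Device:
    """Toy device model holding only the status field."""

    def __init__(self):
        self.status = 0

    def backend_lost(self):
        # Assumption: the device side reports an unexpected backend
        # shutdown by setting DEVICE_NEEDS_RESET.
        self.status |= DEVICE_NEEDS_RESET

    def reset(self):
        # Writing 0 to the status field resets the device.
        self.status = 0


def driver_init(dev):
    """Walk through the usual initialization status bits in order."""
    for bit in (ACKNOWLEDGE, DRIVER, FEATURES_OK, DRIVER_OK):
        dev.status |= bit


def driver_poll(dev):
    """Return True if the driver had to reset and re-initialize the device."""
    if dev.status & DEVICE_NEEDS_RESET:
        dev.reset()
        driver_init(dev)
        return True
    return False


if __name__ == "__main__":
    dev = Device()
    driver_init(dev)
    dev.backend_lost()
    print(driver_poll(dev), dev.status)
```

The driver-side change is the `driver_poll` check; with the other option no such check is needed, at the cost of not covering an unexpected shutdown.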