On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote: > On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <m...@redhat.com> wrote: > >> > If it's ok and we can recover, then why should we print errors? > >> > >> To me, the current disconnect handling is not handled cleanly. There > >> is not much/nothing in the protocol that tells when and how you can > >> disconnect. If qemu vhost-user reconnection works today, it is mostly > >> by luck. Imho, we should print errors if any call to vhost user fails > >> to help further analysis of broken behaviour. > > > > But we decided disconnect is OK, did we not? So now a failure like this > > is just expected. It's not broken behaviour anymore ... > > As you know, there are many ways qemu or the vm can break when the > backend is disconnected. For now, it would help a lot if we had a bit > of error messages in this case. But do we really want to support > spurious disconnect/reconnect at any time? It's going to be > challenging, to maintain, to test... Is it really worthwhile? I doubt > it. I think it would be easier and more future-proof to have a > dedicated vhost-user request for that and only do a best effort for > the ungraceful disconnect.
ungraceful might take a while. But I don't see a need for message exchanges for shutdown. Do you? > >> And we shoud have a > >> clean mechanism to handle shutdown/disconnect/reconnect. > > > > > > At some level this is something that's missing here. > > > > How about we patch docs/specs/vhost-user.txt > > and describe how is reconnection supposed to work? > > > > All we have is: > > If Master is unable to send the full message or receives a wrong > > reply > > it will close the connection. An optional reconnection mechanism > > can be > > implemented. > > This text is there from the original commit but it doesn't say how to > reconnect and or how to recover from the ring. I don't have enough > experience with virtio in general to say when it is acceptable for > backend to be gone (not processing the ring), I can imagine the > implementation vary widely, and the requirements depend on the kind of > device. > > If we limit the spec to vhost-user protocol only and leave it open on > how each virtio device should support ungraceful disconnect/reconnect, > then it sounds reasonable to document that after such disconnect, on > reconnect: > - the server assumes the backend has no knowledge of previous connection > - the state can be changed between reconnections (new ring state, mem table > etc) > - the server will check feature compatibility with previous backend > and reject incompatible backends > - backend must restore ring processing from used->idx and ignore > VHOST_USER_SET_VRING_BASE (or should we change and document that in > this case the ring base is updated from used->idx?) I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff from used->idx. > (it's probably not a complete list, I am not good at imagining all > possible cases) used ring must be flushed before disconnect (so all entries before used->idx have been processed, and no entries after used->idx have been processed). If enabled, dirty log must be flushed before disconnect. > > -- > Marc-André Lureau