Hi On Sat, Jun 25, 2016 at 12:38 AM, Michael S. Tsirkin <m...@redhat.com> wrote: > On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote: >> On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <m...@redhat.com> wrote: >> >> > If it's ok and we can recover, then why should we print errors? >> >> >> >> To me, the current disconnect handling is not handled cleanly. There >> >> is not much/nothing in the protocol that tells when and how you can >> >> disconnect. If qemu vhost-user reconnection works today, it is mostly >> >> by luck. Imho, we should print errors if any call to vhost user fails >> >> to help further analysis of broken behaviour. >> > >> > But we decided disconnect is OK, did we not? So now a failure like this >> > is just expected. It's not broken behaviour anymore ... >> >> As you know, there are many ways qemu or the vm can break when the >> backend is disconnected. For now, it would help a lot if we had a bit >> of error messages in this case. But do we really want to support >> spurious disconnect/reconnect at any time? It's going to be >> challenging, to maintain, to test... Is it really worthwhile? I doubt >> it. I think it would be easier and more future-proof to have a >> dedicated vhost-user request for that and only do a best effort for >> the ungraceful disconnect. > > ungraceful might take a while. But I don't see a need for > message exchanges for shutdown. Do you?
A message exchange would allow the server to do something before actually shuting down.: to check if shutdown is allowed/clean, to flush some pending operations, to change device state, to request something before shutdown (such as ring base etc),... >> >> And we shoud have a >> >> clean mechanism to handle shutdown/disconnect/reconnect. >> > >> > >> > At some level this is something that's missing here. >> > >> > How about we patch docs/specs/vhost-user.txt >> > and describe how is reconnection supposed to work? >> > >> > All we have is: >> > If Master is unable to send the full message or receives a wrong >> > reply >> > it will close the connection. An optional reconnection mechanism >> > can be >> > implemented. >> >> This text is there from the original commit but it doesn't say how to >> reconnect and or how to recover from the ring. I don't have enough >> experience with virtio in general to say when it is acceptable for >> backend to be gone (not processing the ring), I can imagine the >> implementation vary widely, and the requirements depend on the kind of >> device. >> >> If we limit the spec to vhost-user protocol only and leave it open on >> how each virtio device should support ungraceful disconnect/reconnect, >> then it sounds reasonable to document that after such disconnect, on >> reconnect: >> - the server assumes the backend has no knowledge of previous connection >> - the state can be changed between reconnections (new ring state, mem table >> etc) >> - the server will check feature compatibility with previous backend >> and reject incompatible backends >> - backend must restore ring processing from used->idx and ignore >> VHOST_USER_SET_VRING_BASE (or should we change and document that in >> this case the ring base is updated from used->idx?) > > I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff > from used->idx. The problem with this approach is that it depends on how the backend process the rings. Processing packets in order doesn't seem to be required in general, or is it? >> (it's probably not a complete list, I am not good at imagining all >> possible cases) > > used ring must be flushed before disconnect (so all entries > before used->idx have been processed, and no entries > after used->idx have been processed). > If enabled, dirty log must be flushed before disconnect. Ok that sounds like a good requirement for "non-explicit graceful shutdown" (how would you name it?) and seem to solve the processing order problem. Yet, at this point, I think it would be easy for the backend to do a proper shutdown request with all the benefits I listed. -- Marc-André Lureau