Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail

Michael S. Tsirkin Fri, 24 Jun 2016 15:40:06 -0700

On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote:
> On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
> >> > If it's ok and we can recover, then why should we print errors?
> >>
> >> To me, the current disconnect handling is not handled cleanly. There
> >> is not much/nothing in the protocol that tells when and how you can
> >> disconnect. If qemu vhost-user reconnection works today, it is mostly
> >> by luck. Imho, we should print errors if any call to vhost user fails
> >> to help further analysis of broken behaviour.
> >
> > But we decided disconnect is OK, did we not? So now a failure like this
> > is just expected. It's not broken behaviour anymore ...
> 
> As you know, there are many ways qemu or the vm can break when the
> backend is disconnected.  For now, it would help a lot if we had a bit
> of error messages in this case. But do we really want to support
> spurious disconnect/reconnect at any time? It's going to be
> challenging, to maintain, to test... Is it really worthwhile? I doubt
> it. I think it would be easier and more future-proof to have a
> dedicated vhost-user request for that and only do a best effort for
> the ungraceful disconnect.


ungraceful might take a while. But I don't see a need for
message exchanges for shutdown. Do you?

> >> And we shoud have a
> >> clean mechanism to handle shutdown/disconnect/reconnect.
> >
> >
> > At some level this is something that's missing here.
> >
> > How about we patch docs/specs/vhost-user.txt
> > and describe how is reconnection supposed to work?
> >
> > All we have is:
> >         If Master is unable to send the full message or receives a wrong 
> > reply
> >         it will close the connection. An optional reconnection mechanism 
> > can be
> >         implemented.
> 
> This text is there from the original commit but it doesn't say how to
> reconnect and or how to recover from the ring. I don't have enough
> experience with virtio in general to say when it is acceptable for
> backend to be gone (not processing the ring), I can imagine the
> implementation vary widely, and the requirements depend on the kind of
> device.
> 
> If we limit the spec to vhost-user protocol only and leave it open on
> how each virtio device should support ungraceful disconnect/reconnect,
> then it sounds reasonable to document that after such disconnect, on
> reconnect:
> - the server assumes the backend has no knowledge of previous connection
> - the state can be changed between reconnections (new ring state, mem table 
> etc)
> - the server will check feature compatibility with previous backend
> and reject incompatible backends
> - backend must restore ring processing from used->idx and ignore
> VHOST_USER_SET_VRING_BASE (or should we change and document that in
> this case the ring base is updated from used->idx?)

I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff
from used->idx.

> (it's probably not a complete list, I am not good at imagining all
> possible cases)

used ring must be flushed before disconnect (so all entries
before used->idx have been processed, and no entries
after used->idx have been processed).

If enabled, dirty log must be flushed before disconnect.


> 
> -- 
> Marc-André Lureau

Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail

Reply via email to