Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail

Marc-André Lureau Mon, 27 Jun 2016 02:05:08 -0700

Hi

On Sat, Jun 25, 2016 at 12:38 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Fri, Jun 24, 2016 at 03:25:30PM +0200, Marc-André Lureau wrote:
>> On Thu, Jun 23, 2016 at 7:13 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
>> >> > If it's ok and we can recover, then why should we print errors?
>> >>
>> >> To me, the current disconnect handling is not handled cleanly. There
>> >> is not much/nothing in the protocol that tells when and how you can
>> >> disconnect. If qemu vhost-user reconnection works today, it is mostly
>> >> by luck. Imho, we should print errors if any call to vhost user fails
>> >> to help further analysis of broken behaviour.
>> >
>> > But we decided disconnect is OK, did we not? So now a failure like this
>> > is just expected. It's not broken behaviour anymore ...
>>
>> As you know, there are many ways qemu or the vm can break when the
>> backend is disconnected.  For now, it would help a lot if we had a bit
>> of error messages in this case. But do we really want to support
>> spurious disconnect/reconnect at any time? It's going to be
>> challenging, to maintain, to test... Is it really worthwhile? I doubt
>> it. I think it would be easier and more future-proof to have a
>> dedicated vhost-user request for that and only do a best effort for
>> the ungraceful disconnect.
>
> ungraceful might take a while. But I don't see a need for
> message exchanges for shutdown. Do you?


A message exchange would allow the server to do something before
actually shuting down.: to check if shutdown is allowed/clean, to
flush some pending operations, to change device state, to request
something before shutdown (such as ring base etc),...

>> >> And we shoud have a
>> >> clean mechanism to handle shutdown/disconnect/reconnect.
>> >
>> >
>> > At some level this is something that's missing here.
>> >
>> > How about we patch docs/specs/vhost-user.txt
>> > and describe how is reconnection supposed to work?
>> >
>> > All we have is:
>> >         If Master is unable to send the full message or receives a wrong 
>> > reply
>> >         it will close the connection. An optional reconnection mechanism 
>> > can be
>> >         implemented.
>>
>> This text is there from the original commit but it doesn't say how to
>> reconnect and or how to recover from the ring. I don't have enough
>> experience with virtio in general to say when it is acceptable for
>> backend to be gone (not processing the ring), I can imagine the
>> implementation vary widely, and the requirements depend on the kind of
>> device.
>>
>> If we limit the spec to vhost-user protocol only and leave it open on
>> how each virtio device should support ungraceful disconnect/reconnect,
>> then it sounds reasonable to document that after such disconnect, on
>> reconnect:
>> - the server assumes the backend has no knowledge of previous connection
>> - the state can be changed between reconnections (new ring state, mem table 
>> etc)
>> - the server will check feature compatibility with previous backend
>> and reject incompatible backends
>> - backend must restore ring processing from used->idx and ignore
>> VHOST_USER_SET_VRING_BASE (or should we change and document that in
>> this case the ring base is updated from used->idx?)
>
> I think it's cleaner to change VHOST_USER_SET_VRING_BASE to get stuff
> from used->idx.

The problem with this approach is that it depends on how the backend
process the rings. Processing packets in order doesn't seem to be
required in general, or is it?

>> (it's probably not a complete list, I am not good at imagining all
>> possible cases)
>
> used ring must be flushed before disconnect (so all entries
> before used->idx have been processed, and no entries
> after used->idx have been processed).
> If enabled, dirty log must be flushed before disconnect.

Ok that sounds like a good requirement for "non-explicit graceful
shutdown" (how would you name it?) and seem to solve the processing
order problem. Yet, at this point, I think it would be easy for the
backend to do a proper shutdown request with all the benefits I
listed.


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 05/24] vhost: change some assert() for error_report() or silent fail

Reply via email to