On 4/2/20 1:41 AM, Vladimir Sementsov-Ogievskiy wrote:
02.04.2020 1:38, Eric Blake wrote:
I was trying to test qemu's reconnect-delay parameter by using nbdkit
as a server that I could easily make disappear and resume. A bit of
experimenting shows that when nbdkit is abruptly killed (SIGKILL),
qemu detects EOF on the socket and manages to reconnect just fine; but
when nbdkit is gracefully killed (SIGTERM), it merely fails all
further guest requests with NBD_ESHUTDOWN until the client disconnects
first, and qemu was blindly failing the I/O request with ESHUTDOWN
from the server instead of attempting to reconnect.
Since most NBD server failures are unlikely to change by merely
retrying the same transaction, our decision to not start a retry loop
in the common case is correct. But NBD_ESHUTDOWN is rare enough, and
really is indicative of a transient situation, that it is worth
special-casing.
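To make the special-casing above concrete, here is a minimal sketch in C. It is not the code from the patch: the helper name nbd_error_warrants_reconnect() and its reconnect_enabled parameter are hypothetical, and only the numeric error values are taken from the NBD protocol document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Error values as listed in the NBD protocol document. */
enum {
    NBD_EPERM     = 1,
    NBD_EIO       = 5,
    NBD_ENOMEM    = 12,
    NBD_EINVAL    = 22,
    NBD_ENOSPC    = 28,
    NBD_EOVERFLOW = 75,
    NBD_ENOTSUP   = 95,
    NBD_ESHUTDOWN = 108,
};

/*
 * Hypothetical helper (not a qemu function): decide whether an error
 * reply should trigger a reconnect attempt rather than being passed
 * straight back to the guest.  Only NBD_ESHUTDOWN is treated as
 * transient; every other error is assumed to repeat on retry.
 */
static bool nbd_error_warrants_reconnect(uint32_t nbd_err,
                                         bool reconnect_enabled)
{
    return reconnect_enabled && nbd_err == NBD_ESHUTDOWN;
}
```

With reconnect disabled, the sketch keeps the pre-patch behavior of failing the request with the server's error.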
Interesting. I see that prior to this patch we don't handle ESHUTDOWN
at all in the NBD client.
What does the spec say?
> On a server shutdown, the server SHOULD wait for inflight requests to
> be serviced prior to initiating a hard disconnect. A server MAY speed
> this process up by issuing error replies. The error value issued in
> respect of these requests and any subsequently received requests SHOULD
> be NBD_ESHUTDOWN.
> If the client receives an NBD_ESHUTDOWN error it MUST initiate a soft
> disconnect.
Perhaps the spec should be relaxed to state that a client SHOULD
initiate soft disconnect (as there are existing clients that do not).
If a server knows it wants to initiate hard disconnect soon, it
shouldn't be forced to wait for a client to respond to NBD_ESHUTDOWN,
since not all clients do. Then again, it is indeed nicer if the client
does initiate soft disconnect (as soft is always cleaner than hard).
> The client MAY issue a soft disconnect at any time, but SHOULD wait
> until there are no inflight requests first.
> The client and the server MUST NOT initiate any form of disconnect
> other than in one of the above circumstances.
Hmm. So, actually we MUST initiate a soft disconnect, which means that
we must send NBD_CMD_DISC.
With this patch as-is, qemu as client initiates hard disconnect in
response to NBD_ESHUTDOWN (but only if it plans on trying to reconnect).
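For comparison, a soft disconnect would mean putting an NBD_CMD_DISC request on the wire before closing the socket. A minimal sketch of encoding that request, assuming the 28-byte simple request header from the NBD protocol document; nbd_encode_disc() and the put_be*() helpers are hypothetical names, not qemu functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_CMD_DISC      2u
#define NBD_REQUEST_SIZE  28u

/* Store integers in network (big-endian) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)(v & 0xffff));
}

static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)(v & 0xffffffffu));
}

/*
 * Hypothetical helper: encode an NBD_CMD_DISC request into buf, which
 * must hold at least NBD_REQUEST_SIZE bytes.  Returns the number of
 * bytes written.  The server sends no reply to this command.
 */
static size_t nbd_encode_disc(uint8_t *buf, uint64_t cookie)
{
    assert(buf != NULL);
    put_be32(buf, NBD_REQUEST_MAGIC);   /* request magic */
    put_be16(buf + 4, 0);               /* command flags: none */
    put_be16(buf + 6, NBD_CMD_DISC);    /* command type */
    put_be64(buf + 8, cookie);          /* handle/cookie chosen by client */
    put_be64(buf + 16, 0);              /* offset: unused for disconnect */
    put_be32(buf + 24, 0);              /* length: unused for disconnect */
    return NBD_REQUEST_SIZE;
}
```

The client would write these 28 bytes and then close its end of the connection. Whether it should first wait for inflight requests to finish is a separate decision.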
Then, what about "SHOULD wait until there are no inflight requests"?
We don't do that either. Should we?
qemu as server doesn't send NBD_ESHUTDOWN. It probably should (the way
nbdkit does), but that's orthogonal to qemu as client responding to
NBD_ESHUTDOWN.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org