nbd: add cmdline and qapi parameter reconnect-delay

Vladimir Sementsov-Ogievskiy Tue, 05 Feb 2019 08:58:09 -0800

05.01.2019 1:25, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Reconnect will be implemented in the following commit, so for now,
>> in semantics below, disconnect itself is a "serious error".
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>> ---
>>   qapi/block-core.json | 12 +++++++++++-
>>   block/nbd-client.h   |  1 +
>>   block/nbd-client.c   |  1 +
>>   block/nbd.c          | 16 +++++++++++++++-
>>   4 files changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 5b9084a394..cf03402ec5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3511,13 +3511,23 @@
>>   #                  traditional "base:allocation" block status (see
>>   #                  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 
>> 3.0)
>>   #
>> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to 
>> connect
> 
> Maybe 'On unexpected disconnect', since intentional disconnect is not
> unexpected.
> 
>> +#                   again, until success or serious error. During first
>> +#                   @reconnect-delay seconds of reconnecting loop all 
>> requests
>> +#                   are paused and have a chance to rerun, if successful
>> +#                   connect occures during this time. After @reconnect-delay
> 
> occurs
> 
>> +#                   seconds all delayed requests are failed and all 
>> following
>> +#                   requests will be failed to (until successfull 
>> reconnect).
> 
> successful
> 
>> +#                   Default 300 seconds (Since 3.1)
> 
> My delay in reviewing means this now has to be 4.0.
> 
> I'm guessing that a delay of 0 means disable auto-reconnect.  From a
> backwards-compatibility standpoint, no auto-reconnect is more in line
> with what we previously had - but from a usability standpoint, trying to
> reconnect can avoid turning transient network hiccups into permanent
> loss of a device to EIO errors, especially if the retry timeout is long
> enough to allow an administrator to reroute the network to an
> alternative server.  So I'm probably okay with the default being
> non-zero - but it DOES mean that where you used to get instant EIO
> failures when a network connection was severed, you now have to wait for
> the reconnect delay to expire, and 5 minutes can be a long wait.  Since
> the long delay is guest-observable, can we run into issues where a guest
> that is currently used to instant EIO and total loss of the device could
> instead get confused by not getting any response for up to 5 minutes,
> whether or not that response eventually turns out to be EIO or a
> successful recovery?


Hmm, do you have in mind such scenarios? Who could know?

If we unsure, I'm OK to proceed with no-reconnect behavior by default. We can
change the default in a separate patch later, if needed.

> 
>> +++ b/block/nbd.c
>> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>>               .help = "experimental: expose named dirty bitmap in place of "
>>                       "block status",
>>           },
>> +        {
>> +            .name = "reconnect-delay",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "Reconnect delay. On disconnect, nbd client tries to"
>> +                    "connect again, until success or serious error. During"
>> +                    "first @reconnect-delay seconds of reconnecting loop 
>> all"
>> +                    "requests are paused and have a chance to rerun, if"
>> +                    "successful connect occures during this time. After"
>> +                    "@reconnect-delay seconds all delayed requests are 
>> failed"
>> +                    "and all following requests will be failed to (until"
>> +                    "successfull reconnect). Default 300 seconds",
> 
> Same typos as in qapi.
> 
> The UI aspects look fine, now I need to review the patch series for code
> issues :)
> 
> 


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay

Reply via email to