Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay

2019-04-11 Thread Vladimir Sementsov-Ogievskiy
05.01.2019 1:25, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Reconnect will be implemented in the following commit, so for now,
>> in semantics below, disconnect itself is a "serious error".
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   qapi/block-core.json | 12 +++-
>>   block/nbd-client.h   |  1 +
>>   block/nbd-client.c   |  1 +
>>   block/nbd.c  | 16 +++-
>>   4 files changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 5b9084a394..cf03402ec5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3511,13 +3511,23 @@
>>   #  traditional "base:allocation" block status (see
>>   #  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 
>> 3.0)
>>   #
>> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to 
>> connect
> 
> Maybe 'On unexpected disconnect', since intentional disconnect is not
> unexpected.
> 
>> +#   again, until success or serious error. During first
>> +#   @reconnect-delay seconds of reconnecting loop all 
>> requests
>> +#   are paused and have a chance to rerun, if successful
>> +#   connect occures during this time. After @reconnect-delay
> 
> occurs
> 
>> +#   seconds all delayed requests are failed and all 
>> following
>> +#   requests will be failed to (until successfull 
>> reconnect).
> 
> successful
> 
>> +#   Default 300 seconds (Since 3.1)
> 
> My delay in reviewing means this now has to be 4.0.
> 
> I'm guessing that a delay of 0 means disable auto-reconnect.

I doubt, as reconnect-delay=0 is a valid case, when all request are failed
(pre-patch behavior), but we still try to reconnect in background. Any reason
not to try? If any, I suggest one of the following:

1. treat absence of the option as no-reconnect-at-all. So absence and 0 would be
not the same

2. use -1 value as no-reconnect-at-all

3. additional bool option "reconnect"

>  From a
> backwards-compatibility standpoint, no auto-reconnect is more in line
> with what we previously had - but from a usability standpoint, trying to
> reconnect can avoid turning transient network hiccups into permanent
> loss of a device to EIO errors, especially if the retry timeout is long
> enough to allow an administrator to reroute the network to an
> alternative server.  So I'm probably okay with the default being
> non-zero - but it DOES mean that where you used to get instant EIO
> failures when a network connection was severed, you now have to wait for
> the reconnect delay to expire, and 5 minutes can be a long wait.  Since
> the long delay is guest-observable, can we run into issues where a guest
> that is currently used to instant EIO and total loss of the device could
> instead get confused by not getting any response for up to 5 minutes,
> whether or not that response eventually turns out to be EIO or a
> successful recovery?
> 
>> +++ b/block/nbd.c
>> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>>   .help = "experimental: expose named dirty bitmap in place of "
>>   "block status",
>>   },
>> +{
>> +.name = "reconnect-delay",
>> +.type = QEMU_OPT_NUMBER,
>> +.help = "Reconnect delay. On disconnect, nbd client tries to"
>> +"connect again, until success or serious error. During"
>> +"first @reconnect-delay seconds of reconnecting loop 
>> all"
>> +"requests are paused and have a chance to rerun, if"
>> +"successful connect occures during this time. After"
>> +"@reconnect-delay seconds all delayed requests are 
>> failed"
>> +"and all following requests will be failed to (until"
>> +"successfull reconnect). Default 300 seconds",
> 
> Same typos as in qapi.
> 
> The UI aspects look fine, now I need to review the patch series for code
> issues :)
> 
> 


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay

2019-02-05 Thread Vladimir Sementsov-Ogievskiy
05.01.2019 1:25, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Reconnect will be implemented in the following commit, so for now,
>> in semantics below, disconnect itself is a "serious error".
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   qapi/block-core.json | 12 +++-
>>   block/nbd-client.h   |  1 +
>>   block/nbd-client.c   |  1 +
>>   block/nbd.c  | 16 +++-
>>   4 files changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 5b9084a394..cf03402ec5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3511,13 +3511,23 @@
>>   #  traditional "base:allocation" block status (see
>>   #  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 
>> 3.0)
>>   #
>> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to 
>> connect
> 
> Maybe 'On unexpected disconnect', since intentional disconnect is not
> unexpected.
> 
>> +#   again, until success or serious error. During first
>> +#   @reconnect-delay seconds of reconnecting loop all 
>> requests
>> +#   are paused and have a chance to rerun, if successful
>> +#   connect occures during this time. After @reconnect-delay
> 
> occurs
> 
>> +#   seconds all delayed requests are failed and all 
>> following
>> +#   requests will be failed to (until successfull 
>> reconnect).
> 
> successful
> 
>> +#   Default 300 seconds (Since 3.1)
> 
> My delay in reviewing means this now has to be 4.0.
> 
> I'm guessing that a delay of 0 means disable auto-reconnect.  From a
> backwards-compatibility standpoint, no auto-reconnect is more in line
> with what we previously had - but from a usability standpoint, trying to
> reconnect can avoid turning transient network hiccups into permanent
> loss of a device to EIO errors, especially if the retry timeout is long
> enough to allow an administrator to reroute the network to an
> alternative server.  So I'm probably okay with the default being
> non-zero - but it DOES mean that where you used to get instant EIO
> failures when a network connection was severed, you now have to wait for
> the reconnect delay to expire, and 5 minutes can be a long wait.  Since
> the long delay is guest-observable, can we run into issues where a guest
> that is currently used to instant EIO and total loss of the device could
> instead get confused by not getting any response for up to 5 minutes,
> whether or not that response eventually turns out to be EIO or a
> successful recovery?

Hmm, do you have in mind such scenarios? Who could know?

If we unsure, I'm OK to proceed with no-reconnect behavior by default. We can
change the default in a separate patch later, if needed.

> 
>> +++ b/block/nbd.c
>> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>>   .help = "experimental: expose named dirty bitmap in place of "
>>   "block status",
>>   },
>> +{
>> +.name = "reconnect-delay",
>> +.type = QEMU_OPT_NUMBER,
>> +.help = "Reconnect delay. On disconnect, nbd client tries to"
>> +"connect again, until success or serious error. During"
>> +"first @reconnect-delay seconds of reconnecting loop 
>> all"
>> +"requests are paused and have a chance to rerun, if"
>> +"successful connect occures during this time. After"
>> +"@reconnect-delay seconds all delayed requests are 
>> failed"
>> +"and all following requests will be failed to (until"
>> +"successfull reconnect). Default 300 seconds",
> 
> Same typos as in qapi.
> 
> The UI aspects look fine, now I need to review the patch series for code
> issues :)
> 
> 


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay

2019-01-04 Thread Eric Blake
On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
> Reconnect will be implemented in the following commit, so for now,
> in semantics below, disconnect itself is a "serious error".
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  qapi/block-core.json | 12 +++-
>  block/nbd-client.h   |  1 +
>  block/nbd-client.c   |  1 +
>  block/nbd.c  | 16 +++-
>  4 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 5b9084a394..cf03402ec5 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -3511,13 +3511,23 @@
>  #  traditional "base:allocation" block status (see
>  #  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0)
>  #
> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to 
> connect

Maybe 'On unexpected disconnect', since intentional disconnect is not
unexpected.

> +#   again, until success or serious error. During first
> +#   @reconnect-delay seconds of reconnecting loop all 
> requests
> +#   are paused and have a chance to rerun, if successful
> +#   connect occures during this time. After @reconnect-delay

occurs

> +#   seconds all delayed requests are failed and all following
> +#   requests will be failed to (until successfull reconnect).

successful

> +#   Default 300 seconds (Since 3.1)

My delay in reviewing means this now has to be 4.0.

I'm guessing that a delay of 0 means disable auto-reconnect.  From a
backwards-compatibility standpoint, no auto-reconnect is more in line
with what we previously had - but from a usability standpoint, trying to
reconnect can avoid turning transient network hiccups into permanent
loss of a device to EIO errors, especially if the retry timeout is long
enough to allow an administrator to reroute the network to an
alternative server.  So I'm probably okay with the default being
non-zero - but it DOES mean that where you used to get instant EIO
failures when a network connection was severed, you now have to wait for
the reconnect delay to expire, and 5 minutes can be a long wait.  Since
the long delay is guest-observable, can we run into issues where a guest
that is currently used to instant EIO and total loss of the device could
instead get confused by not getting any response for up to 5 minutes,
whether or not that response eventually turns out to be EIO or a
successful recovery?

> +++ b/block/nbd.c
> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>  .help = "experimental: expose named dirty bitmap in place of "
>  "block status",
>  },
> +{
> +.name = "reconnect-delay",
> +.type = QEMU_OPT_NUMBER,
> +.help = "Reconnect delay. On disconnect, nbd client tries to"
> +"connect again, until success or serious error. During"
> +"first @reconnect-delay seconds of reconnecting loop all"
> +"requests are paused and have a chance to rerun, if"
> +"successful connect occures during this time. After"
> +"@reconnect-delay seconds all delayed requests are 
> failed"
> +"and all following requests will be failed to (until"
> +"successfull reconnect). Default 300 seconds",

Same typos as in qapi.

The UI aspects look fine, now I need to review the patch series for code
issues :)


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay

2018-07-31 Thread Vladimir Sementsov-Ogievskiy
Reconnect will be implemented in the following commit, so for now,
in semantics below, disconnect itself is a "serious error".

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json | 12 +++-
 block/nbd-client.h   |  1 +
 block/nbd-client.c   |  1 +
 block/nbd.c  | 16 +++-
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5b9084a394..cf03402ec5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3511,13 +3511,23 @@
 #  traditional "base:allocation" block status (see
 #  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0)
 #
+# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to connect
+#   again, until success or serious error. During first
+#   @reconnect-delay seconds of reconnecting loop all requests
+#   are paused and have a chance to rerun, if successful
+#   connect occures during this time. After @reconnect-delay
+#   seconds all delayed requests are failed and all following
+#   requests will be failed to (until successfull reconnect).
+#   Default 300 seconds (Since 3.1)
+#
 # Since: 2.9
 ##
 { 'struct': 'BlockdevOptionsNbd',
   'data': { 'server': 'SocketAddress',
 '*export': 'str',
 '*tls-creds': 'str',
-'*x-dirty-bitmap': 'str' } }
+'*x-dirty-bitmap': 'str',
+'*reconnect-delay': 'uint32' } }
 
 ##
 # @BlockdevOptionsRaw:
diff --git a/block/nbd-client.h b/block/nbd-client.h
index e7bda4d3c4..ef8a6a9239 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -53,6 +53,7 @@ int nbd_client_init(BlockDriverState *bs,
 QCryptoTLSCreds *tlscreds,
 const char *hostname,
 const char *x_dirty_bitmap,
+uint32_t reconnect_delay,
 Error **errp);
 void nbd_client_close(BlockDriverState *bs);
 
diff --git a/block/nbd-client.c b/block/nbd-client.c
index c184a9f0e9..41e6e6e702 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -1079,6 +1079,7 @@ int nbd_client_init(BlockDriverState *bs,
 QCryptoTLSCreds *tlscreds,
 const char *hostname,
 const char *x_dirty_bitmap,
+uint32_t reconnect_delay,
 Error **errp)
 {
 NBDClientSession *client = nbd_get_client_session(bs);
diff --git a/block/nbd.c b/block/nbd.c
index 9db5eded89..b40fb5a977 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
 .help = "experimental: expose named dirty bitmap in place of "
 "block status",
 },
+{
+.name = "reconnect-delay",
+.type = QEMU_OPT_NUMBER,
+.help = "Reconnect delay. On disconnect, nbd client tries to"
+"connect again, until success or serious error. During"
+"first @reconnect-delay seconds of reconnecting loop all"
+"requests are paused and have a chance to rerun, if"
+"successful connect occures during this time. After"
+"@reconnect-delay seconds all delayed requests are failed"
+"and all following requests will be failed to (until"
+"successfull reconnect). Default 300 seconds",
+},
 { /* end of list */ }
 },
 };
@@ -411,7 +423,9 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 
 /* NBD handshake */
 ret = nbd_client_init(bs, s->saddr, s->export, tlscreds, hostname,
-  qemu_opt_get(opts, "x-dirty-bitmap"), errp);
+  qemu_opt_get(opts, "x-dirty-bitmap"),
+  qemu_opt_get_number(opts, "reconnect-delay", 300),
+  errp);
 
  error:
 if (tlscreds) {
-- 
2.11.1