05.01.2019 1:25, Eric Blake wrote: > On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote: >> Reconnect will be implemented in the following commit, so for now, >> in semantics below, disconnect itself is a "serious error". >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> >> --- >> qapi/block-core.json | 12 +++++++++++- >> block/nbd-client.h | 1 + >> block/nbd-client.c | 1 + >> block/nbd.c | 16 +++++++++++++++- >> 4 files changed, 28 insertions(+), 2 deletions(-) >> >> diff --git a/qapi/block-core.json b/qapi/block-core.json >> index 5b9084a394..cf03402ec5 100644 >> --- a/qapi/block-core.json >> +++ b/qapi/block-core.json >> @@ -3511,13 +3511,23 @@ >> # traditional "base:allocation" block status (see >> # NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since >> 3.0) >> # >> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to >> connect > > Maybe 'On unexpected disconnect', since intentional disconnect is not > unexpected. > >> +# again, until success or serious error. During first >> +# @reconnect-delay seconds of reconnecting loop all >> requests >> +# are paused and have a chance to rerun, if successful >> +# connect occures during this time. After @reconnect-delay > > occurs > >> +# seconds all delayed requests are failed and all >> following >> +# requests will be failed to (until successfull >> reconnect). > > successful > >> +# Default 300 seconds (Since 3.1) > > My delay in reviewing means this now has to be 4.0. > > I'm guessing that a delay of 0 means disable auto-reconnect. From a > backwards-compatibility standpoint, no auto-reconnect is more in line > with what we previously had - but from a usability standpoint, trying to > reconnect can avoid turning transient network hiccups into permanent > loss of a device to EIO errors, especially if the retry timeout is long > enough to allow an administrator to reroute the network to an > alternative server. So I'm probably okay with the default being > non-zero - but it DOES mean that where you used to get instant EIO > failures when a network connection was severed, you now have to wait for > the reconnect delay to expire, and 5 minutes can be a long wait. Since > the long delay is guest-observable, can we run into issues where a guest > that is currently used to instant EIO and total loss of the device could > instead get confused by not getting any response for up to 5 minutes, > whether or not that response eventually turns out to be EIO or a > successful recovery?
Hmm, do you have in mind such scenarios? Who could know? If we unsure, I'm OK to proceed with no-reconnect behavior by default. We can change the default in a separate patch later, if needed. > >> +++ b/block/nbd.c >> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = { >> .help = "experimental: expose named dirty bitmap in place of " >> "block status", >> }, >> + { >> + .name = "reconnect-delay", >> + .type = QEMU_OPT_NUMBER, >> + .help = "Reconnect delay. On disconnect, nbd client tries to" >> + "connect again, until success or serious error. During" >> + "first @reconnect-delay seconds of reconnecting loop >> all" >> + "requests are paused and have a chance to rerun, if" >> + "successful connect occures during this time. After" >> + "@reconnect-delay seconds all delayed requests are >> failed" >> + "and all following requests will be failed to (until" >> + "successfull reconnect). Default 300 seconds", > > Same typos as in qapi. > > The UI aspects look fine, now I need to review the patch series for code > issues :) > > -- Best regards, Vladimir