On 09.03.23 14:39, Richard W.M. Jones wrote:
[ Patch series also available here, along with this cover letter and the
script used to generate test results:
https://gitlab.com/rwmjones/qemu/-/commits/2023-nbd-multi-conn-v1 ]
This patch series adds multi-conn support to the NBD block driver in
qemu. It is only meant for discussion and testing because it has a
number of obvious shortcomings (see "XXX" in commit messages and
code). If we decided this was a good idea, we can work on a better
patch.
I looked through the results and the code, and I think that's of course a good
idea!
We still need smarter integration with reconnect logic.
At least, we shouldn't make several open_timer instances..
Currently, on open() we have open-timeout. That's just a limit for the whole
nbd_open() - we can do several connection attempts during this time.
Seems we should proceed with success, if we succeeded with at least one
connection. Postponing additional connections to be established after open()
seems good too[*].
Next, we have reconnect-delay. When connection is lost nbd-client tries to
reconnect with no limit in attempts, but after reconnect-delay seconds of
reconnection all in-flight requests that are waiting for connection are just
failed.
When we have several connections, and one is broken, I think we shouldn't wait,
but instead retry the requests on other working connections. This way we don't
need several reconnect_delay_timer objects: we need only one, when all
connections are lost.
Reestablishing additional connections better to do in background, not blocking
in-flight requests. And that's the same as postponing additional connections
after open() should work ([*]).
--
Best regards,
Vladimir