On 07.05.19 22:38, Eric Blake wrote: > On 5/7/19 1:36 PM, Max Reitz wrote: >> common.nbd's nbd_server_set_tcp_port() tries to find a free port, and >> then uses it for the whole test run. However, this is racey because > > racy > >> even if the port was free at the beginning, there is no guarantee it >> will continue to be available. Therefore, 233 currently cannot reliably >> be run concurrently with other NBD TCP tests. >> >> This patch addresses the problem by dropping nbd_server_set_tcp_port(), >> and instead finding a new port every time nbd_server_start_tcp_socket() >> is invoked. For this, we run qemu-nbd with --fork and on error evaluate >> the output to see whether it contains "Address already in use". If so, >> we try the next port. >> >> On success, we still want to continually redirect the output from >> qemu-nbd to stderr. To achieve both, we redirect qemu-nbd's stderr to a >> FIFO that we then open in bash. If the parent process exits with status >> 0 (which means that the server has started successfully), we launch a >> background cat process that copies the FIFO to stderr. On failure, we >> read the whole content into a variable and then evaluate it. >> >> While at it, use --fork in nbd_server_start_unix_socket(), too. Doing >> so allows us to drop nbd_server_wait_for_*_socket(). >> >> Note that the reason common.nbd did not use --fork before is that >> qemu-nbd did not have --pid-file. >> >> Signed-off-by: Max Reitz <mre...@redhat.com> >> --- >> tests/qemu-iotests/233 | 1 - >> tests/qemu-iotests/common.nbd | 93 ++++++++++++++++------------------- >> 2 files changed, 42 insertions(+), 52 deletions(-) >> > >> @@ -34,76 +39,62 @@ nbd_server_stop() >> fi >> fi >> rm -f "$nbd_unix_socket" >> -} >> - >> -nbd_server_wait_for_unix_socket() >> -{ > ... >> - echo "Failed in check of unix socket created by qemu-nbd" >> - exit 1 >> + rm -f "$nbd_stderr_fifo" > > You could use a single 'rm -f "$nbd_unix_socket" "$nbd_stderr_fifo"'. > That's cosmetic, though. > > Are we sure that even on failure, our fifo will not fill up and cause > deadlock? If the failing qemu-nbd has so much output as to be non-atomic > so that it blocks waiting for a reader, but we don't read anything until > after qemu-nbd exits after forking the daemon, then we have deadlock.
Hm, right. I don’t think it will happen, but if it does, it won’t be because of an “Address already in use”. So if it did happen, the test should fail anyway. Of course, a hang is not the nicest way to fail a test, but I think as long as we don’t think it will be a problem, it should be fine. (The alternative I can think of would be to start a background cat that copies data over to a log file, and then kill it after the qemu-nbd parent process has exited. On error, we read the log; on success, we print it to stderr and then start the cat from nbd_stderr_fifo to stderr.) > But in the common case, I don't think qemu-nbd ever spits out that much > in errors, even when it fails to start whether due to a socket in use or > for other reasons. And even if it does hang, it is our testsuite (and > our CI tools will probably notice it), rather than our main code. > > Otherwise, it's a lot of shell code with quite a few bash-isms, but we > already require bash, and I didn't spot anything blatantly wrong. > > Reviewed-by: Eric Blake <ebl...@redhat.com> Thanks again! I’ll prepare the v2. Max
signature.asc
Description: OpenPGP digital signature