Dear Max,

Max Reitz <mre...@redhat.com> writes:
[tests/qemu-iotests/162]
[...]
> +while true; do
> +    port=$((RANDOM + 32768))
> +    $QEMU_NBD -p $port -f raw null-co:// &> /dev/null &
> +    nbd_pid=$!
> +    sleep 0.5
> +
> +    # Check whether the process is still alive
> +    # (which is the case if the server has been created successfully)
> +    if kill -0 $nbd_pid &> /dev/null; then
> +        break
> +    fi
> +done

Apart from being inherently racy, the 0.5s sleep is also rather short in
practice when running tests on a busy or slow host. If qemu-nbd has not
finished start-up by then, the process is still alive, so 162 believes
start-up worked and will still run into the original problem.

Traditionally, daemons fork and the parent waits for the child to finish
initialisation. Only once the child has succeeded (or failed) does the
parent exit. That solves exactly the race condition above.

If we don't have, and don't want to introduce, a good way to discover
whether qemu-nbd start-up was successful, we need to check and wait for
either qemu-nbd exiting (⇒ failed, try a different port) or qemu-nbd
having successfully acquired the port (⇒ success, continue with the
test). Unfortunately, all the ways to discover whether a specific process
is listening on a port are non-portable (no pun intended).

Sorry to be a bother, but qemu-iotests failing intermittently on slow
hosts is really a PITA. It's hard to tell the difference between the test
randomly failing because the test is racy (→ false positive, not a bug)
and the test randomly failing because the implementation is racy
(→ true positive, bug).

Sascha
-- 
Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
https://se-silbe.de/
USt-IdNr. DE281696641
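A rough sketch of the "wait for either exit or listening socket" approach, assuming bash on Linux with iproute2's ss(8), i.e. one of the non-portable ways mentioned above. The helper name `wait_for_listen_or_exit` and its return codes are hypothetical, not existing qemu-iotests API. It also assumes `$!` is the PID of qemu-nbd itself, which would not hold if `$QEMU_NBD` expands to a wrapper. It polls every 0.1s instead of sleeping once, so a slow host only makes it wait longer rather than misjudge start-up.

```bash
# Hypothetical helper (not existing qemu-iotests API).
# Wait until either process $1 has exited (start-up failed) or it owns a
# listening TCP socket on port $2 (start-up succeeded).
# Assumes Linux with iproute2's ss(8); this is deliberately non-portable.
# Returns 0 on success, 1 if the process exited, 2 if $3 seconds
# (default 30) pass without either happening.
wait_for_listen_or_exit()
{
    local pid=$1 port=$2 timeout=${3:-30}
    local tries=$((timeout * 10))

    while [ "$tries" -gt 0 ]; do
        # Process gone: start-up failed (e.g. port already in use).
        if ! kill -0 "$pid" 2>/dev/null; then
            return 1
        fi
        # Process alive and listening on $port: start-up succeeded.
        # ss -p prints socket owners as users:(("name",pid=N,fd=M)).
        if ss -ltnp "sport = :$port" 2>/dev/null | grep -q "pid=$pid,"; then
            return 0
        fi
        sleep 0.1
        tries=$((tries - 1))
    done
    return 2
}
```

In 162, `sleep 0.5` and the `kill -0` check would then be replaced by a call such as `wait_for_listen_or_exit $nbd_pid $port`: break out of the loop on 0, try another port on 1, and kill qemu-nbd and fail the test on 2. The sketch inspects the socket table rather than connecting to the port, because qemu-nbd without `--persistent` exits once its client disconnects, so a test connection would itself shut the server down.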