Re: [Qemu-devel] [PATCH v2] iotests: Do not rely on unavailable domains in 162

Sascha Silbe Thu, 25 Aug 2016 08:47:52 -0700

Dear Max,

Max Reitz <mre...@redhat.com> writes:


[tests/qemu-iotests/162]
[...]
> +while true; do
> +    port=$((RANDOM + 32768))
> +    $QEMU_NBD -p $port -f raw null-co:// &> /dev/null &
> +    nbd_pid=$!
> +    sleep 0.5
> +
> +    # Check whether the process is still alive
> +    # (which is the case if the server has been created successfully)
> +    if kill -0 $nbd_pid &> /dev/null; then
> +        break
> +    fi
> +done

Apart from being inherently racy, the chosen sleep time of 0.5s is also
rather short in practice when running tests on a busy or slow host. If
qemu-nbd has not finished starting up by the time the sleep ends, the
kill -0 check still passes, so 162 believes start-up worked and will
still run into the original problem.

Traditionally daemons fork and wait for their child to finish
initialisation. Only once the child succeeded (or failed) will they exit
themselves. That solves exactly the race condition above.
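
To illustrate the handshake described above, here is a minimal shell
sketch. It is not how qemu-nbd behaves today and not proposed test code.
launch_daemon, its two command arguments and the FIFO are hypothetical
names chosen for the example. It assumes mktemp -d and mkfifo are
available.

```sh
# launch_daemon INIT_CMD MAIN_CMD   (hypothetical helper, illustration only)
# Starts a background child that runs INIT_CMD, reports the outcome through
# a FIFO and, on success, carries on with MAIN_CMD. The caller returns only
# after the child has reported, so there is no window in which it has to
# guess whether initialisation has finished.
launch_daemon() {
    ld_init=$1
    ld_main=$2
    ld_dir=$(mktemp -d) || return 2
    ld_fifo=$ld_dir/ready
    mkfifo "$ld_fifo" || { rm -rf "$ld_dir"; return 2; }

    (
        # Child: initialise first, then report, then do the real work.
        if sh -c "$ld_init"; then
            echo ok >"$ld_fifo"
            sh -c "$ld_main"
        else
            echo failed >"$ld_fifo"
        fi
    ) >/dev/null 2>&1 &

    # Parent: blocks until the child has written its status line.
    # (A child that dies before reporting would leave us blocked here;
    # a real implementation needs to handle that case.)
    read -r ld_status <"$ld_fifo"
    rm -rf "$ld_dir"
    [ "$ld_status" = ok ]
}
```

A caller would then simply do "launch_daemon ... || try the next port",
without any sleep.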

If we don't have, or don't want to introduce, a good way to discover
whether qemu-nbd start-up was successful, the test needs to wait for
one of two events: either qemu-nbd exiting (⇒ failed, try different
port) or qemu-nbd having successfully acquired the port (⇒ success,
continue with test). Unfortunately all the ways to discover whether a
specific process is listening on a port are non-portable (no pun
intended).
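
A rough sketch of such a wait loop, under several assumptions:
wait_for_listener is a hypothetical helper, the port probe uses bash's
/dev/tcp redirection (bash-only, and some builds disable it), and
"sleep 0.1" needs a sleep that accepts fractions. The probe only shows
that something accepts connections on the port, not that it is our
process, so a small race remains if another process holds the port and
ours has not exited yet.

```bash
# wait_for_listener PID PORT [TRIES]   (hypothetical helper)
# Returns 0 once 127.0.0.1:PORT accepts a TCP connection while PID is
# still alive, 1 if PID exited first, 2 if TRIES polls passed without
# either event.
wait_for_listener() {
    wfl_pid=$1
    wfl_port=$2
    wfl_tries=${3:-50}

    while [ "$wfl_tries" -gt 0 ]; do
        # Process gone: start-up failed, the caller should pick another port.
        if ! kill -0 "$wfl_pid" 2>/dev/null; then
            return 1
        fi
        # bash-only probe: opening /dev/tcp/HOST/PORT attempts a connect().
        # The subshell closes the descriptor again when it exits.
        if (exec 3<>"/dev/tcp/127.0.0.1/$wfl_port") 2>/dev/null; then
            return 0
        fi
        sleep 0.1
        wfl_tries=$((wfl_tries - 1))
    done
    return 2
}
```

In the quoted loop this would stand in for the fixed sleep and the
kill -0 check, e.g. "wait_for_listener $nbd_pid $port && break". One
caveat: as far as I know qemu-nbd exits once its last client
disconnects unless started with -t (--persistent), so a connect-based
probe would need that option or a different kind of probe.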

Sorry to be a bother, but qemu-iotests failing intermittently on slow
hosts is really a PITA. It's hard to tell the difference between the
test randomly failing because the test is racy (→ false positive, not a
bug) and the test randomly failing because the implementation is racy (→
true positive, bug).

Sascha
-- 
Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
https://se-silbe.de/
USt-IdNr. DE281696641
