On 29.03.21 17:05, Stefan Hajnoczi wrote:
Hi Kevin,
Peter hit an s390 qemu-iotests failure. I was able to reproduce it
easily. I haven't checked if it reproduce on other platforms too, but it
seems likely. Here is what I found.

qsd-jobs has race conditions:

   # Just make sure that this doesn't crash
   $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
       --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
       --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF | _filter_qmp
   {"execute":"qmp_capabilities"}
   {"execute": "block-commit", "arguments": {"device": "fmt0", "job-id": 
"job0"}}
   {"execute": "quit"}
   EOF

The intent is for "quit" to run while job0 still exists. This proves
that the program does not abort with an assertion failure when "quit" is
called while there are block jobs. bdrv_close_all() expects there to be
no jobs.

Here is the failure that reproduces 1 out of 3 times:

   # build/tests/qemu-iotests/check -qcow2 qsd-jobs
   QEMU          -- 
"/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64" -nodefaults 
-display none -accel qtest
   QEMU_IMG      -- "/root/qemu/build/tests/qemu-iotests/../../qemu-img"
   QEMU_IO       -- "/root/qemu/build/tests/qemu-iotests/../../qemu-io" --cache 
writeback --aio threads -f qcow2
   QEMU_NBD      -- "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd"
   IMGFMT        -- qcow2
   IMGPROTO      -- file
   PLATFORM      -- Linux/s390x
   TEST_DIR      -- /root/qemu/scratch
   SOCK_DIR      -- /tmp/tmpapj5dydm
   SOCKET_SCM_HELPER -- /root/qemu/build/tests/qemu-iotests/socket_scm_helper

   qsd-jobs   fail       [10:42:23] [10:42:23]   0.1s   (last: 0.1s)  output 
mismatch (see qsd-jobs.out.bad)
   --- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out
   +++ qsd-jobs.out.bad
   @@ -9,11 +9,11 @@
    {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "created", "id": "job0"}}
    {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "running", "id": "job0"}}
    {"return": {}}
   +{"return": {}}
   +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "paused", "id": "job0"}}
   +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "running", "id": "job0"}}
    {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "ready", "id": "job0"}}
    {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", 
"len": 0, "offset": 0, "speed": 0, "type": "commit"}}
   -{"return": {}}
   -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "standby", "id": "job0"}}
   -{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "ready", "id": "job0"}}
    {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "waiting", "id": "job0"}}
    {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": 
{"status": "pending", "id": "job0"}}
    {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", 
"len": 0, "offset": 0, "speed": 0, "type": "commit"}}
   Failures: qsd-jobs
   Failed 1 of 1 iotests

The timing of the QMP events seems to be non-deterministic relative to
the QMP command completions.

It's more complicated than simply lines moving up/down in the expected
output: the job state transitions are slightly different (running ->
ready vs running -> paused).

Should the test case filter out all JOB_* and BLOCK_JOB_* events?

Is there a way to make this test deterministic?

Would writing data to the overlay and setting a very restrictive speed help, so it doesn’t immediately go into the 'ready' state? (See attachment.)

Otherwise, I think we could take my “Allow using the QSD” patch (https://lists.nongnu.org/archive/html/qemu-block/2021-03/msg00654.html) to rewrite the patch to use common.qemu to test QSD. With that we could issue the block-commit command, wait for BLOCK_JOB_READY, and only then issue 'quit'.

Max
>From e77dc43cae17883cefb5766e6932fde359806dbd Mon Sep 17 00:00:00 2001
From: Max Reitz <mre...@redhat.com>
Date: Mon, 29 Mar 2021 17:35:08 +0200
Subject: [PATCH] iotests/qsd-jobs: Make output deterministic

By making the commit job perform work and limiting its speed, we can
prevent it from reaching the 'ready' state before the 'quit' command is
accepted by qemu.  Instead, it will only change its state once the
'quit' command is received (to 'aborting').

Reported-by: Peter Maydell <peter.mayd...@linaro.org>
Signed-off-by: Max Reitz <mre...@redhat.com>
---
 tests/qemu-iotests/tests/qsd-jobs     | 11 ++++++++++-
 tests/qemu-iotests/tests/qsd-jobs.out | 14 +++++++-------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/tests/qsd-jobs b/tests/qemu-iotests/tests/qsd-jobs
index 972b6b3898..4eb0b9b2e2 100755
--- a/tests/qemu-iotests/tests/qsd-jobs
+++ b/tests/qemu-iotests/tests/qsd-jobs
@@ -51,12 +51,21 @@ echo
 echo "=== Job still present at shutdown ==="
 echo
 
+# This amount of data will be committed instantaneously, but the
+# commit job will still wait accordingly to its @speed parameter
+# before announcing itself to be ready.  (We do not want the job
+# to be ready before quitting qemu, so we get deterministic
+# output.)
+$QEMU_IO -c 'write -P 42 0 64k' "$TEST_IMG" | _filter_qemu_io
+echo
+
 # Just make sure that this doesn't crash
 $QSD --chardev stdio,id=stdio --monitor chardev=stdio \
     --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \
     --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF | _filter_qmp
 {"execute":"qmp_capabilities"}
-{"execute": "block-commit", "arguments": {"device": "fmt0", "job-id": "job0"}}
+{"execute": "block-commit",
+ "arguments": {"device": "fmt0", "job-id": "job0", "speed": 1}}
 {"execute": "quit"}
 EOF
 
diff --git a/tests/qemu-iotests/tests/qsd-jobs.out b/tests/qemu-iotests/tests/qsd-jobs.out
index 05e1165e80..d8fefb2363 100644
--- a/tests/qemu-iotests/tests/qsd-jobs.out
+++ b/tests/qemu-iotests/tests/qsd-jobs.out
@@ -4,19 +4,19 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/
 
 === Job still present at shutdown ===
 
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 QMP_VERSION
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job0"}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "job0"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "job0", "len": 65536, "offset": 65536, "speed": 1, "type": "commit"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job0"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job0"}}
 
-- 
2.29.2

Reply via email to