Hi Kevin, Peter hit an s390 qemu-iotests failure. I was able to reproduce it easily. I haven't checked if it reproduce on other platforms too, but it seems likely. Here is what I found.
qsd-jobs has race conditions: # Just make sure that this doesn't crash $QSD --chardev stdio,id=stdio --monitor chardev=stdio \ --blockdev node-name=file0,driver=file,filename="$TEST_IMG" \ --blockdev node-name=fmt0,driver=qcow2,file=file0 <<EOF | _filter_qmp {"execute":"qmp_capabilities"} {"execute": "block-commit", "arguments": {"device": "fmt0", "job-id": "job0"}} {"execute": "quit"} EOF The intent is for "quit" to run while job0 still exists. This proves that the program does not abort with an assertion failure when "quit" is called while there are block jobs. bdrv_close_all() expects there to be no jobs. Here is the failure that reproduces 1 out of 3 times: # build/tests/qemu-iotests/check -qcow2 qsd-jobs QEMU -- "/root/qemu/build/tests/qemu-iotests/../../qemu-system-x86_64" -nodefaults -display none -accel qtest QEMU_IMG -- "/root/qemu/build/tests/qemu-iotests/../../qemu-img" QEMU_IO -- "/root/qemu/build/tests/qemu-iotests/../../qemu-io" --cache writeback --aio threads -f qcow2 QEMU_NBD -- "/root/qemu/build/tests/qemu-iotests/../../qemu-nbd" IMGFMT -- qcow2 IMGPROTO -- file PLATFORM -- Linux/s390x TEST_DIR -- /root/qemu/scratch SOCK_DIR -- /tmp/tmpapj5dydm SOCKET_SCM_HELPER -- /root/qemu/build/tests/qemu-iotests/socket_scm_helper qsd-jobs fail [10:42:23] [10:42:23] 0.1s (last: 0.1s) output mismatch (see qsd-jobs.out.bad) --- /root/qemu/tests/qemu-iotests/tests/qsd-jobs.out +++ qsd-jobs.out.bad @@ -9,11 +9,11 @@ {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}} {"return": {}} +{"return": {}} +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "job0"}} +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}} -{"return": {}} -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "standby", "id": "job0"}} -{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job0"}} {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 0, "type": "commit"}} Failures: qsd-jobs Failed 1 of 1 iotests The timing of the QMP events seems to be non-deterministic relative to the QMP command completions. It's more complicated than simply lines moving up/down in the expected output: the job state transitions are slightly different (running -> ready vs running -> paused). Should the test case filter out all JOB_* and BLOCK_JOB_* events? Is there a way to make this test deterministic? Stefan
signature.asc
Description: PGP signature