On 07/18/2016 05:22 PM, Vladimir Sementsov-Ogievskiy wrote:
Hi all!
This is a variant of an existing test case which produces a test failure.
It looks like the reason is: one block job (the successful one) is in
backup_complete, in a synchronous bdrv_flush, while the other (the job
with the injected I/O error) tries to synchronously cancel the
successful job. It looks like some kind of deadlock.
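Aside, to spell out where that synchronous cancel comes from: if I read
124 right, test_transaction_failure starts both incremental backups from
a single QMP 'transaction' with grouped completion mode, so when the job
with the injected error fails, the transaction machinery has to
synchronously cancel its sibling. A rough sketch of the shape of that
transaction; device, target and bitmap names here are placeholders, not
the literal code from the test:

    actions = [
        {'type': 'drive-backup',
         'data': {'device': 'drive0', 'target': 'target0.img',
                  'sync': 'incremental', 'bitmap': 'bitmap0',
                  'format': 'qcow2', 'mode': 'existing'}},
        {'type': 'drive-backup',
         'data': {'device': 'drive1',  # the drive set up with error injection
                  'target': 'target1.img',
                  'sync': 'incremental', 'bitmap': 'bitmap1',
                  'format': 'qcow2', 'mode': 'existing'}},
    ]
    result = vm.qmp('transaction', actions=actions,
                    properties={'completion-mode': 'grouped'})

With 'grouped' completion, one job failing means the others must be
cancelled, and that cancel is what appears to collide with the
successful job's completion in the output below.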
-..........
+........EEE
+======================================================================
+ERROR: test_transaction_failure (__main__.TestIncrementalBackup)
+Test: Verify backups made from a transaction that partially fails.
+----------------------------------------------------------------------
+Traceback (most recent call last):
+ File "124", line 478, in test_transaction_failure
+ self.wait_qmp_backup_cancelled(drive0['id'])
+ File "124", line 173, in wait_qmp_backup_cancelled
+ match={'data': {'device': device}})
+ File "/work/src/qemu/tests/qemu-iotests/iotests.py", line 308, in event_wait
+ event = self._qmp.pull_event(wait=timeout)
+ File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 194,
in pull_event
+ self.__get_events(wait)
+ File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 109,
in __get_events
+ raise QMPTimeoutError("Timeout waiting for event")
+QMPTimeoutError: Timeout waiting for event
+
+======================================================================
+ERROR: test_transaction_failure (__main__.TestIncrementalBackup)
+Test: Verify backups made from a transaction that partially fails.
+----------------------------------------------------------------------
+Traceback (most recent call last):
+ File "124", line 272, in tearDown
+ self.vm.shutdown()
+ File "/work/src/qemu/tests/qemu-iotests/iotests.py", line 260, in shutdown
+ self._qmp.cmd('quit')
+ File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 172,
in cmd
+ return self.cmd_obj(qmp_cmd)
+ File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 157,
in cmd_obj
+ return self.__json_read()
+ File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 66,
in __json_read
+ data = self.__sockfile.readline()
+ File "/usr/lib64/python2.7/socket.py", line 447, in readline
+ data = self._sock.recv(self._rbufsize)
+timeout: timed out
+
+======================================================================
+ERROR: test_incremental_failure (__main__.TestIncrementalBackupBlkdebug)
+Test: Verify backups made after a failure are correct.
+----------------------------------------------------------------------
+Traceback (most recent call last):
+ File "124", line 549, in setUp
+ self.vm.launch()
+ File "/work/src/qemu/tests/qemu-iotests/iotests.py", line 246, in launch
+ self._qmp = qmp.QEMUMonitorProtocol(self._monitor_path, server=True)
+ File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 44,
in __init__
+ self.__sock.bind(self.__address)
+ File "/usr/lib64/python2.7/socket.py", line 224, in meth
+ return getattr(self._sock,name)(*args)
+error: [Errno 98] Address already in use
+
I guess this is a residual failure based on our inability to clean
up properly after the real failure.
----------------------------------------------------------------------
Ran 10 tests
-OK
+FAILED (errors=3)
Failures: 124
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
tests/qemu-iotests/124 | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index de7cdbe..74de117 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -448,9 +448,9 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
         self.assertFalse(self.vm.get_qmp_events(wait=False))
 
         # Emulate some writes
-        self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
-                                          ('0xfe', '16M', '256k'),
-                                          ('0x64', '32736k', '64k')))
+        #self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
+        #                                  ('0xfe', '16M', '256k'),
+        #                                  ('0x64', '32736k', '64k')))
         self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
                                           ('0xef', '16M', '256k'),
                                           ('0x46', '32736k', '64k')))
Python diffs are awful.
For the sake of the list: You modified the test_transaction_failure test
case, which tests an injected failure on one of two drives, both with
writes that need to be committed.
You induce the failure by removing the pending writes from drive0 such
that drive0 has nothing it needs to commit at all.
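For anyone skimming, hmp_io_writes is just the test's thin wrapper that
pushes qemu-io writes through the human monitor so that each drive's
dirty bitmap has something for the incremental backup to copy. A
paraphrase rather than the helper's exact body (in 124 it is a method on
the test class; written here as a free function over the VM):

    def hmp_io_writes(vm, drive, patterns):
        # Dirty the named drive with a few qemu-io writes so its bitmap
        # records work for the next incremental backup, then flush.
        for pattern, offset, size in patterns:
            vm.hmp_qemu_io(drive, 'write -P%s %s %s' % (pattern, offset, size))
        vm.hmp_qemu_io(drive, 'flush')

With the drive0 writes commented out, drive0's bitmap stays clean, so
its backup job has essentially nothing to copy and can complete almost
immediately.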
We then time out waiting for notification that drive0's job was
canceled... Hm, perhaps this is a race condition where drive0 finishes
too quickly to cancel?
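If that hypothesis is right, one cheap way to check it (purely a
diagnostic sketch on my part, not something that exists in 124 or
iotests.py) would be to accept either terminal event for drive0 and
report which one actually arrives, instead of insisting on
BLOCK_JOB_CANCELLED:

    def wait_backup_end(vm, device):
        # Diagnostic only: block for QMP events and return whichever of
        # BLOCK_JOB_COMPLETED / BLOCK_JOB_CANCELLED arrives first for
        # 'device'.  If neither arrives (a real deadlock), this hangs
        # just like the original wait.  Other events are discarded.
        while True:
            for event in vm.get_qmp_events(wait=True):
                if (event['event'] in ('BLOCK_JOB_COMPLETED',
                                       'BLOCK_JOB_CANCELLED') and
                        event['data'].get('device') == device):
                    return event['event']

If that reports BLOCK_JOB_COMPLETED for drive0, we are looking at a
completion-vs-cancel race; if it simply hangs, that points back at the
flush/cancel deadlock described at the top of this message.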