On 07/18/2016 05:22 PM, Vladimir Sementsov-Ogievskiy wrote:
Hi all!

This is a variant of existing test case which produces test failure.

It looks like the reason is:

one block job is in backup_complete, in a synchronous bdrv_flush (the successful job);
the other job (the one with the injected I/O error) tries to synchronously cancel that "success job".
It looks like some kind of deadlock.
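The lock-up described above can be illustrated with a generic threading analogy (this is NOT QEMU code; the lock, event, and helper names below are invented stand-ins for the main-loop lock, the pending flush, and the two jobs):

```python
import threading
import time

# Stand-ins (hypothetical): loop_lock ~ the lock both completion paths need,
# flush_done ~ the synchronous flush finishing.
loop_lock = threading.Lock()
flush_done = threading.Event()
timed_out = []

def success_job():
    # "backup_complete": runs under the lock and waits synchronously for the
    # flush, which cannot be signalled until the other side gets the lock.
    with loop_lock:
        if not flush_done.wait(timeout=0.5):
            timed_out.append('success_job')

def failing_job():
    # The failed job's cancel path also needs the lock before it can let the
    # other job make progress -- but the lock is held above: neither advances.
    with loop_lock:
        flush_done.set()

t1 = threading.Thread(target=success_job)
t2 = threading.Thread(target=failing_job)
t1.start()
time.sleep(0.1)          # ensure success_job grabs the lock first
t2.start()
t1.join()
t2.join()
print(timed_out)         # ['success_job'] -- the flush only "finishes" by timing out
```

With a real synchronous wait (no timeout) this pair would hang forever, which matches the QMPTimeoutError and the hung 'quit' in the traces below.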

-..........
+........EEE
+======================================================================
+ERROR: test_transaction_failure (__main__.TestIncrementalBackup)
+Test: Verify backups made from a transaction that partially fails.
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "124", line 478, in test_transaction_failure
+    self.wait_qmp_backup_cancelled(drive0['id'])
+  File "124", line 173, in wait_qmp_backup_cancelled
+    match={'data': {'device': device}})
+  File "/work/src/qemu/tests/qemu-iotests/iotests.py", line 308, in event_wait
+    event = self._qmp.pull_event(wait=timeout)
+  File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 194, in pull_event
+    self.__get_events(wait)
+  File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 109, in __get_events
+    raise QMPTimeoutError("Timeout waiting for event")
+QMPTimeoutError: Timeout waiting for event
+
+======================================================================
+ERROR: test_transaction_failure (__main__.TestIncrementalBackup)
+Test: Verify backups made from a transaction that partially fails.
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "124", line 272, in tearDown
+    self.vm.shutdown()
+  File "/work/src/qemu/tests/qemu-iotests/iotests.py", line 260, in shutdown
+    self._qmp.cmd('quit')
+  File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 172, in cmd
+    return self.cmd_obj(qmp_cmd)
+  File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 157, in cmd_obj
+    return self.__json_read()
+  File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 66, in __json_read
+    data = self.__sockfile.readline()
+  File "/usr/lib64/python2.7/socket.py", line 447, in readline
+    data = self._sock.recv(self._rbufsize)
+timeout: timed out
+
+======================================================================
+ERROR: test_incremental_failure (__main__.TestIncrementalBackupBlkdebug)
+Test: Verify backups made after a failure are correct.
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "124", line 549, in setUp
+    self.vm.launch()
+  File "/work/src/qemu/tests/qemu-iotests/iotests.py", line 246, in launch
+    self._qmp = qmp.QEMUMonitorProtocol(self._monitor_path, server=True)
+  File "/work/src/qemu/tests/qemu-iotests/../../scripts/qmp/qmp.py", line 44, in __init__
+    self.__sock.bind(self.__address)
+  File "/usr/lib64/python2.7/socket.py", line 224, in meth
+    return getattr(self._sock,name)(*args)
+error: [Errno 98] Address already in use
+

I guess this is a residual failure caused by our inability to clean up properly after the real failure.

 ----------------------------------------------------------------------
 Ran 10 tests

-OK
+FAILED (errors=3)
Failures: 124

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
 tests/qemu-iotests/124 | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/124 b/tests/qemu-iotests/124
index de7cdbe..74de117 100644
--- a/tests/qemu-iotests/124
+++ b/tests/qemu-iotests/124
@@ -448,9 +448,9 @@ class TestIncrementalBackup(TestIncrementalBackupBase):
         self.assertFalse(self.vm.get_qmp_events(wait=False))

         # Emulate some writes
-        self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
-                                          ('0xfe', '16M', '256k'),
-                                          ('0x64', '32736k', '64k')))
+        #self.hmp_io_writes(drive0['id'], (('0xab', 0, 512),
+        #                                  ('0xfe', '16M', '256k'),
+        #                                  ('0x64', '32736k', '64k')))
         self.hmp_io_writes(drive1['id'], (('0xba', 0, 512),
                                           ('0xef', '16M', '256k'),
                                           ('0x46', '32736k', '64k')))


Python diffs are awful.

For the sake of the list: You modified the test_transaction_failure test case, which tests an injected failure on one of two drives, both with writes that need to be committed.

You induce the failure by removing the pending writes from drive0 such that drive0 has nothing it needs to commit at all.

We then time out waiting for the notification that drive0's job was cancelled... Hm, perhaps this is a race condition where drive0 finishes too quickly to be cancelled?
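If that race is real, a test-side mitigation would be to accept either terminal event for the drive rather than insisting on BLOCK_JOB_CANCELLED. The sketch below is a hypothetical, self-contained helper (it does not use the real iotests event_wait; the queue stands in for the QMP event stream):

```python
import queue

def wait_terminal_event(events, device,
                        kinds=('BLOCK_JOB_CANCELLED', 'BLOCK_JOB_COMPLETED'),
                        timeout=1.0):
    """Hypothetical helper: return the first terminal block-job event for
    'device', whether the job was cancelled or raced to completion first."""
    while True:
        ev = events.get(timeout=timeout)   # raises queue.Empty on timeout
        if ev['event'] in kinds and ev['data'].get('device') == device:
            return ev

# Simulated event stream: the job completed before the cancel landed.
q = queue.Queue()
q.put({'event': 'BLOCK_JOB_COMPLETED', 'data': {'device': 'drive0'}})
ev = wait_terminal_event(q, 'drive0')
print(ev['event'])   # BLOCK_JOB_COMPLETED -- no timeout, despite no cancel event
```

Matching on either event would keep the test from timing out when drive0 has nothing left to do and its job finishes before block-job-cancel takes effect.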
