On 3/31/22 16:46, John Snow wrote:
On Thu, Mar 31, 2022, 10:37 AM John Snow <js...@redhat.com
<mailto:js...@redhat.com>> wrote:
On Thu, Mar 31, 2022, 6:47 AM Li Zhang <lizh...@suse.de
<mailto:lizh...@suse.de>> wrote:
On 3/31/22 08:10, Hanna Reitz wrote:
> On 29.03.22 18:49, Li Zhang wrote:
>> Update what I observed.
>>
>> It seems that aqmp is not stable when running test cases.
>> So I revert the patches as the following, iotest40/41 test
cases work
>> well.
>
> Thanks for bisecting. I haven’t seen this problem before, so
I didn’t
> look into it; CC-ing John, perhaps he figures something.
>
Thanks a lot.
It happens occassionally. I didn't find a way to reproduce it
manually.
Sorry for the instability.
A few questions then:
- what operating system and version
- kernel version
- architecture (looks like x86 and others?)
- what commit/version of QEMU are you testing?
- what python version are you using to run iotests?
I'll also point out these patchsets, because we're in different
timezones and if I'm lucky it will be the answer:
[1] https://patchew.org/QEMU/20220201041134.1237016-1-js...@redhat.com/
<https://patchew.org/QEMU/20220201041134.1237016-1-js...@redhat.com/>
[2] https://patchew.org/QEMU/20220225205948.3693480-1-js...@redhat.com/
<https://patchew.org/QEMU/20220225205948.3693480-1-js...@redhat.com/>
Thanks a lot.
I backport aqmp related patches, it looks good to run the cases
iotest040/041.
#1 wasn't sufficient to fix the problem, but I had hoped #2 was. The
second patchset was merged somewhat recently, just before rc testing for
7.0.
Thanks,
--js
> Hanna
>
>> commit 76cd358671e6b8e7c435ec65b1c44200254514a9
>>
>> Author: John Snow <js...@redhat.com <mailto:js...@redhat.com>>
>>
>> Date: Tue Oct 26 13:56:12 2021 -0400
>>
>>
>>
>> python, iotests: replace qmp with aqmp
>>
>>
>>
>> Swap out the synchronous QEMUMonitorProtocol from
qemu.qmp with
>> the sync
>>
>> wrapper from qemu.aqmp instead.
>>
>>
>>
>> Add an escape hatch in the form of the environment variable
>>
>> QEMU_PYTHON_LEGACY_QMP which allows you to cajole
QEMUMachine into
>> using
>>
>> the old implementation, proving that both
implementations work
>>
>> concurrently.
>>
>>
>>
>> Signed-off-by: John Snow <js...@redhat.com
<mailto:js...@redhat.com>>
>>
>> Reviewed-by: Kevin Wolf <kw...@redhat.com
<mailto:kw...@redhat.com>>
>>
>> Reviewed-by: Hanna Reitz <hre...@redhat.com
<mailto:hre...@redhat.com>>
>>
>> Message-id: 20211026175612.4127598-9-js...@redhat.com
<mailto:20211026175612.4127598-9-js...@redhat.com>
>>
>> Signed-off-by: John Snow <js...@redhat.com
<mailto:js...@redhat.com>>
>>
>>
>> commit 1611e6cf4e7163f6102b37010a8b7e7120f468b5
>>
>> Author: John Snow <js...@redhat.com <mailto:js...@redhat.com>>
>>
>> Date: Thu Nov 18 15:46:18 2021 -0500
>>
>>
>>
>> python/machine: handle "fast" QEMU terminations
>>
>>
>>
>> In the case that the QEMU process actually launches --
but then
>> dies so
>>
>> quickly that we can't establish a QMP connection to it
-- QEMUMachine
>>
>> currently calls _post_shutdown() assuming that it never
launched
>> the VM
>>
>> process.
>>
>>
>>
>> This isn't true, though: it "merely" may have failed to
establish
>> a QMP
>>
>> connection and the process is in the middle of its own
exit path.
>>
>>
>>
>> If we don't wait for the subprocess, the caller may get
a bogus
>> `None`
>>
>> return for .exitcode(). This behavior was observed from
>>
>> device-crash-test; after the switch to Async QMP, the
timings were
>>
>> changed such that it was now seemingly possible to
witness the
>> failure
>>
>> of "vm.launch()" *prior* to the exitcode becoming available.
>>
>>
>>
>> The semantic of the `_launched` property is changed in this
>>
>> patch. Instead of representing the condition "launch()
executed
>>
>> successfully", it will now represent "has forked a child
process
>>
>> successfully". This way, wait() when called in the exit
path won't
>>
>> become a no-op.
>>
>>
>>
>> Signed-off-by: John Snow <js...@redhat.com
<mailto:js...@redhat.com>>
>>
>> Reviewed-by: Willian Rampazzo <willi...@redhat.com
<mailto:willi...@redhat.com>>
>>
>> Message-id: 20211118204620.1897674-6-js...@redhat.com
<mailto:20211118204620.1897674-6-js...@redhat.com>
>>
>> Signed-off-by: John Snow <js...@redhat.com
<mailto:js...@redhat.com>>
>>
>>
>>
>>
>>
>> On 3/25/22 11:17, Li Zhang wrote:
>>> Hi,
>>>
>>> I backport some iotests patches to the tree and change timeout.
>>> It doesn't work.
>>>
>>> Sometimes, iotest41 also reports the errors.
>>> [ 1347s]
>>>
+======================================================================
>>> [ 1347s] +ERROR: test_top_node_in_wrong_chain
(__main__.TestSingleDrive)
>>> [ 1347s]
>>>
+----------------------------------------------------------------------
>>> [ 1347s] +Traceback (most recent call last):
>>> [ 1347s] + File
>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>>> line 399, in launch
>>> [ 1347s] + self._launch()
>>> [ 1347s] + File
>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>>> line 434, in _launch
>>> [ 1347s] + self._post_launch()
>>> [ 1347s] + File
>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
line
>>> 147, in _post_launch
>>> [ 1347s] + super()._post_launch()
>>> [ 1347s] + File
>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>>> line 340, in _post_launch
>>> [ 1347s] + self._qmp.accept(self._qmp_timer)
>>> [ 1347s] + File
>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>>> line 69, in accept
>>> [ 1347s] + timeout
>>> [ 1347s] + File
>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>>> line 42, in _sync
>>> [ 1347s] + asyncio.wait_for(future, timeout=timeout)
>>> [ 1347s] + File
"/usr/lib64/python3.6/asyncio/base_events.py", line
>>> 488, in run_until_complete
>>> [ 1347s] + return future.result()
>>> [ 1347s] + File "/usr/lib64/python3.6/asyncio/tasks.py",
line 362,
>>> in wait_for
>>> [ 1347s] + raise futures.TimeoutError()
>>> [ 1347s] +concurrent.futures._base.TimeoutError
>>>
>>>
>>> I can see other errors like this, it's the problem of the
socket.
>>>
>>> [ 1535s] socket_accept failed: Resource temporarily unavailable
>>> [ 1535s] **
>>> [ 1535s]
>>>
ERROR:../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake:
>>> assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)
>>>
>>>
>>> The script is running the command as this:
>>> /usr/bin/make -O -j4 check-block V=1
>>>
>>> I can see the errors on ppc, arm or x86.
>>> But I couldn't reproduce it when I run it manually.
>>>
>>> It will be appreciated if any suggestions. Thanks.
>>>
>>>
>>> On 3/24/22 14:47, Li Zhang wrote:
>>>> Hi,
>>>>
>>>> When I run the testsuit on our buidling system, it reports
a timeout
>>>> sometimes not always as the following.
>>>> It couldn't connect qmp socket. Any ideas about this problem?
>>>>
>>>>
>>>> [ 1989s] ---
>>>>
/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040.out
>>>> [ 1989s] +++ 040.out.bad
>>>> [ 1989s] @@ -1,5 +1,55 @@
>>>> [ 1989s]
>>>>
-.................................................................
>>>> [ 1989s] +....ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed to
>>>> establish connection: asyncio.exceptions.CancelledError
>>>> [ 1989s]
>>>>
+E..................................ERROR:qemu.aqmp.qmp_client.qemu-6471:Failed
>>>> to establish connection: asyncio.exceptions.CancelledError
>>>> [ 1989s] +E.........................
>>>> [ 1989s]
>>>>
+======================================================================
>>>> [ 1989s] +ERROR: test_commit_node
(__main__.TestActiveZeroLengthImage)
>>>> [ 1989s]
>>>>
+----------------------------------------------------------------------
>>>> [ 1989s] +Traceback (most recent call last):
>>>> [ 1989s] + File
>>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/tests/qemu-iotests/040",
>>>> line 94, in setUp
>>>> [ 1989s] + self.vm.launch()
>>>> [ 1989s] + File
>>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>>>> line 399, in launch
>>>> [ 1989s] + self._launch()
>>>> [ 1989s] + File
>>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>>>> line 434, in _launch
>>>> [ 1989s] + self._post_launch()
>>>> [ 1989s] + File
>>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/qtest.py",
>>>> line 147, in _post_launch
>>>> [ 1989s] + super()._post_launch()
>>>> [ 1989s] + File
>>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/machine/machine.py",
>>>> line 340, in _post_launch
>>>> [ 1989s] + self._qmp.accept(self._qmp_timer)
>>>> [ 1989s] + File
>>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>>>> line 67, in accept
>>>> [ 1989s] + self._sync(
>>>> [ 1989s] + File
>>>>
"/home/abuild/rpmbuild/BUILD/qemu-6.2.0/python/qemu/aqmp/legacy.py",
>>>> line 41, in _sync
>>>> [ 1989s] + return self._aloop.run_until_complete(
>>>> [ 1989s] + File
"/usr/lib64/python3.8/asyncio/base_events.py", line
>>>> 616, in run_until_complete
>>>> [ 1989s] + return future.result()
>>>> [ 1989s] + File "/usr/lib64/python3.8/asyncio/tasks.py",
line 501,
>>>> in wait_for
>>>> [ 1989s] + raise exceptions.TimeoutError()
>>>> [ 1989s] +asyncio.exceptions.TimeoutError
>>>>
>>>
>>>
>>
>