Re: various iotests failures apparently due to overly optimistic timeout settings
On Mon, 20 Jul 2020 at 15:18, John Snow wrote: > > On 7/20/20 10:15 AM, Peter Maydell wrote: > > On Mon, 20 Jul 2020 at 15:12, John Snow wrote: > >> > >> On 7/20/20 6:46 AM, Kevin Wolf wrote: > >>> John, I think this is a result of your recent python/qemu/ changes that > >>> make failure of graceful shutdown an error rather than just silently > >>> falling back to SIGKILL. > >>> > >>> Should the default timeout be longer, should iotests override the > >>> default, or should iotests just always kill their VM instead of trying > >>> to shut it down gracefully? > > > >> Let's make it longer. Since you are seeing the failures, can you edit > >> python/qemu/machine.py and find the default timeout of 3 seconds for > >> shutdown() and change it to 10? 30? 60? > > > > Not conveniently, because this is for merge pull requests, and > > all the iotests failures are nested inside the tests/vm BSD VM > > setup. > > I meant permanently, for everything -- not as a one-off for just this case. Can't conveniently do that either, because the merge build runs on head-of-master plus the thing being merged, so it won't have local edits to machine.py in it, only the version that's in master. -- PMM
Re: various iotests failures apparently due to overly optimistic timeout settings
On 7/20/20 10:15 AM, Peter Maydell wrote: On Mon, 20 Jul 2020 at 15:12, John Snow wrote: On 7/20/20 6:46 AM, Kevin Wolf wrote: John, I think this is a result of your recent python/qemu/ changes that make failure of graceful shutdown an error rather than just silently falling back to SIGKILL. Should the default timeout be longer, should iotests override the default, or should iotests just always kill their VM instead of trying to shut it down gracefully? Let's make it longer. Since you are seeing the failures, can you edit python/qemu/machine.py and find the default timeout of 3 seconds for shutdown() and change it to 10? 30? 60? Not conveniently, because this is for merge pull requests, and all the iotests failures are nested inside the tests/vm BSD VM setup. I meant permanently, for everything -- not as a one-off for just this case. --js
Re: various iotests failures apparently due to overly optimistic timeout settings
On Mon, 20 Jul 2020 at 15:12, John Snow wrote: > > On 7/20/20 6:46 AM, Kevin Wolf wrote: > > John, I think this is a result of your recent python/qemu/ changes that > > make failure of graceful shutdown an error rather than just silently > > falling back to SIGKILL. > > > > Should the default timeout be longer, should iotests override the > > default, or should iotests just always kill their VM instead of trying > > to shut it down gracefully? > Let's make it longer. Since you are seeing the failures, can you edit > python/qemu/machine.py and find the default timeout of 3 seconds for > shutdown() and change it to 10? 30? 60? Not conveniently, because this is for merge pull requests, and all the iotests failures are nested inside the tests/vm BSD VM setup. thanks -- PMM
Re: various iotests failures apparently due to overly optimistic timeout settings
On 7/20/20 6:46 AM, Kevin Wolf wrote: Am 19.07.2020 um 14:07 hat Peter Maydell geschrieben: I just had a bunch of iotests fail on a freebsd VM test run. I think the machine the VM runs on is sometimes a bit heavily loaded for I/O, which means the VM can run slowly. This causes various over-optimistic timeouts in the iotest testsuite to spuriously fail. I also saw the 030 failure on the netbsd VM. Can we try to avoid timing-sensitive tests where we can, and make timeouts longer than 3 seconds where we can't, please? This kind of intermittent failure makes my mergetests flaky. John, I think this is a result of your recent python/qemu/ changes that make failure of graceful shutdown an error rather than just silently falling back to SIGKILL. Should the default timeout be longer, should iotests override the default, or should iotests just always kill their VM instead of trying to shut it down gracefully? Kevin Let's make it longer. Since you are seeing the failures, can you edit python/qemu/machine.py and find the default timeout of 3 seconds for shutdown() and change it to 10? 30? 60? 3 seconds was enough for me, but they shouldn't be freezing. If there are cases where the QEMU binary is hanging or not exiting correctly, or it's not important that it exits correctly, we can override it; in those cases use vm.kill() instead of vm.shutdown(). Or, if you want to try to shut it down normally but don't care if it doesn't work right, you can do: try: vm.shutdown() except AbornalShutdown: pass Relevant bits of the log below, there are several different flavours of failure. TESTiotest-qcow2: 030 [fail] QEMU -- "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64" -nodefaults -display none -machine virt -accel qtest QEMU_IMG -- "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-img" QEMU_IO -- "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-io" --cache writeback --aio threads -f qcow2 QEMU_NBD -- "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-nbd" IMGFMT-- qcow2 (compat=1.1) IMGPROTO -- file PLATFORM -- FreeBSD/amd64 freebsd 12.1-RELEASE TEST_DIR -- /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch SOCK_DIR -- /tmp/tmp.JFidXl5N SOCKET_SCM_HELPER -- --- /home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/030.out 2020-07-19 09:29:05.0 + +++ /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/030.out.bad 2020-07-19 09:52:04.256005000 + @@ -1,5 +1,57 @@ -... +WARNING:qemu.machine:qemu received signal 9; command: "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest -nodefaults -display none -machine virt -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads" +WARNING:qemu.machine:qemu received signal 9; command: "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest -nodefaults -display none -machine virt -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base" +...E..E +== +ERROR: test_set_speed (__main__.TestSetSpeed) -- +Traceback (most recent call last): + File "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py", line 436, in _do_shutdown +self._soft_shutdown(has_quit, timeout) + File "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py", line 419, in _soft_shutdown +self._popen.wait(timeout=timeout) + File "/usr/local/lib/python3.7/subprocess.py", line 1019, in wait +return self._wait(timeout=timeout) + File "/usr/local/lib/python3.7/subprocess.py", line 1645, in _wait +raise TimeoutExpired(self.args, timeout) +subprocess.TimeoutExpired: Command '['/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64', '-display', 'none', '-vga', 'none', '-chardev', 'socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock', '-mon', 'chardev=mon,mode=control', '-qtest', 'unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock', '-accel', 'qtest', '-nodefaults', '-display', 'none', '-mac
Re: various iotests failures apparently due to overly optimistic timeout settings
Am 19.07.2020 um 14:07 hat Peter Maydell geschrieben: > I just had a bunch of iotests fail on a freebsd VM test run. > I think the machine the VM runs on is sometimes a bit heavily > loaded for I/O, which means the VM can run slowly. This causes > various over-optimistic timeouts in the iotest testsuite to > spuriously fail. I also saw the 030 failure on the netbsd VM. > > Can we try to avoid timing-sensitive tests where we can, > and make timeouts longer than 3 seconds where we can't, please? > This kind of intermittent failure makes my mergetests flaky. John, I think this is a result of your recent python/qemu/ changes that make failure of graceful shutdown an error rather than just silently falling back to SIGKILL. Should the default timeout be longer, should iotests override the default, or should iotests just always kill their VM instead of trying to shut it down gracefully? Kevin > Relevant bits of the log below, there are several different > flavours of failure. > > > TESTiotest-qcow2: 030 [fail] > QEMU -- > "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64" > -nodefaults -display none -machine virt -accel qtest > QEMU_IMG -- > "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-img" > QEMU_IO -- > "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-io" > --cache writeback --aio threads -f qcow2 > QEMU_NBD -- > "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-nbd" > IMGFMT-- qcow2 (compat=1.1) > IMGPROTO -- file > PLATFORM -- FreeBSD/amd64 freebsd 12.1-RELEASE > TEST_DIR -- /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch > SOCK_DIR -- /tmp/tmp.JFidXl5N > SOCKET_SCM_HELPER -- > > --- /home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/030.out > 2020-07-19 09:29:05.0 + > +++ /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/030.out.bad > 2020-07-19 09:52:04.256005000 + > @@ -1,5 +1,57 @@ > -... > +WARNING:qemu.machine:qemu received signal 9; command: > "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64 > -display none -vga none -chardev > socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon > chardev=mon,mode=control -qtest > unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest > -nodefaults -display none -machine virt -accel qtest -drive > if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads" > +WARNING:qemu.machine:qemu received signal 9; command: > "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64 > -display none -vga none -chardev > socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon > chardev=mon,mode=control -qtest > unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest > -nodefaults -display none -machine virt -accel qtest -drive > if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base" > +...E..E > +== > +ERROR: test_set_speed (__main__.TestSetSpeed) > -- > +Traceback (most recent call last): > + File > "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py", > line 436, in _do_shutdown > +self._soft_shutdown(has_quit, timeout) > + File > "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py", > line 419, in _soft_shutdown > +self._popen.wait(timeout=timeout) > + File "/usr/local/lib/python3.7/subprocess.py", line 1019, in wait > +return self._wait(timeout=timeout) > + File "/usr/local/lib/python3.7/subprocess.py", line 1645, in _wait > +raise TimeoutExpired(self.args, timeout) > +subprocess.TimeoutExpired: Command > '['/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64', > '-display', 'none', '-vga', 'none', '-chardev', > 'socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock', > '-mon', 'chardev=mon,mode=control', '-qtest', > 'unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock', '-accel', > 'qtest', '-nodefaults', '-display', 'none', '-machine', 'virt', > '-accel', 'qtest', '-drive', > 'if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads']' > timed out after 3 seconds > + > +The above exception was the direct cause of the following exception: > + > +Traceback (most recent call last): > + File "030", line 900, in tearDown > +self.vm.shutdown() > + File > "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.
various iotests failures apparently due to overly optimistic timeout settings
I just had a bunch of iotests fail on a freebsd VM test run. I think the machine the VM runs on is sometimes a bit heavily loaded for I/O, which means the VM can run slowly. This causes various over-optimistic timeouts in the iotest testsuite to spuriously fail. I also saw the 030 failure on the netbsd VM. Can we try to avoid timing-sensitive tests where we can, and make timeouts longer than 3 seconds where we can't, please? This kind of intermittent failure makes my mergetests flaky. Relevant bits of the log below, there are several different flavours of failure. TESTiotest-qcow2: 030 [fail] QEMU -- "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64" -nodefaults -display none -machine virt -accel qtest QEMU_IMG -- "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-img" QEMU_IO -- "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-io" --cache writeback --aio threads -f qcow2 QEMU_NBD -- "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-nbd" IMGFMT-- qcow2 (compat=1.1) IMGPROTO -- file PLATFORM -- FreeBSD/amd64 freebsd 12.1-RELEASE TEST_DIR -- /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch SOCK_DIR -- /tmp/tmp.JFidXl5N SOCKET_SCM_HELPER -- --- /home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/030.out 2020-07-19 09:29:05.0 + +++ /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/030.out.bad 2020-07-19 09:52:04.256005000 + @@ -1,5 +1,57 @@ -... +WARNING:qemu.machine:qemu received signal 9; command: "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest -nodefaults -display none -machine virt -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads" +WARNING:qemu.machine:qemu received signal 9; command: "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest -nodefaults -display none -machine virt -accel qtest -drive if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base" +...E..E +== +ERROR: test_set_speed (__main__.TestSetSpeed) -- +Traceback (most recent call last): + File "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py", line 436, in _do_shutdown +self._soft_shutdown(has_quit, timeout) + File "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py", line 419, in _soft_shutdown +self._popen.wait(timeout=timeout) + File "/usr/local/lib/python3.7/subprocess.py", line 1019, in wait +return self._wait(timeout=timeout) + File "/usr/local/lib/python3.7/subprocess.py", line 1645, in _wait +raise TimeoutExpired(self.args, timeout) +subprocess.TimeoutExpired: Command '['/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64', '-display', 'none', '-vga', 'none', '-chardev', 'socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock', '-mon', 'chardev=mon,mode=control', '-qtest', 'unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock', '-accel', 'qtest', '-nodefaults', '-display', 'none', '-machine', 'virt', '-accel', 'qtest', '-drive', 'if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads']' timed out after 3 seconds + +The above exception was the direct cause of the following exception: + +Traceback (most recent call last): + File "030", line 900, in tearDown +self.vm.shutdown() + File "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py", line 466, in shutdown +self._do_shutdown(has_quit, timeout=timeout) + File "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py", line 440, in _do_shutdown +from exc +qemu.machine.AbnormalShutdown: Could not perform graceful shutdown + +== +ERROR: test_stream_no_op (__main__.TestSingleDrive) +-- +Traceback (most recent call last): + File "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qe