Re: various iotests failures apparently due to overly optimistic timeout settings

2020-07-20 Thread Peter Maydell
On Mon, 20 Jul 2020 at 15:18, John Snow  wrote:
>
> On 7/20/20 10:15 AM, Peter Maydell wrote:
> > On Mon, 20 Jul 2020 at 15:12, John Snow  wrote:
> >>
> >> On 7/20/20 6:46 AM, Kevin Wolf wrote:
> >>> John, I think this is a result of your recent python/qemu/ changes that
> >>> make failure of graceful shutdown an error rather than just silently
> >>> falling back to SIGKILL.
> >>>
> >>> Should the default timeout be longer, should iotests override the
> >>> default, or should iotests just always kill their VM instead of trying
> >>> to shut it down gracefully?
> >
> >> Let's make it longer. Since you are seeing the failures, can you edit
> >> python/qemu/machine.py and find the default timeout of 3 seconds for
> >> shutdown() and change it to 10? 30? 60?
> >
> > Not conveniently, because this is for merge pull requests, and
> > all the iotests failures are nested inside the tests/vm BSD VM
> > setup.
>
> I meant permanently, for everything -- not as a one-off for just this case.

Can't conveniently do that either, because the merge build runs on
head-of-master plus the thing being merged, so it won't have local
edits to machine.py in it, only the version that's in master.

-- PMM



Re: various iotests failures apparently due to overly optimistic timeout settings

2020-07-20 Thread John Snow

On 7/20/20 10:15 AM, Peter Maydell wrote:

On Mon, 20 Jul 2020 at 15:12, John Snow  wrote:


On 7/20/20 6:46 AM, Kevin Wolf wrote:

John, I think this is a result of your recent python/qemu/ changes that
make failure of graceful shutdown an error rather than just silently
falling back to SIGKILL.

Should the default timeout be longer, should iotests override the
default, or should iotests just always kill their VM instead of trying
to shut it down gracefully?



Let's make it longer. Since you are seeing the failures, can you edit
python/qemu/machine.py and find the default timeout of 3 seconds for
shutdown() and change it to 10? 30? 60?


Not conveniently, because this is for merge pull requests, and
all the iotests failures are nested inside the tests/vm BSD VM
setup.


I meant permanently, for everything -- not as a one-off for just this case.

--js




Re: various iotests failures apparently due to overly optimistic timeout settings

2020-07-20 Thread Peter Maydell
On Mon, 20 Jul 2020 at 15:12, John Snow  wrote:
>
> On 7/20/20 6:46 AM, Kevin Wolf wrote:
> > John, I think this is a result of your recent python/qemu/ changes that
> > make failure of graceful shutdown an error rather than just silently
> > falling back to SIGKILL.
> >
> > Should the default timeout be longer, should iotests override the
> > default, or should iotests just always kill their VM instead of trying
> > to shut it down gracefully?

> Let's make it longer. Since you are seeing the failures, can you edit
> python/qemu/machine.py and find the default timeout of 3 seconds for
> shutdown() and change it to 10? 30? 60?

Not conveniently, because this is for merge pull requests, and
all the iotests failures are nested inside the tests/vm BSD VM
setup.

thanks
-- PMM



Re: various iotests failures apparently due to overly optimistic timeout settings

2020-07-20 Thread John Snow

On 7/20/20 6:46 AM, Kevin Wolf wrote:

Am 19.07.2020 um 14:07 hat Peter Maydell geschrieben:

I just had a bunch of iotests fail on a freebsd VM test run.
I think the machine the VM runs on is sometimes a bit heavily
loaded for I/O, which means the VM can run slowly. This causes
various over-optimistic timeouts in the iotest testsuite to
spuriously fail. I also saw the 030 failure on the netbsd VM.

Can we try to avoid timing-sensitive tests where we can,
and make timeouts longer than 3 seconds where we can't, please?
This kind of intermittent failure makes my mergetests flaky.


John, I think this is a result of your recent python/qemu/ changes that
make failure of graceful shutdown an error rather than just silently
falling back to SIGKILL.

Should the default timeout be longer, should iotests override the
default, or should iotests just always kill their VM instead of trying
to shut it down gracefully?

Kevin



Let's make it longer. Since you are seeing the failures, can you edit 
python/qemu/machine.py and find the default timeout of 3 seconds for 
shutdown() and change it to 10? 30? 60?


3 seconds was enough for me, but they shouldn't be freezing.

If there are cases where the QEMU binary is hanging or not exiting 
correctly, or it's not important that it exits correctly, we can 
override it; in those cases use vm.kill() instead of vm.shutdown().


Or, if you want to try to shut it down normally but don't care if it 
doesn't work right, you can do:


try:
vm.shutdown()
except AbornalShutdown:
pass


Relevant bits of the log below, there are several different
flavours of failure.


   TESTiotest-qcow2: 030 [fail]
QEMU  --
"/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64"
-nodefaults -display none -machine virt -accel qtest
QEMU_IMG  --
"/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-img"
QEMU_IO   --
"/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-io"
--cache writeback --aio threads -f qcow2
QEMU_NBD  --
"/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-nbd"
IMGFMT-- qcow2 (compat=1.1)
IMGPROTO  -- file
PLATFORM  -- FreeBSD/amd64 freebsd 12.1-RELEASE
TEST_DIR  -- /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch
SOCK_DIR  -- /tmp/tmp.JFidXl5N
SOCKET_SCM_HELPER --

--- /home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/030.out
2020-07-19 09:29:05.0 +
+++ /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/030.out.bad
  2020-07-19 09:52:04.256005000 +
@@ -1,5 +1,57 @@
-...
+WARNING:qemu.machine:qemu received signal 9; command:
"/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64
-display none -vga none -chardev
socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon
chardev=mon,mode=control -qtest
unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest
-nodefaults -display none -machine virt -accel qtest -drive
if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads"
+WARNING:qemu.machine:qemu received signal 9; command:
"/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64
-display none -vga none -chardev
socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon
chardev=mon,mode=control -qtest
unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest
-nodefaults -display none -machine virt -accel qtest -drive
if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base"
+...E..E
+==
+ERROR: test_set_speed (__main__.TestSetSpeed)
  --
+Traceback (most recent call last):
+  File 
"/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py",
line 436, in _do_shutdown
+self._soft_shutdown(has_quit, timeout)
+  File 
"/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py",
line 419, in _soft_shutdown
+self._popen.wait(timeout=timeout)
+  File "/usr/local/lib/python3.7/subprocess.py", line 1019, in wait
+return self._wait(timeout=timeout)
+  File "/usr/local/lib/python3.7/subprocess.py", line 1645, in _wait
+raise TimeoutExpired(self.args, timeout)
+subprocess.TimeoutExpired: Command
'['/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64',
'-display', 'none', '-vga', 'none', '-chardev',
'socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock',
'-mon', 'chardev=mon,mode=control', '-qtest',
'unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock', '-accel',
'qtest', '-nodefaults', '-display', 'none', '-mac

Re: various iotests failures apparently due to overly optimistic timeout settings

2020-07-20 Thread Kevin Wolf
Am 19.07.2020 um 14:07 hat Peter Maydell geschrieben:
> I just had a bunch of iotests fail on a freebsd VM test run.
> I think the machine the VM runs on is sometimes a bit heavily
> loaded for I/O, which means the VM can run slowly. This causes
> various over-optimistic timeouts in the iotest testsuite to
> spuriously fail. I also saw the 030 failure on the netbsd VM.
> 
> Can we try to avoid timing-sensitive tests where we can,
> and make timeouts longer than 3 seconds where we can't, please?
> This kind of intermittent failure makes my mergetests flaky.

John, I think this is a result of your recent python/qemu/ changes that
make failure of graceful shutdown an error rather than just silently
falling back to SIGKILL.

Should the default timeout be longer, should iotests override the
default, or should iotests just always kill their VM instead of trying
to shut it down gracefully?

Kevin

> Relevant bits of the log below, there are several different
> flavours of failure.
> 
> 
>   TESTiotest-qcow2: 030 [fail]
> QEMU  --
> "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64"
> -nodefaults -display none -machine virt -accel qtest
> QEMU_IMG  --
> "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-img"
> QEMU_IO   --
> "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-io"
> --cache writeback --aio threads -f qcow2
> QEMU_NBD  --
> "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../qemu-nbd"
> IMGFMT-- qcow2 (compat=1.1)
> IMGPROTO  -- file
> PLATFORM  -- FreeBSD/amd64 freebsd 12.1-RELEASE
> TEST_DIR  -- /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch
> SOCK_DIR  -- /tmp/tmp.JFidXl5N
> SOCKET_SCM_HELPER --
> 
> --- /home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/030.out
> 2020-07-19 09:29:05.0 +
> +++ /home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/030.out.bad
>  2020-07-19 09:52:04.256005000 +
> @@ -1,5 +1,57 @@
> -...
> +WARNING:qemu.machine:qemu received signal 9; command:
> "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64
> -display none -vga none -chardev
> socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon
> chardev=mon,mode=control -qtest
> unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest
> -nodefaults -display none -machine virt -accel qtest -drive
> if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads"
> +WARNING:qemu.machine:qemu received signal 9; command:
> "/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64
> -display none -vga none -chardev
> socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock -mon
> chardev=mon,mode=control -qtest
> unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock -accel qtest
> -nodefaults -display none -machine virt -accel qtest -drive
> if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,backing.node-name=mid,backing.backing.node-name=base"
> +...E..E
> +==
> +ERROR: test_set_speed (__main__.TestSetSpeed)
>  --
> +Traceback (most recent call last):
> +  File 
> "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py",
> line 436, in _do_shutdown
> +self._soft_shutdown(has_quit, timeout)
> +  File 
> "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.py",
> line 419, in _soft_shutdown
> +self._popen.wait(timeout=timeout)
> +  File "/usr/local/lib/python3.7/subprocess.py", line 1019, in wait
> +return self._wait(timeout=timeout)
> +  File "/usr/local/lib/python3.7/subprocess.py", line 1645, in _wait
> +raise TimeoutExpired(self.args, timeout)
> +subprocess.TimeoutExpired: Command
> '['/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/../../aarch64-softmmu/qemu-system-aarch64',
> '-display', 'none', '-vga', 'none', '-chardev',
> 'socket,id=mon,path=/tmp/tmp.JFidXl5N/qemu-24906-monitor.sock',
> '-mon', 'chardev=mon,mode=control', '-qtest',
> 'unix:path=/tmp/tmp.JFidXl5N/qemu-24906-qtest.sock', '-accel',
> 'qtest', '-nodefaults', '-display', 'none', '-machine', 'virt',
> '-accel', 'qtest', '-drive',
> 'if=virtio,id=drive0,file=blkdebug::/home/qemu/qemu-test.oJwuyW/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads']'
> timed out after 3 seconds
> +
> +The above exception was the direct cause of the following exception:
> +
> +Traceback (most recent call last):
> +  File "030", line 900, in tearDown
> +self.vm.shutdown()
> +  File 
> "/usr/home/qemu/qemu-test.oJwuyW/src/tests/qemu-iotests/../../python/qemu/machine.