Re: [Qemu-devel] [PATCH 0/3] script for crash-testing -device
On 03/23/2017 05:43 PM, Thomas Huth wrote:
> On 22.03.2017 20:13, Eduardo Habkost wrote:
> > On Wed, Mar 22, 2017 at 01:00:49PM -0300, Eduardo Habkost wrote:
> >> This series adds scripts/device-crash-test.py, which can be used to
> >> crash-test -device with multiple machine/accel/device
> >> combinations.
> >>
> >> The script found a few crashes on some machines/devices. A dump
> >> of existing cases can be seen here:
> >> https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb
> >>
> >> The script contains a whitelist that can also be useful as
> >> documentation of existing ways -device can fail or crash.
> >>
> >> Note that the script takes a few hours to run in the default mode
> >> (testing all accel/machine/device combinations), but the "-r N"
> >> option can be used to make it only test N random samples.
>
> Wow, impressive script, that must have been a lot of work 'til you've
> got it in a usable shape with that huge whitelist!

+1
Great work Eduardo, thanks!

> > Something I forgot to mention: I would like to run some subset of
> > these tests on "make check", but I don't know how we could choose
> > that subset. We could run, e.g., 100 random samples, but I am not
> > sure we really want to make "make check" non-deterministic.
>
> Maybe limit the tests to the devices that have a high chance to work on
> different machines? ... that means primarily PCI, ISA and USB devices, I
> guess.

It is hard to maintain that list; it will miss new devices and so on.
We should have a "nightly" run or something, but even then, maintaining
the whitelist of known errors is problematic.

Thanks,
Marcel

> Thomas
Re: [Qemu-devel] [PATCH 0/3] script for crash-testing -device
On Thu, Mar 23, 2017 at 04:43:01PM +0100, Thomas Huth wrote:
> On 22.03.2017 20:13, Eduardo Habkost wrote:
> > On Wed, Mar 22, 2017 at 01:00:49PM -0300, Eduardo Habkost wrote:
> >> This series adds scripts/device-crash-test.py, which can be used to
> >> crash-test -device with multiple machine/accel/device
> >> combinations.
> >>
> >> The script found a few crashes on some machines/devices. A dump
> >> of existing cases can be seen here:
> >> https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb
> >>
> >> The script contains a whitelist that can also be useful as
> >> documentation of existing ways -device can fail or crash.
> >>
> >> Note that the script takes a few hours to run in the default mode
> >> (testing all accel/machine/device combinations), but the "-r N"
> >> option can be used to make it only test N random samples.
>
> Wow, impressive script, that must have been a lot of work 'til you've
> got it in a usable shape with that huge whitelist!
>
> > Something I forgot to mention: I would like to run some subset of
> > these tests on "make check", but I don't know how we could choose
> > that subset. We could run, e.g., 100 random samples, but I am not
> > sure we really want to make "make check" non-deterministic.
>
> Maybe limit the tests to the devices that have a high chance to work on
> different machines? ... that means primarily PCI, ISA and USB devices, I
> guess.

On the other hand, I believe the remaining devices are the ones
most likely to crash machines unexpectedly...

For reference, these are the numbers when trying to test every
single machine type:

  Total: 89321 test cases
  pci:   27749 test cases
  usb:    5125 test cases
  isa:    3948 test cases

Of those 89k test cases, 67k fail (cleanly).
The top reasons they fail are:

  Count | Whitelist entry
 -------+----------------------------------------------------------------------
  20681 | {'log': "No '[\\w-]+' bus found for device '[\\w-]+'"}
  13076 | {'log': "Option '-device [\\w.,-]+' cannot be handled by this machine"}
   4821 | {'log': '(Guest|ROM|Flash|Kernel) image must be specified'}
   4096 | {'device': '.*-(i386|x86_64)-cpu'}
   3200 | {'log': "images* must be given with the 'pflash' parameter"}
   3084 | {'log': "[cC]ould not load [\\w ]+ (BIOS|bios) '[\\w-]+\\.bin'"}
   1120 | {'log': 'Device [\\w.,-]+ can not be dynamically instantiated'}
    800 | {'log': "Couldn't find rom image '[\\w-]+\\.bin'"}
    607 | {'device': 'vhost-scsi.*'}
    551 | {'loglevel': 40, 'log': "Device 'serial0' is in use", 'exitcode': -6}
    476 | {'log': 'Device [\\w.,-]+ is not supported by this machine yet'}

So, a few things we can do:

1) Using query-device-slots: if the test code knew in advance
   which buses/device-types are supported by each machine, we
   could limit the number of devices being tested. That means the
   test code will probably benefit from a query-device-slots
   command. This would get rid of the following:

   20681 | {'log': "No '[\\w-]+' bus found for device '[\\w-]+'"}
   13076 | {'log': "Option '-device [\\w.,-]+' cannot be handled by this machine"}
    1120 | {'log': 'Device [\\w.,-]+ can not be dynamically instantiated'}
     476 | {'log': 'Device [\\w.,-]+ is not supported by this machine yet'}

2) Don't keep trying to test machines that can't be tested out of
   the box because they need rom or kernel images. The script can
   first try to run the machine with no -device arguments, to
   ensure it is really usable, before trying to test it with all
   devices.
   This will get rid of the following:

    4821 | {'log': '(Guest|ROM|Flash|Kernel) image must be specified'}
    3200 | {'log': "images* must be given with the 'pflash' parameter"}
    3084 | {'log': "[cC]ould not load [\\w ]+ (BIOS|bios) '[\\w-]+\\.bin'"}
     800 | {'log': "Couldn't find rom image '[\\w-]+\\.bin'"}

3) Not testing the devices from the "devices that won't work out
   of the box" section. There are ~18k test cases matching those
   entries.

If I did the calculations right, all of the above would eliminate
more than 63k test cases.

-- 
Eduardo
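
Proposal 2 above (probe each machine with no -device argument before
testing it with every device) could be sketched roughly as follows. This
is only an illustration, not code from the patch: run_qemu,
machine_is_usable and filter_test_cases are hypothetical names, and the
sketch assumes that a machine is "usable" when QEMU exits with code 0
for a bare "-S -machine M,accel=A" command line.

```python
def machine_is_usable(machine, accel, run_qemu):
    """Probe a machine with no -device argument.

    run_qemu is a hypothetical callable taking a list of QEMU
    arguments and returning (exitcode, log). How QEMU is actually
    started and stopped is left to the caller.
    """
    exitcode, _log = run_qemu(['-S', '-machine', '%s,accel=%s' % (machine, accel)])
    # Assumption: exit code 0 means the machine works out of the box.
    return exitcode == 0


def filter_test_cases(cases, run_qemu):
    """Keep only the test cases whose machine/accel pair passes the probe.

    Each case is a dict with 'machine', 'accel' and 'device' keys.
    Every machine/accel pair is probed once and the result is cached,
    so the probe cost does not grow with the number of devices.
    """
    usable = {}
    kept = []
    for case in cases:
        key = (case['machine'], case['accel'])
        if key not in usable:
            usable[key] = machine_is_usable(case['machine'], case['accel'], run_qemu)
        if usable[key]:
            kept.append(case)
    return kept
```

With a probe like this, a machine that needs a ROM or kernel image costs
one QEMU run instead of one run per device.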
Re: [Qemu-devel] [PATCH 0/3] script for crash-testing -device
On 22.03.2017 20:13, Eduardo Habkost wrote:
> On Wed, Mar 22, 2017 at 01:00:49PM -0300, Eduardo Habkost wrote:
>> This series adds scripts/device-crash-test.py, which can be used to
>> crash-test -device with multiple machine/accel/device
>> combinations.
>>
>> The script found a few crashes on some machines/devices. A dump
>> of existing cases can be seen here:
>> https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb
>>
>> The script contains a whitelist that can also be useful as
>> documentation of existing ways -device can fail or crash.
>>
>> Note that the script takes a few hours to run in the default mode
>> (testing all accel/machine/device combinations), but the "-r N"
>> option can be used to make it only test N random samples.

Wow, impressive script, that must have been a lot of work 'til you've
got it in a usable shape with that huge whitelist!

> Something I forgot to mention: I would like to run some subset of
> these tests on "make check", but I don't know how we could choose
> that subset. We could run, e.g., 100 random samples, but I am not
> sure we really want to make "make check" non-deterministic.

Maybe limit the tests to the devices that have a high chance to work on
different machines? ... that means primarily PCI, ISA and USB devices, I
guess.

 Thomas
Re: [Qemu-devel] [PATCH 0/3] script for crash-testing -device
On Wed, Mar 22, 2017 at 01:00:49PM -0300, Eduardo Habkost wrote:
> This series adds scripts/device-crash-test.py, which can be used to
> crash-test -device with multiple machine/accel/device
> combinations.
>
> The script found a few crashes on some machines/devices. A dump
> of existing cases can be seen here:
> https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb
>
> The script contains a whitelist that can also be useful as
> documentation of existing ways -device can fail or crash.
>
> Note that the script takes a few hours to run in the default mode
> (testing all accel/machine/device combinations), but the "-r N"
> option can be used to make it only test N random samples.

Something I forgot to mention: I would like to run some subset of
these tests on "make check", but I don't know how we could choose
that subset. We could run, e.g., 100 random samples, but I am not
sure we really want to make "make check" non-deterministic.

Ideas?

-- 
Eduardo
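
One way a random subset could still be repeatable is to draw it with a
fixed seed. The sketch below is only an illustration of that idea and is
not something the thread settles on: pick_subset and the seed value are
hypothetical, and it assumes test cases are (machine, accel, device)
tuples. The selection is the same across runs only while the full list
of combinations stays the same, so adding a device or machine can change
which cases are picked.

```python
import random


def pick_subset(test_cases, n, seed=0):
    """Return n test cases chosen pseudo-randomly but repeatably.

    test_cases is an iterable of (machine, accel, device) tuples.
    Sorting first makes the result independent of the order in which
    the combinations were enumerated; the private Random instance
    keeps the choice independent of any other use of the global
    random state.
    """
    ordered = sorted(test_cases)
    rng = random.Random(seed)
    return rng.sample(ordered, min(n, len(ordered)))
```

Usage would look like pick_subset(all_cases, 100), matching the
"100 random samples" figure mentioned above.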
[Qemu-devel] [PATCH 0/3] script for crash-testing -device
This series adds scripts/device-crash-test.py, which can be used to
crash-test -device with multiple machine/accel/device
combinations.

The script found a few crashes on some machines/devices. A dump
of existing cases can be seen here:
https://gist.github.com/ehabkost/503b0af0375f0d98d3e84017e8ca54eb

The script contains a whitelist that can also be useful as
documentation of existing ways -device can fail or crash.

Note that the script takes a few hours to run in the default mode
(testing all accel/machine/device combinations), but the "-r N"
option can be used to make it only test N random samples.

Example script output:

  $ ../scripts/device-crash-test.py -v --shuffle
  INFO: test case: machine=verdex binary=./aarch64-softmmu/qemu-system-aarch64 device=exynos4210-ehci-usb accel=tcg
  INFO: test case: machine=none binary=./aarch64-softmmu/qemu-system-aarch64 device=onenand accel=tcg
  INFO: test case: machine=pc-i440fx-2.2 binary=./x86_64-softmmu/qemu-system-x86_64 device=ide-cd accel=kvm
  INFO: success: ./x86_64-softmmu/qemu-system-x86_64 -S -machine pc-i440fx-2.2,accel=kvm -device ide-cd
  INFO: test case: machine=SPARCClassic binary=./sparc-softmmu/qemu-system-sparc device=memory accel=tcg
  qemu received signal 6: -S -machine SPARCClassic,accel=tcg -device memory
  ERROR: failed: machine=SPARCClassic binary=./sparc-softmmu/qemu-system-sparc device=memory accel=tcg
  ERROR: cmdline: ./sparc-softmmu/qemu-system-sparc -S -machine SPARCClassic,accel=tcg -device memory
  ERROR: log: qemu-system-sparc: /root/qemu-build/exec.c:1500: find_ram_offset: Assertion `size != 0' failed.
  ERROR: exit code: -6
  INFO: test case: machine=romulus-bmc binary=./arm-softmmu/qemu-system-arm device=ich9-usb-uhci6 accel=tcg
  INFO: test case: machine=ref405ep binary=./ppc-softmmu/qemu-system-ppc device=ivshmem-doorbell accel=tcg
  INFO: test case: machine=romulus-bmc binary=./aarch64-softmmu/qemu-system-aarch64 device=l2x0 accel=tcg
  INFO: test case: machine=pc-i440fx-1.7 binary=./x86_64-softmmu/qemu-system-x86_64 device=virtio-input-host-pci accel=tcg
  INFO: test case: machine=none binary=./ppc-softmmu/qemu-system-ppc device=virtio-tablet-pci accel=tcg
  INFO: test case: machine=terrier binary=./aarch64-softmmu/qemu-system-aarch64 device=sst25vf016b accel=tcg
  INFO: success: ./aarch64-softmmu/qemu-system-aarch64 -S -machine terrier,accel=tcg -device sst25vf016b
  INFO: test case: machine=none binary=./i386-softmmu/qemu-system-i386 device=intel-iommu accel=kvm
  qemu received signal 6: -S -machine none,accel=kvm -device intel-iommu
  ERROR: failed: machine=none binary=./i386-softmmu/qemu-system-i386 device=intel-iommu accel=kvm
  ERROR: cmdline: ./i386-softmmu/qemu-system-i386 -S -machine none,accel=kvm -device intel-iommu
  ERROR: log: /root/qemu-build/hw/i386/intel_iommu.c:2565:vtd_realize: Object 0x7fe117fabfb0 is not an instance of type generic-pc-machine
  ERROR: exit code: -6
  INFO: test case: machine=tosa binary=./aarch64-softmmu/qemu-system-aarch64 device=integrator_core accel=tcg
  INFO: test case: machine=isapc binary=./i386-softmmu/qemu-system-i386 device=i82550 accel=kvm
  INFO: test case: machine=xlnx-ep108 binary=./aarch64-softmmu/qemu-system-aarch64 device=digic accel=tcg
  qemu received signal 6: -S -machine xlnx-ep108,accel=tcg -device digic
  ERROR: failed: machine=xlnx-ep108 binary=./aarch64-softmmu/qemu-system-aarch64 device=digic accel=tcg
  ERROR: cmdline: ./aarch64-softmmu/qemu-system-aarch64 -S -machine xlnx-ep108,accel=tcg -device digic
  ERROR: log: audio: Could not init `oss' audio driver
  ERROR: log: Unexpected error in qemu_chr_fe_init() at /root/qemu-build/chardev/char.c:512:
  ERROR: log: qemu-system-aarch64: -device digic: Device 'serial0' is in use
  ERROR: exit code: -6
  INFO: test case: machine=raspi2 binary=./arm-softmmu/qemu-system-arm device=sd-card accel=tcg
  INFO: success: ./arm-softmmu/qemu-system-arm -S -machine raspi2,accel=tcg -device sd-card
  [...]

Eduardo Habkost (3):
  qemu.py: Always save QEMU exit code
  qtest.py: Support QTEST_LOG environment variable
  scripts: Test script to look for -device crashes

 scripts/device-crash-test.py | 486 +++
 scripts/qemu.py              |  10 +-
 scripts/qtest.py             |   6 +
 3 files changed, 499 insertions(+), 3 deletions(-)
 create mode 100755 scripts/device-crash-test.py

-- 
2.11.0.259.g40922b1
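
For anyone post-processing a log like the one above, the "test case"
lines are simple key=value records, and the "cmdline" lines follow the
pattern "BINARY -S -machine MACHINE,accel=ACCEL -device DEVICE". A
minimal sketch of reading one back and rebuilding its command line is
shown below. The helper names parse_test_case and build_cmdline are
hypothetical and are not part of the script; the format is inferred only
from the sample output, so it may not cover every line the script emits.

```python
TEST_CASE_PREFIX = 'INFO: test case: '


def parse_test_case(line):
    """Parse an 'INFO: test case: k=v k=v ...' line into a dict.

    Returns None for any other kind of line. Assumes values contain
    no spaces, which holds for the sample output above.
    """
    if not line.startswith(TEST_CASE_PREFIX):
        return None
    fields = line[len(TEST_CASE_PREFIX):].split()
    return dict(field.split('=', 1) for field in fields)


def build_cmdline(case):
    """Rebuild the argument list shown on the 'cmdline:' lines."""
    return [
        case['binary'],
        '-S',
        '-machine', '%s,accel=%s' % (case['machine'], case['accel']),
        '-device', case['device'],
    ]
```

Joining the result of build_cmdline with spaces gives a command line
that can be pasted into a shell to reproduce a single failing case.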