Re: 'make check-acceptance' failing on s390 tests?
On 18/02/2022 16.04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail
> on some of the s390 tests?
>
>  (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j... (900.20 s)
> [...]
> Not sure about the timeout on the boot test: the avocado log
> shows it booting at least as far as
> "Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
> and then there's no further output until the timeout.

Now that I've finally been able to run the test again (after manually
tweaking that borked is_port_free() function in avocado), I've had a
closer look at the failing BootLinuxS390X test: If you're looking at the
output of the guest in the log, you can see that it fails to init the
cloud-init stuff and thus fails to "phone home" at the end.

This used to work fine in older versions, so I just spent a lot of time
bisecting this issue and ended up here:

 f83bcecb1ffe25a18367409eaf4ba1453c835c48 is the first bad commit
 commit f83bcecb1ffe25a18367409eaf4ba1453c835c48
 Author: Richard Henderson
 Date:   Tue Jul 27 07:48:55 2021 -1000

     accel/tcg: Add cpu_{ld,st}*_mmu interfaces

Richard, could you please have a look at this one, too? ... it causes
the test to fail:

 $ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48~1
 $ ./configure --target-list=s390x-softmmu --disable-docs
 $ make -j8
 $ make check-venv
 $ cd build
 $ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
 JOB ID : 0a6d287620d150d52c24417d0a672a1a826b3a82
 JOB LOG: /home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
  (1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: PASS (130.38 s)
 RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
 JOB TIME : 136.51 s
 $ grep cloud-ini /home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
 ...
 2022-03-11 18:31:52,745 datadrainer L0193 DEBUG| [ OK ] Started Initial cloud-init…ob (metadata service crawler).

 $ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48
 $ make -j8
 $ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
 JOB ID : cb143be36631515f74cb6de2b263dfe1bc0f9709
 JOB LOG: /home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
  (1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 'logdir': '/home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/test-res... (900.97 s)
 RESULTS: PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
 JOB TIME : 907.16 s
 $ grep cloud-ini /home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
 2022-03-11 18:35:15,106 datadrainer L0193 DEBUG| Starting Initial cloud-init job (pre-networking)...
 2022-03-11 18:35:21,691 datadrainer L0193 DEBUG| [FAILED] Failed to start Initial cloud-init job (pre-networking).
 ...

 Thomas
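
For anyone repeating the bisection: the manual good/bad check above (grepping job.log for the cloud-init lines) could be scripted so that `git bisect run` can use it. This is only a minimal sketch. It assumes the "[FAILED] Failed to start Initial cloud-init job" line from the bad log is a reliable marker, and `classify_job_log` is a hypothetical helper name, not part of avocado or QEMU.

```python
# Marker copied from the failing job.log quoted above. Treating it as the
# only signal of a bad commit is an assumption of this sketch.
FAILED_MARKER = "[FAILED] Failed to start Initial cloud-init job"


def classify_job_log(text):
    """Return 1 (bad) if cloud-init failed to start in the guest, else 0 (good).

    The values follow the exit-code convention of `git bisect run`:
    0 marks the commit as good, 1 marks it as bad.
    """
    for line in text.splitlines():
        if FAILED_MARKER in line:
            return 1
    return 0
```

A wrapper script would rebuild, run the avocado test, read the resulting job.log and exit with the value returned here. Note that a log without the marker is reported as good, so a run that dies before the guest reaches cloud-init would need separate handling.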
Re: 'make check-acceptance' failing on s390 tests?
On 18/02/2022 16.04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail
> on some of the s390 tests?
>
>  (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j... (900.20 s)
>
>  (090/183) tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora: FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)
>
> I've cc'd Daniel because the 090 at least looks like a resolution
> baked into the test case, and commit de72c4b7c that went in last
> month changed the EDID reported resolution from 1024x768 to 1280x800.

Yes, that seems to be right - since the default monitor resolution
changed, the screenshot now has a different size, too. I sent a patch
here:

 https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg04473.html

> Not sure about the timeout on the boot test: the avocado log
> shows it booting at least as far as
> "Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
> and then there's no further output until the timeout.
> Unfortunately the avocado log doesn't seem to include useful
> information like "this is the string we were waiting to see", so
> I'm not sure exactly what's gone wrong there.
> (I continue to find the Avocado tests rather opaque: when you get
> a series of green OK's that's fine, but when you get a failure it's
> often non-obvious why it failed or how to do simple things like
> "rerun just that one failed test" or "run the failing command,
> interactively on the command line".)

For me, it's even worse with the tests/avocado/boot_linux.py - none of
them is working on my local laptop, so I was always ignoring them until
now. FWIW, I'm seeing this python backtrace in the log:

 Reproduced traceback from: /home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/core/test.py:770
 Traceback (most recent call last):
   File "/home/thuth/tmp/qemu-build/tests/avocado/boot_linux.py", line 30, in test_pc_i440fx_tcg
     self.launch_and_wait(set_up_ssh_connection=False)
   File "/home/thuth/tmp/qemu-build/tests/avocado/avocado_qemu/__init__.py", line 636, in launch_and_wait
     cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), self.name)
   File "/home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/utils/cloudinit.py", line 192, in wait_for_phone_home
     s = PhoneHomeServer(address, instance_id)
   File "/home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/utils/cloudinit.py", line 173, in __init__
     HTTPServer.__init__(self, address, PhoneHomeServerHandler)
   File "/usr/lib64/python3.6/socketserver.py", line 456, in __init__
     self.server_bind()
   File "/usr/lib64/python3.6/http/server.py", line 136, in server_bind
     socketserver.TCPServer.server_bind(self)
   File "/usr/lib64/python3.6/socketserver.py", line 470, in server_bind
     self.socket.bind(self.server_address)
 TypeError: an integer is required (got type NoneType)

... no clue how to debug these problems, though.

> The 090 failure didn't cause the merge to be rejected because
> in commit 333168efe5c8 we disabled both these tests when running
> on GitLab. Suggestion: we should either disable tests entirely
> (except for manual "I want to run this known-flaky test") or not
> at all, rather than disabling them only on GitLab. If I'm running
> 'make check-acceptance' locally I don't want to be distracted by
> tests we know to be dodgy, any more than if I were running the CI
> on GitLab.

IIRC I only saw the occasional hangs of the test on Gitlab, and never on
my local host ... but I see your point ... I'm fine if we replace the
@skipIf(os.getenv('GITLAB_CI')...) there with a
@skipUnless(os.getenv('AVOCADO_ALLOW_FLAKY_TESTS')...) or something
similar. Would you have some spare time to write such a patch?

 Thomas
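
To make the suggestion concrete, here is a minimal sketch of the opt-in pattern. It uses the standard library's `unittest.skipUnless` to stay self-contained, whereas the real tests would presumably use Avocado's decorators. `AVOCADO_ALLOW_FLAKY_TESTS` is just the name floated above, and `FlakyExample` is a hypothetical stand-in for a known-flaky test class.

```python
import os
import unittest

# Variable name proposed in this thread. It is an assumption of this
# sketch that nothing else already defines or consumes it.
ALLOW_FLAKY = os.getenv('AVOCADO_ALLOW_FLAKY_TESTS')


class FlakyExample(unittest.TestCase):
    """Hypothetical stand-in for a known-flaky test class."""

    @unittest.skipUnless(ALLOW_FLAKY,
                         'Test is unstable, set AVOCADO_ALLOW_FLAKY_TESTS to run it')
    def test_known_flaky(self):
        # The body only runs when the environment variable is set.
        self.assertTrue(True)
```

With this the test is skipped everywhere by default, on GitLab and locally alike, and only runs when somebody explicitly sets the variable.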
Re: 'make check-acceptance' failing on s390 tests?
On 2/19/22 02:04, Peter Maydell wrote:
> Hi; is anybody else seeing 'make check-acceptance' fail
> on some of the s390 tests?
>
>  (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j... (900.20 s)
>
>  (090/183) tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora: FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)

FWIW, yes, I'm seeing those.

r~
'make check-acceptance' failing on s390 tests?
Hi; is anybody else seeing 'make check-acceptance' fail
on some of the s390 tests?

 (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j... (900.20 s)

 (090/183) tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora: FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)

I've cc'd Daniel because the 090 at least looks like a resolution
baked into the test case, and commit de72c4b7c that went in last
month changed the EDID reported resolution from 1024x768 to 1280x800.

Not sure about the timeout on the boot test: the avocado log
shows it booting at least as far as
"Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
and then there's no further output until the timeout.
Unfortunately the avocado log doesn't seem to include useful
information like "this is the string we were waiting to see", so
I'm not sure exactly what's gone wrong there.
(I continue to find the Avocado tests rather opaque: when you get
a series of green OK's that's fine, but when you get a failure it's
often non-obvious why it failed or how to do simple things like
"rerun just that one failed test" or "run the failing command,
interactively on the command line".)

The 090 failure didn't cause the merge to be rejected because
in commit 333168efe5c8 we disabled both these tests when running
on GitLab. Suggestion: we should either disable tests entirely
(except for manual "I want to run this known-flaky test") or not
at all, rather than disabling them only on GitLab. If I'm running
'make check-acceptance' locally I don't want to be distracted by
tests we know to be dodgy, any more than if I were running the CI
on GitLab.

thanks
-- PMM
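
On the "rerun just that one failed test" point: the result lines quoted above already carry a test reference of the form `file.py:Class.method`, so one option is to pull it out of the line mechanically. The following is a rough sketch. The regular expression is inferred only from the result lines quoted in this thread, not from any documented Avocado output format, and `failed_test_reference` is a hypothetical helper name.

```python
import re

# Matches the "(NNN/TTT) <test reference>: <STATUS>" prefix seen in the
# result lines quoted in this thread. The format is an assumption.
RESULT_RE = re.compile(r"^\s*\(\d+/\d+\)\s+(\S+):\s+([A-Z]+)\b")


def failed_test_reference(line):
    """Return the test reference if the line reports a non-PASS status, else None."""
    match = RESULT_RE.match(line)
    if match is None or match.group(2) == "PASS":
        return None
    return match.group(1)
```

The returned string should be usable as the argument to `avocado run`, in the same way as the `file.py:Class` reference used elsewhere in this thread, though that has not been verified here.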