Re: 'make check-acceptance' failing on s390 tests?

2022-03-11 Thread Thomas Huth

On 18/02/2022 16.04, Peter Maydell wrote:

Hi; is anybody else seeing 'make check-acceptance' fail on some of
the s390 tests?

  (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
(900.20 s)

[...]

Not sure about the timeout on the boot test: the avocado log
shows it booting at least as far as
"Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
and then there's no further output until the timeout.


Now that I've finally been able to run the test again (after
manually tweaking that borked is_port_free() function in
avocado), I've had a closer look at the failing BootLinuxS390X
test: looking at the guest output in the log, you can see that
it fails to initialize cloud-init and thus never "phones home"
at the end.

This used to work fine in older versions, so I just spent a
lot of time bisecting this issue and ended up here:

f83bcecb1ffe25a18367409eaf4ba1453c835c48 is the first bad commit
commit f83bcecb1ffe25a18367409eaf4ba1453c835c48
Author: Richard Henderson 
Date:   Tue Jul 27 07:48:55 2021 -1000

accel/tcg: Add cpu_{ld,st}*_mmu interfaces

Richard, could you please have a look at this one, too? ... it
causes the test to fail:

$ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48~1
$ ./configure --target-list=s390x-softmmu --disable-docs
$ make -j8
$ make check-venv
$ cd build
$ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
JOB ID : 0a6d287620d150d52c24417d0a672a1a826b3a82
JOB LOG: /home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
 (1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: PASS (130.38 s)
RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 136.51 s
$ grep cloud-ini /home/thuth/avocado/job-results/job-2022-03-11T18.30-0a6d287/job.log
...
2022-03-11 18:31:52,745 datadrainer  L0193 DEBUG| [  OK  ] Started Initial cloud-init…ob (metadata service crawler).

$ git checkout f83bcecb1ffe25a18367409eaf4ba1453c835c48
$ make -j8
$ ./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X
JOB ID : cb143be36631515f74cb6de2b263dfe1bc0f9709
JOB LOG: /home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
 (1/1) tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg', 'logdir': '/home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/test-res... (900.97 s)
RESULTS: PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
JOB TIME   : 907.16 s
$ grep cloud-ini /home/thuth/avocado/job-results/job-2022-03-11T18.34-cb143be/job.log
2022-03-11 18:35:15,106 datadrainer  L0193 DEBUG|  Starting Initial cloud-init job (pre-networking)...
2022-03-11 18:35:21,691 datadrainer  L0193 DEBUG| [FAILED] Failed to start Initial cloud-init job (pre-networking).
...

 Thomas




Re: 'make check-acceptance' failing on s390 tests?

2022-02-21 Thread Thomas Huth

On 18/02/2022 16.04, Peter Maydell wrote:

Hi; is anybody else seeing 'make check-acceptance' fail on some of
the s390 tests?

  (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
(900.20 s)


  (090/183) 
tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora:
FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)


I've cc'd Daniel because the 090 failure at least looks like a
resolution hard-coded into the test case, and commit de72c4b7c,
which went in last month, changed the EDID-reported resolution
from 1024x768 to 1280x800.


Yes, that seems to be right - since the default monitor resolution changed, 
the screenshot now has a different size, too. I sent a patch here:


https://lists.gnu.org/archive/html/qemu-devel/2022-02/msg04473.html


Not sure about the timeout on the boot test: the avocado log
shows it booting at least as far as
"Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
and then there's no further output until the timeout.
Unfortunately the avocado log doesn't seem to include useful
information like "this is the string we were waiting to see", so
I'm not sure exactly what's gone wrong there.

(I continue to find the Avocado tests rather opaque: when you
get a series of green OK's that's fine, but when you get a failure
it's often non-obvious why it failed or how to do simple things
like "rerun just that one failed test" or "run the failing command,
interactively on the command line".)


For me, it's even worse with tests/avocado/boot_linux.py - none of those 
tests works on my local laptop, so I had always been ignoring them until now. 
FWIW, I'm seeing this Python traceback in the log:


 Reproduced traceback from: /home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/core/test.py:770

 Traceback (most recent call last):
   File "/home/thuth/tmp/qemu-build/tests/avocado/boot_linux.py", line 30, in test_pc_i440fx_tcg
     self.launch_and_wait(set_up_ssh_connection=False)
   File "/home/thuth/tmp/qemu-build/tests/avocado/avocado_qemu/__init__.py", line 636, in launch_and_wait
     cloudinit.wait_for_phone_home(('0.0.0.0', self.phone_home_port), self.name)
   File "/home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/utils/cloudinit.py", line 192, in wait_for_phone_home
     s = PhoneHomeServer(address, instance_id)
   File "/home/thuth/tmp/qemu-build/tests/venv/lib64/python3.6/site-packages/avocado/utils/cloudinit.py", line 173, in __init__
     HTTPServer.__init__(self, address, PhoneHomeServerHandler)
   File "/usr/lib64/python3.6/socketserver.py", line 456, in __init__
     self.server_bind()
   File "/usr/lib64/python3.6/http/server.py", line 136, in server_bind
     socketserver.TCPServer.server_bind(self)
   File "/usr/lib64/python3.6/socketserver.py", line 470, in server_bind
     self.socket.bind(self.server_address)
 TypeError: an integer is required (got type NoneType)

... no clue how to debug these problems, though.
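[Editor's note: the TypeError in the traceback above comes from the port in the server address being None: self.phone_home_port apparently never got a value (presumably related to the broken is_port_free() mentioned elsewhere in this thread), so socket.bind() receives ('0.0.0.0', None). A minimal sketch reproducing the failure with plain http.server, outside avocado:]

```python
from http.server import HTTPServer, BaseHTTPRequestHandler

def bind_phone_home(port):
    """Try to bind a throwaway HTTP server the way PhoneHomeServer does."""
    try:
        server = HTTPServer(('0.0.0.0', port), BaseHTTPRequestHandler)
        server.server_close()
        return 'bound'
    except TypeError as exc:
        return f'TypeError: {exc}'

# A None port hits socket.bind() with a non-integer, reproducing a
# TypeError like the one in the traceback above:
print(bind_phone_home(None))
# Port 0 would instead let the kernel pick a free port:
print(bind_phone_home(0))
```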


The 090 failure didn't cause the merge to be rejected because
in commit 333168efe5c8 we disabled both these tests when
running on GitLab.

Suggestion: we should either disable tests entirely (except
for manual "I want to run this known-flaky test") or not at
all, rather than disabling them only on GitLab. If I'm running
'make check-acceptance' locally I don't want to be distracted
by tests we know to be dodgy, any more than if I were running
the CI on GitLab.


IIRC I only ever saw the occasional hang of the test on GitLab, never on my 
local host ... but I see your point ... I'm fine if we replace the 
@skipIf(os.getenv('GITLAB_CI')...) there with a 
@skipUnless(os.getenv('AVOCADO_ALLOW_FLAKY_TESTS')...) or something similar. 
Would you have some spare time to write such a patch?
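[Editor's note: the swap described above would look roughly like this. This sketch uses plain unittest decorators so it is self-contained; the real tests use avocado's equivalents, and AVOCADO_ALLOW_FLAKY_TESTS is the name floated in the mail, not an existing knob.]

```python
import os
import unittest

class BootLinuxS390X(unittest.TestCase):

    # Before: only skipped when running under GitLab CI:
    # @unittest.skipIf(os.getenv('GITLAB_CI'), 'Test sometimes hangs on GitLab')

    # After: skipped everywhere unless the user explicitly opts in:
    @unittest.skipUnless(os.getenv('AVOCADO_ALLOW_FLAKY_TESTS'),
                         'known-flaky; set AVOCADO_ALLOW_FLAKY_TESTS=1 to run')
    def test_s390_ccw_virtio_tcg(self):
        self.assertTrue(True)   # placeholder for the real boot test
```

With this shape the flaky tests show up as SKIP both locally and in CI unless AVOCADO_ALLOW_FLAKY_TESTS is set in the environment, which matches the "disable entirely, except for manual opt-in" suggestion.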


 Thomas




Re: 'make check-acceptance' failing on s390 tests?

2022-02-18 Thread Richard Henderson

On 2/19/22 02:04, Peter Maydell wrote:

Hi; is anybody else seeing 'make check-acceptance' fail on some of
the s390 tests?

  (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
(900.20 s)


  (090/183) 
tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora:
FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)


FWIW, yes, I'm seeing those.


r~



'make check-acceptance' failing on s390 tests?

2022-02-18 Thread Peter Maydell
Hi; is anybody else seeing 'make check-acceptance' fail on some of
the s390 tests?

 (009/183) tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'009-tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/clang/tests/results/j...
(900.20 s)


 (090/183) 
tests/avocado/machine_s390_ccw_virtio.py:S390CCWVirtioMachine.test_s390x_fedora:
FAIL: b'1280 800\n' != b'1024 768\n' (26.79 s)


I've cc'd Daniel because the 090 failure at least looks like a
resolution hard-coded into the test case, and commit de72c4b7c,
which went in last month, changed the EDID-reported resolution
from 1024x768 to 1280x800.

Not sure about the timeout on the boot test: the avocado log
shows it booting at least as far as
"Kernel 5.3.7-301.fc31.s390x on an s390x (ttysclp0)"
and then there's no further output until the timeout.
Unfortunately the avocado log doesn't seem to include useful
information like "this is the string we were waiting to see", so
I'm not sure exactly what's gone wrong there.

(I continue to find the Avocado tests rather opaque: when you
get a series of green OK's that's fine, but when you get a failure
it's often non-obvious why it failed or how to do simple things
like "rerun just that one failed test" or "run the failing command,
interactively on the command line".)
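[Editor's note: rerunning just one failed test is possible by passing the full test reference printed in the job log back to avocado, as the reproduction commands elsewhere in this thread show. Shown here as a dry run from the QEMU build directory:]

```shell
# Full test reference, exactly as the avocado job log prints it:
cmd='./tests/venv/bin/avocado run tests/acceptance/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg'
echo "$cmd"
```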

The 090 failure didn't cause the merge to be rejected because
in commit 333168efe5c8 we disabled both these tests when
running on GitLab.

Suggestion: we should either disable tests entirely (except
for manual "I want to run this known-flaky test") or not at
all, rather than disabling them only on GitLab. If I'm running
'make check-acceptance' locally I don't want to be distracted
by tests we know to be dodgy, any more than if I were running
the CI on GitLab.

thanks
-- PMM