On 21/02/2023 14.27, Peter Maydell wrote:
The migration-test is annoyingly flaky. Examples:
https://gitlab.com/qemu-project/qemu/-/jobs/3806090216
(a FreeBSD job)
32/648 ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status:
assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
on a local macos x86 box:
▶ 34/621
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed:
assertion failed: (!g_
str_equal(status, "failed")) ERROR
34/621 qemu:qtest+qtest-i386 / qtest-i386/migration-test
ERROR 168.12s killed by signal 6 SIGABRT
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-i386: Failed to peek at channel
query-migrate shows failed migration: Unable to write to socket: Broken pipe
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion
failed: (!g_str_equal(status, "failed"))
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
▶ 37/621 ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed:
assertion failed: (!g_str_equal(status, "failed")) ERROR
37/621 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
ERROR 174.37s killed by signal 6 SIGABRT
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
query-migrate shows failed migration: Unable to write to socket: Broken pipe
**
ERROR:../../tests/qtest/migration-helpers.c:151:migrate_query_not_failed: assertion
failed: (!g_str_equal(status, "failed"))
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
I've seen this on other CI jobs as well, but Gitlab's UI makes it
pretty much impossible to re-find failed jobs, since you can't
search for them by failure reason at all.
I've also seen this fail on the OpenBSD vm build.
I've seen the migration-test hang on the s390 private CI runner
in such a way that even though the CI job has timed out, the
stale QEMU and migration-test processes are still lying around on
the host.
I've complained about these before, but nobody has either investigated
or suggested improvements to the test program that would let us gather
more information about what's happening when these fail.
https://lore.kernel.org/qemu-devel/cafeaca8x_im3hn2-p9f+huxnxfxy+d6fze+leq4erldg7zk...@mail.gmail.com/
So this is the big hammer: disable the test entirely, so that we
don't keep getting CI job intermittent failures because of it.
When somebody has time to investigate, we can fix the underlying
cause and reenable the job.
Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
---
This is an "if you don't want this, propose something else" patch :-)
I'm also regularly running into issues with this test, so from my side:
Acked-by: Thomas Huth <th...@redhat.com>