Peter Maydell <peter.mayd...@linaro.org> wrote:
> On Wed, 3 May 2023 at 10:17, Juan Quintela <quint...@redhat.com> wrote:
>>
>> Peter Maydell <peter.mayd...@linaro.org> wrote:
>> > On Tue, 2 May 2023 at 11:39, Juan Quintela <quint...@redhat.com> wrote:
>> >> Richard, while we are here: one of the problems we are having is
>> >> that the test is exiting with an abort, so we have no clue what is
>> >> happening.  Is there a way to get a backtrace, or at least the number
>> >
>> > This has been consistently an issue with the migration tests.
>> > As the owner of the tests, if they are not providing you with
>> > the level of detail that you need to diagnose failures, I
>> > think that is something that is in your court to address:
>> > the CI system is always going to only be able to provide
>> > you with what your tests are outputting to the logs.
>>
>> Right now I would be happy just to see what test it is failing at.
>>
>> Either I am doing something wrong, or from the links in Richard's
>> email I am not able to reach anywhere I can see the full logs.
>>
>> > For the specific case of backtraces from assertion failures,
>> > I think Dan was looking at whether we could put something
>> > together for that. It won't help with segfaults and the like, though.
>>
>> I am waiting for that O:-)
>>
>> > You should be able to at least get the number of the subtest out of
>> > the logs (either directly in the logs of the job, or else
>> > from the more detailed log file that gets stored as a
>> > job artefact in most cases).
>>
>> Also note that the test is stopping in an abort, with no diagnostic
>> message that I can see.  But I don't see where the abort comes from:
>
> So, as an example I took the check-system-opensuse log:
> https://gitlab.com/qemu-project/qemu/-/jobs/4201998342
>
> Use your browser's "search in web page" to look for "SIGABRT":
> it'll show you the two errors (as well as the summary at
> the bottom of the page which just says the tests aborted).
> Here's one:
>
> 5/351 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR
> 246.12s killed by signal 6 SIGABRT
> QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img
> MALLOC_PERTURB_=48
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
> /builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> stderr:
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> Could not access KVM kernel module: No such file or directory
> **
> ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status:
> assertion failed: (g_test_timer_elapsed() <
> MIGRATION_STATUS_WAIT_TIMEOUT)
> (test program exited with status code -6)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> ▶ 6/351 
> ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status:
> assertion failed: (g_test_timer_elapsed() <
> MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
> 6/351 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test ERROR
> 221.18s killed by signal 6 SIGABRT
>
> Looks like it failed on a timeout in the test code.

Thanks.

> I think there ought to be artefacts from the job which have a
> copy of the full log, but I can't find them: not sure if this
> is just because the gitlab UI is terrible, or if they really
> didn't get generated.

So now we are between a rock and a hard place.

We have slowed down the bandwidth for the migration test because, on
unloaded machines, migration was too fast to need more than one pass.

And we slowed it down so much that now we hit the timer that was set at
120 seconds.

So .....

It is going to be interesting.

BTW, what processor speed do those aarch64 machines have?  Or are they
so loaded that they are effectively thrashing?

Two minutes for a pass looks like a bit too much.

I will give a try to getting this test finished by changing what we do
when we detect that we don't move to the completion stage.

Thanks for the explanation of where to find the data.  The other issue
is that what I really want is to know which test failed.  I can't see a
way to get that info.  According to Daniel's answer, we don't upload
those files for tests that fail.

Later, Juan.
