Daniel P. Berrangé <berra...@redhat.com> writes:
> On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote:
>> The avocado-system-alpine, avocado-system-fedora, and
>> avocado-system-ubuntu jobs are unreliable. I identified them while
>> looking over CI failures from the past week:
>> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614
>> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654
>> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571
>>
>> Thomas Huth suggested on IRC today that there may be a legitimate failure
>> in there:
>>
>> th_huth: f4bug, yes, seems like it does not start at all correctly on
>> alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody
>> noticed this by now, this is worrying
>>
>> It crept in because the jobs were already unreliable.
>>
>> I don't know how to interpret the job output, so all I can do is to
>> propose removing these jobs. A useful CI job has two outcomes: pass or
>> fail. Timeouts and other in-between states are not useful because they
>> require constant triaging by someone who understands the details of the
>> tests and they can occur when run against pull requests that have
>> nothing to do with the area covered by the test.
>>
>> Hopefully test owners will be able to identify the root causes and solve
>> them so that these jobs can stay. In their current state the jobs are
>> not useful since I cannot tell whether job failures are real or
>> just intermittent when merging qemu.git pull requests.
>>
>> If you are a test owner, please take a look.
>>
>> It is likely that other avocado-system-* CI jobs have similar failures
>> from time to time, but I'll leave them as long as they are passing.
>>
>> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884
>> Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
>> ---
>>  .gitlab-ci.d/buildtest.yml | 27 ---------------------------
>>  1 file changed, 27 deletions(-)
>>
>> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> index aee9101507..83ce448c4d 100644
>> --- a/.gitlab-ci.d/buildtest.yml
>> +++ b/.gitlab-ci.d/buildtest.yml
>> @@ -22,15 +22,6 @@ check-system-alpine:
>>      IMAGE: alpine
>>      MAKE_CHECK_ARGS: check-unit check-qtest
>>
>> -avocado-system-alpine:
>> -  extends: .avocado_test_job_template
>> -  needs:
>> -    - job: build-system-alpine
>> -      artifacts: true
>> -  variables:
>> -    IMAGE: alpine
>> -    MAKE_CHECK_ARGS: check-avocado
>
> Instead of entirely deleting, I'd suggest adding
>
>   # Disabled due to frequent random failures
>   # https://gitlab.com/qemu-project/qemu/-/issues/1884
>   when: manual
>
> See example: https://docs.gitlab.com/ee/ci/yaml/#when
>
> This disables the job from running unless someone explicitly
> tells it to run

What I don't understand is why we didn't gate the release back when
they first tripped. We should have noticed between:

  https://gitlab.com/qemu-project/qemu/-/pipelines/956543770

and

  https://gitlab.com/qemu-project/qemu/-/pipelines/957154381

that the system tests were regressing. Yet we merged the changes anyway.
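
For illustration only, Daniel's suggestion applied to the alpine job (instead of deleting it) might look roughly like the fragment below. It is a sketch that reuses the keys from the removed hunk unchanged; it has not been run through the GitLab CI linter. Per the GitLab CI YAML reference, a job with `when: manual` defaults to `allow_failure: true`, so it would not block the pipeline. One caveat: if `.avocado_test_job_template` (not shown in this patch) inherits a `rules:` block, a job-level `when:` may be rejected or ignored, and the manual condition would then have to be expressed inside `rules:` instead. That has not been checked here.

```yaml
avocado-system-alpine:
  extends: .avocado_test_job_template
  needs:
    - job: build-system-alpine
      artifacts: true
  variables:
    IMAGE: alpine
    MAKE_CHECK_ARGS: check-avocado
  # Disabled due to frequent random failures
  # https://gitlab.com/qemu-project/qemu/-/issues/1884
  # Sketch: job only runs when started by hand from the pipeline UI
  when: manual
```

The ubuntu and fedora jobs would take the same three lines appended to their existing definitions.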
>
>> -
>>  build-system-ubuntu:
>>    extends:
>>      - .native_build_job_template
>> @@ -53,15 +44,6 @@ check-system-ubuntu:
>>      IMAGE: ubuntu2204
>>      MAKE_CHECK_ARGS: check
>>
>> -avocado-system-ubuntu:
>> -  extends: .avocado_test_job_template
>> -  needs:
>> -    - job: build-system-ubuntu
>> -      artifacts: true
>> -  variables:
>> -    IMAGE: ubuntu2204
>> -    MAKE_CHECK_ARGS: check-avocado
>> -
>>  build-system-debian:
>>    extends:
>>      - .native_build_job_template
>> @@ -127,15 +109,6 @@ check-system-fedora:
>>      IMAGE: fedora
>>      MAKE_CHECK_ARGS: check
>>
>> -avocado-system-fedora:
>> -  extends: .avocado_test_job_template
>> -  needs:
>> -    - job: build-system-fedora
>> -      artifacts: true
>> -  variables:
>> -    IMAGE: fedora
>> -    MAKE_CHECK_ARGS: check-avocado
>> -
>>  crash-test-fedora:
>>    extends: .native_test_job_template
>>    needs:
>> --
>> 2.41.0
>>
>>
>
> With regards,
> Daniel

--
Alex Bennée
Virtualisation Tech Lead @ Linaro