On Fri, Feb 02, 2024 at 10:47:05AM -0300, Fabiano Rosas wrote:
> Peter Maydell <peter.mayd...@linaro.org> writes:
> 
> > On Mon, 29 Jan 2024 at 03:04, <pet...@redhat.com> wrote:
> >>
> >> From: Fabiano Rosas <faro...@suse.de>
> >>
> >> The migration tests have support for being passed two QEMU binaries to
> >> test migration compatibility.
> >>
> >> Add a CI job that builds the latest release of QEMU and another job
> >> that uses that version plus an already present build of the current
> >> version and runs the migration tests with the two, both as source and
> >> destination. I.e.:
> >>
> >>  old QEMU (n-1) -> current QEMU (development tree)
> >>  current QEMU (development tree) -> old QEMU (n-1)
> >>
> >> The purpose of this CI job is to ensure the code we're about to merge
> >> will not cause a migration compatibility problem when migrating the
> >> next release (which will contain that code) to/from the previous
> >> release.
> >>
> >> The version of migration-test used will be the one matching the older
> >> QEMU. That way we can avoid special-casing new tests that wouldn't be
> >> compatible with the older QEMU.
> >>
> >> Note: for user forks, the version tags need to be pushed to gitlab
> >> otherwise it won't be able to checkout a different version.
> >>
> >> Signed-off-by: Fabiano Rosas <faro...@suse.de>
> >> Link: https://lore.kernel.org/r/20240118164951.30350-3-faro...@suse.de
> >> Signed-off-by: Peter Xu <pet...@redhat.com>
> >> ---
> >>  .gitlab-ci.d/buildtest.yml | 60 ++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 60 insertions(+)
> >>
> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> >> index e1c7801598..f0b0edc634 100644
> >> --- a/.gitlab-ci.d/buildtest.yml
> >> +++ b/.gitlab-ci.d/buildtest.yml
> >> @@ -167,6 +167,66 @@ build-system-centos:
> >>        x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
> >>      MAKE_CHECK_ARGS: check-build
> >>
> >> +# Previous QEMU release. Used for cross-version migration tests.
> >> +build-previous-qemu:
> >> +  extends: .native_build_job_template
> >> +  artifacts:
> >> +    when: on_success
> >> +    expire_in: 2 days
> >> +    paths:
> >> +      - build-previous
> >> +    exclude:
> >> +      - build-previous/**/*.p
> >> +      - build-previous/**/*.a.p
> >> +      - build-previous/**/*.fa.p
> >> +      - build-previous/**/*.c.o
> >> +      - build-previous/**/*.c.o.d
> >> +      - build-previous/**/*.fa
> >> +  needs:
> >> +    job: amd64-opensuse-leap-container
> >> +  variables:
> >> +    IMAGE: opensuse-leap
> >> +    TARGETS: x86_64-softmmu aarch64-softmmu
> >> +  before_script:
> >> +    - export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
> >> +    - git checkout $QEMU_PREV_VERSION
> >> +  after_script:
> >> +    - mv build build-previous
> >
> > There seems to be a problem with this new CI job. Running a CI
> > run in my local repository it fails:
> >
> > https://gitlab.com/pm215/qemu/-/jobs/6075873685
> >
> > $ export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
> > $ git checkout $QEMU_PREV_VERSION
> > error: pathspec 'v8.2.0' did not match any file(s) known to git
> > Running after_script
> > Running after script...
> > $ mv build build-previous
> > mv: cannot stat 'build': No such file or directory
> > WARNING: after_script failed, but job will continue unaffected: exit code 1
> > Saving cache for failed job
> >
> >
> > I don't think you can assume that private forks doing submaintainer CI
> > runs necessarily have the full set of tags that the main repo does.
> 
> Yes, I thought this would be rare enough not to be an issue, but it
> seems it's not. I don't know what could be done here, if there's no tag,
> then there's no way to resolve the actual commit hash I think.
> 
> > I suspect the sed run will also do the wrong thing when run on the
> > commit that updates the version, because then it will replace
> > "9.0.0" with "9.0.0".
> 
> I just ignored this completely because my initial idea was to leave this
> job disabled and only run it for migration patchsets and pull requests,
> so it wouldn't make sense to run at that commit.
> 
> This job is also not entirely fail proof by design because we could
> always be hitting bugs in the older QEMU version that were already fixed
> in the new version.
> 
> I think the simplest fix here is to leave the test disabled, possibly
> with an env variable to enable it.

That would be unfortunate, though, because the goal of the "n-1" test is to
fail on the exact commit that breaks compatibility, and to have that
enforced, IMHO.

Having it fail only when someone from migration pushes to CI is indeed
better than nothing, but it is less ideal.  We want the developer / module
maintainer to notice the issue and fix it before merging, rather than
something wrong getting merged first and us then trying to find what broke
and asking for a fix (in which case there will still be a window where it's
broken and, if we're unlucky, one that spans major releases).

Currently the coverage of the n-1 test is indeed still focused mostly on the
migration framework, but it also covers quite a few default configs of the
system layout (even if only x86 is covered), and some default devices IIRC.
We could already attach a few more standard devices on the cmdline so that
more things get covered.

A pretty dumb solution (but one that might work?) is to keep a commit ID
rather than a tag, to avoid all kinds of tag hassles:

  PREVIOUS_VERSION_COMMIT_ID=1600b9f46b1bd08b00fe86c46ef6dbb48cbe10d6

Then we bump it after each release.  I think it'll also work for the
release commit then.
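
To make the two cases concrete, here is a rough shell sketch (not the
actual job; derive_prev_tag is just a name made up for illustration, and
the version strings are passed in directly instead of being read from
VERSION).  It shows what the sed expression from the patch yields on a
development tree and on a release commit, and then the pinned-ID variant,
which assumes the commit is reachable in the CI checkout:

```shell
#!/bin/sh
# Illustration only: derive_prev_tag is a hypothetical helper wrapping the
# same sed expression the patch applies to the VERSION file.
derive_prev_tag() {
    printf '%s\n' "$1" | sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/'
}

# Development tree (VERSION contains e.g. 8.2.50): yields the n-1 tag.
derive_prev_tag 8.2.50    # prints v8.2.0

# Release commit (VERSION contains 9.0.0): yields the tag of the release
# itself rather than the previous one.
derive_prev_tag 9.0.0     # prints v9.0.0

# Pinned variant: no VERSION parsing and no tag lookup.  The variable name
# and ID are the ones proposed in this mail; it needs a manual bump after
# each release.
PREVIOUS_VERSION_COMMIT_ID=1600b9f46b1bd08b00fe86c46ef6dbb48cbe10d6

# In the job this would stand in for "git checkout $QEMU_PREV_VERSION";
# only echoed here so the sketch runs outside a QEMU checkout.
echo "would run: git checkout $PREVIOUS_VERSION_COMMIT_ID"
```

The pinned ID trades the automatic tag derivation for one manual update
per release.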

Note that there can be a small window at the start where we run an n-2 -> n
test, but that's fine IMHO, as we should still allow that to work.
Fabiano's "auto choose latest shared machine type" would be useful here,
and I assume it should just work.

With that in place, we can try to figure out something smarter.  Would that
work for us?

-- 
Peter Xu

