Re: [PATCH v3 3/4] ci: Add a migration compatibility test job

2024-01-10 Thread Thomas Huth

On 09/01/2024 21.58, Fabiano Rosas wrote:

Cédric Le Goater  writes:


On 1/5/24 19:04, Fabiano Rosas wrote:

The migration tests have support for being passed two QEMU binaries to
test migration compatibility.

Add a CI job that builds the latest release of QEMU and another job
that uses that version plus an already present build of the current
version and runs the migration tests with the two, both as source and
destination. I.e.:

   old QEMU (n-1) -> current QEMU (development tree)
   current QEMU (development tree) -> old QEMU (n-1)

The purpose of this CI job is to ensure the code we're about to merge
will not cause a migration compatibility problem when migrating the
next release (which will contain that code) to/from the previous
release.

I'm leaving the jobs as manual for now because using an older QEMU in
tests could hit bugs that were already fixed in the current
development tree and we need to handle those case-by-case.

Note: for user forks, the version tags need to be pushed to gitlab
otherwise it won't be able to checkout a different version.

Signed-off-by: Fabiano Rosas 
---
   .gitlab-ci.d/buildtest.yml | 53 ++
   1 file changed, 53 insertions(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 91663946de..81163a3f6a 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -167,6 +167,59 @@ build-system-centos:
       x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
     MAKE_CHECK_ARGS: check-build
 
+build-previous-qemu:
+  extends: .native_build_job_template
+  artifacts:
+    when: on_success
+    expire_in: 2 days
+    paths:
+      - build-previous
+    exclude:
+      - build-previous/**/*.p
+      - build-previous/**/*.a.p
+      - build-previous/**/*.fa.p
+      - build-previous/**/*.c.o
+      - build-previous/**/*.c.o.d
+      - build-previous/**/*.fa
+  needs:
+    job: amd64-opensuse-leap-container
+  variables:
+    QEMU_JOB_OPTIONAL: 1
+    IMAGE: opensuse-leap
+    TARGETS: x86_64-softmmu aarch64-softmmu
+  before_script:
+    - export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
+    - git checkout $QEMU_PREV_VERSION
+  after_script:
+    - mv build build-previous
+
+.migration-compat-common:
+  extends: .common_test_job_template
+  needs:
+    - job: build-previous-qemu
+    - job: build-system-opensuse
+  allow_failure: true
+  variables:
+    QEMU_JOB_OPTIONAL: 1
+    IMAGE: opensuse-leap
+    MAKE_CHECK_ARGS: check-build
+  script:
+    - cd build
+    - QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
+      QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
+    - QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-${TARGET}
+      QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
+
+migration-compat-aarch64:
+  extends: .migration-compat-common
+  variables:
+    TARGET: aarch64
+
+migration-compat-x86_64:
+  extends: .migration-compat-common
+  variables:
+    TARGET: x86_64
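
For reference, the sed expression in before_script can be tried in
isolation. This is only a sketch: prev_version_tag is a hypothetical helper
name (not part of the patch), and the sample version string is an assumed
value of the VERSION file, not one taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: map a VERSION string to the tag of the release it
# is based on, using the same sed expression as the patch. Everything
# after the last dot is replaced by "0" and a "v" is prepended.
prev_version_tag() {
    printf '%s\n' "$1" | sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/'
}

# Assumed example: a development tree whose VERSION file contains 8.2.50
prev_version_tag 8.2.50    # prints v8.2.0
```

As the commit message notes, the resulting tag has to exist in the
repository under test, otherwise the git checkout step cannot succeed.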



What about the other archs, s390x and ppc? Do you lack the resources
or are there any problems to address?


Currently s390x and ppc are only tested on KVM. Which means they are not
tested at all unless someone runs migration-test on a custom runner. The
same is true for this test.

The TCG tests have been disabled:
 /*
  * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
  * is touchy due to race conditions on dirty bits (especially on PPC for
  * some reason)
  */

 /*
  * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
  * there until the problems are resolved
  */

It would be great if we could figure out what these issues are and fix
them so we can at least test with TCG like we do for aarch64.

Doing a TCG run of migration-test with both archs (one binary only, not
this series):

- ppc survived one run, taking 6 minutes longer than x86/aarch64.
- s390x survived one run, taking 40s less than x86/aarch64.

I'll leave them enabled on my machine and do some runs here and there,
see if I spot something. If not, we can consider re-enabling them once
we figure out why ppc takes so long.


I was curious and re-enabled the ppc64 and s390x migration tests with TCG on
my laptop here, running "make check-tcg -j$(nproc)" in a loop. s390x
unfortunately hung after the second iteration already, but ppc64 survived 25
runs (then I stopped it).


So we might want to try to re-enable ppc64 at least. But we might need to 
cut the run time for ppc64 with TCG a little bit, it is currently the 
longest test on my system (it takes 240s to finish, while all other tests 
finish within 150s).
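
A loop like the one described above could be sketched as follows.
repeat_until_failure is a hypothetical helper written for illustration, not
something that exists in the QEMU tree; it only stops on a non-zero exit
status and does not detect hangs by itself.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times and stop at the first
# failing iteration.
# $1 = number of iterations, remaining arguments = command to run
repeat_until_failure() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            echo "failed on iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "survived $count runs"
}

# Trivial demonstration with a command that always succeeds
repeat_until_failure 3 true
```

To also catch a hang such as the s390x one, the command can be wrapped in
timeout (assuming GNU coreutils is available; the 600s limit is an arbitrary
choice):

    repeat_until_failure 25 timeout 600 make check-tcg -j"$(nproc)"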


 Thomas




Re: [PATCH v3 3/4] ci: Add a migration compatibility test job

2024-01-09 Thread Peter Xu
On Tue, Jan 09, 2024 at 10:00:17AM -0300, Fabiano Rosas wrote:
> > Can we opt-out those broken tests using either your "since:" thing or
> > anything similar?
> 
> If it's something migration related, then yes. But there might be other
> types of breakages that have nothing to do with migration. Our tests are
> not resilient enough (nor should they be) to detect when QEMU aborted for
> other reasons. Think about the -audio issue: the old QEMU would just say
> "there's no -audio option, abort" and that's a test failure of course.

I'm wondering whether we can more or less remedy that by running
migration-test under the build-previous directory for cross-binary tests.
We don't necessarily need to cross-test anything new happening anyway.

IOW, we use both old QEMU / migration-test for "n-1", and we only use "n"
for the new QEMU binary?
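
A rough sketch of what that could look like, assuming the directory layout
from the patch (build for the current tree, build-previous for n-1) and
assuming QTEST_QEMU_BINARY_SRC/_DST override only the named side, as in the
patch. old_test_cmds is a hypothetical helper that merely prints the
commands; it does not run QEMU.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that would run the *old* tree's
# migration-test, with the new binary swapped in on one side at a time.
# $1 = target name, e.g. x86_64 or aarch64
old_test_cmds() {
    target=$1
    old=./qemu-system-$target           # n-1 binary, in build-previous
    new=../build/qemu-system-$target    # binary from the development tree
    echo "cd build-previous"
    # old (n-1) -> new: only the destination binary is overridden
    echo "QTEST_QEMU_BINARY=$old QTEST_QEMU_BINARY_DST=$new ./tests/qtest/migration-test"
    # new -> old (n-1): only the source binary is overridden
    echo "QTEST_QEMU_BINARY=$old QTEST_QEMU_BINARY_SRC=$new ./tests/qtest/migration-test"
}

old_test_cmds x86_64
```

With this arrangement the test binary never passes options that the old
QEMU does not know about, which is the class of failure the -audio example
describes.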

-- 
Peter Xu




Re: [PATCH v3 3/4] ci: Add a migration compatibility test job

2024-01-09 Thread Fabiano Rosas
Cédric Le Goater  writes:

> On 1/5/24 19:04, Fabiano Rosas wrote:
>> The migration tests have support for being passed two QEMU binaries to
>> test migration compatibility.
>> 
>> Add a CI job that builds the latest release of QEMU and another job
>> that uses that version plus an already present build of the current
>> version and runs the migration tests with the two, both as source and
>> destination. I.e.:
>> 
>>   old QEMU (n-1) -> current QEMU (development tree)
>>   current QEMU (development tree) -> old QEMU (n-1)
>> 
>> The purpose of this CI job is to ensure the code we're about to merge
>> will not cause a migration compatibility problem when migrating the
>> next release (which will contain that code) to/from the previous
>> release.
>> 
>> I'm leaving the jobs as manual for now because using an older QEMU in
>> tests could hit bugs that were already fixed in the current
>> development tree and we need to handle those case-by-case.
>> 
>> Note: for user forks, the version tags need to be pushed to gitlab
>> otherwise it won't be able to checkout a different version.
>> 
>> Signed-off-by: Fabiano Rosas 
>> ---
>>   .gitlab-ci.d/buildtest.yml | 53 ++
>>   1 file changed, 53 insertions(+)
>> 
>> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> index 91663946de..81163a3f6a 100644
>> --- a/.gitlab-ci.d/buildtest.yml
>> +++ b/.gitlab-ci.d/buildtest.yml
>> @@ -167,6 +167,59 @@ build-system-centos:
>>        x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
>>      MAKE_CHECK_ARGS: check-build
>>
>> +build-previous-qemu:
>> +  extends: .native_build_job_template
>> +  artifacts:
>> +    when: on_success
>> +    expire_in: 2 days
>> +    paths:
>> +      - build-previous
>> +    exclude:
>> +      - build-previous/**/*.p
>> +      - build-previous/**/*.a.p
>> +      - build-previous/**/*.fa.p
>> +      - build-previous/**/*.c.o
>> +      - build-previous/**/*.c.o.d
>> +      - build-previous/**/*.fa
>> +  needs:
>> +    job: amd64-opensuse-leap-container
>> +  variables:
>> +    QEMU_JOB_OPTIONAL: 1
>> +    IMAGE: opensuse-leap
>> +    TARGETS: x86_64-softmmu aarch64-softmmu
>> +  before_script:
>> +    - export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
>> +    - git checkout $QEMU_PREV_VERSION
>> +  after_script:
>> +    - mv build build-previous
>> +
>> +.migration-compat-common:
>> +  extends: .common_test_job_template
>> +  needs:
>> +    - job: build-previous-qemu
>> +    - job: build-system-opensuse
>> +  allow_failure: true
>> +  variables:
>> +    QEMU_JOB_OPTIONAL: 1
>> +    IMAGE: opensuse-leap
>> +    MAKE_CHECK_ARGS: check-build
>> +  script:
>> +    - cd build
>> +    - QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
>> +      QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
>> +    - QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-${TARGET}
>> +      QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
>> +
>> +migration-compat-aarch64:
>> +  extends: .migration-compat-common
>> +  variables:
>> +    TARGET: aarch64
>> +
>> +migration-compat-x86_64:
>> +  extends: .migration-compat-common
>> +  variables:
>> +    TARGET: x86_64
>
>
> What about the other archs, s390x and ppc? Do you lack the resources
> or are there any problems to address?

Currently s390x and ppc are only tested on KVM. Which means they are not
tested at all unless someone runs migration-test on a custom runner. The
same is true for this test.

The TCG tests have been disabled:
/*
 * On ppc64, the test only works with kvm-hv, but not with kvm-pr and TCG
 * is touchy due to race conditions on dirty bits (especially on PPC for
 * some reason)
 */

/*
 * Similar to ppc64, s390x seems to be touchy with TCG, so disable it
 * there until the problems are resolved
 */

It would be great if we could figure out what these issues are and fix
them so we can at least test with TCG like we do for aarch64.

Doing a TCG run of migration-test with both archs (one binary only, not
this series):

- ppc survived one run, taking 6 minutes longer than x86/aarch64.
- s390x survived one run, taking 40s less than x86/aarch64.

I'll leave them enabled on my machine and do some runs here and there,
see if I spot something. If not, we can consider re-enabling them once
we figure out why ppc takes so long.



Re: [PATCH v3 3/4] ci: Add a migration compatibility test job

2024-01-09 Thread Cédric Le Goater

On 1/5/24 19:04, Fabiano Rosas wrote:

The migration tests have support for being passed two QEMU binaries to
test migration compatibility.

Add a CI job that builds the latest release of QEMU and another job
that uses that version plus an already present build of the current
version and runs the migration tests with the two, both as source and
destination. I.e.:

  old QEMU (n-1) -> current QEMU (development tree)
  current QEMU (development tree) -> old QEMU (n-1)

The purpose of this CI job is to ensure the code we're about to merge
will not cause a migration compatibility problem when migrating the
next release (which will contain that code) to/from the previous
release.

I'm leaving the jobs as manual for now because using an older QEMU in
tests could hit bugs that were already fixed in the current
development tree and we need to handle those case-by-case.

Note: for user forks, the version tags need to be pushed to gitlab
otherwise it won't be able to checkout a different version.

Signed-off-by: Fabiano Rosas 
---
  .gitlab-ci.d/buildtest.yml | 53 ++
  1 file changed, 53 insertions(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 91663946de..81163a3f6a 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -167,6 +167,59 @@ build-system-centos:
       x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
     MAKE_CHECK_ARGS: check-build
 
+build-previous-qemu:
+  extends: .native_build_job_template
+  artifacts:
+    when: on_success
+    expire_in: 2 days
+    paths:
+      - build-previous
+    exclude:
+      - build-previous/**/*.p
+      - build-previous/**/*.a.p
+      - build-previous/**/*.fa.p
+      - build-previous/**/*.c.o
+      - build-previous/**/*.c.o.d
+      - build-previous/**/*.fa
+  needs:
+    job: amd64-opensuse-leap-container
+  variables:
+    QEMU_JOB_OPTIONAL: 1
+    IMAGE: opensuse-leap
+    TARGETS: x86_64-softmmu aarch64-softmmu
+  before_script:
+    - export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
+    - git checkout $QEMU_PREV_VERSION
+  after_script:
+    - mv build build-previous
+
+.migration-compat-common:
+  extends: .common_test_job_template
+  needs:
+    - job: build-previous-qemu
+    - job: build-system-opensuse
+  allow_failure: true
+  variables:
+    QEMU_JOB_OPTIONAL: 1
+    IMAGE: opensuse-leap
+    MAKE_CHECK_ARGS: check-build
+  script:
+    - cd build
+    - QTEST_QEMU_BINARY_SRC=../build-previous/qemu-system-${TARGET}
+      QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
+    - QTEST_QEMU_BINARY_DST=../build-previous/qemu-system-${TARGET}
+      QTEST_QEMU_BINARY=./qemu-system-${TARGET} ./tests/qtest/migration-test
+
+migration-compat-aarch64:
+  extends: .migration-compat-common
+  variables:
+    TARGET: aarch64
+
+migration-compat-x86_64:
+  extends: .migration-compat-common
+  variables:
+    TARGET: x86_64



What about the other archs, s390x and ppc? Do you lack the resources
or are there any problems to address?

Thanks,

C.




Re: [PATCH v3 3/4] ci: Add a migration compatibility test job

2024-01-09 Thread Fabiano Rosas
Peter Xu  writes:

> On Fri, Jan 05, 2024 at 03:04:48PM -0300, Fabiano Rosas wrote:
>> The migration tests have support for being passed two QEMU binaries to
>> test migration compatibility.
>> 
>> Add a CI job that builds the latest release of QEMU and another job
>> that uses that version plus an already present build of the current
>> version and runs the migration tests with the two, both as source and
>> destination. I.e.:
>> 
>>  old QEMU (n-1) -> current QEMU (development tree)
>>  current QEMU (development tree) -> old QEMU (n-1)
>> 
>> The purpose of this CI job is to ensure the code we're about to merge
>> will not cause a migration compatibility problem when migrating the
>> next release (which will contain that code) to/from the previous
>> release.
>> 
>> I'm leaving the jobs as manual for now because using an older QEMU in
>> tests could hit bugs that were already fixed in the current
>> development tree and we need to handle those case-by-case.
>
> Can we opt-out those broken tests using either your "since:" thing or
> anything similar?

If it's something migration related, then yes. But there might be other
types of breakages that have nothing to do with migration. Our tests are
not resilient enough (nor should they be) to detect when QEMU aborted for
other reasons. Think about the -audio issue: the old QEMU would just say
"there's no -audio option, abort" and that's a test failure of course.

> I hope we can start to run something by default in the CI in 9.0 to cover
> n-1 -> n, even if starting with a subset of tests.  Is it possible?

We could maybe have it enabled with "allow_failure" set. The important
thing here is that we don't want to get reports of "flaky test". These
tests are kind of flaky by definition, there's no way to backport a fix
to the older QEMU, so there's always the chance that this test will be
broken for a whole release cycle. We should act fast in adding the
"since" annotation or other workaround, but that depends on our
availability and the type of bug that we hit.



Re: [PATCH v3 3/4] ci: Add a migration compatibility test job

2024-01-08 Thread Peter Xu
On Fri, Jan 05, 2024 at 03:04:48PM -0300, Fabiano Rosas wrote:
> The migration tests have support for being passed two QEMU binaries to
> test migration compatibility.
> 
> Add a CI job that builds the latest release of QEMU and another job
> that uses that version plus an already present build of the current
> version and runs the migration tests with the two, both as source and
> destination. I.e.:
> 
>  old QEMU (n-1) -> current QEMU (development tree)
>  current QEMU (development tree) -> old QEMU (n-1)
> 
> The purpose of this CI job is to ensure the code we're about to merge
> will not cause a migration compatibility problem when migrating the
> next release (which will contain that code) to/from the previous
> release.
> 
> I'm leaving the jobs as manual for now because using an older QEMU in
> tests could hit bugs that were already fixed in the current
> development tree and we need to handle those case-by-case.

Can we opt-out those broken tests using either your "since:" thing or
anything similar?

I hope we can start to run something by default in the CI in 9.0 to cover
n-1 -> n, even if starting with a subset of tests.  Is it possible?

Thanks,

-- 
Peter Xu