Daniel P. Berrangé <berra...@redhat.com> wrote:
> On Thu, Jun 01, 2023 at 09:27:09AM +0100, Daniel P. Berrangé wrote:
>> On Wed, May 31, 2023 at 11:03:23PM +0200, Juan Quintela wrote:
>> > Richard Henderson <richard.hender...@linaro.org> wrote:
>> > > On 5/30/23 11:25, Juan Quintela wrote:
>> > >> The following changes since commit
>> > >> aa9bbd865502ed517624ab6fe7d4b5d89ca95e43:
>> > >> Merge tag 'pull-ppc-20230528' of https://gitlab.com/danielhb/qemu
>> > >> into staging (2023-05-29 14:31:52 -0700)
>> > >> are available in the Git repository at:
>> > >> https://gitlab.com/juan.quintela/qemu.git
>> > >> tags/migration-20230530-pull-request
>> > >> for you to fetch changes up to
>> > >> c63c544005e6b1375a9c038f0e0fb8dfb8b249f4:
>> > >> migration/rdma: Check sooner if we are in postcopy for
>> > >> save_page() (2023-05-30 19:23:50 +0200)
>> > >> ----------------------------------------------------------------
>> >
>> > Added Markus and Daniel.
>> >
>> > >> Migration 20230530 Pull request (take 2)
>> > >> Hi
>> > >> Resend last PULL request, this time it compiles when CONFIG_RDMA is
>> > >> not configured in.
>> > >> [take 1]
>> > >> On this PULL request:
>> > >> - Set vmstate migration failure right (vladimir)
>> > >> - Migration QEMUFileHook removal (juan)
>> > >> - Migration Atomic counters (juan)
>> > >> Please apply.
>> > >> ----------------------------------------------------------------
>> > >> Juan Quintela (16):
>> > >> migration: Don't abuse qemu_file transferred for RDMA
>> > >> migration/RDMA: It is accounting for zero/normal pages in two places
>> > >> migration/rdma: Remove QEMUFile parameter when not used
>> > >> migration/rdma: Don't use imaginary transfers
>> > >> migration: Remove unused qemu_file_credit_transfer()
>> > >> migration/rdma: Simplify the function that saves a page
>> > >> migration: Create migrate_rdma()
>> > >> migration/rdma: Unfold ram_control_before_iterate()
>> > >> migration/rdma: Unfold ram_control_after_iterate()
>> > >> migration/rdma: Remove all uses of RAM_CONTROL_HOOK
>> > >> migration/rdma: Unfold hook_ram_load()
>> > >> migration/rdma: Create rdma_control_save_page()
>> > >> qemu-file: Remove QEMUFileHooks
>> > >> migration/rdma: Move rdma constants from qemu-file.h to rdma.h
>> > >> migration/rdma: Remove qemu_ prefix from exported functions
>> > >> migration/rdma: Check sooner if we are in postcopy for save_page()
>> > >> Vladimir Sementsov-Ogievskiy (5):
>> > >> runstate: add runstate_get()
>> > >> migration: never fail in global_state_store()
>> > >> runstate: drop unused runstate_store()
>> > >> migration: switch from .vm_was_running to .vm_old_state
>> > >> migration: restore vmstate on migration failure
>> > >
>> > > Appears to introduce multiple avocado failures:
>> > >
>> > > https://gitlab.com/qemu-project/qemu/-/jobs/4378066518#L286
>> > >
>> > > Test summary:
>> > > tests/avocado/migration.py:X86_64.test_migration_with_exec: ERROR
>> > > tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost:
>> > > ERROR
>> > > tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
>> > > make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142:
>> > > check-avocado] Error 1
>> > >
>> > > https://gitlab.com/qemu-project/qemu/-/jobs/4378066523#L387
>> > >
>> > > Test summary:
>> > > tests/avocado/migration.py:X86_64.test_migration_with_tcp_localhost:
>> > > ERROR
>> > > tests/avocado/migration.py:X86_64.test_migration_with_unix: ERROR
>> > > make: *** [/builds/qemu-project/qemu/tests/Makefile.include:142:
>> > > check-avocado] Error 1
>> > >
>> > > Also fails QTEST_QEMU_BINARY=./qemu-system-aarch64
>> > > ./tests/qtest/migration-test
>> > >
>> > > ../src/migration/rdma.c:408:QIO_CHANNEL_RDMA: Object 0xaaaaf7bba680 is
>> > > not an instance of type qio-channel-rdma
>> >
>> > I am looking at the other errors, but this one is weird. It is failing
>> > here:
>> >
>> > #define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
>> > OBJECT_DECLARE_SIMPLE_TYPE(QIOChannelRDMA, QIO_CHANNEL_RDMA)
>> >
>> > In the OBJECT line.
>> >
>> > I have no clue what problem are we having here with the object system to
>> > decide at declaration time that a variable is not of the type that we
>> > are declaring.
>> >
>> > I am missing something obvious here?
>>
>> I expect somewhere in the code has either corrupted memory, or is
>> using free'd memory. Either way you'll need to get a stack trace
>> to debug this kind of thing
>
> I've replied to the patches pointing out 4 places where the code
> casts to QIOChannelRDMA, without first checking that this is an
> RDMA migration, which look likely to be the cause of this.
Good catch.

I can only say: Ouch.

And why didn't it fail for me? It passes for me:
- make check (compiling every target/device/... that can be compiled on
  Fedora 38)
- migration-test, which I ran hundreds of times during development and
  which never failed like that

I am switching to aarch64 TCG as my main test target, because it appears
to find many more bugs in migration-test.

Thanks again.

Later, Juan.