Hi Rudi,

On Sun, 5 Apr 2020, Rudolph Bott wrote:
> I entirely forgot about the QEMU log files. Yes, we can see an error message from the crashing QEMU process:
>
> qemu-system-x86_64: /build/qemu-oknQD6/qemu-4.2/accel/kvm/kvm-all.c:653: kvm_log_clear_one_slot: Assertion `mem->dirty_bmap' failed.
Nice find. Well done!
> It seems that it's related to starting a migration while the QEMU process (or rather: the VM inside) is still in the boot phase. The fix for this is already upstream:
>
> https://github.com/qemu/qemu/commit/9b3a31c745b61758aaa5466a3a9fc0526d409188
>
> However, it seems it will only be included in the next QEMU 5 release. I think we should open a Debian bug for this (however, after quickly reading through the guide I am not 100% sure I understood how to open a bug for QEMU in bullseye). Maybe the fix can be backported.
I'm also unsure about bullseye. Once it reaches its package freeze, QEMU 5 might be included. I don't know whether anyone is willing to backport patches there. However, Ubuntu/focal is shipping with QEMU 4.2. Maybe it's worth opening an issue there?
> Any ideas how to work around this issue in the QA suite for now? Swap failover / migration tests?
Does swapping the tests help? ATM I see no other options besides inserting sleeps/delays.
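If we go the delay route, polling for guest readiness might be less fragile than a fixed sleep. Here is a rough sketch, not tried against the QA suite. The helper names `wait_until` and `guest_ssh_reachable` are made up for illustration and are not existing Ganeti QA functions. Using SSH reachability as a sign that the guest has left early boot is an assumption, and I have not verified that it reliably avoids the assertion.

```python
import socket
import time


def wait_until(predicate, timeout=120.0, interval=2.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True if the predicate succeeded, False on timeout.
    `clock` and `sleep` are injectable so the helper can be unit tested.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def guest_ssh_reachable(host, port=22, connect_timeout=3.0):
    """Hypothetical readiness probe: True once a TCP connect to the guest works.

    Assumption: an answering SSH port means the guest is past early boot.
    """
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False
```

The migration test would then only start after something like `wait_until(lambda: guest_ssh_reachable(instance_name))` returns True, where `instance_name` stands for whatever hostname the test instance uses.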
> But from what I understand, this can also be an issue when someone triggers a live migration through Ganeti while the VM is in a rebooting state internally. That again is something which might hit people in production.
I understand it in the same way, so this bug should be fixed.

Thanks, Sascha.
