Hi Sascha,

On Sat, Apr 4, 2020 at 5:52 PM Sascha Lucas <[email protected]> wrote:

> Apparently the qemu process crashed. I wonder if there is something in
> the logs[1] (/var/log/ganeti/kvm/<instance name>.log)?
>

I entirely forgot about the QEMU log files. Yes, we can see an error
message from the crashing QEMU process:

qemu-system-x86_64: /build/qemu-oknQD6/qemu-4.2/accel/kvm/kvm-all.c:653:
kvm_log_clear_one_slot: Assertion `mem->dirty_bmap' failed.

Looking into that, I stumbled over these bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1771032
https://bugzilla.redhat.com/show_bug.cgi?id=1772774

It seems that it's related to starting a migration while the QEMU process
(or rather: the VM inside) is still in the boot phase. The fix for this is
already upstream:

https://github.com/qemu/qemu/commit/9b3a31c745b61758aaa5466a3a9fc0526d409188

However, it seems the fix will only be included in the upcoming QEMU 5
release. I think we should open a Debian bug for this (however, after
quickly reading through the guide I am not 100% sure I understood how to
open a bug for QEMU in bullseye). Maybe the fix can be backported.

Any ideas how to work around this issue in the QA suite for now? Swap
failover / migration tests? Ganeti users issuing failover/reboot/start +
migrate in rapid succession is probably not very likely on production
systems. But from what I understand, this can also be an issue when someone
triggers a live migration through Ganeti while the VM is in a rebooting
state internally. That, again, is something which might hit people in
production.


>
> > Simply adding a 'sleep 2' between the two ganeti commands fixes the
> issue.
>
> Sounds why DRBD does not trigger the bug. The disk transition from
> disconnect, primary/primary reconnect includes this extra seconds.
>

From what we know now, this slight difference in timing should be enough to
get the QEMU process/VM out of its (re)booting state.
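
For the QA suite, the 'sleep 2' workaround could be sketched roughly as
below. This is only an illustration, not actual Ganeti QA code: the helper
names run_with_settle_delay and failover_then_migrate are made up, the
2-second default simply mirrors the 'sleep 2' mentioned above, and the
gnt-instance command lines (including the -f flag) are assumptions about
what the test would invoke.

```python
import subprocess
import time


def run_with_settle_delay(commands, delay_seconds=2.0,
                          run=subprocess.check_call, sleep=time.sleep):
    """Run commands in order, pausing before every command except the first.

    The pause is meant to give the freshly started QEMU process time to get
    out of its early boot phase before the next operation is issued.
    """
    for index, command in enumerate(commands):
        if index > 0:
            sleep(delay_seconds)
        run(command)


def failover_then_migrate(instance_name, **kwargs):
    """Hypothetical QA step: failover, wait, then live-migrate the instance."""
    # Assumed command lines; adjust to whatever the real test runs.
    commands = [
        ["gnt-instance", "failover", "-f", instance_name],
        ["gnt-instance", "migrate", "-f", instance_name],
    ]
    run_with_settle_delay(commands, **kwargs)
```

The run and sleep callables are parameters only so that the ordering can be
checked without a cluster. A fixed delay is a stopgap; it does not prove the
guest has actually finished booting.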

Cheers, Rudi

-- 
 Rudolph Bott - [email protected]
 Telefon: +49 (0)211-63 55 56-41
 Telefax: +49 (0)211-63 55 55-22

 sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf
 HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois
 Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391

 www.sipgate.de - www.sipgate.co.uk
