Hello,

I sent this mail to the qemu-discuss mailing list, but today I am unsure whether I chose the right list. So I am copying it here in the hope that someone can respond :-)
I have reproducible problems with some code in qemu-coroutine.c:

    void qemu_coroutine_enter(Coroutine *co, void *opaque)
    {
        Coroutine *self = qemu_coroutine_self();
        CoroutineAction ret;

        trace_qemu_coroutine_enter(self, co, opaque);

        if (co->caller) {
            fprintf(stderr, "Co-routine re-entered recursively\n");
            abort();   <--------- This one triggers in 4 or 5 out of ten tests that use the blockcommit feature
        }

Unfortunately, a "normal" system administrator like me does not understand the error message. I have no idea what causes it or how to prevent it. Or whether this is just a bug ;-)

Original mail to qemu-discuss:
-------------------------------------------------------------------------

I have now spent five full days debugging a major problem with backing up VMs. I run an HP ProLiant server (SE316M1-R2, aka DL160G6) with two Xeon L5520 CPUs and 48GB of triple-channel RAM. On this server I do monitoring and Qemu/libvirt. I run 7 guests on this server, which itself runs Gentoo Linux (hardened; Grsecurity-patched kernel, PaX, no RBAC). All guests use raw images as disks (I also tested QED and QCOW2). The systems are all Gentoo and Ubuntu, and all have qemu-guest-agent running.

app-emulation/libvirt-1.2.18-r1::gentoo was built with the following:
USE="caps fuse iscsi libvirtd lvm lxc macvtap nfs nls parted pcap qemu sasl systemd udev vepa -apparmor -audit -avahi -firewalld -glusterfs -numa -openvz -phyp -policykit -rbd (-selinux) -uml -virt-network -virtualbox (-wireshark-plugins) -xen"

app-emulation/qemu-2.4.0::gentoo was built with the following:
USE="aio caps curl fdt filecaps jpeg ncurses nls pin-upstream-blobs png python sasl seccomp spice ssh threads tls uuid vhost-net vnc xattr -accessibility -alsa -bluetooth -debug -glusterfs -gtk -gtk2 -infiniband -iscsi -lzo -nfs -numa -opengl -pulseaudio -rbd -sdl -sdl2 (-selinux) -smartcard -snappy -static -static-softmmu -static-user -systemtap -tci -test -usb -usbredir -vde -virtfs -vte -xen -xfs"
PYTHON_TARGETS="python2_7"
QEMU_SOFTMMU_TARGETS="i386 x86_64 -aarch64 (-alpha) (-arm) -cris -lm32 (-m68k) -microblaze -microblazeel (-mips) -mips64 -mips64el -mipsel -moxie -or32 (-ppc) (-ppc64) -ppcemb -s390x -sh4 -sh4eb (-sparc) -sparc64 -unicore32 -xtensa -xtensaeb"
QEMU_USER_TARGETS="i386 x86_64 -aarch64 (-alpha) (-arm) -armeb -cris (-m68k) -microblaze -microblazeel (-mips) -mips64 -mips64el -mipsel -mipsn32 -mipsn32el -or32 (-ppc) (-ppc64) -ppc64abi32 -s390x -sh4 -sh4eb (-sparc) -sparc32plus -sparc64 -unicore32"

I wrote a bash script that is supposed to back up all guests. It works like this:

1. Create external snapshot
2. Copy/rsync away the image
3. blockcommit snapshot
4. blockjob pivot
5. Copy/rsync away the XML description for the guest
6. Remove snapshot file

I did some tests running the script from a cron job. Copying the image file takes roughly 15 minutes, so I ran the script in a 30-minute cycle. 4 or 5 cycles work perfectly. Steps (1) and (2) work, but when it comes to blockcommit, the guest may (randomly) be aborted, and the command then fails to continue because the guest is no longer running. After starting the guest again, I found two situations:

1. I can directly call blockjob … --pivot, because the last blockcommit that failed had reached 100%, or
2. I run a blockjob abort action, then re-sync and pivot on the command line, and that might work.

Anyway, blockcommit is not stable here. I tested this on qemu-2.3.0 and 2.4.0.

In the logs I only get this:

…
2015-08-24 18:38:13.077+0000: starting up libvirt version: 1.2.18, qemu version: 2.4.0
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name mx.roessner-net.de-TESTING -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu qemu64,+kvm_pv_eoi -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid d86b82d5-153f-4dd9-aa66-d98c2e65db8c -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mx.roessner-net.de-TESTING.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot order=cd,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -drive file=/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:27:ac:8d,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/mx.roessner-net.de-TESTING.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 127.0.0.1:7 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x7 -watchdog-action reset -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
char device redirected to /dev/pts/8 (label charserial0)
Formatting '/var/backups/snapshots/backup-snapshot-mx.roessner-net.de-TESTING.qcow2', fmt=qcow2 size=107374182400 backing_file='/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/var/backups/snapshots/backup-snapshot-mx.roessner-net.de-TESTING.qcow2', fmt=qcow2 size=107374182400 backing_file='/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
Co-routine re-entered recursively
2015-08-24 19:43:17.700+0000: shutting down

I tried to find out what the error "Co-routine re-entered recursively" means, but I have no idea. I only know that it is in qemu-coroutine.c, line 111. But what causes this error? What am I missing?

I tried different Linux kernels: pure vanilla sources with NUMA balancing on and off, and several Grsecurity kernels. The kernel makes no difference. The Qemu version makes no difference. If I clean memory, I have roughly 36GB of free memory. Storage is also OK, because it is a BBU-backed P410i RAID controller with RAID1+0 on 15k SAS disks. Even though this server is 6 years old, it has enough power. So I don't think it is a resource or hardware problem. Everything else on the server runs perfectly without any issues.
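
For readers who want to reproduce the setup, the six backup steps listed earlier can be sketched roughly as below. This is only a minimal sketch and not the actual script from this mail: the disk target `vda`, the destination `/mnt/backup`, the snapshot name `backup-snapshot`, the use of `--quiesce`, copying the XML from `/etc/libvirt/qemu/`, and the helper names `run` and `backup_guest` are all assumptions made for illustration. The image and snapshot paths follow the pattern visible in the log. By default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the six backup steps. Assumed/hypothetical: DISK, BACKUP_DIR,
# the snapshot name, --quiesce, and the helper functions run()/backup_guest().
DRY_RUN="${DRY_RUN:-1}"                            # 1 = only print commands, 0 = execute
DISK="${DISK:-vda}"                                # assumed disk target of the guest
IMAGE_DIR="${IMAGE_DIR:-/var/lib/libvirt/images}"  # path pattern taken from the log
SNAP_DIR="${SNAP_DIR:-/var/backups/snapshots}"     # path pattern taken from the log
BACKUP_DIR="${BACKUP_DIR:-/mnt/backup}"            # hypothetical backup destination

# Print a command; run it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

backup_guest() {
    local dom="$1"
    local image="$IMAGE_DIR/$dom.img"
    local snap="$SNAP_DIR/backup-snapshot-$dom.qcow2"

    # 1. External disk-only snapshot: new guest writes go to $snap.
    run virsh snapshot-create-as "$dom" backup-snapshot \
        --disk-only --atomic --quiesce --no-metadata \
        --diskspec "$DISK,file=$snap" || return 1
    # 2. Copy the base image, which no longer receives writes.
    run rsync -a --sparse "$image" "$BACKUP_DIR/" || return 1
    # 3. Merge the snapshot back into the base image and wait until it is in sync.
    run virsh blockcommit "$dom" "$DISK" --active --wait --verbose || return 1
    # 4. Pivot the guest back onto the base image.
    run virsh blockjob "$dom" "$DISK" --pivot || return 1
    # 5. Copy the XML description (assumes a persistent libvirt domain).
    run rsync -a "/etc/libvirt/qemu/$dom.xml" "$BACKUP_DIR/" || return 1
    # 6. Remove the snapshot file, which is unused after the pivot.
    run rm -f "$snap"
}

# Run only when a guest name is given on the command line.
[ "$#" -eq 0 ] || backup_guest "$1"
```

Saved under a hypothetical name such as backup-sketch.sh, `./backup-sketch.sh <guest>` prints the six commands, and `DRY_RUN=0 ./backup-sketch.sh <guest>` would execute them. Each step returns early on failure, so a failed blockcommit never reaches the pivot or the removal of the snapshot file.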

So if you have any idea what could cause these aborts, please let me know :-)

The only thing I found on the web is someone saying that this co-routine code is ugly and probably not thread-safe. I no longer know where I found this message. But could this be a threading problem?

Many, many thanks in advance

Christian