(gdb) handle SIGUSR1 nostop noprint pass (gdb) b migration/qemu-file.c:295 (gdb) command p f->pos c end
That showed the pos is ever increasing and fails at an offset it never read before. Yet the absolute number was different. $1 = 0 $2 = 8948 $3 = 41423 [...] $11359 = 326387440 $11360 = 326420208 => This was the one failing this time This was a different f->pos than last time, so I wondered if this would change every time. With a less interactive gdb config I got in three tries: 1. 313153311 2. 313313376 3. 313571856 So a different f->pos to fail each time. Different but rather close. I wondered if the reasons I got a higher one when tracing in more detail printing all offsets could be that there still is something copied/synced and only slowly gets available. I stepped through rather slowly and got to 322429260 this time. So slower continuing on the iteration over qemu_fill_buffer makes it fail "later"? Finally it is surely interesting which channel that actually is- likely the migration socket? And yes, ioc->name in qio_channel_read is: $8 = 0x56ab78e5c0 "migration-socket-incoming -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1711602 Title: --copy-storage-all failing with qemu 2.10 Status in QEMU: New Status in libvirt package in Ubuntu: Confirmed Status in qemu package in Ubuntu: Confirmed Bug description: We fixed an issue around disk locking already in regard to qemu-nbd [1], but there still seem to be issues. $ virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://10.22.69.196/system error: internal error: qemu unexpectedly closed the monitor: 2017-08-18T12:10:29.800397Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/0 (label charserial0) 2017-08-18T12:10:48.545776Z qemu-system-x86_64: load of migration failed: Input/output error Source libvirt log for the guest: 2017-08-18 12:09:08.251+0000: initiating migration 2017-08-18T12:09:08.809023Z qemu-system-x86_64: Unable to read from socket: Connection reset by peer 2017-08-18T12:09:08.809481Z qemu-system-x86_64: Unable to read from socket: Connection reset by peer Target libvirt log for the guest: 2017-08-18T12:09:08.730911Z qemu-system-x86_64: load of migration failed: Input/output error 2017-08-18 12:09:09.010+0000: shutting down, reason=crashed Given the timing it seems that the actual copy now works (it is busy ~10 seconds on my environment which would be the copy). Also we don't see the old errors we saw before, but afterwards on the actual take-over it fails. Dmesg has no related denials as often apparmor is in the mix. Need to check libvirt logs of source [2] and target [3] in Detail. [1]: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02200.html [2]: http://paste.ubuntu.com/25339356/ [3]: http://paste.ubuntu.com/25339358/ To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1711602/+subscriptions