It can happen that the QMP connection gets lost while mirroring a disk. In that case the current block job gets cancelled, but the real cause of the failure is lost, because we die() at a later step with the generic message "$job: mirroring has been cancelled\n".
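To illustrate the idea, here is a minimal standalone sketch of the eval/die pattern, not the actual PVE code. fake_mon_cmd() and query_jobs_or_die() are hypothetical names standing in for vm_mon_cmd() and the monitor loop, and the chomp is an addition of this sketch to avoid a doubled newline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for vm_mon_cmd(): always fails the way a
# dead QMP socket would (assumption, for illustration only).
sub fake_mon_cmd {
    my ($vmid, $cmd) = @_;
    die "VM $vmid not running\n";
}

# Run the query inside eval so that a failure is re-raised
# together with its underlying cause.
sub query_jobs_or_die {
    my ($vmid) = @_;
    my $stats = eval { fake_mon_cmd($vmid, "query-block-jobs") };
    if (my $err = $@) {
        chomp $err;    # drop the trailing newline of the inner error
        die "lost connection to qemu machine protocol socket: $err\n";
    }
    return $stats;
}

eval { query_jobs_or_die(600) };
print "ERROR: $@";
# prints "ERROR: lost connection to qemu machine protocol socket: VM 600 not running"
```

With this pattern the caller sees why the query failed instead of only learning later that the job was cancelled.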

example:

...
drive-scsi0: transferred: 5524946944 bytes remaining: 918355968 bytes total: 6443302912 bytes progression: 85.75 % busy: 1 ready: 0
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2017-07-26 15:39:56 ERROR: online migrate failure - mirroring error: drive-scsi0: mirroring has been cancelled
2017-07-26 15:39:56 aborting phase 2 - cleanup resources
2017-07-26 15:39:56 migrate_cancel
...

after patch applied:

2017-07-27 09:43:37 ERROR: online migrate failure - mirroring error: lost connection to qemu machine protocol: VM 600 not running
2017-07-27 09:43:37 aborting phase 2 - cleanup resources
---
 PVE/QemuServer.pm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1f34101..3424e3a 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6033,7 +6033,11 @@ sub qemu_drive_mirror_monitor {
 	while (1) {
 	    die "storage migration timed out\n" if $err_complete > 300;
 
-	    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
+	    my $stats;
+	    eval {
+		$stats = vm_mon_cmd($vmid, "query-block-jobs");
+	    };
+	    die "lost connection to qemu machine protocol socket: $@\n" if $@;
 
 	    my $running_mirror_jobs = {};
 	    foreach my $stat (@$stats) {
-- 
2.11.0

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel