The QMP connection can get lost while mirroring a disk.
In that case the current block job gets cancelled, but the real cause of the
failure is lost, because we die() at a later step with the generic message
"$job: mirroring has been cancelled\n"

example:
...
drive-scsi0: transferred: 5524946944 bytes remaining: 918355968 bytes total: 6443302912 bytes progression: 85.75 % busy: 1 ready: 0
drive-scsi0: Cancelling block job
drive-scsi0: Done.
2017-07-26 15:39:56 ERROR: online migrate failure - mirroring error: drive-scsi0: mirroring has been cancelled
2017-07-26 15:39:56 aborting phase 2 - cleanup resources
2017-07-26 15:39:56 migrate_cancel
...

after patch applied:
2017-07-27 09:43:37 ERROR: online migrate failure - mirroring error: lost connection to qemu machine protocol: VM 600 not running
2017-07-27 09:43:37 aborting phase 2 - cleanup resources
---
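For reviewers, a minimal standalone sketch of the pattern used in the hunk
below: wrap the monitor call in eval and rethrow with added context, so the
original cause survives. fake_mon_cmd() and query_jobs() are hypothetical
stand-ins for illustration only, not the real vm_mon_cmd() or any PVE
function; the VM id 600 is taken from the log above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for vm_mon_cmd(): always fails the way a lost
# QMP socket might. Not the real PVE implementation.
sub fake_mon_cmd {
    my ($vmid, $cmd) = @_;
    die "VM $vmid not running\n";
}

# Hypothetical wrapper showing the eval/rethrow pattern.
sub query_jobs {
    my ($vmid) = @_;

    my $stats = eval { fake_mon_cmd($vmid, "query-block-jobs") };
    # The stub's message already ends in a newline, so none is appended here.
    die "lost connection to qemu machine protocol socket: $@" if $@;

    return $stats;
}

# Usage: the caller sees the wrapped error, including the original cause.
eval { query_jobs(600) };
print "ERROR: $@" if $@;
```

In this sketch the caller's error text carries both the added prefix and the
underlying "VM 600 not running" cause, instead of a generic message raised
at a later step.
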
 PVE/QemuServer.pm | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 1f34101..3424e3a 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6033,7 +6033,11 @@ sub qemu_drive_mirror_monitor {
        while (1) {
            die "storage migration timed out\n" if $err_complete > 300;
 
-           my $stats = vm_mon_cmd($vmid, "query-block-jobs");
+           my $stats;
+           eval {
+               $stats = vm_mon_cmd($vmid, "query-block-jobs");
+           };
+           die "lost connection to qemu machine protocol socket: $@\n" if $@;
 
            my $running_mirror_jobs = {};
            foreach my $stat (@$stats) {
-- 
2.11.0


_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel