Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>it has a `connect-timeout` option (which is almost surprising).

Oh, that works even better.

# socat UNIX-LISTEN:/tmp/foo.sock TCP:8.8.8.8:1234,connect-timeout=1
# drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock
Failed to read data

(and socat is killed)

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Sunday, December 25, 2016 20:00:07
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On December 24, 2016 at 10:12 AM Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> Just did a small test with a socat tunnel:
>
> socat UNIX-LISTEN:/tmp/foo.sock,reuseaddr,fork TCP:8.8.8.8:1234
> drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock:test
>
> Then it hangs like before, because there is no response. (We could add a
> small timeout to the QMP response wait.)
>
> Then we can kill socat, which drops the unix socket, and drive-mirror
> aborts and frees the QMP commands.

A nasty trick, but I suppose it gets the job done for now (until someone
sends a patch upstream).

Btw. before you go around killing socats: it has a `connect-timeout`
option (which is almost surprising).

___
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> On December 24, 2016 at 10:12 AM Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> Just did a small test with a socat tunnel:
>
> socat UNIX-LISTEN:/tmp/foo.sock,reuseaddr,fork TCP:8.8.8.8:1234
> drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock:test
>
> Then it hangs like before, because there is no response. (We could add a
> small timeout to the QMP response wait.)
>
> Then we can kill socat, which drops the unix socket, and drive-mirror
> aborts and frees the QMP commands.

A nasty trick, but I suppose it gets the job done for now (until someone
sends a patch upstream).

Btw. before you go around killing socats: it has a `connect-timeout`
option (which is almost surprising).
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
Just did a small test with a socat tunnel:

socat UNIX-LISTEN:/tmp/foo.sock,reuseaddr,fork TCP:8.8.8.8:1234
drive_mirror -n drive-scsi0 nbd:unix:/tmp/foo.sock:test

Then it hangs like before, because there is no response. (We could add a
small timeout to the QMP response wait.)

Then we can kill socat, which drops the unix socket, and drive-mirror
aborts and frees the QMP commands.

I'll try to send a demo patch on Monday.

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Friday, December 23, 2016 11:33:18
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>Doesn't the job show up in query-block-jobs before the connection
>>finishes? If so then we could just assume if it's still at 0% after a
>>timeout the connection doesn't work? The question is though, what is a
>>good timeout?

The problem is that the qemu monitor is totally hanging...

>>Or we find (or add) a way to query the connection status, then we can
>>poll that before we start polling the block-job status.
>>(we already poll for stuff, so it's not worse than before ;p)

Maybe we can set up a tunnel (it's possible to use a unix socket, like for
live migration), but I would like to have something without ssh if
possible, maybe with socat or something similar. (It's possible to do
native TLS in qemu now for migration and drive mirror.) That way we could
be sure the tunnel is established before launching drive-mirror?

But yes, it would be great to avoid maintaining another qemu patch.
(And I'm pretty poor with C and network programming.)

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Friday, December 23, 2016 10:53:12
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote:
> Hi Wolfgang,
>
> I have done more tests with drive mirror and an nbd target.
>
> It seems the hang occurs only if the target IP is unreachable (no
> network response):
>
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout
>
> If the IP address exists and is up:
>
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target
> Failed to connect socket: Connection refused
>
> I'm not sure, but maybe it can hang too if pve-firewall does a drop
> instead of a reject on the target port.
>
> I think this comes from qemu's net/socket.c, where we have an infinite
> loop.
>
> I'm not sure how to add a timeout here, help is welcome :)

Doesn't the job show up in query-block-jobs before the connection
finishes? If so then we could just assume if it's still at 0% after a
timeout the connection doesn't work? The question is though, what is a
good timeout?

Or we find (or add) a way to query the connection status, then we can
poll that before we start polling the block-job status.
(we already poll for stuff, so it's not worse than before ;p)

Then again, an actual timeout on qemu's side might even make it
upstream...? Maybe? (If it doesn't I'd prefer polling as described above
over maintaining yet another qemu patch ;-) )

But it won't be a simple select/poll timeout as they use non-blocking
sockets and register the fd with a callback for the connection completion
later (in net_socket_fd_init() it calls qemu_set_fd_handler() where it
ends up controlled by the aio handlers).

It might be possible to add a timer (qemu_timer_new() & friends perhaps?)
to NetSocketState (the NetSocketState cleanup would have to remove it,
and if it gets triggered it cancels the connection instead).

> static int net_socket_connect_init(NetClientState *peer,
>                                    const char *model,
>                                    const char *name,
>                                    const char *host_str)
> {
>     NetSocketState *s;
>     int fd, connected, ret;
>     struct sockaddr_in saddr;
>
>     if (parse_host_port(&saddr, host_str) < 0)
>         return -1;
>
>     fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
>     if (fd < 0) {
>         perror("socket");
>         return -1;
>     }
>     qemu_set_nonblock(fd);
>
>     connected = 0;
>     for (;;) {
>         ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
>         if (ret < 0) {
>             if (errno == EINTR || errno == EWOULDBLOCK) {
>                 /* continue */
>             } else if (errno == EINPROGRESS ||
>                        errno == EALREADY ||
>                        errno == EINVAL) {
>                 break;
>             } else {
>                 perror("connect");
>                 closesocket(fd);
>                 return -1;
>             }
>         } else {
>             connected = 1;
>             break;
>         }
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>Doesn't the job show up in query-block-jobs before the connection
>>finishes? If so then we could just assume if it's still at 0% after a
>>timeout the connection doesn't work? The question is though, what is a
>>good timeout?

The problem is that the qemu monitor is totally hanging...

>>Or we find (or add) a way to query the connection status, then we can
>>poll that before we start polling the block-job status.
>>(we already poll for stuff, so it's not worse than before ;p)

Maybe we can set up a tunnel (it's possible to use a unix socket, like for
live migration), but I would like to have something without ssh if
possible, maybe with socat or something similar. (It's possible to do
native TLS in qemu now for migration and drive mirror.) That way we could
be sure the tunnel is established before launching drive-mirror?

But yes, it would be great to avoid maintaining another qemu patch.
(And I'm pretty poor with C and network programming.)

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Friday, December 23, 2016 10:53:12
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote:
> Hi Wolfgang,
>
> I have done more tests with drive mirror and an nbd target.
>
> It seems the hang occurs only if the target IP is unreachable (no
> network response):
>
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout
>
> If the IP address exists and is up:
>
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target
> Failed to connect socket: Connection refused
>
> I'm not sure, but maybe it can hang too if pve-firewall does a drop
> instead of a reject on the target port.
>
> I think this comes from qemu's net/socket.c, where we have an infinite
> loop.
>
> I'm not sure how to add a timeout here, help is welcome :)

Doesn't the job show up in query-block-jobs before the connection
finishes? If so then we could just assume if it's still at 0% after a
timeout the connection doesn't work? The question is though, what is a
good timeout?

Or we find (or add) a way to query the connection status, then we can
poll that before we start polling the block-job status.
(we already poll for stuff, so it's not worse than before ;p)

Then again, an actual timeout on qemu's side might even make it
upstream...? Maybe? (If it doesn't I'd prefer polling as described above
over maintaining yet another qemu patch ;-) )

But it won't be a simple select/poll timeout as they use non-blocking
sockets and register the fd with a callback for the connection completion
later (in net_socket_fd_init() it calls qemu_set_fd_handler() where it
ends up controlled by the aio handlers).

It might be possible to add a timer (qemu_timer_new() & friends perhaps?)
to NetSocketState (the NetSocketState cleanup would have to remove it,
and if it gets triggered it cancels the connection instead).

> static int net_socket_connect_init(NetClientState *peer,
>                                    const char *model,
>                                    const char *name,
>                                    const char *host_str)
> {
>     NetSocketState *s;
>     int fd, connected, ret;
>     struct sockaddr_in saddr;
>
>     if (parse_host_port(&saddr, host_str) < 0)
>         return -1;
>
>     fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
>     if (fd < 0) {
>         perror("socket");
>         return -1;
>     }
>     qemu_set_nonblock(fd);
>
>     connected = 0;
>     for (;;) {
>         ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
>         if (ret < 0) {
>             if (errno == EINTR || errno == EWOULDBLOCK) {
>                 /* continue */
>             } else if (errno == EINPROGRESS ||
>                        errno == EALREADY ||
>                        errno == EINVAL) {
>                 break;
>             } else {
>                 perror("connect");
>                 closesocket(fd);
>                 return -1;
>             }
>         } else {
>             connected = 1;
>             break;
>         }
>     }
>     s = net_socket_fd_init(peer, model, name, fd, connected);
>     if (!s)
>         return -1;
>     snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>              "socket: connect to %s:%d",
>              inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
>     return 0;
> }
>
> ----- Original Message -----
> From: "aderumier" <aderum...@odiso.com>
> To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Wednesday, December 21, 2016 13:57:10
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>
> >>Then it can still hang if the destination disappears between tcp_ping()
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
On Fri, Dec 23, 2016 at 10:20:03AM +0100, Alexandre DERUMIER wrote:
> Hi Wolfgang,
>
> I have done more tests with drive mirror and an nbd target.
>
> It seems the hang occurs only if the target IP is unreachable (no
> network response):
>
> # drive_mirror -n drive-scsi0 nbd://66.66.66.66/target
> ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout
>
> If the IP address exists and is up:
>
> # drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target
> Failed to connect socket: Connection refused
>
> I'm not sure, but maybe it can hang too if pve-firewall does a drop
> instead of a reject on the target port.
>
> I think this comes from qemu's net/socket.c, where we have an infinite
> loop.
>
> I'm not sure how to add a timeout here, help is welcome :)

Doesn't the job show up in query-block-jobs before the connection
finishes? If so then we could just assume if it's still at 0% after a
timeout the connection doesn't work? The question is though, what is a
good timeout?

Or we find (or add) a way to query the connection status, then we can
poll that before we start polling the block-job status.
(we already poll for stuff, so it's not worse than before ;p)

Then again, an actual timeout on qemu's side might even make it
upstream...? Maybe? (If it doesn't I'd prefer polling as described above
over maintaining yet another qemu patch ;-) )

But it won't be a simple select/poll timeout as they use non-blocking
sockets and register the fd with a callback for the connection completion
later (in net_socket_fd_init() it calls qemu_set_fd_handler() where it
ends up controlled by the aio handlers).

It might be possible to add a timer (qemu_timer_new() & friends perhaps?)
to NetSocketState (the NetSocketState cleanup would have to remove it,
and if it gets triggered it cancels the connection instead).

> static int net_socket_connect_init(NetClientState *peer,
>                                    const char *model,
>                                    const char *name,
>                                    const char *host_str)
> {
>     NetSocketState *s;
>     int fd, connected, ret;
>     struct sockaddr_in saddr;
>
>     if (parse_host_port(&saddr, host_str) < 0)
>         return -1;
>
>     fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
>     if (fd < 0) {
>         perror("socket");
>         return -1;
>     }
>     qemu_set_nonblock(fd);
>
>     connected = 0;
>     for (;;) {
>         ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
>         if (ret < 0) {
>             if (errno == EINTR || errno == EWOULDBLOCK) {
>                 /* continue */
>             } else if (errno == EINPROGRESS ||
>                        errno == EALREADY ||
>                        errno == EINVAL) {
>                 break;
>             } else {
>                 perror("connect");
>                 closesocket(fd);
>                 return -1;
>             }
>         } else {
>             connected = 1;
>             break;
>         }
>     }
>     s = net_socket_fd_init(peer, model, name, fd, connected);
>     if (!s)
>         return -1;
>     snprintf(s->nc.info_str, sizeof(s->nc.info_str),
>              "socket: connect to %s:%d",
>              inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
>     return 0;
> }
>
> ----- Original Message -----
> From: "aderumier" <aderum...@odiso.com>
> To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Wednesday, December 21, 2016 13:57:10
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>
> >>Then it can still hang if the destination disappears between tcp_ping()
> >>and the `drive-mirror` command, so I'd rather get better behavior on
> >>qemu's side. It needs a time-out or a way to cancel it or something.
>
> Yes, sure!
>
> I'm currently looking at the qemu code to see how the nbd client works.
>
> ----- Original Message -----
> From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
> To: "aderumier" <aderum...@odiso.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>, "dietmar" <diet...@proxmox.com>
> Sent: Wednesday, December 21, 2016 12:20:28
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>
> > On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderum...@odiso.com> wrote:
> >
> > >>IIRC that was the only blocker.
> > >>
> > >>Basically the patchset has to work *without* tcp_ping() since it is an
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
Hi Wolfgang,

I have done more tests with drive mirror and an nbd target.

It seems the hang occurs only if the target IP is unreachable (no network
response):

# drive_mirror -n drive-scsi0 nbd://66.66.66.66/target
ERROR: VM 183 qmp command 'human-monitor-command' failed - got timeout

If the IP address exists and is up:

# drive_mirror -n drive-scsi0 nbd://10.3.94.89:666/target
Failed to connect socket: Connection refused

I'm not sure, but maybe it can hang too if pve-firewall does a drop instead
of a reject on the target port.

I think this comes from qemu's net/socket.c, where we have an infinite loop.

I'm not sure how to add a timeout here, help is welcome :)

static int net_socket_connect_init(NetClientState *peer,
                                   const char *model,
                                   const char *name,
                                   const char *host_str)
{
    NetSocketState *s;
    int fd, connected, ret;
    struct sockaddr_in saddr;

    if (parse_host_port(&saddr, host_str) < 0)
        return -1;

    fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }
    qemu_set_nonblock(fd);

    connected = 0;
    for (;;) {
        ret = connect(fd, (struct sockaddr *)&saddr, sizeof(saddr));
        if (ret < 0) {
            if (errno == EINTR || errno == EWOULDBLOCK) {
                /* continue */
            } else if (errno == EINPROGRESS ||
                       errno == EALREADY ||
                       errno == EINVAL) {
                break;
            } else {
                perror("connect");
                closesocket(fd);
                return -1;
            }
        } else {
            connected = 1;
            break;
        }
    }
    s = net_socket_fd_init(peer, model, name, fd, connected);
    if (!s)
        return -1;
    snprintf(s->nc.info_str, sizeof(s->nc.info_str),
             "socket: connect to %s:%d",
             inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
    return 0;
}

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, December 21, 2016 13:57:10
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>Then it can still hang if the destination disappears between tcp_ping()
>>and the `drive-mirror` command, so I'd rather get better behavior on
>>qemu's side. It needs a time-out or a way to cancel it or something.

Yes, sure!

I'm currently looking at the qemu code to see how the nbd client works.

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>, "dietmar" <diet...@proxmox.com>
Sent: Wednesday, December 21, 2016 12:20:28
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> >>IIRC that was the only blocker.
> >>
> >>Basically the patchset has to work *without* tcp_ping() since it is an
> >>unreliable check, and then we still have to catch failing connections
> >>_correctly_. (There's no point in knowing that "some time in the past
> >>you were able to connect to something which may or may not have been a
> >>qemu nbd server", we need to know whether the drive-mirror job itself
> >>was able to connect.)
>
> For me, the mirror job auto-aborts if the connection fails during the
> migration. Do you see other behaviour?

That covers one problem. IIRC the disk-deletion problem was that due to
wrong [] usage around an IPv6 address it could not connect in the first
place and didn't error as I would have hoped.

> The tcp_ping was done just before launching the drive-mirror command,
> because it was hanging in this case.

Then it can still hang if the destination disappears between tcp_ping()
and the `drive-mirror` command, so I'd rather get better behavior on
qemu's side. It needs a time-out or a way to cancel it or something.
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>Then it can still hang if the destination disappears between tcp_ping()
>>and the `drive-mirror` command, so I'd rather get better behavior on
>>qemu's side. It needs a time-out or a way to cancel it or something.

Yes, sure!

I'm currently looking at the qemu code to see how the nbd client works.

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>, "dietmar" <diet...@proxmox.com>
Sent: Wednesday, December 21, 2016 12:20:28
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> >>IIRC that was the only blocker.
> >>
> >>Basically the patchset has to work *without* tcp_ping() since it is an
> >>unreliable check, and then we still have to catch failing connections
> >>_correctly_. (There's no point in knowing that "some time in the past
> >>you were able to connect to something which may or may not have been a
> >>qemu nbd server", we need to know whether the drive-mirror job itself
> >>was able to connect.)
>
> For me, the mirror job auto-aborts if the connection fails during the
> migration. Do you see other behaviour?

That covers one problem. IIRC the disk-deletion problem was that due to
wrong [] usage around an IPv6 address it could not connect in the first
place and didn't error as I would have hoped.

> The tcp_ping was done just before launching the drive-mirror command,
> because it was hanging in this case.

Then it can still hang if the destination disappears between tcp_ping()
and the `drive-mirror` command, so I'd rather get better behavior on
qemu's side. It needs a time-out or a way to cancel it or something.
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> On December 21, 2016 at 10:51 AM Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> >>IIRC that was the only blocker.
> >>
> >>Basically the patchset has to work *without* tcp_ping() since it is an
> >>unreliable check, and then we still have to catch failing connections
> >>_correctly_. (There's no point in knowing that "some time in the past
> >>you were able to connect to something which may or may not have been a
> >>qemu nbd server", we need to know whether the drive-mirror job itself
> >>was able to connect.)
>
> For me, the mirror job auto-aborts if the connection fails during the
> migration. Do you see other behaviour?

That covers one problem. IIRC the disk-deletion problem was that due to
wrong [] usage around an IPv6 address it could not connect in the first
place and didn't error as I would have hoped.

> The tcp_ping was done just before launching the drive-mirror command,
> because it was hanging in this case.

Then it can still hang if the destination disappears between tcp_ping()
and the `drive-mirror` command, so I'd rather get better behavior on
qemu's side. It needs a time-out or a way to cancel it or something.
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
I have seen a lot of improvements to nbd in qemu 2.8; maybe the connection
hang has been fixed. I'll try to run some tests.

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, December 21, 2016 10:51:55
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>>IIRC that was the only blocker.
>>
>>Basically the patchset has to work *without* tcp_ping() since it is an
>>unreliable check, and then we still have to catch failing connections
>>_correctly_. (There's no point in knowing that "some time in the past
>>you were able to connect to something which may or may not have been a
>>qemu nbd server", we need to know whether the drive-mirror job itself
>>was able to connect.)

For me, the mirror job auto-aborts if the connection fails during the
migration. Do you see other behaviour?

The tcp_ping was done just before launching the drive-mirror command,
because it was hanging in this case.

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "dietmar" <diet...@proxmox.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, December 21, 2016 10:47:48
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Mon, Dec 19, 2016 at 07:05:58AM +0100, Alexandre DERUMIER wrote:
> >>AFAIR the patch looks already quite good. We currently prepare for
> >>the 4.4 release, but when that is done we can start adding new features
> >>like local disk live migration.
>
> Hi,
> Now that 4.4 has been released, could the live storage migration patches
> be applied?
>
> I'll have some time until the end of the year to polish them.

So far they've been working, but IIRC there's still the issue where
breaking/failing connections _during_ the operation and/or the block jobs
failing to establish the connection in the first place go unnoticed, where
we end up with possibly deleted disks instead of properly failing. (Which
is why I said I didn't like the tcp_ping check.)

IIRC that was the only blocker.

Basically the patchset has to work *without* tcp_ping() since it is an
unreliable check, and then we still have to catch failing connections
_correctly_. (There's no point in knowing that "some time in the past
you were able to connect to something which may or may not have been a
qemu nbd server", we need to know whether the drive-mirror job itself
was able to connect.)

> Also, I would like to improve them later to add another feature: live
> migration to a remote external Proxmox cluster (on different Proxmox &&
> storage).
> VMware has this feature in enterprise premium, and I'll need it for some
> customers next year where I can't have downtime.

Sounds like a piece of work ;-)
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>IIRC that was the only blocker.
>>
>>Basically the patchset has to work *without* tcp_ping() since it is an
>>unreliable check, and then we still have to catch failing connections
>>_correctly_. (There's no point in knowing that "some time in the past
>>you were able to connect to something which may or may not have been a
>>qemu nbd server", we need to know whether the drive-mirror job itself
>>was able to connect.)

For me, the mirror job auto-aborts if the connection fails during the
migration. Do you see other behaviour?

The tcp_ping was done just before launching the drive-mirror command,
because it was hanging in this case.

----- Original Message -----
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "dietmar" <diet...@proxmox.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, December 21, 2016 10:47:48
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Mon, Dec 19, 2016 at 07:05:58AM +0100, Alexandre DERUMIER wrote:
> >>AFAIR the patch looks already quite good. We currently prepare for
> >>the 4.4 release, but when that is done we can start adding new features
> >>like local disk live migration.
>
> Hi,
> Now that 4.4 has been released, could the live storage migration patches
> be applied?
>
> I'll have some time until the end of the year to polish them.

So far they've been working, but IIRC there's still the issue where
breaking/failing connections _during_ the operation and/or the block jobs
failing to establish the connection in the first place go unnoticed, where
we end up with possibly deleted disks instead of properly failing. (Which
is why I said I didn't like the tcp_ping check.)

IIRC that was the only blocker.

Basically the patchset has to work *without* tcp_ping() since it is an
unreliable check, and then we still have to catch failing connections
_correctly_. (There's no point in knowing that "some time in the past
you were able to connect to something which may or may not have been a
qemu nbd server", we need to know whether the drive-mirror job itself
was able to connect.)

> Also, I would like to improve them later to add another feature: live
> migration to a remote external Proxmox cluster (on different Proxmox &&
> storage).
> VMware has this feature in enterprise premium, and I'll need it for some
> customers next year where I can't have downtime.

Sounds like a piece of work ;-)
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
On Mon, Dec 19, 2016 at 07:05:58AM +0100, Alexandre DERUMIER wrote:
> >>AFAIR the patch looks already quite good. We currently prepare for
> >>the 4.4 release, but when that is done we can start adding new features
> >>like local disk live migration.
>
> Hi,
> Now that 4.4 has been released, could the live storage migration patches
> be applied?
>
> I'll have some time until the end of the year to polish them.

So far they've been working, but IIRC there's still the issue where
breaking/failing connections _during_ the operation and/or the block jobs
failing to establish the connection in the first place go unnoticed, where
we end up with possibly deleted disks instead of properly failing. (Which
is why I said I didn't like the tcp_ping check.)

IIRC that was the only blocker.

Basically the patchset has to work *without* tcp_ping() since it is an
unreliable check, and then we still have to catch failing connections
_correctly_. (There's no point in knowing that "some time in the past
you were able to connect to something which may or may not have been a
qemu nbd server", we need to know whether the drive-mirror job itself
was able to connect.)

> Also, I would like to improve them later to add another feature: live
> migration to a remote external Proxmox cluster (on different Proxmox &&
> storage).
> VMware has this feature in enterprise premium, and I'll need it for some
> customers next year where I can't have downtime.

Sounds like a piece of work ;-)
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>AFAIR the patch looks already quite good. We currently prepare for
>>the 4.4 release, but when that is done we can start adding new features
>>like local disk live migration.

Hi,
Now that 4.4 has been released, could the live storage migration patches
be applied?

I'll have some time until the end of the year to polish them.

Also, I would like to improve them later to add another feature: live
migration to a remote external Proxmox cluster (on different Proxmox &&
storage).
VMware has this feature in enterprise premium, and I'll need it for some
customers next year where I can't have downtime.

----- Original Message -----
From: "dietmar" <diet...@proxmox.com>
To: "Alexandre Derumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, December 1, 2016 16:30:22
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> Any news for local disk live migration?
>
> I have time to work on it this month if cleanup is needed, but I'll be
> very busy at the beginning of next year.

AFAIR the patch looks already quite good. We currently prepare for
the 4.4 release, but when that is done we can start adding new features
like local disk live migration.
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>We currently prepare for
>>the 4.4 release, but when that is done we can start adding new features
>>like local disk live migration.

OK, great!

----- Original Message -----
From: "dietmar" <diet...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>, "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, December 1, 2016 16:30:22
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> Any news for local disk live migration?
>
> I have time to work on it this month if cleanup is needed, but I'll be
> very busy at the beginning of next year.

AFAIR the patch looks already quite good. We currently prepare for
the 4.4 release, but when that is done we can start adding new features
like local disk live migration.
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> Any news for local disk live migration ?
>
> I have time to work on it this month if cleanup is needed, but I'll be very
> busy at the beginning of next year.

AFAIR the patch looks already quite good. We currently prepare for the 4.4 release, but when that is done we can start adding new features like local disk live migration.
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
Any news for local disk live migration ?

I have time to work on it this month if cleanup is needed, but I'll be very busy at the beginning of next year.

- Original message -
From: "aderumier" <aderum...@odiso.com>
To: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, 16 November 2016 12:10:05
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

Wolfgang, any comment ?

- Original message -
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, 10 November 2016 15:47:47
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>> Well, yes, but what happens when the connection fails or gets interrupted?
>> A vanishing connection (timeout) as well as when the connection gets killed
>> for some reason (eg. tcpkill) need to be handled in the
>> qemu_drive_mirror_monitor() functions properly.

A running mirroring job aborts automatically if the network fails (or on any write/read error on source/destination). This check was more for establishing the connection: drive-mirror hangs while trying to establish the connection if the target is not responding. (And I'm not sure whether there is a timeout, as I have waited some minutes without a response.)

- Original message -
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, 10 November 2016 14:43:15
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> > > + die "can't connect remote nbd server $server:$port" if
> > > !PVE::Network::tcp_ping($server, $port, 2);
>
> >> I'm not all too happy about this check here as it is a TOCTOU race.
> >> (I'm not happy about some of the other uses of it as well, but some are
> >> only for status queries, iow. to display information, where it's fine.)
> >> However, in this case, if broken/missing connections can still not be
> >> caught (like in my previous tests), then this only hides 99.99% of the
> >> cases while still wrongly deleting disks in the other 0.01%, which is
> >> unacceptable behavior.
>
> So, do you want me to remove the check?

Well, yes, but what happens when the connection fails or gets interrupted?
A vanishing connection (timeout) as well as when the connection gets killed
for some reason (eg. tcpkill) need to be handled in the
qemu_drive_mirror_monitor() functions properly.

> >> $jobs is still empty at this point. The assignment below should be moved
> >> up.
>
> > + die "mirroring error: $err";
> > + }
> > +
> > + $jobs->{"drive-$drive"} = {};
>
> >> This one ^
>
> Ok.
>
> Thanks for the review!
>
> - Original message -
> From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
> To: "aderumier" <aderum...@odiso.com>
> Cc: "pve-devel" <pve-devel@pve.proxmox.com>
> Sent: Thursday, 10 November 2016 13:21:00
> Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>
> On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote:
> > we can use multiple drive_mirror jobs in parallel.
> >
> > block-job-complete can be skipped, if we want to add more mirror jobs later.
> > > > also add support for nbd uri to qemu_drive_mirror > > > > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> > > --- > > PVE/QemuServer.pm | 171 > > +++--- > > 1 file changed, 123 insertions(+), 48 deletions(-) > > > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > > index 54edd96..e989670 100644 > > --- a/PVE/QemuServer.pm > > +++ b/PVE/QemuServer.pm > > @@ -5824,91 +5824,165 @@ sub qemu_img_format { > > } > > > > sub qemu_drive_mirror { > > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; > > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, > > $skipcomplete) = @_; > > > > - my $storecfg = PVE::Storage::config(); > > - my ($dst_storeid, $dst_volname) = > > PVE::Storage::parse_volume_id($dst_volid); > > + $jobs = {} if !$jobs; > > + > > + my $qemu_target; > > + my $format; > > > > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > > + if($dst_volid =~
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
Wolfgang, any comment ?

- Original message -
From: "aderumier" <aderum...@odiso.com>
To: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, 10 November 2016 15:47:47
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

>> Well, yes, but what happens when the connection fails or gets interrupted?
>> A vanishing connection (timeout) as well as when the connection gets killed
>> for some reason (eg. tcpkill) need to be handled in the
>> qemu_drive_mirror_monitor() functions properly.

A running mirroring job aborts automatically if the network fails (or on any write/read error on source/destination). This check was more for establishing the connection: drive-mirror hangs while trying to establish the connection if the target is not responding. (And I'm not sure whether there is a timeout, as I have waited some minutes without a response.)

- Original message -
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, 10 November 2016 14:43:15
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

> On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> > > + die "can't connect remote nbd server $server:$port" if
> > > !PVE::Network::tcp_ping($server, $port, 2);
>
> >> I'm not all too happy about this check here as it is a TOCTOU race.
> >> (I'm not happy about some of the other uses of it as well, but some are
> >> only for status queries, iow. to display information, where it's fine.)
> >> However, in this case, if broken/missing connections can still not be
> >> caught (like in my previous tests), then this only hides 99.99% of the
> >> cases while still wrongly deleting disks in the other 0.01%, which is
> >> unacceptable behavior.
>
> So, do you want me to remove the check?

Well, yes, but what happens when the connection fails or gets interrupted?
A vanishing connection (timeout) as well as when the connection gets killed for some reason (eg. tcpkill) need to be handled in the qemu_drive_mirror_monitor() functions properly. > > > >>$jobs is still empty at this point. The assignment below should be moved > >>up. > > > + die "mirroring error: $err"; > > + } > > + > > + $jobs->{"drive-$drive"} = {}; > >>This one ^ > > Ok. > > Thanks for the review! > > > > - Mail original - > De: "Wolfgang Bumiller" <w.bumil...@proxmox.com> > À: "aderumier" <aderum...@odiso.com> > Cc: "pve-devel" <pve-devel@pve.proxmox.com> > Envoyé: Jeudi 10 Novembre 2016 13:21:00 > Objet: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs > > On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: > > we can use multiple drive_mirror in parralel. > > > > block-job-complete can be skipped, if we want to add more mirror job later. > > > > also add support for nbd uri to qemu_drive_mirror > > > > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> > > --- > > PVE/QemuServer.pm | 171 > > +++--- > > 1 file changed, 123 insertions(+), 48 deletions(-) > > > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > > index 54edd96..e989670 100644 > > --- a/PVE/QemuServer.pm > > +++ b/PVE/QemuServer.pm > > @@ -5824,91 +5824,165 @@ sub qemu_img_format { > > } > > > > sub qemu_drive_mirror { > > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; > > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, > > $skipcomplete) = @_; > > > > - my $storecfg = PVE::Storage::config(); > > - my ($dst_storeid, $dst_volname) = > > PVE::Storage::parse_volume_id($dst_volid); > > + $jobs = {} if !$jobs; > > + > > + my $qemu_target; > > + my $format; > > > > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > > + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { > > + $qemu_target = $dst_volid; > > + my $server = $1; > > + my $port = $2; > > + $format = 
"nbd"; > > + die "can't connect remote nbd server $server:$port" if > > !PVE::Network::tcp_ping($server, $port, 2); > > I'm not all too happy about this check here as it is a TOCTOU race. > (I'm n
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
>>Well, yes, but what happens when the connection fails or gets interrupted? >>A vanishing connection (timeout) as well as when the connection gets killed >>for some reason (eg. tcpkill) need to be handled in the >>qemu_drive_mirror_monitor() functions properly. a running mirroring job auto abort if network fail. (or any write/read error on source/destination). This check was more for the connection. the drive-mirror hang when trying to establish the connection if the target not responding. (and I'm not sure about it they are a timeout, as I have wait some minutes without response) - Mail original - De: "Wolfgang Bumiller" <w.bumil...@proxmox.com> À: "aderumier" <aderum...@odiso.com> Cc: "pve-devel" <pve-devel@pve.proxmox.com> Envoyé: Jeudi 10 Novembre 2016 14:43:15 Objet: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs > On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderum...@odiso.com> > wrote: > > > > + die "can't connect remote nbd server $server:$port" if > > !PVE::Network::tcp_ping($server, $port, 2); > >>I'm not all too happy about this check here as it is a TOCTOU race. > >>(I'm not happy about some the other uses of it as well, but some are > >>only for status quieries, iow. to display information, where it's fine.) > > >>However, in this case, if broken/missing connections can still not be > >>caught (like in my previous tests), then this only hides 99.99% of the > >>cases while still wrongly deleting disks in the other 0.01%, which is > >>unacceptable behavior. > > So, do you want that I remove the check ? Well, yes, but what happens when the connection fails or gets interrupted? A vanishing connection (timeout) as well as when the connection gets killed for some reason (eg. tcpkill) need to be handled in the qemu_drive_mirror_monitor() functions properly. > > > >>$jobs is still empty at this point. The assignment below should be moved > >>up. 
> > > + die "mirroring error: $err"; > > + } > > + > > + $jobs->{"drive-$drive"} = {}; > >>This one ^ > > Ok. > > Thanks for the review! > > > > - Mail original - > De: "Wolfgang Bumiller" <w.bumil...@proxmox.com> > À: "aderumier" <aderum...@odiso.com> > Cc: "pve-devel" <pve-devel@pve.proxmox.com> > Envoyé: Jeudi 10 Novembre 2016 13:21:00 > Objet: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs > > On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: > > we can use multiple drive_mirror in parralel. > > > > block-job-complete can be skipped, if we want to add more mirror job later. > > > > also add support for nbd uri to qemu_drive_mirror > > > > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> > > --- > > PVE/QemuServer.pm | 171 > > +++--- > > 1 file changed, 123 insertions(+), 48 deletions(-) > > > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > > index 54edd96..e989670 100644 > > --- a/PVE/QemuServer.pm > > +++ b/PVE/QemuServer.pm > > @@ -5824,91 +5824,165 @@ sub qemu_img_format { > > } > > > > sub qemu_drive_mirror { > > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; > > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, > > $skipcomplete) = @_; > > > > - my $storecfg = PVE::Storage::config(); > > - my ($dst_storeid, $dst_volname) = > > PVE::Storage::parse_volume_id($dst_volid); > > + $jobs = {} if !$jobs; > > + > > + my $qemu_target; > > + my $format; > > > > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > > + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { > > + $qemu_target = $dst_volid; > > + my $server = $1; > > + my $port = $2; > > + $format = "nbd"; > > + die "can't connect remote nbd server $server:$port" if > > !PVE::Network::tcp_ping($server, $port, 2); > > I'm not all too happy about this check here as it is a TOCTOU race. 
> (I'm not happy about some the other uses of it as well, but some are > only for status quieries, iow. to display information, where it's fine.) > > However, in this case, if broken/missing connections can still not be > caught (like in my previous tests), then this only hides 99.99% of the > cases while still wrongly deleting disks i
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> On November 10, 2016 at 1:54 PM Alexandre DERUMIER <aderum...@odiso.com> > wrote: > > > > + die "can't connect remote nbd server $server:$port" if > > !PVE::Network::tcp_ping($server, $port, 2); > >>I'm not all too happy about this check here as it is a TOCTOU race. > >>(I'm not happy about some the other uses of it as well, but some are > >>only for status quieries, iow. to display information, where it's fine.) > > >>However, in this case, if broken/missing connections can still not be > >>caught (like in my previous tests), then this only hides 99.99% of the > >>cases while still wrongly deleting disks in the other 0.01%, which is > >>unacceptable behavior. > > So, do you want that I remove the check ? Well, yes, but what happens when the connection fails or gets interrupted? A vanishing connection (timeout) as well as when the connection gets killed for some reason (eg. tcpkill) need to be handled in the qemu_drive_mirror_monitor() functions properly. > > > >>$jobs is still empty at this point. The assignment below should be moved > >>up. > > > + die "mirroring error: $err"; > > + } > > + > > + $jobs->{"drive-$drive"} = {}; > >>This one ^ > > Ok. > > Thanks for the review! > > > > ----- Mail original ----- > De: "Wolfgang Bumiller" <w.bumil...@proxmox.com> > À: "aderumier" <aderum...@odiso.com> > Cc: "pve-devel" <pve-devel@pve.proxmox.com> > Envoyé: Jeudi 10 Novembre 2016 13:21:00 > Objet: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs > > On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: > > we can use multiple drive_mirror in parralel. > > > > block-job-complete can be skipped, if we want to add more mirror job later. 
> > > > also add support for nbd uri to qemu_drive_mirror > > > > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> > > --- > > PVE/QemuServer.pm | 171 > > +++--- > > 1 file changed, 123 insertions(+), 48 deletions(-) > > > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > > index 54edd96..e989670 100644 > > --- a/PVE/QemuServer.pm > > +++ b/PVE/QemuServer.pm > > @@ -5824,91 +5824,165 @@ sub qemu_img_format { > > } > > > > sub qemu_drive_mirror { > > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; > > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, > > $skipcomplete) = @_; > > > > - my $storecfg = PVE::Storage::config(); > > - my ($dst_storeid, $dst_volname) = > > PVE::Storage::parse_volume_id($dst_volid); > > + $jobs = {} if !$jobs; > > + > > + my $qemu_target; > > + my $format; > > > > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > > + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { > > + $qemu_target = $dst_volid; > > + my $server = $1; > > + my $port = $2; > > + $format = "nbd"; > > + die "can't connect remote nbd server $server:$port" if > > !PVE::Network::tcp_ping($server, $port, 2); > > I'm not all too happy about this check here as it is a TOCTOU race. > (I'm not happy about some the other uses of it as well, but some are > only for status quieries, iow. to display information, where it's fine.) > > However, in this case, if broken/missing connections can still not be > caught (like in my previous tests), then this only hides 99.99% of the > cases while still wrongly deleting disks in the other 0.01%, which is > unacceptable behavior. 
> > > + } else { > > + > > + my $storecfg = PVE::Storage::config(); > > + my ($dst_storeid, $dst_volname) = > > PVE::Storage::parse_volume_id($dst_volid); > > + > > + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > > > > - my $format = qemu_img_format($dst_scfg, $dst_volname); > > + $format = qemu_img_format($dst_scfg, $dst_volname); > > > > - my $dst_path = PVE::Storage::path($storecfg, $dst_volid); > > + my $dst_path = PVE::Storage::path($storecfg, $dst_volid); > > > > - my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : > > $dst_path; > > + $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path; > > + } > > > > my $opts = { timeout => 10, device => "drive-$drive&qu
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
> + die "can't connect remote nbd server $server:$port" if
> !PVE::Network::tcp_ping($server, $port, 2);

>> I'm not all too happy about this check here as it is a TOCTOU race.
>> (I'm not happy about some of the other uses of it as well, but some are
>> only for status queries, iow. to display information, where it's fine.)
>> However, in this case, if broken/missing connections can still not be
>> caught (like in my previous tests), then this only hides 99.99% of the
>> cases while still wrongly deleting disks in the other 0.01%, which is
>> unacceptable behavior.

So, do you want me to remove the check?

>> $jobs is still empty at this point. The assignment below should be moved
>> up.

> + die "mirroring error: $err";
> + }
> +
> + $jobs->{"drive-$drive"} = {};

>> This one ^

Ok.

Thanks for the review!

- Original message -
From: "Wolfgang Bumiller" <w.bumil...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Thursday, 10 November 2016 13:21:00
Subject: Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote:
> we can use multiple drive_mirror jobs in parallel.
>
> block-job-complete can be skipped, if we want to add more mirror jobs later.
> > also add support for nbd uri to qemu_drive_mirror > > Signed-off-by: Alexandre Derumier <aderum...@odiso.com> > --- > PVE/QemuServer.pm | 171 > +++--- > 1 file changed, 123 insertions(+), 48 deletions(-) > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > index 54edd96..e989670 100644 > --- a/PVE/QemuServer.pm > +++ b/PVE/QemuServer.pm > @@ -5824,91 +5824,165 @@ sub qemu_img_format { > } > > sub qemu_drive_mirror { > - my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; > + my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, > $skipcomplete) = @_; > > - my $storecfg = PVE::Storage::config(); > - my ($dst_storeid, $dst_volname) = > PVE::Storage::parse_volume_id($dst_volid); > + $jobs = {} if !$jobs; > + > + my $qemu_target; > + my $format; > > - my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > + if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { > + $qemu_target = $dst_volid; > + my $server = $1; > + my $port = $2; > + $format = "nbd"; > + die "can't connect remote nbd server $server:$port" if > !PVE::Network::tcp_ping($server, $port, 2); I'm not all too happy about this check here as it is a TOCTOU race. (I'm not happy about some the other uses of it as well, but some are only for status quieries, iow. to display information, where it's fine.) However, in this case, if broken/missing connections can still not be caught (like in my previous tests), then this only hides 99.99% of the cases while still wrongly deleting disks in the other 0.01%, which is unacceptable behavior. 
> + } else { > + > + my $storecfg = PVE::Storage::config(); > + my ($dst_storeid, $dst_volname) = > PVE::Storage::parse_volume_id($dst_volid); > + > + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > > - my $format = qemu_img_format($dst_scfg, $dst_volname); > + $format = qemu_img_format($dst_scfg, $dst_volname); > > - my $dst_path = PVE::Storage::path($storecfg, $dst_volid); > + my $dst_path = PVE::Storage::path($storecfg, $dst_volid); > > - my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path; > + $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path; > + } > > my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", > sync => "full", target => $qemu_target }; > $opts->{format} = $format if $format; > > - print "drive mirror is starting (scanning bitmap) : this step can take some > minutes/hours, depend of disk size and storage speed\n"; > + print "drive mirror is starting for drive-$drive\n"; > > - my $finish_job = sub { > - while (1) { > - my $stats = vm_mon_cmd($vmid, "query-block-jobs"); > - my $stat = @$stats[0]; > - last if !$stat; > - sleep 1; > - } > - }; > + eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already run > for this device,it's throw an error > + if (my $err = $@) { > + eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) }; $jobs is still empty at this point. The assignment below should be moved
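[Editor's note: the `$opts` hash built in the patch is what `vm_mon_cmd` ultimately sends as a QMP `drive-mirror` command. A rough Python sketch of that payload, using the argument names visible in the patch (`device`, `target`, `sync`, `mode`, `format`) and the `zeroinit:` target prefix; the builder function itself is hypothetical:]

```python
def build_drive_mirror_cmd(drive, target, fmt=None, zero_init=False):
    """Sketch of the QMP payload behind vm_mon_cmd($vmid, "drive-mirror", %$opts).

    Mirrors the Perl code above: "existing" mode, "full" sync, optional
    format, and the zeroinit: prefix for zero-initialized destinations.
    """
    args = {
        "device": f"drive-{drive}",
        "target": f"zeroinit:{target}" if zero_init else target,
        "sync": "full",
        "mode": "existing",
    }
    if fmt:
        args["format"] = fmt
    return {"execute": "drive-mirror", "arguments": args}
```

Issuing this while a job already runs for the same device makes QEMU return an error, which is exactly why the patch wraps the call in `eval` and cancels the other jobs on failure.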
Re: [pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
On Tue, Nov 08, 2016 at 04:29:30AM +0100, Alexandre Derumier wrote: > we can use multiple drive_mirror in parralel. > > block-job-complete can be skipped, if we want to add more mirror job later. > > also add support for nbd uri to qemu_drive_mirror > > Signed-off-by: Alexandre Derumier> --- > PVE/QemuServer.pm | 171 > +++--- > 1 file changed, 123 insertions(+), 48 deletions(-) > > diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm > index 54edd96..e989670 100644 > --- a/PVE/QemuServer.pm > +++ b/PVE/QemuServer.pm > @@ -5824,91 +5824,165 @@ sub qemu_img_format { > } > > sub qemu_drive_mirror { > -my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; > +my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, > $skipcomplete) = @_; > > -my $storecfg = PVE::Storage::config(); > -my ($dst_storeid, $dst_volname) = > PVE::Storage::parse_volume_id($dst_volid); > +$jobs = {} if !$jobs; > + > +my $qemu_target; > +my $format; > > -my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > +if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { > + $qemu_target = $dst_volid; > + my $server = $1; > + my $port = $2; > + $format = "nbd"; > + die "can't connect remote nbd server $server:$port" if > !PVE::Network::tcp_ping($server, $port, 2); I'm not all too happy about this check here as it is a TOCTOU race. (I'm not happy about some the other uses of it as well, but some are only for status quieries, iow. to display information, where it's fine.) However, in this case, if broken/missing connections can still not be caught (like in my previous tests), then this only hides 99.99% of the cases while still wrongly deleting disks in the other 0.01%, which is unacceptable behavior. 
> +} else { > + > + my $storecfg = PVE::Storage::config(); > + my ($dst_storeid, $dst_volname) = > PVE::Storage::parse_volume_id($dst_volid); > + > + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); > > -my $format = qemu_img_format($dst_scfg, $dst_volname); > + $format = qemu_img_format($dst_scfg, $dst_volname); > > -my $dst_path = PVE::Storage::path($storecfg, $dst_volid); > + my $dst_path = PVE::Storage::path($storecfg, $dst_volid); > > -my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : > $dst_path; > + $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path; > +} > > my $opts = { timeout => 10, device => "drive-$drive", mode => > "existing", sync => "full", target => $qemu_target }; > $opts->{format} = $format if $format; > > -print "drive mirror is starting (scanning bitmap) : this step can take > some minutes/hours, depend of disk size and storage speed\n"; > +print "drive mirror is starting for drive-$drive\n"; > > -my $finish_job = sub { > - while (1) { > - my $stats = vm_mon_cmd($vmid, "query-block-jobs"); > - my $stat = @$stats[0]; > - last if !$stat; > - sleep 1; > - } > -}; > +eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already > run for this device,it's throw an error > +if (my $err = $@) { > + eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) }; $jobs is still empty at this point. The assignment below should be moved up. > + die "mirroring error: $err"; > +} > + > +$jobs->{"drive-$drive"} = {}; This one ^ > + > +qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $skipcomplete); > +} > + > +sub qemu_drive_mirror_monitor { > +my ($vmid, $vmiddst, $jobs, $skipcomplete) = @_; > > eval { > -vm_mon_cmd($vmid, "drive-mirror", %$opts); > + > + my $err_complete = 0; > + > while (1) { > + die "storage migration timed out\n" if $err_complete > 300; > + > my $stats = vm_mon_cmd($vmid, "query-block-jobs"); > - my $stat = @$stats[0]; > - die "mirroring job seem to have die. 
Maybe do you have bad > sectors?" if !$stat; > - die "error job is not mirroring" if $stat->{type} ne "mirror"; > > - my $busy = $stat->{busy}; > - my $ready = $stat->{ready}; > + my $running_mirror_jobs = {}; > + foreach my $stat (@$stats) { > + next if $stat->{type} ne 'mirror'; > + $running_mirror_jobs->{$stat->{device}} = $stat; > + } > > - if (my $total = $stat->{len}) { > - my $transferred = $stat->{offset} || 0; > - my $remaining = $total - $transferred; > - my $percent = sprintf "%.2f", ($transferred * 100 / $total); > + my $readycounter = 0; > > - print "transferred: $transferred bytes remaining: $remaining > bytes total: $total bytes progression: $percent % busy: $busy ready: $ready > \n"; > - } > + foreach my $job
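[Editor's note: the bookkeeping that `qemu_drive_mirror_monitor` performs per loop iteration, indexing `query-block-jobs` results by device, detecting vanished jobs, and computing per-job progress, can be sketched in Python. This is an illustration of one iteration only; the function name and data shapes are assumptions based on the quoted patch.]

```python
def mirror_progress(stats, jobs):
    """One iteration of the monitor loop's bookkeeping.

    stats: list of dicts as returned by QMP query-block-jobs
    jobs:  dict of tracked job names ("drive-<name>") to per-job state
    Returns (percent-complete per job, count of jobs with ready=True).
    """
    # Index only the running mirror jobs by device name.
    running = {s["device"]: s for s in stats if s.get("type") == "mirror"}
    report, readycounter = {}, 0
    for job in list(jobs):
        stat = running.get(job)
        if stat is None:
            # Tracked job no longer reported: it was cancelled or died.
            raise RuntimeError(f"{job}: mirroring has been cancelled")
        total = stat.get("len") or 0
        transferred = stat.get("offset") or 0
        percent = 100.0 * transferred / total if total else 0.0
        if stat.get("ready"):
            readycounter += 1
        report[job] = round(percent, 2)
    return report, readycounter
```

When `readycounter` equals the number of tracked jobs, the real code either completes them with `block-job-complete` or, with `$skipcomplete` set, leaves them running so further mirror jobs can be added.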
[pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs
we can use multiple drive_mirror in parralel. block-job-complete can be skipped, if we want to add more mirror job later. also add support for nbd uri to qemu_drive_mirror Signed-off-by: Alexandre Derumier--- PVE/QemuServer.pm | 171 +++--- 1 file changed, 123 insertions(+), 48 deletions(-) diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm index 54edd96..e989670 100644 --- a/PVE/QemuServer.pm +++ b/PVE/QemuServer.pm @@ -5824,91 +5824,165 @@ sub qemu_img_format { } sub qemu_drive_mirror { -my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_; +my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $skipcomplete) = @_; -my $storecfg = PVE::Storage::config(); -my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid); +$jobs = {} if !$jobs; + +my $qemu_target; +my $format; -my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); +if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) { + $qemu_target = $dst_volid; + my $server = $1; + my $port = $2; + $format = "nbd"; + die "can't connect remote nbd server $server:$port" if !PVE::Network::tcp_ping($server, $port, 2); +} else { + + my $storecfg = PVE::Storage::config(); + my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid); + + my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid); -my $format = qemu_img_format($dst_scfg, $dst_volname); + $format = qemu_img_format($dst_scfg, $dst_volname); -my $dst_path = PVE::Storage::path($storecfg, $dst_volid); + my $dst_path = PVE::Storage::path($storecfg, $dst_volid); -my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path; + $qemu_target = $is_zero_initialized ? 
"zeroinit:$dst_path" : $dst_path; +} my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", sync => "full", target => $qemu_target }; $opts->{format} = $format if $format; -print "drive mirror is starting (scanning bitmap) : this step can take some minutes/hours, depend of disk size and storage speed\n"; +print "drive mirror is starting for drive-$drive\n"; -my $finish_job = sub { - while (1) { - my $stats = vm_mon_cmd($vmid, "query-block-jobs"); - my $stat = @$stats[0]; - last if !$stat; - sleep 1; - } -}; +eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already run for this device,it's throw an error +if (my $err = $@) { + eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) }; + die "mirroring error: $err"; +} + +$jobs->{"drive-$drive"} = {}; + +qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $skipcomplete); +} + +sub qemu_drive_mirror_monitor { +my ($vmid, $vmiddst, $jobs, $skipcomplete) = @_; eval { -vm_mon_cmd($vmid, "drive-mirror", %$opts); + + my $err_complete = 0; + while (1) { + die "storage migration timed out\n" if $err_complete > 300; + my $stats = vm_mon_cmd($vmid, "query-block-jobs"); - my $stat = @$stats[0]; - die "mirroring job seem to have die. Maybe do you have bad sectors?" 
if !$stat; - die "error job is not mirroring" if $stat->{type} ne "mirror"; - my $busy = $stat->{busy}; - my $ready = $stat->{ready}; + my $running_mirror_jobs = {}; + foreach my $stat (@$stats) { + next if $stat->{type} ne 'mirror'; + $running_mirror_jobs->{$stat->{device}} = $stat; + } - if (my $total = $stat->{len}) { - my $transferred = $stat->{offset} || 0; - my $remaining = $total - $transferred; - my $percent = sprintf "%.2f", ($transferred * 100 / $total); + my $readycounter = 0; - print "transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n"; - } + foreach my $job (keys %$jobs) { + + if(defined($jobs->{$job}->{complete}) && !defined($running_mirror_jobs->{$job})) { + print "$job : finished\n"; + delete $jobs->{$job}; + next; + } + + die "$job: mirroring has been cancelled. Maybe do you have bad sectors?" if !defined($running_mirror_jobs->{$job}); + my $busy = $running_mirror_jobs->{$job}->{busy}; + my $ready = $running_mirror_jobs->{$job}->{ready}; + if (my $total = $running_mirror_jobs->{$job}->{len}) { + my $transferred = $running_mirror_jobs->{$job}->{offset} || 0; + my $remaining = $total - $transferred; + my $percent =
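[Editor's note: the NBD branch of the patch hinges on the regex `/^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/`. The same pattern transliterated to Python, to make explicit what it accepts ("localhost", dotted IPv4, or bracketed IPv6, each followed by a port); the helper function is an illustration, not PVE code:]

```python
import re

# Transliteration of the Perl branch in qemu_drive_mirror: matches
# nbd:<host>:<port> where host is "localhost", a dotted IPv4 address,
# or a bracketed IPv6 address. Unanchored at the end, like the original,
# so trailing parts such as :exportname=... are permitted.
NBD_RE = re.compile(r'^nbd:(localhost|[\d.]+|\[[\d.:a-fA-F]+\]):(\d+)')

def parse_nbd_target(dst_volid):
    """Return (server, port) for an NBD URI, or None for a storage volid."""
    m = NBD_RE.match(dst_volid)
    if not m:
        return None
    return m.group(1), int(m.group(2))
```

Note the pattern does not cover `nbd:unix:/path/socket` targets, which is why the Unix-socket variant discussed later in the thread takes a different path.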
[pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

we can use multiple drive_mirror in parallel.

block-job-complete can be skipped, if we want to add more mirror jobs later.

also add support for nbd uri to qemu_drive_mirror

Signed-off-by: Alexandre Derumier
---
 PVE/QemuServer.pm | 171 +++---
 1 file changed, 123 insertions(+), 48 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 728110f..affb1e6 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5801,91 +5801,165 @@ sub qemu_img_format {
 }

 sub qemu_drive_mirror {
-    my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_;
+    my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $skipcomplete) = @_;

-    my $storecfg = PVE::Storage::config();
-    my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid);
+    $jobs = {} if !$jobs;
+
+    my $qemu_target;
+    my $format;

-    my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
+    if($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) {
+	$qemu_target = $dst_volid;
+	my $server = $1;
+	my $port = $2;
+	$format = "nbd";
+	die "can't connect remote nbd server $server:$port" if !PVE::Network::tcp_ping($server, $port, 2);
+    } else {
+
+	my $storecfg = PVE::Storage::config();
+	my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid);
+
+	my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);

-    my $format = qemu_img_format($dst_scfg, $dst_volname);
+	$format = qemu_img_format($dst_scfg, $dst_volname);

-    my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
+	my $dst_path = PVE::Storage::path($storecfg, $dst_volid);

-    my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
+	$qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
+    }

     my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", sync => "full", target => $qemu_target };
     $opts->{format} = $format if $format;

-    print "drive mirror is starting (scanning bitmap) : this step can take some minutes/hours, depend of disk size and storage speed\n";
+    print "drive mirror is starting for drive-$drive\n";

-    my $finish_job = sub {
-	while (1) {
-	    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
-	    my $stat = @$stats[0];
-	    last if !$stat;
-	    sleep 1;
-	}
-    };
+    eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already run for this device,it's throw an error
+    if (my $err = $@) {
+	eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
+	die "mirroring error: $err";
+    }
+
+    $jobs->{"drive-$drive"} = {};
+
+    qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $skipcomplete);
+}
+
+sub qemu_drive_mirror_monitor {
+    my ($vmid, $vmiddst, $jobs, $skipcomplete) = @_;

     eval {
-	vm_mon_cmd($vmid, "drive-mirror", %$opts);
+
+	my $err_complete = 0;
+	while (1) {
+	    die "storage migration timed out\n" if $err_complete > 300;
+	    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
-	my $stat = @$stats[0];
-	die "mirroring job seem to have die. Maybe do you have bad sectors?" if !$stat;
-	die "error job is not mirroring" if $stat->{type} ne "mirror";
-	my $busy = $stat->{busy};
-	my $ready = $stat->{ready};
+	    my $running_mirror_jobs = {};
+	    foreach my $stat (@$stats) {
+		next if $stat->{type} ne 'mirror';
+		$running_mirror_jobs->{$stat->{device}} = $stat;
+	    }

-	if (my $total = $stat->{len}) {
-	    my $transferred = $stat->{offset} || 0;
-	    my $remaining = $total - $transferred;
-	    my $percent = sprintf "%.2f", ($transferred * 100 / $total);
+	    my $readycounter = 0;

-	    print "transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
-	}
+	    foreach my $job (keys %$jobs) {
+
+		if(defined($jobs->{$job}->{complete}) && !defined($running_mirror_jobs->{$job})) {
+		    print "$job : finished\n";
+		    delete $jobs->{$job};
+		    next;
+		}
+
+		die "$job: mirroring has been cancelled. Maybe do you have bad sectors?" if !defined($running_mirror_jobs->{$job});
+		my $busy = $running_mirror_jobs->{$job}->{busy};
+		my $ready = $running_mirror_jobs->{$job}->{ready};
+		if (my $total = $running_mirror_jobs->{$job}->{len}) {
+		    my $transferred = $running_mirror_jobs->{$job}->{offset} || 0;
+		    my $remaining = $total - $transferred;
+		    my $percent =
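The core of this revision is the monitor loop indexing active mirror jobs by device name instead of looking only at `@$stats[0]`. A minimal standalone sketch of that bookkeeping, with a mocked `query-block-jobs` reply (the data values and job names are illustrative, not from the patch):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mocked reply of the QMP "query-block-jobs" command: one hash per active job.
my $stats = [
    { device => 'drive-scsi0', type => 'mirror', busy => 0, ready => 1,
      len => 1000, offset => 750 },
    { device => 'drive-virtio0', type => 'stream' },   # not a mirror job, skipped
];

# Index running mirror jobs by device name, as the patched monitor loop does.
my $running_mirror_jobs = {};
foreach my $stat (@$stats) {
    next if $stat->{type} ne 'mirror';
    $running_mirror_jobs->{$stat->{device}} = $stat;
}

# Jobs we started ourselves and still track.
my $jobs = { 'drive-scsi0' => {} };

foreach my $job (sort keys %$jobs) {
    die "$job: mirroring has been cancelled\n"
        if !defined($running_mirror_jobs->{$job});
    my $total       = $running_mirror_jobs->{$job}->{len};
    my $transferred = $running_mirror_jobs->{$job}->{offset} || 0;
    my $percent     = sprintf "%.2f", ($transferred * 100 / $total);
    print "$job: $percent% mirrored\n";    # prints "drive-scsi0: 75.00% mirrored"
}
```

Because the hash is keyed by device, a job that query-block-jobs no longer reports simply disappears from `$running_mirror_jobs`, which is what the cancellation check relies on.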
[pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

we can use multiple drive_mirror in parallel.

block-job-complete can be skipped, if we want to add more mirror jobs later.

also add support for nbd uri to qemu_drive_mirror

Signed-off-by: Alexandre Derumier
---
 PVE/QemuServer.pm | 168 ++
 1 file changed, 120 insertions(+), 48 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 728110f..d0310b6 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5801,91 +5801,162 @@ sub qemu_img_format {
 }

 sub qemu_drive_mirror {
-    my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_;
+    my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $skipcomplete) = @_;

-    my $storecfg = PVE::Storage::config();
-    my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid);
+    $jobs = {} if !$jobs;
+
+    my $qemu_target;
+    my $format;

-    my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
+    if($dst_volid =~ /^nbd:/) {
+	$qemu_target = $dst_volid;
+	$format = "nbd";
+    } else {
+
+	my $storecfg = PVE::Storage::config();
+	my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid);
+
+	my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);

-    my $format = qemu_img_format($dst_scfg, $dst_volname);
+	$format = qemu_img_format($dst_scfg, $dst_volname);

-    my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
+	my $dst_path = PVE::Storage::path($storecfg, $dst_volid);

-    my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
+	$qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
+    }

     my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", sync => "full", target => $qemu_target };
     $opts->{format} = $format if $format;

-    print "drive mirror is starting (scanning bitmap) : this step can take some minutes/hours, depend of disk size and storage speed\n";
+    print "drive mirror is starting for drive-$drive\n";

-    my $finish_job = sub {
-	while (1) {
-	    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
-	    my $stat = @$stats[0];
-	    last if !$stat;
-	    sleep 1;
-	}
-    };
+    eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already run for this device,it's throw an error
+    if (my $err = $@) {
+	eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
+	die "mirroring error: $err";
+    }
+
+    $jobs->{"drive-$drive"} = {};
+
+    qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $skipcomplete);
+}
+
+sub qemu_drive_mirror_monitor {
+    my ($vmid, $vmiddst, $jobs, $skipcomplete) = @_;

     eval {
-	vm_mon_cmd($vmid, "drive-mirror", %$opts);
+
+	my $err_complete = 0;
+	while (1) {
+	    die "storage migration timed out\n" if $err_complete > 300;
+	    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
-	my $stat = @$stats[0];
-	die "mirroring job seem to have die. Maybe do you have bad sectors?" if !$stat;
-	die "error job is not mirroring" if $stat->{type} ne "mirror";
-	my $busy = $stat->{busy};
-	my $ready = $stat->{ready};
+	    my $running_mirror_jobs = {};
+	    foreach my $stat (@$stats) {
+		next if $stat->{type} ne 'mirror';
+		$running_mirror_jobs->{$stat->{device}} = $stat;
+	    }

-	if (my $total = $stat->{len}) {
-	    my $transferred = $stat->{offset} || 0;
-	    my $remaining = $total - $transferred;
-	    my $percent = sprintf "%.2f", ($transferred * 100 / $total);
+	    my $readycounter = 0;

-	    print "transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
-	}
+	    foreach my $job (keys %$jobs) {
+
+		if(defined($jobs->{$job}->{complete}) && !defined($running_mirror_jobs->{$job})) {
+		    print "$job : finished\n";
+		    delete $jobs->{$job};
+		    next;
+		}
+
+		die "$job: mirroring has been cancelled. Maybe do you have bad sectors?" if !defined($running_mirror_jobs->{$job});
+		my $busy = $running_mirror_jobs->{$job}->{busy};
+		my $ready = $running_mirror_jobs->{$job}->{ready};
+		if (my $total = $running_mirror_jobs->{$job}->{len}) {
+		    my $transferred = $running_mirror_jobs->{$job}->{offset} || 0;
+		    my $remaining = $total - $transferred;
+		    my $percent = sprintf "%.2f", ($transferred * 100 / $total);

-	if ($stat->{ready} eq 'true') {
+		    print "$job: transferred: $transferred bytes remaining: $remaining bytes total: $total bytes
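The `complete` flag checked in this revision lets the monitor tell a job that vanished because it finished (block-job-complete was already sent for it) from one that was cancelled. A sketch of that decision in isolation; the function name and mocked state are illustrative, not from the patch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $jobs: jobs we started; a true 'complete' flag means block-job-complete
# was already issued for this device.
# $running: what query-block-jobs currently reports, keyed by device.
sub job_state {
    my ($jobs, $running, $job) = @_;
    if (!defined($running->{$job})) {
        # Job vanished after we completed it: normal end of mirroring.
        return 'finished' if defined($jobs->{$job}->{complete});
        # Job vanished without being completed: cancelled or failed.
        return 'cancelled';
    }
    return 'running';
}

my $jobs    = { 'drive-scsi0' => { complete => 1 }, 'drive-scsi1' => {} };
my $running = {};   # QEMU no longer reports either job

print job_state($jobs, $running, 'drive-scsi0'), "\n";   # finished
print job_state($jobs, $running, 'drive-scsi1'), "\n";   # cancelled
```

Without the flag (as in the shorter revision below), any job missing from query-block-jobs is treated as cancelled, even when it simply completed.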
[pve-devel] [PATCH 2/6] qemu_drive_mirror : handle multiple jobs

we can use multiple drive_mirror in parallel.

block-job-complete can be skipped, if we want to add more mirror jobs later.

also add support for nbd uri to qemu_drive_mirror

Signed-off-by: Alexandre Derumier
---
 PVE/QemuServer.pm | 140 +++---
 1 file changed, 92 insertions(+), 48 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 728110f..feb3619 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5801,91 +5801,134 @@ sub qemu_img_format {
 }

 sub qemu_drive_mirror {
-    my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized) = @_;
+    my ($vmid, $drive, $dst_volid, $vmiddst, $is_zero_initialized, $jobs, $skipcomplete) = @_;

-    my $storecfg = PVE::Storage::config();
-    my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid);
+    $jobs = {} if !$jobs;
+
+    my $qemu_target;
+    my $format;

-    my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);
+    if($dst_volid =~ /^nbd:/) {
+	$qemu_target = $dst_volid;
+	$format = "nbd";
+    } else {

-    my $format = qemu_img_format($dst_scfg, $dst_volname);
+	my $storecfg = PVE::Storage::config();
+	my ($dst_storeid, $dst_volname) = PVE::Storage::parse_volume_id($dst_volid);

-    my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
+	my $dst_scfg = PVE::Storage::storage_config($storecfg, $dst_storeid);

-    my $qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
+	$format = qemu_img_format($dst_scfg, $dst_volname);
+
+	my $dst_path = PVE::Storage::path($storecfg, $dst_volid);
+
+	$qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
+    }

     my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", sync => "full", target => $qemu_target };
     $opts->{format} = $format if $format;

-    print "drive mirror is starting (scanning bitmap) : this step can take some minutes/hours, depend of disk size and storage speed\n";
+    print "drive mirror is starting for drive-$drive\n";

-    my $finish_job = sub {
-	while (1) {
-	    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
-	    my $stat = @$stats[0];
-	    last if !$stat;
-	    sleep 1;
-	}
-    };
+    eval { vm_mon_cmd($vmid, "drive-mirror", %$opts); }; #if a job already run for this device,it's throw an error
+    if (my $err = $@) {
+	eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $jobs) };
+	die "mirroring error: $err";
+    }
+
+    $jobs->{"drive-$drive"} = 1;
+
+    qemu_drive_mirror_monitor ($vmid, $vmiddst, $jobs, $skipcomplete);
+}
+
+sub qemu_drive_mirror_monitor {
+    my ($vmid, $vmiddst, $jobs, $skipcomplete) = @_;

     eval {
-	vm_mon_cmd($vmid, "drive-mirror", %$opts);
+
+	my $err_complete = 0;
+	while (1) {
+	    die "storage migration timed out\n" if $err_complete > 300;
+	    my $stats = vm_mon_cmd($vmid, "query-block-jobs");
-	my $stat = @$stats[0];
-	die "mirroring job seem to have die. Maybe do you have bad sectors?" if !$stat;
-	die "error job is not mirroring" if $stat->{type} ne "mirror";
-	my $busy = $stat->{busy};
-	my $ready = $stat->{ready};
+	    my $running_mirror_jobs = {};
+	    foreach my $stat (@$stats) {
+		next if $stat->{type} ne 'mirror';
+		$running_mirror_jobs->{$stat->{device}} = $stat;
+	    }

-	if (my $total = $stat->{len}) {
-	    my $transferred = $stat->{offset} || 0;
-	    my $remaining = $total - $transferred;
-	    my $percent = sprintf "%.2f", ($transferred * 100 / $total);
+	    my $readycounter = 0;

-	    print "transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
-	}
+	    foreach my $job (keys %$jobs) {
+		die "$job: mirroring has been cancelled. Maybe do you have bad sectors?" if !defined($running_mirror_jobs->{$job});

-	if ($stat->{ready} eq 'true') {
+		my $busy = $running_mirror_jobs->{$job}->{busy};
+		my $ready = $running_mirror_jobs->{$job}->{ready};
+		if (my $total = $running_mirror_jobs->{$job}->{len}) {
+		    my $transferred = $running_mirror_jobs->{$job}->{offset} || 0;
+		    my $remaining = $total - $transferred;
+		    my $percent = sprintf "%.2f", ($transferred * 100 / $total);

-	    last if $vmiddst != $vmid;
+		    print "$job: transferred: $transferred bytes remaining: $remaining bytes total: $total bytes progression: $percent % busy: $busy ready: $ready \n";
+		}

-	    # try to switch the disk if source and destination are on the same guest
-	    eval {
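The revisions above differ mainly in how the destination is classified: one validates `nbd:host:port` with a regex (plus a `tcp_ping` reachability check, omitted here), while the others accept any `nbd:` URI, including the `nbd:unix:...` sockets used in the socat test earlier in the thread. A sketch of that dispatch using the stricter regex; `classify_target` and the volume IDs are illustrative, not from the patch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub classify_target {
    my ($dst_volid) = @_;
    # TCP NBD target, as matched by the stricter revision.
    if ($dst_volid =~ /^nbd:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)/) {
        return "nbd-tcp host=$1 port=$2";
    }
    # Any other nbd: URI (e.g. nbd:unix:/run/mirror.sock) only passes
    # the broader /^nbd:/ check of the other revisions.
    if ($dst_volid =~ /^nbd:/) {
        return "nbd-other";
    }
    # Everything else is resolved through PVE::Storage in the patch.
    return "storage-volume";
}

print classify_target('nbd:127.0.0.1:10809'), "\n";         # nbd-tcp host=127.0.0.1 port=10809
print classify_target('nbd:unix:/tmp/foo.sock'), "\n";      # nbd-other
print classify_target('local:100/vm-100-disk-1.raw'), "\n"; # storage-volume
```

Note that `[\d\.]+` cannot match a colon, so the hostname capture stops cleanly before the port in the TCP case, while `nbd:unix:` paths fall through to the broader check.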