Re: [pve-devel] [PATCH container] bindmount: catch rw/ro race and add tests
applied
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
> On June 3, 2016 at 5:59 AM Alexandre DERUMIER wrote:
>
> >> I just noticed that qemu devs are planning to add tls support to live migration
> >
> > Seems that it's already committed for qemu 2.7
> > http://git.qemu.org/?p=qemu.git;a=commit;h=e122636562218b3d442cd2cd18fbc188dd9ce709

Thanks for the link! We should test that when we update to 2.7 ...
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
>> I just noticed that qemu devs are planning to add tls support to live migration

Seems that it's already committed for qemu 2.7:
http://git.qemu.org/?p=qemu.git;a=commit;h=e122636562218b3d442cd2cd18fbc188dd9ce709

----- Original message -----
From: "aderumier"
To: "dietmar", "pve-devel"
Cc: "Thomas Lamprecht"
Sent: Friday, June 3, 2016 05:55:38
Subject: Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process

Hi,

My 2 cents, but I just noticed that qemu devs are planning to add tls support to live migration:
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg04644.html

I think in the long term, it could be better than using an ssh tunnel (which limits migration performance because of the lack of multithreading).

----- Original message -----
From: "dietmar"
To: "Thomas Lamprecht", "pve-devel"
Sent: Thursday, June 2, 2016 19:53:02
Subject: Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process

And please note that waitpid is a particularly painful syscall when combined with perl. I am still not sure if the code is correct - what about EINTR and other errors?

> > Oh, and thanks for the solution in the other answer, that seems nice.
> > (And while I see your point I would not call mine a serious drawback, as it
> > was only used once and hardcoded to 30, just a drawback :) )
>
> It was 20 in the first version, 30 in the second version, ...
>
> But the real problem is that I need to review all those patches. This may
> look easy, but it is a really time-consuming task (as you see) if
> patches contain too many changes at once ...
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
Hi,

My 2 cents, but I just noticed that qemu devs are planning to add tls support to live migration:
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg04644.html

I think in the long term, it could be better than using an ssh tunnel (which limits migration performance because of the lack of multithreading).

----- Original message -----
From: "dietmar"
To: "Thomas Lamprecht", "pve-devel"
Sent: Thursday, June 2, 2016 19:53:02
Subject: Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process

And please note that waitpid is a particularly painful syscall when combined with perl. I am still not sure if the code is correct - what about EINTR and other errors?

> > Oh, and thanks for the solution in the other answer, that seems nice.
> > (And while I see your point I would not call mine a serious drawback, as it
> > was only used once and hardcoded to 30, just a drawback :) )
>
> It was 20 in the first version, 30 in the second version, ...
>
> But the real problem is that I need to review all those patches. This may
> look easy, but it is a really time-consuming task (as you see) if
> patches contain too many changes at once ...
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
And please note that waitpid is a particularly painful syscall when combined with perl. I am still not sure if the code is correct - what about EINTR and other errors?

> > Oh, and thanks for the solution in the other answer, that seems nice.
> > (And while I see your point I would not call mine a serious drawback, as it
> > was only used once and hardcoded to 30, just a drawback :) )
>
> It was 20 in the first version, 30 in the second version, ...
>
> But the real problem is that I need to review all those patches. This may
> look easy, but it is a really time-consuming task (as you see) if
> patches contain too many changes at once ...
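[Editor's note: for reference, the non-blocking reap pattern under discussion might look like this in Perl. This is a minimal sketch, not taken from any posted patch; the explicit EINTR and ECHILD handling addresses the question raised above.]

    use strict;
    use warnings;
    use POSIX qw(WNOHANG);
    use Errno qw(EINTR ECHILD);

    # Try to collect a child without blocking. waitpid() returns the pid
    # once the child is reaped, 0 while it is still running, and -1 with
    # $! set (e.g. ECHILD) when there is no such child; a signal can
    # interrupt the call with EINTR.
    sub try_reap {
        my ($cpid) = @_;
        while (1) {
            my $res = waitpid($cpid, WNOHANG);
            return 1 if $res == $cpid; # collected
            return 0 if $res == 0;     # still running, try again later
            next if $! == EINTR;       # interrupted by a signal - retry
            return 1 if $! == ECHILD;  # nothing left to collect
            die "waitpid($cpid) failed: $!\n";
        }
    }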
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
> Oh, and thanks for the solution in the other answer, that seems nice.
> (And while I see your point I would not call mine a serious drawback, as it
> was only used once and hardcoded to 30, just a drawback :) )

It was 20 in the first version, 30 in the second version, ...

But the real problem is that I need to review all those patches. This may look easy, but it is a really time-consuming task (as you see) if patches contain too many changes at once ...
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
On 02.06.2016 18:37, Dietmar Maurer wrote:
>>
>>     if ($timeout) {
>>        for (my $i = 0; $i < $timeout; $i++) {
>> -          return if !PVE::ProcFSTools::check_process_running($cpid);
>> +          return if !&$check_process_running();
>
> Maybe it is worth to include that waitpid
> in PVE::ProcFSTools::check_process_running() - not sure...

Maybe with a parameter? But as one can only wait on one's own children, and check_process_running works for all processes, it may not be suited best.

Oh, and thanks for the solution in the other answer, that seems nice.
(And while I see your point I would not call mine a serious drawback, as it was only used once and hardcoded to 30, just a drawback :) )
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
>
>     if ($timeout) {
>        for (my $i = 0; $i < $timeout; $i++) {
> -          return if !PVE::ProcFSTools::check_process_running($cpid);
> +          return if !&$check_process_running();

Maybe it is worth to include that waitpid in PVE::ProcFSTools::check_process_running() - not sure...
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
> I did the loop exactly because of that reason: I had to insert the same
> code two times exactly the same, and another time the waitpid; as this
> seemed unnecessary to me I used a simple loop, so anybody sees what happens

Your loop has serious drawbacks, for example you run into strange side effects when you want to change the timeout (the timeout is no longer a parameter, so you have to change hard-coded values in several places).

> and it works without writing everything three times... It is surely not
> the nicest solution, but imo better than triple code.

I guess there are other ways to avoid code duplication, for example:

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index a25efff..d5dcc55 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -2,6 +2,7 @@ package PVE::QemuMigrate;
 
 use strict;
 use warnings;
+use POSIX qw( WNOHANG );
 use PVE::AbstractMigrate;
 use IO::File;
 use IPC::Open2;
@@ -44,17 +45,28 @@ sub fork_command_pipe {
 sub finish_command_pipe {
     my ($self, $cmdpipe, $timeout) = @_;
 
+    my $cpid = $cmdpipe->{pid};
+    return if !defined($cpid);
+
     my $writer = $cmdpipe->{writer};
     my $reader = $cmdpipe->{reader};
 
     $writer->close();
     $reader->close();
 
-    my $cpid = $cmdpipe->{pid};
+    my $check_process_running = sub {
+	my $res = waitpid($cpid, WNOHANG);
+	if (defined($res) && ($res == $cpid)) {
+	    delete $cmdpipe->{cpid};
+	    return 1;
+	} else {
+	    return 0;
+	}
+    };
 
     if ($timeout) {
 	for (my $i = 0; $i < $timeout; $i++) {
-	    return if !PVE::ProcFSTools::check_process_running($cpid);
+	    return if !&$check_process_running();
 	    sleep(1);
 	}
     }
@@ -64,13 +76,15 @@ sub finish_command_pipe {
 
     # wait again
     for (my $i = 0; $i < 10; $i++) {
-	return if !PVE::ProcFSTools::check_process_running($cpid);
+	return if !&$check_process_running();
 	sleep(1);
     }
 
     $self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n");
     kill 9, $cpid;
     sleep 1;
+
+    &$check_process_running();
 }
 
 sub fork_tunnel {

This is totally untested - I just want to show what I am talking about.
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
On 02.06.2016 17:06, Dietmar Maurer wrote:
>> I can see the reason to use waitpid instead of check_process_running(),
>> but why do you change the rest of the code?
>>
>> Can we have minimal patches, where each patch states the reason for the
>> change in the commit log?
>
> I thought about something like this:
>
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index a25efff..ce5774e 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -71,6 +71,8 @@ sub finish_command_pipe {
>      $self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n");
>      kill 9, $cpid;
>      sleep 1;
> +
> +    waitpid($cpid); # avoid zombies

That doesn't fix it :) You have returns above, so this statement is only reached if all else fails and the tunnel has to be killed; in normal cases the child does not get collected. Also the writer/reader may get double closed here. I thought about something like:

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index a25efff..d869fa3 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -42,37 +42,53 @@ sub fork_command_pipe {
 }
 
 sub finish_command_pipe {
     my ($self, $cmdpipe, $timeout) = @_;
 
+    my $cpid = $cmdpipe->{pid};
+    return if !defined($cpid);
+
     my $writer = $cmdpipe->{writer};
     my $reader = $cmdpipe->{reader};
 
     $writer->close();
     $reader->close();
 
-    my $cpid = $cmdpipe->{pid};
-
+    my $waitpid;
     if ($timeout) {
 	for (my $i = 0; $i < $timeout; $i++) {
-	    return if !PVE::ProcFSTools::check_process_running($cpid);
+	    $waitpid = waitpid($cpid, WNOHANG);
+	    if (defined($waitpid) && $waitpid == $cpid) {
+		delete $cmdpipe->{cpid};
+		return;
+	    }
 	    sleep(1);
 	}
     }
 
     $self->log('info', "ssh tunnel still running - terminating now with SIGTERM\n");
     kill(15, $cpid);
 
     # wait again
     for (my $i = 0; $i < 10; $i++) {
-	return if !PVE::ProcFSTools::check_process_running($cpid);
+	$waitpid = waitpid($cpid, WNOHANG);
+	if (defined($waitpid) && $waitpid == $cpid) {
+	    delete $cmdpipe->{cpid};
+	    return;
+	}
 	sleep(1);
     }
 
     $self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n");
     kill 9, $cpid;
     sleep 1;
+
+    $waitpid = waitpid($cpid, WNOHANG);
+    $self->log('err', "Tunnel process (PID $cpid) could not be collected after kill")
+	if (!(defined($waitpid) && $waitpid == $cpid));
+
+    delete $cmdpipe->{cpid};
 }

> }
>
> sub fork_tunnel {
>
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
> I can see the reason to use waitpid instead of check_process_running(),
> but why do you change the rest of the code?
>
> Can we have minimal patches, where each patch states the reason for the change
> in the commit log?

I thought about something like this:

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index a25efff..ce5774e 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -71,6 +71,8 @@ sub finish_command_pipe {
     $self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n");
     kill 9, $cpid;
     sleep 1;
+
+    waitpid($cpid); # avoid zombies
 }
 
 sub fork_tunnel {
Re: [pve-devel] [PATCH manager] disable animation of charts on load
applied
[pve-devel] [PATCH container] bindmount: catch rw/ro race and add tests
A rw/ro race occurs when a container contains the same bind mount twice and another container contains a bind mount containing the first container's destination. If the double bind mounts are both meant to be read-only, then the second container could theoretically swap them out between the mount and the read-only remount call, then swap them back for the test. So to verify this we use the same file descriptor we use for the dev/inode check and perform an faccessat() call, expecting it to fail with EROFS and nothing else.

Also include O_NOFOLLOW in the checks' openat() calls.
---
 src/PVE/LXC.pm             | 90 ++++++++++++++++++++++++++++++++------------
 src/test/Makefile          |  5 ++-
 src/test/bindmount_test.pl | 89 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 160 insertions(+), 24 deletions(-)
 create mode 100755 src/test/bindmount_test.pl

diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
index 7c5a5e6..3a7d2a2 100644
--- a/src/PVE/LXC.pm
+++ b/src/PVE/LXC.pm
@@ -11,7 +11,7 @@ use File::Path;
 use File::Spec;
 use Cwd qw();
 use Fcntl qw(O_RDONLY O_NOFOLLOW O_DIRECTORY);
-use Errno qw(ELOOP);
+use Errno qw(ELOOP EROFS);
 
 use PVE::Exception qw(raise_perm_exc);
 use PVE::Storage;
@@ -1044,49 +1044,95 @@ sub walk_tree_nofollow($$$) {
     return $fd;
 }
 
-sub bindmount {
-    my ($dir, $parentfd, $last_dir, $dest, $ro, @extra_opts) = @_;
-
-    my $srcdh = walk_tree_nofollow('/', $dir, 0);
-
-    PVE::Tools::run_command(['mount', '-o', 'bind', @extra_opts, $dir, $dest]);
-    if ($ro) {
-	eval { PVE::Tools::run_command(['mount', '-o', 'bind,remount,ro', $dest]); };
-	if (my $err = $@) {
-	    warn "bindmount error\n";
-	    # don't leave writable bind-mounts behind...
-	    PVE::Tools::run_command(['umount', $dest]);
-	    die $err;
-	}
-    }
+# To guard against symlink attack races against other currently running
+# containers with shared recursive bind mount hierarchies we prepare a
+# directory handle for the directory we're mounting over to verify the
+# mountpoint afterwards.
+sub __bindmount_prepare {
+    my ($hostroot, $dir) = @_;
+    my $srcdh = walk_tree_nofollow($hostroot, $dir, 0);
+    return $srcdh;
+}
 
+# Assuming we mount to rootfs/a/b/c, verify with the directory handle to 'b'
+# ($parentfd) that 'b/c' (openat($parentfd, 'c')) really leads to the directory
+# we intended to bind mount.
+sub __bindmount_verify {
+    my ($srcdh, $parentfd, $last_dir, $ro) = @_;
     my $destdh;
     if ($parentfd) {
 	# Open the mount point path coming from the parent directory since the
 	# filehandle we would have gotten as first result of walk_tree_nofollow
 	# earlier is still a handle to the underlying directory instead of the
 	# mounted path.
-	$destdh = PVE::Tools::openat(fileno($parentfd), $last_dir, PVE::Tools::O_PATH)
+	$destdh = PVE::Tools::openat(fileno($parentfd), $last_dir, PVE::Tools::O_PATH | O_NOFOLLOW | O_DIRECTORY);
+	die "failed to open mount point: $!\n" if !$destdh;
+	if ($ro) {
+	    my $dot = '.';
+	    # 269: faccessat()
+	    # no separate function because 99% of the time it's the wrong thing to use.
+	    if (syscall(269, fileno($destdh), $dot, &POSIX::W_OK, 0) != -1) {
+		die "failed to mark bind mount read only\n";
+	    }
+	    die "read-only check failed: $!\n" if $! != EROFS;
+	}
     } else {
 	# For the rootfs we don't have a parentfd so we open the path directly.
 	# Note that this means bindmounting any prefix of the host's
 	# /var/lib/lxc/$vmid path into another container is considered a grave
 	# security error.
 	sysopen $destdh, $last_dir, O_PATH | O_DIRECTORY;
+	die "failed to open mount point: $!\n" if !$destdh;
     }
-    die "failed to open directory $dir: $!\n" if !$destdh;
 
     my ($srcdev, $srcinode) = stat($srcdh);
     my ($dstdev, $dstinode) = stat($destdh);
     close $srcdh;
     close $destdh;
 
-    if ($srcdev != $dstdev || $srcinode != $dstinode) {
+    return ($srcdev == $dstdev && $srcinode == $dstinode);
+}
+
+# Perform the actual bind mounting:
+sub __bindmount_do {
+    my ($dir, $dest, $ro, @extra_opts) = @_;
+    PVE::Tools::run_command(['mount', '-o', 'bind', @extra_opts, $dir, $dest]);
+    if ($ro) {
+	eval { PVE::Tools::run_command(['mount', '-o', 'bind,remount,ro', $dest]); };
+	if (my $err = $@) {
+	    warn "bindmount error\n";
+	    # don't leave writable bind-mounts behind...
+	    PVE::Tools::run_command(['umount', $dest]);
+	    die $err;
+	}
+    }
+}
+
+sub bindmount {
+    my ($dir, $parentfd, $last_dir, $dest, $ro, @extra_opts) = @_;
+
+    my $srcdh = __bindmount_prepare('/', $dir);
+
+    __bindmount_do($dir, $dest, $ro, @extra_opts);
+
+    if (!__bindmount_verify($srcdh, $parentfd, $last_dir, $ro)) {
 	PVE::Tools::run_command(['umount', $dest]);
 	die "detected mount
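[Editor's note: stripped of the surrounding patch, the read-only verification idea above boils down to the following. This is a hedged sketch, not the patch itself; it assumes an x86_64 host, where faccessat() is syscall number 269, and a directory handle like the one the patch opens on the mount point.]

    use strict;
    use warnings;
    use POSIX ();
    use Errno qw(EROFS);

    # Ask the kernel whether '.' relative to the mount-point fd is writable.
    # On a correctly applied read-only bind mount this must fail with EROFS;
    # success, or any other errno, means the remount was raced or failed.
    sub assert_mountpoint_readonly {
        my ($dirfd) = @_;
        my $dot = '.';
        if (syscall(269, fileno($dirfd), $dot, &POSIX::W_OK, 0) != -1) {
            die "mount point is unexpectedly writable\n";
        }
        die "read-only check failed: $!\n" if $! != EROFS;
        return 1;
    }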
Re: [pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
I can see the reason to use waitpid instead of check_process_running(), but why do you change the rest of the code?

Can we have minimal patches, where each patch states the reason for the change in the commit log?

> On June 2, 2016 at 2:44 PM Thomas Lamprecht wrote:
>
> As we open2 it, we also need to collect it to avoid zombies.
>
> Wait for 15 seconds; if the tunnel is still running then send a
> SIGTERM, after another 15 seconds a SIGKILL takes care of it if it totally
> hung up.
>
> As at this stage the tunnel is not used anymore it can safely be
> killed, but we wait a little as a graceful exit is always nicer.
>
> Signed-off-by: Thomas Lamprecht
> ---
>
> changes since v3:
> * just signal the tunnel once with sigterm and eventually sigkill
> * wait 15 seconds before sending a sigterm and another 15 before sending the sigkill
>
>  PVE/QemuMigrate.pm | 36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
> index a25efff..8afe099 100644
> --- a/PVE/QemuMigrate.pm
> +++ b/PVE/QemuMigrate.pm
> @@ -5,6 +5,7 @@ use warnings;
>  use PVE::AbstractMigrate;
>  use IO::File;
>  use IPC::Open2;
> +use POSIX qw( WNOHANG );
>  use PVE::INotify;
>  use PVE::Tools;
>  use PVE::Cluster;
> @@ -42,7 +43,10 @@ sub fork_command_pipe {
>  }
>
>  sub finish_command_pipe {
> -    my ($self, $cmdpipe, $timeout) = @_;
> +    my ($self, $cmdpipe) = @_;
> +
> +    my $cpid = $cmdpipe->{pid};
> +    return undef if !$cpid;
>
>      my $writer = $cmdpipe->{writer};
>      my $reader = $cmdpipe->{reader};
> @@ -50,27 +54,23 @@ sub finish_command_pipe {
>      $writer->close();
>      $reader->close();
>
> -    my $cpid = $cmdpipe->{pid};
> +    # collect child process
> +    for (my $i = 1; $i <= 30; $i++) {
> +	my $waitpid = waitpid($cpid, WNOHANG);
> +	last if (defined($waitpid) && ($waitpid == $cpid));
>
> -    if ($timeout) {
> -	for (my $i = 0; $i < $timeout; $i++) {
> -	    return if !PVE::ProcFSTools::check_process_running($cpid);
> -	    sleep(1);
> +	if ($i == 15) {
> +	    $self->log('info', "ssh tunnel still running - terminating now with SIGTERM");
> +	    kill(15, $cpid);
> +	} elsif ($i == 29) { # kill for sure it and do an other round to collect it
> +	    $self->log('info', "ssh tunnel still running - terminating now with SIGKILL");
> +	    kill(9, $cpid);
> 	}
> -    }
> -
> -    $self->log('info', "ssh tunnel still running - terminating now with SIGTERM\n");
> -    kill(15, $cpid);
>
> -    # wait again
> -    for (my $i = 0; $i < 10; $i++) {
> -	return if !PVE::ProcFSTools::check_process_running($cpid);
> -	sleep(1);
> +	sleep (1);
>      }
>
> -    $self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n");
> -    kill 9, $cpid;
> -    sleep 1;
> +    delete $cmdpipe->{cpid};
>  }
>
>  sub fork_tunnel {
> @@ -114,7 +114,7 @@ sub finish_tunnel {
>      };
>      my $err = $@;
>
> -    $self->finish_command_pipe($tunnel, 30);
> +    $self->finish_command_pipe($tunnel);
>
>      die $err if $err;
>  }
> --
> 2.1.4
Re: [pve-devel] [RFC manager] add Default Button to lxc and qemu options
This patch adds a "Default" button to the lxc and qemu options. Clicking it simply deletes the selected key from the config.

The purpose of this is, for instance, if you want to reset the hotplug selection but you don't know what the default is. With this it would be possible to remove the default values from the comboboxes (and thus eliminate duplicate entries there).

This includes a blacklist for lxc and qemu which contains the values which should probably not be deleted (name and smbios for qemu, ostype and arch for lxc).

Any comments? I believe this feature would reduce confusion for many, since we have no simple way in the GUI to "reset" an option to its default (and not every option has a default option to choose from).
Re: [pve-devel] [PATCH manager] disable animation of charts on load
> +    // enable animation after the store ist loaded

s/ist/is/

muscle memory ;)
[pve-devel] [PATCH manager] disable animation of charts on load
this enables the animation of the chart after it finished loading, to avoid the initial zoom

Signed-off-by: Dominik Csapak
---
 www/manager6/panel/RRDChart.js | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/www/manager6/panel/RRDChart.js b/www/manager6/panel/RRDChart.js
index 69998b0..bf0dd35 100644
--- a/www/manager6/panel/RRDChart.js
+++ b/www/manager6/panel/RRDChart.js
@@ -5,6 +5,7 @@ Ext.define('PVE.widget.RRDChart', {
 
     width: 800,
     height: 300,
+    animation: false,
     interactions: [{
 	type: 'crosszoom'
     }],
@@ -179,5 +180,10 @@ Ext.define('PVE.widget.RRDChart', {
 	}
 
 	me.callParent();
+
+	// enable animation after the store ist loaded
+	me.store.onAfter('load', function(){
+	    me.setAnimation(true);
+	}, this, {single: true});
     }
 });
--
2.1.4
[pve-devel] [PATCH v4 0/3] fix migration related issues
Same as previous, but try to keep backwards compatibility for live migrating VMs from nodes with an old unupgradeable qemu-server to a new one. Also addressed some bad style in the first patch, although I'm not quite happy (with neither the current nor the previous version).

I tested:
old -> new (secure)
new -> new (secure)
new -> new (unsecure)
new -> old (unsecure)

new -> old secure *won't* work, and I see really no reason to hack this in; use unsecure migration if you put yourself in a situation where this is needed. (Or just update the old node first :) )

cheers,
Thomas

Thomas Lamprecht (3):
  migrate: collect migration tunnel child process
  migrate: use ssh forwarded UNIX socket tunnel
  migrate: add some more log output

 PVE/QemuMigrate.pm | 158 ++++++++++++++++++++++++++++++++++-----------------
 PVE/QemuServer.pm  |  10 ++++
 2 files changed, 119 insertions(+), 49 deletions(-)

--
2.1.4
[pve-devel] [PATCH v4 3/3] migrate: add some more log output
Output all errors - if any - and add some log output on which qmp commands we run with which parameters; this may be helpful when debugging or analyzing a user's problem.

Also check that the queried status is defined, as on an error it may not be.

Signed-off-by: Thomas Lamprecht
---
 PVE/QemuMigrate.pm | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index eb060d8..68e089d 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -467,6 +467,7 @@ sub phase2 {
 	$self->log('info', "migrate_set_downtime error: $@") if $@;
     }
 
+    $self->log('info', "set migration_caps");
     eval {
 	PVE::QemuServer::set_migration_caps($vmid);
     };
@@ -474,10 +475,12 @@ sub phase2 {
 
     #set cachesize 10% of the total memory
     my $cachesize = int($conf->{memory}*1048576/10);
+    $self->log('info', "set cachesize: $cachesize");
     eval {
-	PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => $cachesize);
+	PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => int($cachesize));
     };
-
+    $self->log('info', "migrate-set-cache-size error: $@") if $@;
+
     if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) {
 	my $rpcenv = PVE::RPCEnvironment::get();
 	my $authuser = $rpcenv->get_user();
@@ -498,6 +501,7 @@ sub phase2 {
 
     }
 
+    $self->log('info', "start migrate command to $ruri");
     eval {
 	PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => $ruri);
     };
@@ -522,6 +526,7 @@ sub phase2 {
 	if (my $err = $@) {
 	    $err_count++;
 	    warn "query migrate failed: $err\n";
+	    $self->log('info', "query migrate failed: $err");
 	    if ($err_count <= 5) {
 		usleep(100);
 		next;
@@ -529,12 +534,12 @@ sub phase2 {
 	    die "too many query migrate failures - aborting\n";
 	}
 
-	if ($stat->{status} =~ m/^(setup)$/im) {
+	if (defined($stat->{status}) && $stat->{status} =~ m/^(setup)$/im) {
 	    sleep(1);
 	    next;
 	}
 
-	if ($stat->{status} =~ m/^(active|completed|failed|cancelled)$/im) {
+	if (defined($stat->{status}) && $stat->{status} =~ m/^(active|completed|failed|cancelled)$/im) {
 	    $merr = undef;
 	    $err_count = 0;
 	    if ($stat->{status} eq 'completed') {
@@ -547,6 +552,7 @@ sub phase2 {
 	    }
 
 	    if ($stat->{status} eq 'failed' || $stat->{status} eq 'cancelled') {
+		$self->log('info', "migration status error: $stat->{status}");
 		die "aborting\n"
 	    }
 
--
2.1.4
[pve-devel] [PATCH v4 1/3] migrate: collect migration tunnel child process
As we open2 it, we also need to collect it to avoid zombies.

Wait for 15 seconds; if the tunnel is still running then send a SIGTERM, after another 15 seconds a SIGKILL takes care of it if it totally hung up.

As at this stage the tunnel is not used anymore it can safely be killed, but we wait a little as a graceful exit is always nicer.

Signed-off-by: Thomas Lamprecht
---

changes since v3:
* just signal the tunnel once with sigterm and eventually sigkill
* wait 15 seconds before sending a sigterm and another 15 before sending the sigkill

 PVE/QemuMigrate.pm | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index a25efff..8afe099 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -5,6 +5,7 @@ use warnings;
 use PVE::AbstractMigrate;
 use IO::File;
 use IPC::Open2;
+use POSIX qw( WNOHANG );
 use PVE::INotify;
 use PVE::Tools;
 use PVE::Cluster;
@@ -42,7 +43,10 @@ sub fork_command_pipe {
 }
 
 sub finish_command_pipe {
-    my ($self, $cmdpipe, $timeout) = @_;
+    my ($self, $cmdpipe) = @_;
+
+    my $cpid = $cmdpipe->{pid};
+    return undef if !$cpid;
 
     my $writer = $cmdpipe->{writer};
     my $reader = $cmdpipe->{reader};
@@ -50,27 +54,23 @@ sub finish_command_pipe {
     $writer->close();
     $reader->close();
 
-    my $cpid = $cmdpipe->{pid};
+    # collect child process
+    for (my $i = 1; $i <= 30; $i++) {
+	my $waitpid = waitpid($cpid, WNOHANG);
+	last if (defined($waitpid) && ($waitpid == $cpid));
 
-    if ($timeout) {
-	for (my $i = 0; $i < $timeout; $i++) {
-	    return if !PVE::ProcFSTools::check_process_running($cpid);
-	    sleep(1);
+	if ($i == 15) {
+	    $self->log('info', "ssh tunnel still running - terminating now with SIGTERM");
+	    kill(15, $cpid);
+	} elsif ($i == 29) { # kill for sure it and do an other round to collect it
+	    $self->log('info', "ssh tunnel still running - terminating now with SIGKILL");
+	    kill(9, $cpid);
 	}
-    }
-
-    $self->log('info', "ssh tunnel still running - terminating now with SIGTERM\n");
-    kill(15, $cpid);
 
-    # wait again
-    for (my $i = 0; $i < 10; $i++) {
-	return if !PVE::ProcFSTools::check_process_running($cpid);
-	sleep(1);
+	sleep (1);
     }
 
-    $self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n");
-    kill 9, $cpid;
-    sleep 1;
+    delete $cmdpipe->{cpid};
 }
 
 sub fork_tunnel {
@@ -114,7 +114,7 @@ sub finish_tunnel {
     };
     my $err = $@;
 
-    $self->finish_command_pipe($tunnel, 30);
+    $self->finish_command_pipe($tunnel);
 
     die $err if $err;
 }
--
2.1.4
[pve-devel] [PATCH v4 2/3] migrate: use ssh forwarded UNIX socket tunnel
We cannot guarantee when the SSH forward tunnel really becomes ready. The check with the mtunnel API call did not help with this problem, as it only checked that the SSH connection itself works and that the destination node has quorum, but the forwarded tunnel itself was not checked. The forward tunnel is a different channel in the SSH connection, independent of the SSH `qm mtunnel` channel, so the fact that the latter works does not guarantee that our migration tunnel is up and ready.

When the node(s) were under load, or when we did parallel migrations (migrateall), the migrate command was often started before a tunnel was open and ready to receive data. This led to a direct abortion of the migration and is the main cause of why parallel migrations often leave two thirds or more of the VMs on the source node. The issue was tracked down to SSH after debugging the QEMU process; enabling debug logging showed that the tunnel often became available and ready too late, or not at all.

Fixing the TCP forward tunnel is quirky and not straightforward; the only possibility SSH gives us is to use -N (no command), -f (background) and -o "ExitOnForwardFailure=yes", so that it waits in the foreground until the tunnel is ready and only then backgrounds itself. This is not quite the nicest way for our special use case and our code base. Waiting for the local port to become open and ready (through /proc/net/tcp[6]) as a proof of concept is not enough: even if the port is in the listening state and should theoretically accept connections, this still failed often, as the tunnel was not yet fully ready.

Further, another problem would still be open if we tried to patch the SSH forward method we currently use - which we solve for free with the approach of this patch - namely that the method to get an available port (next_migration_port) has a serious race condition which could lead to multiple uses of the same port in a parallel migration (I observed this in my many tests; seldom, but if it happens it's really bad).

So let's now use UNIX sockets, which ssh supports since version 6.7. The endpoints are UNIX sockets bound to the VMID - thus no port, so no race, and also no limitation on available ports (we reserved 50 for migration).

The endpoints get created in /run/qemu-server/VMID.migrate, and as KVM/QEMU in current versions can use UNIX sockets just as well as TCP, we do not have to change much in the interaction with QEMU. QEMU is started with the migrate_incoming url at the local destination endpoint and creates the socket file; we then create a listening socket on the source side and connect over SSH to the destination. Now the migration can be started by issuing the migrate qmp command with an updated uri.

This breaks live migration from new to old, but *not* from old to new, so there is an upgrade path. If a live migration from new to old must be made (for whatever reason), use the unsecure_migration setting (man datacenter.conf) to allow this, although that should only be done in a trusted network.

Signed-off-by: Thomas Lamprecht
---

changes since v3:
* use --stateuri 'unix' to signal a secure unix connection; this lets us keep backwards compatibility to allow old -> new live migrations, and tcp as stateuri would be confusing when using unix sockets. Decide whether to use unix or tcp by looking at the unsecure_migration setting; at this stage we have no other way (other than querying the destination, which I don't like)
* minor reformatting

 PVE/QemuMigrate.pm | 106 ++++++++++++++++++++++++++++++++++++++-------------
 PVE/QemuServer.pm  |  10 +++++
 2 files changed, 90 insertions(+), 26 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 8afe099..eb060d8 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -74,11 +74,11 @@ sub finish_command_pipe {
 }
 
 sub fork_tunnel {
-    my ($self, $nodeip, $lport, $rport) = @_;
+    my ($self, $tunnel_addr) = @_;
 
-    my @localtunnelinfo = $lport ? ('-L' , "$lport:localhost:$rport" ) : ();
+    my @localtunnelinfo = ('-L' , $tunnel_addr );
 
-    my $cmd = [@{$self->{rem_ssh}}, @localtunnelinfo, 'qm', 'mtunnel' ];
+    my $cmd = [@{$self->{rem_ssh}}, '-o ExitOnForwardFailure=yes', @localtunnelinfo, 'qm', 'mtunnel' ];
 
     my $tunnel = $self->fork_command_pipe($cmd);
 
@@ -98,12 +98,14 @@ sub fork_tunnel {
 	$self->finish_command_pipe($tunnel);
 	die "can't open migration tunnel - $err";
     }
+
     return $tunnel;
 }
 
 sub finish_tunnel {
-    my ($self, $tunnel) = @_;
+    my ($self) = @_;
 
+    my $tunnel = $self->{tunnel};
     my $writer = $tunnel->{writer};
 
     eval {
@@ -116,7 +118,20 @@ sub finish_tunnel {
 
     $self->finish_command_pipe($tunnel);
 
-    die $err if $err;
+    if ($tunnel->{sock_addr}) {
+	# ssh does not clean up on local host
+	my $cmd = ['rm', '-f', $tunnel->{sock_addr}]; #
+	PVE::Tools::run_command($cmd);
+
+	# .. and just to be sure check on remote
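[Editor's note: for illustration, the forwarding invocation described above has roughly this shape. A sketch under assumptions: the socket path and node name are made up, the real code derives the path from the VMID and runs the command via fork_command_pipe(), and OpenSSH >= 6.7 is required for UNIX socket forwarding.]

    use strict;
    use warnings;

    # Sketch only: forward a local UNIX socket to the same path on the
    # remote node; ExitOnForwardFailure makes ssh fail loudly if the
    # forward cannot be established instead of silently dropping it.
    my $sock_addr = '/run/qemu-server/100.migrate';
    my @localtunnelinfo = ('-L', "$sock_addr:$sock_addr"); # local socket -> remote socket
    my $cmd = ['ssh', '-o', 'ExitOnForwardFailure=yes',
               @localtunnelinfo, 'root@target-node', 'qm', 'mtunnel'];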
Re: [pve-devel] qemu-server: address migrations issues
On 01/06/16 at 17:01, Alexandre DERUMIER wrote:
>> @Alexandre: do you really need the other direction?
>
> No. Only old->new. I don't think that users try to migrate from new->old anyway.

Users try strange things, but I don't think a failure to migrate to an older version would be a surprise. :-)

old->new compat should be maintained in the 4.x series; then the 4.x->5.x upgrade procedure can note to first upgrade to the latest 4.x (if adequate).

Cheers
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943493611 / 943324914
Astigarraga bidea 2, planta 6 dcha., ofi. 3-2; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
Re: [pve-devel] [PATCH v3 1/3] migrate: collect migration tunnel child process
On 02.06.2016 10:34, Dietmar Maurer wrote:
> I do not really understand this loop.
>
> * Why do you call kill -9 multiple times?

"Just to be sure"; normally the -9 would instantly kill it and the next loop iteration would then pick it up, so the probability that another sigkill gets sent is quite low. (but yeah, the code as-is is bad style/confusing, I guess)

> * Why do you iterate 20 times (instead of 30)?

The migration is here at an end, succeeded or not, but if the tunnel is still around at this point we want to quit it; waiting 30 seconds seems long for that, as the tunnel has no use now:
* all data was carried over to the destination
* if the migration failed, the VM stays on the source and no more data gets sent over the tunnel.

I'd maybe actually go for 5, then a sigterm, and after ten seconds a sigkill if it's still there (which is really low probability and has no effect on our migration anyway). But as it also does not really hinder us, we can use the old timeouts and send a sigterm at 15 seconds and a sigkill after 30 if preferred.

I'll resend the whole thing (mainly this patch and patch 2), where I address the issue mentioned here and also make sure that old versions of qemu-server may live migrate (should not be too much code overhead).

+    # collect child process
+    for (my $i = 1; $i < 20; $i++) {
+	my $waitpid = waitpid($cpid, WNOHANG);
+	last if (defined($waitpid) && ($waitpid == $cpid));
+
+	if ($i == 10) {
+	    $self->log('info', "ssh tunnel still running - terminating now with SIGTERM");
+	    kill(15, $cpid);
+	} elsif ($i >= 15) {
+	    $self->log('info', "ssh tunnel still running - terminating now with SIGKILL");
+	    kill(9, $cpid);
 	}
+	sleep (1);
     }
Re: [pve-devel] [PATCH V2 pve-zsync] fix #1004 adapt regex to new schema
applied with the following fix:

-	    if ($opt =~ m/(?:file=|volume=)?([^:]+:)([A-Za-z0-9\-]+)/){
+	    if ($opt =~ m/^(?:file=|volume=)?([^:]+:)([A-Za-z0-9\-]+)$/){

> On June 2, 2016 at 10:28 AM Wolfgang Link wrote:
>
> ---
>  pve-zsync | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/pve-zsync b/pve-zsync
> index 212ada9..cc566b4 100644
> --- a/pve-zsync
> +++ b/pve-zsync
> @@ -763,9 +763,17 @@ sub parse_disks {
>
>      my $disk = undef;
>      my $stor = undef;
> -    if($line =~ m/^(?:((?:virtio|ide|scsi|sata|mp)\d+)|rootfs): ([^:]+:)([A-Za-z0-9\-]+),(.*)$/) {
> -	$disk = $3;
> -	$stor = $2;
> +    if($line =~ m/^(?:(?:(?:virtio|ide|scsi|sata|mp)\d+)|rootfs): (.*)$/) {
> +	my @parameter = split(/,/,$1);
> +
> +	foreach my $opt (@parameter) {
> +	    if ($opt =~ m/(?:file=|volume=)?([^:]+:)([A-Za-z0-9\-]+)/){
> +		$disk = $2;
> +		$stor = $1;
> +		last;
> +	    }
> +	}
> +
>      } else {
> 	print "Disk: \"$line\" will not include in pve-sync\n" if $get_err || $error;
> 	next;
> --
> 2.1.4
Re: [pve-devel] [PATCH v3 1/3] migrate: collect migration tunnel child process
I do not really understand this loop.

* Why do you call kill -9 multiple times?
* Why do you iterate 20 times (instead of 30)?

> +    # collect child process
> +    for (my $i = 1; $i < 20; $i++) {
> +	my $waitpid = waitpid($cpid, WNOHANG);
> +	last if (defined($waitpid) && ($waitpid == $cpid));
> +
> +	if ($i == 10) {
> +	    $self->log('info', "ssh tunnel still running - terminating now with SIGTERM");
> +	    kill(15, $cpid);
> +	} elsif ($i >= 15) {
> +	    $self->log('info', "ssh tunnel still running - terminating now with SIGKILL");
> +	    kill(9, $cpid);
> 	}
> +	sleep (1);
>     }
[pve-devel] [PATCH V2 pve-zsync] fix #1004 adapt regex to new schema
---
 pve-zsync | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/pve-zsync b/pve-zsync
index 212ada9..cc566b4 100644
--- a/pve-zsync
+++ b/pve-zsync
@@ -763,9 +763,17 @@ sub parse_disks {
 
     my $disk = undef;
     my $stor = undef;
-    if($line =~ m/^(?:((?:virtio|ide|scsi|sata|mp)\d+)|rootfs): ([^:]+:)([A-Za-z0-9\-]+),(.*)$/) {
-	$disk = $3;
-	$stor = $2;
+    if($line =~ m/^(?:(?:(?:virtio|ide|scsi|sata|mp)\d+)|rootfs): (.*)$/) {
+	my @parameter = split(/,/,$1);
+
+	foreach my $opt (@parameter) {
+	    if ($opt =~ m/(?:file=|volume=)?([^:]+:)([A-Za-z0-9\-]+)/){
+		$disk = $2;
+		$stor = $1;
+		last;
+	    }
+	}
+
     } else {
 	print "Disk: \"$line\" will not include in pve-sync\n" if $get_err || $error;
 	next;
--
2.1.4
Re: [pve-devel] [PATCH] fix #1004 adapt regex to new schema
On 02.06.2016 09:27, Wolfgang Link wrote:
> ---
>  pve-zsync | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/pve-zsync b/pve-zsync
> index 212ada9..932c299 100644
> --- a/pve-zsync
> +++ b/pve-zsync
> @@ -766,6 +766,8 @@ sub parse_disks {
>      if($line =~ m/^(?:((?:virtio|ide|scsi|sata|mp)\d+)|rootfs): ([^:]+:)([A-Za-z0-9\-]+),(.*)$/) {
> 	$disk = $3;
> 	$stor = $2;
> +
> +	$stor =~ s/^file=|volume=//;

The line-start anchor (^) here only applies to "file", not to "volume" - is that intended?

    $stor =~ s/^file=|^volume=//;

or maybe better:

    $stor =~ s/^(file|volume)=//;

>      } else {
> 	print "Disk: \"$line\" will not include in pve-sync\n" if $get_err || $error;
> 	next;
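[Editor's note: a quick demonstration of the anchoring difference, with made-up input.]

    use strict;
    use warnings;

    # The alternation binds tighter than the anchor, so the first form
    # strips 'volume=' anywhere in the string, not just at the start.
    my $stor = 'local:volume=subvol-100-disk-1';
    (my $a = $stor) =~ s/^file=|volume=//;
    (my $b = $stor) =~ s/^(file|volume)=//;
    print "$a\n";  # local:subvol-100-disk-1          (mid-string match, unintended)
    print "$b\n";  # local:volume=subvol-100-disk-1   (no match at start, left intact)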
Re: [pve-devel] [PATCH qemu-server] fix typo
applied
[pve-devel] [PATCH qemu-server] fix typo
---
 PVE/QemuMigrate.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index a288627..a25efff 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -230,7 +230,7 @@ sub sync_disks {
 
 	    return if !$volid;
 
-	    die "cant migrate local file/device '$volid'\n" if $volid =~ m|^/|;
+	    die "can't migrate local file/device '$volid'\n" if $volid =~ m|^/|;
 
 	    if ($is_cdrom) {
 		die "cant migrate local cdrom drive\n" if $volid eq 'cdrom';
--
2.1.4
Re: [pve-devel] [PATCH manager 4/4] make the graphs on the summary pages use empty space
applied
[pve-devel] [PATCH] fix #1004 adapt regex to new schema
---
 pve-zsync | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/pve-zsync b/pve-zsync
index 212ada9..932c299 100644
--- a/pve-zsync
+++ b/pve-zsync
@@ -766,6 +766,8 @@ sub parse_disks {
     if($line =~ m/^(?:((?:virtio|ide|scsi|sata|mp)\d+)|rootfs): ([^:]+:)([A-Za-z0-9\-]+),(.*)$/) {
 	$disk = $3;
 	$stor = $2;
+
+	$stor =~ s/^file=|volume=//;
     } else {
 	print "Disk: \"$line\" will not include in pve-sync\n" if $get_err || $error;
 	next;
--
2.1.4