Re: [Lxc-users] Zombie container
On 2/14/2011 6:50 PM, Trent W. Buck wrote:
> Daniel Lezcano writes:
>
>> As a quick fix, I suggest you look what application created the new
>> namespace. Launch your container and then look at
>> /cgroup/blackbird/1234/tasks and look for the command line associated
>> with the pid in this file. I suspect vsftpd could be the culprit. If
>> this is the case, there is an option to disable the namespace
>> creation.
>
> Or, of course, pick a different application :-)
>
> If it is vsftpd, I *strongly* recommend switching to SFTP (part of SSH)
> for writes, and HTTP for reads. http://mywiki.wooledge.org/FtpMustDie

Well, of course, but what's that got to do with LXC or the namespace trick that vsftpd happens to use? Your observations, which everyone already knows, show that the ftp protocol is problematic. Granted, but so what? The discussion here is how to get all the commonly used tools that currently run outside of containers working within containers, using lxc, not what tools to use.

3 things:

1) The vsftpd problem is not a problem with the ftp protocol. Apache or any other service or app that meets your religious or aesthetic approval might have the same or similar problem at any time. Here we are only interested in containerizing anything that currently is done on traditional servers. For better or for worse, FTP is widely used on traditional servers, and specifically vsftpd is. And so the discussion is about how to use vsftpd within a container, not whether to use ftp.

2) As if everyone has any choice in the matter anyway, since most use of any communication protocol, such as ftp, involves two different parties, not yourself at both ends. Even if you were so gauche as to try to dictate internal IT policies and procedures and technologies to your own customers and vendors, you still don't get to dictate to 2nd or more removed customers and vendors of your own customers and vendors.

So when _big honking global bank/manufacturer/retailer/shipper/etc_ says they will ftp to you or you to them, you just *&^*7 do it. Oh, you can offer the alternatives, and occasionally you get lucky, but that doesn't remove the need to make ftp work. Same goes for every other commonly used technology that you don't happen to personally like.

3) What makes http so special only for reading and sftp so special only for writing? Depending on my security needs and other factors I routinely use http for writing and/or sftp for reading. I also use rsync (native, not via ssh or rsh) for both reading and writing in many situations where most people use ftp or sftp or http. Conversely I never use nfs and only use samba extremely rarely, but I'm sure these technologies are perfectly justifiable and required for other people in other situations. Choice of tool is completely dependent on the job at hand, and it's utterly silly to try to say what should and should not be used except within the context of a specific job, and then the answer only applies to that one specific job in that one specific context.

--
bkw

--
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
___
Lxc-users mailing list
Lxc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users
Re: [Lxc-users] Zombie container
> "DL" == Daniel Lezcano writes:

    DL> * simply do rm -rf /cgroup/blackbird (don't care about the
    DL> errors).
    >>
    >> This fails with "Operation not permitted" and the problem
    >> persists.

    DL> Do you try to remove the directories as root when the container
    DL> exited ?

Yes.

    DL> It is not a kernel problem, it's the expected behavior but
    DL> unfortunately the cgroup automatic creation does not really fit
    DL> with the namespace concept. This is why the ns_cgroup will be
    DL> removed in the next kernel version in order to manage the cgroup
    DL> consistently.

OK, I simply have to live with the problem (it's not fatal) until then.
Re: [Lxc-users] Zombie container
On 02/15/2011 10:17 AM, Milan Zamazal wrote:
>> "DL" == Daniel Lezcano writes:
> DL> It is probable you have an application creating new namespaces
> DL> in the container. That's triggering a new cgroup creation which
> DL> is nested with the container's one. This is a kernel feature
> DL> (removed for the next kernel version).
>
> Thank you for explanation.
>
> By watching when these subdirectories get created I discovered the
> problem appears when I run `fusermount -u'.
>
> DL> * simply do rm -rf /cgroup/blackbird (don't care about the
> DL> errors).
>
> This fails with "Operation not permitted" and the problem persists.

Did you try to remove the directories as root after the container had exited?

> DL> Launch your container and then look at
> DL> /cgroup/blackbird/1234/tasks and look for the command line
> DL> associated with the pid in this file.
>
> The `tasks' file is empty. But it must be fusermount or something
> related to its invocation.

Ok. Interesting.

> DL> Hope that helps.
>
> Thank you for help. Now I know what creates the problem, but I still
> don't know how to safely prevent it or remedy it. Maybe it's a kernel
> problem (I use standard kernel 2.6.32 from Debian)?

It is not a kernel problem; it's the expected behavior. Unfortunately, the automatic cgroup creation does not really fit with the namespace concept. This is why ns_cgroup will be removed in the next kernel version, so that the cgroup can be managed consistently.

http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=blob;f=Documentation/feature-removal-schedule.txt;h=ada3db8fc9f6307b0b9b51b503353a96b995b62d;hb=b7bbcc2b04070ebd77c827e8ebbd08a5b7493004
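A side note on the `rm -rf` failure discussed above: as far as I understand the cgroup filesystem, its control files cannot be unlinked, so `rm -rf` reports "Operation not permitted" on them, while a cgroup directory with no tasks and no child groups can normally be removed with a plain `rmdir`. The sketch below walks a tree deepest-first and tries `rmdir` on each directory. The function name `remove_cgroup_tree` is hypothetical, and the call may still fail with "Device or resource busy" if the kernel keeps a reference to a nested group, which seems to be what happens in this thread.

```python
import os


def remove_cgroup_tree(cgroup_dir):
    """Try to rmdir every directory under cgroup_dir, deepest first.

    Returns a list of (path, error message) pairs for the directories
    that could not be removed. Hypothetical helper, not part of lxc.
    """
    failures = []
    # topdown=False yields child directories before their parents,
    # so nested groups are attempted before the container's own group.
    for dirpath, _dirnames, _filenames in os.walk(cgroup_dir, topdown=False):
        try:
            os.rmdir(dirpath)
        except OSError as exc:
            failures.append((dirpath, exc.strerror))
    return failures


if __name__ == "__main__":
    # Path taken from the thread; run as root on the host.
    for path, error in remove_cgroup_tree("/cgroup/blackbird"):
        print(f"could not remove {path}: {error}")
```

Any directory left in the returned list is one the kernel still considers busy, which at least narrows down where the leftover reference is.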
Re: [Lxc-users] Zombie container
> "DL" == Daniel Lezcano writes:

    DL> It is probable you have an application creating new namespaces
    DL> in the container. That's triggering a new cgroup creation which
    DL> is nested with the container's one. This is a kernel feature
    DL> (removed for the next kernel version).

Thank you for the explanation.

By watching when these subdirectories get created, I discovered that the problem appears when I run `fusermount -u'.

    DL> * simply do rm -rf /cgroup/blackbird (don't care about the
    DL> errors).

This fails with "Operation not permitted" and the problem persists.

    DL> Launch your container and then look at
    DL> /cgroup/blackbird/1234/tasks and look for the command line
    DL> associated with the pid in this file.

The `tasks' file is empty. But it must be fusermount or something related to its invocation.

    DL> Hope that helps.

Thank you for the help. Now I know what creates the problem, but I still don't know how to safely prevent it or remedy it. Maybe it's a kernel problem (I use the standard kernel 2.6.32 from Debian)?
Re: [Lxc-users] Zombie container
On 15.2.2011 00:50, Trent W. Buck wrote:
> Daniel Lezcano writes:
>
>> As a quick fix, I suggest you look what application created the new
>> namespace. Launch your container and then look at
>> /cgroup/blackbird/1234/tasks and look for the command line associated
>> with the pid in this file. I suspect vsftpd could be the culprit. If
>> this is the case, there is an option to disable the namespace
>> creation.
>
> Or, of course, pick a different application :-)
>
> If it is vsftpd, I *strongly* recommend switching to SFTP (part of SSH)
> for writes, and HTTP for reads. http://mywiki.wooledge.org/FtpMustDie

If it is vsftpd, you can add:

isolate=NO
isolate_network=NO

to /etc/vsftpd.conf and all will be OK.

Miroslav.

--
Miroslav Lednicky, AVONET, s.r.o.
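To check whether a given vsftpd.conf already has the two options from this message, something like the following rough sketch could be used. It assumes the usual `option=value` lines with `#` comments, assumes that both options default to YES when absent (as I recall from the vsftpd.conf man page), and assumes that the last occurrence of an option wins. The function name `vsftpd_isolation_settings` is made up for illustration.

```python
def vsftpd_isolation_settings(conf_text):
    """Return the effective isolate / isolate_network values in conf_text.

    Hypothetical helper. Assumed defaults: YES for both options.
    """
    settings = {"isolate": "YES", "isolate_network": "YES"}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not option=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in settings:
            settings[key] = value.strip().upper()
    return settings


if __name__ == "__main__":
    try:
        with open("/etc/vsftpd.conf") as conf:
            print(vsftpd_isolation_settings(conf.read()))
    except OSError as exc:
        print(f"cannot read /etc/vsftpd.conf: {exc}")
```

If either value comes back as YES, vsftpd will presumably still create new namespaces inside the container, and the nested cgroup directories are likely to reappear.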
Re: [Lxc-users] Zombie container
Daniel Lezcano writes:

> As a quick fix, I suggest you look what application created the new
> namespace. Launch your container and then look at
> /cgroup/blackbird/1234/tasks and look for the command line associated
> with the pid in this file. I suspect vsftpd could be the culprit. If
> this is the case, there is an option to disable the namespace
> creation.

Or, of course, pick a different application :-)

If it is vsftpd, I *strongly* recommend switching to SFTP (part of SSH) for writes, and HTTP for reads. http://mywiki.wooledge.org/FtpMustDie
Re: [Lxc-users] Zombie container
On 02/14/2011 07:33 PM, Milan Zamazal wrote:
> On a Debian 6.0 machine, I've got a certain container that can't be
> started again once it was stopped:
>
>    # lxc-start -n blackbird
>    lxc-start: Device or resource busy - failed to remove previous cgroup '/cgroup/blackbird'
>    lxc-start: failed to spawn 'blackbird'
>    lxc-start: Device or resource busy - failed to remove cgroup '/cgroup/blackbird'
>
> The container seems to be stopped completely but the /cgroup/blackbird/
> directory is indeed non-empty. There are some subdirectories with
> numeric names there but I can't find any processes with such numbers in
> the system nor any other processes related to the container. The only
> way to get rid of it is to reboot the host.
>
> Is there a way to force removal of the cgroup? Or is there a way to
> find out what keeps the cgroup busy?

You probably have an application creating new namespaces in the container. That triggers the creation of a new cgroup, nested inside the container's own. This is a kernel feature (removed in the next kernel version).

There are several solutions:

 * fix this behavior in lxc, so that it recursively removes the cgroup directories

 * simply do rm -rf /cgroup/blackbird (don't care about the errors).

As a quick fix, I suggest you find out which application created the new namespace. Launch your container, then look at /cgroup/blackbird/1234/tasks and look up the command line associated with the pid in this file. I suspect vsftpd could be the culprit. If this is the case, there is an option to disable the namespace creation.

http://www.mail-archive.com/lxc-users@lists.sourceforge.net/msg01110.html

Hope that helps.

-- Daniel
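The lookup suggested above (read the pids from each nested `tasks` file, then find their command lines) can be sketched roughly as follows. This is only an illustration: the function name `cgroup_task_cmdlines` and the `proc_root` parameter are made up here, and it assumes the cgroup hierarchy is mounted at /cgroup as in this thread and that the pids are visible under the host's /proc.

```python
import os


def cgroup_task_cmdlines(cgroup_dir, proc_root="/proc"):
    """Map (nested group, pid) to the command line of each listed task.

    Hypothetical helper: reads every 'tasks' file under cgroup_dir and
    looks each pid up in proc_root/<pid>/cmdline.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(cgroup_dir):
        if "tasks" not in filenames:
            continue
        with open(os.path.join(dirpath, "tasks")) as tasks:
            pids = [line.strip() for line in tasks if line.strip()]
        group = os.path.relpath(dirpath, cgroup_dir)
        for pid in pids:
            try:
                with open(os.path.join(proc_root, pid, "cmdline"), "rb") as f:
                    raw = f.read()
            except OSError:
                # The process may have exited since 'tasks' was read.
                cmdline = None
            else:
                # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
                cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
            result[(group, int(pid))] = cmdline
    return result


if __name__ == "__main__":
    found = cgroup_task_cmdlines("/cgroup/blackbird")
    for (group, pid), cmd in sorted(found.items()):
        print(group, pid, cmd or "(no command line)")
```

An empty result means that no `tasks` file lists any pid, so the culprit has to be identified some other way, for example by watching when the numeric subdirectories appear.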
[Lxc-users] Zombie container
On a Debian 6.0 machine, I've got a certain container that can't be started again once it was stopped:

    # lxc-start -n blackbird
    lxc-start: Device or resource busy - failed to remove previous cgroup '/cgroup/blackbird'
    lxc-start: failed to spawn 'blackbird'
    lxc-start: Device or resource busy - failed to remove cgroup '/cgroup/blackbird'

The container seems to be stopped completely but the /cgroup/blackbird/ directory is indeed non-empty. There are some subdirectories with numeric names there but I can't find any processes with such numbers in the system nor any other processes related to the container. The only way to get rid of it is to reboot the host.

Is there a way to force removal of the cgroup? Or is there a way to find out what keeps the cgroup busy?