Quoting Arun M (arunmahadevai...@gmail.com):
> On Fri, Jul 6, 2012 at 8:51 PM, Serge Hallyn
> <serge.hal...@canonical.com> wrote:
>
> > Quoting Arun M (arunmahadevai...@gmail.com):
> > > Hi,
> > >
> > > I updated to lxc-0.8.0-rc2 and after that I observe dangling cgroups
> > > (/cgroup/PID) in the filesystem after lxc-execute terminates.
> > >
> > > I am using ns_cgroup.
> >
> > Which kernel are you on? The ns cgroup is no longer available since
> > over a year ago.
> >
>
> I am on 2.6.32. (RedHat Enterprise Linux 6)
>
> > > Looks like a process is spawned in a new namespace but lxc-fails to
> > > remove the cgroup directory.
> >
> > Can you show 'ps -ef'? If you can identify the process that won't
> > die, can you see what it's doing (strace -f -o outfile -p $pid) ?
> >
> > My only guess would be that the container_reboot_supported() function,
> > which gets cloned, is for some reason not dying. Except no, that
> > can't be it, because this change actually moves that clone() to the
> > monitor task, so it wouldn't be pinning the cgroup.
> >
> > Can you check whether your container is mounting a private /var or
> > /run? My theory is that the initial task is never killed because you
> > are relying on the utmp watcher (being on an older kernel), and the
> > container is using a utmp that the monitor can't see.
> >
>
> I dont have much idea about utmp watcher. However I see the following
> messages in the log.
>
> $ /usr/local/bin/lxc-execute -n alpha -f n1.conf -l DEBUG -o /tmp/log -- /bin/sh
>
>
> lxc-execute 1341623135.543 DEBUG lxc_start - Dropping cap_sys_boot and watching utmp
> ...
> ...
> lxc-execute 1341623135.584 DEBUG lxc_utmp - Added '/proc/23213/root/var/run' to inotifywatch
> lxc-execute 1341623135.584 WARN lxc_start - invalid pid for SIGCHLD, siginfo.ssi_pid:23210, *pid:23213
>
> This is while the container shell is still running.
>
> $ file /proc/23210
> /proc/23210: cannot open `/proc/23210' (No such file or directory)
>
> $ file /proc/23213
> /proc/23213: directory
>
> $ cat /proc/23213/cmdline|less
> /usr/local/libexec/lxc/lxc-init^@--^@/bin/sh^@
>
>
> And I see two cgroups,
>
> $ ls -ld /cgroup/alpha
> drwxr-xr-x 2 arunm users 0 Jul 7 06:35 /cgroup/alpha
>
> $ cat /cgroup/alpha/tasks
> 23213
> 23217
>
> $ cat /cgroup/23210/tasks
> [Nothing]
>
> And after I exit the shell /cgroup/23210 hangs around for ever.
>
> I dont see /var/run/utmp or /run directory inside the container.
Ooooh! I get it. I'm pretty sure 23210 (as I started to guess above, but
then decided couldn't be the case) is the task cloned to test for reboot
support. Since we're cloning a new pid namespace, the ns cgroup creates a
new child cgroup for it, named after its PID (hence /cgroup/23210). That
task exits immediately after testing for reboot, and because you don't
have a release agent, its new cgroup is not being deleted.

So I guess if the ns cgroup is mounted, we need to delete that cgroup
ourselves. You can work around it by using a release agent. If you're
interested in writing a patch for that, I'll be happy to help.

thanks,
-serge

_______________________________________________
Lxc-users mailing list
Lxc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users
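The release-agent workaround mentioned above might look roughly like this. It is a minimal sketch, not something LXC ships, and it assumes the cgroup v1 ns hierarchy is mounted at /cgroup as in the output quoted above; the script path /usr/local/sbin/lxc-release-agent, the function name release_cgroup, and the CGROUP_ROOT override are all hypothetical. When a cgroup whose notify_on_release flag is set loses its last task and has no child cgroups, the kernel runs the program named in the root cgroup's release_agent file, passing the emptied cgroup's path relative to the hierarchy root (for example /23210) as the only argument.

```shell
#!/bin/sh
# Hypothetical cgroup v1 release agent, e.g. installed as
# /usr/local/sbin/lxc-release-agent (the name is an assumption).
#
# The kernel invokes it with one argument: the path of the now-empty
# cgroup relative to the hierarchy root, e.g. "/23210".

# Assumption: the ns hierarchy is mounted at /cgroup. CGROUP_ROOT lets
# the mount point be overridden, which is handy for testing.
release_cgroup() {
    # rmdir only succeeds on a cgroup with no tasks and no children,
    # so a cgroup that was repopulated in the meantime is left alone.
    # Failures (already removed, still busy) are deliberately ignored.
    rmdir "${CGROUP_ROOT:-/cgroup}$1" 2>/dev/null || true
}

# Only act when the kernel (or a caller) supplied a cgroup path.
if [ -n "${1:-}" ]; then
    release_cgroup "$1"
fi
```

To enable it, as root and before starting the container, register the script with `echo /usr/local/sbin/lxc-release-agent > /cgroup/release_agent` and turn on notification with `echo 1 > /cgroup/notify_on_release`. A cgroup created afterwards inherits notify_on_release from its parent, so the per-PID cgroup that the ns cgroup creates for the reboot-test task should be removed once that task exits.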