Quoting Arun M (arunmahadevai...@gmail.com):
> On Fri, Jul 6, 2012 at 8:51 PM, Serge Hallyn <serge.hal...@canonical.com> wrote:
> 
> > Quoting Arun M (arunmahadevai...@gmail.com):
> > > Hi,
> > >
> > > I updated to lxc-0.8.0-rc2 and after that I observe dangling cgroups
> > > (/cgroup/PID) in the filesystem after lxc-execute terminates.
> > >
> > > I am using ns_cgroup.
> >
> > Which kernel are you on?  The ns cgroup has not been available for
> > over a year.
> >
> >
> I am on 2.6.32. (RedHat Enterprise Linux 6)
> 
> > > Looks like a process is spawned in a new namespace but lxc fails to
> > > remove the cgroup directory.
> >
> > Can you show 'ps -ef'?   If you can identify the process that won't
> > die, can you see what it's doing (strace -f -o outfile -p $pid) ?
> >
> > My only guess would be that the container_reboot_supported() function,
> > which gets cloned, is for some reason not dying.  Except no, that
> > can't be it, because this change actually moves that clone() to the
> > monitor task, so it wouldn't be pinning the cgroup.
> >
> > Can you check whether your container is mounting a private /var or
> > /run?  My theory is that the initial task is never killed because you
> > are relying on the utmp watcher (being on an older kernel), and the
> > container is using a utmp that the monitor can't see.
> >
> I don't know much about the utmp watcher.  However, I see the following
> messages in the log.
> 
> $ /usr/local/bin/lxc-execute -n alpha -f n1.conf -l DEBUG -o /tmp/log -- /bin/sh
> 
> 
>     lxc-execute 1341623135.543 DEBUG    lxc_start - Dropping cap_sys_boot and watching utmp
> ...
> ...
>     lxc-execute 1341623135.584 DEBUG    lxc_utmp - Added '/proc/23213/root/var/run' to inotifywatch
>     lxc-execute 1341623135.584 WARN     lxc_start - invalid pid for SIGCHLD, siginfo.ssi_pid:23210, *pid:23213
> 
> This is while the container shell is still running.
> 
> $ file /proc/23210
> /proc/23210: cannot open `/proc/23210' (No such file or directory)
> 
> $ file /proc/23213
> /proc/23213: directory
> 
> $ cat /proc/23213/cmdline|less
> /usr/local/libexec/lxc/lxc-init^@--^@/bin/sh^@
> 
> 
> And I see two cgroups,
> 
> $ ls -ld /cgroup/alpha
> drwxr-xr-x 2 arunm users 0 Jul  7 06:35 /cgroup/alpha
> 
> $ cat /cgroup/alpha/tasks
> 23213
> 23217
> 
> $ cat /cgroup/23210/tasks
> [Nothing]
> 
> And after I exit the shell, /cgroup/23210 hangs around forever.
> 
> I don't see /var/run/utmp or a /run directory inside the container.

Ooooh!  I get it.

I'm pretty sure 23210 (as I started to guess above, but then decided
couldn't be the case) is the task cloned to test for reboot support.
Since we're cloning it into a new pid namespace, the ns cgroup creates
a new child cgroup for it.  That task exits immediately after testing
for reboot, and because you don't have a release agent, its new cgroup
is never deleted.
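
For the empty cgroups that have already piled up, something along these
lines could clear them out.  This is only a rough sketch: the function
name sweep_empty_pid_cgroups is made up for illustration, and the /cgroup
default is simply the mount point seen in this thread.  It only touches
directories with all-numeric names and no listed tasks.

```shell
#!/bin/sh
# Sketch: remove leftover, empty PID-named cgroups such as /cgroup/23210.
# The default root /cgroup matches the mount point used in this thread;
# sweep_empty_pid_cgroups is a name made up for this example.
sweep_empty_pid_cgroups() {
    root="${1:-/cgroup}"
    for dir in "$root"/*; do
        [ -d "$dir" ] || continue
        case "${dir##*/}" in
            ''|*[!0-9]*) continue ;;      # only touch all-numeric names
        esac
        # Skip any cgroup that still lists tasks.
        if [ -n "$(cat "$dir/tasks" 2>/dev/null)" ]; then
            continue
        fi
        # rmdir refuses to remove a cgroup with tasks or child cgroups.
        if rmdir "$dir" 2>/dev/null; then
            echo "removed $dir"
        fi
    done
    return 0
}
```

Sourcing this and running `sweep_empty_pid_cgroups /cgroup` as a user
with write access to the hierarchy should leave named cgroups such as
/cgroup/alpha alone.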

So I guess if the ns cgroup is mounted, lxc needs to delete that
cgroup itself.  In the meantime, you can work around it by using a
release agent.  If you're interested in writing a patch for that, I'll
be happy to help.
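
A release agent for this can be very small.  The following is a minimal
sketch, not something shipped with lxc: the CGROUP_ROOT variable and the
function name are assumptions for illustration, and /cgroup is just the
mount point from this thread.  The kernel runs the agent with the
emptied cgroup's path, relative to the hierarchy root, as its only
argument.

```shell
#!/bin/sh
# Sketch of a cgroup (v1) release agent. The kernel runs the program named
# in <hierarchy root>/release_agent with one argument: the path of the
# emptied cgroup relative to the hierarchy root (for example "/23210").
# CGROUP_ROOT is an assumption of this sketch; /cgroup matches the mount
# point used in this thread.
CGROUP_ROOT="${CGROUP_ROOT:-/cgroup}"

remove_released_cgroup() {
    rel="${1#/}"                      # drop the leading slash, if any
    [ -n "$rel" ] || return 1         # never try to remove the root itself
    case "$rel" in
        ..|../*|*/..|*/../*) return 1 ;;   # refuse paths that climb upward
    esac
    rmdir "$CGROUP_ROOT/$rel"         # rmdir only succeeds on an empty cgroup
}

# Run only when invoked with an argument, as the kernel would.
if [ "$#" -gt 0 ]; then
    remove_released_cgroup "$1"
fi
```

To use it, install the script somewhere (the path is up to you), write
that path into /cgroup/release_agent, and write 1 into
/cgroup/notify_on_release.  Cgroups created after that inherit the
notify_on_release setting from their parent, so the per-PID cgroup
should be removed once its last task exits.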

thanks,
-serge
