This looks to be the third case we have hit with the same symptom (and likely for yet another reason). The complicating issue is that lxc containers make use of net namespaces, and when those are released, references to (I think) the netdevice structures are temporarily moved over to the loopback device. So anything going wrong with respect to those references will show up like what you see (to put the mental note in writing: bug 1021471 and bug 1065434).
Since you say it takes 10-15 hours to hit, it feels like this could again be a case of something rare happening when the container is shut down, which then causes a reference not to be dropped. Right now the range between 3.5.0-27 (maybe?) and 3.8.0-25 is quite vast. And at least up to 3.10 we can assume it has not been detected/fixed. So it is unlikely to be something that will be easy to spot. I know it is a lot of effort, but it would be really important to narrow down the version delta. If possible, I would suggest using the mainline kernels to start off a rough manual bisection.

http://kernel.ubuntu.com/~kernel-ppa/mainline/

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1196295

Title:
  lxc-start enters uninterruptible sleep

Status in The Linux Kernel:
  Confirmed
Status in “linux” package in Ubuntu:
  Confirmed
Status in “lxc” package in Ubuntu:
  Confirmed

Bug description:
  After running and terminating around 6000 containers overnight, something happened on my box that is affecting every new LXC container I try to start.
  The DEBUG log file looks like:

  lxc-start 1372615570.399 WARN lxc_start - inherited fd 9
  lxc-start 1372615570.399 INFO lxc_apparmor - aa_enabled set to 1
  lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/302' (5/6)
  lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/303' (7/8)
  lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/304' (10/11)
  lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/305' (12/13)
  lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/306' (14/15)
  lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/307' (16/17)
  lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/308' (18/19)
  lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/309' (20/21)
  lxc-start 1372615570.399 INFO lxc_conf - tty's configured
  lxc-start 1372615570.399 DEBUG lxc_start - sigchild handler set
  lxc-start 1372615570.399 INFO lxc_start - 'vm-59' is initialized
  lxc-start 1372615570.404 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp
  lxc-start 1372615570.404 INFO lxc_start - stored saved_nic #0 idx 12392 name vethP59
  lxc-start 1372615570.404 INFO lxc_conf - opened /home/x/vm/vm-59.hold as fd 25

  It stops there. In 'ps faux', it looks like:

  root 31621 0.0 0.0 25572 1272 ? D 14:06 0:00 \_ lxc-start -n vm-59 -f /tmp/tmp.fG6T6ERZpS -l DEBUG -o /home/x/lxcdebug/vm-59.txt -- /usr/sbin/dropbear -F -E -m

  On a successful LXC run (prior to the server getting into this state), this hangs just before:

  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/' (rootfs)
  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys' (sysfs)
  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/proc' (proc)
  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/dev' (devtmpfs)
  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/dev/pts' (devpts)
  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/run' (tmpfs)
  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/' (btrfs)
  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys/fs/cgroup' (tmpfs)
  lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys/fs/cgroup/cpuset' (cgroup)
  lxc-start 1372394092.208 INFO lxc_cgroup - [1] found cgroup mounted at '/sys/fs/cgroup/cpuset',opts='rw,relatime,cpuset,clone_children'
  lxc-start 1372394092.208 DEBUG lxc_cgroup - get_init_cgroup: found init cgroup for subsys (null) at /

  It looks like a resource leak, but I'm not yet sure of what that would be. If it matters, I SIGKILL my lxc-start processes instead of using lxc-stop. Could that have any negative implications?

  Oh, and cgroups had almost 6000 entries for VMs that are long dead (I'm guessing it's due to my SIGKILL). I've run cgclear and my /sys/fs/cgroup/*/ dirs are now totally empty, but the new containers still hang.
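The "D" in the ps output above is the uninterruptible-sleep state. To check whether other tasks are stuck the same way, something like the following could be used. It is a minimal sketch, assuming a Linux /proc layout; parse_stat_line and find_uninterruptible are hypothetical helper names, not part of lxc or any tool mentioned in this report.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    hits = []
    if not os.path.isdir(proc_root):
        return hits  # not a Linux-style /proc; nothing to scan
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable while scanning
        if state == "D":
            hits.append((pid, comm))
    return hits


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

For any pid it reports, reading /proc/<pid>/stack as root (where the kernel exposes that file) may show what the task is blocked on, which could help confirm or rule out the netdevice reference theory.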
  ---
  Architecture: amd64
  DistroRelease: Ubuntu 13.04
  MarkForUpload: True
  Package: lxc 0.9.0-0ubuntu3.3
  PackageArchitecture: amd64
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  Uname: Linux 3.8.0-25-generic x86_64
  UserGroups:

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1196295/+subscriptions
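The rough manual bisection suggested in the comment could be tracked along these lines. This is only a sketch: the version list is illustrative (the actual build names under the mainline URL are not reproduced here), bisect_versions and is_bad are hypothetical names, and in practice is_bad means booting that kernel and running the container workload long enough (10-15 hours per the report) to see whether lxc-start hangs.

```python
def bisect_versions(versions, is_bad):
    """Narrow down the first bad entry in an ordered list of versions.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once the bug appears it stays present in every later version.
    Returns (first_bad_version, list_of_versions_tested).
    """
    good, bad = 0, len(versions) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(versions[mid])
        if is_bad(versions[mid]):
            bad = mid   # bug present: first bad version is here or earlier
        else:
            good = mid  # bug absent: first bad version is later
    return versions[bad], tested


# Illustrative candidates only, spanning the range named in the comment.
candidates = ["3.5", "3.6", "3.7", "3.8"]
```

Each round halves the remaining range, so even a long list of builds needs only a handful of the multi-hour test runs. Because a "good" verdict here only means "did not hang within the test window", it may be worth running longer than the observed 10-15 hours before marking a kernel good.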