[Bug 1196295] Re: lxc-start enters uninterruptible sleep
The last kernel I'm aware of that did not exhibit this issue was 3.5.0-27. I wish I had a simpler repro, though, since on our system it takes 10-15 hours of heavy processing to hit the uninterruptible sleeps. Could it be tracked by looking at the state of the OS? Once it happens, every new lxc-start ends up hanging. -- You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to lxc in Ubuntu. https://bugs.launchpad.net/bugs/1196295 Title: lxc-start enters uninterruptible sleep To manage notifications about this bug go to: https://bugs.launchpad.net/linux/+bug/1196295/+subscriptions -- Ubuntu-server-bugs mailing list Ubuntu-server-bugs@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
** Tags added: kernel-bug-exists-upstream
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
Managed to repro with v3.10-saucy last night. What do you guys suspect it could be? I'm keeping the server in this state for now if you'd like me to gather some data.
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
Changing to Confirmed as per instructions in comment #7.

** Changed in: linux
       Status: Incomplete => Confirmed
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
Reproduced even with lxc-stop.

dmesg:
[178420.689704] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178430.919783] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178441.149854] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178451.379920] unregister_netdevice: waiting for lo to become free. Usage count = 1

ps:
root 31536 0.0 0.0 25572 1276 ? D 06:35 0:00 lxc-start -n vm-106 -f /tmp/tmp.IsBlHPIMWw -l DEBUG -o /home/x/lxcdebug/vm-106.txt -- /usr/sbin/dropbear -F -E -m

lxcdebug/vm-106.txt:
lxc-start 1372761341.653 WARN lxc_start - inherited fd 9
lxc-start 1372761341.673 INFO lxc_apparmor - aa_enabled set to 1
lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/845' (5/6)
lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/846' (7/8)
lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/847' (10/11)
lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/848' (12/13)
lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/849' (14/15)
lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/850' (16/17)
lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/851' (18/19)
lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/852' (20/21)
lxc-start 1372761341.674 INFO lxc_conf - tty's configured
lxc-start 1372761341.674 DEBUG lxc_start - sigchild handler set
lxc-start 1372761341.674 INFO lxc_start - 'vm-106' is initialized
lxc-start 1372761341.675 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp
lxc-start 1372761341.676 DEBUG lxc_conf - mac address of host interface 'vethfS0zzk' changed to private fe:d6:f6:ea:af:ba
lxc-start 1372761341.676 DEBUG lxc_conf - instanciated veth 'vethfS0zzk/vethTtYxwJ', index is '23962'
lxc-start 1372761341.676 INFO lxc_conf - opened /home/x/vm/vm-106.hold as fd 25

- cgroups are clean: absolutely no vm-X folders under /sys/fs/cgroup/*/lxc/

lxc.network.type is now 'veth' as I've moved off of using 'phys', but the result is the same.

This happened after running lxc-start / lxc-stop somewhere between 7k and 15k times. That is a similar number to the one reported the last time this happened, which really suggests a leak of some sort.

I have the box in this state right now. What other details could be helpful?
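To quantify how often the kernel retries, the dmesg lines above can be summarized with a short script. This is only a sketch: the regular expression is written against the message format shown in this report, and the function names `stuck_netdev_events` and `summarize` are made up for illustration. It also works on kern.log text, since the syslog prefix is simply skipped.

```python
import re

# Matches lines such as:
# [178420.689704] unregister_netdevice: waiting for lo to become free. Usage count = 1
PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+unregister_netdevice: "
    r"waiting for (\S+) to become free\. Usage count = (\d+)"
)


def stuck_netdev_events(log_text):
    """Return (timestamp, device, usage_count) for every matching log line."""
    return [(float(ts), dev, int(count)) for ts, dev, count in PATTERN.findall(log_text)]


def summarize(events):
    """Summarize how many messages were seen and how far apart they are."""
    if not events:
        return None
    times = [ts for ts, _, _ in events]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "devices": sorted({dev for _, dev, _ in events}),
        "messages": len(events),
        "span_seconds": times[-1] - times[0],
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
    }


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    print(summarize(stuck_netdev_events(sys.stdin.read())))
```

On the four dmesg lines quoted above, the mean gap comes out at roughly 10.23 seconds, so the kernel appears to retry on a fixed interval while the reference on lo is never released.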
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
This should help.

kern.log:
Jul 2 05:41:32 server1 kernel: [136565.201601] device vethbJ4JsM left promiscuous mode
Jul 2 05:41:32 server1 kernel: [136565.201603] vmbr: port 5(vethbJ4JsM) entered disabled state
Jul 2 05:41:38 server1 kernel: [136570.551496] vmbr: port 2(veth49SiBX) entered forwarding state
Jul 2 05:41:38 server1 kernel: [136570.971787] device vethgEUinJ entered promiscuous mode
Jul 2 05:41:38 server1 kernel: [136570.971858] IPv6: ADDRCONF(NETDEV_UP): vethgEUinJ: link is not ready
Jul 2 05:41:39 server1 kernel: [136571.574489] vmbr: port 3(vethdl0Frj) entered forwarding state
Jul 2 05:41:42 server1 kernel: [136575.242996] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:41:42 server1 kernel: [136575.282886] vmbr: port 4(vethwdXg5p) entered forwarding state
Jul 2 05:41:44 server1 kernel: [136576.945295] vmbr: port 6(vethxUZwwG) entered forwarding state
Jul 2 05:41:47 server1 kernel: [136580.142190] vmbr: port 7(vethpuofJs) entered forwarding state
Jul 2 05:41:53 server1 kernel: [136585.473065] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:03 server1 kernel: [136595.703141] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:13 server1 kernel: [136605.933214] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:23 server1 kernel: [136616.163283] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:33 server1 kernel: [136626.237470] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:34 server1 kernel: [136627.221675] device vethkKcWdk entered promiscuous mode
Jul 2 05:42:34 server1 kernel: [136627.221727] IPv6: ADDRCONF(NETDEV_UP): vethkKcWdk: link is not ready
Jul 2 05:42:38 server1 kernel: [136630.800271] device veth5guMb3 entered promiscuous mode
Jul 2 05:42:38 server1 kernel: [136630.800341] IPv6: ADDRCONF(NETDEV_UP): veth5guMb3: link is not ready
Jul 2 05:42:44 server1 kernel: [136636.471570] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:54 server1 kernel: [136646.701652] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:04 server1 kernel: [136656.931720] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:14 server1 kernel: [136667.161755] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:25 server1 kernel: [136677.391864] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:34 server1 kernel: [136686.721504] device vethOk6Soj entered promiscuous mode
Jul 2 05:43:34 server1 kernel: [136686.721584] IPv6: ADDRCONF(NETDEV_UP): vethOk6Soj: link is not ready
Jul 2 05:43:35 server1 kernel: [136687.621938] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:38 server1 kernel: [136690.857458] device vethjxu2iI entered promiscuous mode
Jul 2 05:43:38 server1 kernel: [136690.857565] IPv6: ADDRCONF(NETDEV_UP): vethjxu2iI: link is not ready
Jul 2 05:43:45 server1 kernel: [136697.852009] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:55 server1 kernel: [136708.082040] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:44:05 server1 kernel: [136718.312155] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:44:16 server1 kernel: [136728.542232] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:44:26 server1 kernel: [136738.772298] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:44:34 server1 kernel: [136746.713370] device vethxTRBXO entered promiscuous mode
Jul 2 05:44:34 server1 kernel: [136746.713442] IPv6: ADDRCONF(NETDEV_UP): vethxTRBXO: link is not ready
Jul 2 05:44:36 server1 kernel: [136749.002369] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:44:38 server1 kernel: [136750.845172] device veth1ILXYc entered promiscuous mode
Jul 2 05:44:38 server1 kernel: [136750.845262] IPv6: ADDRCONF(NETDEV_UP): veth1ILXYc: link is not ready
Jul 2 05:44:46 server1 kernel: [136759.236438] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:44:57 server1 kernel: [136769.466512] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:45:07 server1 kernel: [136779.696578] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:45:15 server1 kernel: [136787.680886] INFO: task lxc-start:27612 blocked for more than 120 seconds.
Jul 2 05:45:15 server1 kernel: [136787.680925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 2 05:45:15 server1 kernel: [136787.680962] lxc-start D 88041ecd3f40 0 27612 19646 0x
Jul 2 05:45:15 server1 kernel: [136787.680967] 88010bd57d20
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
apport information

** Tags added: apport-collected

** Description changed:

+ ---
+ Architecture: amd64
+ DistroRelease: Ubuntu 13.04
+ MarkForUpload: True
+ Package: lxc 0.9.0-0ubuntu3.3
+ PackageArchitecture: amd64
+ ProcEnviron:
+  TERM=screen
+  PATH=(custom, no user)
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
+ Uname: Linux 3.8.0-25-generic x86_64
+ UserGroups:

** Attachment added: Dependencies.txt
   https://bugs.launchpad.net/bugs/1196295/+attachment/3722559/+files/Dependencies.txt
[Bug 1196295] HookError_generic.txt
apport information

** Attachment added: HookError_generic.txt
   https://bugs.launchpad.net/bugs/1196295/+attachment/3722561/+files/HookError_generic.txt
[Bug 1196295] HookError_cloud_archive.txt
apport information

** Attachment added: HookError_cloud_archive.txt
   https://bugs.launchpad.net/bugs/1196295/+attachment/3722560/+files/HookError_cloud_archive.txt
[Bug 1196295] HookError_ubuntu.txt
apport information

** Attachment added: HookError_ubuntu.txt
   https://bugs.launchpad.net/bugs/1196295/+attachment/3722564/+files/HookError_ubuntu.txt
[Bug 1196295] HookError_source_lxc.txt
apport information

** Attachment added: HookError_source_lxc.txt
   https://bugs.launchpad.net/bugs/1196295/+attachment/3722563/+files/HookError_source_lxc.txt
[Bug 1196295] HookError_source_linux.txt
apport information

** Attachment added: HookError_source_linux.txt
   https://bugs.launchpad.net/bugs/1196295/+attachment/3722562/+files/HookError_source_linux.txt
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
Looks like apport was missing some module to gather what it wanted. Let me know if this info would be valuable and I can re-run it.

** Changed in: lxc (Ubuntu)
       Status: Incomplete => Confirmed
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
Attaching a better apport file after installing the missing dependency. I will hide the ones from earlier as this will contain the same data and more.

** Attachment added: apport.lxc.txt
   https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1196295/+attachment/3722586/+files/apport.lxc.txt
[Bug 1196295] [NEW] lxc-start enters uninterruptible sleep
Public bug reported:

After running and terminating around 6000 containers overnight, something happened on my box that is affecting every new LXC container I try to start. The DEBUG log file looks like:

lxc-start 1372615570.399 WARN lxc_start - inherited fd 9
lxc-start 1372615570.399 INFO lxc_apparmor - aa_enabled set to 1
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/302' (5/6)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/303' (7/8)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/304' (10/11)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/305' (12/13)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/306' (14/15)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/307' (16/17)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/308' (18/19)
lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/309' (20/21)
lxc-start 1372615570.399 INFO lxc_conf - tty's configured
lxc-start 1372615570.399 DEBUG lxc_start - sigchild handler set
lxc-start 1372615570.399 INFO lxc_start - 'vm-59' is initialized
lxc-start 1372615570.404 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp
lxc-start 1372615570.404 INFO lxc_start - stored saved_nic #0 idx 12392 name vethP59
lxc-start 1372615570.404 INFO lxc_conf - opened /home/x/vm/vm-59.hold as fd 25

It stops there. In 'ps faux', it looks like:

root 31621 0.0 0.0 25572 1272 ? D 14:06 0:00 \_ lxc-start -n vm-59 -f /tmp/tmp.fG6T6ERZpS -l DEBUG -o /home/x/lxcdebug/vm-59.txt -- /usr/sbin/dropbear -F -E -m

On a successful LXC run (prior to the server getting into this state), the log continues with the lines below, so the hang occurs just before them:

lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/' (rootfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys' (sysfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/proc' (proc)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/dev' (devtmpfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/dev/pts' (devpts)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/run' (tmpfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/' (btrfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys/fs/cgroup' (tmpfs)
lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys/fs/cgroup/cpuset' (cgroup)
lxc-start 1372394092.208 INFO lxc_cgroup - [1] found cgroup mounted at '/sys/fs/cgroup/cpuset',opts='rw,relatime,cpuset,clone_children'
lxc-start 1372394092.208 DEBUG lxc_cgroup - get_init_cgroup: found init cgroup for subsys (null) at /

It looks like a resource leak, but I'm not yet sure of what that would be.

If it matters, I SIGKILL my lxc-start processes instead of using lxc-stop. Could that have any negative implications?

Oh, and cgroups had almost 6000 entries for VMs that are long dead (I'm guessing it's due to my SIGKILL). I've run cgclear and my /sys/fs/cgroup/*/ dirs are now totally empty, but the new containers still hang.

** Affects: lxc (Ubuntu)
     Importance: Undecided
         Status: New
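The "D" in the ps output above is the uninterruptible-sleep state. One way to spot such processes without eyeballing ps is to read the state field from /proc/<pid>/stat. The following is a minimal sketch, assuming a Linux /proc layout; the helper names `parse_stat` and `find_uninterruptible` are made up for illustration and are not part of lxc.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

Processes pass through state D briefly during normal I/O, so a single hit proves little. An lxc-start that shows up on every run of the script is the pattern described in this report.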
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
Also, in dmesg:

[54545.873460] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54556.103535] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54566.333609] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54576.563664] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54586.793749] unregister_netdevice: waiting for lo to become free. Usage count = 1

I've modified my code to use lxc-stop as the cgroups do indeed leak otherwise. What's strange is that it kept happening after clearing cgroups, so perhaps it's something else.
[Bug 1196295] Re: lxc-start enters uninterruptible sleep
Some basic environment details. I can post more if requested.

Ubuntu Server 13.04 64-bit

$ uname -r
3.8.0-25-generic

$ dpkg -l | grep lxc
ii  liblxc0  0.9.0-0ubuntu3.3  amd64  Linux Containers userspace tools (library)
ii  lxc      0.9.0-0ubuntu3.3  amd64  Linux Containers userspace tools

lxc.network.type = phys
lxc.network.flags = up
lxc.network.link = vethP0
lxc.network.ipv4 = 10.1.0.1
lxc.network.ipv4.gateway = 10.1.0.0
lxc.network.name = eth0

Prior to running lxc, I set up the interface pair as follows:

ip link add name vethH0 type veth peer name vethP0
ifconfig vethH0 10.1.0.0/31 up
route add -host 10.1.0.1 dev vethH0
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
Hey Serge, let me know if that repro worked for you or when you're planning to give it a try. I'm keeping the VM image around in case you need it. What's odd is that I can't even reproduce it with the daily ppa build, which doesn't have the workaround which is in the ubuntu package. Did you try it on a single-core machine/VM? Looks like you need 2 or more. -- You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to lxc in Ubuntu. https://bugs.launchpad.net/bugs/1168526 Title: race condition causing lxc to not detect container init process exit To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/+subscriptions -- Ubuntu-server-bugs mailing list Ubuntu-server-bugs@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
I can't try Saucy right now, but the repro instructions with kernel versions are in the original post and in #2. We've tried node v0.11.2 as well on Raring and got the repro.

Repro summary:
Install any of the above kernels, such as the one with the Raring installer, then install lxc from apt. Get node v0.11.0 - v0.11.2, or try any other recent version if that's easier, just not below v0.10. The lxc init process is
/usr/local/bin/node /boot/mio-init.js
which contains only
process.exit(0);
Try running 100 of these, whether in parallel or one at a time. Continue until lxc-start doesn't exit. Run ps faux and look for zombie children of lxc-start.

Let me know if that works for you :)
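The last step (looking for zombie children of lxc-start) can be automated when running many iterations. Here is a rough sketch that parses the output of `ps -eo pid,ppid,stat,comm`; the function names `zombie_children` and `snapshot` are hypothetical, and the column layout assumed is the one procps prints for those four fields.

```python
import subprocess


def zombie_children(ps_output, parent_name="lxc-start"):
    """Return (pid, comm) of zombie processes whose parent is parent_name."""
    procs = {}
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        procs[int(pid)] = (int(ppid), stat, comm)
    return [
        (pid, comm)
        for pid, (ppid, stat, comm) in procs.items()
        if stat.startswith("Z")            # Z marks a defunct (zombie) process
        and ppid in procs
        and procs[ppid][2] == parent_name  # parent command name must match
    ]


def snapshot():
    """Capture the current process table in the format zombie_children expects."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for pid, comm in zombie_children(snapshot()):
        print("zombie child of lxc-start:", pid, comm)
```

Running this after each batch of containers would flag the first iteration in which lxc-start fails to reap its init process.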
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
Sure, run these inside the container:

git clone https://github.com/joyent/node.git --depth 1
cd node
./configure
make -j9
sudo make install

Then the binary will be at /usr/local/bin/node

It's v0.11.3-pre, but should still repro.
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
I created a VM with Ubuntu Server 13.04 just for this bug. At first, I was able to run the steps outlined above 50 times with no issues. What was I missing? Concurrency! I rebooted the VM after adding 1 more core, and... bingo! Zombies on the 3rd try. The VM disk image I have here should be compatible with KVM, but give it a try on your Saucy VM with 2 or more cores. If you can't repro it then, I'll send you my repro VM :)
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
Hey Serge, were you able to get a reliable repro for this? I have a reason to upgrade to Raring, and this seems to be the only blocker. We've reproduced the issue with the stock Linux Mint 15.
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
I've also tried it with a C++ app very similar to yours and was unable to repro. There is something about having node.js as the init process running a process.exit(0); js. The init process (node v0.11.0) does exit as ps faux shows it as a zombie and a child of lxc-start. I went back to kernel 3.5.0-27 and lxc 0.8.0~rc1-4ubuntu39.12.10.2 for now as it seems to be the one without problems for our use case. The lxc I was using with 3.8.6 was 0.9.0-ubuntu2.
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
Btw, that queueing mode would simply mean not calling epoll_wait until the pid is available. This shouldn't require managing a queue ourselves. Can you think of anything that this would break? Or we could go with the patch you've written, although I haven't looked into why the problem appears to be back in 3.8.6 with the patch.
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
I should add that these "forwarded signal 2" lines are due to me pressing Ctrl+C and are not actually relevant. Have you been able to repro this bug on kernel 3.8.6?

I'm thinking about how to fix this. lxc_spawn is what obtains the pid, and lxc_poll needs that pid to listen for SIGCHLD from the correct process, but lxc_poll should logically start before lxc_spawn to avoid this race. How about starting lxc_poll first in a queueing mode, so it just accumulates the signals but doesn't process them yet? When

    handler->pid = lxc_clone(do_start, handler, handler->clone_flags);

returns, notify lxc_poll of the pid. It then has all the info it needs, so it can switch out of queueing mode and start a loop to process the queued events now that the pid is known.
[Bug 1168526] [NEW] race condition causing lxc to not detect container init process exit
Public bug reported:

For the purpose of the repro, my lxc init process is node.js v0.11.0 (built from source) with a single line:

    process.exit(0);

When running it in lxc, sometimes lxc doesn't exit. lxc-start remains a parent of a defunct node process without reaping it or exiting.

I've made a custom build of lxc 0.9.0 to extract more information about this, adding only an INFO line, as follows:

start.c:

    if (ret != sizeof(siginfo)) {
        ERROR("unexpected siginfo size");
        return -1;
    }

    +INFO("got signal %d from pid %d while expecting SIGCHLD(17) from pid %d | uid = %d, status = %d", siginfo.ssi_signo, siginfo.ssi_pid, *pid, siginfo.ssi_uid, siginfo.ssi_status);

    if (siginfo.ssi_signo != SIGCHLD) {
        kill(*pid, siginfo.ssi_signo);
        INFO("forwarded signal %d to pid %d", siginfo.ssi_signo, *pid);
        return 0;
    }

I've tried this with 3 official kernels. There is one difference in output.

Kernels 3.7.9, 3.8.6:

Successful case:

    lxc-start 1365724008.446 NOTICE lxc_start - '/usr/local/bin/node' started with pid '19458'
    lxc-start 1365724008.446 INFO lxc_console - no console will be used
    lxc-start 1365724008.446 INFO lxc_start - got signal 17 from pid 18165 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 1
    lxc-start 1365724008.446 WARN lxc_start - invalid pid for SIGCHLD
    lxc-start 1365724038.306 INFO lxc_start - got signal 17 from pid 19458 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 0
    lxc-start 1365724038.306 DEBUG lxc_start - container init process exited

Hanging case:

    lxc-start 1365795195.358 NOTICE lxc_start - '/usr/local/bin/node' started with pid '8650'
    lxc-start 1365795195.358 INFO lxc_console - no console will be used
    lxc-start 1365795195.358 INFO lxc_start - got signal 17 from pid 8626 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 1
    lxc-start 1365795195.358 WARN lxc_start - invalid pid for SIGCHLD
    lxc-start 1365795333.347 INFO lxc_start - got signal 2 from pid 0 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 0
    lxc-start 1365795333.347 INFO lxc_start - forwarded signal 2 to pid 8650

Kernel 3.9.0-rc6:

Successful case is the same, but the hanging case changes to just:

    lxc-start 1365794343.870 NOTICE lxc_start - '/usr/local/bin/node' started with pid '3432'
    lxc-start 1365794343.870 INFO lxc_console - no console will be used
    lxc-start 1365794343.870 INFO lxc_start - got signal 17 from pid 2851 while expecting SIGCHLD(17) from pid 3432 | uid = 0, status = 1
    lxc-start 1365794343.870 WARN lxc_start - invalid pid for SIGCHLD

... without forwarding signal 2 (SIGINT).

Notes:
- I'm on Mint 14 Nadia with raring packages, if that helps.
- In all cases, there is a signal 17 (SIGCHLD) coming in to lxc-start, but it comes from a different pid and is ignored by lxc. Any idea what this could be? That process seems to have been cleaned up and no longer appears in ps aux.
- The lxc-start process should be getting notified with a SIGCHLD from the child's pid when the child (init process) exits.
- This could be a kernel bug, but it's probably something unique that lxc is doing to trigger it.
- I've tried other init processes (node.js without the process.exit, and a custom C++ app with a stdout write and exit 0), which greatly reduce the frequency of this happening.

** Affects: lxc (Ubuntu)
     Importance: Undecided
         Status: New
[Bug 1168526] Re: race condition causing lxc to not detect container init process exit
Precisely which version of lxc were you using? I just put back version 0.9.0-0ubuntu2 (as opposed to the 0.9.0 I built from source) while on kernel 3.7.9-030709-generic and haven't yet run into this issue (I assume that version includes the patch you mentioned). However, when I updated to kernel 3.8.6-030806-generic (leaving the same version of lxc), 12 out of 100 containers experienced what looks like this exact problem:

    lxc-start 1365805078.586 NOTICE lxc_start - '/usr/local/bin/node' started with pid '4107'
    lxc-start 1365805078.586 INFO lxc_console - no console will be used
    lxc-start 1365805078.586 WARN lxc_start - invalid pid for SIGCHLD
    lxc-start 1365805115.998 INFO lxc_start - forwarded signal 2 to pid 4107