[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-17 Thread Pavel Bennett
The last kernel I'm aware of that did not exhibit this issue was 3.5.0-27.
I wish I had a simpler repro though, since on our system it takes 10-15
hours of heavy processing to hit the uninterruptible sleeps.

Could it be tracked by looking at the state of the OS? Every new
lxc-start ends up hanging after it happens.
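
One way the OS-state check asked about above could look, as a minimal sketch: scan /proc for processes whose state letter is D (uninterruptible sleep). The helper names parse_stat and find_uninterruptible are hypothetical and not part of LXC; the only thing assumed about the system is the Linux /proc/<pid>/stat layout, where the state letter follows the parenthesised command name.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first '(' and the last ')'.
    """
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited between listdir() and open(), or the
            # line was not in the expected format; skip it.
            continue
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling find_uninterruptible() periodically and alerting when an lxc-start entry shows up would flag the bad state as soon as the first container hangs, rather than after the fact.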

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1196295

Title:
  lxc-start enters uninterruptible sleep

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1196295/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-05 Thread Pavel Bennett
** Tags added: kernel-bug-exists-upstream



[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-05 Thread Pavel Bennett
Managed to repro with v3.10-saucy last night.

What do you guys suspect it could be?

I'm keeping the server in this state for now if you'd like me to gather
some data.



[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-03 Thread Pavel Bennett
Changing to Confirmed as per instructions in comment #7

** Changed in: linux
   Status: Incomplete => Confirmed



[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
Reproduced even with lxc-stop.

dmesg:

[178420.689704] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178430.919783] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178441.149854] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178451.379920] unregister_netdevice: waiting for lo to become free. Usage count = 1
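
To put numbers on the dmesg output above, here is a rough sketch that extracts every "unregister_netdevice: waiting for ..." message from a log dump and reports how many there are and how far apart they arrive. The names WAIT_RE, parse_waits and summarize are hypothetical; the pattern is written only against the message text quoted in this thread and matches whitespace loosely, since mail clients wrap the long lines in different places.

```python
import re

# Matches the kernel message quoted above. Whitespace is matched loosely
# because the long lines are often wrapped when pasted into mail.
WAIT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+unregister_netdevice:\s+waiting\s+for\s+"
    r"(?P<dev>\S+)\s+to\s+become\s+free\.\s+Usage\s+count\s*=\s*(?P<count>\d+)"
)


def parse_waits(log_text):
    """Return a list of (timestamp_seconds, device, usage_count) tuples."""
    return [
        (float(m.group("ts")), m.group("dev"), int(m.group("count")))
        for m in WAIT_RE.finditer(log_text)
    ]


def summarize(log_text):
    """Summarise how long the kernel has been waiting and how often it logs."""
    waits = parse_waits(log_text)
    if not waits:
        return None
    stamps = [ts for ts, _, _ in waits]
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return {
        "messages": len(waits),
        "devices": sorted({dev for _, dev, _ in waits}),
        "span_seconds": stamps[-1] - stamps[0],
        "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
    }
```

Feeding it the text of dmesg or kern.log gives a quick view of when the first message appeared, which may help line the hang up against container start and stop times.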

ps:

root     31536  0.0  0.0  25572  1276 ?        D    06:35   0:00 lxc-start -n vm-106 -f /tmp/tmp.IsBlHPIMWw -l DEBUG -o /home/x/lxcdebug/vm-106.txt -- /usr/sbin/dropbear -F -E -m

lxcdebug/vm-106.txt:

  lxc-start 1372761341.653 WARN     lxc_start - inherited fd 9
  lxc-start 1372761341.673 INFO     lxc_apparmor - aa_enabled set to 1

  lxc-start 1372761341.674 DEBUG    lxc_conf - allocated pty '/dev/pts/845' (5/6)
  lxc-start 1372761341.674 DEBUG    lxc_conf - allocated pty '/dev/pts/846' (7/8)
  lxc-start 1372761341.674 DEBUG    lxc_conf - allocated pty '/dev/pts/847' (10/11)
  lxc-start 1372761341.674 DEBUG    lxc_conf - allocated pty '/dev/pts/848' (12/13)
  lxc-start 1372761341.674 DEBUG    lxc_conf - allocated pty '/dev/pts/849' (14/15)
  lxc-start 1372761341.674 DEBUG    lxc_conf - allocated pty '/dev/pts/850' (16/17)
  lxc-start 1372761341.674 DEBUG    lxc_conf - allocated pty '/dev/pts/851' (18/19)
  lxc-start 1372761341.674 DEBUG    lxc_conf - allocated pty '/dev/pts/852' (20/21)
  lxc-start 1372761341.674 INFO     lxc_conf - tty's configured
  lxc-start 1372761341.674 DEBUG    lxc_start - sigchild handler set
  lxc-start 1372761341.674 INFO     lxc_start - 'vm-106' is initialized
  lxc-start 1372761341.675 DEBUG    lxc_start - Not dropping cap_sys_boot or watching utmp

  lxc-start 1372761341.676 DEBUG    lxc_conf - mac address of host interface 'vethfS0zzk' changed to private fe:d6:f6:ea:af:ba
  lxc-start 1372761341.676 DEBUG    lxc_conf - instanciated veth 'vethfS0zzk/vethTtYxwJ', index is '23962'
  lxc-start 1372761341.676 INFO     lxc_conf - opened /home/x/vm/vm-106.hold as fd 25
-

cgroups are clean -- absolutely no vm-X folders under
/sys/fs/cgroup/*/lxc/
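
The "cgroups are clean" check above can be scripted. The sketch below lists any per-container directories still present under /sys/fs/cgroup/*/lxc/; the function name leftover_lxc_cgroups is hypothetical, and it assumes the cgroup layout described in this thread (one directory per container beneath each subsystem's lxc/ directory).

```python
import glob
import os


def leftover_lxc_cgroups(cgroup_root="/sys/fs/cgroup"):
    """Return per-container directories still present under */lxc/.

    An empty result corresponds to the "clean" state described above.
    Plain files (such as control files) are ignored; only directories count.
    """
    pattern = os.path.join(cgroup_root, "*", "lxc", "*")
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))
```

Running it before and after each batch of containers would show whether cgroup directories pile up, independently of whether the network hang has happened yet.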

lxc.network.type is now 'veth' as I've moved off of using 'phys', but
the result is the same.

This happened after lxc-start / lxc-stop somewhere between 7k and 15k
times. Similar number as reported last time this happened, which really
suggests a leak of some sort.

I have the box in this state right now.

What other details could be helpful?



[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
This should help.

kern.log:

Jul  2 05:41:32 server1 kernel: [136565.201601] device vethbJ4JsM left promiscuous mode
Jul  2 05:41:32 server1 kernel: [136565.201603] vmbr: port 5(vethbJ4JsM) entered disabled state
Jul  2 05:41:38 server1 kernel: [136570.551496] vmbr: port 2(veth49SiBX) entered forwarding state
Jul  2 05:41:38 server1 kernel: [136570.971787] device vethgEUinJ entered promiscuous mode
Jul  2 05:41:38 server1 kernel: [136570.971858] IPv6: ADDRCONF(NETDEV_UP): vethgEUinJ: link is not ready
Jul  2 05:41:39 server1 kernel: [136571.574489] vmbr: port 3(vethdl0Frj) entered forwarding state
Jul  2 05:41:42 server1 kernel: [136575.242996] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:41:42 server1 kernel: [136575.282886] vmbr: port 4(vethwdXg5p) entered forwarding state
Jul  2 05:41:44 server1 kernel: [136576.945295] vmbr: port 6(vethxUZwwG) entered forwarding state
Jul  2 05:41:47 server1 kernel: [136580.142190] vmbr: port 7(vethpuofJs) entered forwarding state
Jul  2 05:41:53 server1 kernel: [136585.473065] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:42:03 server1 kernel: [136595.703141] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:42:13 server1 kernel: [136605.933214] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:42:23 server1 kernel: [136616.163283] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:42:33 server1 kernel: [136626.237470] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:42:34 server1 kernel: [136627.221675] device vethkKcWdk entered promiscuous mode
Jul  2 05:42:34 server1 kernel: [136627.221727] IPv6: ADDRCONF(NETDEV_UP): vethkKcWdk: link is not ready
Jul  2 05:42:38 server1 kernel: [136630.800271] device veth5guMb3 entered promiscuous mode
Jul  2 05:42:38 server1 kernel: [136630.800341] IPv6: ADDRCONF(NETDEV_UP): veth5guMb3: link is not ready
Jul  2 05:42:44 server1 kernel: [136636.471570] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:42:54 server1 kernel: [136646.701652] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:43:04 server1 kernel: [136656.931720] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:43:14 server1 kernel: [136667.161755] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:43:25 server1 kernel: [136677.391864] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:43:34 server1 kernel: [136686.721504] device vethOk6Soj entered promiscuous mode
Jul  2 05:43:34 server1 kernel: [136686.721584] IPv6: ADDRCONF(NETDEV_UP): vethOk6Soj: link is not ready
Jul  2 05:43:35 server1 kernel: [136687.621938] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:43:38 server1 kernel: [136690.857458] device vethjxu2iI entered promiscuous mode
Jul  2 05:43:38 server1 kernel: [136690.857565] IPv6: ADDRCONF(NETDEV_UP): vethjxu2iI: link is not ready
Jul  2 05:43:45 server1 kernel: [136697.852009] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:43:55 server1 kernel: [136708.082040] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:44:05 server1 kernel: [136718.312155] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:44:16 server1 kernel: [136728.542232] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:44:26 server1 kernel: [136738.772298] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:44:34 server1 kernel: [136746.713370] device vethxTRBXO entered promiscuous mode
Jul  2 05:44:34 server1 kernel: [136746.713442] IPv6: ADDRCONF(NETDEV_UP): vethxTRBXO: link is not ready
Jul  2 05:44:36 server1 kernel: [136749.002369] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:44:38 server1 kernel: [136750.845172] device veth1ILXYc entered promiscuous mode
Jul  2 05:44:38 server1 kernel: [136750.845262] IPv6: ADDRCONF(NETDEV_UP): veth1ILXYc: link is not ready
Jul  2 05:44:46 server1 kernel: [136759.236438] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:44:57 server1 kernel: [136769.466512] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:45:07 server1 kernel: [136779.696578] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul  2 05:45:15 server1 kernel: [136787.680886] INFO: task lxc-start:27612 blocked for more than 120 seconds.
Jul  2 05:45:15 server1 kernel: [136787.680925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  2 05:45:15 server1 kernel: [136787.680962] lxc-start   D 88041ecd3f40 0 27612  19646 0x
Jul  2 05:45:15 server1 kernel: [136787.680967]  88010bd57d20 

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
apport information

** Tags added: apport-collected

** Description changed:

  After running and terminating around 6000 containers overnight,
  something happened on my box that is affecting every new LXC container I
  try to start. The DEBUG log file looks like:
  
    lxc-start 1372615570.399 WARN     lxc_start - inherited fd 9
    lxc-start 1372615570.399 INFO     lxc_apparmor - aa_enabled set to 1
  
    lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/302' (5/6)
    lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/303' (7/8)
    lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/304' (10/11)
    lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/305' (12/13)
    lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/306' (14/15)
    lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/307' (16/17)
    lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/308' (18/19)
    lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/309' (20/21)
    lxc-start 1372615570.399 INFO     lxc_conf - tty's configured
    lxc-start 1372615570.399 DEBUG    lxc_start - sigchild handler set
    lxc-start 1372615570.399 INFO     lxc_start - 'vm-59' is initialized
    lxc-start 1372615570.404 DEBUG    lxc_start - Not dropping cap_sys_boot or watching utmp
  
    lxc-start 1372615570.404 INFO     lxc_start - stored saved_nic #0 idx 12392 name vethP59
  
    lxc-start 1372615570.404 INFO     lxc_conf - opened /home/x/vm/vm-59.hold as fd 25
  
  It stops there. In 'ps faux', it looks like:
  
  root     31621  0.0  0.0  25572  1272 ?        D    14:06   0:00  \_ lxc-start -n vm-59 -f /tmp/tmp.fG6T6ERZpS -l DEBUG -o /home/x/lxcdebug/vm-59.txt -- /usr/sbin/dropbear -F -E -m
  
  On a successful LXC run (prior to the server getting into this state),
  the log continues with the lines below, so the hang occurs just before:
  
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/' (rootfs)
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/sys' (sysfs)
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/proc' (proc)
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/dev' (devtmpfs)
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/dev/pts' (devpts)
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/run' (tmpfs)
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/' (btrfs)
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/sys/fs/cgroup' (tmpfs)
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/sys/fs/cgroup/cpuset' (cgroup)
    lxc-start 1372394092.208 INFO     lxc_cgroup - [1] found cgroup mounted at '/sys/fs/cgroup/cpuset',opts='rw,relatime,cpuset,clone_children'
    lxc-start 1372394092.208 DEBUG    lxc_cgroup - get_init_cgroup: found init cgroup for subsys (null) at /
  
  It looks like a resource leak, but I'm not yet sure of what that would
  be.
  
  If it matters, I SIGKILL my lxc-start processes instead of using
  lxc-stop. Could that have any negative implications?
  
- Oh, and cgroups had almost 6000 entries for VMs that are long dead (I'm
- guessing it's due to my SIGKILL). I've run cgclear and my
- /sys/fs/cgroup/*/ dirs are now totally empty, but the new containers
- still hang.
+ Oh, and cgroups had almost 6000 entries for VMs that are long dead (I'm guessing it's due to my SIGKILL). I've run cgclear and my /sys/fs/cgroup/*/ dirs are now totally empty, but the new containers still hang.
+ --- 
+ Architecture: amd64
+ DistroRelease: Ubuntu 13.04
+ MarkForUpload: True
+ Package: lxc 0.9.0-0ubuntu3.3
+ PackageArchitecture: amd64
+ ProcEnviron:
+  TERM=screen
+  PATH=(custom, no user)
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
+ Uname: Linux 3.8.0-25-generic x86_64
+ UserGroups:

** Attachment added: Dependencies.txt
   
https://bugs.launchpad.net/bugs/1196295/+attachment/3722559/+files/Dependencies.txt



[Bug 1196295] HookError_generic.txt

2013-07-02 Thread Pavel Bennett
apport information

** Attachment added: HookError_generic.txt
   
https://bugs.launchpad.net/bugs/1196295/+attachment/3722561/+files/HookError_generic.txt



[Bug 1196295] HookError_cloud_archive.txt

2013-07-02 Thread Pavel Bennett
apport information

** Attachment added: HookError_cloud_archive.txt
   
https://bugs.launchpad.net/bugs/1196295/+attachment/3722560/+files/HookError_cloud_archive.txt



[Bug 1196295] HookError_ubuntu.txt

2013-07-02 Thread Pavel Bennett
apport information

** Attachment added: HookError_ubuntu.txt
   
https://bugs.launchpad.net/bugs/1196295/+attachment/3722564/+files/HookError_ubuntu.txt



[Bug 1196295] HookError_source_lxc.txt

2013-07-02 Thread Pavel Bennett
apport information

** Attachment added: HookError_source_lxc.txt
   
https://bugs.launchpad.net/bugs/1196295/+attachment/3722563/+files/HookError_source_lxc.txt



[Bug 1196295] HookError_source_linux.txt

2013-07-02 Thread Pavel Bennett
apport information

** Attachment added: HookError_source_linux.txt
   
https://bugs.launchpad.net/bugs/1196295/+attachment/3722562/+files/HookError_source_linux.txt



[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
Looks like apport was missing some module to gather what it wanted. Let
me know if this info would be valuable and I can re-run it.

** Changed in: lxc (Ubuntu)
   Status: Incomplete => Confirmed



[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
Attaching a better apport file after installing the missing dependency.
I will hide the ones from earlier as this will contain the same data and
more.


** Attachment added: apport.lxc.txt
   
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1196295/+attachment/3722586/+files/apport.lxc.txt



[Bug 1196295] [NEW] lxc-start enters uninterruptible sleep

2013-06-30 Thread Pavel Bennett
Public bug reported:

After running and terminating around 6000 containers overnight,
something happened on my box that is affecting every new LXC container I
try to start. The DEBUG log file looks like:

  lxc-start 1372615570.399 WARN     lxc_start - inherited fd 9
  lxc-start 1372615570.399 INFO     lxc_apparmor - aa_enabled set to 1

  lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/302' (5/6)
  lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/303' (7/8)
  lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/304' (10/11)
  lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/305' (12/13)
  lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/306' (14/15)
  lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/307' (16/17)
  lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/308' (18/19)
  lxc-start 1372615570.399 DEBUG    lxc_conf - allocated pty '/dev/pts/309' (20/21)
  lxc-start 1372615570.399 INFO     lxc_conf - tty's configured
  lxc-start 1372615570.399 DEBUG    lxc_start - sigchild handler set
  lxc-start 1372615570.399 INFO     lxc_start - 'vm-59' is initialized
  lxc-start 1372615570.404 DEBUG    lxc_start - Not dropping cap_sys_boot or watching utmp

  lxc-start 1372615570.404 INFO     lxc_start - stored saved_nic #0 idx 12392 name vethP59

  lxc-start 1372615570.404 INFO     lxc_conf - opened /home/x/vm/vm-59.hold as fd 25

It stops there. In 'ps faux', it looks like:

root     31621  0.0  0.0  25572  1272 ?        D    14:06   0:00  \_ lxc-start -n vm-59 -f /tmp/tmp.fG6T6ERZpS -l DEBUG -o /home/x/lxcdebug/vm-59.txt -- /usr/sbin/dropbear -F -E -m

On a successful LXC run (prior to the server getting into this state),
the log continues with the lines below, so the hang occurs just before:

  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/' (rootfs)
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/sys' (sysfs)
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/proc' (proc)
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/dev' (devtmpfs)
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/dev/pts' (devpts)
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/run' (tmpfs)
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/' (btrfs)
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/sys/fs/cgroup' (tmpfs)
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - checking '/sys/fs/cgroup/cpuset' (cgroup)
  lxc-start 1372394092.208 INFO     lxc_cgroup - [1] found cgroup mounted at '/sys/fs/cgroup/cpuset',opts='rw,relatime,cpuset,clone_children'
  lxc-start 1372394092.208 DEBUG    lxc_cgroup - get_init_cgroup: found init cgroup for subsys (null) at /

It looks like a resource leak, but I'm not yet sure of what that would
be.

If it matters, I SIGKILL my lxc-start processes instead of using
lxc-stop. Could that have any negative implications?

Oh, and cgroups had almost 6000 entries for VMs that are long dead (I'm
guessing it's due to my SIGKILL). I've run cgclear and my
/sys/fs/cgroup/*/ dirs are now totally empty, but the new containers
still hang.

** Affects: lxc (Ubuntu)
 Importance: Undecided
 Status: New



[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-06-30 Thread Pavel Bennett
Also, in dmesg:

[54545.873460] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54556.103535] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54566.333609] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54576.563664] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54586.793749] unregister_netdevice: waiting for lo to become free. Usage count = 1

I've modified my code to use lxc-stop as the cgroups do indeed leak
otherwise. What's strange is that it kept happening after clearing
cgroups, so perhaps it's something else.



[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-06-30 Thread Pavel Bennett
Some basic environment details. I can post more if requested.

Ubuntu Server 13.04 64-bit

$ uname -r
3.8.0-25-generic

$ dpkg -l | grep lxc
ii  liblxc0  0.9.0-0ubuntu3.3  amd64  Linux Containers userspace tools (library)
ii  lxc      0.9.0-0ubuntu3.3  amd64  Linux Containers userspace tools

lxc.network.type = phys
lxc.network.flags = up
lxc.network.link = vethP0
lxc.network.ipv4 = 10.1.0.1
lxc.network.ipv4.gateway = 10.1.0.0
lxc.network.name = eth0

Prior to running lxc, I set up the interface pair as follows:

ip link add name vethH0 type veth peer name vethP0
ifconfig vethH0 10.1.0.0/31 up
route add -host 10.1.0.1 dev vethH0
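
As a small sanity check on the addressing above: 10.1.0.0/31 is a two-address point-to-point network, so the host end (vethH0, 10.1.0.0) and the container end (lxc.network.ipv4 = 10.1.0.1) are its only two members. The sketch below derives that pair from the CIDR; the function name point_to_point_pair is hypothetical and only illustrates the arithmetic.

```python
import ipaddress


def point_to_point_pair(cidr):
    """Return the (host_side, container_side) addresses of an IPv4 /31."""
    network = ipaddress.ip_network(cidr)
    if network.version != 4 or network.prefixlen != 31:
        raise ValueError("expected an IPv4 /31 network, got %s" % cidr)
    # A /31 contains exactly two addresses; iterating yields both in order.
    first, second = list(network)
    return str(first), str(second)
```

For the setup in this message, point_to_point_pair("10.1.0.0/31") gives the host-side address first and the container-side address second.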



[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-28 Thread Pavel Bennett
Hey Serge, let me know if that repro worked for you or when you're
planning to give it a try. I'm keeping the VM image around in case you
need it.

> What's odd is that I can't even reproduce it with the daily ppa build,
> which doesn't have the workaround which is in the ubuntu package.

Did you try it on a single-core machine/VM? Looks like you need 2 or
more.

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1168526

Title:
  race condition causing lxc to not detect container init process exit

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
I can't try Saucy right now, but the repro instructions with kernel
versions are in the original post and in #2.

We've tried node v0.11.2 as well on Raring and got the repro.

Repro summary:

Install any of the above kernels (for example, the one that comes with the Raring installer), then install lxc from apt.
Get node v0.11.0 - v0.11.2, or try any other recent version if that's easier, just not below v0.10.
Set the lxc init process to /usr/local/bin/node /boot/mio-init.js, where mio-init.js contains only process.exit(0);
Try running 100 of these, whether in parallel or one at a time, and continue until lxc-start doesn't exit.
Run ps faux and look for zombie children of lxc-start.
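
The last step (looking for zombie children of lxc-start) can be automated. This is a rough sketch that assumes the input is the text printed by `ps -eo pid,ppid,stat,comm`; the function name zombie_children is hypothetical.

```python
def zombie_children(ps_text, parent_comm="lxc-start"):
    """Return PIDs of zombie processes whose parent is named parent_comm.

    ps_text is expected to be the output of `ps -eo pid,ppid,stat,comm`.
    """
    rows = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            rows.append((int(fields[0]), int(fields[1]), fields[2], fields[3]))
    # PIDs of every process whose command name matches the parent we care about.
    parents = {pid for pid, _, _, comm in rows if comm.strip() == parent_comm}
    return [
        pid
        for pid, ppid, stat, _ in rows
        if stat.startswith("Z") and ppid in parents
    ]
```

A non-empty result while lxc-start is still running would match the symptom described in this bug: the init process has exited, but lxc-start has not noticed.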

Let me know if that works for you :)



[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
Sure, run these inside the container:

git clone https://github.com/joyent/node.git --depth 1

cd node
./configure
make -j9
sudo make install

Then the binary will be at /usr/local/bin/node

It's v0.11.3-pre, but should still repro.



[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
I created a VM with Ubuntu Server 13.04 just for this bug. At first, I
was able to run the steps outlined above 50 times with no issues. What
was I missing? Concurrency! I rebooted the VM after adding 1 more core,
and... bingo! Zombies on the 3rd try.

The VM disk image I have here should be compatible with KVM, but give it
a try on your Saucy VM with 2 or more cores. If you can't repro it then,
I'll send you my repro VM :)



[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-20 Thread Pavel Bennett
Hey Serge, were you able to get a reliable repro for this? I have a
reason to upgrade to Raring, and this seems to be the only blocker.
We've reproduced the issue with the stock Linux Mint 15.



[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-20 Thread Pavel Bennett
I've also tried it with a C++ app very similar to yours and was unable
to repro. There is something about having node.js as the init process
running a one-line process.exit(0); script. The init process (node
v0.11.0) does exit, as "ps faux" shows it as a zombie and a child of
lxc-start.

I went back to kernel 3.5.0-27 and lxc 0.8.0~rc1-4ubuntu39.12.10.2 for
now, as that combination seems to be the one without problems for our use
case. The lxc I was using with 3.8.6 was 0.9.0-0ubuntu2.



[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-15 Thread Pavel Bennett
Btw, that queueing mode would simply mean not calling epoll_wait until
the pid is available. This shouldn't require managing a queue ourselves.
Can you think of anything that this would break?

Or we could go with the patch you've written, although I haven't looked
into why the problem appears to be back in 3.8.6 with the patch.



[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-13 Thread Pavel Bennett
I should add that these "forwarded signal 2" lines are due to me
pressing Ctrl+C and are not actually relevant.

Have you been able to repro this bug on kernel 3.8.6?

I'm thinking about how to fix this. lxc_spawn is what obtains the pid,
and lxc_poll needs that pid to listen for SIGCHLD from the correct
process, but lxc_poll should logically be set up before lxc_spawn to
avoid this race.

How about starting lxc_poll first in a queueing mode, so that it just
accumulates the signals but doesn't process them yet? When handler->pid
= lxc_clone(do_start, handler, handler->clone_flags); returns, notify
lxc_poll of the pid. It then has all the info it needs, so it can switch
out of queueing mode and start a loop that processes the queued events
now that the pid is known.



[Bug 1168526] [NEW] race condition causing lxc to not detect container init process exit

2013-04-12 Thread Pavel Bennett
Public bug reported:

For the purpose of the repro, my lxc init process is node.js v0.11.0
(built from source) with a single line:

process.exit(0);

When running it in lxc, sometimes lxc doesn't exit. lxc-start remains a
parent of a defunct node process without reaping it or exiting.

I've made a custom build of lxc 0.9.0 to extract more information about
this, adding only an INFO line, as follows:

start.c:

    if (ret != sizeof(siginfo)) {
        ERROR("unexpected siginfo size");
        return -1;
    }
+   INFO("got signal %d from pid %d while expecting SIGCHLD(17) from pid %d | uid = %d, status = %d",
+        siginfo.ssi_signo, siginfo.ssi_pid, *pid,
+        siginfo.ssi_uid, siginfo.ssi_status);

    if (siginfo.ssi_signo != SIGCHLD) {
        kill(*pid, siginfo.ssi_signo);
        INFO("forwarded signal %d to pid %d", siginfo.ssi_signo, *pid);
        return 0;
    }

I've tried this with 3 official kernels. There is one difference in the
output.

Kernels 3.7.9, 3.8.6:

Successful case:

  lxc-start 1365724008.446 NOTICE   lxc_start - '/usr/local/bin/node' started with pid '19458'
  lxc-start 1365724008.446 INFO     lxc_console - no console will be used
  lxc-start 1365724008.446 INFO     lxc_start - got signal 17 from pid 18165 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 1
  lxc-start 1365724008.446 WARN     lxc_start - invalid pid for SIGCHLD
  lxc-start 1365724038.306 INFO     lxc_start - got signal 17 from pid 19458 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 0
  lxc-start 1365724038.306 DEBUG    lxc_start - container init process exited

Hanging case:

  lxc-start 1365795195.358 NOTICE   lxc_start - '/usr/local/bin/node' started with pid '8650'
  lxc-start 1365795195.358 INFO     lxc_console - no console will be used
  lxc-start 1365795195.358 INFO     lxc_start - got signal 17 from pid 8626 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 1
  lxc-start 1365795195.358 WARN     lxc_start - invalid pid for SIGCHLD
  lxc-start 1365795333.347 INFO     lxc_start - got signal 2 from pid 0 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 0
  lxc-start 1365795333.347 INFO     lxc_start - forwarded signal 2 to pid 8650

Kernel 3.9.0-rc6:

Successful case is the same, but the hanging case changes to just:

  lxc-start 1365794343.870 NOTICE   lxc_start - '/usr/local/bin/node' started with pid '3432'
  lxc-start 1365794343.870 INFO     lxc_console - no console will be used
  lxc-start 1365794343.870 INFO     lxc_start - got signal 17 from pid 2851 while expecting SIGCHLD(17) from pid 3432 | uid = 0, status = 1
  lxc-start 1365794343.870 WARN     lxc_start - invalid pid for SIGCHLD

... without forwarding signal 2 (SIGINT).

Notes:
- I'm on Mint 14 Nadia with raring packages, if that helps.
- In all cases, there is signal 17 (SIGCHLD) coming in to lxc-start, but it 
comes from a different pid and is ignored by lxc. Any idea what this could be? 
This process seems to have been cleaned up and no longer appears in ps aux.
- The lxc-start process should be getting notified with a SIGCHLD from the 
child's pid when the child (init process) exits.
- This could be a kernel bug, but it's probably something unique that lxc is 
doing to trigger it.
- I've tried other init processes (node.js without the process.exit and a 
custom C++ app with a stdout write and exit 0), both of which greatly reduce 
the frequency of this happening.

** Affects: lxc (Ubuntu)
 Importance: Undecided
 Status: New



[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-12 Thread Pavel Bennett
> Precisely which version of lxc were you using?

I just put back version 0.9.0-0ubuntu2 (as opposed to the 0.9.0 I built
from source) while on kernel 3.7.9-030709-generic and haven't yet run
into this issue (I assume that's the patch you mentioned). However, when
I updated to kernel 3.8.6-030806-generic (leaving the same version of
lxc), 12 out of 100 containers hit what looks like this exact problem:

  lxc-start 1365805078.586 NOTICE   lxc_start - '/usr/local/bin/node' started with pid '4107'
  lxc-start 1365805078.586 INFO     lxc_console - no console will be used
  lxc-start 1365805078.586 WARN     lxc_start - invalid pid for SIGCHLD
  lxc-start 1365805115.998 INFO     lxc_start - forwarded signal 2 to pid 4107
