[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-17 Thread Pavel Bennett
The last one I'm aware of that did not exhibit this issue was 3.5.0-27. I wish I had a simpler repro though, since on our system it takes 10-15 hours of heavy processing to hit the uninterruptible sleeps. Could it be tracked by looking at the state of the OS? Every new lxc-start ends up hanging
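One way to look at "the state of the OS" here is to sample which tasks are sitting in uninterruptible sleep (state D) while the box degrades. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function names parse_stat and tasks_in_state are made up for illustration and are not part of lxc or of any tool mentioned in this thread.

```python
import os


def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state, ppid).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' instead of on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    return int(pid_text), comm, fields[0], int(fields[1])


def tasks_in_state(wanted="D", proc_root="/proc"):
    """Return (pid, comm, ppid) for every process whose state is `wanted`."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state, ppid = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished while reading, or unexpected format
        if state == wanted:
            found.append((pid, comm, ppid))
    return found


if __name__ == "__main__":
    for pid, comm, ppid in tasks_in_state("D"):
        print(f"{pid}\t{comm}\tppid={ppid}")
```

Run periodically (for example from cron) and logged with a timestamp, this would show when the first lxc-start gets stuck rather than only the end state. Passing "Z" instead of "D" lists defunct processes and their parents.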

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-05 Thread Pavel Bennett
** Tags added: kernel-bug-exists-upstream -- You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to lxc in Ubuntu. https://bugs.launchpad.net/bugs/1196295 Title: lxc-start enters uninterruptible sleep To manage notifications about this bug

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-05 Thread Pavel Bennett
Managed to repro with v3.10-saucy last night. What do you guys suspect it could be? I'm keeping the server in this state for now if you'd like me to gather some data.

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-03 Thread Pavel Bennett
Changing to Confirmed as per instructions in comment #7. ** Changed in: linux Status: Incomplete => Confirmed

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
Reproduced even with lxc-stop. dmesg:
[178420.689704] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178430.919783] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178441.149854] unregister_netdevice: waiting for lo to become free. Usage count = 1
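To see how long a device has been stuck and whether the usage count ever changes, the repeated dmesg lines can be summarized per device. This is a rough sketch that assumes the message format matches the lines quoted above exactly; the summarize function and the SAMPLE constant are illustrative names, not an existing tool.

```python
import re

# Matches lines such as:
# [178420.689704] unregister_netdevice: waiting for lo to become free. Usage count = 1
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+unregister_netdevice: waiting for "
    r"(?P<dev>\S+) to become free\. Usage count = (?P<count>\d+)"
)

# The three lines quoted in the report above.
SAMPLE = """\
[178420.689704] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178430.919783] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178441.149854] unregister_netdevice: waiting for lo to become free. Usage count = 1
"""


def summarize(log_text):
    """Group unregister_netdevice warnings by device name.

    Returns {device: {"hits", "first", "last", "usage_count"}} where
    first/last are kernel timestamps in seconds and usage_count is the
    value from the most recent matching line.
    """
    summary = {}
    for match in PATTERN.finditer(log_text):
        ts = float(match.group("ts"))
        entry = summary.setdefault(
            match.group("dev"),
            {"hits": 0, "first": ts, "last": ts, "usage_count": 0},
        )
        entry["hits"] += 1
        entry["last"] = ts
        entry["usage_count"] = int(match.group("count"))
    return summary


if __name__ == "__main__":
    for dev, info in summarize(SAMPLE).items():
        stuck_for = info["last"] - info["first"]
        print(f"{dev}: {info['hits']} warnings over {stuck_for:.2f}s, "
              f"usage count {info['usage_count']}")
```

Feeding it the full dmesg output instead of SAMPLE gives one line per affected device, which makes it easier to tell a single leaked reference on lo from several devices stuck at once.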

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
This should help. kern.log:
Jul 2 05:41:32 server1 kernel: [136565.201601] device vethbJ4JsM left promiscuous mode
Jul 2 05:41:32 server1 kernel: [136565.201603] vmbr: port 5(vethbJ4JsM) entered disabled state
Jul 2 05:41:38 server1 kernel: [136570.551496] vmbr: port 2(veth49SiBX) entered

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
apport information ** Tags added: apport-collected ** Description changed: After running and terminating around 6000 containers overnight, something happened on my box that is affecting every new LXC container I try to start. The DEBUG log file looks like: lxc-start

[Bug 1196295] HookError_generic.txt

2013-07-02 Thread Pavel Bennett
apport information ** Attachment added: HookError_generic.txt https://bugs.launchpad.net/bugs/1196295/+attachment/3722561/+files/HookError_generic.txt

[Bug 1196295] HookError_cloud_archive.txt

2013-07-02 Thread Pavel Bennett
apport information ** Attachment added: HookError_cloud_archive.txt https://bugs.launchpad.net/bugs/1196295/+attachment/3722560/+files/HookError_cloud_archive.txt

[Bug 1196295] HookError_ubuntu.txt

2013-07-02 Thread Pavel Bennett
apport information ** Attachment added: HookError_ubuntu.txt https://bugs.launchpad.net/bugs/1196295/+attachment/3722564/+files/HookError_ubuntu.txt

[Bug 1196295] HookError_source_lxc.txt

2013-07-02 Thread Pavel Bennett
apport information ** Attachment added: HookError_source_lxc.txt https://bugs.launchpad.net/bugs/1196295/+attachment/3722563/+files/HookError_source_lxc.txt

[Bug 1196295] HookError_source_linux.txt

2013-07-02 Thread Pavel Bennett
apport information ** Attachment added: HookError_source_linux.txt https://bugs.launchpad.net/bugs/1196295/+attachment/3722562/+files/HookError_source_linux.txt

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
Looks like apport was missing some module to gather what it wanted. Let me know if this info would be valuable and I can re-run it. ** Changed in: lxc (Ubuntu) Status: Incomplete => Confirmed

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-07-02 Thread Pavel Bennett
Attaching a better apport file after installing the missing dependency. I will hide the ones from earlier as this will contain the same data and more. ** Attachment added: apport.lxc.txt https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1196295/+attachment/3722586/+files/apport.lxc.txt

[Bug 1196295] [NEW] lxc-start enters uninterruptible sleep

2013-06-30 Thread Pavel Bennett
Public bug reported: After running and terminating around 6000 containers overnight, something happened on my box that is affecting every new LXC container I try to start. The DEBUG log file looks like: lxc-start 1372615570.399 WARN lxc_start - inherited fd 9 lxc-start

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-06-30 Thread Pavel Bennett
Also, in dmesg:
[54545.873460] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54556.103535] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54566.333609] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54576.563664]

[Bug 1196295] Re: lxc-start enters uninterruptible sleep

2013-06-30 Thread Pavel Bennett
Some basic environment details. I can post more if requested.
Ubuntu Server 13.04 64-bit
$ uname -r
3.8.0-25-generic
$ dpkg -l | grep lxc
ii liblxc0 0.9.0-0ubuntu3.3 amd64 Linux Containers userspace tools (library)
ii lxc

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-28 Thread Pavel Bennett
Hey Serge, let me know if that repro worked for you or when you're planning to give it a try. I'm keeping the VM image around in case you need it. What's odd is that I can't even reproduce it with the daily PPA build, which doesn't have the workaround that is in the Ubuntu package. Did you

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
I can't try Saucy right now, but the repro instructions with kernel versions are in the original post and in #2. We've tried node v0.11.2 as well on Raring and got the repro. Repro summary: Install any of the above kernels, such as the one with the Raring installer, then install lxc from apt.

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
Sure, run these inside the container:
git clone https://github.com/joyent/node.git --depth 1
cd node
./configure
make -j9
sudo make install
Then the binary will be at /usr/local/bin/node. It's v0.11.3-pre, but should still repro.

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-21 Thread Pavel Bennett
I created a VM with Ubuntu Server 13.04 just for this bug. At first, I was able to run the steps outlined above 50 times with no issues. What was I missing? Concurrency! I rebooted the VM after adding 1 more core, and... bingo! Zombies on the 3rd try. The VM disk image I have here should be

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-06-20 Thread Pavel Bennett
Hey Serge, were you able to get a reliable repro for this? I have a reason to upgrade to Raring, and this seems to be the only blocker. We've reproduced the issue with the stock Linux Mint 15.

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-20 Thread Pavel Bennett
I've also tried it with a C++ app very similar to yours and was unable to repro. There is something about having node.js as the init process running a one-line process.exit(0); script. The init process (node v0.11.0) does exit, as ps faux shows it as a zombie and a child of lxc-start. I went back to kernel

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-15 Thread Pavel Bennett
Btw, that queueing mode would simply mean not calling epoll_wait until the pid is available. This shouldn't require managing a queue ourselves. Can you think of anything that this would break? Or we could go with the patch you've written, although I haven't looked into why the problem appears to

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-13 Thread Pavel Bennett
I should add that these "forwarded signal 2" lines are due to me pressing Ctrl+C and are not actually relevant. Have you been able to repro this bug on kernel 3.8.6? I'm thinking about how to fix this, as lxc_spawn is what gets the pid which is needed by lxc_poll to listen for SIGCHLD from the correct
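The general shape of this kind of race, and one common way around it, can be shown outside of lxc. The sketch below is not lxc's code and does not use its epoll loop; it swaps in sigtimedwait as a stand-in, and the function name run_child_and_reap is hypothetical. It assumes Linux and Python 3.3 or later. The idea is to block SIGCHLD before the child is created, so that a child which exits before the parent has recorded its pid leaves the signal pending instead of lost.

```python
import os
import signal


def run_child_and_reap(exit_code=0, timeout=5.0):
    """Fork a child that exits immediately and reap it without missing SIGCHLD.

    SIGCHLD is blocked BEFORE the fork, so even if the child exits before
    the parent knows its pid, the signal stays pending until we wait for it.
    Returns (child_pid, pid_reported_by_signal_or_None, child_exit_code).
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGCHLD})
    try:
        pid = os.fork()
        if pid == 0:
            # Child: exit at once, similar to an init that calls process.exit(0).
            os._exit(exit_code)

        # Parent: the pid is known now, so it is safe to start waiting.
        info = signal.sigtimedwait({signal.SIGCHLD}, timeout)
        notified_pid = info.si_pid if info is not None else None

        # Treat the signal only as a hint and always reap the known pid,
        # so the child cannot be left behind as a zombie.
        _, status = os.waitpid(pid, 0)
        return pid, notified_pid, os.WEXITSTATUS(status)
    finally:
        # Restore whatever signal mask the caller had.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


if __name__ == "__main__":
    child, notified, code = run_child_and_reap(0)
    print(f"child {child} exited with {code}; SIGCHLD reported pid {notified}")
```

The same ordering applies to an epoll-based loop: the signal has to be blocked (or its file descriptor created) before the spawn, and the wait should only start once the pid is available.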

[Bug 1168526] [NEW] race condition causing lxc to not detect container init process exit

2013-04-12 Thread Pavel Bennett
Public bug reported: For the purpose of the repro, my lxc init process is node.js v0.11.0 (built from source) with a single line: process.exit(0); When running it in lxc, sometimes lxc doesn't exit. lxc-start remains a parent of a defunct node process without reaping it or exiting. I've made a

[Bug 1168526] Re: race condition causing lxc to not detect container init process exit

2013-04-12 Thread Pavel Bennett
Precisely which version of lxc were you using? I just put back version 0.9.0-0ubuntu2 (as opposed to the 0.9.0 I built from source) while on kernel 3.7.9-030709-generic and haven't yet run into this issue (I assume that's the patch you mentioned). However, when I update to kernel
