On Wed, Dec 17, 2014 at 8:47 PM, Tim Penhey <tim.pen...@canonical.com> wrote:
> > 1. Seems that if you happen to have more than... say, 30 machines, Juju
> > starts behaving weirdly until you remove unused machines. One of the
> > weird things is that new deploys all stay stuck with a pending status.
> > That happened at least 4 times, so now I always destroy-environment when
> > testing things just in case. Has anyone else seen this behaviour? Can
> > this be because of LXC with Juju local? I do a lot of Juju testing so
> > it's not unusual for me to have a couple hundred machines after a
> > month, by the way.
>
> I'll answer this one now. This is due to "not enough file handles". It
> seems that the LXC containers that get created inherit the handles of
> the parent process, which is the machine agent. After a certain number
> of machines, and it may be around 30, the new machines start failing to
> recognise the new upstart script because inotify isn't working properly.
> This means the agents don't start, and don't tell the state server they
> are running, which means the machines stay pending even though lxc says
> "yep you're all good".
>
> I'm not sure how big we can make the "limit nofile" in the agent upstart
> script without it causing problems elsewhere.

Hey, that makes a lot of sense. I wonder if Juju could detect that in
advance and tell the sysadmin when the limit is reached (or nearly
reached)?
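
As a rough illustration of what such a check might look like (this is not
Juju's actual code), here is a minimal Python sketch. It assumes a Linux
host where /proc/self/fd lists the open descriptors of the current
process. The function names fd_usage and near_limit and the 80% threshold
are made up for the example.

```python
import os
import resource


def fd_usage(fd_dir="/proc/self/fd"):
    """Return (open_fds, soft_limit) for the current process (Linux only).

    The count includes the descriptor opened to list fd_dir itself, so it
    may be one higher than the number of descriptors held before the call.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir(fd_dir))
    return open_fds, soft_limit


def near_limit(open_fds, soft_limit, threshold=0.8):
    """Return True when usage is at or above threshold * soft_limit.

    The 0.8 default is an arbitrary assumption, not a Juju setting.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return False  # no limit configured, nothing to warn about
    return open_fds >= threshold * soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    if near_limit(used, limit):
        print(f"WARNING: {used} of {limit} file handles in use")
    else:
        print(f"OK: {used} of {limit} file handles in use")
```

A real check would have to run inside (or on behalf of) the machine agent,
since the limit that matters is the one applied to that process by its
upstart script.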
--
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju