On Wed, Dec 17, 2014 at 8:47 PM, Tim Penhey <tim.pen...@canonical.com>
wrote:

> > 1. It seems that once you have more than, say, 30 machines, Juju
> > starts behaving weirdly until you remove unused machines. One of the
> > weird things is that new deploys all stay stuck in a pending status.
> > That has happened at least 4 times, so now I always destroy-environment
> > when testing things, just in case. Has anyone else seen this behaviour?
> > Could this be because of LXC with Juju local? I do a lot of Juju testing,
> > so it's not unusual for me to have a couple of hundred machines after a
> > month, by the way.
>
> I'll answer this one now.  This is due to "not enough file handles".  It
> seems that the LXC containers that get created inherit the handles of
> the parent process, which is the machine agent.  After a certain number
> of machines, and it may be around 30, the new machines start failing to
> recognise the new upstart script because inotify isn't working properly.
> This means the agents don't start, and don't tell the state server they
> are running, which means the machines stay pending even though lxc says
> "yep you're all good".
>
> I'm not sure how big we can make the "limit nofile" in the agent upstart
> script without it causing problems elsewhere.


Hey, that makes a lot of sense. I wonder if you can detect that in advance
and perhaps make Juju tell the sysadmin about the limit being reached (or
nearly reached) then?
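
To make the idea concrete, here is a rough sketch of the kind of check I
have in mind. It is not how Juju works today, just an illustration in
Python: it compares the number of file descriptors a process has open
against its soft "nofile" limit and warns once usage crosses a threshold.
The function names and the 80% threshold are made up for the example, and
it assumes a Linux host with /proc mounted.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit) for the current process (Linux only)."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Each entry in /proc/self/fd is one open descriptor. Listing the
    # directory briefly opens one more, so the count can be off by one.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft_limit


def near_limit(open_fds, soft_limit, threshold=0.8):
    """Return True when usage is at or above `threshold` of the soft limit.

    The 0.8 default is an arbitrary value chosen for this sketch.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return False  # no limit set, so nothing to warn about
    return open_fds >= threshold * soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    if near_limit(used, limit):
        print("warning: %d of %d file handles in use" % (used, limit))
```

A real check would have to look at the machine agent's own process rather
than the caller's, but the comparison would be the same.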
-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju
