Ah, and you can also suggest increasing the swap. It would not actually be used, but the system would be able to commit to the amount of memory required, if it really had to.

On Jun 3, 2015 1:24 AM, "Gustavo Niemeyer" <gust...@niemeyer.net> wrote:
> Hey John,
>
> It's probably an overcommit issue. Even if you don't have the memory in
> use, cloning it would mean the new process would have a chance to change
> that memory and thus require real memory pages, which the system obviously
> cannot give it. You can work around that by explicitly enabling overcommit,
> which means the potential to crash late in strange places in the bad case,
> but would be totally okay for the exec situation.
>
> So we're running into this failure mode again at one of our sites.
>
> Specifically, the system is running with a reasonable number of nodes
> (~100) and has been running for a while. It appears that it wanted to
> restart itself (I don't think it restarted jujud, but I do think it at
> least restarted a lot of the workers.)
> Anyway, we have a fair number of things that we "exec" during startup
> (kvm-ok, restart rsyslog, etc).
> But when we get into this situation (whatever it actually is) then we
> can't exec anything and we start getting failures.
>
> Now, this *might* be a golang bug.
>
> When I was trying to debug it in the past, I created a small program that
> just allocated big slices of memory (10MB strings, IIRC) and then tried to
> run "echo hello" until it started failing.
> IIRC the failure point was when I wasn't using swap and the allocated
> memory was 50% of total available memory. (I have 8GB of RAM, it would
> start failing once we had allocated 4GB of strings).
> When I tried digging into the golang code, it looked like they use
> clone(2) as the "create a new process for exec" function. And it seemed it
> wasn't playing nicely with copy-on-write. At least, it appeared that
> instead of doing a simple copy-on-write clone without allocating any new
> memory and then exec into a new process, it actually required enough RAM
> to be available for the new process.
>
> On the customer site, though, jujud has a RES size of only 1GB, and they
> have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
> use).
>
> The only workaround I can think of is for us to create a "forker" process
> right away at startup that we just send RPC requests to run a command for
> us and return the results. ATM I don't think we do any fork and run
> interactively such that we need the stdin/stdout file handles inside our
> process.
>
> I'd rather just have golang fork() work even when the current process is
> using a large amount of RAM.
>
> Any of the golang folks know what is going on?
>
> John
> =:->
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
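
For anyone who wants to reproduce the failure described above, here is a minimal sketch in Go of the kind of test program John mentions: it holds on to chunks of memory and tries to run "echo hello" after each allocation. This is not the original program. The name holdAndExec, the two flags, the 4KiB page-size assumption and the small default limits are all assumptions made for illustration; raise -chunks to approach the 50% mark on a real machine.

```go
package main

import (
	"flag"
	"fmt"
	"os/exec"
	"runtime"
)

// holdAndExec is a hypothetical helper: it allocates up to maxChunks buffers
// of chunkSize bytes, keeps them all alive, and runs the given command after
// each allocation. It returns the number of bytes held when the command first
// failed to run, or the total held if it never failed.
func holdAndExec(chunkSize, maxChunks int, name string, args ...string) (int, error) {
	var hoard [][]byte
	held := 0
	for i := 0; i < maxChunks; i++ {
		buf := make([]byte, chunkSize)
		// Touch one byte per 4KiB (assumed page size) so the memory is
		// actually resident rather than just reserved.
		for j := 0; j < len(buf); j += 4096 {
			buf[j] = 1
		}
		hoard = append(hoard, buf)
		held += chunkSize
		if err := exec.Command(name, args...).Run(); err != nil {
			return held, err
		}
	}
	runtime.KeepAlive(hoard)
	return held, nil
}

func main() {
	chunkMB := flag.Int("chunk-mb", 10, "size of each allocation in MB")
	chunks := flag.Int("chunks", 8, "maximum number of allocations")
	flag.Parse()

	held, err := holdAndExec(*chunkMB<<20, *chunks, "echo", "hello")
	if err != nil {
		fmt.Printf("exec failed with %d MB held: %v\n", held>>20, err)
		return
	}
	fmt.Printf("exec still worked with %d MB held\n", held>>20)
}
```

With the defaults this only holds 80MB, so it should succeed everywhere. Comparing the point of failure with swap on and off, and with different vm.overcommit_memory settings, would show whether the overcommit explanation holds on a given box.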
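
The "forker" workaround John describes could look roughly like the sketch below: a tiny helper, started while the parent is still small, that reads requests, runs the command, and sends back the result. The thread only says "RPC requests"; the JSON-over-stdin/stdout transport used here is a simpler stand-in, and the request/response types and the runRequest and serve functions are hypothetical names, not anything that exists in juju.

```go
package main

import (
	"encoding/json"
	"io"
	"os"
	"os/exec"
)

// request and response are hypothetical wire types for this sketch.
type request struct {
	Name string   `json:"name"`
	Args []string `json:"args"`
}

type response struct {
	Output string `json:"output"`
	Error  string `json:"error,omitempty"`
}

// runRequest runs one command and captures its combined stdout and stderr.
func runRequest(req request) response {
	out, err := exec.Command(req.Name, req.Args...).CombinedOutput()
	resp := response{Output: string(out)}
	if err != nil {
		resp.Error = err.Error()
	}
	return resp
}

// serve decodes JSON requests from in until EOF and writes one JSON
// response per request to out.
func serve(in io.Reader, out io.Writer) error {
	dec := json.NewDecoder(in)
	enc := json.NewEncoder(out)
	for {
		var req request
		if err := dec.Decode(&req); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := enc.Encode(runRequest(req)); err != nil {
			return err
		}
	}
}

func main() {
	if err := serve(os.Stdin, os.Stdout); err != nil {
		os.Exit(1)
	}
}
```

The point of the design is that the helper stays small, so the clone it performs for each exec never has to account for the parent's large address space. The parent would start it once at startup and keep the pipes open; as John notes, this only works as long as nothing needs interactive stdin/stdout handles inside the main process.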