So interestingly, we are already fairly heavily overcommitted. We have 4GB of RAM and 4GB of swap available, and cat /proc/meminfo is saying:

    CommitLimit:  6214344 kB
    Committed_AS: 9764580 kB
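(Editorial aside: Committed_AS here is already about 157% of CommitLimit, so under strict accounting (vm.overcommit_memory=2) the kernel would refuse any further address-space commitment, which is consistent with clone/exec failing. A minimal sketch of computing that ratio, parsing the two lines quoted above; on a live machine you would read /proc/meminfo itself rather than a sample string:)

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfo pulls the numeric kB value for a given key out of
// /proc/meminfo-style text, returning -1 if the key is absent.
func parseMeminfo(text, key string) int64 {
	for _, line := range strings.Split(text, "\n") {
		if strings.HasPrefix(line, key+":") {
			fields := strings.Fields(line)
			if len(fields) >= 2 {
				v, _ := strconv.ParseInt(fields[1], 10, 64)
				return v
			}
		}
	}
	return -1
}

func main() {
	// The two lines quoted in the email above; substitute the
	// contents of /proc/meminfo on a real system.
	sample := "CommitLimit:     6214344 kB\nCommitted_AS:    9764580 kB\n"
	limit := parseMeminfo(sample, "CommitLimit")
	committed := parseMeminfo(sample, "Committed_AS")
	fmt.Printf("committed %d kB of a %d kB limit (%.0f%%)\n",
		committed, limit, 100*float64(committed)/float64(limit))
	// prints: committed 9764580 kB of a 6214344 kB limit (157%)
}
```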
John
=:->

On Wed, Jun 3, 2015 at 9:28 AM, Gustavo Niemeyer <gust...@niemeyer.net> wrote:

> Ah, and you can also suggest increasing the swap. It would not actually be
> used, but the system would be able to commit to the amount of memory
> required if it really had to.
>
> On Jun 3, 2015 1:24 AM, "Gustavo Niemeyer" <gust...@niemeyer.net> wrote:
>
>> Hey John,
>>
>> It's probably an overcommit issue. Even if you don't have the memory in
>> use, cloning the process means the new process would have a chance to
>> change that memory and thus require real memory pages, which the system
>> obviously cannot give it. You can work around that by explicitly enabling
>> overcommit, which carries the risk of crashing late, in strange places, in
>> the bad case, but would be totally okay for the exec situation.
>>
>> So we're running into this failure mode again at one of our sites.
>>
>> Specifically, the system is running with a reasonable number of nodes
>> (~100) and has been running for a while. It appears that it wanted to
>> restart itself (I don't think it restarted jujud, but I do think it at
>> least restarted a lot of the workers).
>>
>> Anyway, we have a fair number of things that we "exec" during startup
>> (kvm-ok, restart rsyslog, etc.). But when we get into this situation
>> (whatever it actually is), we can't exec anything and we start getting
>> failures.
>>
>> Now, this *might* be a golang bug.
>>
>> When I was trying to debug it in the past, I created a small program that
>> just allocated big slices of memory (10MB strings, IIRC) and then tried to
>> run "echo hello" until it started failing. IIRC the failure point, with no
>> swap in use, was when the allocated memory reached 50% of total available
>> memory (I have 8GB of RAM; it would start failing once we had allocated
>> 4GB of strings).
>>
>> When I tried digging into the golang code, it looked like it uses
>> clone(2) as the "create a new process for exec" function, and it seemed
>> that wasn't playing nicely with copy-on-write. At least, it appeared that
>> instead of doing a simple copy-on-write clone without allocating any new
>> memory and then exec'ing into the new process, it actually required enough
>> RAM to be available for the new process.
>>
>> On the customer site, though, jujud has a RES size of only 1GB, they have
>> 4GB of available RAM, and swap is enabled (2GB of the 4GB of swap
>> currently in use).
>>
>> The only workaround I can think of is for us to create a "forker" process
>> right away at startup, to which we just send RPC requests to run a command
>> for us and return the results. ATM I don't think we fork and run anything
>> interactively such that we would need the stdin/stdout file handles inside
>> our process.
>>
>> I'd rather just have golang fork() work even when the current process is
>> using a large amount of RAM.
>>
>> Any of the golang folks know what is going on?
>>
>> John
>> =:->
--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev