So interestingly we are already fairly heavily overcommitted. We have 4GB
of RAM and 4GB of swap available, and /proc/meminfo reports:
CommitLimit:     6214344 kB
Committed_AS:    9764580 kB

John
=:->



On Wed, Jun 3, 2015 at 9:28 AM, Gustavo Niemeyer <gust...@niemeyer.net>
wrote:

> Ah, and you can also suggest increasing the swap. It would not actually be
> used, but the system would be able to commit to the amount of memory
> required, if it really had to.
>  On Jun 3, 2015 1:24 AM, "Gustavo Niemeyer" <gust...@niemeyer.net> wrote:
>
>> Hey John,
>>
>> It's probably an overcommit issue. Even if you don't have the memory in
>> use, cloning the process means the new process would have a chance to
>> change that memory and thus require real memory pages, which the system
>> obviously cannot give it. You can work around that by explicitly enabling
>> overcommit, which carries the risk of crashing late in strange places in
>> the bad case, but would be totally fine for the exec situation.
>>
>> So we're running into this failure mode again at one of our sites.
>>
>> Specifically, the system is running with a reasonable number of nodes
>> (~100) and has been running for a while. It appears that it wanted to
>> restart itself (I don't think it restarted jujud, but I do think it at
>> least restarted a lot of the workers.)
>> Anyway, we have a fair number of things that we "exec" during startup
>> (kvm-ok, restart rsyslog, etc).
>> But when we get into this situation (whatever it actually is), we can't
>> exec anything and we start getting failures.
>>
>> Now, this *might* be a golang bug.
>>
>> When I was trying to debug it in the past, I created a small program that
>> just allocated big slices of memory (10MB strings, IIRC) and then tried to
>> run "echo hello" until it started failing.
>> IIRC the failure point was when I wasn't using swap and the allocated
>> memory was 50% of total available memory. (I have 8GB of RAM, it would
>> start failing once we had allocated 4GB of strings).
>> When I tried digging into the golang code, it looked like they use
>> clone(2) as the "create a new process for exec" function, and it seemed it
>> wasn't playing nicely with copy-on-write. At least, it appeared that
>> instead of doing a simple copy-on-write clone without allocating any new
>> memory and then exec'ing into the new process, it actually required enough
>> RAM to be available for the new process.
>>
>> On the customer site, though, jujud has a RES size of only 1GB, and they
>> have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
>> use).
>>
>> The only workaround I can think of is for us to create a "forker" process
>> right away at startup, to which we just send RPC requests to run a command
>> for us and return the results. ATM I don't think we fork anything and run
>> it interactively such that we need the stdin/stdout file handles inside
>> our process.
>>
>> I'd rather just have golang fork() work even when the current process is
>> using a large amount of RAM.
>>
>> Any of the golang folks know what is going on?
>>
>> John
>> =:->
>>
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>>