Hey John,

It's probably an overcommit issue. Even if the memory isn't actually in
use, cloning it means the new process gets a chance to modify that memory
and thus require real memory pages, which the system cannot guarantee it
will be able to provide. You can work around that by explicitly enabling
overcommit, which carries the risk of crashing late in strange places in
the bad case, but is perfectly fine for the exec situation.
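
To see which policy a machine is running with, something like the
following could be used. It is only a sketch: it assumes Linux, where the
policy is exposed in /proc/sys/vm/overcommit_memory (0 = heuristic,
1 = always overcommit, 2 = strict accounting). The helper name
describeOvercommit is made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// describeOvercommit (hypothetical helper) maps the raw contents of
// /proc/sys/vm/overcommit_memory to a human-readable description.
func describeOvercommit(raw string) string {
	switch strings.TrimSpace(raw) {
	case "0":
		return "heuristic overcommit"
	case "1":
		return "always overcommit"
	case "2":
		return "strict accounting, no overcommit"
	default:
		return "unknown mode"
	}
}

func main() {
	// Linux-only path; on other systems the read simply fails.
	raw, err := os.ReadFile("/proc/sys/vm/overcommit_memory")
	if err != nil {
		fmt.Println("could not read overcommit setting:", err)
		return
	}
	fmt.Println("vm.overcommit_memory:", describeOvercommit(string(raw)))
}
```

Explicitly enabling overcommit corresponds to mode 1, e.g. via
"sysctl vm.overcommit_memory=1" as root.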

So we're running into this failure mode again at one of our sites.

Specifically, the system is running with a reasonable number of nodes
(~100) and has been running for a while. It appears that it wanted to
restart itself (I don't think it restarted jujud, but I do think it at
least restarted a lot of the workers.)
Anyway, we have a fair number of things that we "exec" during startup
(kvm-ok, restart rsyslog, etc).
But when we get into this situation (whatever it actually is) then we can't
exec anything and we start getting failures.

Now, this *might* be a golang bug.

When I was trying to debug it in the past, I created a small program that
just allocated big slices of memory (10MB strings, IIRC) and then tried to
run "echo hello" until it started failing.
IIRC the failure point was when I wasn't using swap and the allocated
memory was 50% of total available memory. (I have 8GB of RAM, it would
start failing once we had allocated 4GB of strings).
When I tried digging into the golang code, it looked like they use clone(2)
as the "create a new process for exec" function. And it seemed it wasn't
playing nicely with copy-on-write. At least, it appeared that instead of
doing a simple copy-on-write clone without allocating any new memory and
then exec'ing into the new program, it actually required enough RAM to be
available for a full copy of the parent process.

On the customer site, though, jujud has a RES size of only 1GB, and they
have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in
use).

The only workaround I can think of is for us to create a "forker" process
right away at startup, to which we send RPC requests to run a command for
us and return the results. ATM I don't think we fork and run anything
interactively, so we shouldn't need the child's stdin/stdout file handles
inside our process.

I'd rather just have golang fork() work even when the current process is
using a large amount of RAM.

Any of the golang folks know what is going on?

John
=:->


--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/juju-dev
