This thread and the ticket linked by Michael got me curious about whether
we could write our own routine for spawning processes that avoids the
usual copy-on-write fork semantics.

The struct returned by exec.Command has a SysProcAttr field where you can
set (Linux-specific) flags to pass to the clone syscall. The CLONE_VM flag
looks promising, but it upsets the Go runtime when you use it (fatal error:
runtime: stack growth during syscall). If CLONE_VM | CLONE_VFORK is used
the executable runs - I see the output from echo - but the call to Run
never returns. I'm not sure why, given that with CLONE_VFORK the parent
process is supposed to unblock once the child calls execve (which it does).

I could poke at this further but I need to get back to other things.
Looking at https://github.com/golang/go/issues/5838 I'm not the only one
who's tried this and run into similar problems.

Another approach - and one that Go might do internally one day - is to tell
the kernel not to copy particular (or at least large) memory ranges into
the child at all, so they don't count against the fork's commit charge.
Tweaking the allocations in Nate's example like this:

    bigs := make([][]byte, 6)
    for i := range bigs {
        bigs[i] = make([]byte, GB)
        // Don't make these pages available to forked children at all.
        if err := syscall.Madvise(bigs[i], syscall.MADV_DONTFORK); err != nil {
            log.Fatal(err)
        }
    }

allows the fork to work. It fails as before without the Madvise calls. This
isn't particularly practical for us, but it's an interesting data point
anyway.

- Menno

On 4 June 2015 at 02:07, John Meinel <j...@arbash-meinel.com> wrote:

> Yeah, I'm pretty sure this machine is on "0" and we've just overcommitted
> enough that Linux is refusing to overcommit more. I'm pretty sure juju was
> at least at 2GB of pages, where 1GB was in RAM and 1GB was in swap. And if
> we've already committed 9.7GB against a 6.2GB limit, Linux probably decided
> that another 2GB was an "obvious overcommit" that it would refuse.
>
> John
> =:->
>
>
> On Wed, Jun 3, 2015 at 5:32 PM, Gustavo Niemeyer <gust...@niemeyer.net>
> wrote:
>
>> From https://www.kernel.org/doc/Documentation/vm/overcommit-accounting:
>>
>> The Linux kernel supports the following overcommit handling modes
>>
>> 0    -       Heuristic overcommit handling. Obvious overcommits of
>>              address space are refused. Used for a typical system. It
>>              ensures a seriously wild allocation fails while allowing
>>              overcommit to reduce swap usage.  root is allowed to
>>              allocate slightly more memory in this mode. This is the
>>              default.
>>
>> 1    -       Always overcommit. Appropriate for some scientific
>>              applications. Classic example is code using sparse arrays
>>              and just relying on the virtual memory consisting almost
>>              entirely of zero pages.
>>
>> 2    -       Don't overcommit. The total address space commit
>>              for the system is not permitted to exceed swap + a
>>              configurable amount (default is 50%) of physical RAM.
>>              Depending on the amount you use, in most situations
>>              this means a process will not be killed while accessing
>>              pages but will receive errors on memory allocation as
>>              appropriate.
>>
>>              Useful for applications that want to guarantee their
>>              memory allocations will be available in the future
>>              without having to initialize every page.
>>
>>
>> On Wed, Jun 3, 2015 at 7:40 AM, John Meinel <j...@arbash-meinel.com>
>> wrote:
>>
>>> So interestingly we are already fairly heavily overcommitted. We have
>>> 4GB of RAM and 4GB of swap available. And cat /proc/meminfo is saying:
>>> CommitLimit:     6214344 kB
>>> Committed_AS:    9764580 kB
>>>
>>> John
>>> =:->
>>>
>>>
>>>
>>> On Wed, Jun 3, 2015 at 9:28 AM, Gustavo Niemeyer <gust...@niemeyer.net>
>>> wrote:
>>>
>>>> Ah, and you can also suggest increasing the swap. It would not actually
>>>> be used, but the system would be able to commit to the amount of memory
>>>> required, if it really had to.
>>>>  On Jun 3, 2015 1:24 AM, "Gustavo Niemeyer" <gust...@niemeyer.net>
>>>> wrote:
>>>>
>>>>> Hey John,
>>>>>
>>>>> It's probably an overcommit issue. Even if you don't have the memory
>>>>> in use, cloning it would mean the new process would have a chance to
>>>>> change that memory and thus require real memory pages, which the
>>>>> system obviously cannot give it. You can work around that by
>>>>> explicitly enabling overcommit, which risks crashing late in strange
>>>>> places in the bad case, but would be totally fine for the exec
>>>>> situation.
>>>>> So we're running into this failure mode again at one of our sites.
>>>>>
>>>>> Specifically, the system is running with a reasonable number of nodes
>>>>> (~100) and has been running for a while. It appears that it wanted to
>>>>> restart itself (I don't think it restarted jujud, but I do think it at
>>>>> least restarted a lot of the workers.)
>>>>> Anyway, we have a fair number of things that we "exec" during startup
>>>>> (kvm-ok, restart rsyslog, etc).
>>>>> But when we get into this situation (whatever it actually is) then we
>>>>> can't exec anything and we start getting failures.
>>>>>
>>>>> Now, this *might* be a golang bug.
>>>>>
>>>>> When I was trying to debug it in the past, I created a small program
>>>>> that just allocated big slices of memory (10MB strings, IIRC) and then
>>>>> tried to run "echo hello" until it started failing.
>>>>> IIRC the failure point was when I wasn't using swap and the allocated
>>>>> memory was 50% of total available memory. (I have 8GB of RAM, it would
>>>>> start failing once we had allocated 4GB of strings).
>>>>> When I tried digging into the golang code, it looked like they use
>>>>> clone(2) as the "create a new process for exec" function, and it
>>>>> seemed it wasn't playing nicely with copy-on-write. At least, it
>>>>> appeared that instead of doing a simple copy-on-write clone without
>>>>> allocating any new memory and then exec'ing into the new process, it
>>>>> actually required enough RAM to be available for the new process.
>>>>>
>>>>> On the customer site, though, jujud has a RES size of only 1GB, and
>>>>> they have 4GB of available RAM and swap is enabled (2GB of 4GB swap
>>>>> currently in use).
>>>>>
>>>>> The only workaround I can think of is for us to create a "forker"
>>>>> process right away at startup, to which we send RPC requests to run
>>>>> a command and return the results. ATM I don't think we fork and run
>>>>> anything interactively such that we'd need the stdin/stdout file
>>>>> handles inside our process.
>>>>>
>>>>> I'd rather just have golang fork() work even when the current process
>>>>> is using a large amount of RAM.
>>>>>
>>>>> Any of the golang folks know what is going on?
>>>>>
>>>>> John
>>>>> =:->
>>>>>
>>>>>
>>>>> --
>>>>> Juju-dev mailing list
>>>>> Juju-dev@lists.ubuntu.com
>>>>> Modify settings or unsubscribe at:
>>>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>>>
>>>>>
>>>
>>
>>
>> --
>>
>> gustavo @ http://niemeyer.net
>>
>
>