On Mon, Nov 6, 2017 at 10:26 PM, Rick Payne (Offshore) <ri...@rossfell.co.uk
> wrote:

> > Out of memory: could not reclaim any further. Current memory: 5122256 Kb
> >
> > This suggests there was 5GB free while the allocation failed.
> > This *can* be a fragmentation issue (e.g., you asked for a 1 GB
> > allocation, but we couldn't free a 1GB consecutive area), but can also be a
> > malloc() of a ridiculous amount. Since commit
> > 7ea953ca7d6533c025e535be49ee5bd2567fc8f3,
> > a malloc() of more than the amount of memory we have prints a different
> > error message, but perhaps you still have some very large (but less than
> > 10GB) single allocation?
> >
> > The sad thing is that since we fail in the memory reclaimer, not in the
> > malloc(), you don't know which malloc() failed. This is
> > https://github.com/cloudius-systems/osv/issues/585.
> > One ad-hoc thing you can try is to connect with gdb, and see which OSv
> > thread is waiting in malloc - and see what malloc() it is trying to do.
>
> In an attempt to work around this, I've been trying to get the BEAM vm to
> pre-allocate memory, which it does via mmap.


I am not sure how this will help, as the later malloc() can still fail when
it wants to allocate physically-contiguous memory.
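
To illustrate why plenty of free memory does not guarantee that a
physically-contiguous allocation succeeds, here is a toy model (not OSv's
actual page allocator; the names scan_frames and can_allocate_contiguous are
made up for this sketch). It treats memory as a list of page frames and
compares the total number of free frames with the longest consecutive free
run:

```cpp
#include <cstddef>
#include <vector>

// Toy model: each element is one page frame; true means the frame is free.
// This is an illustration only, not OSv's actual page allocator.
struct free_stats {
    std::size_t total_free;   // number of free frames overall
    std::size_t largest_run;  // longest run of consecutive free frames
};

free_stats scan_frames(const std::vector<bool>& frame_is_free)
{
    free_stats stats{0, 0};
    std::size_t current_run = 0;
    for (bool is_free : frame_is_free) {
        if (is_free) {
            ++stats.total_free;
            ++current_run;
            if (current_run > stats.largest_run) {
                stats.largest_run = current_run;
            }
        } else {
            current_run = 0;  // a used frame breaks the run
        }
    }
    return stats;
}

// A contiguous request succeeds only if a single run is long enough,
// no matter how many frames are free in total.
bool can_allocate_contiguous(const std::vector<bool>& frame_is_free,
                             std::size_t frames_needed)
{
    return scan_frames(frame_is_free).largest_run >= frames_needed;
}
```

With every other frame in use, half of the memory is free but no request for
two consecutive frames can be satisfied.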

One hack you can try, to fix
https://github.com/cloudius-systems/osv/issues/854 and hopefully your issue
too: in core/mempool.cc, in the function std_malloc(), replace the

    } else {
        ret = memory::malloc_large(size, alignment);
    }

by something like (completely untested!)


    // at file scope: remembers the size of each mmap()ed allocation
    std::unordered_map<void*, size_t> huge_allocations_size;

    // inside std_malloc(), instead of the single "else" branch:
    } else if (size > 2 * huge_page_size) {
        ret = mmap(NULL, size, PROT_READ|PROT_WRITE,
                   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        huge_allocations_size[ret] = size;
    } else {
        ret = memory::malloc_large(size, alignment);
    }

and in the same file mempool.cc, in free() (note there are several
functions there with this name, look for the one starting with
trace_memory_free(object);),
before the switch() add something like:

    if (object < mmu::phys_mem) {
        // this object was mmapped, not malloc'ed
        munmap(object, huge_allocations_size.at(object));
        huge_allocations_size.erase(object);
    } else ...

Of course, if this works, it will need to be replaced by something less ugly,
with additional error handling, etc., if we want to consider making this
permanent.
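
To show the shape of the idea outside of OSv, here is a minimal userspace
sketch with the missing error handling and locking added. It is not the
mempool.cc change itself: the names big_alloc, big_free,
tracked_huge_allocations and the 4 MB threshold are invented for this
example, and it looks the pointer up in the map instead of comparing it
against mmu::phys_mem. Note also that the map itself allocates memory on
insert, which a version living inside the allocator would have to take into
account.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Hypothetical threshold; the snippet above uses 2 * huge_page_size in OSv.
constexpr std::size_t k_mmap_threshold = 4u * 1024 * 1024;

static std::mutex g_huge_lock;
static std::unordered_map<void*, std::size_t> g_huge_sizes;

// Allocate small requests with malloc() and large ones with mmap(),
// remembering the size of each mapping so it can be unmapped later.
void* big_alloc(std::size_t size)
{
    if (size <= k_mmap_threshold) {
        return std::malloc(size);
    }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return nullptr;  // report failure the same way malloc() does
    }
    std::lock_guard<std::mutex> guard(g_huge_lock);
    g_huge_sizes[p] = size;
    return p;
}

// Free a pointer returned by big_alloc(): munmap() it if it was mapped,
// otherwise hand it back to free().
void big_free(void* p)
{
    if (p == nullptr) {
        return;
    }
    std::size_t size = 0;
    {
        std::lock_guard<std::mutex> guard(g_huge_lock);
        auto it = g_huge_sizes.find(p);
        if (it != g_huge_sizes.end()) {
            size = it->second;
            g_huge_sizes.erase(it);
        }
    }
    if (size != 0) {
        munmap(p, size);
    } else {
        std::free(p);
    }
}

// Number of mmap()ed allocations currently tracked (handy for checking).
std::size_t tracked_huge_allocations()
{
    std::lock_guard<std::mutex> guard(g_huge_lock);
    return g_huge_sizes.size();
}
```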


> However, this didn't help initially as the memory wasn't being populated.
> I altered the mmap calls to include MAP_POPULATE to get them filled at
> startup, and now I get this crash. The debug output is from the Erlang
> runtime system's os_mmap function. It seems to return from the first call to
> mmap for a 2GB chunk, but asserts shortly after that (and the following is
> all I get):
>
> Attempting to mmap 2147483648 bytes to 0
> mmaped 2147483648 bytes to address 200000400000
> Assertion failed: !large() (arch/x64/arch-mmu.hh: next_pt_addr: 82)
>
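
For anyone who wants to reproduce this outside of the Erlang runtime, the
kind of call described above can be sketched as follows. This is an
assumption about what the altered os_mmap boils down to, not the BEAM code;
map_anonymous is a name made up for this example, and MAP_POPULATE is a
Linux-specific flag, so it is guarded by an #ifdef:

```cpp
#include <sys/mman.h>

#include <cstddef>

// Map anonymous, zero-filled memory. With populate == true (and where
// MAP_POPULATE is available) the pages are faulted in during mmap()
// instead of lazily on first touch.
void* map_anonymous(std::size_t size, bool populate)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
    if (populate) {
        flags |= MAP_POPULATE;
    }
#else
    (void)populate;  // flag not available on this platform
#endif
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}
```

Calling map_anonymous(2147483648UL, true) would correspond to the 2GB
request in the debug output above.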

Unfortunately, I'm not familiar with this complex templated code, only Gleb
is (CC'ed).
*Gleb*, in commit 1b31de0e one of the changes you made was

-inline u64 pt_element_common<N>::next_pt_pfn() const { return pfn(false); }
+inline u64 pt_element_common<N>::next_pt_pfn() const {
+    assert(!large());
+    return pfn();
+}

Can you try to recall why you added this assert here (and in a couple of
other places too)? If this assert is really justified, do you have any
guess what sort of bug might cause it to trigger?


> [backtrace]
> 0x00000000002281da <__assert_fail+26>
> 0x0000000000331a35 <???+3349045>
> 0x000000000033da0c <mmu::map_level<mmu::populate<(mmu::account_opt)1>, 1>::operator()(mmu::hw_ptep<1>, unsigned long)+76>
> 0x000000000033dc4a <mmu::map_level<mmu::populate<(mmu::account_opt)1>, 2>::operator()(mmu::hw_ptep<2>, unsigned long)+314>
> 0x000000000033debc <mmu::map_level<mmu::populate<(mmu::account_opt)1>, 3>::operator()(mmu::hw_ptep<3>, unsigned long)+284>
> 0x000000000033e11d <void mmu::map_range<mmu::populate<(mmu::account_opt)1> >(unsigned long, unsigned long, unsigned long, mmu::populate<(mmu::account_opt)1>&, unsigned long)+413>
> 0x000000000033ee0f <unsigned long mmu::populate_vma<(mmu::account_opt)1>(mmu::vma*, void*, unsigned long, bool)+1231>
> 0x0000000000337521 <mmu::map_anon(void const*, unsigned long, unsigned int, unsigned int)+225>
> 0x0000000000459345 <mmap+181>
>
> Any clues?
>
> Cheers,
> Rick
>
>
