On Apr 22, 2016, at 10:22 PM, Daniel Mewes <[email protected]> wrote:
> The reason for the failing `munmap` appears to be that we hit the kernel's 
> `max_map_count` limit.
> 
> I can reproduce the issue very quickly by reducing the limit through `echo 
> 16000 > /proc/sys/vm/max_map_count`, and it disappears in our tests when 
> increasing it to something like `echo 131060 > /proc/sys/vm/max_map_count`. 
> The default value is 65530 I believe.
> 
> We used to see this behavior in jemalloc 2.x, but didn't see it in 3.x 
> anymore. It now re-appeared somewhere between 3.6 and 4.1.

Version 4 switched to per-arena management of huge allocations, and with that 
came completely independent per-arena trees of cached chunks.  For many 
workloads this means increased virtual memory usage, since cached chunks can't 
migrate among arenas.  I have plans to reduce the impact somewhat by decreasing 
the number of arenas by 4X, but the independence of arenas' mappings has 
numerous advantages that I plan to leverage more over time.

> Do you think the allocator should handle reaching the map_count limit and 
> somehow deal with it gracefully (if that's even possible)? Or should we just 
> advise our users to raise the kernel limit, or alternatively try to change 
> RethinkDB's allocation patterns to avoid hitting it?

I'm surprised you're hitting this, because the normal mode of operation is for 
jemalloc's chunk allocation to get almost all contiguous mappings, which means 
very few distinct kernel VM map entries.  Is it possible that RethinkDB is 
routinely calling mmap() and interspersing mappings that are not a multiple of 
the chunk size?  One would hope that the kernel could densely pack such small 
mappings in the existing gaps between jemalloc's chunks, but unfortunately 
Linux uses fragile heuristics to find available virtual memory (the exact 
problem that --disable-munmap works around).
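One way to check is a histogram of mapping sizes from /proc/PID/maps: a healthy jemalloc process should be dominated by chunk-multiple mappings, whereas many distinct small sizes point to interspersed mmap() calls.  A diagnostic sketch, assuming bash for the hex arithmetic; `$$` again stands in for the real pid:

```shell
# Histogram of VM mapping sizes (in KiB) for a process.  Many small,
# oddly-sized entries suggest mmap() calls interspersed with jemalloc's
# chunk-aligned mappings.  $$ is a placeholder for the real pid.
pid=$$
while IFS='- ' read -r start end _; do
  printf '%d\n' $(( (0x$end - 0x$start) / 1024 ))
done < /proc/${pid}/maps | sort -n | uniq -c
```

The left column is the number of mappings of each size; a long tail of one-off small sizes is the fragmentation signature.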

To your question about making jemalloc gracefully deal with munmap() failure, 
it seems likely that mmap() is in imminent danger of failing under these 
conditions, so there's not much that can be done.  In fact, jemalloc only 
aborts if the abort option is set to true (the default for debug builds), so 
the error message jemalloc is printing probably doesn't directly correspond to 
a crash.

As a workaround, you could substantially increase the chunk size (e.g. 
MALLOC_CONF=lg_chunk:30), but better would be to diagnose and address whatever 
is causing the terrible VM map fragmentation.
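Concretely, the workaround is just an environment variable in the process's launch environment.  A sketch, assuming jemalloc 4.x's MALLOC_CONF syntax:

```shell
# The default chunk size in jemalloc 4.x is 2^21 = 2 MiB; lg_chunk:30
# raises it to 1 GiB, cutting the number of chunk mappings by ~512x for
# the same footprint.
export MALLOC_CONF=lg_chunk:30
# Then launch RethinkDB from this environment.
echo "MALLOC_CONF=${MALLOC_CONF}"
```

The trade-off is coarser-grained virtual memory usage, since jemalloc reserves address space a whole chunk at a time.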

Thanks,
Jason
_______________________________________________
jemalloc-discuss mailing list
[email protected]
http://www.canonware.com/mailman/listinfo/jemalloc-discuss