On May 8, 2014, at 12:04 PM, Jason Evans <[email protected]> wrote:

> On May 8, 2014, at 9:00 AM, D'Alessandro, Luke K <[email protected]> wrote:
>> I’m in the market for a good concurrent allocator to manage a memory region 
>> corresponding to pinned network memory for a multithreaded and distributed 
>> HPC application. Basically, I’m going to want to do RDMA to objects that are 
>> often malloced and freed. The pinning operation is expensive so it is 
>> important to amortize it over lots of uses. I’ve written a simple 
>> thread-local caching allocator that allows me to pin contiguous blocks when 
>> they’re first allocated, and then just use TLS free listing to reuse space. 
>> However, I don’t really have the resources needed to implement this in a 
>> robust way.
>> 
>> Is there any natural way to do this in jemalloc at this time? My gut feeling 
>> is that there isn’t: explicitly specifying an arena breaks its caching, and 
>> there’s not an obvious way to register a callback to run on internal block 
>> allocation and freeing (where I could pin/unpin the underlying memory).
>> 
>> If jemalloc doesn’t really support this use case, does anyone know of an 
>> efficient, scalable, robust allocator that does?
> 
> This pending change may be relevant to your needs:
> 
>       https://github.com/jemalloc/jemalloc/pull/80
> 
> I’m imagining that you would implement a custom chunk allocator that pins 
> entire chunks, and then specifically use that arena for allocations that you 
> require to be pinned.  This approach has some shortcomings, but perhaps they 
> don’t matter to your specific application.

Thanks Jason,

This patch appears to win half the battle, though I’m not 100% sure how to 
implement the chunk allocator without calling back into jemalloc recursively. I 
guess that I either use mmap() directly or jemalloc.h already has a way to get 
“raw” memory. chunk_alloc_core doesn’t seem like a name that’s going to be 
exposed, though, so maybe mmap() is the way to go. Not a big deal.
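
For illustration, here is a minimal sketch of the mmap() route. It takes raw 
memory straight from the kernel, so it never re-enters jemalloc. Everything 
named here is hypothetical: pinned_chunk_alloc(), pinned_chunk_dalloc(), the 
pin_fn hook type, and the demo_pin()/demo_unpin() stand-ins are not part of 
jemalloc or of the pending patch, and how such functions would be registered 
with an arena depends on the interface the patch finally exposes. In real use 
the hooks would call mlock() or the network library's registration routine.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical hook type: pin or unpin [addr, addr + len); 0 on success. */
typedef int (*pin_fn)(void *addr, size_t len);

/* Stand-in hooks that only count bytes. A real hook would call mlock()
 * or an RDMA memory-registration routine instead. */
size_t pinned_bytes;
int demo_pin(void *addr, size_t len)   { (void)addr; pinned_bytes += len; return 0; }
int demo_unpin(void *addr, size_t len) { (void)addr; pinned_bytes -= len; return 0; }

/* Map `size` bytes aligned to `alignment` and pin them once, up front.
 * Assumes `alignment` is a power of two and that both arguments are
 * multiples of the page size. Returns NULL on failure. */
void *pinned_chunk_alloc(size_t size, size_t alignment, pin_fn pin)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(size != 0 && size % page == 0 && alignment % page == 0);

    /* Over-allocate so an aligned range of `size` bytes must exist. */
    size_t span = size + alignment;
    if (span < size)
        return NULL; /* size_t overflow */
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    /* Round up to the alignment, then trim the unused head and tail. */
    uintptr_t base = ((uintptr_t)raw + alignment - 1) & ~((uintptr_t)alignment - 1);
    char *chunk = (char *)base;
    size_t lead = (size_t)(chunk - raw);
    size_t trail = span - lead - size;
    if (lead != 0)
        munmap(raw, lead);
    if (trail != 0)
        munmap(chunk + size, trail);

    /* Pin the whole chunk so later allocations from it reuse the pinning. */
    if (pin(chunk, size) != 0) {
        munmap(chunk, size);
        return NULL;
    }
    return chunk;
}

/* Unpin and unmap a chunk from pinned_chunk_alloc(); 0 on success. */
int pinned_chunk_dalloc(void *chunk, size_t size, pin_fn unpin)
{
    if (unpin(chunk, size) != 0)
        return -1;
    return munmap(chunk, size);
}
```

Pinning whole chunks at mapping time is what amortizes the expensive 
operation: small allocations carved out of the chunk pay nothing extra.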

Based on the proposed patch, it looks like MALLOCX_ARENA(a) /will/ have an 
effect for huge regions for both huge_palloc() and huge_dalloc() as well, which 
is exactly what I need.

The caching is still an issue. Certain applications have threads that churn 
through this memory at nearly the rate they make function calls, and they 
(hopefully) use it with high temporal locality. It’s also sometimes used for 
inter-thread communication, in addition to its role in distributed 
communication. It may be enough, though, to use one arena per thread for pinned 
memory, although that might cause problems for deallocation if frees are often 
remote. As always, there’s no way to know whether this will work without trying 
it.

Do you have any sense of the likelihood that this patch will be accepted going 
forward?

Thanks,
Luke

_______________________________________________
jemalloc-discuss mailing list
[email protected]
http://www.canonware.com/mailman/listinfo/jemalloc-discuss
