Hello Rémi,

Under the hood, jemalloc does things quite similar to what you describe. I'm not
sure what the threshold is in the current version, but in earlier releases it
used a different allocation strategy for objects above 4 MB. For the initial
large allocations you will see quite a few copies, as mmap returns a new base
address and cannot reuse existing space. This could probably be circumvented
by making a single large allocation up front and freeing it again.

As your suggestions don't seem to be specific to Arrow, why not contribute them
directly to jemalloc? They are much better at reviewing allocator code than we
are.

Still, when we read a column, we should be able to determine its final size
from the Parquet metadata. Maybe we're not passing that information along?
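To illustrate why a size hint matters: the sketch below compares a buffer that grows on demand with one that reserves its final size up front. It assumes such a hint is available (the column chunk's `total_uncompressed_size` in the Parquet footer is one candidate, though the decoded Arrow size can differ from it). `buffer_t`, `buffer_reserve` and `buffer_append` are hypothetical names, not Arrow's builder API, and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer, loosely modeled on a column builder.
   Not Arrow's actual builder; the doubling policy is an assumption. */
typedef struct {
    unsigned char *data;
    size_t size;          /* bytes written so far */
    size_t capacity;      /* bytes currently allocated */
    size_t realloc_calls; /* how many times the buffer was resized */
} buffer_t;

/* Ensure capacity >= min_capacity, doubling from the current capacity. */
static int buffer_reserve(buffer_t *b, size_t min_capacity) {
    if (min_capacity <= b->capacity) return 0;
    size_t new_cap = b->capacity ? b->capacity : 64;
    while (new_cap < min_capacity) {
        if (new_cap > SIZE_MAX / 2) return -1; /* doubling would overflow */
        new_cap *= 2;
    }
    /* Each resize may move the block and copy everything written so far. */
    unsigned char *p = realloc(b->data, new_cap);
    if (p == NULL) return -1;
    b->data = p;
    b->capacity = new_cap;
    b->realloc_calls++;
    return 0;
}

/* Append n bytes, growing the buffer only when it is full. */
static int buffer_append(buffer_t *b, const void *src, size_t n) {
    if (n == 0) return 0;
    if (buffer_reserve(b, b->size + n) != 0) return -1;
    memcpy(b->data + b->size, src, n);
    b->size += n;
    assert(b->size <= b->capacity);
    return 0;
}
```

Appending 16 MiB in 4 KiB chunks to an empty `buffer_t` triggers 13 reallocations (capacities from 4 KiB up to 16 MiB), each of which may be a copy; calling `buffer_reserve(&b, 16 << 20)` first reduces that to a single allocation.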

Best,
Uwe

On Thu, Jun 4, 2020, at 5:48 PM, Rémi Dettai wrote:
> When creating large arrays, Arrow uses realloc quite intensively.
> 
> I have an example where I read a gzipped Parquet column (strings) that
> expands from 8 MB to 100+ MB when loaded into Arrow. Of course jemalloc
> cannot anticipate this, and every realloc call above 1 MB (the most
> critical ones) ends up being a copy.
> 
> Given that Arrow uses realloc heavily, I think we could come up with an
> allocator for large objects that would behave a lot better than jemalloc.
> For smaller objects, this allocator could simply let jemalloc handle the
> request. I'm not trying to outsmart the brilliant folks from Facebook and
> co. ;-) But for larger objects, we could adopt a custom strategy:
> - if an allocation or reallocation larger than 1 MB (or maybe even 512 KB)
> is made on our memory pool, call mmap with a size of X GB (X being slightly
> smaller than the total physical memory on the system). This is OK because
> mmap does not physically allocate this memory as long as it is not touched.
> - we keep track of all allocations created this way by storing, in a map,
> the pointer plus the size actually used inside each X GB mapping.
> - when growing an allocation mmapped this way, we always have contiguous
> memory available (otherwise we would already have OOMed, because X is
> about the physical memory size).
> - when reducing the allocation size, we can free the unused pages with
> madvise (optional: if the allocation becomes small enough, we might copy
> it back into a jemalloc allocation).
> 
> I am not an expert in these matters, and I only just learned what an
> allocator really is, so my approach might be naive. If so, feel free to
> enlighten me!
> 
> Please note that I'm not sure how portable this solution is.
> 
> Have a nice day!
> 
> Remi
>
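A minimal sketch of the reserve-then-grow idea from the proposal above, assuming Linux: `MAP_ANONYMOUS`, `MAP_NORESERVE` and `MADV_DONTNEED` behave as shown there but are not portable POSIX guarantees. `big_region_t` and its functions are hypothetical names, not part of Arrow or jemalloc. One region stands for one tracked large allocation; the map of regions and the jemalloc fallback for small requests are left out.

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS, MAP_NORESERVE, madvise on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record for one large allocation: a big reserved range
   plus the number of bytes the caller currently uses. */
typedef struct {
    unsigned char *base; /* start of the reservation; never moves */
    size_t reserved;     /* bytes of address space reserved */
    size_t used;         /* bytes currently in use */
} big_region_t;

static size_t page_round_up(size_t n) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Reserve address space only; physical pages are assigned on first touch. */
static int big_region_init(big_region_t *r, size_t reserve_bytes) {
    void *p = mmap(NULL, reserve_bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) return -1;
    r->base = p;
    r->reserved = reserve_bytes;
    r->used = 0;
    return 0;
}

/* In-place "realloc": growing only moves the end marker, so nothing is copied. */
static void *big_region_resize(big_region_t *r, size_t new_size) {
    if (new_size > r->reserved) return NULL; /* reservation exhausted */
    if (new_size < r->used) {
        /* Shrinking: return whole pages past the new end to the kernel.
           Private anonymous pages read back as zeros if touched again. */
        size_t keep = page_round_up(new_size);
        size_t old_end = page_round_up(r->used);
        if (old_end > keep)
            (void)madvise(r->base + keep, old_end - keep, MADV_DONTNEED);
    }
    r->used = new_size;
    assert(r->used <= r->reserved);
    return r->base;
}

static void big_region_destroy(big_region_t *r) {
    munmap(r->base, r->reserved);
    r->base = NULL;
    r->reserved = 0;
    r->used = 0;
}
```

Because the base pointer never changes, growing never copies. Two caveats for the design: with strict overcommit (`vm.overcommit_memory=2`) `MAP_NORESERVE` is ignored and the whole reservation is charged against the commit limit, and reserving a range close to physical memory size per allocation uses up virtual address space quickly.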
