Hi Eric,
I'm not an expert on most of this, but hopefully this will add a little bit
of context.

My sense is that the Java implementation is farther along for "real world"
use-cases because it is used by Dremio and Drill, distributed analytics
engines where they've had to solve fine-grained memory tracking.  My
understanding is that the C++ MemoryPool is meant to ultimately build towards
something similar (hierarchical pools to finely track what is making
allocations and when memory is returned).  There are already MemoryPool
classes that wrap a root pool, and there are open JIRAs on extending the
abstraction to other devices (e.g. GPUs) [1] and on having an explicit pool
in user space [2].
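
To make the hierarchical idea a bit more concrete, here is a rough C++ sketch. Pool, RootPool and ChildPool are hypothetical names made up for illustration, not Arrow's actual MemoryPool API: each child pool forwards allocations to its parent but keeps its own byte count, so per-operator usage is visible while the root still sees the total.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical pool interface (illustration only, not Arrow's MemoryPool).
class Pool {
 public:
  virtual ~Pool() = default;
  virtual void* Allocate(std::size_t size) = 0;
  virtual void Free(void* ptr, std::size_t size) = 0;
  virtual std::int64_t bytes_allocated() const = 0;
};

// Root pool: talks to malloc/free directly and counts every live byte.
class RootPool : public Pool {
 public:
  void* Allocate(std::size_t size) override {
    void* p = std::malloc(size);
    if (p != nullptr) bytes_ += static_cast<std::int64_t>(size);
    return p;
  }
  void Free(void* ptr, std::size_t size) override {
    assert(bytes_ >= static_cast<std::int64_t>(size));
    std::free(ptr);
    bytes_ -= static_cast<std::int64_t>(size);
  }
  std::int64_t bytes_allocated() const override { return bytes_; }

 private:
  std::int64_t bytes_ = 0;
};

// Child pool: delegates the real work to its parent, but tracks only the
// bytes that went through it, so each consumer can be accounted separately.
class ChildPool : public Pool {
 public:
  explicit ChildPool(Pool* parent) : parent_(parent) {}
  void* Allocate(std::size_t size) override {
    void* p = parent_->Allocate(size);
    if (p != nullptr) bytes_ += static_cast<std::int64_t>(size);
    return p;
  }
  void Free(void* ptr, std::size_t size) override {
    parent_->Free(ptr, size);
    bytes_ -= static_cast<std::int64_t>(size);
  }
  std::int64_t bytes_allocated() const override { return bytes_; }

 private:
  Pool* parent_;
  std::int64_t bytes_ = 0;
};
```

A caller would hand one ChildPool to each operator (say, a scan and a join) and read bytes_allocated() on each to see who is holding memory, while the RootPool reports the process-wide total.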

On alignment: the Arrow spec calls for at least 8-byte alignment but
recommends 64-byte alignment precisely for SIMD use-cases.  There is still
an open JIRA item [3] to give Java 64-byte alignment, so I don't think
Java handles 64-byte alignment today (I don't know about 8-byte alignment,
which might come for free on 64-bit platforms).  I also don't believe much
work has been done in the C++ implementation to explicitly exploit the
alignment requirement.
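
For reference, 64-byte alignment just means every buffer's start address is a multiple of 64. Below is a minimal C++17 sketch, assuming a toolchain where std::aligned_alloc is available (MSVC does not provide it). AllocateAligned, FreeAligned, IsAligned and RoundUpToAlignment are hypothetical helpers, not functions from any Arrow library. std::aligned_alloc expects the requested size to be a multiple of the alignment, hence the rounding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// 64 bytes: the alignment the Arrow spec recommends for SIMD-friendly buffers.
constexpr std::size_t kAlignment = 64;
static_assert((kAlignment & (kAlignment - 1)) == 0, "alignment must be a power of two");

// Round n up to the next multiple of kAlignment.
constexpr std::size_t RoundUpToAlignment(std::size_t n) {
  return (n + kAlignment - 1) & ~(kAlignment - 1);
}

// True if ptr starts on a kAlignment boundary.
bool IsAligned(const void* ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % kAlignment == 0;
}

// Hypothetical helper: allocate at least `size` bytes starting on a 64-byte
// boundary. A zero-byte request still gets one aligned block.
void* AllocateAligned(std::size_t size) {
  return std::aligned_alloc(kAlignment, RoundUpToAlignment(size == 0 ? 1 : size));
}

// Memory from std::aligned_alloc is released with std::free.
void FreeAligned(void* ptr) {
  assert(ptr == nullptr || IsAligned(ptr));
  std::free(ptr);
}
```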

I don't have enough expertise to answer your other two questions.  But FWIW,
on #2, the C++ implementation uses shared_ptr<Buffer> to manage most of its
memory, so direct calls to Free are fairly uncommon.  Also, there might be
some refactoring of the Java abstractions to support arbitrary memory
instead of relying on Netty [4].
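
A minimal C++ sketch of the pattern behind #2, assuming a hypothetical OwnedBuffer class (not Arrow's real arrow::Buffer): the destructor releases the memory, and std::shared_ptr runs that destructor when the last owner goes away, so nobody calls Free explicitly. This plays roughly the role AutoCloseable plays in Java or Dispose() in .NET, except that release is tied to reference counts rather than an explicit close call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Live-byte counter so the effect of the destructor is observable (C++17 inline variable).
inline std::int64_t g_live_bytes = 0;

// Hypothetical owning buffer (illustration only, not arrow::Buffer).
// Memory is acquired in the constructor and released in the destructor.
class OwnedBuffer {
 public:
  explicit OwnedBuffer(std::size_t size)
      : data_(static_cast<std::uint8_t*>(std::malloc(size == 0 ? 1 : size))),
        size_(size) {
    assert(data_ != nullptr);
    g_live_bytes += static_cast<std::int64_t>(size_);
  }
  ~OwnedBuffer() {
    std::free(data_);
    g_live_bytes -= static_cast<std::int64_t>(size_);
  }
  // Non-copyable: sharing goes through std::shared_ptr instead.
  OwnedBuffer(const OwnedBuffer&) = delete;
  OwnedBuffer& operator=(const OwnedBuffer&) = delete;

  std::uint8_t* data() { return data_; }
  std::size_t size() const { return size_; }

 private:
  std::uint8_t* data_;
  std::size_t size_;
};
```

With std::make_shared<OwnedBuffer>(n), several arrays or record batches can hold the same buffer, and the memory is freed exactly once, when the last shared_ptr is destroyed or reset.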

Hope this helps,
Micah

[1] https://issues.apache.org/jira/browse/ARROW-2447
[2] https://issues.apache.org/jira/browse/ARROW-3406
[3] https://issues.apache.org/jira/browse/ARROW-186
[4] https://issues.apache.org/jira/browse/ARROW-3191

On Mon, Mar 18, 2019 at 4:33 PM Eric Erhardt
<eric.erha...@microsoft.com.invalid> wrote:

> We are having a discussion on
> https://github.com/apache/arrow/pull/3925#issuecomment-473605919 about
> the `MemoryPool` class in the C# library.
>
> In reality, the way `MemoryPool` is designed in C#, it is more of a
> "MemoryAllocator": it just allocates or reallocates memory. There is no
> API for "returning" the memory back into the pool. The memory gets
> deallocated by the finalizer, which is invoked by the garbage
> collector.
>
> I was looking around a bit, and I see the Java library doesn't have a
> MemoryPool, but instead BufferManager and BufferAllocator types. The Java
> library also has `AutoCloseable` (which I assume is analogous to
> IDisposable in .NET) on all the types - ArrowRecordBatch, ArrowBuf,
> IntVector, etc.
>
> Looking into the C++ implementation, I don't really see a "pooling"
> implementation, but instead just a "malloc" and "free" (or using jemalloc,
> if built with it enabled). I also see it is using "aligned" memory. I'm not
> sure how/what handles this on the Java side. I assume the alignment is
> useful for SIMD operations, but is it required?
>
> Go also appears to be using the "Allocator" name instead of a "Pool".
>
> So I'm wondering a few things:
>
>
>   1.  Should we rename the C# "MemoryPool" class to "MemoryAllocator"
> instead? Or is there really an intention in Arrow to have "pooling" of
> memory?
>   2.  Should there be a way to "close" (Dispose() in the .NET
> nomenclature) types that hold memory? Ex. RecordBatch, ArrowArray, etc.
>
> I assume these mechanisms are super useful to the other implementations,
> so I'm trying to keep the C# library designed roughly the same. But I'd
> appreciate some advice.
>
> Eric Erhardt
>
