lidavidm commented on code in PR #15051:
URL: https://github.com/apache/arrow/pull/15051#discussion_r1053641396
##########
docs/source/java/memory.rst:
##########
@@ -284,3 +302,159 @@ Finally, enabling the ``TRACE`` logging level will automatically provide this st
 .. _`ReferenceManager`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/memory/ReferenceManager.html
 .. _`ReferenceManager.release`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/memory/ReferenceManager.html#release--
 .. _`ReferenceManager.retain`: https://arrow.apache.org/docs/java/reference/org/apache/arrow/memory/ReferenceManager.html#retain--
+
+Arrow Memory In-Depth
+=====================
+
+Design Principles
+-----------------
+Arrow’s memory model is based on the following basic concepts:
+
+- Memory can be allocated up to some limit. That limit could be a real
+  limit (OS/JVM) or a locally imposed limit.
+- Allocation operates in two phases: accounting then actual allocation.
+  Allocation could fail at either point.
+- Allocation failure should be recoverable. In all cases, the Allocator
+  infrastructure should expose memory allocation failures (OS or
+  internal limit-based) as ``OutOfMemoryException``\ s.
+- Any allocator can reserve memory when created. This memory shall be
+  held such that this allocator will always be able to allocate that
+  amount of memory.
+- A particular application component should work to use a local
+  allocator to understand local memory usage and better debug memory
+  leaks.
+- The same physical memory can be shared by multiple allocators and the
+  allocator must provide an accounting paradigm for this purpose.
+
+Reserving Memory
+----------------
+
+Arrow provides two different ways to reserve memory:
+
+- BufferAllocator accounting reservations: When a new allocator (other
+  than the ``RootAllocator``) is initialized, it can set aside memory
+  that it will keep locally for its lifetime. This is memory that will
+  never be released back to its parent allocator until the allocator is
+  closed.
+- ``AllocationReservation`` via BufferAllocator.newReservation():
+  Allows a short-term preallocation strategy so that a particular
+  subsystem can ensure future memory is available to support a
+  particular request.
+
+Reference Counting Details
+--------------------------
+
+Typically, the ReferenceManager implementation used is an instance of `BufferLedger`_.
+A BufferLedger is a ReferenceManager that also maintains the relationship between an ``AllocationManager``,
+a ``BufferAllocator`` and one or more individual ``ArrowBuf``\ s
+
+All ArrowBufs (direct or sliced) related to a single BufferLedger/BufferAllocator combination
+share the same reference count and either all will be valid or all will be invalid.
+For simplicity of accounting, we treat that memory as being used by one
+of the BufferAllocators associated with the memory. When that allocator
+releases its claim on that memory, the memory ownership is then moved to
+another BufferLedger belonging to the same AllocationManager.
+
+Allocation Details
+------------------
+
+There are several Allocator types in Arrow Java:
+
+- ``BufferAllocator`` - The public interface application users should be leveraging
+- ``BaseAllocator`` - The base implementation of memory allocation, contains the meat of the Arrow allocator implementation
+- ``RootAllocator`` - The root allocator. Typically only one created for a JVM. It serves as the parent/ancestor for child allocators
+- ``ChildAllocator`` - A child allocator that derives from the root allocator
+
+Many BufferAllocators can reference the same piece of physical memory at the same
+time. It is the AllocationManager’s responsibility to ensure that in this situation,
+all memory is accurately accounted for from the Root’s perspective
+and also to ensure that the memory is correctly released once all
+BufferAllocators have stopped using that memory.
+
+For simplicity of accounting, we treat that memory as being used by one
+of the BufferAllocators associated with the memory. When that allocator
+releases its claim on that memory, the memory ownership is then moved to
+another BufferLedger belonging to the same AllocationManager. Note that
+because a ArrowBuf.release() is what actually causes memory ownership
+transfer to occur, we always proceed with ownership transfer (even if
+that violates an allocator limit). It is the responsibility of the
+application owning a particular allocator to frequently confirm whether
+the allocator is over its memory limit (BufferAllocator.isOverLimit())
+and if so, attempt to aggressively release memory to ameliorate the
+situation.
+
+
+Object Hierarchy
+----------------
+
+There are two main ways that someone can look at the object hierarchy
+for Arrow’s memory management scheme. The first is a memory based
+perspective as below:
+
+Memory Perspective
+~~~~~~~~~~~~~~~~~~
+
+.. raw:: html
+
+    <pre>
+
+    AllocationManager
+    |
+    |-- UnsignedDirectLittleEndian (One per AllocationManager)
+    |
+    |-+ BufferLedger 1 ==> Allocator A (owning)
+    | ` - ArrowBuf 1
+    |-+ BufferLedger 2 ==> Allocator B (non-owning)
+    | ` - ArrowBuf 2
+    |-+ BufferLedger 3 ==> Allocator C (non-owning)
+      | - ArrowBuf 3
+      | - ArrowBuf 4
+      ` - ArrowBuf 5
+    </pre>

Review Comment:
   nit: I think you can use `.. code-block:: none` and remove the `<pre>` tags. (Not that I expect anyone to render the docs to anything but HTML…)
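
For illustration, the `.. code-block:: none` form suggested in the review comment might look roughly like the sketch below. The 4-space indentation and the alignment of the diagram are assumptions, not taken from the PR; `none` is the Sphinx setting that turns off syntax highlighting for the block.

```rst
Memory Perspective
~~~~~~~~~~~~~~~~~~

.. code-block:: none

    AllocationManager
    |
    |-- UnsignedDirectLittleEndian (One per AllocationManager)
    |
    |-+ BufferLedger 1 ==> Allocator A (owning)
    | ` - ArrowBuf 1
    |-+ BufferLedger 2 ==> Allocator B (non-owning)
    | ` - ArrowBuf 2
    |-+ BufferLedger 3 ==> Allocator C (non-owning)
      | - ArrowBuf 3
      | - ArrowBuf 4
      ` - ArrowBuf 5
```

Unlike `.. raw:: html`, a literal block of this kind should also render in non-HTML builders, so the diagram would not be tied to one output format.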
