davisusanibar commented on a change in pull request #12603:
URL: https://github.com/apache/arrow/pull/12603#discussion_r828342652
##########
File path: docs/source/java/memory.rst
##########
@@ -0,0 +1,174 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=================
+Memory Management
+=================
+
+.. contents::
+
+The memory modules contain all the functionality that Arrow uses to manage memory (allocation and deallocation).
+This section will introduce you to the major concepts in Java’s memory management:
+
+* `BufferAllocator`_
+* `ArrowBuf`_
+* `Reference counting`_
+
+Getting Started
+===============
+
+Arrow's memory management is built around the needs of the columnar format and the use of off-heap memory.
+It is an independent implementation and does not wrap the C++ implementation.
+
+Arrow offers a high level of abstraction, providing several access APIs to read and write data in direct memory.
+
+Arrow provides multiple modules: the core interfaces, and implementations of the interfaces.
+Users need the core interfaces, and exactly one of the implementations.
+
+* ``Memory Core``: Provides the interfaces used by the Arrow libraries and applications.
+* ``Memory Netty``: An implementation of the memory interfaces based on the `Netty`_ library.
+* ``Memory Unsafe``: An implementation of the memory interfaces based on the `sun.misc.Unsafe`_ library.
+
+BufferAllocator
+===============
+
+The BufferAllocator interface deals with allocating ArrowBufs for the application.
+
+The concrete implementation of the allocator is RootAllocator. Applications should generally create one RootAllocator at the
+start of the program, and use it through the BufferAllocator interface. Allocators have a memory limit. The RootAllocator
+sets the program-wide memory limit. The RootAllocator is responsible for being the master bookkeeper for memory allocations.
+
+Arrow provides a tree-based model for memory allocation. The RootAllocator is created first, then all other allocators
+are created as children of that allocator with ``BufferAllocator.newChildAllocator``.
+
+One of the uses of child allocators is to set a lower temporary limit for one section of the code. Also, child
+allocators can be named; this makes it easier to tell where an ArrowBuf came from during debugging.
+
+ArrowBuf
+========
+
+ArrowBuf represents a single, contiguous allocation of `Direct Memory`_. It consists of an address and a length,
+and provides low-level interfaces for working with the contents, similar to ByteBuffer.
+
+Memory allocated as ``Direct Memory`` lives outside the Java heap, and the JVM makes a best effort to perform native
+operations directly on it. Arrow offers efficient memory operations based on this Direct Memory implementation
+(`see section below for detailed reasons of use`).
+
+Unlike (Direct)ByteBuffer, it has reference counting built in (`see the next section`).
+
+Reference counting
+==================
+
+Direct memory involves more than allocation and deallocation, because allocators hand out buffers (ArrowBuf)
+through a pool or cache.
+
+Arrow uses manual reference counting to track whether a buffer is in use, or can be deallocated or returned
+to the allocator's pool.
+This simply means that each buffer has a counter keeping track of the number of references to
+this buffer, and the end user is responsible for properly incrementing/decrementing the counter as the buffer is used.
+
+In Arrow, each ArrowBuf has an associated ReferenceManager that tracks the reference count, which can be retrieved
+with ``ArrowBuf.getReferenceManager()``. The reference count can be updated with ``ReferenceManager.release`` and
+``ReferenceManager.retain``.
+
+Of course, this is tedious and error-prone, so usually, instead of directly working with buffers, we should use
+higher-level APIs like ValueVector. Such classes generally implement Closeable/AutoCloseable and will automatically
+decrement the reference count when closed.
+
+.. code-block::
+
+    |__ A = Allocator
+    |____ B = IntVector (reference count = 2 )

Review comment:
Deleted
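
For anyone following the thread, the retain/release behaviour described in the "Reference counting" section can be sketched with a small toy model. This is only an illustration in plain Java: `RefCountSketch` and `ToyBuffer` are hypothetical names, the class does not use or reproduce Arrow's `ArrowBuf` or `ReferenceManager`, and it assumes single-threaded use.

```java
public class RefCountSketch {

    /** Hypothetical stand-in for a reference-counted buffer; this is NOT Arrow's ArrowBuf. */
    public static final class ToyBuffer implements AutoCloseable {
        // A new buffer starts with one reference, held by whoever created it.
        private int refCount = 1;

        /** Adds one reference; fails if the buffer has already been freed. */
        public void retain() {
            if (refCount == 0) {
                throw new IllegalStateException("buffer already freed");
            }
            refCount++;
        }

        /** Drops one reference; returns true when the last reference is gone. */
        public boolean release() {
            if (refCount == 0) {
                throw new IllegalStateException("released more often than retained");
            }
            refCount--;
            return refCount == 0;
        }

        public int refCount() {
            return refCount;
        }

        public boolean isFreed() {
            return refCount == 0;
        }

        /** Lets try-with-resources drop the creator's reference automatically. */
        @Override
        public void close() {
            release();
        }
    }

    public static void main(String[] args) {
        ToyBuffer shared;
        try (ToyBuffer buf = new ToyBuffer()) {
            buf.retain();                          // a second owner takes a reference
            shared = buf;
            System.out.println(buf.refCount());    // prints 2
        }                                          // close() drops the creator's reference
        System.out.println(shared.refCount());     // prints 1
        shared.release();                          // the second owner is done
        System.out.println(shared.isFreed());      // prints true
    }
}
```

The `close()` method mirrors the point made about Closeable/AutoCloseable classes: wrapping the owner in try-with-resources removes one manual `release` call, while any extra `retain` still has to be paired with its own `release`.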