davisusanibar commented on a change in pull request #12603:
URL: https://github.com/apache/arrow/pull/12603#discussion_r828341417



##########
File path: docs/source/java/memory.rst
##########
@@ -0,0 +1,174 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+=================
+Memory Management
+=================
+
+.. contents::
+
+The memory modules contain all the functionality that Arrow uses to manage 
memory (allocation and deallocation).
+This section will introduce you to the major concepts in Java’s memory 
management:
+
+* `BufferAllocator`_
+* `ArrowBuf`_
+* `Reference counting`_
+
+Getting Started
+===============
+
+Arrow's memory management is built around the needs of the columnar format and 
the use of off-heap memory.
+It is an independent implementation and does not wrap the C++ 
implementation.
+
+Arrow provides a high-level abstraction with several APIs for reading and 
writing data in direct memory.
+
+Arrow provides multiple modules: the core interfaces, and implementations of 
the interfaces.
+Users need the core interfaces, and exactly one of the implementations.
+
+* ``Memory Core``: Provides the interfaces used by the Arrow libraries and 
applications.
+* ``Memory Netty``: An implementation of the memory interfaces based on the 
`Netty`_ library.
+* ``Memory Unsafe``: An implementation of the memory interfaces based on the 
`sun.misc.Unsafe`_ class.
+
+BufferAllocator
+===============
+
+The BufferAllocator interface deals with allocating ArrowBufs for the 
application.
+
+The concrete implementation of the allocator is RootAllocator. Applications 
should generally create one RootAllocator at the
+start of the program, and use it through the BufferAllocator interface. 
Allocators have a memory limit. The RootAllocator
+sets the program-wide memory limit. The RootAllocator is responsible for being 
the master bookkeeper for memory allocations.
+
+Arrow provides a tree-based model for memory allocation. The RootAllocator is 
created first, then all other allocators
+are created as its children via ``BufferAllocator.newChildAllocator``.
+
+One of the uses of child allocators is to set a lower temporary limit for one 
section of the code. Also, child
+allocators can be named; this makes it easier to tell where an ArrowBuf came 
from during debugging.
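+
+The tree of allocators and its limits can be illustrated with a small, self-contained model. This is only a sketch and does not use Arrow: ``ToyAllocator`` and its ``allocate``, ``free`` and ``getAllocated`` methods are hypothetical stand-ins, and the two-argument ``newChildAllocator`` here is a simplification, not the signature of ``BufferAllocator.newChildAllocator``. It assumes that bytes reserved by a child also count against every ancestor.
+

```java
/** Simplified model of an allocator tree. This is NOT Arrow's implementation. */
public class AllocatorTreeSketch {

    /** Hypothetical stand-in for a BufferAllocator: it only counts bytes. */
    static final class ToyAllocator {
        private final String name;
        private final ToyAllocator parent; // null for the root allocator
        private final long limit;          // maximum bytes this allocator may hold
        private long allocated;            // bytes currently held, children included

        ToyAllocator(String name, ToyAllocator parent, long limit) {
            this.name = name;
            this.parent = parent;
            this.limit = limit;
        }

        /** Creates a named child with its own (typically lower) limit. */
        ToyAllocator newChildAllocator(String childName, long childLimit) {
            return new ToyAllocator(childName, this, childLimit);
        }

        /** Reserves bytes here and in every ancestor, or fails without changing anything. */
        void allocate(long bytes) {
            for (ToyAllocator a = this; a != null; a = a.parent) {
                if (a.allocated + bytes > a.limit) {
                    throw new IllegalStateException("limit exceeded in allocator '" + a.name + "'");
                }
            }
            for (ToyAllocator a = this; a != null; a = a.parent) {
                a.allocated += bytes;
            }
        }

        /** Returns bytes to this allocator and to every ancestor. */
        void free(long bytes) {
            for (ToyAllocator a = this; a != null; a = a.parent) {
                a.allocated -= bytes;
            }
        }

        long getAllocated() {
            return allocated;
        }
    }

    public static void main(String[] args) {
        ToyAllocator root = new ToyAllocator("root", null, 1024);    // program-wide limit
        ToyAllocator child = root.newChildAllocator("query-1", 256); // temporary lower limit

        child.allocate(200);
        System.out.println("root holds " + root.getAllocated() + " bytes"); // root holds 200 bytes

        try {
            child.allocate(100); // 300 bytes would exceed the child's 256-byte limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // names the allocator that refused
        }

        child.free(200);
    }
}
```

+
+In this sketch the allocator name appears in the error message, which mirrors why named child allocators help during debugging.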
+
+ArrowBuf
+========
+
+ArrowBuf represents a single, contiguous allocation of `Direct Memory`_. It 
consists of an address and a length,
+and provides low-level interfaces for working with the contents, similar to 
ByteBuffer.
+
+Buffers allocated in ``Direct Memory`` live outside the Java heap, and the JVM 
can perform native operations directly on them. Arrow
+offers efficient memory operations based on this Direct Memory implementation 
(`see section below for detailed reasons of use`).
+
+Unlike (Direct)ByteBuffer, it has reference counting built in (`see the next 
section`).
+
+Reference counting
+==================
+
+Managing direct memory involves more than allocating and deallocating, because 
allocators hand out buffers (ArrowBuf)
+through a pool or cache.
+
+Arrow uses manual reference counting to track whether a buffer is in use, or 
can be deallocated or returned
+to the allocator's pool. This simply means that each buffer has a counter 
keeping track of the number of references to
+this buffer, and the end user is responsible for properly 
incrementing/decrementing the counter according to how the buffer is used.
+
+In Arrow, each ArrowBuf has an associated ReferenceManager that tracks the 
reference count, which can be retrieved
+with ArrowBuf.getReferenceManager(). The reference count can be updated with 
``ReferenceManager.release`` and
+``ReferenceManager.retain``.
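+
+The retain/release bookkeeping can be illustrated with a minimal, self-contained model. This sketch does not use Arrow: ``ToyBuf`` is a hypothetical stand-in for a reference-counted buffer, it is single-threaded, and it assumes the creator holds the first reference. The ``isFreed`` flag only simulates returning memory to the allocator.
+

```java
/** Simplified model of manual reference counting. This is NOT Arrow's implementation. */
public class RefCountSketch {

    /** Hypothetical stand-in for a reference-counted buffer. */
    static final class ToyBuf {
        private int refCount = 1; // assumption: the creator holds the first reference
        private boolean freed;

        /** Registers one more owner of the buffer. */
        void retain() {
            if (freed) {
                throw new IllegalStateException("buffer already freed");
            }
            refCount++;
        }

        /** Drops one reference; returns true if this call freed the buffer. */
        boolean release() {
            if (freed) {
                throw new IllegalStateException("released more often than retained");
            }
            refCount--;
            if (refCount == 0) {
                freed = true; // a real allocator would reclaim the memory here
                return true;
            }
            return false;
        }

        int getRefCount() {
            return refCount;
        }

        boolean isFreed() {
            return freed;
        }
    }

    public static void main(String[] args) {
        ToyBuf buf = new ToyBuf();             // count = 1
        buf.retain();                          // a second owner appears, count = 2
        System.out.println(buf.getRefCount()); // 2

        System.out.println(buf.release());     // false: one owner still holds the buffer
        System.out.println(buf.release());     // true: last reference gone, buffer freed
    }
}
```

+
+Every ``retain`` must be matched by exactly one ``release``; in this model a missing ``release`` leaves the buffer alive forever, and an extra one raises an error.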
+
+Of course, this is tedious and error-prone, so usually, instead of directly 
working with buffers, we should use
+higher-level APIs like ValueVector. Such classes generally implement 
Closeable/AutoCloseable and will automatically
+decrement the reference count when their ``close()`` method is called.
+
+.. code-block::
+
+    |__ A = Allocator
+    |____ B = IntVector (reference count = 2 )
+    |____________ ValidityBuffer
+    |____________ ValueBuffer
+    |____ C = VarcharVector (reference count = 2 )
+    |____________ ValidityBuffer
+    |____________ ValueBuffer
+
+Allocators implement AutoCloseable as well. In this case, closing the 
allocator will check that all buffers
+obtained from the allocator are closed. If not, the ``close()`` method will raise 
an exception; this helps track
+memory leaks from unclosed buffers.
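+
+The leak check performed on close can be illustrated with a small, self-contained model. This sketch does not use Arrow: ``ToyAllocator``, ``ToyBuf`` and ``getOutstanding`` are hypothetical names, and the choice of ``IllegalStateException`` is an assumption of the sketch, not a statement about which exception Arrow raises.
+

```java
/** Simplified model of leak detection on close. This is NOT Arrow's implementation. */
public class LeakCheckSketch {

    /** Hypothetical allocator that remembers how many bytes are still handed out. */
    static final class ToyAllocator implements AutoCloseable {
        private long outstanding; // bytes handed out and not yet returned

        ToyBuf buffer(long size) {
            outstanding += size;
            return new ToyBuf(this, size);
        }

        long getOutstanding() {
            return outstanding;
        }

        /** Fails loudly if any buffer obtained from this allocator is still open. */
        @Override
        public void close() {
            if (outstanding != 0) {
                throw new IllegalStateException("memory leaked: " + outstanding + " bytes");
            }
        }
    }

    /** Hypothetical buffer that returns its bytes to the owner when closed. */
    static final class ToyBuf implements AutoCloseable {
        private final ToyAllocator owner;
        private final long size;
        private boolean closed;

        ToyBuf(ToyAllocator owner, long size) {
            this.owner = owner;
            this.size = size;
        }

        @Override
        public void close() {
            if (!closed) { // closing twice must not return the bytes twice
                closed = true;
                owner.outstanding -= size;
            }
        }
    }

    public static void main(String[] args) {
        // try-with-resources closes in reverse order: the buffer first, then the allocator.
        try (ToyAllocator allocator = new ToyAllocator();
             ToyBuf buf = allocator.buffer(64)) {
            System.out.println("in use: " + allocator.getOutstanding()); // in use: 64
        }

        ToyAllocator leaky = new ToyAllocator();
        leaky.buffer(64); // never closed
        try {
            leaky.close();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // memory leaked: 64 bytes
        }
    }
}
```

+
+Declaring the allocator before the buffer in the same ``try`` statement guarantees that the buffer is closed first, so the allocator's check passes.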
+
+As you can see, reference counting needs to be handled carefully. To ensure that an 
+independent section of code has `fully cleaned up all allocated buffers while 
still maintaining a global memory limit
+through the RootAllocator`, use ``BufferAllocator.newChildAllocator``.
+
+Reason To Use Direct Memory
+===========================
+
+* When `writing an ArrowBuf`_ we use the direct buffer (``nioBuffer()`` 
returns a DirectByteBuffer) and the JVM `will attempt to avoid copying the 
buffer's content to (or from) an intermediate buffer`_ so it makes I/O (and 
hence IPC) faster.

Review comment:
       Thanks, added

##########
File path: docs/source/java/memory.rst
##########
@@ -0,0 +1,174 @@
+* We can `directly wrap a native memory address`_ instead of having to copy 
data for JNI (for example, when implementing the C Data Interface we can directly create 
`Java ArrowBufs that directly correspond to the C pointers`_).
+* Conversely in JNI, we can directly use `Java ArrowBufs in C++`_ without 
having to copy data.

Review comment:
       Updated



