[ 
https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269201#comment-14269201
 ] 

ASF GitHub Bot commented on FLINK-1320:
---------------------------------------

GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/290

    [FLINK-1320] Add an off-heap variant of the managed memory

    The MemorySegment class has been converted into an abstract class. Its old
    JVM heap implementation can now be found in HeapMemorySegment. In addition,
    an implementation that uses direct (outside the JVM heap) memory allocation
    can be found in DirectMemorySegment. Both classes use the sun.misc.Unsafe
    class, which accesses memory directly. This approach is unsafe in the sense
    that an incorrectly placed write may crash the JVM. By default, both classes
    perform boundary checks when writing to the memory.
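
A minimal sketch of the class layout described above, under stated assumptions: the names SketchSegment, HeapSketchSegment and DirectSketchSegment are hypothetical, and java.nio.ByteBuffer is swapped in for sun.misc.Unsafe so that the example stays runnable on any JDK. It is not the code from the pull request. It also checks bounds on reads, which the description only states for writes.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for the abstract MemorySegment; not Flink's actual API. */
abstract class SketchSegment {
    protected final int size;

    protected SketchSegment(int size) {
        this.size = size;
    }

    /** Rejects any access that would touch bytes outside [0, size). */
    protected final void checkBounds(int index, int length) {
        if (index < 0 || index > size - length) {
            throw new IndexOutOfBoundsException(
                    "index=" + index + ", length=" + length + ", size=" + size);
        }
    }

    abstract void putInt(int index, int value);

    abstract int getInt(int index);

    abstract boolean isOffHeap();
}

/** Heap variant: backed by a byte[] that lives on the JVM heap. */
final class HeapSketchSegment extends SketchSegment {
    private final ByteBuffer buffer;

    HeapSketchSegment(int size) {
        super(size);
        this.buffer = ByteBuffer.wrap(new byte[size]);
    }

    @Override
    void putInt(int index, int value) {
        checkBounds(index, Integer.BYTES);
        buffer.putInt(index, value);
    }

    @Override
    int getInt(int index) {
        checkBounds(index, Integer.BYTES);
        return buffer.getInt(index);
    }

    @Override
    boolean isOffHeap() {
        return buffer.isDirect();
    }
}

/** Direct variant: backed by memory allocated outside the JVM heap. */
final class DirectSketchSegment extends SketchSegment {
    private final ByteBuffer buffer;

    DirectSketchSegment(int size) {
        super(size);
        this.buffer = ByteBuffer.allocateDirect(size);
    }

    @Override
    void putInt(int index, int value) {
        checkBounds(index, Integer.BYTES);
        buffer.putInt(index, value);
    }

    @Override
    int getInt(int index) {
        checkBounds(index, Integer.BYTES);
        return buffer.getInt(index);
    }

    @Override
    boolean isOffHeap() {
        return buffer.isDirect();
    }
}
```

Callers only see the abstract type, so the heap and direct variants can be exchanged without touching the algorithms that operate on segments.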
    
    The DefaultMemoryManager has been renamed to HeapMemoryManager. In addition,
    a DirectMemoryManager has been added. The main difference between the two
    classes is the freeSegments queue, which holds the free memory segments. In
    the HeapMemoryManager, the queue holds byte arrays, while in the
    DirectMemoryManager, the queue holds ByteBuffers.
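
The difference between the two managers can be sketched as a pool whose free queue is parameterized by the page type. PagePool and its methods are hypothetical names chosen for this illustration; only the freeSegments queue and its element types (byte arrays versus ByteBuffers) come from the description above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Hypothetical pool of fixed-size pages; T is byte[] (heap) or ByteBuffer (direct). */
final class PagePool<T> {
    // Pages that are currently not handed out to any consumer.
    private final ArrayDeque<T> freeSegments = new ArrayDeque<>();

    PagePool(int numPages, Supplier<T> pageFactory) {
        // All memory is allocated up front, when the pool is created.
        for (int i = 0; i < numPages; i++) {
            freeSegments.add(pageFactory.get());
        }
    }

    /** Hands out one free page or fails if the pool is exhausted. */
    T allocate() {
        T page = freeSegments.poll();
        if (page == null) {
            throw new IllegalStateException("no free pages left");
        }
        return page;
    }

    /** Returns a page to the free queue. */
    void release(T page) {
        freeSegments.add(page);
    }

    int available() {
        return freeSegments.size();
    }

    /** Heap flavour: the queue holds byte arrays. */
    static PagePool<byte[]> heap(int numPages, int pageSize) {
        return new PagePool<byte[]>(numPages, () -> new byte[pageSize]);
    }

    /** Direct flavour: the queue holds off-heap ByteBuffers. */
    static PagePool<ByteBuffer> direct(int numPages, int pageSize) {
        return new PagePool<ByteBuffer>(numPages, () -> ByteBuffer.allocateDirect(pageSize));
    }
}
```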
    
    The direct (off-heap) memory management is enabled for the task manager when
    the config entry "taskmanager.memory.directAllocation" is set to "true". As
    with the heap memory management, if "taskmanager.memory.size" is set to a
    value greater than 0, that amount of memory (in megabytes) is allocated as
    managed memory. Otherwise, a fraction (0.7) of the task manager JVM heap is
    used to determine the amount of memory to allocate. As of now, the user has
    to take care to properly adjust the task manager's heap memory size (as
    configured in "taskmanager.heap.mb") when using direct (off-heap) memory
    allocation.
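
The sizing rule above can be written down as a small helper. This is a sketch under assumptions: ManagedMemoryConfig and its methods are hypothetical, the configuration is modelled as a plain Map, and the fallbacks used when a key is missing ("-1" and "false") are guesses made for the example. Only the two config keys and the 0.7 fraction are taken from the description.

```java
import java.util.Map;

/** Hypothetical helper that mirrors the sizing rule from the description. */
final class ManagedMemoryConfig {
    static final String SIZE_KEY = "taskmanager.memory.size";
    static final String DIRECT_KEY = "taskmanager.memory.directAllocation";
    static final double HEAP_FRACTION = 0.7;

    private ManagedMemoryConfig() {
    }

    /** Returns the number of bytes to reserve as managed memory. */
    static long managedMemoryBytes(Map<String, String> config, long jvmHeapBytes) {
        // Assumption: a missing entry behaves like a non-positive value.
        long configuredMb = Long.parseLong(config.getOrDefault(SIZE_KEY, "-1"));
        if (configuredMb > 0) {
            return configuredMb * 1024L * 1024L; // explicit size in megabytes
        }
        return Math.round(jvmHeapBytes * HEAP_FRACTION); // fraction of the JVM heap
    }

    /** Returns true if direct (off-heap) allocation is switched on. */
    static boolean useDirectMemory(Map<String, String> config) {
        // Assumption: the switch is off unless explicitly set to "true".
        return Boolean.parseBoolean(config.getOrDefault(DIRECT_KEY, "false"));
    }
}
```

With an explicit size of 512, the helper returns 512 MB in bytes regardless of the heap size. Without one, a 1000-byte heap yields 700 bytes.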
    
    The tests for these classes have been changed to cover both the heap and the
    direct variants.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink off_heap_rebased

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #290
    
----
commit 9d711ccd1bbd581845e3b690cf1cc5d8dfcfa603
Author: Stephan Ewen <[email protected]>
Date:   2014-12-16T11:27:24Z

    [FLINK-1320] Mockup draft for off heap memory segments

commit a44309373da2449c483463cf1f093667e3ccfb79
Author: Max <[email protected]>
Date:   2014-12-18T12:27:08Z

    [FLINK-1320] rename DefaultMemoryManager to HeapMemoryManager, create
    DirectMemoryManager, add default config entry

commit 9e64df34ff6b01f3bea17e3ffc0bbafc2dc769e6
Author: Max <[email protected]>
Date:   2014-12-19T10:49:22Z

    [FLINK-1320] add compare and swapBytes methods into abstract class
    MemorySegment, implement swapBytes method differently without supplying a
    swapping buffer, make HeapMemorySegement the default in all tests

commit 3a447e2afb5644bd294874f144b708380ec1b7b0
Author: Max <[email protected]>
Date:   2014-12-19T15:00:59Z

    [FLINK-1320] set default byte encoding to big endian

commit cbf8e0057975cba6a8fa3b6c71aca612a5f8865f
Author: Max <[email protected]>
Date:   2014-12-21T22:46:07Z

    [FLINK-1320] rename DefaultMemoryManager to HeapMemoryManager

commit 4e7a5407a53ac9bf9552005b48998161f8285e25
Author: Max <[email protected]>
Date:   2014-12-22T19:33:41Z

    [FLINK-1320] configurable switch for heap and direct (off-heap) memory
    allocation

commit ff3432c61295210f1081ffb02738dfdc2b1bc2bc
Author: Max <[email protected]>
Date:   2015-01-05T13:03:22Z

    [FLINK-1320] add default parameter for local execution, rename test suite

commit 021f094124cf8a057e055dba9e728f446aa6ea54
Author: Max <[email protected]>
Date:   2015-01-06T18:31:03Z

    [FLINK-1320] add documentation and correct code formatting

commit df36f7abfcc53829ea1d97f18f6491ab200356b4
Author: Max <[email protected]>
Date:   2015-01-07T10:46:28Z

    [FLINK-1320] fix isFreed() method to return correct status

commit ca4614b25fa734fdf468b0ec26aabf29d3d890fe
Author: Max <[email protected]>
Date:   2015-01-07T10:47:25Z

    [FLINK-1320] fix boundary checks in for get and put

commit 2783a807af3f14c40bb733038688e64ab350aa0e
Author: Max <[email protected]>
Date:   2015-01-07T11:32:00Z

    [FLINK-1320] rename HeapMemorySegmentTest to MemorySegmentTest
    
    this class should test both implementations of MemorySegment

commit bdad3318a502c7d00dd56afae8273635cb96c813
Author: Max <[email protected]>
Date:   2015-01-07T14:21:08Z

    [FLINK-1320] adapt MemorySegmentTest to run tests with both heap and direct
    memory segments

commit 02c27b0e59f9b64203ae94a0c1a0b750dcc7303d
Author: Max <[email protected]>
Date:   2015-01-07T14:52:29Z

    [FLINK-1320] rename HeapMemoryManagerTest to MemoryManagerTest

commit 4d319565542973ece1a4796d14f9437ef35bb785
Author: Max <[email protected]>
Date:   2015-01-07T16:40:57Z

    [FLINK-1320] adapt MemoryManagerTest to run tests with both heap and direct
    memory managers

----


> Add an off-heap variant of the managed memory
> ---------------------------------------------
>
>                 Key: FLINK-1320
>                 URL: https://issues.apache.org/jira/browse/FLINK-1320
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>            Reporter: Stephan Ewen
>            Priority: Minor
>
> For (nearly) all memory that Flink accumulates (in the form of sort buffers, 
> hash tables, caching), we use a special way of representing data serialized 
> across a set of memory pages. The big work lies in the way the algorithms are 
> implemented to operate on pages, rather than on objects.
> The core class for the memory is the {{MemorySegment}}, which has all methods 
> to set and get primitive values efficiently. It is a somewhat simpler (and 
> faster) variant of a HeapByteBuffer.
> As such, it should be straightforward to create a version where the memory 
> segment is not backed by a heap byte[], but by memory allocated outside the 
> JVM, in a similar way as the NIO DirectByteBuffers, or the Netty direct 
> buffers do it.
> This may have multiple advantages:
>   - We reduce the size of the JVM heap (garbage collected) and the number and 
> size of long-lived objects. For large JVM sizes, this may improve 
> performance quite a bit. Ultimately, we would in many cases reduce JVM size 
> to 1/3 to 1/2 and keep the remaining memory outside the JVM.
>   - We save copies when we move memory pages to disk (spilling) or through 
> the network (shuffling / broadcasting / forward piping)
> The changes required to implement this are
>   - Add an {{UnmanagedMemorySegment}} that only stores the memory address as a 
> long, and the segment size. It is initialized from a DirectByteBuffer.
>   - Allow the MemoryManager to allocate these MemorySegments, instead of the 
> current ones.
>   - Make sure that the startup scripts pick up the mode and configure the heap 
> size and the max direct memory properly.
> Since the MemorySegment is probably the most performance critical class in 
> Flink, we must take care that we do this right. The following are critical 
> considerations:
>   - If we want both solutions (heap and off-heap) to exist side-by-side 
> (configurable), we must make the base MemorySegment abstract and implement 
> two versions (heap and off-heap).
>   - To get the best performance, we need to make sure that only one class 
> gets loaded (or at least ever used), to ensure optimal JIT de-virtualization 
> and inlining.
>   - We should carefully measure the performance of both variants. From 
> previous micro benchmarks, I remember that individual byte accesses in 
> DirectByteBuffers (off-heap) were slightly slower than on-heap, while any 
> larger accesses were equally good or slightly better.
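
The spilling advantage mentioned in the issue description can be illustrated with plain NIO. This is a sketch, not code from the pull request: SpillSketch, PAGE_SIZE and the method names are hypothetical. It writes an off-heap page to a temporary file through a FileChannel and reads it back. As far as I know, OpenJDK copies heap buffers into a temporary direct buffer before channel I/O, which is the copy a direct page avoids.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical spill/load helper for a single off-heap page. */
final class SpillSketch {
    static final int PAGE_SIZE = 4096; // assumed page size for the example

    private SpillSketch() {
    }

    /** Writes the whole page to the file, leaving the page's position untouched. */
    static void spill(ByteBuffer page, Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer view = page.duplicate(); // shares content, own position/limit
            view.clear();                       // cover the full capacity
            while (view.hasRemaining()) {
                channel.write(view);
            }
        }
    }

    /** Reads up to pageSize bytes from the file into a fresh direct buffer. */
    static ByteBuffer load(Path file, int pageSize) throws IOException {
        ByteBuffer page = ByteBuffer.allocateDirect(pageSize);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (page.hasRemaining() && channel.read(page) >= 0) {
                // keep reading until the page is full or the file ends
            }
        }
        page.flip();
        return page;
    }

    /** Stores a value in a page, spills it, loads it again, and returns the value read. */
    static int roundTrip(int value) {
        try {
            Path file = Files.createTempFile("spill-sketch", ".bin");
            try {
                ByteBuffer page = ByteBuffer.allocateDirect(PAGE_SIZE);
                page.putInt(0, value);
                spill(page, file);
                return load(file, PAGE_SIZE).getInt(0);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```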



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
