[jira] [Comment Edited] (SPARK-2468) zero-copy shuffle network communication

Reynold Xin (JIRA) Mon, 14 Jul 2014 10:56:26 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060937#comment-14060937
 ]


Reynold Xin edited comment on SPARK-2468 at 7/14/14 5:55 PM:
-------------------------------------------------------------

We do use mmap for large blocks. However, most of the shuffle blocks are small 
so a lot of blocks are not mapped. In addition, there are multiple problems 
with memory mapped files:

1. Memory mapped blocks are off-heap and are not managed by the JVM, which 
creates another memory space to tune/mange

2. Memory mapped blocks cannot be reused and are only released at GC

3. On Linux machines with Huge Pages configured (which is increasingly more 
common with large memory), the default behavior is each file will consume 2MB, 
leading to OOM very soon.

4. For large blocks that span multiple pages, it creates page faults which 
leads to unnecessary context switches

The last one is probably much less important.



was (Author: rxin):
We do use mmap for large blocks. However, most of the shuffle blocks are small 
so a lot of blocks are not mapped. In addition, there are multiple problems 
with memory mapped files:

1. Memory mapped blocks are off-heap and are not managed by the JVM, which 
creates another memory space to tune/mange

2. Memory mapped blocks cannot be reused and are only released at GC

3. On Linux machines with Huge Pages configured (which is increasingly more 
common with large memory), the default behavior is each file will consume 2MB, 
leading to OOM very soon.

4. For large blocks that span multiple pages, it creates page faults which 
leads to unnecessary context switches


> zero-copy shuffle network communication
> ---------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>
> Right now shuffle send goes through the block manager. This is inefficient 
> because it requires loading a block from disk into a kernel buffer, then into 
> a user space buffer, and then back to a kernel send buffer before it reaches 
> the NIC. It does multiple copies of the data and context switching between 
> kernel/user. It also creates unnecessary buffer in the JVM that increases GC
> Instead, we should use FileChannel.transferTo, which handles this in the 
> kernel space with zero-copy. See 
> http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty NIO.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Comment Edited] (SPARK-2468) zero-copy shuffle network communication

Reply via email to