[ 
https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061094#comment-14061094
 ] 

Mridul Muralidharan commented on SPARK-2468:
--------------------------------------------


Ah, small files - those are indeed a problem.

Btw, we do dispose of mmap'ed blocks as soon as we are done with them, so we 
don't need to wait for GC to free them. Also note that the files are closed as 
soon as they are opened and mmap'ed, so they do not count towards the open file 
count/ulimit.
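
A minimal Java sketch of the "close right after mmap" point, not Spark's actual code: the class name MmapAfterClose and the helper mapAndClose are hypothetical. It relies on the documented behaviour of FileChannel.map, where a mapping stays valid after the channel that created it is closed. Explicit disposal of the mapping is not shown, since the standard API has no public method for it.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapAfterClose {
    // Hypothetical helper: maps the whole file read-only, then closes the
    // channel before returning, so no file descriptor stays open.
    static MappedByteBuffer mapAndClose(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } // channel is closed here; the mapping remains usable
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("block", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40});

        MappedByteBuffer buf = mapAndClose(tmp);
        // Reads still work although the channel is already closed.
        System.out.println(buf.remaining());
        System.out.println(buf.get(2));
    }
}
```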

Agree on 1, 3 and 4. Some of these apply to sendfile too, btw, so they are not 
avoidable; but it is the best we have right now.
Since we use mmap'ed buffers and rarely transfer the same file again, the 
performance jump might not be the order(s) of magnitude other projects claim, 
but even a 10% (or whatever) improvement in our case would be substantial!

> zero-copy shuffle network communication
> ---------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>
> Right now shuffle send goes through the block manager. This is inefficient 
> because it requires loading a block from disk into a kernel buffer, then into 
> a user space buffer, and then back to a kernel send buffer before it reaches 
> the NIC. It does multiple copies of the data and context switching between 
> kernel/user. It also creates unnecessary buffers in the JVM that increase GC.
> Instead, we should use FileChannel.transferTo, which handles this in the 
> kernel space with zero-copy. See 
> http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty NIO.
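
For illustration, the FileChannel.transferTo approach named in the description could look roughly like the sketch below. The class ZeroCopySend and the helper sendFile are hypothetical names, not part of Spark or Netty, and the target is assumed to be a blocking channel. Whether the transfer is truly zero-copy depends on the OS and on the target channel type; for targets other than sockets or files the JDK falls back to an ordinary copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    // Hypothetical helper: sends the whole file to the target channel with
    // transferTo and returns the number of bytes transferred.
    // Assumes a blocking target channel.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;
            // transferTo may move fewer bytes than requested, so loop.
            while (pos < size) {
                long n = src.transferTo(pos, size - pos, target);
                if (n == 0) {
                    break; // no progress; the caller can compare against the file size
                }
                pos += n;
            }
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("shuffle", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "shuffle block".getBytes("UTF-8"));

        // An in-memory target stands in for a SocketChannel here.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long sent = sendFile(tmp, Channels.newChannel(out));
        System.out.println(sent + " bytes sent");
    }
}
```

In a real shuffle server the target would be a SocketChannel, which is where the kernel-level path (sendfile on Linux) can apply.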



--
This message was sent by Atlassian JIRA
(v6.2#6252)
