[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225681#comment-14225681 ]

Aaron Davidson commented on SPARK-2468:
---------------------------------------

Hey guys, I finally got a chance to run a more comprehensive set of tests with 
constrained containers. In doing so, I found a critical issue that caused us 
to allocate direct byte buffers proportional to the number of executors times 
the number of cores, rather than just proportional to the number of cores. With 
patch [#3465|https://github.com/apache/spark/pull/3465], I was able to run a 
shuffle with [~lianhuiwang]'s configuration of a 7GB container with a 6GB heap 
and 2 cores -- prior to the patch, the same run exceeded the container's limits. 

If you guys get a chance, please let me know whether this is sufficient to fix 
the issues you hit with your initial memory overhead configurations. (Note that 
while memory usage is greatly decreased, we still allocate a significant amount 
of off-heap memory, so you may need to shift some memory from the heap to the 
off-heap overhead if your off-heap allotment was previously very constrained.)
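
For concreteness, here is a sketch of how that container layout might be 
requested on YARN. The values mirror the 7GB container / 6GB heap / 2 cores 
setup above; the property name spark.yarn.executor.memoryOverhead (in MB) 
assumes a Spark 1.x-on-YARN deployment, so treat the whole snippet as 
illustrative rather than a recommendation:

{code}
# Illustrative only: 7GB YARN container = 6GB executor heap + ~1GB overhead.
# Property names assume Spark 1.x on YARN; adjust for your deployment.
spark-submit \
  --master yarn \
  --executor-memory 6g \
  --executor-cores 2 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  ...
{code}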

> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient 
> because it requires loading a block from disk into a kernel buffer, then into 
> a user-space buffer, and then back into a kernel send buffer before it 
> reaches the NIC. This makes multiple copies of the data and context-switches 
> between kernel and user space. It also creates unnecessary buffers in the 
> JVM, which increases GC pressure.
> Instead, we should use FileChannel.transferTo, which handles the transfer in 
> kernel space with zero copy. See 
> http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty. Spark already has a Netty-based 
> network module implemented (org.apache.spark.network.netty). However, it 
> lacks some functionality and is turned off by default.
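> For illustration, here is a minimal sketch of the zero-copy send path using 
> FileChannel.transferTo over a raw SocketChannel. This is a toy example, not 
> Spark's actual code; the Netty-based module achieves the same effect by 
> wrapping file segments in Netty's FileRegion abstraction.
> {code:java}
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.net.InetSocketAddress;
> import java.nio.channels.FileChannel;
> import java.nio.channels.SocketChannel;
>
> public class ZeroCopySend {
>   public static void main(String[] args) throws IOException {
>     // args: <file> <host> <port>
>     try (FileInputStream in = new FileInputStream(args[0]);
>          SocketChannel socket = SocketChannel.open(
>              new InetSocketAddress(args[1], Integer.parseInt(args[2])))) {
>       FileChannel file = in.getChannel();
>       long pos = 0;
>       long size = file.size();
>       // transferTo may send fewer bytes than requested, so loop until done.
>       // On Linux this maps to sendfile(2): bytes move from the page cache
>       // to the socket without ever entering user space or the JVM heap.
>       while (pos < size) {
>         pos += file.transferTo(pos, size - pos, socket);
>       }
>     }
>   }
> }
> {code}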


