[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201849#comment-14201849 ]

Aaron Davidson commented on SPARK-2468:
---------------------------------------

[~zzcclp] Thank you for the writeup. Is it really the case that each of your 
executors is using only 1 core for its 20GB of RAM? It seems like 5 would be 
more in line with the amount of memory you're allocating. Also, the sum of your 
storage and shuffle memory fractions exceeds 1, so if you're caching any data 
and then performing a reduction/groupBy, you could see an OOM even without this 
other issue. I would recommend keeping the shuffle fraction relatively low 
unless you have a good reason not to, as a high value can lead to instability.
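
For reference, a configuration along these lines keeps the two fractions 
summing below 1. A minimal sketch; the values are illustrative, not a 
recommendation:

    import org.apache.spark.SparkConf

    // Illustrative executor sizing: storage + shuffle fractions sum to
    // 0.7, leaving headroom for task execution and Netty's direct buffers.
    val conf = new SparkConf()
      .set("spark.executor.memory", "20g")
      .set("spark.executor.cores", "5")
      .set("spark.storage.memoryFraction", "0.5")  // default is 0.6
      .set("spark.shuffle.memoryFraction", "0.2")  // default is 0.2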

The numbers are fairly close to my expectations: I would estimate Netty 
allocating around 750MB of direct buffer space when it thinks it has 24 cores. 
With #3155 and maxUsableCores set to 1 (or 5), I hope this issue will be 
resolved.
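
For what it's worth, that estimate lines up with Netty's default 
pooled-allocator sizing. A back-of-the-envelope sketch, assuming two direct 
arenas per core and 16MB chunks (the PooledByteBufAllocator defaults), not the 
exact accounting in the patch:

    // Rough direct-buffer footprint under Netty's default sizing.
    // Assumptions: 2 direct arenas per core, one 16MB chunk per arena
    // (8KB pages << maxOrder 11 = 16MB).
    val cores = 24
    val arenasPerCore = 2
    val chunkSizeMb = 16
    val directMb = cores * arenasPerCore * chunkSizeMb
    println(s"~${directMb}MB direct buffers")  // ~768MB, i.e. roughly 750MB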

> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient 
> because it requires loading a block from disk into a kernel buffer, then into 
> a user space buffer, and then back into a kernel send buffer before it reaches 
> the NIC. This makes multiple copies of the data and incurs context switches 
> between kernel and user space. It also creates unnecessary buffers in the JVM, 
> which increases GC pressure.
> Instead, we should use FileChannel.transferTo, which handles this in the 
> kernel space with zero-copy. See 
> http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty. Spark already has a Netty-based 
> network module implemented (org.apache.spark.network.netty). However, it 
> lacks some functionality and is turned off by default. 
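>
> As a minimal sketch of what the zero-copy path looks like with plain NIO 
> (illustrative only, not the module's actual code):
>
>     import java.io.FileInputStream
>     import java.nio.channels.WritableByteChannel
>
>     // Sends a file over a channel without copying it through user space:
>     // transferTo lets the kernel move bytes from the page cache straight
>     // to the socket send buffer.
>     def sendFile(path: String, out: WritableByteChannel): Unit = {
>       val channel = new FileInputStream(path).getChannel
>       try {
>         var pos = 0L
>         val size = channel.size()
>         while (pos < size) {
>           pos += channel.transferTo(pos, size - pos, out)
>         }
>       } finally {
>         channel.close()
>       }
>     }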


