[ https://issues.apache.org/jira/browse/SPARK-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202490#comment-14202490 ]

Aaron Davidson commented on SPARK-2468:
---------------------------------------

[~lianhuiwang] Can you try again with preferDirectBufs set to true, and just 
setting maxUsableCores down to the number of cores each container actually has? 
It's possible the performance discrepancy you're seeing is simply due to heap 
byte buffers not being as fast as direct ones. You might also decrease the Java 
heap size a bit while keeping the container size the same, if _any_ direct 
memory allocation is causing the container to be killed.
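For concreteness, the tuning being suggested would look roughly like the 
following in spark-defaults.conf. This is a sketch only: the exact key names 
are an assumption (the comment's preferDirectBufs and maxUsableCores are taken 
to live under the spark.shuffle.io.* namespace of the Netty module); check the 
configuration docs for your Spark version.

{code}
# Sketch, key names assumed -- verify against your version's docs.
# Prefer direct (off-heap) byte buffers in the Netty shuffle transport:
spark.shuffle.io.preferDirectBufs   true
# Cap Netty's thread/buffer sizing at the cores the container actually has:
spark.shuffle.io.maxUsableCores     5
{code}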

[~zzcclp] Same suggestion for you about setting preferDirectBufs to true and 
setting maxUsableCores down, but I will also perform another round of 
benchmarking -- it's possible we accidentally introduced a performance 
regression in the last few patches. 

Comparing Hadoop vs Spark performance is a different matter. A few suggestions 
on your setup: You should set executor-cores to 5, so that each executor is 
actually using 5 cores instead of just 1. You're losing significant parallelism 
because of this setting, as Spark will only launch 1 task per core on an 
executor at any given time. Second, groupBy() is inefficient (its doc was 
changed recently to reflect this) and should be avoided. I would recommend 
changing your job to sort the whole RDD using something similar to 
{code}mapR.map { x => ((x._1._1, x._2._1), x) }.sortByKey(){code}, which would 
not require that all values for a single group fit in memory. This would still 
effectively group by x._1._1, but would sort within each group by x._2._1, and 
would utilize Spark's efficient sorting machinery.
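To illustrate what that transformation does, here is the same idea on plain 
Scala collections (no Spark needed); mapR below is a made-up stand-in with the 
same nested-tuple shape. Sorting by the composite key (x._1._1, x._2._1) orders 
records by group first, then by value within each group:

```scala
object CompositeKeySort {
  // Stand-in for the RDD's records: ((groupKey, _), (value, _))
  val mapR = Seq(
    (("b", 0), (2, 0)),
    (("a", 0), (3, 0)),
    (("a", 0), (1, 0)),
    (("b", 0), (1, 0))
  )

  // Key each record by (group, value), then sort by that composite key.
  // On an RDD this would be .sortByKey(); sortBy(_._1) is the local analogue.
  val sorted = mapR.map { x => ((x._1._1, x._2._1), x) }.sortBy(_._1)

  def main(args: Array[String]): Unit = sorted.foreach(println)
}
```

The resulting order is ("a", 1), ("a", 3), ("b", 1), ("b", 2): grouped by the 
first component, sorted by the second, without ever materializing a whole 
group in memory at once.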

> Netty-based block server / client module
> ----------------------------------------
>
>                 Key: SPARK-2468
>                 URL: https://issues.apache.org/jira/browse/SPARK-2468
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Critical
>             Fix For: 1.2.0
>
>
> Right now shuffle send goes through the block manager. This is inefficient 
> because it requires loading a block from disk into a kernel buffer, then into 
> a user space buffer, and then back to a kernel send buffer before it reaches 
> the NIC. This incurs multiple copies of the data and context switches between 
> kernel and user space. It also creates unnecessary buffers in the JVM that 
> increase GC pressure.
> Instead, we should use FileChannel.transferTo, which handles this in the 
> kernel space with zero-copy. See 
> http://www.ibm.com/developerworks/library/j-zerocopy/
> One potential solution is to use Netty.  Spark already has a Netty based 
> network module implemented (org.apache.spark.network.netty). However, it 
> lacks some functionality and is turned off by default. 
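
The transferTo path the description refers to can be sketched as follows 
(a minimal illustration of the JDK API, not Spark's actual server code; the 
object and method names here are made up for the example):

```scala
import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

object ZeroCopySketch {
  // Copy src to dst via FileChannel.transferTo: the kernel moves the bytes
  // directly, avoiding the read-into-user-space / write-back round trip.
  def transferFile(src: String, dst: String): Long = {
    val in  = FileChannel.open(Paths.get(src), StandardOpenOption.READ)
    val out = FileChannel.open(Paths.get(dst),
      StandardOpenOption.CREATE, StandardOpenOption.WRITE)
    try {
      val size = in.size()
      var pos  = 0L
      // transferTo may transfer fewer bytes than requested, so loop.
      while (pos < size) pos += in.transferTo(pos, size - pos, out)
      pos // total bytes transferred
    } finally {
      in.close(); out.close()
    }
  }
}
```

In a block server the destination would be the socket channel to the client 
rather than another file, which is where the zero-copy benefit shows up.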



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
