[ https://issues.apache.org/jira/browse/SPARK-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241206#comment-14241206 ]

Zhang, Liye commented on SPARK-4740:
------------------------------------

Hi [~adav], [~rxin], I ran the test with the latest master branch today, in 
which rxin's patch is merged. 

On my 4-node cluster (48 cores per node), I set *spark.local.dir* to a single 
tmpfs (ramdisk) directory; the ramdisk size is 136GB, large enough for the 
shuffle (total shuffle write 284GB, total shuffle read 213GB), and 
*spark.executor.memory* is set to 48GB. This setup eliminates the disk I/O 
effect. Still with the 400GB data set, the test result shows Netty is better 
than NIO (reduce time: *Netty 24 mins* vs *NIO 26 mins*).
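
For reference, a minimal sketch of how this ramdisk run might be configured; 
the /mnt/ramdisk mount point is hypothetical, substitute the actual tmpfs path:
{code:scala}
import org.apache.spark.SparkConf

// Sketch of the tmpfs run described above (paths are illustrative only).
val conf = new SparkConf()
  .set("spark.local.dir", "/mnt/ramdisk/spark")  // single tmpfs (ramdisk) dir, 136GB
  .set("spark.executor.memory", "48g")           // executor memory used in this run
{code}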

Also, I retested with 8 HDDs, keeping *spark.executor.memory* at 48GB and 
setting *spark.local.dir* to 8 HDD directories. The result is about the same as 
before, that is, NIO outperforms Netty (reduce time: *Netty 32 mins* vs 
*NIO 25 mins*). And in the Netty test the imbalance still exists: the best 
executor finished 308 tasks, while the worst executor finished only 222 tasks.
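
The 8-HDD run differs only in *spark.local.dir*, which accepts a comma-separated 
list of directories. Continuing the conf sketch above (the /mnt/hddN paths are 
hypothetical):
{code:scala}
// Sketch: spread shuffle files across 8 HDD-backed directories (paths are illustrative).
conf.set("spark.local.dir",
  (1 to 8).map(i => s"/mnt/hdd$i/spark").mkString(","))
{code}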

It seems NIO is not affected by whether the local dirs are on HDD or ramdisk, 
while Netty is more sensitive to HDD.

So far, it seems we can narrow the problem down to the different behavior of 
Netty and NIO on disk operations.
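
For anyone reproducing the comparison: the shuffle transfer service under test 
is selected with *spark.shuffle.blockTransferService* (netty is the default on 
current master, nio is the alternative). Continuing the same conf sketch:
{code:scala}
// Switch between the two shuffle transfer services being compared.
conf.set("spark.shuffle.blockTransferService", "nio")  // or "netty" (the default)
{code}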

> Netty's network throughput is about 1/2 of NIO's in spark-perf sortByKey
> ------------------------------------------------------------------------
>
>                 Key: SPARK-4740
>                 URL: https://issues.apache.org/jira/browse/SPARK-4740
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Zhang, Liye
>            Assignee: Reynold Xin
>         Attachments: (rxin patch better executor)TestRunner  sort-by-key - 
> Thread dump for executor 3_files.zip, (rxin patch normal executor)TestRunner  
> sort-by-key - Thread dump for executor 0 _files.zip, Spark-perf Test Report 
> 16 Cores per Executor.pdf, Spark-perf Test Report.pdf, TestRunner  
> sort-by-key - Thread dump for executor 1_files (Netty-48 Cores per node).zip, 
> TestRunner  sort-by-key - Thread dump for executor 1_files (Nio-48 cores per 
> node).zip, rxin_patch-on_4_node_cluster_48CoresPerNode(Unbalance).7z
>
>
> When testing current spark master (1.3.0-snapshot) with spark-perf 
> (sort-by-key, aggregate-by-key, etc.), the Netty-based shuffle transfer service 
> takes much longer than the NIO-based one. The network throughput of Netty is 
> only about half that of NIO. 
> We tested in standalone mode, and the data set used for the test is 20 billion 
> records, about 400GB in total. The spark-perf test runs on a 4-node cluster 
> with 10G NICs, 48 CPU cores per node, and 64GB of memory per executor. The 
> number of reduce tasks is set to 1000. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
