[ 
https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096608#comment-14096608
 ] 

Saisai Shao edited comment on SPARK-2926 at 8/14/14 7:12 AM:
-------------------------------------------------------------

Hi Matei, 

I just uploaded a Spark shuffle performance test report. For this report, I 
chose three workloads from SparkPerf (sort-by-key, aggregate-by-key and 
group-by-key) to test the three current shuffle implementations: hash-based 
shuffle; sort-based shuffle with HashShuffleReader; and sort-based shuffle 
with the sort-merge shuffle reader (our prototype). In general, our prototype 
outperforms the other two implementations on sort-by-key, while on the other 
two workloads the performance of all three is almost the same.

Would you mind taking a look? Any comments would be greatly appreciated. 
Thanks a lot.


was (Author: jerryshao):
Hi Matei, 

I just uploaded a Spark shuffle performance test report. For this report, I 
chose three workloads from SparkPerf (sort-by-key, aggregate-by-key and 
group-by-key) to test the three current shuffle implementations: hash-based 
shuffle; sort-based shuffle with HashShuffleReader; and sort-based shuffle 
with the sort-merge shuffle reader (our prototype).

Would you mind taking a look? Any comments would be greatly appreciated. 
Thanks a lot.

> Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
> ------------------------------------------------------------------
>
>                 Key: SPARK-2926
>                 URL: https://issues.apache.org/jira/browse/SPARK-2926
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.1.0
>            Reporter: Saisai Shao
>         Attachments: SortBasedShuffleRead.pdf, Spark Shuffle Test Report.pdf
>
>
> Spark has already integrated sort-based shuffle write, which greatly 
> improves I/O performance and reduces memory consumption when the number of 
> reducers is very large. On the reducer side, however, it still uses the 
> hash-based shuffle reader, which in some situations ignores the ordering of 
> the map output data.
> Here we propose an MR-style sort-merge shuffle reader for sort-based 
> shuffle to further improve its performance.
> Work-in-progress code and a performance test report will be posted later, 
> once some unit test bugs are fixed.
> Any comments would be greatly appreciated. 
> Thanks a lot.
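
To illustrate the idea in the description above: when every map output 
destined for a reducer is already sorted by key, the reducer can merge those 
sorted runs instead of hashing or re-sorting all records. What follows is a 
minimal sketch in Python using the standard library's `heapq.merge`, not the 
code from the prototype. The function name `merge_sorted_map_outputs` and the 
sample data are hypothetical.

```python
import heapq
from operator import itemgetter


def merge_sorted_map_outputs(map_outputs):
    """Lazily merge per-mapper runs of (key, value) pairs sorted by key.

    Hypothetical helper for illustration only. It assumes every run is
    already sorted by key, as a sort-based shuffle writer would produce.
    """
    # heapq.merge performs a k-way merge and holds only one pending
    # record per run in memory at a time.
    return heapq.merge(*map_outputs, key=itemgetter(0))


# Three made-up map outputs for one reduce partition, each sorted by key.
run_a = [(1, "a1"), (4, "a4"), (7, "a7")]
run_b = [(2, "b2"), (4, "b4")]
run_c = [(3, "c3"), (9, "c9")]

merged = list(merge_sorted_map_outputs([run_a, run_b, run_c]))
merged_keys = [key for key, _ in merged]
```

Because the merged stream is already ordered by key, a sort-by-key style 
workload would need no additional sort on the reduce side in this sketch.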



--
This message was sent by Atlassian JIRA
(v6.2#6252)
