[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752356#comment-16752356 ]

Thomas Graves commented on SPARK-22229:
---------------------------------------

This is interesting; a few questions:
 * I'm assuming all the data has to fit into memory for this to work? Or is it 
somehow handling spill files by pulling them into memory and then transferring? 
Does it fail if it's not all in memory?
 * The benchmark data sizes I saw all appeared to fit into memory; is that 
right?
 * Did you performance-test with both RDMA over Ethernet and InfiniBand?
 * To clarify the above question: is the implementation in the 
Mellanox/SparkRDMA GitHub repository stable, or not yet complete?
 * The SPIP mentions: "MapStatuses are redundant – no need for those extra 
transfers that take precious seconds in many jobs." How does the reducer know 
where to fetch map output from, then? It still somehow needs to know a host, 
and perhaps a memory location, unless the host it's fetching from just knows 
based on mapId and reduceId.
 * I assume this is only supported with the external shuffle service disabled 
(which probably doesn't exist here anyway, since you have a different shuffle 
manager) and no dynamic allocation?
 * Depending on the answers above: if it's all in memory, I assume that if an 
executor goes down, its tasks have to be rerun, since the data isn't on disk 
for the external shuffle service to still serve up.
 * If someone were to try this out: the SPIP says "SparkRDMA manages its own 
memory, off-heap". I take that to mean that, in addition to Spark's normal 
memory usage, you need to give the Spark executor enough off-heap memory to 
account for whatever size you are shuffling?
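
On the off-heap memory question, a trial configuration might look roughly like the sketch below. The property names and values here are assumptions on my part (the shuffle-manager class follows the Mellanox/SparkRDMA README as I understand it, and the overhead sizing is a guess, not a documented requirement); check the project README for the authoritative settings.

```properties
# Hypothetical spark-defaults.conf sketch -- values are assumptions,
# see the Mellanox/SparkRDMA README for authoritative settings.
spark.driver.extraClassPath      /path/to/spark-rdma.jar
spark.executor.extraClassPath    /path/to/spark-rdma.jar
spark.shuffle.manager            org.apache.spark.shuffle.rdma.RdmaShuffleManager

# If SparkRDMA holds shuffle data in its own off-heap buffers, the executor
# container presumably needs headroom beyond the JVM heap for that data:
spark.executor.memory            8g
spark.executor.memoryOverhead    8g
```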
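
For context on the MapStatus question above, here is a toy sketch of the metadata a reducer normally consults in vanilla Spark: the driver-side tracker holds one MapStatus per map task, recording which executor wrote the output and the per-reducer block sizes. The class and method names below are simplified stand-ins, not Spark's actual API; the point is only that, without this metadata, a reducer has no way to learn which host holds its blocks.

```python
# Simplified model (NOT Spark's actual implementation) of how a reducer
# normally locates map output via MapStatus-style metadata.

class MapStatus:
    def __init__(self, host, sizes_by_reducer):
        self.host = host                          # executor that wrote the map output
        self.sizes_by_reducer = sizes_by_reducer  # reduceId -> block size in bytes

class MapOutputTracker:
    def __init__(self):
        self.statuses = {}                        # shuffleId -> {mapId: MapStatus}

    def register(self, shuffle_id, map_id, status):
        self.statuses.setdefault(shuffle_id, {})[map_id] = status

    def locations_for_reducer(self, shuffle_id, reduce_id):
        """Return (host, mapId, size) for every non-empty block the reducer needs."""
        return [(s.host, map_id, s.sizes_by_reducer.get(reduce_id, 0))
                for map_id, s in self.statuses[shuffle_id].items()
                if s.sizes_by_reducer.get(reduce_id, 0) > 0]

tracker = MapOutputTracker()
tracker.register(0, 0, MapStatus("host-a", {0: 128, 1: 0}))
tracker.register(0, 1, MapStatus("host-b", {0: 64, 1: 256}))

# Reducer 0 must fetch from both hosts; reducer 1 only from host-b.
print(tracker.locations_for_reducer(0, 0))  # [('host-a', 0, 128), ('host-b', 1, 64)]
print(tracker.locations_for_reducer(0, 1))  # [('host-b', 1, 256)]
```

If the SPIP drops these transfers, something equivalent has to be answered another way, which is what the question is probing.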

> SPIP: RDMA Accelerated Shuffle Engine
> -------------------------------------
>
>                 Key: SPARK-22229
>                 URL: https://issues.apache.org/jira/browse/SPARK-22229
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.0, 2.4.0, 3.0.0
>            Reporter: Yuval Degani
>            Priority: Major
>         Attachments: 
> SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits 
> to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin 
> open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle significantly improves CPU utilization and reduces 
> I/O processing overhead by bypassing the kernel and networking stack, as 
> well as avoiding memory copies entirely. Those valuable CPU cycles are then 
> consumed directly by the actual Spark workloads, helping to reduce the job 
> runtime significantly.
> This performance gain is demonstrated both with the industry-standard 
> HiBench TeraSort benchmark (a 1.5x speedup in sorting) and with 
> shuffle-intensive customer applications.
> SparkRDMA will be presented at Spark Summit 2017 in Dublin 
> ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see attached proposal document for more information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
