[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

Yuval Degani (JIRA) Thu, 12 Oct 2017 11:10:23 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202368#comment-16202368
 ]


Yuval Degani commented on SPARK-22229:
--------------------------------------

Good point [~jerryshao].
Regarding testing on machines without RDMA support:
For this exact reason, and also for cases where RDMA is used on a mixed 
cluster, where you may have both RDMA capable and non-RDMA capable machines, 
there is a software solution that is already part of the Linux kernel (version 
4.8+): "Soft-RoCE" aka "rxe".
Here are some links with more information:
https://elixir.free-electrons.com/linux/v4.8/source/drivers/infiniband/sw/rxe
https://community.mellanox.com/docs/DOC-2184
https://github.com/SoftRoCE/rxe-dev
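
As a rough illustration only (this is not code from SparkRDMA), a test harness on a mixed cluster could check whether a machine exposes any RDMA device, hardware or Soft-RoCE, before enabling RDMA-specific tests. The sketch below assumes that RDMA devices, including rxe ones, are listed under /sys/class/infiniband on Linux. The function names are hypothetical.

```python
import os


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the names of RDMA devices registered with the kernel.

    Assumption: hardware RDMA NICs and Soft-RoCE (rxe) devices both
    show up as entries under /sys/class/infiniband.
    """
    try:
        return sorted(os.listdir(sysfs_root))
    except OSError:
        # The directory is absent when no RDMA driver is loaded.
        return []


def rdma_available(sysfs_root="/sys/class/infiniband"):
    """True if at least one RDMA device (hardware or rxe) is present."""
    return len(list_rdma_devices(sysfs_root)) > 0


if __name__ == "__main__":
    print("RDMA devices:", list_rdma_devices())
    print("RDMA available:", rdma_available())
```

A test suite could use a check like this to skip RDMA tests on machines that have neither an RDMA NIC nor a configured rxe device.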

Regarding your concern about maintaining the code:
I don't think that limited familiarity with a promising new feature is a good 
enough reason to avoid it. If every new feature were treated this way, 
new technologies would never be introduced to Spark.
For what it's worth, this is a project we take very seriously, and will gladly 
commit to maintaining and supporting it.

> SPIP: RDMA Accelerated Shuffle Engine
> -------------------------------------
>
>                 Key: SPARK-22229
>                 URL: https://issues.apache.org/jira/browse/SPARK-22229
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Yuval Degani
>         Attachments: 
> SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits 
> to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin 
> open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O 
> processing overhead by bypassing the kernel and networking stack as well as 
> avoiding memory copies entirely. Those valuable CPU cycles are then consumed 
> directly by the actual Spark workloads, and help reduce the job runtime 
> significantly. 
> This performance gain is demonstrated with both industry standard HiBench 
> TeraSort (shows 1.5x speedup in sorting) as well as shuffle intensive 
> customer applications. 
> SparkRDMA will be presented at Spark Summit 2017 in Dublin 
> ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see attached proposal document for more information.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
