[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202368#comment-16202368 ]
Yuval Degani commented on SPARK-22229:
--------------------------------------

Good point, [~jerryshao].

Regarding testing on machines without RDMA support:
For this exact reason, and also for mixed clusters that contain both RDMA-capable and non-RDMA-capable machines, there is a software solution that is already part of the Linux kernel (version 4.8+): "Soft-RoCE", aka "rxe".
Here are some links with more information:
https://elixir.free-electrons.com/linux/v4.8/source/drivers/infiniband/sw/rxe
https://community.mellanox.com/docs/DOC-2184
https://github.com/SoftRoCE/rxe-dev

Regarding your concern about maintaining the code:
I don't think that limited familiarity with a promising new feature is a good enough reason to avoid it. If every new feature were treated this way, new technologies would never be introduced to Spark.
For what it's worth, this is a project we take very seriously, and we will gladly commit to maintaining and supporting it.

> SPIP: RDMA Accelerated Shuffle Engine
> -------------------------------------
>
>                 Key: SPARK-22229
>                 URL: https://issues.apache.org/jira/browse/SPARK-22229
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Yuval Degani
>         Attachments: SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits
> to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin
> open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O
> processing overhead by bypassing the kernel and networking stack as well as
> avoiding memory copies entirely. Those valuable CPU cycles are then consumed
> directly by the actual Spark workloads, and help reduce the job runtime
> significantly.
> This performance gain is demonstrated with both the industry-standard HiBench
> TeraSort (1.5x speedup in sorting) and shuffle-intensive
> customer applications.
> SparkRDMA will be presented at Spark Summit 2017 in Dublin
> ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see the attached proposal document for more information.