[jira] [Created] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-09 Thread Yuval Degani (JIRA)
Yuval Degani created SPARK-22229:


 Summary: SPIP: RDMA Accelerated Shuffle Engine
 Key: SPARK-22229
 URL: https://issues.apache.org/jira/browse/SPARK-22229
 Project: Spark
 Issue Type: Improvement
 Components: Spark Core
 Affects Versions: 2.3.0
 Reporter: Yuval Degani


An RDMA-accelerated shuffle engine can provide enormous performance benefits to
shuffle-intensive Spark jobs, as demonstrated by the open-source “SparkRDMA”
plugin ([https://github.com/Mellanox/SparkRDMA]).
Using RDMA for shuffle improves CPU utilization significantly and reduces I/O
processing overhead by bypassing the kernel and networking stack and by
avoiding memory copies entirely. The CPU cycles saved are then available to
the actual Spark workloads, which helps reduce job runtime significantly.
This performance gain has been demonstrated with both the industry-standard
HiBench TeraSort benchmark (a 1.5x speedup in sorting) and shuffle-intensive
customer applications.
SparkRDMA will be presented at Spark Summit 2017 in Dublin
([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-09 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuval Degani updated SPARK-22229:
-
Description: 
An RDMA-accelerated shuffle engine can provide enormous performance benefits to
shuffle-intensive Spark jobs, as demonstrated by the open-source “SparkRDMA”
plugin ([https://github.com/Mellanox/SparkRDMA]).
Using RDMA for shuffle improves CPU utilization significantly and reduces I/O
processing overhead by bypassing the kernel and networking stack and by
avoiding memory copies entirely. The CPU cycles saved are then available to
the actual Spark workloads, which helps reduce job runtime significantly.
This performance gain has been demonstrated with both the industry-standard
HiBench TeraSort benchmark (a 1.5x speedup in sorting) and shuffle-intensive
customer applications.
SparkRDMA will be presented at Spark Summit 2017 in Dublin
([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).

Please see attached proposal document for more information.





[jira] [Updated] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-09 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuval Degani updated SPARK-22229:
-
Attachment: SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf




[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-10 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199052#comment-16199052 ]

Yuval Degani commented on SPARK-22229:
--

[~srowen], [~viirya], thanks for your responses.

Regarding whether RDMA requires specialized hardware:
RDMA is considered a commodity these days. Most network cards rated at 10Gb/s
and above support it, and RDMA-capable NICs are sold by many vendors:
Mellanox, Intel, Broadcom, Chelsio, Cavium, HP, Dell, Emulex and more. As a
matter of fact, most people are not even aware that their existing setups
already support RDMA, and this is where we come in and try to make this
technology accessible and seamless.
Cloud provider support is also growing fast: Microsoft Azure A and H nodes
have supported RDMA for a while now.

Regarding the pluggable mechanism:
I think that we, as Spark advocates and enthusiasts, would like to keep Spark
a framework that shows uncontested performance.
We see lower-level integration reaching almost every mainstream framework,
most recently with GPUs and ASICs, and RDMA is now taking its place as well.
RDMA is already supported natively in today's most popular distributed ML
platforms (TensorFlow, Caffe2 and CNTK) and is being driven into others as
well.

I think that in order for Spark to keep up with today's performance challenges,
we must allow some lower-level integration, especially where mature and proven
technologies such as RDMA are concerned.




[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-10 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16199293#comment-16199293 ]

Yuval Degani commented on SPARK-22229:
--

We have already published an open-source package that implements an RDMA
shuffle engine: it is available at https://github.com/Mellanox/SparkRDMA.
We have been working on this project for about a year now, and have a growing
list of enterprises that use it in production, or intend to do so after
pre-production testing in the very near future.

I think that whether to introduce it into Spark or leave it as an external
plugin is something of a chicken-and-egg question.
Not having it available in mainstream Spark is currently what is blocking
widespread adoption.

I limited the goals for this SPIP to the shuffle engine, since I didn't want to
impose a cross-component change.
Actually, it probably makes more sense to integrate RDMA a little lower in the
stack by offering an alternative to the "NettyBlockTransferService", which
implements the "BlockTransferService" interface. However, this interface is not
pluggable in today's Spark code.
By implementing an "RdmaBlockTransferService", we can instantly allow RDMA
transfers across the board: for broadcasts, remote RDDs, task results and, of
course, shuffles.
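
To make the pluggability point concrete, here is a minimal, purely illustrative
Python sketch (not Spark code) of choosing a block transfer implementation from
configuration. The three class names mirror the ones mentioned above; the
{{spark.blockTransferService}} key, the registry and the {{fetch_block}} method
are hypothetical, since today's Spark offers no such setting.

```python
from abc import ABC, abstractmethod


class BlockTransferService(ABC):
    """Illustrative stand-in for Spark's BlockTransferService interface."""

    @abstractmethod
    def fetch_block(self, host: str, block_id: str) -> str:
        """Return a description of how the block would be fetched."""


class NettyBlockTransferService(BlockTransferService):
    def fetch_block(self, host: str, block_id: str) -> str:
        return f"TCP fetch of {block_id} from {host}"


class RdmaBlockTransferService(BlockTransferService):
    def fetch_block(self, host: str, block_id: str) -> str:
        return f"RDMA read of {block_id} from {host}"


# Hypothetical registry: maps a config value to an implementation class.
SERVICES = {
    "netty": NettyBlockTransferService,
    "rdma": RdmaBlockTransferService,
}


def create_transfer_service(conf: dict) -> BlockTransferService:
    # "spark.blockTransferService" is an invented key used only in this sketch.
    name = conf.get("spark.blockTransferService", "netty")
    return SERVICES[name]()


if __name__ == "__main__":
    service = create_transfer_service({"spark.blockTransferService": "rdma"})
    # Every kind of block (broadcast, RDD, task result, shuffle) would go
    # through the same call once the service itself is swappable.
    for block_id in ["broadcast_0", "rdd_1_0", "taskresult_7", "shuffle_0_1_2"]:
        print(service.fetch_block("executor-1", block_id))
```

The point of the sketch is only that a single selection step at this layer
covers every block type, whereas a ShuffleManager replacement covers shuffle
blocks alone.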

My plan is to have those RDMA capabilities up and running for shuffle, and let
that be the first step in introducing RDMA to Spark.
There's a lot more that RDMA can do in the context of Spark besides what I
have already mentioned: RPC messaging over RDMA, GPUDirect (remote access to
GPU memory on NVIDIA hardware, which is crucial for ML applications and the
main driver for implementing RDMA in TensorFlow and Caffe2), and more.




[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-11 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200664#comment-16200664 ]

Yuval Degani commented on SPARK-22229:
--

Yes, transitioning from a ShuffleManager to a BlockTransferService will allow
significant reuse of the original code, as the code is well organized in
self-contained facilities.
All RDMA client/server facilities can be reused as they are.
Management code will have to be moved and adjusted, but in return this may
allow reuse of other Spark facilities such as BlockStoreShuffleReader and
ShuffleBlockFetcherIterator.




[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-12 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202368#comment-16202368 ]

Yuval Degani commented on SPARK-22229:
--

Good point, [~jerryshao].
Regarding testing on machines without RDMA support:
For this exact reason, and also for cases where RDMA is used on a mixed
cluster with both RDMA-capable and non-RDMA-capable machines, there is a
software solution that is already part of the Linux kernel (version 4.8+):
"Soft-RoCE", aka "rxe".
Here are some links with more information:
https://elixir.free-electrons.com/linux/v4.8/source/drivers/infiniband/sw/rxe
https://community.mellanox.com/docs/DOC-2184
https://github.com/SoftRoCE/rxe-dev

Regarding your concern about maintaining the code:
I don't think that limited familiarity with a promising new feature is a good
enough reason to avoid it. If every new feature were treated this way, new
technologies would never be introduced to Spark.
For what it's worth, this is a project we take very seriously, and we will
gladly commit to maintaining and supporting it.




[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-13 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204041#comment-16204041 ]

Yuval Degani commented on SPARK-22229:
--

[~jerryshao], thanks for clarifying your point.
I agree that RDMA adoption in big data is not yet as widespread as it could
be. I think that's because RDMA is not supported by either MapReduce or Spark,
which basically control the big data world.

On the other hand, RDMA is very popular in distributed machine learning
platforms, which were introduced much later than Spark and MapReduce.
Spark later caught up by allowing integration with GPU resources. I think that
once RDMA is introduced into Spark, its adoption in big data will grow, as
happened with TensorFlow and Caffe2.

Moreover, RDMA has been discussed in the context of Spark many times before,
including at previous Spark Summits:
https://spark-summit.org/2017/events/running-apache-spark-on-a-high-performance-cluster-using-rdma-and-nvme-flash/
https://spark-summit.org/east-2017/events/bringing-hpc-algorithms-to-big-data-platforms/
https://spark-summit.org/2017/events/tensorflow-on-spark-scalable-tensorflow-learning-on-spark-clusters/

I think there's a lot of interest in RDMA around Spark, and what I propose here
is a practical solution for introducing RDMA capabilities.







[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-19 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211038#comment-16211038 ]

Yuval Degani commented on SPARK-22229:
--

[~rvesse], thanks for taking the time to review.

Regarding performance testing: we use a suite of common benchmarks in our
day-to-day work (HiBench, TPC-DS, etc.), as well as several customer
applications.
We see runtime speedups of anywhere between 5% and 120%, depending on the type
and size of the workload.
In terms of scale, we test multiple variations in the range of 2 to 128
physical machines, with different link speeds (10-100Gbps).

Regarding compression, we have had the same experience with setting
{{spark.shuffle.compress=false}}. It causes TCP/IP to perform better in most
cases, and since the shuffle size is significantly larger, RDMA shows an even
bigger advantage over TCP/IP in that case.
For example, in TeraSort, when comparing TCP/IP to RDMA with
{{spark.shuffle.compress=false}}, we see about a 45% speedup in total
runtime. Running the same test with {{spark.shuffle.compress=true}} yields
around a 20% speedup for RDMA, as the shuffle size is significantly smaller.
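
As a reading aid for the percentages above, here is a small Python sketch that
converts a quoted speedup into an RDMA runtime. The thread does not say which
definition of "speedup" is used, so both common readings are shown; the
100-second baseline is a made-up value, not a measurement.

```python
def runtime_if_reduction(baseline_s: float, speedup_pct: float) -> float:
    """Reading 1: an 'N% speedup' means the runtime shrinks by N%."""
    return baseline_s * (1.0 - speedup_pct / 100.0)


def runtime_if_ratio(baseline_s: float, speedup_pct: float) -> float:
    """Reading 2: an 'N% speedup' means the job runs (1 + N/100) times faster."""
    return baseline_s / (1.0 + speedup_pct / 100.0)


if __name__ == "__main__":
    baseline_s = 100.0  # hypothetical TCP/IP runtime in seconds
    for label, pct in [("spark.shuffle.compress=false", 45.0),
                       ("spark.shuffle.compress=true", 20.0)]:
        print(label,
              "| runtime cut by N%:", round(runtime_if_reduction(baseline_s, pct), 1), "s",
              "| (1 + N/100)x faster:", round(runtime_if_ratio(baseline_s, pct), 1), "s")
```

The two readings diverge as the percentage grows, and a figure above 100% (such
as the 120% upper bound quoted above) only makes sense under the second one.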

Regarding licensing of RDMA dependencies, I'll try to drill down properly into
the issues raised.
In general, {{librdmacm}} is part of the Linux source code, and is not
statically linked with {{libdisni}}. I'm not a licensing expert in any way, but
I presume that other Apache projects depend on Linux libraries in various
capacities.
There's at least one Apache project that I'm aware of that depends on
{{librdmacm}}, [Apache Qpid|http://qpid.apache.org/], so I think this
constitutes a precedent for having a dependency on {{librdmacm}}.





[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2018-01-11 Thread Yuval Degani (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322747#comment-16322747 ]

Yuval Degani commented on SPARK-22229:
--

[~byronyi], thank you for kicking off the discussion again.
I affirm that libdisni does not introduce non-GPL code, as it dynamically links
to {{librdmacm}}, as Bairen said.

I haven't submitted a PR yet, since I wanted to have some support for the
direction we should take.
My suggestion is to integrate [SparkRDMA|https://github.com/Mellanox/SparkRDMA]
into upstream Spark as a non-default option for the ShuffleManager. We can
later introduce a method to automatically detect support for RDMA.
We are now working on a new version of SparkRDMA that introduces further
significant performance speedups and also improves stability and scalability.
We expect to release this version on GitHub as GA by the end of Q1 2018.
I think that will be an ideal milestone for integrating SparkRDMA into Spark
and shifting the dev work to the Spark main tree.
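
One possible shape for the automatic detection mentioned above, as a hedged
Python sketch rather than a proposal for the actual implementation: on Linux,
RDMA devices registered with the kernel (including Soft-RoCE ones) appear as
entries under {{/sys/class/infiniband}}. The RDMA class name below is the one
published by the SparkRDMA plugin and is an assumption for any upstream
integration; the helper names are invented for illustration.

```python
import os

# Spark's default sort-based shuffle manager.
SORT_SHUFFLE_MANAGER = "org.apache.spark.shuffle.sort.SortShuffleManager"
# Class name as published by the SparkRDMA plugin; an assumption for upstream use.
RDMA_SHUFFLE_MANAGER = "org.apache.spark.shuffle.rdma.RdmaShuffleManager"


def list_rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list:
    """Return the names of RDMA devices registered with the kernel, if any."""
    try:
        return sorted(os.listdir(sysfs_root))
    except OSError:
        # The directory is absent when no RDMA driver is loaded or on non-Linux hosts.
        return []


def pick_shuffle_manager(sysfs_root: str = "/sys/class/infiniband") -> str:
    """Fall back to the default manager unless at least one RDMA device is visible."""
    if list_rdma_devices(sysfs_root):
        return RDMA_SHUFFLE_MANAGER
    return SORT_SHUFFLE_MANAGER


if __name__ == "__main__":
    print(f"spark.shuffle.manager={pick_shuffle_manager()}")
```

A visible device does not guarantee an active port or a reachable peer, so a
real check would also need to look at port state and would have to agree
across every node in the cluster.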




[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2019-01-25 Thread Yuval Degani (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752451#comment-16752451 ]

Yuval Degani commented on SPARK-22229:
--

Great questions, [~tgraves].
 * SparkRDMA (starting with version 2.0) supports ODP (On-Demand Paging) for
RDMA buffers, meaning that it can handle memory buffers that are not
necessarily pinned to physical memory. This allows SparkRDMA buffers and mapped
shuffle files to be swapped out and thus occupy more space than can fit in
memory. Further reading:
[https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x]
 * Jobs will, of course, perform better if they can fit in memory. This is
true for SparkRDMA and for Spark in general. [~prudenko] can also share some
results with memory over-subscription that show further value for SparkRDMA.
 * SparkRDMA works seamlessly on both InfiniBand and Ethernet fabrics.
InfiniBand will provide better numbers than Ethernet. One example is our joint
work with Microsoft Azure on their InfiniBand HPC clusters:
[https://databricks.com/session/accelerated-spark-on-azure-seamless-and-scalable-hardware-offloads-in-the-cloud]
 * SparkRDMA is considered GA and production-ready. It has been under
continuous development since 2016, while being integrated with various
customers with a variety of workload patterns and sizes.
 * Re redundancy of MapStatuses: SparkRDMA offers an alternative protocol for
obtaining MapStatuses and translating them into remote memory addresses, which
itself uses RDMA. On the driver, SparkRDMA collects an RDMA-able table of
remote memory addresses, one per mapper (each mapper holds another table
mapping reduceIds to the memory addresses that contain the shuffle data for
that reduceId). SparkRDMA uses RDMA-Read to obtain information from the tables,
which removes significant overhead from the driver and makes it less of a
bottleneck. It also reduces overhead on executors, as they too use RDMA-Read to
obtain translations instead of costly RPCs. The SparkRDMA translation protocol
is fully compliant with Spark's recovery mechanisms for crashed
tasks/executors.
 * True, SparkRDMA does not support the external shuffle service at this time,
although this is planned for the next version. And yes, that means that
dynamic allocation is not yet supported either.
 * If an executor crashes, the files still remain on the disk, though they
will lose their RDMA mapping. SparkRDMA does not recover them as of now, but
rather requires the map tasks to be rerun. I do believe, however, that this is
also the case for the traditional shuffle engine in Spark (unless it was
changed recently).
 * Re off-heap memory: yes, the user will be required to provide an adequate
amount of memory to the JVM so that it can contain the memory that's needed. As
far as I have seen so far, this usually requires changes to YARN configs only
and not to Spark itself. [~prudenko], please correct me if I'm wrong.
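
The two-level translation described in the MapStatuses bullet can be pictured
with the following Python sketch. It is a toy model under stated assumptions,
not SparkRDMA's data layout: the class and field names are hypothetical, and
plain dictionary lookups stand in for the one-sided RDMA-Read operations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RemoteBuffer:
    """Hypothetical descriptor of a registered memory region on some host."""
    host: str
    address: int
    length: int


@dataclass
class MapperTable:
    """Held by each mapper: reduceId -> location of that reducer's shuffle data."""
    entries: dict = field(default_factory=dict)


@dataclass
class DriverTable:
    """Held by the driver: mapId -> location of that mapper's own table."""
    entries: dict = field(default_factory=dict)


def locate_shuffle_data(driver, remote_memory, map_id, reduce_id):
    """Resolve (map_id, reduce_id) to a data location using two table reads.

    In the scheme described above each step would be an RDMA-Read issued by
    the reducer, so neither the driver nor the mapper has to serve an RPC.
    """
    table_location = driver.entries[map_id]        # read 1: driver's table
    mapper_table = remote_memory[table_location]   # stands in for reading remote memory
    return mapper_table.entries[reduce_id]         # read 2: mapper's table


if __name__ == "__main__":
    table_loc = RemoteBuffer("executor-1", 0x1000, 64)
    data_loc = RemoteBuffer("executor-1", 0x8000, 4096)
    driver = DriverTable({0: table_loc})
    remote_memory = {table_loc: MapperTable({3: data_loc})}
    print(locate_shuffle_data(driver, remote_memory, map_id=0, reduce_id=3))
```

In this picture the driver only publishes one small table and is not involved
in individual lookups, which is the overhead reduction the bullet refers to.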

> SPIP: RDMA Accelerated Shuffle Engine
> -
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0, 2.4.0, 3.0.0
> Reporter: Yuval Degani
> Priority: Major
> Attachments: SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf


