The easiest would be to create a fork of the code on GitHub.  I can also 
accept diffs.

Cheers Andrew

From: Min Shen <ms...@apache.org>
Date: Monday, January 27, 2020 at 12:48 PM
To: "Long, Andrew" <loand...@amazon.com>, "dev@spark.apache.org" 
<dev@spark.apache.org>
Subject: Re: Enabling push-based shuffle in Spark

Hi Andrew,

We are leveraging SPARK-6237 to control the off-heap memory consumption due to 
Netty.
With that change, the data is processed in a streaming fashion so Netty does 
not buffer an entire RPC in memory before handing it over to RPCHandler.
We tested with our internal stress testing framework, and did not see much 
change in the memory consumption of the shuffle service.
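To illustrate the streaming point above, here is a toy sketch (plain JVM code, not the actual Spark/Netty implementation) of why consuming an RPC body in bounded chunks keeps buffer usage constant, whereas buffering the entire message would grow with message size. The names and chunk size are hypothetical:

```scala
import java.io.ByteArrayInputStream

// Toy illustration of the SPARK-6237 idea: instead of buffering an entire
// RPC body before handing it to the handler, consume it in bounded chunks
// so per-connection buffer usage stays constant regardless of message size.
object StreamingSketch {
  val ChunkSize = 4 * 1024 // hypothetical bounded buffer per connection

  // "Process" a message of `total` bytes chunk by chunk and return the
  // peak number of bytes held in the buffer at any one time.
  def streamingPeak(total: Int): Int = {
    val in = new ByteArrayInputStream(new Array[Byte](total))
    val buf = new Array[Byte](ChunkSize)
    var peak = 0
    var read = in.read(buf)
    while (read != -1) {
      peak = math.max(peak, read) // each chunk is handed off immediately
      read = in.read(buf)
    }
    peak
  }
}
```

Even for a 1 MiB message, `streamingPeak(1 << 20)` never exceeds `ChunkSize`, which is the property the stress tests above are observing at the shuffle-service level.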
In terms of sharing the code, I'm not sure what would be the most effective 
way to do that.
If interested, maybe we can call a meeting to chat in more depth.

Best,
Min

On Mon, Jan 27, 2020 at 11:30 AM Long, Andrew 
<loand...@amazon.com> wrote:
Hey Min,

One concern would be off-heap memory utilization due to Netty, depending on 
the number of connections you create.

Would it be possible to take a look at your code?  My team has a performance 
test harness that I'd like to test it with.

Cheers Andrew



On 1/23/20, 10:25 AM, "mshen" <ms...@apache.org> wrote:

    Hi Wenchen,

    Glad to know that you like this idea.
    We also looked into making this pluggable in our early design phase.
    While the ShuffleManager API for pluggable shuffle systems does provide
    considerable room for customizing Spark shuffle behavior, we feel it is
    still not enough for this case.

    Right now, the shuffle block location information is tracked inside
    MapOutputTracker and updated by DAGScheduler.
    Since we are relocating shuffle blocks to improve overall shuffle
    throughput and efficiency, we need to be able to update the information
    tracked inside MapOutputTracker so that reducers can access their shuffle
    input more efficiently.
    Letting DAGScheduler orchestrate this process also provides the benefit of
    better coping with stragglers.
    If DAGScheduler has no control over, or is agnostic of, the block push
    progress, it leaves a few gaps.
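The tracker-update requirement above can be sketched with a toy model. The class and method names here are hypothetical illustrations, not Spark's actual MapOutputTracker internals: the point is that reducers resolve their input through the tracker, so relocating a block only helps if the scheduler can rewrite the tracked entry to point at the new location.

```scala
import scala.collection.mutable

// Hypothetical toy model of shuffle block location tracking.
final case class BlockLocation(host: String, port: Int)

final class ToyMapOutputTracker {
  // Keyed by (shuffleId, mapId); real Spark tracks richer MapStatus objects.
  private val locations = mutable.Map.empty[(Int, Int), BlockLocation]

  // Recorded when a map task completes and registers its output.
  def register(shuffleId: Int, mapId: Int, loc: BlockLocation): Unit =
    locations((shuffleId, mapId)) = loc

  // Called by the scheduler once a block has been pushed and merged
  // elsewhere, so reducers fetch from the merged location instead.
  def updateAfterPush(shuffleId: Int, mapId: Int, merged: BlockLocation): Unit =
    locations((shuffleId, mapId)) = merged

  // Reducers look up where to fetch their shuffle input from.
  def locate(shuffleId: Int, mapId: Int): Option[BlockLocation] =
    locations.get((shuffleId, mapId))
}
```

Without the `updateAfterPush` step, reducers would still fetch from the original map-task location, and the relocation work would be wasted; that is why the ShuffleManager API alone, with no hook into this tracking, is insufficient here.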

    On the shuffle Netty protocol side, there is a lot that can be leveraged
    from the existing code.
    With improvements in SPARK-24355 and SPARK-30512, the shuffle service Netty
    server is becoming much more reliable.
    The work in SPARK-6237 also laid much of the groundwork for streaming push
    of shuffle blocks.
    Instead of building all of these from scratch, we took the alternative route
    of building on top of the existing Netty protocol to implement the shuffle
    block push operation.

    We feel that this design has the potential to further improve the Spark
    shuffle system's scalability and efficiency, making Spark an even better
    compute engine.
    We would like to explore how we can leverage the shuffle plugin API to
    make this design more acceptable.



    -----
    Min Shen
    Staff Software Engineer
    LinkedIn


