Hi Andrew,

We are leveraging SPARK-6237 to control the off-heap memory consumption due
to Netty.
With that change, the data is processed in a streaming fashion, so Netty
does not buffer an entire RPC in memory before handing it over to
RpcHandler.
We tested with our internal stress testing framework, and did not see much
change in the memory consumption of the shuffle service.
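
To illustrate what that streaming path looks like at the API level, below is
a minimal sketch (not our actual code, and assuming I am remembering the
SPARK-6237 interfaces correctly): a StreamCallbackWithID that an RpcHandler
can return from receiveStream(), so each chunk of a pushed block is written
to disk as it arrives instead of the whole RPC being buffered in memory.

  import java.io.File
  import java.nio.ByteBuffer
  import java.nio.channels.FileChannel
  import java.nio.file.StandardOpenOption.{CREATE, WRITE}

  import org.apache.spark.network.client.StreamCallbackWithID

  // Minimal sketch: write each incoming chunk of a pushed block straight
  // to disk, so the full block is never materialized in memory.
  class DiskWritingStreamCallback(blockId: String, targetFile: File)
      extends StreamCallbackWithID {

    private val channel: FileChannel =
      FileChannel.open(targetFile.toPath, CREATE, WRITE)

    override def getID: String = blockId

    // Invoked repeatedly with small buffers as data arrives off the wire.
    override def onData(streamId: String, buf: ByteBuffer): Unit = {
      while (buf.hasRemaining) {
        channel.write(buf)
      }
    }

    override def onComplete(streamId: String): Unit = channel.close()

    override def onFailure(streamId: String, cause: Throwable): Unit = {
      channel.close()
      targetFile.delete()
    }
  }
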
As for sharing the code, I'm not sure what would be an effective way to do
that.
If you are interested, perhaps we can set up a meeting to chat in more depth.

Best,
Min

On Mon, Jan 27, 2020 at 11:30 AM Long, Andrew <loand...@amazon.com> wrote:

> Hey Min,
>
> One concern would be off-heap memory utilization due to Netty, depending
> on the number of connections that you create.
>
> Would it be possible to take a look at your code?  My team has a
> performance test harness that I'd like to test it with.
>
> Cheers Andrew
>
>
>
> On 1/23/20, 10:25 AM, "mshen" <ms...@apache.org> wrote:
>
>     Hi Wenchen,
>
>     Glad to know that you like this idea.
>     We also looked into making this pluggable in our early design phase.
>     While the ShuffleManager API for pluggable shuffle systems does provide
>     considerable room for customizing Spark shuffle behavior, we feel that
>     it is still not enough for this case.
>
>     Right now, the shuffle block location information is tracked inside
>     MapOutputTracker and updated by DAGScheduler.
>     Since we are relocating the shuffle blocks to improve overall shuffle
>     throughput and efficiency, we need to be able to update the information
>     tracked inside MapOutputTracker so that reducers can locate their
>     shuffle input more efficiently.
>     Letting DAGScheduler orchestrate this process also provides the benefit
>     of coping better with stragglers.
>     If DAGScheduler has no control over, or is unaware of, the block push
>     progress, it leaves a few gaps.
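>
>     As a purely hypothetical sketch of what we mean (the names below are
>     made up for illustration and are not an actual API proposal), the idea
>     is that once the blocks for a shuffle have been pushed and merged, the
>     scheduler records the merged locations so that reducers can prefer them
>     over fetching from every mapper:
>
>     // Hypothetical types, for illustration only -- not the actual Spark API.
>     case class MergedBlockLocation(host: String, port: Int)
>
>     class MergedOutputTracker {
>       // shuffleId -> (reduceId -> location of the merged block)
>       private val mergedLocations =
>         scala.collection.mutable.Map.empty[Int, Map[Int, MergedBlockLocation]]
>
>       // Called by the scheduler once pushing/merging for a shuffle has
>       // sufficiently completed.
>       def registerMergedLocations(
>           shuffleId: Int,
>           locations: Map[Int, MergedBlockLocation]): Unit = synchronized {
>         mergedLocations(shuffleId) = locations
>       }
>
>       // A reducer asks where to read its input: the merged location if one
>       // is known, otherwise fall back to the original per-mapper blocks.
>       def preferredLocation(
>           shuffleId: Int,
>           reduceId: Int): Option[MergedBlockLocation] = synchronized {
>         mergedLocations.get(shuffleId).flatMap(_.get(reduceId))
>       }
>     }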
>
>     On the shuffle Netty protocol side, there is a lot that can be leveraged
>     from the existing code.
>     With improvements in SPARK-24355 and SPARK-30512, the shuffle service
>     Netty server is becoming much more reliable.
>     The work in SPARK-6237 also provides a solid foundation for streaming
>     push of shuffle blocks.
>     Instead of building all of this from scratch, we took the route of
>     building on top of the existing Netty protocol to implement the shuffle
>     block push operation.
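>
>     To give a flavor of what building on the existing protocol could look
>     like (again a hypothetical sketch with made-up names, not our actual
>     wire format), the block push can be expressed as a small control message
>     carried over the existing RPC channel, while the block bytes themselves
>     are streamed via the SPARK-6237 path:
>
>     // Hypothetical push message, illustrative only.  A real version would
>     // follow the encoding conventions of the existing messages in the
>     // network-shuffle module.
>     case class PushShuffleBlock(
>         appId: String,
>         shuffleId: Int,
>         mapId: Int,
>         reduceId: Int,
>         blockSize: Long) {
>
>       // Only this small, fixed-size header travels as the RPC message; the
>       // block contents are streamed separately, so the shuffle service
>       // never buffers a whole block in memory.
>       def encodeHeader(): Array[Byte] =
>         s"$appId,$shuffleId,$mapId,$reduceId,$blockSize".getBytes("UTF-8")
>     }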
>
>     We feel that this design has the potential to further improve the Spark
>     shuffle system's scalability and efficiency, making Spark an even better
>     compute engine.
>     We would like to explore how we can leverage the shuffle plugin API to
>     make this design more acceptable.
>
>
>
>     -----
>     Min Shen
>     Staff Software Engineer
>     LinkedIn
