Hi Wenchen,

Glad to know that you like this idea.
We also looked into making this pluggable in our early design phase.
While the ShuffleManager API for pluggable shuffle systems provides
considerable room for customizing Spark's shuffle behavior, we feel it is
still not sufficient for this case.

Right now, shuffle block location information is tracked inside
MapOutputTracker and updated by DAGScheduler.
Since we are relocating shuffle blocks to improve overall shuffle
throughput and efficiency, we need to be able to update the information
tracked inside MapOutputTracker so that reducers can access their shuffle
input more efficiently.
Letting DAGScheduler orchestrate this process also helps it cope better
with stragglers.
If DAGScheduler has no control over, or is unaware of, the block push
progress, a few gaps are left open.
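To make the idea concrete, here is a minimal sketch (not actual Spark
internals; all names such as ShuffleOutputState and BlockLocation are
hypothetical) of how MapOutputTracker-style state could be extended so
the scheduler can record merged block locations as pushes complete:

```java
import java.util.*;

// Hypothetical sketch, not Spark code: tracker state that holds both the
// original map output locations and, once pushing completes, the merged
// block location for each reduce partition.
class ShuffleOutputState {
    // Simplified stand-in for Spark's BlockManagerId.
    record BlockLocation(String host, int port) {}

    private final BlockLocation[] mapOutputs;
    private final Map<Integer, BlockLocation> mergedOutputs = new HashMap<>();

    ShuffleOutputState(int numMaps) {
        this.mapOutputs = new BlockLocation[numMaps];
    }

    void registerMapOutput(int mapId, BlockLocation loc) {
        mapOutputs[mapId] = loc;
    }

    // In this sketch, called when the scheduler learns that a reducer's
    // blocks have been merged at a remote shuffle service.
    void registerMergedOutput(int reduceId, BlockLocation loc) {
        mergedOutputs.put(reduceId, loc);
    }

    // Reducers prefer the single merged location when one is available,
    // and fall back to fetching each original map output individually.
    List<BlockLocation> locationsFor(int reduceId) {
        BlockLocation merged = mergedOutputs.get(reduceId);
        return merged != null ? List.of(merged) : Arrays.asList(mapOutputs);
    }
}
```

The point is simply that this state lives with the scheduler, which is why
a shuffle plugin alone, sitting outside DAGScheduler, cannot close the gap.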

On the shuffle Netty protocol side, there is a lot that can be leveraged
from the existing code.
With the improvements in SPARK-24355 and SPARK-30512, the shuffle service
Netty server has become much more reliable.
The work in SPARK-6237 also laid useful groundwork for streaming push of
shuffle blocks.
Instead of building all of this from scratch, we took the alternative route
of building on top of the existing Netty protocol to implement the shuffle
block push operation.
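As a rough illustration of what "building on top of the existing protocol"
means, a block push could be carried as one more message type with a simple
wire layout; the class name and layout below are purely illustrative, not
the actual protocol:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a block push message layered on the existing
// shuffle Netty protocol. Wire layout (illustrative only):
// [appId len][appId bytes][shuffleId][mapId][reduceId][data len][data]
class PushBlockMessage {
    final String appId;
    final int shuffleId;
    final int mapId;
    final int reduceId;
    final byte[] blockData;

    PushBlockMessage(String appId, int shuffleId, int mapId,
                     int reduceId, byte[] blockData) {
        this.appId = appId;
        this.shuffleId = shuffleId;
        this.mapId = mapId;
        this.reduceId = reduceId;
        this.blockData = blockData;
    }

    // Encode the message into a single buffer ready to hand to the
    // transport layer.
    ByteBuffer encode() {
        byte[] app = appId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf =
            ByteBuffer.allocate(4 + app.length + 12 + 4 + blockData.length);
        buf.putInt(app.length).put(app)
           .putInt(shuffleId).putInt(mapId).putInt(reduceId)
           .putInt(blockData.length).put(blockData);
        buf.flip();
        return buf;
    }

    // Decode a buffer produced by encode() back into a message.
    static PushBlockMessage decode(ByteBuffer buf) {
        byte[] app = new byte[buf.getInt()];
        buf.get(app);
        int shuffleId = buf.getInt();
        int mapId = buf.getInt();
        int reduceId = buf.getInt();
        byte[] data = new byte[buf.getInt()];
        buf.get(data);
        return new PushBlockMessage(new String(app, StandardCharsets.UTF_8),
                                    shuffleId, mapId, reduceId, data);
    }
}
```

Because the server-side framing, chunk handling, and retry logic already
exist, adding a message along these lines is far cheaper than standing up a
separate transport.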

We feel that this design has the potential to further improve the Spark
shuffle system's scalability and efficiency, making Spark an even better
compute engine.
We would like to explore how we can leverage the shuffle plugin API to make
this design more acceptable.



-----
Min Shen
Staff Software Engineer
LinkedIn
