[ https://issues.apache.org/jira/browse/SPARK-26268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729297#comment-16729297 ]
Peiyu Zhuang commented on SPARK-26268:
--------------------------------------

Check [SPARK-25299|https://issues.apache.org/jira/browse/SPARK-25299]; we are implementing a shuffle manager with a storage plugin that can support different kinds of external/local storage. The work will be open-sourced soon.

> Decouple shuffle data from Spark deployment
> -------------------------------------------
>
>                 Key: SPARK-26268
>                 URL: https://issues.apache.org/jira/browse/SPARK-26268
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.4.0
>            Reporter: Ben Sidhom
>            Priority: Major
>
> Right now the batch scheduler assumes that shuffle data is tied to executors.
> As a result, when an executor is lost, any map tasks that ran on that
> executor are rescheduled unless the "external" shuffle service is in use.
> Note that this service is only external in the sense that it does not live
> within the executors themselves; its implementation cannot be swapped out, and
> it is assumed to speak the BlockManager language.
> The following changes would facilitate external shuffle (see SPARK-25299 for
> motivation):
> * Do not rerun map tasks on lost executors when shuffle data is stored
> externally. For example, this could be determined by a property or by an
> additional method that all ShuffleManagers implement.
> * Do not assume that shuffle data is stored in the standard BlockManager
> format, or that a BlockManager is or must be available to ShuffleManagers.
> Note that only the first change is actually required to realize the benefits
> of remote shuffle implementations, as a phony (or null) BlockManager can be
> used by shuffle implementations.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
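The first proposed change (gating map-task rerun on whether shuffle data survives executor loss) can be sketched in a few lines of Scala. Everything here is hypothetical: the names `ExternalShuffleDataAware`, `shuffleDataStoredExternally`, and `shouldRerunMapTasks` are illustrative and are not part of Spark's actual ShuffleManager API; they only show the shape of the "additional method" idea from the issue description.

```scala
// Hypothetical sketch only: none of these names exist in Spark's API.
// A marker capability a ShuffleManager could implement to tell the
// scheduler that its map output does not live on executors.
trait ExternalShuffleDataAware {
  def shuffleDataStoredExternally: Boolean
}

// A manager backed by a remote store: map output survives executor loss.
class RemoteStoreShuffleManager extends ExternalShuffleDataAware {
  override def shuffleDataStoredExternally: Boolean = true
}

// The classic local-disk case: map output is tied to the executor.
class LocalDiskShuffleManager extends ExternalShuffleDataAware {
  override def shuffleDataStoredExternally: Boolean = false
}

// Scheduler-side decision on executor loss: rerun map tasks only when
// the shuffle data was stored on the lost executor.
object SchedulerDecision {
  def shouldRerunMapTasks(mgr: ExternalShuffleDataAware): Boolean =
    !mgr.shuffleDataStoredExternally
}
```

Under this sketch, a remote-shuffle implementation would return `true` and the scheduler would skip rescheduling completed map tasks, which is exactly the behavior the issue asks for without requiring the second (BlockManager) change.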