[ 
https://issues.apache.org/jira/browse/SPARK-34601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34601:
---------------------------------
    Fix Version/s:     (was: 3.2.0)

> Do not delete shuffle file on executor lost event when using remote shuffle 
> service
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-34601
>                 URL: https://issues.apache.org/jira/browse/SPARK-34601
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>    Affects Versions: 3.2.0
>            Reporter: BoYang
>            Priority: Major
>              Labels: shuffle
>
> There are multiple efforts under way on disaggregated/remote shuffle services 
> (e.g. [LinkedIn 
> shuffle|https://engineering.linkedin.com/blog/2020/introducing-magnet], 
> [Facebook shuffle 
> service|https://databricks.com/session/cosco-an-efficient-facebook-scale-shuffle-service],
>  [Uber shuffle service|https://github.com/uber/RemoteShuffleService]). Such a 
> remote shuffle service is not the Spark External Shuffle Service. It could be 
> a third-party shuffle solution that the user enables by setting 
> spark.shuffle.manager. In those systems, shuffle data is stored on servers 
> other than the executors, so Spark should not mark shuffle data as lost when 
> an executor is lost. We could add a Spark configuration to control this 
> behavior. By default, Spark would still mark shuffle files as lost. For a 
> disaggregated/remote shuffle service, users could set the configuration to 
> not mark shuffle files as lost.
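
The proposed behavior can be sketched as a small model. This is only an
illustration under stated assumptions, not Spark code: the config key
spark.shuffle.markFileLostOnExecutorLost and the MapOutputRegistry class are
hypothetical names invented here, since the description does not name the
configuration. The sketch keeps today's default (outputs are marked lost) and
skips the cleanup only when the flag is set to false.

```python
from dataclasses import dataclass, field

# Hypothetical config key, invented for this sketch; the issue does not name one.
MARK_LOST_KEY = "spark.shuffle.markFileLostOnExecutorLost"


@dataclass
class MapOutputRegistry:
    """Toy stand-in for driver-side tracking of shuffle map outputs (not a Spark API)."""

    conf: dict
    # shuffle_id -> {map_id: executor_id that produced the output}
    outputs: dict = field(default_factory=dict)

    def register(self, shuffle_id, map_id, executor_id):
        self.outputs.setdefault(shuffle_id, {})[map_id] = executor_id

    def mark_lost_enabled(self):
        # Default is "true": without the flag, behavior is unchanged.
        return str(self.conf.get(MARK_LOST_KEY, "true")).lower() == "true"

    def on_executor_lost(self, executor_id):
        """Drop outputs of the lost executor; return how many were dropped."""
        if not self.mark_lost_enabled():
            # Remote shuffle service: the data lives elsewhere, so keep it registered.
            return 0
        removed = 0
        for maps in self.outputs.values():
            for map_id in [m for m, e in maps.items() if e == executor_id]:
                del maps[map_id]
                removed += 1
        return removed
```

With an empty conf, losing an executor drops its registered outputs and the
affected map tasks would need to be rerun. With the hypothetical flag set to
"false", the registry is left untouched, which is the behavior requested for
remote shuffle services.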



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
