Devesh Agrawal created SPARK-32217:
--------------------------------------

             Summary: Track whether the worker is also being decommissioned 
along with an executor
                 Key: SPARK-32217
                 URL: https://issues.apache.org/jira/browse/SPARK-32217
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.1.0
            Reporter: Devesh Agrawal


When an executor is decommissioned, we would like to know whether its shuffle data 
is truly going to be lost. In the case of the external shuffle service, this means 
knowing that the worker (i.e., the node the executor is running on) is also going 
to be lost. 

 

(I don't think we need to worry about disaggregated remote shuffle storage at 
present, since it is only used in a couple of web companies. But when remote 
shuffle is in use, the shuffle data indeed won't be lost along with a 
decommissioned executor.)

 

We know for sure that a worker is being decommissioned when the Master is asked 
to decommission a worker (see the sketch after this list for how that information 
could be surfaced to the driver). In the case of other schedulers:
 * Yarn support for decommissioning isn't implemented yet. But the idea would 
be for Yarn preemption not to mark the worker as being lost, while machine-level 
decommissioning (like for kernel upgrades) would mark it as such.
 * K8s isn't quite working with the external shuffle service yet, so when the 
executor is lost, the worker isn't quite lost with it. 
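
A rough sketch of the idea (Scala; the names here are illustrative only, not an 
agreed-upon API): the decommission notification carries a flag saying whether the 
host itself is also going away, so the driver can decide whether shuffle data 
served by the external shuffle service on that host must be treated as lost.

{code:scala}
// Hypothetical message attached to an executor decommission event.
case class ExecutorDecommissionInfo(
    message: String,
    // True when the worker (i.e. the host running the external shuffle service)
    // is itself being decommissioned, so its shuffle files will go away too.
    isHostDecommissioned: Boolean)

// Hypothetical driver-side handling, just to show how the flag would be used.
class DecommissionTracker {
  def onExecutorDecommission(execId: String, info: ExecutorDecommissionInfo): Unit = {
    if (info.isHostDecommissioned) {
      // Host is going away: shuffle blocks registered with the external shuffle
      // service on that host should be unregistered / recomputed.
      println(s"Executor $execId decommissioned with its host: shuffle data will be lost")
    } else {
      // Only the executor is going away: with an external shuffle service the
      // shuffle files remain readable from the worker.
      println(s"Executor $execId decommissioned, host stays up: shuffle data preserved")
    }
  }
}
{code}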

 


