prakharjain09 commented on issue #27864: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned
URL: https://github.com/apache/spark/pull/27864#issuecomment-598769144
 
 
   @holdenk @cloud-fan  This PR currently does not handle running tasks: what should we do with tasks that are already running on the decommissioned executors? If we start BlockManager decommissioning and stop accepting new cached RDD blocks, those already-running tasks might fail if they try to cache something.
   
   One possible approach: kill these already-running tasks after some configurable timeout.
   
   Executor decommissioning can be broken into two steps:
   1. Compute decommissioning: stop assigning new tasks; kill existing running tasks, possibly after some timeout
   2. Storage decommissioning: the BlockManager stops accepting new cache blocks and offloads existing cache blocks
   
   First we can do `Compute decommissioning`, i.e. stop assigning new tasks (already done as part of SPARK-20628) and kill running tasks after some timeout (TODO).
   Then do `Storage decommissioning`, i.e. stop accepting new RDD cache blocks (done as part of this PR), move existing RDD cache blocks to other possible locations (done as part of this PR), and maybe drop the blocks if the executor is not able to offload them after some timeout, say after 5-10 cycles/retries.
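   The two-step flow above could be sketched roughly as follows. This is a toy state-machine sketch, not Spark's actual API; the names `decommission`, `offload`, and `max_retries` are hypothetical and only illustrate the ordering (drain compute first, then drain storage, dropping blocks after bounded retries):

   ```python
   def decommission(running_tasks, cached_blocks, offload, max_retries=5):
       """Hypothetical sketch of two-step executor decommissioning.

       Step 1 (compute): stop assigning new tasks; after the kill timeout,
       kill whatever is still running.
       Step 2 (storage): stop accepting new cache blocks; try to offload
       existing blocks, dropping any that cannot be moved after
       max_retries attempts.
       """
       # Step 1: compute decommissioning
       accepting_new_tasks = False          # stop assigning new tasks
       killed = list(running_tasks)         # after timeout, kill stragglers

       # Step 2: storage decommissioning
       accepting_new_blocks = False         # stop accepting new cache blocks
       remaining = set(cached_blocks)
       for _ in range(max_retries):
           # keep only blocks the offload attempt failed to move
           remaining = {b for b in remaining if not offload(b)}
           if not remaining:
               break
       dropped = remaining                  # give up on blocks that never moved
       return killed, dropped
   ```

   For example, if `offload` can only place block `b1` elsewhere, `b2` would be dropped after the retry budget is exhausted.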
   
   Any thoughts/feedback?
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
