agrawaldevesh opened a new pull request #29015:
URL: https://github.com/apache/spark/pull/29015


   ### What changes were proposed in this pull request?
   
   This PR allows an external agent to inform the Master that certain nodes
   (or host-ports) are being decommissioned.
   
   ### Why are the changes needed?
   
   Currently, decommissioning is triggered when the Worker receives a SIGPWR
   signal (sent out of band, possibly by some cleanup hook); the Worker then
   informs the Master. This approach may not be feasible in environments that
   cannot trigger a cleanup hook on the Worker. In addition, when a large
   number of worker nodes are decommissioned at once, the Master receives a
   flood of messages.
   
   So we add a new POST endpoint `/workers/kill` on the MasterWebUI that allows
   an external agent to inform the Master, in bulk, about all the nodes being
   decommissioned. Workers are identified either by their `host:port` or by
   just the host, in which case all workers on that host are decommissioned.
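
To illustrate the identification rule above, here is a minimal sketch in Python (not the Scala code from this PR). The `Worker` class and the `workers_to_decommission` helper are hypothetical names used only for this example, and IPv6 literals are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    # Hypothetical stand-in for the Master's view of a registered worker.
    host: str
    port: int


def workers_to_decommission(workers, identifiers):
    """Select workers matched by a 'host:port' or a bare 'host' identifier."""
    exact = set()   # (host, port) pairs named explicitly
    hosts = set()   # bare hosts: every worker on the host matches
    for ident in identifiers:
        host, sep, port = ident.rpartition(":")
        if sep and port.isdigit():
            exact.add((host, int(port)))
        else:
            hosts.add(ident)
    return [w for w in workers
            if w.host in hosts or (w.host, w.port) in exact]
```

With workers on `a:1`, `a:2`, and `b:1`, the identifiers `["a", "b:1"]` select all three, while `["b:2"]` selects none.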
   
   This API is merely a new entry point into the existing decommissioning
   logic. It does not change how the decommissioning request is handled in
   its core.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, a new endpoint `/workers/kill` is added to the MasterWebUI. This
   endpoint is disabled by default.
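
As a rough sketch of how an external agent might call the new endpoint, the Python snippet below builds a POST request to `/workers/kill`. Only the path and the HTTP method come from this description. The parameter name `host`, the form-encoded body, and the Master UI address are assumptions made for illustration. Because the endpoint is disabled by default, such a request would presumably be rejected unless the endpoint has been enabled.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_kill_request(master_ui_url, identifiers, param="host"):
    """Build (but do not send) a POST request to /workers/kill.

    `param` is an assumed parameter name; this description does not
    specify how the identifiers are passed to the endpoint.
    """
    # One repeated form field per worker identifier (host or host:port).
    body = urlencode([(param, ident) for ident in identifiers]).encode("utf-8")
    url = master_ui_url.rstrip("/") + "/workers/kill"
    return Request(url, data=body, method="POST")
```

The request could then be sent with `urllib.request.urlopen(build_kill_request("http://master.example:8080", ["host1", "host2:7078"]))`, where `master.example:8080` is a placeholder address.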
   
   ### How was this patch tested?
   
   Added unit tests
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



