GitHub user holdenk opened a pull request: https://github.com/apache/spark/pull/19045
[WIP][SPARK-20628][CORE] Keep track of nodes (/ spot instances) which are going to be shutdown

## What changes were proposed in this pull request?

Keep track of nodes which are going to be shut down to prevent scheduling tasks on them. The PR is designed with spot instances in mind, where there is some notice (depending on the cloud vendor) that the node will be shut down. Since each vendor notifies instances of pending termination in a different manner, it is left to the instance to notify the worker(s) of decommissioning with SIGPWR.

SPARK-20628 is a sub-task of SPARK-20624, with follow-up tasks to perform migration of data and re-launching of tasks. SPARK-20628 is distinct from other mechanisms where Spark itself has control of executor decommissioning; however, the later follow-up tasks in SPARK-20624 should be usable across voluntary and involuntary termination (e.g. https://github.com/apache/spark/pull/19041 could provide a good mechanism for doing data copy during involuntary termination).

## How was this patch tested?

Extension of AppClientSuite to cover decommissioning and addition of an explicit worker decommissioning suite.

TODO: Deploy on a live EC2 cluster with the companion monitoring script, wait for spot instance prices to spike, and confirm decommissioning.
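To make the flow described above concrete, here is a minimal Python sketch of the idea: a worker process receives SIGPWR, records itself as decommissioning, and the scheduler then skips it when picking workers for new tasks. This is not the PR's code (the actual change is in Spark's Scala sources); `DecommissionTracker`, `make_sigpwr_handler`, `install_sigpwr_handler`, and the worker IDs are hypothetical names used only for illustration.

```python
import signal


class DecommissionTracker:
    """Toy bookkeeping for workers that are about to be shut down.

    Hypothetical stand-in for scheduler-side state; not Spark's API.
    """

    def __init__(self):
        self._decommissioning = set()

    def mark_decommissioning(self, worker_id):
        # Record that this worker has been told it will be shut down.
        self._decommissioning.add(worker_id)

    def is_decommissioning(self, worker_id):
        return worker_id in self._decommissioning

    def schedulable(self, worker_ids):
        # Keep only the workers that may still receive new tasks.
        return [w for w in worker_ids if w not in self._decommissioning]


def make_sigpwr_handler(tracker, worker_id):
    """Build a signal handler that marks `worker_id` as decommissioning."""

    def handler(signum, frame):
        tracker.mark_decommissioning(worker_id)

    return handler


def install_sigpwr_handler(tracker, worker_id):
    """Register the handler for SIGPWR; return False if unsupported.

    SIGPWR is not defined on every platform (it is on Linux), and
    signal.signal() must be called from the main thread.
    """
    sigpwr = getattr(signal, "SIGPWR", None)
    if sigpwr is None:
        return False
    signal.signal(sigpwr, make_sigpwr_handler(tracker, worker_id))
    return True
```

Under these assumptions, a vendor-specific monitoring script on the host would only need to deliver SIGPWR to the worker process (for example with `os.kill(pid, signal.SIGPWR)` on Linux) once it sees a termination notice; how that notice is detected is left to the cloud vendor, as the description says.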
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19045

----
commit 81fff20471bd2aded08380c8dd99c09fe34d2c79
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-05-12T14:07:40Z

    Start of work on adventures

commit e470bac53151418d02dd5f03f243d635900376a9
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-02T12:16:39Z

    Mini progresss

commit a00c707cd707c6ca2003c4d53ee51735dda3a96e
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-02T13:29:41Z

    Go down the path of handling as lost but urgh lets just blacklist instead maybe

commit 74ade447ec94b600f5447a9269e66e47ae78fb11
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-09T18:59:26Z

    Plumb through executor loss to the scheduables

commit a880177f9bf45a2f0644229fbff863f80d058161
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-21T13:00:03Z

    AppClient suite works! yay

commit b9704038e96b0bb862b824cb9723e68633e18c06
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-21T17:04:53Z

    Decomissioning now works in the coarse grained scheduler, yay....

commit ded6bbc8d056f9f82302450aa27dbc0d94fdbccd
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-22T16:10:06Z

    Remove sketchy println debugging

commit 16c855ad9eb7b961d805bc2f459d86f3b3d31108
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-06T06:13:41Z

    Add a worker decommissioning suite

commit c79a06d0d38c53877c9d1b607ba95bb7b89f1e44
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-06T23:41:34Z

    Merge in latest master

commit e3798d0f462659ca4ebb4ba660c8f00aa023c380
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-16T18:57:00Z

    Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2

commit 4f70706847a4d04b78e19e0eaa50035e9721e7f0
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-17T18:32:56Z

    Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2

commit 07c3e3e01516f43f67ad67cc55581197008c7556
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T19:10:01Z

    Merge branch 'master' into SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2

commit c2a0ad87dc3220eb5154a6d0a117ce0260bd2695
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T20:28:24Z

    Add decommissioning script for whatever process is running locally on host to call

commit 672c3b6f79400cce867ce273199ccdcf995b6ed6
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T21:38:51Z

    Leave polling mechanism up to the cloud vendors

commit 9cfdb7fc36691bf0c627080de5c2008fe83ba3bd
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T21:55:12Z

    Remove legacy comment and remove some unecessary blank lines

commit 65a29c12c1740c285ff7b06f3788cd2a92ce87f1
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T21:59:24Z

    Remove manually debugging printlns (oops)

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if
the feature is enabled but not working, please contact infrastructure at
infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org