GitHub user holdenk opened a pull request:

    https://github.com/apache/spark/pull/19045

    [WIP][SPARK-20628][CORE] Keep track of nodes (/ spot instances) which are 
going to be shutdown

    ## What changes were proposed in this pull request?
    
    Keep track of nodes which are going to be shut down to prevent scheduling 
tasks on them. The PR is designed with spot instances in mind, where there is some 
notice (depending on the cloud vendor) that the node will be shut down.
    
    Since each vendor notifies instances of pending termination in a different 
manner, it is left to the instance to notify the worker(s) of decommissioning 
with SIGPWR.
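
A minimal sketch of what the instance-side notification could look like, assuming a Linux host and that the worker PIDs are already known. The name `notify_workers` and its `send` parameter are hypothetical and not part of this PR; how a pending termination is detected is vendor-specific and deliberately left out.

```python
import os
import signal
from typing import Callable, Iterable, List


def notify_workers(
    worker_pids: Iterable[int],
    send: Callable[[int, int], None] = os.kill,
) -> List[int]:
    """Send SIGPWR to each worker PID and return the PIDs that were signalled.

    Hypothetical helper: `send` is injectable so the logic can be exercised
    without signalling real processes.
    """
    sigpwr = signal.SIGPWR  # Linux-only constant; not defined on other platforms
    signalled = []
    for pid in worker_pids:
        try:
            send(pid, sigpwr)
            signalled.append(pid)
        except ProcessLookupError:
            # The process has already exited, so there is nothing to notify.
            continue
    return signalled
```

A vendor-specific monitor would call `notify_workers(pids)` once it learns that the node is about to be reclaimed.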
    
    SPARK-20628 is a sub-task of SPARK-20624, with follow-up tasks to perform 
migration of data and re-launching of tasks. SPARK-20628 is distinct from other 
mechanisms where Spark itself has control of executor decommissioning; however, 
the later follow-up tasks in SPARK-20624 should be usable across voluntary and 
involuntary termination (e.g. https://github.com/apache/spark/pull/19041 could 
provide a good mechanism for doing data copy during involuntary termination).
    
    ## How was this patch tested?
    
    Extension of AppClientSuite to cover decommissioning and addition of an 
explicit worker decommissioning suite.
    
    TODO: Deploy on live EC2 cluster with companion monitoring script and wait 
for spot instance prices to spike and confirm decommissioning.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19045
    
----
commit 81fff20471bd2aded08380c8dd99c09fe34d2c79
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-05-12T14:07:40Z

    Start of work on adventures

commit e470bac53151418d02dd5f03f243d635900376a9
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-02T12:16:39Z

    Mini progresss

commit a00c707cd707c6ca2003c4d53ee51735dda3a96e
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-02T13:29:41Z

    Go down the path of handling as lost but urgh lets just blacklist instead 
maybe

commit 74ade447ec94b600f5447a9269e66e47ae78fb11
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-09T18:59:26Z

    Plumb through executor loss to the scheduables

commit a880177f9bf45a2f0644229fbff863f80d058161
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-21T13:00:03Z

    AppClient suite works! yay

commit b9704038e96b0bb862b824cb9723e68633e18c06
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-21T17:04:53Z

    Decomissioning now works in the coarse grained scheduler, yay....

commit ded6bbc8d056f9f82302450aa27dbc0d94fdbccd
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-06-22T16:10:06Z

    Remove sketchy println debugging

commit 16c855ad9eb7b961d805bc2f459d86f3b3d31108
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-06T06:13:41Z

    Add a worker decommissioning suite

commit c79a06d0d38c53877c9d1b607ba95bb7b89f1e44
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-07-06T23:41:34Z

    Merge in latest master

commit e3798d0f462659ca4ebb4ba660c8f00aa023c380
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-16T18:57:00Z

    Merge branch 'master' into 
SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2

commit 4f70706847a4d04b78e19e0eaa50035e9721e7f0
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-17T18:32:56Z

    Merge branch 'master' into 
SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2

commit 07c3e3e01516f43f67ad67cc55581197008c7556
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T19:10:01Z

    Merge branch 'master' into 
SPARK-20628-keep-track-of-nodes-which-are-going-to-be-shutdown-r2

commit c2a0ad87dc3220eb5154a6d0a117ce0260bd2695
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T20:28:24Z

    Add decommissioning script for whatever process is running locally on host 
to call

commit 672c3b6f79400cce867ce273199ccdcf995b6ed6
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T21:38:51Z

    Leave polling mechanism up to the cloud vendors

commit 9cfdb7fc36691bf0c627080de5c2008fe83ba3bd
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T21:55:12Z

    Remove legacy comment and remove some unecessary blank lines

commit 65a29c12c1740c285ff7b06f3788cd2a92ce87f1
Author: Holden Karau <hol...@us.ibm.com>
Date:   2017-08-22T21:59:24Z

    Remove manually debugging printlns (oops)

----


