GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/13234

    [WIP] [SPARK-8426] Enhance Blacklist mechanism for fault-tolerance

    ## What changes were proposed in this pull request?
    
    Update of https://github.com/apache/spark/pull/8760 by @mwws. The current
    blacklist mechanism only considers one task at a time. This change expands
    it by considering:
    1. When we determine an executor is bad, we blacklist *all* tasks from that
    executor, both within the taskset and in subsequent task sets.
    2. When many executors on a node appear to be bad, we blacklist the entire
    node.
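
A minimal sketch of the two rules above, for illustration only. It is not the code in this pull request and does not use Spark's API: the class name `BlacklistSketch`, its methods, and both thresholds are hypothetical, and the real change may decide that an executor is "bad" by different criteria.

```python
from collections import defaultdict


class BlacklistSketch:
    """Toy model of executor- and node-level blacklisting (not Spark's API)."""

    def __init__(self, max_failures_per_executor=2, max_bad_executors_per_node=2):
        # Both thresholds are hypothetical values chosen for illustration.
        self.max_failures_per_executor = max_failures_per_executor
        self.max_bad_executors_per_node = max_bad_executors_per_node
        self.failures = defaultdict(int)       # executor id -> failure count
        self.bad_executors = defaultdict(set)  # node -> blacklisted executor ids
        self.bad_nodes = set()

    def record_failure(self, node, executor):
        """Record one task failure and update both blacklists."""
        self.failures[executor] += 1
        # Rule 1: a bad executor is blacklisted for all tasks, not just one.
        if self.failures[executor] >= self.max_failures_per_executor:
            self.bad_executors[node].add(executor)
        # Rule 2: enough bad executors on a node blacklists the whole node.
        if len(self.bad_executors[node]) >= self.max_bad_executors_per_node:
            self.bad_nodes.add(node)

    def can_schedule(self, node, executor):
        """Return True if a task may be scheduled on this executor."""
        if node in self.bad_nodes:
            return False
        return executor not in self.bad_executors.get(node, set())
```

In this sketch the failure counts are kept per executor rather than per task, which is what lets a decision made in one task set carry over to later ones.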
    
    ## How was this patch tested?
    
    Unit tests via Jenkins.
    I also ran the additional tests proposed
    [here](https://github.com/apache/spark/pull/8559), which include blacklist
    tests.
    
    TODO:
    [ ] performance tests
    [ ] more internal comments (in particular on concurrency)
    [ ] manual testing on a cluster

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark blacklist-SPARK-8426

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13234
    
----
commit 975a2a3c2b810f6b462eb46813075aac4928c0ae
Author: mwws <wei....@intel.com>
Date:   2015-12-29T06:01:17Z

    enhance blacklist mechanism
    
    1. create new BlacklistTracker and BlacklistStrategy interface to
    support complex use cases for the blacklist mechanism.
    2. make the Yarn allocator aware of node blacklist information.
    3. implement three strategies for convenience; users can also define
    their own strategy.
    SingleTaskStrategy: retains the default behavior from before this change.
    AdvanceSingleTaskStrategy: enhances SingleTaskStrategy by supporting
    stage-level node blacklisting.
    ExecutorAndNodeStrategy: different taskSets can share blacklist
    information.

commit 51d3c88720faffd6a1fb6910b999cdce0d446bcf
Author: mwws <wei....@intel.com>
Date:   2016-01-13T05:43:46Z

    change import order to meet new scala style check rule

commit 7e52311bcf4b5528d127d1d0a16bade7c039517e
Author: mwws <wei....@intel.com>
Date:   2016-02-23T05:28:56Z

    simplify code and fix typo
    
    1. fix compile error after rebase to latest codebase.
    2. simplify configuration.
    3. fix typo.
    4. enhance comment and unit test.
    5. remove unused import.
    6. remove ExecutorAndNode strategy.

commit b600604a0920054cf3b33bff047d84cbd302fb3c
Author: Imran Rashid <iras...@cloudera.com>
Date:   2016-05-10T17:49:05Z

    style

commit 45525a118db078f80b3e0e74abe7d7f2e04a7883
Author: Imran Rashid <iras...@cloudera.com>
Date:   2016-05-10T19:27:39Z

    small refactoring

commit f6bb6de673cae7058c26d2f124d3de0d2eb5b06b
Author: Imran Rashid <iras...@cloudera.com>
Date:   2016-05-20T21:09:13Z

    Merge branch 'master' into blacklist-SPARK-8426

----

