Saisai Shao created SPARK-8424:
----------------------------------

Summary: Add blacklist mechanism for task scheduler and Yarn container allocation
Key: SPARK-8424
URL: https://issues.apache.org/jira/browse/SPARK-8424
Project: Spark
Issue Type: New Feature
Components: Scheduler, YARN
Affects Versions: 1.4.0
Reporter: Saisai Shao
MapReduce has a blacklist and a graylist to exclude TaskTrackers/nodes that fail repeatedly. Such a mechanism is important in large clusters, where the chance of hardware and software failures increases. Unfortunately, the current version of Spark lacks a mechanism to blacklist executors/nodes that fail repeatedly. The only blacklist mechanism in Spark prevents a task from being relaunched on an executor where that same task previously failed within a specified time.

This issue proposes adding a blacklist mechanism to Spark. The proposal is divided into two sub-tasks:

1. Add a heuristic blacklist algorithm that tracks the status of executors based on the status of finished tasks, and enable the blacklist mechanism in task scheduling.
2. Enable the blacklist mechanism in YARN container allocation (avoid allocating containers on blacklisted hosts).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
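The ticket does not specify which heuristic to use. As a rough illustration only, the Python sketch below assumes a simple failure-count rule: an executor is blacklisted once it has at least `max_failures` failed tasks within the last `window_secs` seconds, and a host is blacklisted once `max_bad_executors` of its executors are. The blacklist is then applied in two places, matching the two sub-tasks: filtering resource offers during task scheduling, and dropping hosts before requesting YARN containers. The class, its methods, and the threshold values are all hypothetical; they are not Spark or YARN APIs.

```python
from collections import defaultdict, deque


class ExecutorBlacklist:
    """Hypothetical failure-count blacklist heuristic (not Spark's API)."""

    def __init__(self, max_failures=3, window_secs=600.0, max_bad_executors=2):
        self.max_failures = max_failures          # failures that blacklist an executor
        self.window_secs = window_secs            # how long a failure is remembered
        self.max_bad_executors = max_bad_executors  # bad executors that blacklist a host
        self._failures = defaultdict(deque)       # executor id -> failure timestamps
        self._host_of = {}                        # executor id -> host name

    def on_task_end(self, executor_id, host, succeeded, now):
        """Record a finished task; only failed tasks count toward blacklisting."""
        self._host_of[executor_id] = host
        if not succeeded:
            self._failures[executor_id].append(now)
        self._expire(executor_id, now)

    def _expire(self, executor_id, now):
        # Forget failures older than the window so executors can recover.
        times = self._failures[executor_id]
        while times and now - times[0] > self.window_secs:
            times.popleft()

    def is_executor_blacklisted(self, executor_id, now):
        self._expire(executor_id, now)
        return len(self._failures[executor_id]) >= self.max_failures

    def blacklisted_hosts(self, now):
        # Escalate to the host level when several of its executors are bad.
        bad_per_host = defaultdict(int)
        for executor_id, host in self._host_of.items():
            if self.is_executor_blacklisted(executor_id, now):
                bad_per_host[host] += 1
        return {h for h, n in bad_per_host.items() if n >= self.max_bad_executors}

    def usable_offers(self, offers, now):
        """Sub-task 1: keep only (executor_id, host) offers that are not blacklisted."""
        bad_hosts = self.blacklisted_hosts(now)
        return [(e, h) for e, h in offers
                if h not in bad_hosts and not self.is_executor_blacklisted(e, now)]

    def hosts_for_container_request(self, candidate_hosts, now):
        """Sub-task 2: drop blacklisted hosts before asking YARN for containers."""
        bad_hosts = self.blacklisted_hosts(now)
        return [h for h in candidate_hosts if h not in bad_hosts]


if __name__ == "__main__":
    bl = ExecutorBlacklist(max_failures=2, window_secs=60.0, max_bad_executors=2)
    # Two executors on host-a each fail twice; exec-3 on host-b succeeds.
    for t in (0.0, 1.0):
        bl.on_task_end("exec-1", "host-a", succeeded=False, now=t)
        bl.on_task_end("exec-2", "host-a", succeeded=False, now=t)
    bl.on_task_end("exec-3", "host-b", succeeded=True, now=2.0)

    print(bl.usable_offers([("exec-1", "host-a"), ("exec-3", "host-b")], now=3.0))
    # [('exec-3', 'host-b')]
    print(bl.hosts_for_container_request(["host-a", "host-b"], now=3.0))
    # ['host-b']
    print(bl.is_executor_blacklisted("exec-1", now=100.0))
    # False: both failures are older than the 60 s window
```

The time window is one possible design choice: it lets an executor leave the blacklist once its failures age out, so transient problems do not permanently shrink the cluster. Escalating from executors to hosts is what makes the YARN side useful, since container requests are placed per host rather than per executor.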