[ https://issues.apache.org/jira/browse/SPARK-18761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai resolved SPARK-18761.
------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

Issue resolved by pull request 16189
[https://github.com/apache/spark/pull/16189]

> Uncancellable / unkillable tasks may starve jobs of resources
> -------------------------------------------------------------
>
>                 Key: SPARK-18761
>                 URL: https://issues.apache.org/jira/browse/SPARK-18761
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 2.2.0
>
>
> Spark's current task cancellation / task killing mechanism is "best effort"
> in the sense that some tasks may not be interruptible and may not respond to
> their "killed" flags being set. If a significant fraction of a cluster's task
> slots are occupied by tasks that have been marked as killed but remain
> running, then new jobs and tasks can be starved of resources because zombie
> tasks are holding those slots.
> I propose to address this problem by introducing a "task reaper" mechanism in
> executors. The reaper would monitor tasks after they are marked for killing
> in order to periodically re-attempt the task kill, capture and log
> stacktraces / warnings if tasks do not exit in a timely manner, and,
> optionally, kill the entire executor JVM if cancelled tasks cannot be killed
> within some timeout.
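
The "task reaper" loop described in the issue can be illustrated with a small, language-neutral sketch. The Python below is only a toy model of the idea, not Spark's executor code: `Task`, `reap`, `poll_interval`, `kill_timeout`, and `on_timeout` are hypothetical names chosen for this example, and a Python thread with a cooperative flag stands in for a JVM task thread. It shows the three behaviours the proposal lists: re-attempting the kill on every poll, logging a stack trace while the task is still alive, and invoking an escalation hook once the timeout expires.

```python
import sys
import threading
import time
import traceback


class Task:
    """Hypothetical task: a worker thread plus a cooperative 'killed' flag."""

    def __init__(self, name, body):
        self.name = name
        self.killed = threading.Event()
        self.thread = threading.Thread(
            target=body, args=(self.killed,), name=name, daemon=True
        )

    def start(self):
        self.thread.start()

    def kill(self):
        # Best effort only: the body may never check this flag.
        self.killed.set()


def reap(task, poll_interval=0.05, kill_timeout=0.3, on_timeout=None, log=print):
    """Monitor a task that has been marked for killing.

    Returns True if the task exited, False if kill_timeout elapsed first.
    """
    deadline = time.monotonic() + kill_timeout
    while True:
        task.kill()                      # periodically re-attempt the kill
        task.thread.join(poll_interval)  # wait one polling interval
        if not task.thread.is_alive():
            return True
        # Task is still running: capture and log where it is stuck.
        frame = sys._current_frames().get(task.thread.ident)
        stack = "".join(traceback.format_stack(frame)) if frame else "<no frame>"
        log(f"task {task.name} still running after kill request:\n{stack}")
        if time.monotonic() >= deadline:
            # Escalate; in an executor this hook would be the place to
            # terminate the whole process.
            if on_timeout is not None:
                on_timeout(task)
            return False


def cooperative(killed):
    # Well-behaved body: checks its killed flag and exits promptly.
    while not killed.is_set():
        time.sleep(0.01)


def stubborn(killed):
    # Zombie body: never looks at the flag, so it cannot be cancelled.
    while True:
        time.sleep(0.01)


if __name__ == "__main__":
    for body in (cooperative, stubborn):
        t = Task(body.__name__, body)
        t.start()
        exited = reap(t, log=lambda msg: None)
        print(body.__name__, "exited" if exited else "timed out")
```

In this sketch the escalation step is left to the caller through `on_timeout`, mirroring the "optionally" in the proposal: a deployment that prefers to keep the process alive simply passes no hook and relies on the logged stack traces.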