[ 
https://issues.apache.org/jira/browse/SPARK-18761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai resolved SPARK-18761.
------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

Issue resolved by pull request 16189
[https://github.com/apache/spark/pull/16189]

> Uncancellable / unkillable tasks may starve jobs of resources
> -------------------------------------------------------------
>
>                 Key: SPARK-18761
>                 URL: https://issues.apache.org/jira/browse/SPARK-18761
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 2.2.0
>
>
> Spark's current task cancellation / task killing mechanism is "best effort" 
> in the sense that some tasks may not be interruptible and may not respond to 
> their "killed" flags being set. If a significant fraction of a cluster's task 
> slots are occupied by tasks that have been marked as killed but remain 
> running, then new jobs and tasks can be starved of resources because these 
> zombie tasks are still holding them.
>
> I propose to address this problem by introducing a "task reaper" mechanism in 
> executors that monitors tasks after they are marked for killing in order to 
> periodically re-attempt the task kill, capture and log stacktraces / warnings 
> if tasks do not exit in a timely manner, and, optionally, kill the entire 
> executor JVM if cancelled tasks cannot be killed within some timeout.
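
The reaper loop described above can be sketched roughly as follows. This is an illustration of the general idea only, not the code from the pull request: it uses Python threads rather than executor JVM threads, and every name in it (reap_task, kill_flag, polling_interval, kill_timeout, on_timeout) is hypothetical. The on_timeout callback stands in for the optional "kill the entire executor JVM" step.

```python
import sys
import threading
import time
import traceback


def reap_task(task_thread, kill_flag, polling_interval=0.05, kill_timeout=0.5,
              on_timeout=None, log=print):
    """Monitor a task that has already been marked for killing.

    Returns True if the task exits before kill_timeout seconds elapse,
    otherwise calls on_timeout(task_thread) (if given) and returns False.
    All names here are hypothetical; this is not Spark's API.
    """
    deadline = time.monotonic() + kill_timeout
    while True:
        # Periodically re-attempt the kill by setting the flag again.
        kill_flag.set()
        # Wait up to one polling interval for the task to exit.
        task_thread.join(polling_interval)
        if not task_thread.is_alive():
            return True
        # The task is still running: capture and log where it is stuck.
        frame = sys._current_frames().get(task_thread.ident)
        if frame is not None:
            log("task %s still running:\n%s"
                % (task_thread.name, "".join(traceback.format_stack(frame))))
        if time.monotonic() >= deadline:
            log("task %s not killed within %ss" % (task_thread.name, kill_timeout))
            if on_timeout is not None:
                # Escalation hook (stand-in for killing the whole process).
                on_timeout(task_thread)
            return False


def cooperative_task(kill_flag):
    """A task that honours its kill flag."""
    while not kill_flag.is_set():
        time.sleep(0.01)


def stubborn_task(release):
    """A task that ignores any kill flag and runs until released."""
    release.wait()
```

In this sketch a cooperative task makes reap_task return True shortly after the flag is set, while a task that never checks the flag is logged at each polling interval and then handed to on_timeout once the timeout expires. Whether escalation is enabled at all is left to the caller, mirroring the "optionally" in the proposal.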



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
