[ https://issues.apache.org/jira/browse/SPARK-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Or resolved SPARK-9552.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0
 Target Version/s: 1.6.0

> Dynamic allocation kills busy executors on race condition
> ---------------------------------------------------------
>
>                 Key: SPARK-9552
>                 URL: https://issues.apache.org/jira/browse/SPARK-9552
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.4.0, 1.4.1
>            Reporter: Jie Huang
>            Assignee: Jie Huang
>             Fix For: 1.6.0
>
>
> With dynamic allocation enabled, busy executors are sometimes killed by
> mistake. An executor that already has tasks assigned can be killed because
> it appears to have been idle for long enough (say, 60 seconds). The root
> cause is that the task-launch listener event is delivered asynchronously.
> For example, an executor is being assigned tasks, but the listener
> notification has not been sent yet. Meanwhile, the executor's idle timeout
> in dynamic allocation (e.g., 60 seconds) expires and triggers killExecutor.
> In other words, the timer expires before the listener event arrives.
> The task then runs on an executor that is killed or being killed, and it
> eventually fails.
> Proposed fix: add a force flag to killExecutor. If force is not set (i.e.,
> false), killExecutor checks whether the target executor is idle or busy. If
> the executor has tasks assigned, it is not killed, and killExecutor returns
> false to indicate that the kill failed. Dynamic allocation should turn off
> force killing (i.e., force = false), so that an attempt to kill a busy
> executor fails. In that case the executor's idle timer is not invalidated.
> When the task-assignment event arrives later, the idle timer is removed
> accordingly. This avoids killing busy executors by mistake under dynamic
> allocation.
> For the remaining use cases, end users can decide for themselves whether to
> use force killing.
> If that option is turned on, killExecutor acts without any status check.
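
The proposal in the description can be illustrated with a small model of the race. This is only a sketch under stated assumptions, not Spark's implementation: Backend, AllocationManager, and every method name below are hypothetical stand-ins, and Python is used for brevity rather than Spark's Scala. It shows an idle timer expiring before the task-start event arrives, and a non-forced kill being refused because the executor already has a task assigned.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    executor_id: str
    running_tasks: int = 0  # tasks already assigned by the scheduler
    alive: bool = True


class Backend:
    """Hypothetical stand-in for the component that owns executors."""

    def __init__(self):
        self.executors = {}

    def add_executor(self, executor_id):
        self.executors[executor_id] = Executor(executor_id)

    def assign_task(self, executor_id):
        # The assignment is recorded here immediately; the listener event
        # that informs the allocation manager may be delivered later.
        self.executors[executor_id].running_tasks += 1

    def kill_executor(self, executor_id, force=False):
        """Return True if the executor was killed, False if the kill failed."""
        executor = self.executors.get(executor_id)
        if executor is None or not executor.alive:
            return False
        if not force and executor.running_tasks > 0:
            return False  # busy executor: refuse the kill
        executor.alive = False
        return True


class AllocationManager:
    """Hypothetical idle tracker driven by asynchronous listener events."""

    def __init__(self, backend, idle_timeout=60.0):
        self.backend = backend
        self.idle_timeout = idle_timeout
        self.idle_since = {}  # executor_id -> time it was last seen idle

    def on_executor_idle(self, executor_id, now):
        self.idle_since[executor_id] = now

    def on_task_start(self, executor_id):
        # Late listener event: the executor is busy, so drop its idle timer.
        self.idle_since.pop(executor_id, None)

    def expire_idle(self, now):
        """Try to kill every executor whose idle timer has expired."""
        killed = []
        for executor_id, since in list(self.idle_since.items()):
            if now - since >= self.idle_timeout:
                # force=False: a busy executor survives and keeps its timer
                # until the task-start event removes it.
                if self.backend.kill_executor(executor_id, force=False):
                    del self.idle_since[executor_id]
                    killed.append(executor_id)
        return killed


# The race from the description: the timer fires before the event arrives.
backend = Backend()
backend.add_executor("exec-1")
manager = AllocationManager(backend, idle_timeout=60.0)
manager.on_executor_idle("exec-1", now=0.0)
backend.assign_task("exec-1")             # assigned, event not yet delivered
killed = manager.expire_idle(now=60.0)    # kill is refused, timer is kept
manager.on_task_start("exec-1")           # event arrives, timer is removed
```

In this model a caller that passes force=True skips the busy check entirely, which corresponds to the "without any status check" behaviour described for the force option.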