[ https://issues.apache.org/jira/browse/SPARK-17929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weizhong updated SPARK-17929:
-----------------------------
    Summary: Deadlock when AM restart and send RemoveExecutor on reset  (was: Deadlock when AM restart send RemoveExecutor)

> Deadlock when AM restart and send RemoveExecutor on reset
> ---------------------------------------------------------
>
>                 Key: SPARK-17929
>                 URL: https://issues.apache.org/jira/browse/SPARK-17929
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Weizhong
>            Priority: Minor
>
> We fixed SPARK-10582 by adding a reset() method to CoarseGrainedSchedulerBackend.scala:
> {code}
> protected def reset(): Unit = synchronized {
>   numPendingExecutors = 0
>   executorsPendingToRemove.clear()
>   // Remove all the lingering executors that should be removed but not yet. The reason might be
>   // because (1) disconnected event is not yet received; (2) executors die silently.
>   executorDataMap.toMap.foreach { case (eid, _) =>
>     driverEndpoint.askWithRetry[Boolean](
>       RemoveExecutor(eid, SlaveLost("Stale executor after cluster manager re-registered.")))
>   }
> }
> {code}
> However, removeExecutor also needs the lock "CoarseGrainedSchedulerBackend.this.synchronized". Because reset() holds that lock while blocking on askWithRetry, the RemoveExecutor handler can never acquire it: the RPC fails, reset() fails, and the two sides deadlock:
> {code}
> private def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit = {
>   logDebug(s"Asked to remove executor $executorId with reason $reason")
>   executorDataMap.get(executorId) match {
>     case Some(executorInfo) =>
>       // This must be synchronized because variables mutated
>       // in this block are read when requesting executors
>       val killed = CoarseGrainedSchedulerBackend.this.synchronized {
>         addressToExecutorId -= executorInfo.executorAddress
>         executorDataMap -= executorId
>         executorsPendingLossReason -= executorId
>         executorsPendingToRemove.remove(executorId).getOrElse(false)
>       }
>       ...
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
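[Editor's sketch] One way to break the cycle described above is to take a snapshot of the executor IDs while holding the lock and send the RemoveExecutor messages only after releasing it, so the handler (which needs the same lock) can proceed. The sketch below uses simplified stand-in types (SchedulerBackendSketch, handleRemoveExecutor, a String reason); the real CoarseGrainedSchedulerBackend carries far more state, and its deadlock arises because the RPC is serviced on a different thread than the one blocked in askWithRetry.

```scala
import scala.collection.mutable

// Illustrative stand-in for Spark's RemoveExecutor message.
case class RemoveExecutor(executorId: String, reason: String)

// Minimal stand-in for CoarseGrainedSchedulerBackend; names are hypothetical.
class SchedulerBackendSketch {
  private val executorDataMap = mutable.HashMap[String, AnyRef]()
  private var numPendingExecutors = 0
  private val executorsPendingToRemove = mutable.HashMap[String, Boolean]()
  val sent = mutable.Buffer[RemoveExecutor]() // records delivered messages

  def registerExecutor(id: String): Unit = synchronized {
    executorDataMap(id) = new Object
  }

  // Like removeExecutor in Spark, the handler must take the backend lock.
  def handleRemoveExecutor(msg: RemoveExecutor): Boolean = synchronized {
    executorDataMap.remove(msg.executorId).isDefined.tap { _ => sent += msg }
  }

  // Deadlock-free reset: mutate and snapshot inside the lock, send outside it.
  def reset(): Unit = {
    val executors = synchronized {
      numPendingExecutors = 0
      executorsPendingToRemove.clear()
      executorDataMap.keys.toSet // snapshot of lingering executors
    }
    // The lock is released here, so the handler can acquire it freely.
    executors.foreach { eid =>
      handleRemoveExecutor(
        RemoveExecutor(eid, "Stale executor after cluster manager re-registered."))
    }
  }

  def liveExecutors: Set[String] = synchronized { executorDataMap.keys.toSet }
}

import scala.util.chaining._ // for .tap, Scala 2.13+
```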