Tal Sliwowicz created SPARK-4006:
------------------------------------

             Summary: Spark Driver crashes whenever an Executor is registered twice
                 Key: SPARK-4006
                 URL: https://issues.apache.org/jira/browse/SPARK-4006
             Project: Spark
          Issue Type: Bug
          Components: Block Manager, Spark Core
    Affects Versions: 1.1.0, 1.0.2, 0.9.2
         Environment: Mesos, Coarse Grained
            Reporter: Tal Sliwowicz
            Priority: Critical


We have long-running Spark drivers, and even with state-of-the-art hardware, executors disconnect from time to time. In many cases the RemoveExecutor message is never received, so when the new executor registers, the driver crashes. In Mesos coarse-grained mode, executor ids are fixed, so the new executor registers under the same id as the old one.

The issue is the System.exit(1) call in BlockManagerMasterActor:


private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: ActorRef) {
    if (!blockManagerInfo.contains(id)) {
      blockManagerIdByExecutor.get(id.executorId) match {
        case Some(manager) =>
          // A block manager of the same executor already exists.
          // This should never happen. Let's just quit.
          logError("Got two different block manager registrations on " + id.executorId)
          System.exit(1)
        case None =>
          blockManagerIdByExecutor(id.executorId) = id
      }

      logInfo("Registering block manager %s with %s RAM".format(
        id.hostPort, Utils.bytesToString(maxMemSize)))

      blockManagerInfo(id) =
        new BlockManagerInfo(id, System.currentTimeMillis(), maxMemSize, slaveActor)
    }
    listenerBus.post(SparkListenerBlockManagerAdded(id, maxMemSize))
  }
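
One possible direction, sketched below with hypothetical simplified stand-ins for Spark's internal types (BlockManagerId here is a plain case class, and the info map is reduced to a timestamp), is to treat the old registration as stale and evict it instead of killing the driver. This is only an illustration of the idea, not a patch against the actual BlockManagerMasterActor:

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for Spark's BlockManagerId.
case class BlockManagerId(executorId: String, hostPort: String)

object RegistrationSketch {
  private val blockManagerIdByExecutor = mutable.HashMap.empty[String, BlockManagerId]
  private val blockManagerInfo = mutable.HashMap.empty[BlockManagerId, Long]

  def register(id: BlockManagerId): Unit = {
    if (!blockManagerInfo.contains(id)) {
      blockManagerIdByExecutor.get(id.executorId) match {
        case Some(oldId) =>
          // The RemoveExecutor for the old executor was likely lost: assume
          // the old block manager is dead and evict its stale registration
          // rather than calling System.exit(1) and crashing the driver.
          blockManagerInfo.remove(oldId)
        case None => // first registration for this executor id
      }
      blockManagerIdByExecutor(id.executorId) = id
      blockManagerInfo(id) = System.currentTimeMillis()
    }
  }

  def registered(id: BlockManagerId): Boolean = blockManagerInfo.contains(id)
}
```

With this behavior, a re-registration from the same executor id replaces the stale entry and the driver keeps running.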



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
