Tal Sliwowicz created SPARK-4006:
------------------------------------

             Summary: Spark Driver crashes whenever an Executor is registered twice
                 Key: SPARK-4006
                 URL: https://issues.apache.org/jira/browse/SPARK-4006
             Project: Spark
          Issue Type: Bug
          Components: Block Manager, Spark Core
    Affects Versions: 1.1.0, 1.0.2, 0.9.2
         Environment: Mesos, Coarse Grained
            Reporter: Tal Sliwowicz
            Priority: Critical
We have long-running Spark drivers, and even though we have state-of-the-art hardware, executors disconnect from time to time. In many cases the RemoveExecutor message is never received, so when the executor registers again, the driver crashes. In Mesos coarse-grained mode, executor ids are fixed, which makes this collision likely. The issue is the System.exit(1) in BlockManagerMasterActor:

private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: ActorRef) {
  if (!blockManagerInfo.contains(id)) {
    blockManagerIdByExecutor.get(id.executorId) match {
      case Some(manager) =>
        // A block manager of the same executor already exists.
        // This should never happen. Let's just quit.
        logError("Got two different block manager registrations on " + id.executorId)
        System.exit(1)
      case None =>
        blockManagerIdByExecutor(id.executorId) = id
    }
    logInfo("Registering block manager %s with %s RAM".format(
      id.hostPort, Utils.bytesToString(maxMemSize)))
    blockManagerInfo(id) = new BlockManagerInfo(
      id, System.currentTimeMillis(), maxMemSize, slaveActor)
  }
  listenerBus.post(SparkListenerBlockManagerAdded(id, maxMemSize))
}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
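One possible direction (a minimal, self-contained sketch only, not the actual Spark patch; the object and method names below are illustrative, and the real BlockManagerMasterActor state is more involved): treat a second registration for the same executor id as a re-registration, evicting the stale entry instead of killing the driver with System.exit(1).

```scala
// Sketch: an executor-to-block-manager registry that survives duplicate
// registrations. All names here are hypothetical, not Spark internals.
object BlockManagerRegistrySketch {
  final case class BlockManagerId(executorId: String, hostPort: String)

  private val blockManagerIdByExecutor =
    scala.collection.mutable.Map.empty[String, BlockManagerId]
  // id -> maxMemSize stands in for the real BlockManagerInfo record
  private val blockManagerInfo =
    scala.collection.mutable.Map.empty[BlockManagerId, Long]

  def register(id: BlockManagerId, maxMemSize: Long): Unit = {
    blockManagerIdByExecutor.get(id.executorId) match {
      case Some(oldId) if oldId != id =>
        // An executor with this id already registered a block manager --
        // most likely it reconnected before RemoveExecutor was processed.
        // Evict the stale entry and fall through to register the new one,
        // rather than exiting the driver.
        blockManagerInfo.remove(oldId)
      case _ =>
        // First registration, or the identical id registering again.
    }
    blockManagerIdByExecutor(id.executorId) = id
    blockManagerInfo(id) = maxMemSize
  }

  // Exposed so the behavior is observable from outside the sketch.
  def registeredCount: Int = blockManagerInfo.size
}
```

With this shape, a fixed Mesos executor id that comes back on a new port replaces its old registration and the driver keeps running; whether the old block manager's blocks should also be dropped is a separate question the real fix would have to answer.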