[ https://issues.apache.org/jira/browse/SPARK-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Or updated SPARK-4006:
-----------------------------
    Fix Version/s: 1.2.0

> Spark Driver crashes whenever an Executor is registered twice
> -------------------------------------------------------------
>
>                 Key: SPARK-4006
>                 URL: https://issues.apache.org/jira/browse/SPARK-4006
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 0.9.2, 1.0.2, 1.1.0, 1.2.0
>         Environment: Mesos, Coarse Grained
>            Reporter: Tal Sliwowicz
>            Priority: Critical
>             Fix For: 1.2.0
>
> This is a huge robustness issue for us (Taboola) in mission-critical, time-sensitive (real-time) Spark jobs.
> We have long-running Spark drivers, and even though we have state-of-the-art hardware, executors disconnect from time to time. In many cases the RemoveExecutor message is never received, so when the new executor registers, the driver crashes. In Mesos coarse-grained mode, executor IDs are fixed.
> The issue is the System.exit(1) call in BlockManagerMasterActor:
> {code}
> private def register(id: BlockManagerId, maxMemSize: Long, slaveActor: ActorRef) {
>   if (!blockManagerInfo.contains(id)) {
>     blockManagerIdByExecutor.get(id.executorId) match {
>       case Some(manager) =>
>         // A block manager of the same executor already exists.
>         // This should never happen. Let's just quit.
>         logError("Got two different block manager registrations on " + id.executorId)
>         System.exit(1)
>       case None =>
>         blockManagerIdByExecutor(id.executorId) = id
>     }
>     logInfo("Registering block manager %s with %s RAM".format(
>       id.hostPort, Utils.bytesToString(maxMemSize)))
>     blockManagerInfo(id) =
>       new BlockManagerInfo(id, System.currentTimeMillis(), maxMemSize, slaveActor)
>   }
>   listenerBus.post(SparkListenerBlockManagerAdded(id, maxMemSize))
> }
> {code}
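For context, a minimal standalone sketch of the mitigation direction the report implies: when the same executor ID re-registers, treat the old registration as stale and replace it instead of exiting the driver. This is not the actual Spark 1.2.0 patch; BlockManagerId here is a simplified stand-in, the maps only model the actor's state, and the removeStaleExecutor helper is hypothetical.

{code}
// Standalone sketch (not Spark source). Models the registration table from the
// snippet above and replaces a stale entry rather than calling System.exit(1).
import scala.collection.mutable

case class BlockManagerId(executorId: String, hostPort: String)

object RegistrationSketch {
  private val blockManagerIdByExecutor = mutable.Map[String, BlockManagerId]()
  // id -> registration timestamp; stands in for BlockManagerInfo
  private val blockManagerInfo = mutable.Map[BlockManagerId, Long]()

  // Hypothetical helper: drop all state tied to the old registration.
  private def removeStaleExecutor(executorId: String): Unit = {
    blockManagerIdByExecutor.remove(executorId).foreach(blockManagerInfo.remove)
  }

  def register(id: BlockManagerId, maxMemSize: Long): Unit = {
    if (!blockManagerInfo.contains(id)) {
      blockManagerIdByExecutor.get(id.executorId).foreach { oldId =>
        // Same executor ID re-registered: assume the old executor died and its
        // RemoveExecutor message was lost; replace the stale entry, don't exit.
        println(s"Replacing stale block manager $oldId with $id")
        removeStaleExecutor(id.executorId)
      }
      blockManagerIdByExecutor(id.executorId) = id
      blockManagerInfo(id) = System.currentTimeMillis()
      println(s"Registered block manager ${id.hostPort} with $maxMemSize bytes")
    }
  }

  def main(args: Array[String]): Unit = {
    register(BlockManagerId("exec-1", "host-a:7337"), 1L << 30)
    // Executor restarts with the same (fixed) Mesos executor ID but a new port;
    // the driver now survives instead of crashing.
    register(BlockManagerId("exec-1", "host-a:7338"), 1L << 30)
  }
}
{code}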