Multiple Spark Context
I need to continuously run multiple calculations concurrently on a cluster. They do not share RDDs. Each calculation needs a different number of cores and a different amount of memory. Some of them are long-running calculations and others are short-running; all of them need to run on a regular basis and finish on time. The short ones cannot wait for the long ones to finish, and they cannot run too slowly either, since the short ones' run interval is also short. It looks like sharing inside a single SparkContext cannot guarantee that the short calculations get enough resources to finish in time when long ones are already running. Or am I wrong about that?

I tried to create a SparkContext for each of the calculations, but only the first one stays alive; the rest die with the error below. Is it possible to create multiple SparkContexts inside one application JVM?

ERROR 2014-11-14 14:59:46 akka.actor.OneForOneStrategy: spark.httpBroadcast.uri
java.util.NoSuchElementException: spark.httpBroadcast.uri
        at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:151)
        at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:151)
        at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
        at scala.collection.AbstractMap.getOrElse(Map.scala:58)
        at org.apache.spark.SparkConf.get(SparkConf.scala:151)
        at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:104)
        at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcast.scala:70)
        at org.apache.spark.broadcast.BroadcastManager.initialize(Broadcast.scala:81)
        at org.apache.spark.broadcast.BroadcastManager.init(Broadcast.scala:68)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:175)
        at org.apache.spark.executor.Executor.init(Executor.scala:110)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:56)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
INFO 2014-11-14 14:59:46 org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@172.32.1.12:51590/user/CoarseGrainedScheduler
ERROR 2014-11-14 14:59:46 org.apache.spark.executor.CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID: 1

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Mulitple-Spark-Context-tp18975.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
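For what it's worth, Spark does offer a way to keep short jobs from queuing behind long ones inside a single SparkContext: the fair scheduler, with jobs submitted from different threads assigned to different pools. A minimal sketch (the app name and pool name "short" are placeholders, and this assumes the Spark 1.x API from the era of this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: fair scheduling within one SparkContext, so short jobs are
// not stuck behind long-running ones. Names here are illustrative only.
val conf = new SparkConf()
  .setAppName("shared-context-example")
  .set("spark.scheduler.mode", "FAIR")   // default is FIFO
val sc = new SparkContext(conf)

// Jobs triggered from this thread go into the "short" pool; another
// thread could set a "long" pool for the long-running calculations.
sc.setLocalProperty("spark.scheduler.pool", "short")
// ... run the short calculation's actions here ...
sc.setLocalProperty("spark.scheduler.pool", null)  // revert to default pool
```

Whether fair pools are enough depends on the deadlines involved; they share the application's executors rather than reserving separate cores and memory per calculation, which is what the original question asks for.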
Re: Multiple Spark Context
It's not recommended to have multiple Spark contexts in one JVM, but you could launch a separate JVM per context. How resources get allocated is probably outside the scope of Spark, and more of a task for the cluster manager.

On Fri, Nov 14, 2014 at 12:58 PM, Charles charles...@cenx.com wrote:
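"A separate JVM per context" in practice means submitting each calculation as its own Spark application, each with its own resource settings. A CLI sketch along those lines (class names, jar path, and sizes are placeholders; `--total-executor-cores` applies to standalone and Mesos deployments):

```shell
# Launch each calculation as its own Spark application: one driver JVM
# and one SparkContext each, with per-application resources.
# All names and sizes below are illustrative.
spark-submit --class com.example.LongCalc \
  --executor-memory 8g --total-executor-cores 16 calculations.jar &
spark-submit --class com.example.ShortCalc \
  --executor-memory 2g --total-executor-cores 4 calculations.jar &
wait
```

Because the short job holds its own cores, it no longer competes with the long job inside one scheduler; the trade-off is that each application's resources sit reserved whether or not it is busy.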
RE: Multiple Spark Context
Does this also apply to StreamingContext? What issues would I have if I have thousands of StreamingContexts?

Thanks,
Tri

From: Daniil Osipov [mailto:daniil.osi...@shazam.com]
Sent: Friday, November 14, 2014 3:47 PM
To: Charles
Cc: u...@spark.incubator.apache.org
Subject: Re: Multiple Spark Context

It's not recommended to have multiple Spark contexts in one JVM, but you could launch a separate JVM per context. How resources get allocated is probably outside the scope of Spark, and more of a task for the cluster manager.

On Fri, Nov 14, 2014 at 12:58 PM, Charles charles...@cenx.com wrote:
RE: Multiple Spark Context
Thanks for your reply! Can you be more specific about the JVM? Is "JVM" referring to the driver application? If I want to create multiple SparkContexts, will I need to start a driver application instance for each SparkContext?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Mulitple-Spark-Context-tp18975p18985.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
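The usual pattern implied by "one JVM per context" is indeed one driver application per SparkContext: each calculation is packaged as its own main class and submitted separately. A minimal driver skeleton, with placeholder names, might look like:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: one driver application = one JVM = one SparkContext.
// Each calculation gets its own object like this, submitted as a
// separate Spark application. All names are placeholders.
object ShortCalcDriver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("short-calc")
    val sc = new SparkContext(conf)
    try {
      // ... the short calculation's RDD logic goes here ...
    } finally {
      sc.stop()  // release the application's executors when done
    }
  }
}
```

Stopping the context in a `finally` block matters here: if a driver exits without releasing its executors, those cores stay unavailable to the other periodically scheduled applications.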