Multiple Spark Context

2014-11-14 Thread Charles
I need to continuously run multiple calculations concurrently on a cluster. They
do not share RDDs.
Each calculation needs a different number of cores and amount of memory. Also,
some of them are long-running calculations and others are short-running ones.
They all need to run on a regular basis and finish on time. The short ones
cannot wait for the long ones to finish, and they cannot run too slowly either,
since the short ones' run interval is itself short.

It looks like sharing inside one SparkContext cannot guarantee that the
short ones will get enough resources to finish in time if long ones are already
running. Or am I wrong about that?
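
This is roughly the fair-scheduler setup I am considering inside one context
(just a sketch; the pool names and file path are made up):

    // Fair scheduling inside a single SparkContext (Spark 1.x APIs).
    // The pools themselves are defined in the XML file referenced by
    // spark.scheduler.allocation.file.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("shared-context")
      .set("spark.scheduler.mode", "FAIR")
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
    val sc = new SparkContext(conf)

    // Jobs submitted from this thread go to the hypothetical "short" pool...
    sc.setLocalProperty("spark.scheduler.pool", "short")
    sc.parallelize(1 to 100).count()

    // ...while another thread can submit long jobs under a different pool.
    sc.setLocalProperty("spark.scheduler.pool", "long")

My understanding, though, is that fair pools only share task slots within the
one application; they do not reserve memory per calculation, which is why I am
unsure this is enough.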

I tried to create a SparkContext for each of the calculations, but only the
first one stays alive; the rest die with the error below. Is it possible to
create multiple SparkContexts inside one application JVM?

ERROR 2014-11-14 14:59:46 akka.actor.OneForOneStrategy: spark.httpBroadcast.uri
java.util.NoSuchElementException: spark.httpBroadcast.uri
    at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:151)
    at org.apache.spark.SparkConf$$anonfun$get$1.apply(SparkConf.scala:151)
    at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
    at scala.collection.AbstractMap.getOrElse(Map.scala:58)
    at org.apache.spark.SparkConf.get(SparkConf.scala:151)
    at org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:104)
    at org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcast.scala:70)
    at org.apache.spark.broadcast.BroadcastManager.initialize(Broadcast.scala:81)
    at org.apache.spark.broadcast.BroadcastManager.<init>(Broadcast.scala:68)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:175)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:110)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:56)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
INFO 2014-11-14 14:59:46 org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@172.32.1.12:51590/user/CoarseGrainedScheduler
ERROR 2014-11-14 14:59:46 org.apache.spark.executor.CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID: 1



Re: Multiple Spark Context

2014-11-14 Thread Daniil Osipov
It's not recommended to have multiple Spark contexts in one JVM, but you
could launch a separate JVM per context. How resources get allocated is
probably outside the scope of Spark, and more of a task for the cluster
manager.
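
For example (a rough sketch only; the class names, jar path and resource
values are made up), each calculation can be submitted as its own application,
each with its own resource request:

    // One JVM (spark-submit) per calculation, so each gets its own
    // SparkContext and its own cores/memory request.
    import scala.sys.process._

    def submit(mainClass: String, cores: Int, memory: String): Process =
      Seq("spark-submit",
          "--class", mainClass,
          "--total-executor-cores", cores.toString,
          "--executor-memory", memory,
          "/path/to/calculations.jar").run() // non-blocking process handle

    // Long-running and short-running calculations as separate applications:
    val longCalc  = submit("com.example.LongCalc",  16, "8g")
    val shortCalc = submit("com.example.ShortCalc",  2, "1g")

    println(s"short calc exited with ${shortCalc.exitValue()}") // blocks

The cluster manager (standalone master, YARN, Mesos) then decides how the two
applications share the cluster.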


RE: Multiple Spark Context

2014-11-14 Thread Bui, Tri
Does this also apply to StreamingContext?

What issues would I have if I had thousands of StreamingContexts?

Thanks
Tri


RE: Multiple Spark Context

2014-11-14 Thread Charles
Thanks for your reply! Can you be more specific about the JVM? Is the JVM you
mention the driver application?
If I want to create multiple SparkContexts, do I need to start a driver
application instance for each SparkContext?


