[jira] [Comment Edited] (SPARK-2243) Support multiple SparkContexts in the same JVM
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683600#comment-16683600 ]

sam edited comment on SPARK-2243 at 11/12/18 11:24 AM:
---

A big benefit of being able to create and shut down SparkContexts is being able to grab and free up resources while the application is running, in a stable and predictable way. E.g.
{code:java}
1. Create SparkContext A with 10 executors with 20 cores each and 400 GB of RAM
2. Run job A
3. Kill context A
4. Create SparkContext B with 4 executors with 10 cores each and 40 GB of RAM
5. Run job B
6. Kill context B
{code}
Suppose steps 1-3 take 2 hours and steps 4-6 take 1 hour. Then, for that one hour, we have freed up hundreds of cores and hundreds of GBs of RAM. Currently the only way to optimise for this kind of thing is to use multiple spark-submit invocations, which means breaking out of Scala / the process.

was (Author: sams):
Big bonus of being able to create and shutdown SparkContexts is to be able to grab/free up resources while the job is running in a stable and predictable way. E.g.
{code:java}
1. Create SparkContext A with 10 executors with 20 cores each and 400GB of RAM
2. Run job
3. Kill context A
4. Create SparkContext A with 4 executors with 10 cores each and 40 GB of RAM
{code}
Suppose step 1 takes 2 hours and step 4 takes 1 hour. We have freed up 100s of cores and 100s of GBs of RAM for one hour. Currently the only way to optimise for this kind of thing is to have multiple spark submits, which means breaking out of Scala / the process.

> Support multiple SparkContexts in the same JVM
> --
>
>                 Key: SPARK-2243
>                 URL: https://issues.apache.org/jira/browse/SPARK-2243
>             Project: Spark
>          Issue Type: New Feature
>          Components: Block Manager, Spark Core
>    Affects Versions: 0.7.0, 1.0.0, 1.1.0
>            Reporter: Miguel Angel Fernandez Diaz
>            Priority: Major
>
> We're developing a platform where we create several Spark contexts for
> carrying out different calculations.
> Is there any restriction when using several Spark contexts? We have two
> contexts, one for Spark calculations and another one for Spark Streaming jobs.
> The following error occurs when we first run a Spark calculation and then,
> once it has finished, launch a Spark Streaming job:
> {code}
> 14/06/23 16:40:08 ERROR executor.Executor: Exception in task ID 0
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
>     at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
>     at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>     at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
>     at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
>     at
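sam's comment at the top of this thread argues that stopping a large context and starting a smaller one frees resources. The Java sketch below puts rough numbers on that argument. It assumes that the 400 GB and 40 GB figures are per-context totals and that the work on context A takes 2 hours while the work on context B takes 1 hour. It also assumes that, without multiple contexts, a single long-lived context would have to stay sized for job A. It is plain arithmetic rather than Spark code, and ContextResourceMath and its methods are hypothetical names.

```java
// Back-of-the-envelope model of the two-context example; all figures are assumptions from the comment.
public class ContextResourceMath {

    /** Resources a context reserves for as long as it is alive (hypothetical helper type). */
    static final class ContextSize {
        final int executors;
        final int coresPerExecutor;
        final int totalMemoryGb; // assumption: "400GB" / "40 GB" are per-context totals

        ContextSize(int executors, int coresPerExecutor, int totalMemoryGb) {
            this.executors = executors;
            this.coresPerExecutor = coresPerExecutor;
            this.totalMemoryGb = totalMemoryGb;
        }

        int totalCores() {
            return executors * coresPerExecutor;
        }
    }

    /** Core-hours reserved when one context sized for job A is kept alive for both jobs. */
    static int coreHoursSingleContext(ContextSize a, int hoursA, int hoursB) {
        return a.totalCores() * (hoursA + hoursB);
    }

    /** Core-hours reserved when context A is stopped and a smaller context B runs job B. */
    static int coreHoursSequential(ContextSize a, int hoursA, ContextSize b, int hoursB) {
        return a.totalCores() * hoursA + b.totalCores() * hoursB;
    }

    /** Memory reserved (GB-hours) when one context sized for job A is kept for both jobs. */
    static int gbHoursSingleContext(ContextSize a, int hoursA, int hoursB) {
        return a.totalMemoryGb * (hoursA + hoursB);
    }

    /** Memory reserved (GB-hours) with two sequential contexts. */
    static int gbHoursSequential(ContextSize a, int hoursA, ContextSize b, int hoursB) {
        return a.totalMemoryGb * hoursA + b.totalMemoryGb * hoursB;
    }

    public static void main(String[] args) {
        ContextSize a = new ContextSize(10, 20, 400); // context A: 10 executors x 20 cores
        ContextSize b = new ContextSize(4, 10, 40);   // context B: 4 executors x 10 cores
        int hoursA = 2;
        int hoursB = 1;

        System.out.println("Cores freed while job B runs: " + (a.totalCores() - b.totalCores()));
        System.out.println("GB freed while job B runs: " + (a.totalMemoryGb - b.totalMemoryGb));
        System.out.println("Core-hours, one context: " + coreHoursSingleContext(a, hoursA, hoursB));
        System.out.println("Core-hours, two contexts: " + coreHoursSequential(a, hoursA, b, hoursB));
        System.out.println("GB-hours, one context: " + gbHoursSingleContext(a, hoursA, hoursB));
        System.out.println("GB-hours, two contexts: " + gbHoursSequential(a, hoursA, b, hoursB));
    }
}
```

Under these assumptions, the two-context plan reserves 440 core-hours instead of 600 and 840 GB-hours instead of 1200. While job B runs, 160 cores and 360 GB are left free for other applications.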
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103268#comment-15103268 ]

Richard Marscher edited comment on SPARK-2243 at 1/16/16 5:25 PM:
--

I fail to see how dynamic allocation would help; can you clarify? We are already constantly using 100% of the cluster resources and have a fixed number of JVM driver hosts. If a given context has 32 cores available across its executors and is constantly processing jobs whose stages have 32 or more tasks, it will always be busy, so I don't see why dynamic allocation would ever scale it down. Meanwhile, since we have to share this context, its jobs/stages/tasks get mixed in with those of the separate DAG I mentioned.

Another use case: after months of observation in production, there seems to be overhead and cost to sharing a SparkContext between jobs, compared with running the same number of jobs fanned out across different contexts started in separate JVMs. And yes, this includes trying out different scheduler and pool settings (FAIR vs FIFO). If this weren't the case, we could just run one big SparkContext in one JVM and share it for all our jobs. Since that isn't an option, we need X separate JVMs solely because each one can only have a single SparkContext.

Anyway, I don't mind this issue being closed as Won't Fix, but it feels like the entire comment chain is dancing around the underlying reason. The use cases are valid; it just seems the conclusion is that they aren't critical enough to justify the changes to the Spark code needed to support them. That's fine, but can we just admit that?

was (Author: rmarscher):
I fail to see how dynamic allocation would help, can you clarify? We already are constantly using 100% of the cluster resources and have a fixed # of JVM driver hosts. If a given context has 32 cores available across executors and is constantly processing jobs with stages with 32+ tasks, it will always be busy so I don't see why it would scale down with dynamic allocation.
Meanwhile, since we have to share this context it will mix in jobs/stages/tasks for the separate DAG I mentioned. Another use case. After observation of months in production, there seems to be overhead and cost to sharing a SparkContext between jobs as opposed to running the same number of jobs fanned out across different contexts started on separate JVMs. And yes this includes trying out different scheduler and pool settins (fair vs fifo). If this weren't the case, we could just run 1 big spark context on 1 JVM and share it for all our jobs. Since it's not the case we need to have X many separate JVMs solely because each one can only have a single SparkContext. Anyway, I don't mind this issue being closed as Won't Fix, but if feels like the entire comment chain is dancing around the underlying reason. Use cases are valid it just seems like the conclusion is they aren't critical enough in comparison to the changes to the Spark code to support them. That's fine, but can we just admit that?
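Richard's argument about dynamic allocation can be illustrated with a toy model. Spark's dynamic allocation releases an executor only after it has been idle for spark.dynamicAllocation.executorIdleTimeout. The Java sketch below is not Spark's implementation but a simplified, tick-based stand-in; the class name, the tick granularity and the in-order packing of tasks onto executors are all assumptions. It shows that when the task backlog always covers every core, no executor is ever idle, so none is released.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy, tick-based model of idle-timeout executor release; NOT Spark's actual allocation logic.
public class IdleExecutorModel {

    /**
     * Each tick, pending tasks are packed onto executors in order (coresPerExecutor slots each).
     * An executor that runs no task for idleTimeoutTicks consecutive ticks is released.
     * Returns how many executors are still allocated after the last tick.
     */
    static int executorsLeft(int executors, int coresPerExecutor,
                             int[] pendingTasksPerTick, int idleTimeoutTicks) {
        List<Integer> idleTicks = new ArrayList<>(); // consecutive idle ticks per executor
        for (int i = 0; i < executors; i++) {
            idleTicks.add(0);
        }
        for (int pending : pendingTasksPerTick) {
            int remaining = pending;
            for (int i = 0; i < idleTicks.size(); i++) {
                if (remaining > 0) {
                    remaining -= Math.min(remaining, coresPerExecutor); // fill this executor's slots
                    idleTicks.set(i, 0);                                  // it was busy this tick
                } else {
                    idleTicks.set(i, idleTicks.get(i) + 1);               // no work left for it
                }
            }
            idleTicks.removeIf(t -> t >= idleTimeoutTicks);               // release long-idle executors
        }
        return idleTicks.size();
    }

    public static void main(String[] args) {
        int[] saturated = new int[10];
        Arrays.fill(saturated, 40); // 40 pending tasks per tick vs 32 cores
        int[] light = new int[10];
        Arrays.fill(light, 8);      // only enough work for one executor

        // 4 executors x 8 cores = 32 cores, released after 3 consecutive idle ticks
        System.out.println("Saturated backlog: " + executorsLeft(4, 8, saturated, 3));
        System.out.println("Light backlog: " + executorsLeft(4, 8, light, 3));
    }
}
```

With 32 cores and 40 pending tasks every tick, all four executors survive. With only 8 pending tasks per tick, three of them are released once the timeout passes. The model ignores scale-up entirely; it only illustrates why a permanently saturated context never shrinks.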
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392848#comment-14392848 ]

sam edited comment on SPARK-2243 at 4/2/15 3:37 PM:

Yup, a singleton would make sense: its creation is side-effecting, so one might as well have a method initSparkConf(conf: SparkConf) for initialization and setting the conf, rather than creating a new SparkContext. There is no point in using a pure FP pattern when the domain isn't pure.

was (Author: sams):
Yup, a singleton would make sense, it's creation is side effecting, so one might as well have a method setSparkContextConf(conf: SparkConf) for setting the conf rather than creating a new SparkContext. There is no point in using an pure FP pattern when the domain isn't pure.
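sam's suggestion can be sketched as follows. The idea is to configure one process-wide context through a method such as initSparkConf instead of constructing new SparkContext instances. initSparkConf is the name proposed in the comment, not a Spark API. FakeContext and the Map-based configuration are hypothetical stand-ins so that the sketch runs without Spark. Later Spark versions do offer SparkContext.getOrCreate(conf), which follows a similar get-or-create idea.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed singleton; FakeContext stands in for SparkContext.
public final class ContextSingleton {

    /** Stand-in for a SparkContext: it only remembers the configuration it was created with. */
    static final class FakeContext {
        final Map<String, String> conf;

        FakeContext(Map<String, String> conf) {
            this.conf = Collections.unmodifiableMap(new HashMap<>(conf));
        }
    }

    private static Map<String, String> pendingConf = new HashMap<>();
    private static FakeContext instance;

    private ContextSingleton() {
    }

    /** Hypothetical initSparkConf: sets the configuration used when the context is first created. */
    static synchronized void initSparkConf(Map<String, String> conf) {
        if (instance != null) {
            // Creation is the side-effecting step, so the configuration is frozen afterwards.
            throw new IllegalStateException("context already created; configuration is fixed");
        }
        pendingConf = new HashMap<>(conf);
    }

    /** Returns the single process-wide context, creating it on first use. */
    static synchronized FakeContext get() {
        if (instance == null) {
            instance = new FakeContext(pendingConf);
        }
        return instance;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.app.name", "demo");
        initSparkConf(conf);
        System.out.println(get() == get());                   // true: same instance every time
        System.out.println(get().conf.get("spark.app.name")); // demo
    }
}
```

Throwing on late reconfiguration is only one possible design choice. An alternative is to stop and recreate the context, which is closer to the resize-between-jobs use case described elsewhere in this thread.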
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392519#comment-14392519 ]

Sudharma Puranik edited comment on SPARK-2243 at 4/2/15 10:36 AM:
--

[~sowen]: My reply was to Jason, where he mentioned the workaround. And yes, I mean running logically distinct apps with logically distinct configurations, only in terms of cores. Yes, you are right about running a separate process. Well, again, for Spark per se, a process is nothing but another {{SparkContext}}, which again has its own boundaries on cores and memory. :) I guess we are in agreement about the implications of multiple {{SparkContext}}s in the same JVM vs. multiple {{SparkContext}}s spread across JVMs.

was (Author: sudharma.pura...@gmail.com):
[~sowen] : My reply was for Jason where he mentioned about workaround. And yes I mean running logically distinct apps with logically distinct configurations only in terms of cores. Yes you are right in regard to running seperate process. Well , again, Spark per se, process is nothing but another SparkContext, which again has boundaries with cores and memory. :) I guess we are in unison in understanding the implications of multiple SparkContexts on same JVM vs sharing multiple sparkContexts across JVMs.
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391770#comment-14391770 ]

Jason Hubbard edited comment on SPARK-2243 at 4/1/15 11:40 PM:
---

Apologizing for being flippant is a bit of an oxymoron, isn't it? The answer you propose is the only one available, but it isn't a real solution; it's a workaround. Obviously, running in separate JVMs causes other issues: the overhead of starting multiple JVMs and the complexity of having to serialize data so they can communicate. Having multiple workloads in the same SparkContext is what I have chosen, but sometimes you want different settings for different workloads, which this approach does not allow.

was (Author: jahubba):
Apologizing for being flippant is a bit of an oxymoron isn't it? The answer you purpose is the only one available, but it isn't a real solution, it's a workaround. Obviously running in separate JVMs causes other issues with overhead of starting multiple JVMs and the complexity of having to serialize data so they can communicate. Having multiple workloads in the same SparkContext is what I have chosen, but sometimes you would like different settings for the different workloads which this would now not allow.
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391770#comment-14391770 ]

Jason Hubbard edited comment on SPARK-2243 at 4/1/15 11:41 PM:
---

Apologizing for being flippant is a bit of an oxymoron, isn't it? The answer you propose is the only one available, but it isn't a real solution; it's a workaround. Obviously, running in separate JVMs causes other issues: the overhead of starting multiple JVMs and the complexity of having to serialize data so they can communicate. Having multiple workloads in the same SparkContext is what I have chosen, but sometimes you want different settings for different workloads, which this approach does not allow.

was (Author: jahubba):
Apologizing for being flippant is a bit of an oxymoron isn't it? The answer you proprose is the only one available, but it isn't a real solution, it's a workaround. Obviously running in separate JVMs causes other issues with overhead of starting multiple JVMs and the complexity of having to serialize data so they can communicate. Having multiple workloads in the same SparkContext is what I have chosen, but sometimes you would like different settings for the different workloads which this would now not allow.
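Jason's comment notes that a single shared context applies one set of settings to every workload. Spark does let a thread attach a few job-level properties to a shared context via SparkContext.setLocalProperty, for example the spark.scheduler.pool used by the FAIR scheduler. At the time of this thread, however, resource settings such as executor cores and memory were context-wide. The Java sketch below is a hypothetical illustration of that layering: context-wide defaults plus per-thread overrides. LayeredSettings and its methods are made-up names, not Spark classes, and unlike Spark this toy lets any key be overridden.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: context-wide defaults plus per-thread (per-workload) overrides.
// In Spark itself only job-level properties (e.g. the scheduler pool) can be set per thread.
public class LayeredSettings {
    private final Map<String, String> contextDefaults;
    // Each workload thread sees only its own overrides.
    private final ThreadLocal<Map<String, String>> workloadOverrides =
            ThreadLocal.withInitial(HashMap::new);

    public LayeredSettings(Map<String, String> contextDefaults) {
        this.contextDefaults = new HashMap<>(contextDefaults);
    }

    /** Overrides a setting for the calling thread only. */
    public void setForThisWorkload(String key, String value) {
        workloadOverrides.get().put(key, value);
    }

    /** Returns the calling thread's override if present, otherwise the context-wide default. */
    public String get(String key) {
        String override = workloadOverrides.get().get(key);
        return override != null ? override : contextDefaults.get(key);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("spark.scheduler.pool", "default");
        defaults.put("spark.executor.cores", "4");
        LayeredSettings settings = new LayeredSettings(defaults);

        Thread etl = new Thread(() -> {
            settings.setForThisWorkload("spark.scheduler.pool", "etl");
            System.out.println("etl thread pool: " + settings.get("spark.scheduler.pool"));
        });
        etl.start();
        etl.join();

        System.out.println("main thread pool: " + settings.get("spark.scheduler.pool"));
        System.out.println("executor cores (shared): " + settings.get("spark.executor.cores"));
    }
}
```

Running it prints "etl" for the worker thread and "default" for the main thread, while the executor-cores value stays shared. The per-thread pool is the kind of setting a shared context can vary per workload. The shared executor settings are what Jason would like to vary per workload and cannot.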