How many receivers do you have in the streaming program? Your Spark
application needs to have more cores reserved for it than the number of
receivers; otherwise the receivers occupy all the cores and the received
data is never processed. That would explain why you only see the output
after stopping.

TD

On Tue, Jun 30, 2015 at 7:59 AM, Borja Garrido Bear <kazebo...@gmail.com>
wrote:

> Hi all,
>
> I'm running a Spark standalone cluster with one master and one slave
> (different machines, both on version 1.4.0). The thing is, I have a Spark
> Streaming job that gets data from Kafka and then just prints it.
>
> To configure the cluster I just started the master and then the slaves
> pointing to it; since everything appeared in the web interface I assumed
> everything was fine, but maybe I missed some configuration.
>
> When I run it locally there is no problem, it works.
> When I run it in the cluster the worker state appears as "loading"
>  - If the job is a Scala one, when I stop it I receive all the output
>  - If the job is Python, when I stop it I receive a bunch of exceptions
> like this one:
>
>
> \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
> ERROR JobScheduler: Error running job streaming job 1435675420000 ms.0
> py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: null
>         at py4j.Protocol.getReturnValue(Protocol.java:417)
>         at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:113)
>         at com.sun.proxy.$Proxy14.call(Unknown Source)
>         at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:63)
>         at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
>         at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:193)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
>         at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:192)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
>
> \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
> Is there any known issue with Spark Streaming and standalone mode, or
> with Python?
>
