Spark Streaming needs at least two threads on the worker/slave side. I have 
seen this issue when, to test the behavior, I set the thread count for Spark 
Streaming to 1. It should be at least 2: one for the receiver adapter (Kafka, 
Flume, etc.) and one for processing the data.

But I tested that in local mode: "--master local[2]". The same issue could 
happen on a worker as well. If you set "--master local[1]", the streaming 
worker/slave blocks due to starvation.
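The starvation described above can be sketched without Spark at all: give a thread pool one long-running "receiver" task plus a "processor" task, and with a single thread the processor never runs. This is a minimal plain-Python analogy, not Spark's actual scheduler; the task names and timings are invented for illustration.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run(pool_size, runtime=0.5):
    """Simulate a streaming app: a receiver that occupies its thread for the
    whole run (like a Spark receiver) and a processor draining a queue.
    Returns the number of items processed."""
    q = queue.Queue()
    stop = threading.Event()
    processed = []

    def receiver():
        # Holds its thread until told to stop, feeding the queue.
        i = 0
        while not stop.is_set():
            q.put(i)
            i += 1
            time.sleep(0.01)

    def processor():
        # Drains the queue; with pool_size=1 this task never gets a thread.
        while not stop.is_set():
            try:
                processed.append(q.get(timeout=0.05))
            except queue.Empty:
                pass

    with ThreadPoolExecutor(max_workers=pool_size) as ex:
        ex.submit(receiver)
        ex.submit(processor)
        time.sleep(runtime)
        stop.set()
    return len(processed)


print(run(2) > 0)   # two threads: processing happens
print(run(1) == 0)  # one thread: the receiver starves the processor
```

With `pool_size=1` the receiver occupies the only worker thread until shutdown, so the processor is queued behind it and runs only after the stop flag is already set, mirroring why `local[1]` blocks a receiver-based streaming job.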

Which conf parameter sets the worker thread count in cluster mode? Is it 
spark.akka.threads?

From: Tathagata Das <t...@databricks.com>
Sent: 01 July 2015 01:32
To: Borja Garrido Bear
Cc: user
Subject: Re: Spark streaming on standalone cluster

How many receivers do you have in the streaming program? Your Spark 
application has to have more cores reserved than the number of receivers. 
That would explain receiving the output only after stopping.
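In standalone mode the cluster-side analogue of `local[N]` is the number of cores the application is allowed to take, which can be capped with `spark.cores.max`. A hedged config sketch for the Spark 1.4-era PySpark API; the app name and master URL here are placeholders, not values from this thread:

```python
# Config sketch (placeholder master URL). spark.cores.max caps the total
# cores a standalone-mode app may take; with one Kafka receiver, keep it >= 2
# so at least one core remains free for processing.
from pyspark import SparkConf

conf = (SparkConf()
        .setAppName("kafka-print-job")          # hypothetical app name
        .setMaster("spark://master-host:7077")  # placeholder master URL
        .set("spark.cores.max", "2"))           # > number of receivers (1)
```

If `spark.cores.max` is left unset and the cluster grants fewer cores than there are receivers, processing tasks never get scheduled, which matches the "output only arrives after stopping" symptom above.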

TD

On Tue, Jun 30, 2015 at 7:59 AM, Borja Garrido Bear 
<kazebo...@gmail.com> wrote:
Hi all,

I'm running a Spark standalone cluster with one master and one slave (different 
machines, both on version 1.4.0). The thing is, I have a Spark Streaming job 
that gets data from Kafka and then just prints it.

To configure the cluster I just started the master and then the slaves pointing 
to it; as everything appears in the web interface I assumed everything was 
fine, but maybe I missed some configuration.

When I run it locally there is no problem; it works.
When I run it in the cluster, the worker state appears as "loading":
 - If the job is a Scala one, when I stop it I receive all the output
 - If the job is Python, when I stop it I receive a bunch of these exceptions

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

ERROR JobScheduler: Error running job streaming job 1435675420000 ms.0
py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: null
at py4j.Protocol.getReturnValue(Protocol.java:417)
at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:113)
at com.sun.proxy.$Proxy14.call(Unknown Source)
at org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:63)
at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
at org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:193)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:192)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

Is there any known issue with Spark Streaming and standalone mode, or with 
Python?

