This is very puzzling, given that it works in local mode.

Does running the Kinesis example work with your spark-submit?

https://github.com/apache/spark/blob/master/extras/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala

The instructions are present in the streaming guide.
https://github.com/apache/spark/blob/master/docs/streaming-kinesis-integration.md

If that does not work on the cluster, then I would check the streaming UI for
the number of records being received, and the stages page to see whether
jobs are being executed for every batch. That can tell us whether that part
is working well.
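
As a sanity check, a minimal driver that just logs per-batch counts can be
compared against the "stored N records" lines in the worker logs. This is only
a sketch: the stream name, endpoint, and app name below are placeholders, and
it assumes the KinesisUtils.createStream signature from the current asl module.

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

object KinesisCountCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KinesisCountCheck")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Placeholder stream name and endpoint -- substitute your own.
    val stream = KinesisUtils.createStream(
      ssc, "myStream", "https://kinesis.us-east-1.amazonaws.com",
      Seconds(2), InitialPositionInStream.LATEST,
      StorageLevel.MEMORY_AND_DISK_2)

    // count() emits one Long per batch. If this stays at 0 while workers
    // log "stored N records", data is received but jobs never see it.
    stream.count().foreachRDD(rdd => println("Records in batch: " + rdd.first()))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the counts here match the worker-side "stored" numbers, the receiver path
is fine and the problem is downstream in the job scheduling.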

Also cc'ing Chris Fregly, who wrote the Kinesis integration.

TD

On Thu, Sep 11, 2014 at 4:51 AM, Aniket Bhatnagar <
aniket.bhatna...@gmail.com> wrote:

> Hi all
>
> I am trying to run kinesis spark streaming application on a standalone
> spark cluster. The job works fine in local mode, but when I submit it (using
> spark-submit), it doesn't do anything. I enabled logs
> for org.apache.spark.streaming.kinesis package and I regularly get the
> following in worker logs:
>
> 14/09/11 11:41:25 DEBUG KinesisRecordProcessor: Stored:  Worker
> x.x.x.x:b88a9210-cbb9-4c31-8da7-35fd92faba09 stored 34 records for shardId
> shardId-000000000000
> 14/09/11 11:41:25 DEBUG KinesisRecordProcessor: Stored:  Worker
> x.x.x.x:b2e9c20f-470a-44fe-a036-630c776919fb stored 31 records for shardId
> shardId-000000000001
>
> But the job does not perform any operations defined on DStream. To
> investigate this further, I changed the kinesis-asl's KinesisUtils to
> perform the following computation on the DStream created
> using ssc.receiverStream(new KinesisReceiver...):
>
> stream.count().foreachRDD(rdd => rdd.foreach(tuple => logInfo("Emitted " +
> tuple)))
>
> Even the above line does not result in any corresponding log entries in
> either the driver or worker logs. The only relevant logs that I could find in
> driver logs are:
> 14/09/11 11:40:58 INFO DAGScheduler: Stage 2 (foreach at
> KinesisUtils.scala:68) finished in 0.398 s
> 14/09/11 11:40:58 INFO SparkContext: Job finished: foreach at
> KinesisUtils.scala:68, took 4.926449985 s
> 14/09/11 11:40:58 INFO JobScheduler: Finished job streaming job
> 1410435653000 ms.0 from job set of time 1410435653000 ms
> 14/09/11 11:40:58 INFO JobScheduler: Starting job streaming job
> 1410435653000 ms.1 from job set of time 1410435653000 ms
> 14/09/11 11:40:58 INFO SparkContext: Starting job: foreach at
> KinesisUtils.scala:68
> 14/09/11 11:40:58 INFO DAGScheduler: Registering RDD 13 (union at
> DStream.scala:489)
> 14/09/11 11:40:58 INFO DAGScheduler: Got job 3 (foreach at
> KinesisUtils.scala:68) with 2 output partitions (allowLocal=false)
> 14/09/11 11:40:58 INFO DAGScheduler: Final stage: Stage 5(foreach at
> KinesisUtils.scala:68)
>
> After the above logs, nothing shows up corresponding to KinesisUtils. I am
> out of ideas on this one and any help would be greatly appreciated.
>
> Thanks,
> Aniket
>