You are right. "checkpointInterval" only controls data checkpointing; a
metadata checkpoint is written for every batch. Feel free to send a PR to add
the missing doc.
Best Regards,
Shixiong Zhu
2015-12-18 8:26 GMT-08:00 Lan Jiang :
> Need some clarification about the documentation.
Have a look at the various versions of
PairDStreamFunctions.updateStateByKey (
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions).
It supports updating running state in memory. (You can persist the state to
a database/files.)
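To make the suggestion above concrete: the function passed to updateStateByKey has the shape (Seq[V], Option[S]) => Option[S], and Spark calls it per key on every batch. Here is a minimal pure-Scala sketch of that shape for a running count; the function name and values are illustrative, not from this thread:

```scala
// Shape of an updateStateByKey update function: Spark passes the batch's new
// values for a key plus the previous state, and keeps whatever we return.
def updateCount(newValues: Seq[Int], running: Option[Int]): Option[Int] =
  Some(running.getOrElse(0) + newValues.sum)

// Simulating what happens to one key across two batches:
val afterBatch1 = updateCount(Seq(1, 2), None)       // Some(3)
val afterBatch2 = updateCount(Seq(4), afterBatch1)   // Some(7)
```

Returning None instead of Some(...) drops the key from the state, which is how you expire entries.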
Dibyendu,
Thanks for the reply.
I am reading your project homepage now.
One quick question I care about is:
If the receivers failed for some reason (for example, they were killed
abruptly by another process), is there any mechanism for the receivers to
fail over automatically?
On Mon, Mar 16, 2015 at 3:25
Akhil,
I have checked the logs. There isn't any clue as to why the 5 receivers
failed.
That's why I assume receiver failure will be a common issue, and we need to
figure out a way to detect this kind of failure and fail over.
Thanks
On Mon, Mar 16, 2015 at
You need to figure out why the receivers failed in the first place. Look in
your worker logs and see what really happened. When you run a streaming job
continuously for a long period there will usually be a lot of logs (you can
enable log rotation, etc.), and if you are doing a groupBy, join, etc. type
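On the log-rotation point: Spark can roll executor logs via configuration. A sketch of the relevant spark-defaults.conf settings (the values shown are examples, not recommendations from this thread):

```properties
# Roll executor logs by time and keep only the last 7 files
spark.executor.logs.rolling.strategy          time
spark.executor.logs.rolling.time.interval     daily
spark.executor.logs.rolling.maxRetainedFiles  7
```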
Which version of Spark are you running?
You can try this Low Level Consumer :
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
This is designed to recover from various failures and has a very good
fault-recovery mechanism built in. This is being used by many users and at
present
Yes, auto-restart is enabled in my low-level consumer when some unhandled
exception occurs.
If you look at KafkaConsumer.java, in some cases (like broker failure,
Kafka leader changes, etc.) it can even refresh the Consumer (the
Coordinator which talks to a Leader), which will
I have checked Dibyendu's code, and it looks like his implementation has an
auto-restart mechanism.
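To illustrate the general shape of such an auto-restart mechanism, here is a generic retry-loop sketch in plain Scala; this is not the actual kafka-spark-consumer code, and the names are made up for illustration:

```scala
// Generic sketch of an auto-restart loop: re-run a receiver body until it
// succeeds or the restart budget is exhausted; returns the restarts used.
def runWithRestart(maxRestarts: Int)(body: () => Unit): Int = {
  var restarts = 0
  var succeeded = false
  while (!succeeded && restarts <= maxRestarts) {
    try {
      body()
      succeeded = true
    } catch {
      case _: Exception => restarts += 1  // unhandled failure: restart
    }
  }
  restarts
}

// Example: a body that fails twice before succeeding needs 2 restarts.
var attempts = 0
val restartsUsed = runWithRestart(5) { () =>
  attempts += 1
  if (attempts <= 2) throw new RuntimeException("simulated receiver crash")
}
// restartsUsed == 2
```

A real implementation would also add backoff between restarts and distinguish fatal from transient errors.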
Hi Arush,
With your code, I still didn't see the output "Received X flume events..".
bit1...@163.com
From: bit1...@163.com
Date: 2015-02-17 14:08
To: Arush Kharbanda
CC: user
Subject: Re: Re: Question about spark streaming+Flume
Ok, you are missing a letter in foreachRDD.. let me proceed
Hi
Can you try this
val lines = FlumeUtils.createStream(ssc, "localhost", port) // port elided in the original message
// Print out the count of events received from this server in each batch
lines.count()
  .map(cnt => "Received " + cnt + " flume events. at " + System.currentTimeMillis())
  .foreachRDD(_.foreach(println))
Thanks
bit1...@163.com
From: Arush Kharbanda
Date: 2015-02-17 14:31
To: bit1...@163.com
CC: user
Subject: Re: Question about spark streaming+Flume