Our additional question on checkpointing is basically the logistics of it --
At what point does the data get written to the checkpoint? Is it written
as soon as the driver program retrieves an RDD from Kafka (or another
source)? Or is it written after that RDD has been processed and we're
You'll resume and re-process the RDD that didn't finish.
On Fri, Aug 14, 2015 at 1:31 PM, Dmitry Goldenberg dgoldenberg...@gmail.com
wrote:
Thanks, Cody. It sounds like Spark Streaming has enough state info to know
how many batches have been processed and, if not all of them have been, that
the RDD is 'unfinished'. I wonder if it would know whether the last
micro-batch has been fully processed successfully. Hypothetically, the driver program
If you've set the checkpoint dir, it seems like indeed the intent is
to use a default checkpoint interval in DStream:

private[streaming] def initialize(time: Time) {
  ...
  // Set the checkpoint interval to be slideDuration or 10 seconds,
  // whichever is larger
  if (mustCheckpoint
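To make the rule in that snippet concrete, here is a small stand-alone sketch of how the default interval works out, assuming (as I read DStream.initialize) that the chosen interval is the smallest multiple of the slide duration that is at least 10 seconds; the method name and class are illustrative, not Spark's:

```java
public class DefaultCheckpointInterval {

    // Smallest multiple of slideMs that is >= 10 seconds -- my reading of
    // the default that DStream.initialize picks when none is set explicitly.
    static long defaultIntervalMs(long slideMs) {
        long tenSeconds = 10_000L;
        long multiples = (tenSeconds + slideMs - 1) / slideMs; // ceil division
        return slideMs * Math.max(multiples, 1);
    }

    public static void main(String[] args) {
        // A 2s batch interval checkpoints every 10s; a 15s one every 15s.
        System.out.println(defaultIntervalMs(2_000));
        System.out.println(defaultIntervalMs(15_000));
    }
}
```

So with sub-10-second batches you should see checkpoint writes roughly every 10 seconds, not every batch.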
It looks like there's an issue with the 'Parameters' POJO I'm using within
my driver program. For some reason it needs to be serializable, which is
odd.

java.io.NotSerializableException: com.kona.consumer.kafka.spark.Parameters

Giving it another whirl, though, after making it serializable.
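For what it's worth, this isn't odd once checkpointing is on: anything captured in a closure of a checkpointed DStream gets Java-serialized along with the DStream graph. A minimal stand-in (the Parameters class here is hypothetical, just mimicking the POJO in the stack trace) shows the round trip Spark performs:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the Parameters POJO. Without "implements
// Serializable", writeObject below throws NotSerializableException --
// the same failure Spark hits when checkpointing the DStream graph.
class Parameters implements Serializable {
    private static final long serialVersionUID = 1L;
    final boolean checkpointed;
    Parameters(boolean checkpointed) { this.checkpointed = checkpointed; }
}

public class SerializationCheck {

    // Serialize an object the way Java (and hence Spark's closure
    // checkpointing) would.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(toBytes(new Parameters(true)).length > 0);
    }
}
```

Fields that genuinely can't be serialized (connections, clients) can be marked transient and re-created lazily instead.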
I've instrumented checkpointing per the programming guide and I can tell
that Spark Streaming is creating the checkpoint directories, but I'm not
seeing any content being created in those directories, nor am I seeing the
effects I'd expect from checkpointing. I'd expect any data that comes into
I'll check the log info message.

Meanwhile, the code is basically:

public class KafkaSparkStreamingDriver implements Serializable {
  ...
  SparkConf sparkConf = createSparkConf(appName, kahunaEnv);
  JavaStreamingContext jssc = params.isCheckpointed() ?
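For comparison, here is a sketch of the recovery-aware pattern from the Spark Streaming programming guide that the conditional above seems to be heading toward: JavaStreamingContext.getOrCreate rebuilds the context from the checkpoint directory if one exists, else calls the factory. The checkpoint path, batch interval, and createContext helper are assumptions, not the thread's actual code (this won't run without Spark on the classpath):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class CheckpointedDriver {

    // Factory invoked only when no checkpoint exists; the 5s batch
    // interval and the checkpoint dir are illustrative assumptions.
    static JavaStreamingContext createContext(SparkConf conf, String checkpointDir) {
        JavaStreamingContext jssc =
            new JavaStreamingContext(conf, Durations.seconds(5));
        jssc.checkpoint(checkpointDir); // enables metadata checkpointing
        // ... set up Kafka input streams and transformations here ...
        return jssc;
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-spark-streaming");
        String checkpointDir = args.length > 0 ? args[0] : "/tmp/checkpoints";

        // Recover from checkpointDir if present, otherwise build fresh.
        JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(
            checkpointDir, () -> createContext(conf, checkpointDir));

        jssc.start();
        jssc.awaitTermination();
    }
}
```

One gotcha with this pattern: code changes to the factory don't take effect on restart while an old checkpoint is still present, since the graph is deserialized from the checkpoint instead.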
Show us the relevant code
On Fri, Jul 31, 2015 at 12:16 PM, Dmitry Goldenberg
dgoldenberg...@gmail.com wrote: