There are different kinds of checkpointing going on. updateStateByKey
requires RDD checkpointing which can be enabled only by called
sparkContext.setCheckpointDirectory. But that does not enable Spark
Streaming driver checkpoints, which is necessary for recovering from driver
failures. That is
I reached a similar conclusion about checkpointing . It requires your
entire computation to be serializable, even all of the 'local' bits.
Which makes sense. In my case I do not use checkpointing and it is
fine to restart the driver in the case of failure and not try to
recover its state.
What I
Sean,
thanks for your message!
On Mon, Feb 23, 2015 at 6:03 PM, Sean Owen so...@cloudera.com wrote:
What I haven't investigated is whether you can enable checkpointing
for the state in updateStateByKey separately from this mechanism,
which is exactly your question. What happens if you set a
Hi,
On Tue, Feb 24, 2015 at 4:34 AM, Tathagata Das tathagata.das1...@gmail.com
wrote:
There are different kinds of checkpointing going on. updateStateByKey
requires RDD checkpointing which can be enabled only by called
sparkContext.setCheckpointDirectory. But that does not enable Spark
Aha, you’re right, I did a wrong comparison, the reason might be only for
checkpointing :).
Thanks
Jerry
From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Wednesday, January 28, 2015 10:39 AM
To: Shao, Saisai
Cc: user
Subject: Re: Why must the dstream.foreachRDD(...) parameter
I believe this is needed for driver recovery in Spark Streaming. If your Spark
driver program crashes, Spark Streaming can recover the application by reading
the set of DStreams and output operations from a checkpoint file (see
Hey Tobias,
I think one consideration is for checkpoint of DStream which guarantee driver
fault tolerance.
Also this `foreachFunc` is more like an action function of RDD, thinking of
rdd.foreach(func), in which `func` need to be serializable. So maybe I think
your way of use it is not a
Hi,
thanks for the answers!
On Wed, Jan 28, 2015 at 11:31 AM, Shao, Saisai saisai.s...@intel.com
wrote:
Also this `foreachFunc` is more like an action function of RDD, thinking
of rdd.foreach(func), in which `func` need to be serializable. So maybe I
think your way of use it is not a normal