Using a long interval between checkpoints may cause a long lineage of graph
computations to build up, since Spark uses checkpointing to cut the lineage.
That can also cause delays in the streaming job.
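
As a rough sketch of how the interval could be chosen: the Spark Streaming guide suggests trying a checkpoint interval of 5 to 10 sliding intervals of the DStream. The helper names below are hypothetical, the 10-second floor is an assumption, and `stream` is assumed to be any object with a PySpark-style `checkpoint(interval)` method that takes seconds.

```python
def checkpoint_interval_seconds(slide_seconds, multiplier=5, floor_seconds=10):
    """Return a checkpoint interval that is a whole multiple of the slide interval.

    multiplier and floor_seconds are illustrative defaults, not Spark settings.
    """
    if slide_seconds <= 0:
        raise ValueError("slide_seconds must be positive")
    # Smallest number of slides that reaches the floor (integer ceiling division).
    slides_for_floor = -(-floor_seconds // slide_seconds)
    return slide_seconds * max(multiplier, slides_for_floor)


def apply_checkpoint_interval(stream, slide_seconds):
    """Set the checkpoint interval on a DStream-like object and return it."""
    interval = checkpoint_interval_seconds(slide_seconds)
    stream.checkpoint(interval)  # assumed PySpark-style call, interval in seconds
    return interval
```

In a real job, `stream` would be the DStream returned by updateStateByKey or reduceByKeyAndWindow, with a checkpoint directory already set on the StreamingContext. A larger multiplier means fewer checkpoint writes but a longer lineage between them.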
--
View this message in context:
You can't. Spark doesn't let you fiddle with the data being checkpointed, as
it's an internal implementation detail.
--
Hi,
I have checkpoints enabled in Spark Streaming and I use updateStateByKey and
reduceByKeyAndWindow with inverse functions. How do I reduce the amount of
data that I am writing to the checkpoint, or clear out the data that I don't
care about?
Thanks!
--