Re: Need some clarification on checkpointing w.r.t. Spark Structured Streaming

2017-09-11 Thread Michael Armbrust
Checkpoints record what has been processed for a specific query, so they only need to be defined on the write side (calling writeStream()...start() is how you "start" a query). You can use the same DataFrame created with readStream to start multiple queries, each with its own checkpoint, so it wouldn't really make sense to attach a single checkpoint to the read side.
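To make the point concrete, here is a minimal sketch, assuming Spark 2.3+ with spark-sql on the classpath and a local master. The class name `CheckpointPerQuery`, the `runDemo` helper, the query names and all paths are hypothetical. A JSON file source stands in for Kafka so the sketch runs without a broker. With Kafka you would use `.format("kafka")` with the `kafka.bootstrap.servers` and `subscribe` options instead, and the checkpoint placement stays the same: nothing on `readStream()`, one `checkpointLocation` per `writeStream()...start()`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class CheckpointPerQuery {

    /** Hypothetical helper: runs two queries off one readStream Dataset, returns {allRows, evenRows}. */
    static long[] runDemo(SparkSession spark, Path baseDir) throws Exception {
        // Hypothetical input: one JSON-lines file holding values 1..10.
        Path input = Files.createDirectories(baseDir.resolve("input"));
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            lines.add("{\"value\": " + i + "}");
        }
        Files.write(input.resolve("part-0.json"), lines);

        // Read side: no checkpointLocation. This Dataset is only a plan;
        // nothing runs until a query is started with writeStream().start().
        Dataset<Row> source = spark.readStream()
                .schema("value LONG")
                .json(input.toString());

        // Query 1: every row, tracked in its own checkpoint directory.
        StreamingQuery all = source.writeStream()
                .format("memory")
                .queryName("all_rows")
                .option("checkpointLocation", baseDir.resolve("ckpt-all").toString())
                .start();

        // Query 2: same source Dataset, separate query, separate checkpoint.
        StreamingQuery even = source.filter("value % 2 = 0")
                .writeStream()
                .format("memory")
                .queryName("even_rows")
                .option("checkpointLocation", baseDir.resolve("ckpt-even").toString())
                .start();

        // processAllAvailable() is intended for tests: block until each query
        // has consumed everything currently in the input directory.
        all.processAllAvailable();
        even.processAllAvailable();

        long allCount = spark.table("all_rows").count();
        long evenCount = spark.table("even_rows").count();

        all.stop();
        even.stop();
        return new long[] {allCount, evenCount};
    }

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("checkpoint-per-query")
                .master("local[2]")
                .getOrCreate();
        Path base = Files.createTempDirectory("ckpt-demo");
        long[] counts = runDemo(spark, base);
        System.out.println("all_rows=" + counts[0] + ", even_rows=" + counts[1]);
        spark.stop();
    }
}
```

Each query keeps its own progress (the offsets/ and commits/ logs) under its own checkpoint directory, which is why the location belongs to the query rather than to the shared source Dataset.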

Need some clarification on checkpointing w.r.t. Spark Structured Streaming

2017-09-11 Thread kant kodali
Hi all, I was wondering whether we need to checkpoint both the read and the write streams when reading from Kafka and inserting into a target store. For example: sparkSession.readStream().option("checkpointLocation", "hdfsPath").load() vs. dataSet.writeStream().option("checkpointLocation", "hdfsPath")