Good question. With fileStream or textFileStream, it basically only takes in files whose timestamp is later than the current timestamp <https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L172>. When checkpointing is enabled <https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L324>, it restores the latest filenames from the checkpoint directory, which I believe means some files will be reprocessed after recovery.
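
To make the scenario concrete, here is a rough sketch in Python of how such duplicates could arise. It is only a toy model of the behavior described above, not Spark's actual FileInputDStream logic: the class ToyFileStream, the function simulate_crash_and_recovery, the file names and the integer timestamps are all made up for illustration, and it assumes the file-selection state after recovery is simply whatever was saved at the last checkpoint.

```python
from collections import Counter


class ToyFileStream:
    """Toy stand-in for a file-monitoring stream (hypothetical, not Spark code)."""

    def __init__(self, threshold):
        # Only files modified strictly after this time are picked up.
        self.threshold = threshold

    def run_batch(self, batch_time, files):
        """Return files with threshold < mtime <= batch_time, then advance."""
        selected = sorted(
            name for name, mtime in files.items()
            if self.threshold < mtime <= batch_time
        )
        self.threshold = batch_time
        return selected


def simulate_crash_and_recovery():
    # Hypothetical files in the monitored folder: name -> modification time.
    files = {"a.txt": 5, "b.txt": 15, "c.txt": 25}
    written = []  # stands in for the output folder, which survives the crash

    stream = ToyFileStream(threshold=0)
    written += stream.run_batch(10, files)
    snapshot = stream.threshold          # state saved at the checkpoint

    written += stream.run_batch(20, files)
    # The application is killed here, before the next checkpoint is taken.

    recovered = ToyFileStream(threshold=snapshot)
    written += recovered.run_batch(30, files)

    duplicates = sorted(n for n, c in Counter(written).items() if c > 1)
    return written, duplicates


if __name__ == "__main__":
    written, duplicates = simulate_crash_and_recovery()
    print("processed:", written)
    print("processed twice:", duplicates)
```

In this model b.txt is written twice, because it was processed after the snapshot was taken but before the crash, so the recovered stream picks it up again.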

Thanks
Best Regards

On Mon, Jun 15, 2015 at 2:49 PM, Haopu Wang <hw...@qilinsoft.com> wrote:

> Akhil, thank you for the response. I want to explore more.
>
> If the application is just monitoring a HDFS folder and output the word
> count of each streaming batch into also HDFS.
>
> When I kill the application _*before*_ spark takes a checkpoint, after
> recovery, spark will resume the processing from the timestamp of latest
> checkpoint. That means some files will be processed twice and duplicate
> results are generated.
>
> Please correct me if the understanding is wrong, thanks again!
>
> ------------------------------
>
> *From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
> *Sent:* Monday, June 15, 2015 3:48 PM
> *To:* Haopu Wang
> *Cc:* user
> *Subject:* Re: If not stop StreamingContext gracefully, will checkpoint
> data be consistent?
>
> I think it should be fine, that's the whole point of check-pointing (in
> case of driver failure etc).
>
> Thanks
> Best Regards
>
> On Mon, Jun 15, 2015 at 6:54 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
>
> Hi, can someone help to confirm the behavior? Thank you!
>
> -----Original Message-----
> From: Haopu Wang
> Sent: Friday, June 12, 2015 4:57 PM
> To: user
> Subject: If not stop StreamingContext gracefully, will checkpoint data
> be consistent?
>
> This is a quick question about Checkpoint. The question is: if the
> StreamingContext is not stopped gracefully, will the checkpoint be
> consistent?
> Or I should always gracefully shutdown the application even in order to
> use the checkpoint?
>
> Thank you very much!
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org