Good question. With fileStream or textFileStream, Spark basically only
picks up files whose timestamp is later than the current timestamp
<https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L172>
and, when checkpointing is enabled
<https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L324>,
it restores the latest file names from the checkpoint directory, which I
believe means some files will be reprocessed after recovery.
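
To make the reprocessing concrete, here is a toy model in plain Python. It
is only a sketch of the behaviour described above, not Spark's actual code:
the file names, timestamps, batch interval and the helper names
select_new_files and simulate_crash_and_recovery are all made up for
illustration, and the real FileInputDStream logic is more involved.

```python
def select_new_files(files, after, up_to):
    """Return names of files whose timestamp lies in the window (after, up_to]."""
    return sorted(name for name, mtime in files.items() if after < mtime <= up_to)


def simulate_crash_and_recovery():
    # Hypothetical input folder: file name -> modification timestamp.
    files = {"a.txt": 5, "b.txt": 15, "c.txt": 25}
    output = []  # stands in for the per-batch results written to HDFS

    # Batch at t=10 runs and a checkpoint is written afterwards.
    output += select_new_files(files, after=0, up_to=10)
    checkpointed_time = 10

    # Batch at t=20 runs and writes its output, but the application is
    # killed before the checkpoint for this batch is taken.
    output += select_new_files(files, after=10, up_to=20)

    # Recovery: processing resumes from the last checkpointed timestamp,
    # so the t=20 batch is computed (and written) a second time.
    output += select_new_files(files, after=checkpointed_time, up_to=20)

    # Normal processing continues with the t=30 batch.
    output += select_new_files(files, after=20, up_to=30)
    return output


if __name__ == "__main__":
    print(simulate_crash_and_recovery())
```

In this model b.txt shows up twice in the output, which is the duplicate
result being asked about. If duplicates matter, the output step would need
to be idempotent, for example by overwriting a per-batch output path
instead of appending.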

Thanks
Best Regards

On Mon, Jun 15, 2015 at 2:49 PM, Haopu Wang <hw...@qilinsoft.com> wrote:

>  Akhil, thank you for the response. I want to explore more.
>
>
>
> Suppose the application just monitors an HDFS folder and writes the word
> count of each streaming batch back to HDFS.
>
>
>
> When I kill the application _*before*_ Spark takes a checkpoint, then after
> recovery Spark will resume processing from the timestamp of the latest
> checkpoint. That means some files will be processed twice and duplicate
> results will be generated.
>
>
>
> Please correct me if the understanding is wrong, thanks again!
>
>
>  ------------------------------
>
> *From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
> *Sent:* Monday, June 15, 2015 3:48 PM
> *To:* Haopu Wang
> *Cc:* user
> *Subject:* Re: If not stop StreamingContext gracefully, will checkpoint
> data be consistent?
>
>
>
> I think it should be fine, that's the whole point of check-pointing (in
> case of driver failure etc).
>
>
>   Thanks
>
> Best Regards
>
>
>
> On Mon, Jun 15, 2015 at 6:54 AM, Haopu Wang <hw...@qilinsoft.com> wrote:
>
> Hi, can someone help to confirm the behavior? Thank you!
>
>
> -----Original Message-----
> From: Haopu Wang
> Sent: Friday, June 12, 2015 4:57 PM
> To: user
> Subject: If not stop StreamingContext gracefully, will checkpoint data
> be consistent?
>
> This is a quick question about checkpointing: if the
> StreamingContext is not stopped gracefully, will the checkpoint be
> consistent?
> Or should I always shut down the application gracefully in order to
> use the checkpoint?
>
> Thank you very much!
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
