Akhil, thank you for the response. I want to explore more.

 

If the application is just monitoring a HDFS folder and output the word
count of each streaming batch into also HDFS.

 

When I kill the application _before_ spark takes a checkpoint, after
recovery, spark will resume the processing from the timestamp of latest
checkpoint. That means some files will be processed twice and duplicate
results are generated.

 

Please correct me if the understanding is wrong, thanks again!

 

________________________________

From: Akhil Das [mailto:ak...@sigmoidanalytics.com] 
Sent: Monday, June 15, 2015 3:48 PM
To: Haopu Wang
Cc: user
Subject: Re: If not stop StreamingContext gracefully, will checkpoint
data be consistent?

 

I think it should be fine, that's the whole point of check-pointing (in
case of driver failure etc).




Thanks

Best Regards

 

On Mon, Jun 15, 2015 at 6:54 AM, Haopu Wang <hw...@qilinsoft.com> wrote:

Hi, can someone help to confirm the behavior? Thank you!


-----Original Message-----
From: Haopu Wang
Sent: Friday, June 12, 2015 4:57 PM
To: user
Subject: If not stop StreamingContext gracefully, will checkpoint data
be consistent?

This is a quick question about Checkpoint. The question is: if the
StreamingContext is not stopped gracefully, will the checkpoint be
consistent?
Or I should always gracefully shutdown the application even in order to
use the checkpoint?

Thank you very much!


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

 

Reply via email to