Re: If not stop StreamingContext gracefully, will checkpoint data be consistent?

2015-06-19 Thread Akhil Das
One workaround would be to remove or move the files from the input directory once you have processed them. Thanks, Best Regards On Fri, Jun 19, 2015 at 5:48 AM, Haopu Wang hw...@qilinsoft.com wrote: Akhil, From my test, I can see the files in the last batch will always be reprocessed upon
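A minimal Python sketch of that workaround, assuming the job already knows which file names a batch consumed (obtaining that list from Spark, for example inside `foreachRDD`, is not shown) and that the files sit on a local filesystem. On HDFS the same idea would use a filesystem rename rather than `shutil.move`. The function name `archive_processed_files` is hypothetical.

```python
import os
import shutil
from typing import Iterable, List


def archive_processed_files(input_dir: str, archive_dir: str, processed: Iterable[str]) -> List[str]:
    """Move files that a batch has finished processing out of the monitored directory.

    Assumption: `processed` holds bare file names relative to `input_dir`,
    and the caller invokes this only after the batch output has been written.
    """
    os.makedirs(archive_dir, exist_ok=True)
    moved = []
    for name in processed:
        src = os.path.join(input_dir, name)
        if not os.path.exists(src):
            # Already archived by an earlier attempt, so a repeated call is harmless.
            continue
        shutil.move(src, os.path.join(archive_dir, name))
        moved.append(name)
    return moved
```

Skipping missing files keeps the call safe to repeat if the batch itself is replayed after a restart.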

RE: If not stop StreamingContext gracefully, will checkpoint data be consistent?

2015-06-18 Thread Haopu Wang
Akhil, from my test, I can see that the files in the last batch are always reprocessed upon restarting from the checkpoint, even after a graceful shutdown. I think a file is usually expected to be processed only once. Maybe this is a bug in fileStream? Or do you know of any approach to work around it?

Re: If not stop StreamingContext gracefully, will checkpoint data be consistent?

2015-06-16 Thread Akhil Das
Good question. With fileStream or textFileStream, it basically only takes in the files whose timestamp is the current timestamp
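A simplified Python model of the selection described above, assuming a file is picked up only when its modification time falls inside the current batch interval. Spark's actual FileInputDStream logic is more involved (it also keeps a remember window and tracks recently selected files), so treat this as an illustration only; `select_files_for_batch` is a hypothetical name. In this model, a batch that is regenerated with the same batch time after a restart selects exactly the same files again.

```python
from typing import Dict, List


def select_files_for_batch(mod_times: Dict[str, int], batch_time_ms: int, batch_duration_ms: int) -> List[str]:
    """Return files whose modification time falls in (batch_time - duration, batch_time].

    Simplified model, not Spark's implementation: a file modified before the
    window opened is never picked up, and one modified after batch_time waits
    for a later batch.
    """
    window_start = batch_time_ms - batch_duration_ms
    return sorted(
        name for name, mtime in mod_times.items()
        if window_start < mtime <= batch_time_ms
    )
```

Because the result depends only on the batch time and the files' timestamps, replaying a batch cannot by itself tell that its files were already handled.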

RE: If not stop StreamingContext gracefully, will checkpoint data be consistent?

2015-06-15 Thread Haopu Wang
Akhil, thank you for the response. I want to explore this further. Suppose the application just monitors an HDFS folder and outputs the word count of each streaming batch to HDFS as well. When I kill the application _before_ Spark takes a checkpoint, after recovery Spark will resume the processing
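A minimal local-filesystem sketch of that per-batch word-count output, assuming each batch's output file is named after its batch time (similar in spirit to the `prefix-TIME_IN_MS` naming that `saveAsTextFiles` uses). Because the path depends only on the batch time, a batch that is replayed after recovery overwrites its earlier output instead of adding a second copy. `write_batch_word_count` and the `wordcount-` prefix are hypothetical.

```python
import os
from collections import Counter
from typing import Iterable


def write_batch_word_count(lines: Iterable[str], out_dir: str, batch_time_ms: int) -> str:
    """Count words in one batch and write them to a file named after the batch time.

    Assumption: out_dir is a local directory; on HDFS the same pattern would
    rely on the filesystem's own rename.
    """
    counts = Counter(word for line in lines for word in line.split())
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"wordcount-{batch_time_ms}")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        for word, n in sorted(counts.items()):
            f.write(f"{word}\t{n}\n")
    # Replace the final file in one step so a crash never leaves half an output.
    os.replace(tmp_path, path)
    return path
```

Writing to a temporary file first and then renaming means a reader never sees a partially written result, and a replayed batch simply produces the same file again.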

Re: If not stop StreamingContext gracefully, will checkpoint data be consistent?

2015-06-15 Thread Akhil Das
I think it should be fine; that's the whole point of checkpointing (in case of driver failure, etc.). Thanks, Best Regards On Mon, Jun 15, 2015 at 6:54 AM, Haopu Wang hw...@qilinsoft.com wrote: Hi, can someone help to confirm the behavior? Thank you! -----Original Message----- From: Haopu

RE: If not stop StreamingContext gracefully, will checkpoint data be consistent?

2015-06-14 Thread Haopu Wang
Hi, can someone help to confirm the behavior? Thank you! -----Original Message----- From: Haopu Wang Sent: Friday, June 12, 2015 4:57 PM To: user Subject: If not stop StreamingContext gracefully, will checkpoint data be consistent? This is a quick question about Checkpoint. The question is: if