One workaround would be to remove/move the files from the input directory once
you have processed them.
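
A rough sketch of that idea, in case it helps. It assumes a local filesystem and plain Python rather than HDFS, and the names archive_processed, input_dir, archive_dir and processed_names are made up for illustration (they are not part of any Spark API). On HDFS you would do the equivalent move with the Hadoop filesystem tools instead.

```python
import shutil
from pathlib import Path


def archive_processed(input_dir, archive_dir, processed_names):
    """Move already-processed files out of the monitored directory.

    Hypothetical helper: it only illustrates the "move after processing"
    workaround on a local filesystem and knows nothing about Spark.
    Returns the names that were actually moved.
    """
    src = Path(input_dir)
    dst = Path(archive_dir)
    # Create the archive directory on first use.
    dst.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in processed_names:
        candidate = src / name
        # Skip names that are missing or were already moved earlier.
        if candidate.is_file():
            shutil.move(str(candidate), str(dst / name))
            moved.append(name)
    return moved
```

You would call something like this once a batch's output has been written, passing the file names that batch consumed, so that a restart from checkpoint no longer finds those files in the input directory. Note the archive directory should sit outside the monitored directory.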
Thanks
Best Regards
On Fri, Jun 19, 2015 at 5:48 AM, Haopu Wang hw...@qilinsoft.com wrote:
Akhil,
From my test, I can see the files in the last batch will always be
reprocessed upon restarting from checkpoint, even for a graceful shutdown.
I think usually a file is expected to be processed only once. Maybe
this is a bug in fileStream? Or do you know of any approach to work around
it?
Good question. With fileStream or textFileStream, it will basically only
take in the files whose timestamp is the current timestamp
Akhil, thank you for the response. I want to explore more.
Suppose the application just monitors an HDFS folder and writes the word
count of each streaming batch back to HDFS.
When I kill the application _before_ Spark takes a checkpoint, after
recovery, Spark will resume the processing
I think it should be fine, that's the whole point of checkpointing (in
case of driver failure etc).
Thanks
Best Regards
On Mon, Jun 15, 2015 at 6:54 AM, Haopu Wang hw...@qilinsoft.com wrote:
Hi, can someone help to confirm the behavior? Thank you!
-----Original Message-----
From: Haopu Wang
Sent: Friday, June 12, 2015 4:57 PM
To: user
Subject: If not stop StreamingContext gracefully, will checkpoint data
be consistent?
This is a quick question about Checkpoint. The question is: if