We are running a Spark streaming job that retrieves files from a directory
(using textFileStream). 
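For reference, the setup looks roughly like this (the app name, directory path and batch interval are just placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("FileIngest")
    val ssc  = new StreamingContext(conf, Seconds(60))

    // Monitors the directory; only files that appear while the job is running are picked up
    val lines = ssc.textFileStream("hdfs:///data/incoming")
    lines.foreachRDD { rdd =>
      // process the contents of the newly detected files
      println(s"Got ${rdd.count()} lines in this batch")
    }

    ssc.start()
    ssc.awaitTermination()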
One concern we have is the case where the job is down but files are still
being added to the directory.
Once the job starts up again, those files are not picked up (since they were
not added or changed while the job was running), but we would like them to be
processed.
Is there a solution for this? Is there a way to keep track of which files have
been processed, and can we "force" older files to be picked up? Is there a
way to delete files once they have been processed?
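One thing we noticed is that StreamingContext.fileStream takes a newFilesOnly
flag; we have not tried it yet, but would something along these lines make the
files that arrived while the job was down visible again after a restart? (The
directory and filter below are placeholders; ssc is the StreamingContext from
the snippet above.)

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // newFilesOnly = false asks Spark to also consider files already present in the
    // directory when the stream starts, not only files that show up afterwards
    val older = ssc.fileStream[LongWritable, Text, TextInputFormat](
        "hdfs:///data/incoming",
        (path: Path) => !path.getName.startsWith("."),  // skip hidden/temp files
        newFilesOnly = false
      ).map { case (_, text) => text.toString }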

Thanks!
Markus 



