RE: StreamingContext textFileStream question

2015-02-23 Thread Shao, Saisai
Hi Mark,

For file-based input streams like the text file stream, only RDDs can be 
recovered from the checkpoint, not missed files; if a file is missing, an 
exception will actually be raised. If you use HDFS, HDFS guarantees no data 
loss since it keeps 3 copies of each block. Otherwise, user logic has to 
guarantee that no file is deleted before recovery.

For receiver-based input streams, like the Kafka input stream or socket 
input stream, a WAL (write-ahead log) mechanism can be enabled to store the 
received data as well as its metadata, so the data can be recovered after a 
failure.
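For reference, the WAL Jerry mentions is controlled by a single configuration 
flag (a sketch of a spark-defaults.conf fragment; note that a checkpoint 
directory must also be set on the StreamingContext for the log to have 
somewhere to live):

```
# Enable the receiver write-ahead log (Spark 1.2+).
# Received data is written to the checkpoint directory before being
# acknowledged, so it can be replayed after a driver failure.
spark.streaming.receiver.writeAheadLog.enable  true
```

This only covers receiver-based streams; it does not apply to textFileStream, 
which has no receiver.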

Thanks
Jerry

-Original Message-
From: mkhaitman [mailto:mark.khait...@chango.com] 
Sent: Monday, February 23, 2015 10:54 AM
To: dev@spark.apache.org
Subject: StreamingContext textFileStream question

Hello,

I'm interested in creating a StreamingContext textFileStream-based job that 
runs for long durations and can also recover from prolonged driver failure. 
It seems that StreamingContext checkpointing is mainly used for the case 
where the driver dies during the processing of an RDD, in order to recover 
that one RDD. My question is specifically whether there is also a way to 
recover the files that were missed in the window between the driver dying and 
being started back up (whether manually or automatically).

Any assistance/suggestions with this one would be greatly appreciated!

Thanks,
Mark.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/StreamingContext-textFileStream-question-tp10742.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org





RE: StreamingContext textFileStream question

2015-02-23 Thread mkhaitman
Hi Jerry,

Thanks for the quick response! Looks like I'll need to come up with an
alternative solution in the meantime, since I'd like to avoid the
receiver-based input streams + WAL approach. :)
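One minimal sketch of such an alternative (plain Python, not a Spark API; the 
manifest file name and helper names are illustrative): persist a manifest of 
already-processed file names alongside the job, so that after a driver 
restart the files that arrived during the outage can be enumerated and fed 
back through the pipeline.

```python
# Sketch: track processed files in a persisted JSON manifest so that
# files landing while the driver is down can be picked up on restart.
import json
import os


def load_manifest(path):
    """Return the set of file names already processed, or an empty set."""
    if os.path.exists(path):
        with open(path) as f:
            return set(json.load(f))
    return set()


def save_manifest(path, processed):
    """Persist the set of processed file names as a sorted JSON list."""
    with open(path, "w") as f:
        json.dump(sorted(processed), f)


def unprocessed_files(input_dir, manifest_path):
    """Files present in input_dir but not yet recorded in the manifest."""
    processed = load_manifest(manifest_path)
    return sorted(f for f in os.listdir(input_dir) if f not in processed)


def mark_processed(manifest_path, names):
    """Record the given file names as processed."""
    processed = load_manifest(manifest_path)
    processed.update(names)
    save_manifest(manifest_path, processed)
```

On restart, the driver would call `unprocessed_files` once to catch up on 
anything missed, then continue streaming as usual; the manifest write should 
happen only after a file's output has been durably committed.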

Thanks again,
Mark.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/StreamingContext-textFileStream-question-tp10742p10745.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
