RE: Spark streaming -> cassandra : Fault Tolerance
Hi Cody,

Thanks for your reply.

Is there a way in the Spark Kafka direct API to stop updating the checkpoint when a write to Cassandra throws an exception? That way no messages would be lost: once Cassandra comes back up, we could resume reading from the point where we left off.

Regards,
Sam

From: Cody Koeninger [mailto:c...@koeninger.org]
Sent: Thursday, September 10, 2015 1:13 AM
To: Samya MAITI
Cc: user@spark.apache.org
Subject: Re: Spark streaming -> cassandra : Fault Tolerance

It's been a while since I've looked at the Cassandra connector, so I can't give you specific advice on it.

But in general, if a Spark task fails (uncaught exception), it will be retried automatically. In the case of the Kafka direct stream RDD, it will have exactly the same messages as the first attempt (as long as they're still in the Kafka log).

If you or the Cassandra connector are catching the exception, the task won't be retried automatically and it's up to you to deal with it.

On Wed, Sep 9, 2015 at 2:09 PM, Samya <samya.ma...@amadeus.com> wrote:

Hi Team,

I have a sample Spark application which reads from Kafka using the direct API, then does some transformation and stores the result to Cassandra (using saveToCassandra()).

If Cassandra goes down, the application logs a NoHostAvailable exception (as expected). But in the meantime the new incoming messages are lost, as the direct API creates a new checkpoint and deletes the previous ones.

Does that mean I should handle the exception on the application side?

Or is there any other hook to handle the same?

Thanks in advance.

Regards,
Sam

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-cassandra-Fault-Tolerance-tp24625.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
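The idea in Sam's reply (do not advance the read position unless the write succeeded) can be illustrated with a small plain-Python simulation. This is only a sketch under stated assumptions: it uses no Spark, Kafka, or Cassandra connector API, and `process_next_batch`, `drain`, and `make_sink` are hypothetical names invented for the illustration. In a real job, the direct stream exposes each batch's offset ranges through `HasOffsetRanges`, so one possible approach is to store offsets yourself after a successful write rather than rely on the checkpoint alone.

```python
from typing import Callable, List, Tuple


def process_next_batch(
    log: List[str],
    committed: int,
    batch_size: int,
    write: Callable[[List[str]], None],
) -> int:
    """Try to write the batch starting at `committed`; return the offset to resume from."""
    batch = log[committed:committed + batch_size]
    try:
        write(batch)
    except ConnectionError:
        # Write failed: leave the offset unchanged so the same batch is read again.
        return committed
    # Advance the offset only after the write succeeded.
    return committed + len(batch)


def drain(
    log: List[str],
    batch_size: int,
    write: Callable[[List[str]], None],
    max_rounds: int = 100,
) -> int:
    """Keep processing batches until the log is consumed or max_rounds is reached."""
    committed = 0
    for _ in range(max_rounds):
        if committed >= len(log):
            break
        committed = process_next_batch(log, committed, batch_size, write)
    return committed


def make_sink(outage_calls: int) -> Tuple[Callable[[List[str]], None], List[str]]:
    """Hypothetical sink that rejects the first `outage_calls` writes, then accepts."""
    stored: List[str] = []
    state = {"down": outage_calls}

    def write(batch: List[str]) -> None:
        if state["down"] > 0:
            state["down"] -= 1
            raise ConnectionError("no host available")
        stored.extend(batch)

    return write, stored


if __name__ == "__main__":
    log = ["m0", "m1", "m2", "m3", "m4"]
    write, stored = make_sink(outage_calls=3)
    print(drain(log, batch_size=2, write=write), stored)
```

Because the offset only moves after a successful write, every message eventually reaches the sink even though the first three write attempts fail. The trade-off is at-least-once delivery: if a write partially succeeds before failing, the batch is written again, which is usually harmless when the writes are idempotent (Cassandra inserts keyed on the primary key behave as upserts).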
Re: Spark streaming -> cassandra : Fault Tolerance
It's been a while since I've looked at the Cassandra connector, so I can't give you specific advice on it.

But in general, if a Spark task fails (uncaught exception), it will be retried automatically. In the case of the Kafka direct stream RDD, it will have exactly the same messages as the first attempt (as long as they're still in the Kafka log).

If you or the Cassandra connector are catching the exception, the task won't be retried automatically and it's up to you to deal with it.
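The difference between an uncaught and a caught exception described above can be sketched as a plain-Python simulation. It is not Spark code: `FlakySink`, `run_task`, and `swallowing` are hypothetical names, `run_task` only mimics a scheduler that re-runs a failed task on the same input, and the limit of 4 attempts is an assumption chosen to match the default of Spark's `spark.task.maxFailures` setting.

```python
from typing import Callable, List

# Assumption: 4 matches the default of Spark's spark.task.maxFailures setting.
MAX_ATTEMPTS = 4


class FlakySink:
    """Hypothetical stand-in for Cassandra: fails a fixed number of times, then accepts writes."""

    def __init__(self, failures_before_up: int) -> None:
        self.failures_left = failures_before_up
        self.rows: List[str] = []

    def write(self, batch: List[str]) -> None:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("no host available")
        self.rows.extend(batch)


def run_task(batch: List[str], task: Callable[[List[str]], None]) -> int:
    """Re-run `task` on the same batch until it succeeds; return the attempts used."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            task(batch)
            return attempt
        except ConnectionError:
            if attempt == MAX_ATTEMPTS:
                raise  # retries exhausted: the failure surfaces to the caller
    raise RuntimeError("unreachable")


def swallowing(sink: FlakySink) -> Callable[[List[str]], None]:
    """Wrap the sink so write errors are caught and only logged."""

    def task(batch: List[str]) -> None:
        try:
            sink.write(batch)
        except ConnectionError as err:
            print(f"logged and ignored: {err}")

    return task


if __name__ == "__main__":
    batch = ["m1", "m2", "m3"]

    propagating = FlakySink(failures_before_up=2)
    print(run_task(batch, propagating.write), propagating.rows)

    swallowed = FlakySink(failures_before_up=2)
    print(run_task(batch, swallowing(swallowed)), swallowed.rows)
```

When the error propagates, the task is re-run on the same three messages and they are stored on the third attempt. When the error is swallowed, the task reports success on its first attempt while nothing has been stored, which matches the message loss described in the original question.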
Spark streaming -> cassandra : Fault Tolerance
Hi Team,

I have a sample Spark application which reads from Kafka using the direct API, then does some transformation and stores the result to Cassandra (using saveToCassandra()).

If Cassandra goes down, the application logs a NoHostAvailable exception (as expected). But in the meantime the new incoming messages are lost, as the direct API creates a new checkpoint and deletes the previous ones.

Does that mean I should handle the exception on the application side?

Or is there any other hook to handle the same?

Thanks in advance.

Regards,
Sam