The Spark 1.5 Kafka direct stream, as I understand it, does not store messages in a receiver; it fetches them from Kafka on demand as each batch is processed, and tracks the offsets so a failed batch can be re-read. That should prevent data loss.
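For reference, the direct-stream pattern looks roughly like the sketch below (untested; the broker address, batch interval, and "events" topic are placeholders, and the processing body is left empty):

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-sketch")
    val ssc  = new StreamingContext(conf, Seconds(5)) // batch interval is a placeholder

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // placeholder broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events")) // "events" topic is a placeholder

    stream.foreachRDD { rdd =>
      // Each RDD carries its Kafka offset ranges; nothing is buffered in a
      // receiver, so a failed batch is re-read from Kafka at the same offsets.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // Process the batch, then persist offsetRanges atomically with the
      // results, or make the writes idempotent, to get at-least-once delivery.
      rdd.foreachPartition { iter =>
        iter.foreach { record => /* side effects go here */ }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The key point is that offset bookkeeping happens in your output action, not in a receiver, so "at least once" comes down to when and where you record the offsets relative to your writes.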
On Fri, Oct 9, 2015 at 7:34 AM, bitborn <andrew.clark...@ave81.com> wrote:
> Hi all,
>
> My company is using Spark Streaming and the Kafka APIs to process an event
> stream. We've got most of our application written, but are stuck on "at
> least once" processing.
>
> I created a demo to show roughly what we're doing here:
> https://github.com/bitborn/resilient-kafka-streaming-in-spark
>
> The problem we're having is that when the application experiences an
> exception (network issue, out of memory, etc.) it will drop the batch it's
> processing. The ideal behavior is to process each event "at least once",
> even if that means processing it more than once. Whether this happens via
> checkpointing, the WAL, or Kafka offsets is irrelevant, as long as we
> don't drop data. :)
>
> A couple of things we've tried:
> - Using the Kafka direct stream API (via Cody Koeninger's
>   https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/IdempotentExample.scala)
> - Using checkpointing with both the low-level and high-level APIs
> - Enabling the write ahead log
>
> I've included a log here:
> https://github.com/bitborn/resilient-kafka-streaming-in-spark/blob/master/spark.log
> but I'm afraid it doesn't reveal much.
>
> The fact that others seem to be able to get this working properly suggests
> we're missing some magic configuration or are possibly executing it in a
> way that won't support the desired behavior.
>
> I'd really appreciate some pointers!
>
> Thanks much,
> Andrew Clarkson
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-streaming-at-least-once-semantics-tp24995.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.