The Spark 1.5 Kafka direct stream, I think, does not store messages; rather,
it fetches them from Kafka on demand as each batch is consumed in the
pipeline. That should prevent you from losing data.
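
For reference, here's a minimal sketch of that pattern in Scala (this
assumes the Spark 1.5 spark-streaming-kafka artifact; the broker address,
topic name, and output action are just placeholders):

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

  object DirectStreamSketch {
    def main(args: Array[String]): Unit = {
      val ssc = new StreamingContext(
        new SparkConf().setAppName("direct-stream-sketch"), Seconds(5))

      // Placeholder broker and topic -- substitute your own.
      val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
      val topics = Set("events")

      // No receiver and no buffering: each batch is just a set of offset
      // ranges that executors read straight from Kafka when the batch runs.
      val stream = KafkaUtils.createDirectStream[
        String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

      stream.foreachRDD { rdd =>
        // The direct stream tracks offsets itself, so they're available
        // per batch; persist them only after your output succeeds, and a
        // crash will replay the batch (at-least-once) instead of skipping it.
        val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        rdd.foreach { case (_, value) => println(value) } // placeholder output
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }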



On Fri, Oct 9, 2015 at 7:34 AM, bitborn <andrew.clark...@ave81.com> wrote:

> Hi all,
>
> My company is using Spark Streaming and the Kafka APIs to process an
> event stream. We've got most of our application written, but are stuck
> on "at least once" processing.
>
> I created a demo to show roughly what we're doing here:
> https://github.com/bitborn/resilient-kafka-streaming-in-spark
>
> The problem we're having is that when the application hits an exception
> (network issue, out of memory, etc.) it drops the batch it's processing.
> The ideal behavior is that it processes each event "at least once," even
> if that means processing some events more than once. Whether this happens
> via checkpointing, the WAL, or Kafka offsets is irrelevant, as long as we
> don't drop data. :)
>
> A couple of things we've tried (a sketch of the checkpoint/WAL setup
> follows below):
> - Using the Kafka direct stream API (via Cody Koeninger's
>   https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/IdempotentExample.scala)
> - Using checkpointing with both the low-level and high-level APIs
> - Enabling the write-ahead log
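>
> Roughly, the checkpoint/WAL setup looks like this (a minimal sketch; the
> checkpoint directory and app name are placeholders, and the WAL flag only
> affects receiver-based streams):
>
>   import org.apache.spark.SparkConf
>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>
>   val checkpointDir = "hdfs:///tmp/streaming-checkpoint" // placeholder
>
>   def createContext(): StreamingContext = {
>     val conf = new SparkConf()
>       .setAppName("checkpointed-stream")
>       // Write-ahead log: only used by receiver-based (high-level) streams.
>       .set("spark.streaming.receiver.writeAheadLog.enable", "true")
>     val ssc = new StreamingContext(conf, Seconds(5))
>     ssc.checkpoint(checkpointDir)
>     // ... define the DStream transformations here before returning ...
>     ssc
>   }
>
>   // Recover from the checkpoint on restart instead of building a fresh
>   // context, which would drop whatever was in flight.
>   val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
>   ssc.start()
>   ssc.awaitTermination()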
>
> I've included a log here:
> https://github.com/bitborn/resilient-kafka-streaming-in-spark/blob/master/spark.log
> but I'm afraid it doesn't reveal much.
>
> The fact that others seem to be able to get this working properly
> suggests we're missing some magic configuration, or are possibly running
> it in a way that can't support the desired behavior.
>
> I'd really appreciate some pointers!
>
> Thanks much,
> Andrew Clarkson
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-streaming-at-least-once-semantics-tp24995.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
