Re: avoid duplicate due to executor failure in spark stream

2015-08-12 Thread Cody Koeninger
Accumulators aren't going to work to communicate state changes between executors. You need external storage. On Tue, Aug 11, 2015 at 11:28 AM, Shushant Arora shushantaror...@gmail.com wrote: …
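A minimal sketch of the "external storage" suggestion, assuming a reachable Redis instance and the Jedis client; the key scheme and the postOnce helper are illustrative, not from this thread:

import redis.clients.jedis.Jedis

object DedupSketch {
  // Skip an offset range that a previous (failed) attempt already posted.
  // A crash between posting and marking can still duplicate the range, so
  // this is at-least-once, not exactly-once.
  def postOnce(jedis: Jedis, topic: String, partition: Int, fromOffset: Long)
              (post: => Unit): Unit = {
    val key = s"done:$topic:$partition:$fromOffset"
    if (!jedis.exists(key)) {
      post                    // send the batch to the external server
      jedis.set(key, "done")  // mark the range so a retried task skips it
    }
  }

  def main(args: Array[String]): Unit = {
    val jedis = new Jedis("localhost", 6379)
    postOnce(jedis, "events", 0, 12345L) {
      // post the events for this offset range here
    }
    jedis.close()
  }
}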

Re: avoid duplicate due to executor failure in spark stream

2015-08-11 Thread Shushant Arora
What if processing is neither idempotent nor in a transaction, say I am posting events to some external server after processing. Is it possible to get the accumulator of a failed task in the retry task? Is there any way to detect whether a task is a retried task or the original task? I was trying to …
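On detecting retries: the accumulator of a failed task is not recoverable in the retry, but TaskContext does expose the attempt number in Spark releases of this era (whether 1.3 already has attemptNumber is worth verifying against your version), so treat this as a sketch of the idea only:

import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object RetryDetectSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("retry-detect").setMaster("local[2]"))
    sc.parallelize(1 to 10, numSlices = 2).foreachPartition { part =>
      if (TaskContext.get().attemptNumber() > 0) {
        // retried task: an earlier attempt may have posted some events already
      } else {
        // original attempt
      }
      part.foreach { n => /* post n to the external server here */ }
    }
    sc.stop()
  }
}

Knowing a task is a retry still does not tell you how far the failed attempt got; that progress has to live in external storage, as the reply above says.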

Re: avoid duplicate due to executor failure in spark stream

2015-08-10 Thread Cody Koeninger
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations
https://www.youtube.com/watch?v=fXnNEq1v3VA
On Mon, Aug 10, 2015 at 4:32 PM, Shushant …
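For reference, the direct-approach pattern those links describe looks roughly like this against the Spark 1.3 Kafka integration; broker and topic names here are placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("direct-kafka"), Seconds(10))
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("events"))

    stream.foreachRDD { rdd =>
      // RDDs from the direct stream carry the exact offset range per partition,
      // which is what lets you store results and offsets together.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreachPartition { partition =>
        val range = offsetRanges(TaskContext.get().partitionId())
        // process the partition, then persist output atomically with
        // range.untilOffset (or make the output idempotent)
        partition.foreach(_ => ())
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}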

avoid duplicate due to executor failure in spark stream

2015-08-10 Thread Shushant Arora
Hi, how can I avoid duplicate processing of Kafka messages in Spark Streaming 1.3 because of executor failure?
1. Can I somehow access the accumulators of a failed task in the retry task, to skip the events already processed by the failed task on this partition?
2. Or will I have to persist each …
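Option 2 (persisting progress externally so a retried task can skip what the failed attempt already sent) might look like this sketch; Redis via Jedis and the key scheme are assumptions for illustration:

import redis.clients.jedis.Jedis

object ResumeSketch {
  // messages pairs each payload with its Kafka offset; partitionKey identifies
  // the topic-partition, e.g. "progress:events:0"
  def postRemaining(jedis: Jedis, partitionKey: String,
                    messages: Iterator[(Long, String)]): Unit = {
    val lastDone = Option(jedis.get(partitionKey)).map(_.toLong).getOrElse(-1L)
    messages
      .filter { case (offset, _) => offset > lastDone } // skip what a failed attempt finished
      .foreach { case (offset, msg) =>
        // post msg to the external server, then record progress; a crash
        // between the two steps can still duplicate this one message
        jedis.set(partitionKey, offset.toString)
      }
  }
}

Persisting after every message is expensive, and the post/record gap above still allows a one-message duplicate, which is why the replies steer toward idempotent output or storing offsets transactionally together with results.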