What if the processing is neither idempotent nor done in a transaction, say I am
posting events to an external server after processing?
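
For concreteness, here is a minimal sketch of that scenario, assuming Spark
Streaming 1.3 with the direct Kafka stream. The endpoint URL, broker address,
and topic name are placeholders I made up for illustration:

import java.io.OutputStream
import java.net.{HttpURLConnection, URL}

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PostEventsExample {
  // Placeholder HTTP POST; the external server and URL are hypothetical.
  def postToServer(endpoint: String, event: String): Unit = {
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    val out: OutputStream = conn.getOutputStream
    try out.write(event.getBytes("UTF-8")) finally out.close()
    conn.getResponseCode // force the request; response body ignored for brevity
    conn.disconnect()
  }

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("post-events"), Seconds(10))
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker:9092"), Set("events"))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { events =>
        events.foreach { case (_, value) =>
          // If the task dies midway, events posted so far are posted again
          // when the task is retried -- the duplicate problem discussed here.
          postToServer("http://external-server/events", value)
        }
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}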

Is it possible to access the accumulator of a failed task from its retry task? Is
there any way to detect whether a task is a retried task or the original attempt?
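
On the second question, a hedged sketch: inside a task, TaskContext exposes
attemptNumber (0 for the original attempt, greater for retries), so a retry can
at least be detected. As far as I know it gives no access to the failed
attempt's accumulator values, which the driver discards for failed tasks:

import org.apache.spark.TaskContext

// Meant to run inside foreachPartition; the event type matches the
// (key, value) pairs from the direct stream sketched above.
def processPartition(events: Iterator[(String, String)]): Unit = {
  val ctx = TaskContext.get()
  if (ctx.attemptNumber > 0) {
    // Re-executed task: the original attempt failed partway through.
    // Its accumulator updates were discarded by the driver, so the
    // counter from the failed attempt cannot be read here.
  }
  events.foreach { case (_, value) => () } // placeholder for real processing
}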

I was trying to achieve something like this: increment a counter after each
event is processed, and if the task fails, have the retry task skip the
already-processed events by reading the failed task's counter. Is it possible
to access accumulators on a per-task basis directly, without writing to HDFS
or HBase?
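
Since accumulator updates from failed attempts are not recoverable, one
workaround (my own sketch, not something from this thread) is to make the
external posts deduplicable instead: the direct stream exposes each partition's
offset range via HasOffsetRanges, so every event can be tagged with its
(topic, partition, offset) key and the receiving server can ignore keys it has
already seen. Reusing stream and postToServer from the first sketch:

import org.apache.spark.TaskContext
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // Offset ranges are only available on the RDD produced directly by the
  // Kafka direct stream, before any shuffling transformation.
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd.foreachPartition { events =>
    val range = offsetRanges(TaskContext.get().partitionId)
    var offset = range.fromOffset
    events.foreach { case (_, value) =>
      // (topic, partition, offset) uniquely identifies the event; a server
      // that remembers seen keys can drop re-posts from retried tasks.
      val key = s"${range.topic}-${range.partition}-$offset"
      postToServer(s"http://external-server/events?key=$key", value)
      offset += 1
    }
  }
}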

On Tue, Aug 11, 2015 at 3:15 AM, Cody Koeninger <c...@koeninger.org> wrote:

>
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
>
>
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-of-output-operations
>
> https://www.youtube.com/watch?v=fXnNEq1v3VA
>
>
> On Mon, Aug 10, 2015 at 4:32 PM, Shushant Arora <shushantaror...@gmail.com
> > wrote:
>
>> Hi
>>
>> How can I avoid duplicate processing of Kafka messages in Spark Streaming
>> 1.3 because of executor failure?
>>
>> 1. Can I somehow access the accumulators of a failed task in the retry task
>> to skip the events that were already processed by the failed task on this
>> partition?
>>
>> 2. Or will I have to persist each message processed, check before processing
>> each message whether it was already processed by the failed task, and delete
>> this persisted information at the end of each batch?
>>
>
>
