Hi

How can I avoid duplicate processing of Kafka messages in Spark Streaming 1.3
when an executor fails?

1. Can I somehow access the accumulators of a failed task from the retried
task, so that I can skip the events on that partition which the failed task
already processed?

2. Or will I have to persist each processed message, check before processing
each message whether the failed task already handled it, and delete this
persisted information at the end of each batch? (A rough sketch of this idea
follows below.)
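To make option 2 concrete, here is a minimal sketch of the "persist and check
before processing" idea. It assumes a hypothetical external key-value store
wrapped by a `DedupStore` trait; `alreadyProcessed`, `markProcessed`, and
`clearBatch` are placeholder names, not a real library API, and the actual
message handling is elided.

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical client for an external store that remembers which message
// keys have already been handled in the current batch (assumption, not a
// real API).
trait DedupStore extends Serializable {
  def alreadyProcessed(key: String): Boolean
  def markProcessed(key: String): Unit
  def clearBatch(): Unit
}

def processWithDedup(messages: DStream[(String, String)], store: DedupStore): Unit = {
  messages.foreachRDD { rdd =>
    rdd.foreachPartition { iter =>
      iter.foreach { case (key, value) =>
        // Skip messages that a failed-and-retried task already handled.
        if (!store.alreadyProcessed(key)) {
          // ... actual processing of `value` goes here ...
          store.markProcessed(key)
        }
      }
    }
    // Drop the per-batch bookkeeping once the whole batch has succeeded.
    store.clearBatch()
  }
}
```

The key design point is that the dedup check and the processing happen
together per message, so a retried task simply skips whatever the failed
attempt already wrote to the store, and the bookkeeping is cleared only after
the whole batch completes.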
