Hi

How can I avoid duplicate processing of Kafka messages in Spark Streaming 1.3
when an executor fails?

1. Can I somehow access the accumulators of a failed task from the retried
task, so that I can skip the events on that partition which the failed task
already processed?

2. Or will I have to persist each processed message, check before processing
each message whether the failed task already handled it, and delete this
persisted information at the end of each batch? (A rough sketch of this idea
follows below.)
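To make option 2 concrete, here is a minimal sketch of the "persist and check
before processing" idea. It assumes a hypothetical external key-value store
wrapped by a `DedupStore` trait; `alreadyProcessed`, `markProcessed`, and
`clearBatch` are placeholder names, not a real library API, and the actual
message handling is elided.

```scala
import org.apache.spark.streaming.dstream.DStream

// Hypothetical client for an external store that remembers which message
// keys have already been handled in the current batch (assumption, not a
// real API).
trait DedupStore extends Serializable {
  def alreadyProcessed(key: String): Boolean
  def markProcessed(key: String): Unit
  def clearBatch(): Unit
}

def processWithDedup(messages: DStream[(String, String)], store: DedupStore): Unit = {
  messages.foreachRDD { rdd =>
    rdd.foreachPartition { iter =>
      iter.foreach { case (key, value) =>
        // Skip messages that a failed-and-retried task already handled.
        if (!store.alreadyProcessed(key)) {
          // ... actual processing of `value` goes here ...
          store.markProcessed(key)
        }
      }
    }
    // Drop the per-batch bookkeeping once the whole batch has succeeded.
    store.clearBatch()
  }
}
```

The key design point is that the dedup check and the processing happen
together per message, so a retried task simply skips whatever the failed
attempt already wrote to the store, and the bookkeeping is cleared only after
the whole batch completes.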
