Yes. That could happen.

Kafka provides at-least-once processing semantics if you commit offsets
after processing.
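
For illustration, a minimal sketch of such a loop with the Java client
(a fragment, not a complete program: it assumes a surrounding main() and
a process() helper standing in for your own logic; broker address, group
id, and topic name are made up):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
props.put("enable.auto.commit", "false"); // commit manually instead
props.put("key.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        process(record); // a crash here means the batch gets re-delivered
    }
    consumer.commitSync(); // commit only after the whole batch is processed
}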

You can avoid duplicates if you commit offsets before processing, but
this gives you at-most-once semantics and might result in data loss.
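
That is the same loop with the commit moved in front of the processing
(again a sketch with the hypothetical process() from above):

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    consumer.commitSync(); // offsets committed before processing
    for (ConsumerRecord<String, String> record : records) {
        process(record); // a crash here loses these records for this group
    }
}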

Getting exactly-once semantics is quite hard, and you will need to build
your own de-duplication logic.
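
One common approach is to track the highest offset already written per
partition and skip anything at or below it. A rough sketch follows; note
that a real implementation would have to persist this state next to the
HDFS output (e.g., encoded in the file names) so it survives restarts --
keeping it only in memory, as here, would not help in your scenario:

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

Map<TopicPartition, Long> lastWritten = new HashMap<>();

for (ConsumerRecord<String, String> record : records) {
    TopicPartition tp = new TopicPartition(record.topic(), record.partition());
    Long last = lastWritten.get(tp);
    if (last != null && record.offset() <= last) {
        continue; // duplicate from re-delivery after a rebalance/restart
    }
    process(record);
    lastWritten.put(tp, record.offset());
}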


-Matthias

On 2/10/17 10:40 AM, Michaud, Ben A wrote:
> Is it possible to receive duplicate messages from Kafka 0.9.0.1 or 0.10.1.0 
> when you have a topic with three partitions and one consumer group with 
> three consumer clients? One client stops consuming and is taken offline. 
> These clients do not commit offsets immediately; the offsets are committed 
> automatically after a default wait-time setting. The partition assigned to 
> the client that goes down is moved to another client in the same group 
> automatically.
> 
> Meanwhile, the client that went down gets some TLC and still holds some 
> messages that were retrieved but never fully processed. When it comes back 
> up, it happily completes processing the data and writes it to HDFS.
> 
> Will the second client be given messages that the first client had 
> already received but never committed? This would result in duplicate 
> messages on HDFS, which is what we witnessed this week when just such a 
> thing happened.
> 
> Regards,
> Ben
> 
