Yes, that can happen. Kafka provides at-least-once processing semantics if you commit offsets after processing.
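For illustration, a minimal sketch of the commit-after-processing pattern with the Java consumer (untested; the broker address, group id, and topic name are placeholders -- note poll(long) is the 0.9/0.10-era API):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group");                // placeholder
        props.put("enable.auto.commit", "false");         // disable auto-commit
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // your processing logic goes here
                }
                // Commit only after processing succeeded. If we crash between
                // process() and commitSync(), the records are redelivered after
                // the rebalance: duplicates are possible, but nothing is lost.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}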
You can avoid duplicates if you commit offsets before processing (at-most-once semantics), but this might result in data loss. Getting exactly-once is quite hard, and you will need to build your own de-duplication logic.
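If you do need exactly-once results downstream, the usual trick is to make the write idempotent by keying it on (topic, partition, offset). A minimal in-memory sketch of that idea (the class and method names here are made up; a real deployment would have to persist the keys atomically with the output itself, e.g. by embedding the partition/offset range in the HDFS file name so a replayed write overwrites instead of duplicating):

import java.util.HashSet;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Tracks which records have already been processed, keyed by
// (topic, partition, offset). In-memory only: state is lost on restart,
// so this is a sketch of the idea, not a durable solution.
public class Deduplicator {
    private final Set<String> seen = new HashSet<>();

    // Returns true if the record has not been processed before.
    public boolean firstTime(ConsumerRecord<?, ?> record) {
        String key = record.topic() + "-" + record.partition() + "-" + record.offset();
        return seen.add(key); // Set.add returns false if the key was already present
    }
}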
-Matthias

On 2/10/17 10:40 AM, Michaud, Ben A wrote:
> Is it possible to receive duplicate messages from Kafka 0.9.0.1 or 0.10.1.0
> when you have a topic with three partitions and one consumer group with
> three consumer clients? One client stops consuming and is taken offline.
> These clients do not commit offsets immediately; the offsets are committed
> automatically after a default wait time setting. The partition assigned to
> the client that goes down is moved to another client in the same group
> automatically.
>
> Meanwhile, the client that went down gets some TLC. It still holds some
> messages that were retrieved but never fully processed. When it comes back
> up, it happily completes processing the data and writes it to HDFS.
>
> Will the second client be given uncommitted messages that the first client
> had already received but never committed? This would result in duplicate
> messages on HDFS, which is what we witnessed this week when just such a
> thing happened.
>
> Regards,
> Ben