GitHub user ernisv opened a pull request: https://github.com/apache/storm/pull/1873

    Kafka spout - no duplicates on leader changes

    Currently the Kafka spout emits duplicate tuples whenever the Kafka topic's leaders change. When a leader change causes an exception, the PartitionManagers are simply recreated, which loses the state about which tuples were already emitted, so the new PartitionManager emits them again.

    This is fine in that at-least-once is still fulfilled, but it would be better not to emit duplicate data where possible. Moreover, this can easily be avoided by moving the state related to emitted tuples from the old PartitionManager to the new one.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ernisv/storm kafka_spout_no_dup_on_leader_changes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/1873.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1873

----
commit 9b1e336ed03a320b25e5799655eb64571ce21c48
Author: Ernestas Vaiciukevicius <e.vaiciukevic...@adform.com>
Date:   2017-01-12T14:54:59Z

    Move state from old PartitionManager when recreating manager for same partition

commit a1a7cef9c84941ef8a1909fd4db10c85fe509e0e
Author: Ernestas Vaiciukevicius <e.vaiciukevic...@adform.com>
Date:   2017-01-12T15:39:51Z

    Test to check if old PartitionManager's state is moved to new manager during manager recreate

commit c8c6ee83d69cf76d8aaeb9d5ccaedbd5946d4c9b
Author: Ernestas Vaiciukevicius <e.vaiciukevic...@adform.com>
Date:   2017-01-12T15:57:46Z

    Include _emittedToOffset when copying state during PartitionManager recreate

----
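
For readers who want the idea in code, here is a minimal Java sketch of the state hand-over the description refers to. It is not the code from this pull request: `SketchPartitionManager`, its fields, and its methods are hypothetical, simplified stand-ins for storm-kafka's PartitionManager, and the sketch assumes that the relevant state is just the next offset to emit plus the set of emitted but not yet acked offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for storm-kafka's PartitionManager.
// Names and structure are illustrative assumptions, not the real class's API.
class SketchPartitionManager {
    private final int partition;
    private long emittedToOffset;                           // next offset to emit
    private final TreeSet<Long> pending = new TreeSet<>();  // emitted, not yet acked

    // Fresh manager: starts again from the last committed offset.
    SketchPartitionManager(int partition, long committedOffset) {
        this.partition = partition;
        this.emittedToOffset = committedOffset;
    }

    // Recreated manager: takes over the in-memory state of the one it replaces.
    SketchPartitionManager(SketchPartitionManager previous) {
        this.partition = previous.partition;
        this.emittedToOffset = previous.emittedToOffset;
        this.pending.addAll(previous.pending);
    }

    // Emits up to maxCount offsets below logEndOffset and returns them.
    List<Long> emit(int maxCount, long logEndOffset) {
        List<Long> emitted = new ArrayList<>();
        while (emitted.size() < maxCount && emittedToOffset < logEndOffset) {
            pending.add(emittedToOffset);
            emitted.add(emittedToOffset);
            emittedToOffset++;
        }
        return emitted;
    }

    void ack(long offset) {
        pending.remove(offset);
    }

    int pendingCount() {
        return pending.size();
    }
}

class LeaderChangeSketch {
    public static void main(String[] args) {
        SketchPartitionManager old = new SketchPartitionManager(0, 100L);
        System.out.println(old.emit(3, 110L));   // [100, 101, 102]

        // Leader change, manager rebuilt from the committed offset: duplicates.
        SketchPartitionManager fresh = new SketchPartitionManager(0, 100L);
        System.out.println(fresh.emit(3, 110L)); // [100, 101, 102]

        // Leader change, state carried over: emission continues without repeats.
        SketchPartitionManager moved = new SketchPartitionManager(old);
        System.out.println(moved.emit(3, 110L)); // [103, 104, 105]
    }
}
```

In this sketch the copy constructor is the whole fix: a manager rebuilt from the committed offset repeats offsets 100 to 102, while one built from the old manager resumes at 103 and still tracks the three unacked offsets.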