[
https://issues.apache.org/jira/browse/KAFKA-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guozhang Wang updated KAFKA-2329:
---------------------------------
Status: In Progress (was: Patch Available)
> Consumers balance fails when multiple consumers are started simultaneously.
> ---------------------------------------------------------------------------
>
> Key: KAFKA-2329
> URL: https://issues.apache.org/jira/browse/KAFKA-2329
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 0.8.2.1, 0.8.1.1
> Reporter: Ze'ev Eli Klapow
> Assignee: Ze'ev Eli Klapow
> Labels: consumer, patch
> Fix For: 0.8.1.2
>
> Attachments: zookeeper-consumer-connector-epoch-node.patch
>
>
> During consumer startup, a race condition can occur if multiple consumers are
> started (nearly) simultaneously.
> If a second consumer is started while the first consumer is in the middle of
> {{zkClient.subscribeChildChanges}}, the first consumer will never see the
> registration of the second consumer: the second consumer's registration node
> is left unwatched, and no new child is registered later to trigger a
> notification. As a result, the first consumer owns all partitions and never
> releases ownership, causing the second consumer to fail rebalancing.
> The attached patch solves this by using an "epoch" node, which all consumers
> watch and update to trigger a rebalance. When a rebalance is triggered, we
> check the consumer registrations against a cached state to avoid unnecessary
> rebalances. For safety, we also periodically check the consumer registrations
> and rebalance. We have been using this patch in production at HubSpot for a
> while, and it has eliminated all rebalance issues.
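
The epoch-node idea described above can be sketched roughly as follows. This
is not the attached patch: it is a minimal in-memory simulation with no
ZooKeeper involved, and every name in it (EpochRegistry, Consumer,
maybe_rebalance, periodic_check) is hypothetical. The round-robin partition
split is also an assumption chosen for brevity, not Kafka's actual assignment
strategy. It only illustrates the three pieces the description mentions: a
shared epoch that every consumer watches and bumps, a cached view of the
registrations used to skip unnecessary rebalances, and a periodic safety
check.

```python
class EpochRegistry:
    """In-memory stand-in for the consumer registrations and the epoch node."""

    def __init__(self):
        self.consumers = set()   # registered consumer ids
        self.epoch = 0           # value of the hypothetical "epoch" node
        self.watchers = []       # consumers watching the epoch node

    def register(self, consumer):
        # Register, start watching the epoch, then bump it so that every
        # watcher (including the newcomer) re-reads the registrations.
        self.consumers.add(consumer.consumer_id)
        self.watchers.append(consumer)
        self.bump_epoch()

    def bump_epoch(self):
        self.epoch += 1
        for watcher in list(self.watchers):
            watcher.on_epoch_change()


class Consumer:
    def __init__(self, consumer_id, registry, partitions):
        self.consumer_id = consumer_id
        self.registry = registry
        self.partitions = list(partitions)
        self.cached_consumers = None  # registrations seen at last rebalance
        self.owned = []
        self.rebalance_count = 0

    def start(self):
        self.registry.register(self)

    def on_epoch_change(self):
        self.maybe_rebalance()

    def periodic_check(self):
        # Safety net: re-read registrations even if no epoch change arrived.
        return self.maybe_rebalance()

    def maybe_rebalance(self):
        current = frozenset(self.registry.consumers)
        if current == self.cached_consumers:
            return False  # registrations unchanged, skip the rebalance
        self.cached_consumers = current
        members = sorted(current)
        index = members.index(self.consumer_id)
        # Assumed round-robin split over the sorted member list.
        self.owned = self.partitions[index::len(members)]
        self.rebalance_count += 1
        return True


registry = EpochRegistry()
c1 = Consumer("c1", registry, range(4))
c2 = Consumer("c2", registry, range(4))
c1.start()
c2.start()
print(c1.owned)  # [0, 2]
print(c2.owned)  # [1, 3]
```

In this sketch the second consumer's registration always reaches the first
one, because both watch the same epoch value rather than relying on a child
watch being in place at exactly the right moment. An epoch bump with
unchanged registrations leaves rebalance_count untouched, which is the role
the cached state plays in the description.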