[ https://issues.apache.org/jira/browse/KAFKA-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede closed KAFKA-262.
-------------------------------
> Bug in the consumer rebalancing logic causes one consumer to release
> partitions that it does not own
> ----------------------------------------------------------------------------------------------------
>
> Key: KAFKA-262
> URL: https://issues.apache.org/jira/browse/KAFKA-262
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.7
> Reporter: Neha Narkhede
> Assignee: Neha Narkhede
> Fix For: 0.7.1
>
> Attachments: kafka-262-v3.patch, kafka-262.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> The consumer maintains a cache of the topics and partitions it owns, along
> with the corresponding fetcher queues. However, this cache is not cleared when
> the consumer releases partition ownership. As a result, on a later rebalancing
> attempt the consumer can release a partition that it no longer owns. It can
> also commit offsets for partitions that it no longer consumes from.
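The offset-commit side effect can be illustrated with a small model. This is a minimal sketch, assuming that the commit step writes an offset for every partition in the cached topic registry; `CachedConsumer`, `offset_store` and the offsets 10, 20 and 35 are hypothetical and made up for illustration, not Kafka's actual classes or data.

```python
# Hypothetical model of the stale topic-registry cache. `offset_store` stands
# in for the offsets committed to ZooKeeper; class and method names are
# illustrative, not Kafka's actual API.

class CachedConsumer:
    def __init__(self, offset_store, clear_cache_on_release):
        self.offset_store = offset_store
        self.clear_cache_on_release = clear_cache_on_release
        self.topic_registry = {}  # cached partition -> consumed offset

    def commit_offsets(self):
        # Rebalance step 2: commit an offset for every cached partition.
        self.offset_store.update(self.topic_registry)

    def release_partition_ownership(self):
        # Rebalance step 3: ownership nodes would be deleted here (not modelled).
        if self.clear_cache_on_release:
            self.topic_registry.clear()  # the clearing the report says is missing

    def add_partition(self, partition, offset):
        # Rebalance step 4: record a newly owned partition in the cache.
        self.topic_registry[partition] = offset


def committed_offset_for_0_1(clear_cache_on_release):
    """c1 loses 0-1 to c2, then commits offsets on its next rebalance."""
    offset_store = {}
    c1 = CachedConsumer(offset_store, clear_cache_on_release)
    c1.add_partition("0-0", 10)
    c1.add_partition("0-1", 20)
    c1.commit_offsets()
    c1.release_partition_ownership()
    c1.add_partition("0-0", 10)  # c1 only gets 0-0 back
    offset_store["0-1"] = 35     # c2 consumes 0-1 and commits offset 35
    c1.commit_offsets()          # c1's next rebalance, step 2
    return offset_store["0-1"]


print(committed_offset_for_0_1(clear_cache_on_release=False))  # 20
print(committed_offset_for_0_1(clear_cache_on_release=True))   # 35
```

With the stale cache, c1 overwrites c2's committed offset for 0-1 with its own older value; clearing the cache on release leaves c2's offset alone.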
> The rebalance operation goes through the following steps:
> 1. Close fetchers.
> 2. Commit offsets.
> 3. Release partition ownership.
> 4. Rebalance: add the topic, partition and fetcher queues to the topic
> registry for all topics that the consumer process currently wants to own.
> 5. If the consumer runs into a conflict for any topic or partition, the
> rebalancing attempt fails and it goes back to step 1.
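The five steps above can be sketched as a retry loop. This is a hedged sketch, not the actual consumer code: `zk_owners` stands in for the ZooKeeper ownership registry, `assign` for the partition assignment recomputed on each attempt, and `max_attempts` is an assumed bound since the report does not say how many attempts are made. The sketch clears the cache right after releasing, which is the behaviour the report says was missing.

```python
# Sketch of the five-step rebalance loop. `zk_owners` stands in for the
# ZooKeeper ownership registry and `assign` for the per-attempt partition
# assignment; both are hypothetical, as is the max_attempts bound.

def rebalance(consumer_id, assign, zk_owners, owned_cache, max_attempts=4):
    """Run rebalance attempts; return the 1-based attempt that succeeded, or None."""
    for attempt in range(1, max_attempts + 1):
        # Steps 1 and 2: close fetchers and commit offsets (not modelled here).
        # Step 3: release ownership of every partition in the local cache.
        for partition in owned_cache:
            zk_owners.pop(partition, None)
        owned_cache.clear()  # forget released partitions (the missing step in KAFKA-262)
        # Step 4: claim every partition this attempt's assignment asks for.
        conflict = False
        for partition in sorted(assign(attempt)):
            if zk_owners.get(partition, consumer_id) != consumer_id:
                conflict = True  # another consumer already owns it
                break
            zk_owners[partition] = consumer_id
            owned_cache.add(partition)
        # Step 5: a conflict fails the attempt; loop back to step 1.
        if not conflict:
            return attempt
    return None


def example_assignment(attempt):
    """Hypothetical assignment: the first attempt still wants 0-1, later ones do not."""
    return {"0-0", "0-1"} if attempt == 1 else {"0-0"}


owners = {"0-1": "c2"}  # c2 already owns 0-1
cache = set()
print(rebalance("c1", example_assignment, owners, cache))  # 2
print(owners["0-0"], owners["0-1"])                        # c1 c2
```

The first attempt claims 0-0, then hits the conflict on 0-1; the second attempt releases 0-0 in step 3 and claims it again cleanly.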
> Say there are 2 consumers in a group, c1 and c2. Both consume topic1, whose
> partitions are 0-0, 0-1 and 1-0 (brokerId-partitionId). c1 owns 0-0 and 0-1,
> and c2 owns 1-0.
> 1. Broker 1 goes down. This triggers a rebalancing attempt in c1 and c2.
> 2. c1 releases partition ownership, but its cache is not cleared, so it still
> lists 0-0 and 0-1. During step 4 (above), c1 fails to rebalance.
> 3. Meanwhile, c2 completes rebalancing successfully, takes ownership of
> partition 0-1 and starts consuming data.
> 4. c1 starts its next rebalancing attempt. During step 3 (above), it releases
> partition 0-1, which is still in its stale cache, and so deletes c2's
> ownership. During step 4, it owns partition 0-0 again and starts consuming
> data.
> 5. Rebalancing has effectively completed, but no owner is registered in
> ZooKeeper for partition 0-1, even though c2 is consuming from it.
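The scenario above can be replayed with a toy model to check the outcome. This is a minimal sketch under stated assumptions: `zk_owners` is a plain dict standing in for the ZooKeeper owner registry, release deletes the owner entry for every cached partition without checking who owns it, and c1's failed first attempt is assumed to claim nothing. `replay_kafka_262` and its helpers are hypothetical names.

```python
# Toy replay of the KAFKA-262 scenario. `zk_owners` stands in for the
# ZooKeeper owner registry; all function names are hypothetical.

def replay_kafka_262(clear_cache_on_release):
    """Replay the five-step scenario; return the owner map at the end."""
    zk_owners = {"0-0": "c1", "0-1": "c1", "1-0": "c2"}
    cache = {"c1": {"0-0", "0-1"}, "c2": {"1-0"}}

    def release(consumer):
        # Rebalance step 3: delete the owner entry of every cached partition.
        for partition in cache[consumer]:
            zk_owners.pop(partition, None)
        if clear_cache_on_release:
            cache[consumer] = set()

    def claim(consumer, partition):
        # Rebalance step 4: register ownership and cache the partition.
        zk_owners[partition] = consumer
        cache[consumer].add(partition)

    # 1. Broker 1 goes down; both consumers start rebalancing.
    # 2. c1 releases its partitions; its step 4 fails (assumed to claim nothing).
    release("c1")
    # 3. c2 rebalances successfully and now owns 0-1.
    release("c2")
    claim("c2", "0-1")
    # 4. c1 retries: step 3 releases its cached partitions, step 4 claims 0-0.
    release("c1")
    claim("c1", "0-0")
    return zk_owners


print(sorted(replay_kafka_262(clear_cache_on_release=False).items()))  # [('0-0', 'c1')]
print(sorted(replay_kafka_262(clear_cache_on_release=True).items()))   # [('0-0', 'c1'), ('0-1', 'c2')]
```

With the stale cache, the registry ends up with no owner for 0-1; when the cache is cleared on release, c2's ownership of 0-1 survives c1's second attempt.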
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)