Jason Gustafson created KAFKA-2841:
--------------------------------------
Summary: Group metadata cache loading is not safe when reloading a partition
Key: KAFKA-2841
URL: https://issues.apache.org/jira/browse/KAFKA-2841
Project: Kafka
Issue Type: Bug
Affects Versions: 0.9.0.0
Reporter: Jason Gustafson
Assignee: Jason Gustafson
Priority: Blocker
If the coordinator receives a LeaderAndIsr request that includes a higher
leader epoch for one of the partitions that it owns, it reloads the offsets
and group metadata from the offsets topic. This can happen without any change
of leadership, because the leader epoch is also incremented for ISR changes
that do not result in a new leader for the partition. Currently, the
coordinator blindly replaces cached metadata values on reloading, which can
cause unexpected behavior such as unexpected session timeouts or request
timeouts while rebalancing.
To fix this, we need to check that the group being loaded has a higher
generation than the cached value before replacing it. Also, if we do have to
replace a cached value (which shouldn't happen except when loading), we need to
be very careful to ensure that any delayed operations still active for the old
value won't affect the group.
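
A minimal sketch of the generation check described above, not the actual
coordinator code. The names GroupCacheSketch, GroupMeta and maybeReplace are
hypothetical and chosen only for illustration; the real cache also has to deal
with delayed operations, which the sketch only marks with a comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the coordinator's group metadata cache.
final class GroupCacheSketch {

    // Hypothetical, simplified view of a cached group: id plus generation.
    static final class GroupMeta {
        final String groupId;
        final int generationId;

        GroupMeta(String groupId, int generationId) {
            this.groupId = groupId;
            this.generationId = generationId;
        }
    }

    private final Map<String, GroupMeta> cache = new HashMap<>();

    // Installs the loaded group only if nothing is cached for it yet or the
    // loaded generation is strictly higher than the cached one.
    // Returns true if the cache was updated.
    synchronized boolean maybeReplace(GroupMeta loaded) {
        GroupMeta cached = cache.get(loaded.groupId);
        if (cached != null && cached.generationId >= loaded.generationId) {
            // Reload carried nothing newer: keep the live cached value.
            return false;
        }
        // Assumption: before this point a real coordinator would have to make
        // sure delayed operations tied to the old value cannot touch the
        // replacement. That part is not modeled here.
        cache.put(loaded.groupId, loaded);
        return true;
    }

    // Returns the cached generation, or -1 if the group is not cached.
    synchronized int generationOf(String groupId) {
        GroupMeta cached = cache.get(groupId);
        return cached == null ? -1 : cached.generationId;
    }
}
```

With this check, a reload that finds the same or an older generation in the
offsets topic leaves the cached group untouched, so a reload triggered by an
ISR-only epoch bump would no longer overwrite the cached value.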