David Arthur created KAFKA-12686:
------------------------------------
Summary: Race condition in AlterIsr response handling
Key: KAFKA-12686
URL: https://issues.apache.org/jira/browse/KAFKA-12686
Project: Kafka
Issue Type: Bug
Affects Versions: 2.8.0, 2.7.0
Reporter: David Arthur
Assignee: David Arthur
Fix For: 3.0.0
In Partition.scala, there is a race condition between the handling of an
AlterIsrResponse and a LeaderAndIsrRequest. The scenario is rare and requires
the AlterIsrResponse to be delayed for some time, but it is possible. It was
observed in a test environment when many ISR and leadership changes were
happening due to broker restarts.
When the leader handles the LeaderAndIsrRequest, it calls Partition#makeLeader,
which overwrites the {{isrState}} variable and clears the pending ISR items via
{{AlterIsrManager#clearPending(TopicPartition)}}.
The bug is that AlterIsrManager does not check its inflight state before
clearing pending items. The way AlterIsrManager is designed, it retains
inflight items in the pending items collection until the response is processed
(to allow for retries). The result is that an inflight item is inadvertently
removed from this collection.
Since the inflight item is cleared from the collection, AlterIsrManager allows
for new AlterIsrItems to be enqueued for this partition even though it has an
inflight AlterIsrItem. By allowing an update to be enqueued, Partition will
transition its {{isrState}} to one of the inflight states (PendingIsrExpand,
PendingIsrShrink, etc). Once the response for the inflight item is handled,
Partition will not update the {{isrState}} because it detects changes since the
request was sent (which is by design). However, after the response callback is
run, AlterIsrManager will clear the partitions that it saw in the response from
the unsent items collection. This includes the newly added (and unsent) update.
The result is that Partition has an "inflight" isrState but AlterIsrManager does
not have an unsent item for this partition. This prevents any further ISR
updates on the partition until the next leader election (when {{isrState}} is
reset).
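The sequence above can be replayed in a small, single-threaded toy model. This
is only a sketch under stated assumptions, not Kafka's actual code: the class
{{ToyAlterIsrManager}}, its methods {{enqueue}}, {{send}}, {{handleResponse}}
and {{hasPending}}, and the boolean standing in for {{isrState}} are all
hypothetical names invented for illustration. Only {{clearPending}} mirrors a
method named in this report, and its body here is a simplification.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the race described in this report; not Kafka's actual code. */
public class AlterIsrRaceSketch {

    /** Hypothetical stand-in for AlterIsrManager. */
    static class ToyAlterIsrManager {
        // Unsent items keyed by partition. Inflight items stay in this map
        // until their response is handled (to allow for retries).
        private final Map<String, String> unsentItems = new HashMap<>();
        private final Set<String> inflightPartitions = new HashSet<>();

        /** Returns false if the partition already has an item in the collection. */
        boolean enqueue(String partition, String update) {
            return unsentItems.putIfAbsent(partition, update) == null;
        }

        /** Marks every item currently in the collection as inflight. */
        void send() {
            inflightPartitions.clear();
            inflightPartitions.addAll(unsentItems.keySet());
        }

        /** The problematic step: removes the item without checking inflight state. */
        void clearPending(String partition) {
            unsentItems.remove(partition);
        }

        /** Drops every partition seen in the response from the unsent items. */
        void handleResponse() {
            for (String partition : inflightPartitions) {
                unsentItems.remove(partition);
            }
            inflightPartitions.clear();
        }

        boolean hasPending(String partition) {
            return unsentItems.containsKey(partition);
        }
    }

    /** Replays the sequence; returns true if the partition ends up stuck. */
    static boolean replayRace() {
        ToyAlterIsrManager manager = new ToyAlterIsrManager();
        String tp = "topic-0";

        // 1. Partition enqueues an ISR update and the request is sent.
        boolean isrStatePending = manager.enqueue(tp, "shrink");
        manager.send();

        // 2. LeaderAndIsrRequest arrives: makeLeader resets isrState and
        //    clears pending items, removing the inflight item as well.
        isrStatePending = false;
        manager.clearPending(tp);

        // 3. A new update is accepted because the collection looks empty,
        //    so Partition moves to a pending state again.
        isrStatePending = manager.enqueue(tp, "expand");

        // 4. The delayed response arrives. Partition ignores it because its
        //    state changed, but the manager still clears the partition,
        //    which removes the new, never-sent update.
        manager.handleResponse();

        // Stuck: Partition waits on an update the manager no longer holds.
        return isrStatePending && !manager.hasPending(tp);
    }

    public static void main(String[] args) {
        System.out.println("stuck = " + replayRace());
    }
}
```

In this model the partition ends in a pending state while the manager holds no
item for it, which matches the stuck condition described above. A real
reproduction would additionally need the concurrent request and response
handling that this sketch flattens into a fixed order.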
If this bug is encountered, the workaround is to force a leader election which
will reset the partition's state.