[ https://issues.apache.org/jira/browse/KAFKA-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang resolved KAFKA-10614.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

> Group coordinator onElection/onResignation should guard against leader epoch
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-10614
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10614
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Guozhang Wang
>            Assignee: Tom Bentley
>            Priority: Major
>             Fix For: 3.0.0
>
>
> When a sequence of LeaderAndISR or StopReplica requests is sent from
> different controllers, causing the group coordinator to be elected or to
> resign, the events may be reordered due to a race condition. For example:
> 1) A first LeaderAndISR request is received from the old controller, asking
> the broker to resign as the group coordinator.
> 2) A second LeaderAndISR request is received from the new controller, asking
> the broker to become the group coordinator.
> 3) Although the threads handling requests 1) and 2) are synchronized on the
> replica manager, their callback {{onLeadershipChange}} triggers
> {{onElection/onResignation}}, which schedule the loading / unloading on
> background threads that are not synchronized.
> 4) As a result, {{onElection}} may run first and {{onResignation}} second.
> The coordinator then does not recognize itself as the coordinator and
> responds to every coordinator request with {{NOT_COORDINATOR}}.
> Here are two proposals off the top of my head:
> 1) Have the scheduled load / unload function keep the passed-in leader
> epoch, and also materialize the epoch in memory. Then, when executing the
> unloading, check against the leader epoch.
> 2) This may be a bit simpler: use a single background thread working on a
> FIFO queue of loading / unloading jobs. Since the callers are synchronized on
> the replica manager and their order is preserved, the enqueued loading /
> unloading jobs would be correctly ordered as well, which avoids the
> reordering.

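Proposal 1) above can be illustrated with a minimal Java sketch. It is not the code that was merged for this issue: the class `EpochGuardedCoordinator`, its methods, and the rule that an event is applied only when its leader epoch is strictly newer than the last one recorded are all assumptions made for illustration. Loading and unloading of group metadata are stood in for by adding to and removing from an in-memory set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an epoch guard; not Kafka's actual GroupCoordinator.
final class EpochGuardedCoordinator {
    // Last leader epoch applied per partition (assumed to be materialized in memory).
    private final Map<Integer, Integer> lastEpoch = new HashMap<>();
    // Partitions for which this broker currently acts as coordinator.
    private final Set<Integer> owned = new HashSet<>();

    // Applies an election unless a newer or equal epoch was already seen.
    synchronized boolean onElection(int partition, int leaderEpoch) {
        if (!isNewer(partition, leaderEpoch)) {
            return false; // stale event, ignore
        }
        lastEpoch.put(partition, leaderEpoch);
        owned.add(partition); // stand-in for loading group metadata
        return true;
    }

    // Applies a resignation unless a newer or equal epoch was already seen.
    synchronized boolean onResignation(int partition, int leaderEpoch) {
        if (!isNewer(partition, leaderEpoch)) {
            return false; // stale event, ignore
        }
        lastEpoch.put(partition, leaderEpoch);
        owned.remove(partition); // stand-in for unloading group metadata
        return true;
    }

    synchronized boolean isCoordinator(int partition) {
        return owned.contains(partition);
    }

    // Assumption: only strictly newer epochs are accepted.
    private boolean isNewer(int partition, int leaderEpoch) {
        Integer current = lastEpoch.get(partition);
        return current == null || leaderEpoch > current;
    }
}
```

With this guard, if the election at epoch 5 happens to run before the resignation at epoch 4, the late resignation is dropped and the broker still reports itself as coordinator for that partition.
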
-- This message was sent by Atlassian Jira (v8.3.4#803005)