[ https://issues.apache.org/jira/browse/KAFKA-16556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kirk True updated KAFKA-16556: ------------------------------ Description: There appears to be a race condition between invoking the {{ConsumerRebalanceListener}} callbacks on reconciliation and {{initWithCommittedOffsetsIfNeeded}} in the consumer. The membership manager adds the newly assigned partitions to the {{{}SubscriptionState{}}}, but marks them as {{{}pendingOnAssignedCallback{}}}. Then, after the {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, the membership manager will invoke {{enablePartitionsAwaitingCallback}} to set all of those partitions' 'pending' flag to false. During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already cached. Inside {{{}initWithCommittedOffsetsIfNeeded{}}}, the consumer calls the subscription's {{initializingPartitions}} method to get a set of the partitions for which to fetch their committed offsets. However, {{SubscriptionState.initializingPartitions()}} only returns partitions that have the {{pendingOnAssignedCallback}} flag set to to false. The result is: * If the {{MembershipManagerImpl.assignPartitions()}} future is completed on the background thread first, the 'pending' flag is set to false. On the application thread, when {{SubscriptionState.initializingPartitions()}} is called, it returns the partition, and we fetch its committed offsets * If instead the application thread calls {{SubscriptionState.initializingPartitions()}} first, the partitions's 'pending' flag is still set to false, and so the partition is omitted from the returned set. The {{updateFetchPositions()}} method then continues on and re-initializes the partition's fetch offset to 0. was: There appears to be a race condition between invoking the {{ConsumerRebalanceListener}} callbacks on reconciliation and {{initWithCommittedOffsetsIfNeeded}} in the consumer. The membership manager adds the newly assigned partitions to the {{{}SubscriptionState{}}}, but marks them as {{{}pendingOnAssignedCallback{}}}. Then, after the {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, the membership manager will invoke {{enablePartitionsAwaitingCallback}} to set all of those partitions' 'pending' flag to false. During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already cached. Inside {{{}initWithCommittedOffsetsIfNeeded{}}}, the consumer calls the subscription's {{initializingPartitions}} method to get a set of the partitions for which to fetch their committed offsets. However, {{SubscriptionState.initializingPartitions()}} only returns partitions that have the {{pendingOnAssignedCallback}} flag set to to false. The result is: * If the {{MembershipManagerImpl.assignPartitions()}} future is completed on the background thread first, the 'pending' flag is set to false. On the application thread, when {{SubscriptionState.initializingPartitions()}} is called, it returns the partition, and we fetch its committed offsets * If instead the application thread calls {{SubscriptionState.initializingPartitions()}} first, the partitions's 'pending' flag is still set to false, and so the partition is omitted from the returned set. The {{updateFetchPositions()}} method then continues on and re-initializes the partition's fetch offset to 0. > Race condition between ConsumerRebalanceListener and SubscriptionState > ---------------------------------------------------------------------- > > Key: KAFKA-16556 > URL: https://issues.apache.org/jira/browse/KAFKA-16556 > Project: Kafka > Issue Type: Bug > Components: clients, consumer > Affects Versions: 3.7.0 > Reporter: Kirk True > Assignee: Kirk True > Priority: Blocker > Labels: kip-848-client-support > Fix For: 3.8.0 > > > There appears to be a race condition between invoking the > {{ConsumerRebalanceListener}} callbacks on reconciliation and > {{initWithCommittedOffsetsIfNeeded}} in the consumer. > > The membership manager adds the newly assigned partitions to the > {{{}SubscriptionState{}}}, but marks them as > {{{}pendingOnAssignedCallback{}}}. Then, after the > {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, the > membership manager will invoke {{enablePartitionsAwaitingCallback}} to set > all of those partitions' 'pending' flag to false. > > During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to > call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already > cached. Inside {{{}initWithCommittedOffsetsIfNeeded{}}}, the consumer calls > the subscription's {{initializingPartitions}} method to get a set of the > partitions for which to fetch their committed offsets. However, > {{SubscriptionState.initializingPartitions()}} only returns partitions that > have the {{pendingOnAssignedCallback}} flag set to to false. > > The result is: > * If the {{MembershipManagerImpl.assignPartitions()}} future is completed > on the background thread first, the 'pending' flag is set to false. On the > application thread, when {{SubscriptionState.initializingPartitions()}} is > called, it returns the partition, and we fetch its committed offsets > * If instead the application thread calls > {{SubscriptionState.initializingPartitions()}} first, the partitions's > 'pending' flag is still set to false, and so the partition is omitted from > the returned set. The {{updateFetchPositions()}} method then continues on and > re-initializes the partition's fetch offset to 0. -- This message was sent by Atlassian Jira (v8.20.10#820010)