[ 
https://issues.apache.org/jira/browse/KAFKA-16556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirk True updated KAFKA-16556:
------------------------------
    Description: 
There appears to be a race condition between invoking the 
{{ConsumerRebalanceListener}} callbacks on reconciliation and 
{{initWithCommittedOffsetsIfNeeded}} in the consumer.
 
The membership manager adds the newly assigned partitions to the 
{{SubscriptionState}}, but marks them as {{pendingOnAssignedCallback}}. 
Then, after the {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, 
the membership manager will invoke {{enablePartitionsAwaitingCallback}} to set 
all of those partitions' 'pending' flag to false.
 
During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to 
call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already 
cached. Inside {{initWithCommittedOffsetsIfNeeded}}, the consumer calls the 
subscription's {{initializingPartitions}} method to get a set of the partitions 
for which to fetch their committed offsets. However, 
{{SubscriptionState.initializingPartitions()}} only returns partitions that 
have the {{pendingOnAssignedCallback}} flag set to false.
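
To make the filtering concrete, here is a minimal sketch of the flag handling described above. It is a simplified stand-in, not the Kafka {{SubscriptionState}} class: {{PendingFlagModel}} and {{markPendingOnAssignedCallback}} are hypothetical names, partitions are plain strings, and the model tracks only the 'pending' flag, ignoring any other state the real class may consult.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for SubscriptionState; not the Kafka class.
final class PendingFlagModel {
    // Partition name -> value of its 'pending' (pendingOnAssignedCallback) flag.
    private final Map<String, Boolean> pending = new LinkedHashMap<>();

    // Membership manager side: add newly assigned partitions, marked as pending.
    // (Hypothetical method name, used only for this illustration.)
    synchronized void markPendingOnAssignedCallback(Set<String> partitions) {
        for (String partition : partitions) {
            pending.put(partition, true);
        }
    }

    // Invoked after onPartitionsAssigned() completes: clear the pending flag.
    synchronized void enablePartitionsAwaitingCallback(Set<String> partitions) {
        for (String partition : partitions) {
            pending.computeIfPresent(partition, (key, flag) -> false);
        }
    }

    // Returns only the partitions whose pending flag is false.
    synchronized Set<String> initializingPartitions() {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Boolean> entry : pending.entrySet()) {
            if (!entry.getValue()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PendingFlagModel state = new PendingFlagModel();
        Set<String> assigned = Set.of("topic-0");

        state.markPendingOnAssignedCallback(assigned);
        // The partition is assigned but still pending, so it is not returned.
        System.out.println("before callback: " + state.initializingPartitions());

        state.enablePartitionsAwaitingCallback(assigned);
        // Once the flag is cleared, the partition becomes visible.
        System.out.println("after callback: " + state.initializingPartitions());
    }
}
```

In this model a partition that is assigned but still awaiting its callback is invisible to {{initializingPartitions()}}, which is the window the race depends on.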
 
The result is:
 * If the {{MembershipManagerImpl.assignPartitions()}} future is completed on 
the background thread first, the 'pending' flag is set to false. On the 
application thread, when {{SubscriptionState.initializingPartitions()}} is 
called, it returns the partition, and we fetch its committed offsets.
 * If instead the application thread calls 
{{SubscriptionState.initializingPartitions()}} first, the partition's 
'pending' flag is still set to true, and so the partition is omitted from the 
returned set. The {{updateFetchPositions()}} method then continues on and 
re-initializes the partition's fetch offset to 0.
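
The two orderings can be replayed deterministically in a small sketch. Everything here is an assumption made for illustration: {{RaceOrderingSketch}}, its methods, and the committed offset of 42 are hypothetical, and the real consumer interleaves these steps across the background and application threads rather than running them in sequence.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical replay of the two orderings; none of these names are Kafka APIs.
final class RaceOrderingSketch {
    static final long COMMITTED_OFFSET = 42L; // assumed committed offset for the demo
    static final long RESET_OFFSET = 0L;      // position used when the partition is skipped

    // Application-thread step: pick the starting position for one partition.
    static long updateFetchPosition(AtomicBoolean pendingOnAssignedCallback) {
        if (!pendingOnAssignedCallback.get()) {
            // The partition is visible, so its committed offset is fetched.
            return COMMITTED_OFFSET;
        }
        // The partition was omitted, so the position falls through to a reset.
        return RESET_OFFSET;
    }

    // Runs both steps in the given order and returns the resulting position.
    static long run(boolean callbackCompletesFirst) {
        AtomicBoolean pending = new AtomicBoolean(true); // newly assigned partition
        if (callbackCompletesFirst) {
            pending.set(false); // background thread clears the flag first
            return updateFetchPosition(pending);
        }
        long position = updateFetchPosition(pending); // application thread runs first
        pending.set(false); // flag is cleared too late to affect the position
        return position;
    }

    public static void main(String[] args) {
        System.out.println("callback first -> " + run(true));
        System.out.println("application thread first -> " + run(false));
    }
}
```

Under these assumptions the first ordering starts from the committed offset, while the second starts from 0 even though a committed offset exists.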

  was:
There appears to be a race condition between invoking the 
{{ConsumerRebalanceListener}} callbacks on reconciliation and 
{{initWithCommittedOffsetsIfNeeded}} in the consumer.
 
The membership manager adds the newly assigned partitions to the 
{{SubscriptionState}}, but marks them as {{pendingOnAssignedCallback}}. 
Then, after the {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, 
the membership manager will invoke {{enablePartitionsAwaitingCallback}} to set 
all of those partitions' 'pending' flag to false.
 
During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to 
call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already 
cached. Inside {{initWithCommittedOffsetsIfNeeded}}, the consumer calls the 
subscription's {{initializingPartitions}} method to get a set of the partitions 
for which to fetch their committed offsets. However, 
{{SubscriptionState.initializingPartitions()}} only returns partitions that 
have the {{pendingOnAssignedCallback}} flag set to false.
 
The result is: * If the {{MembershipManagerImpl.assignPartitions()}} future is 
completed on the background thread first, the 'pending' flag is set to false. 
On the application thread, when {{SubscriptionState.initializingPartitions()}} 
is called, it returns the partition, and we fetch its committed offsets.
 * If instead the application thread calls 
{{SubscriptionState.initializingPartitions()}} first, the partition's 
'pending' flag is still set to true, and so the partition is omitted from the 
returned set. The {{updateFetchPositions()}} method then continues on and 
re-initializes the partition's fetch offset to 0.


> Race condition between ConsumerRebalanceListener and SubscriptionState
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-16556
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16556
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 3.7.0
>            Reporter: Kirk True
>            Assignee: Kirk True
>            Priority: Blocker
>              Labels: kip-848-client-support
>             Fix For: 3.8.0
>
>
> There appears to be a race condition between invoking the 
> {{ConsumerRebalanceListener}} callbacks on reconciliation and 
> {{initWithCommittedOffsetsIfNeeded}} in the consumer.
>  
> The membership manager adds the newly assigned partitions to the 
> {{SubscriptionState}}, but marks them as 
> {{pendingOnAssignedCallback}}. Then, after the 
> {{ConsumerRebalanceListener.onPartitionsAssigned()}} completes, the 
> membership manager will invoke {{enablePartitionsAwaitingCallback}} to set 
> all of those partitions' 'pending' flag to false.
>  
> During the main {{Consumer.poll()}} loop, {{AsyncKafkaConsumer}} may need to 
> call {{initWithCommittedOffsetsIfNeeded()}} if the positions aren't already 
> cached. Inside {{initWithCommittedOffsetsIfNeeded}}, the consumer calls 
> the subscription's {{initializingPartitions}} method to get a set of the 
> partitions for which to fetch their committed offsets. However, 
> {{SubscriptionState.initializingPartitions()}} only returns partitions that 
> have the {{pendingOnAssignedCallback}} flag set to false.
>  
> The result is:
>  * If the {{MembershipManagerImpl.assignPartitions()}} future is completed 
> on the background thread first, the 'pending' flag is set to false. On the 
> application thread, when {{SubscriptionState.initializingPartitions()}} is 
> called, it returns the partition, and we fetch its committed offsets.
>  * If instead the application thread calls 
> {{SubscriptionState.initializingPartitions()}} first, the partition's 
> 'pending' flag is still set to true, and so the partition is omitted from 
> the returned set. The {{updateFetchPositions()}} method then continues on and 
> re-initializes the partition's fetch offset to 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
