[ https://issues.apache.org/jira/browse/KAFKA-19862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Quah updated KAFKA-19862:
------------------------------
    Fix Version/s: 4.2.0
         Priority: Blocker  (was: Major)

> Group coordinator loading may fail when there is concurrent compaction
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-19862
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19862
>             Project: Kafka
>          Issue Type: Bug
>          Components: group-coordinator
>            Reporter: Sean Quah
>            Assignee: Sean Quah
>            Priority: Blocker
>             Fix For: 4.2.0
>
>
> For consumer and streams groups, we reject replay of 
> {{Consumer/StreamsGroupCurrentMemberAssignment}} records when we detect that a 
> partition/task is already owned by another member.
> During group coordinator load, we replay the records in 
> {{__consumer_offsets}}. When compaction runs concurrently, we can load 
> uncompacted data, followed by a newly swapped-in compacted segment, followed 
> by the uncompacted head of the log. Because compaction can drop a record in 
> favour of a later record for the same member, the record unassigning a 
> partition/task can be missed during loading.
> For example:
> We can load the record \{ Member A is assigned partition X },
> then miss the record \{ Member A is unassigned partition X },
> then load the record \{ Member B is assigned partition X }, which fails with 
> an exception such as:
> {{[GroupCoordinator id=2] Failed to load metadata from __consumer_offsets-4 
> with epoch 10 due to java.lang.RuntimeException: Replaying record 
> CoordinatorRecord(key=ConsumerGroupCurrentMemberAssignmentKey(groupId='...', 
> memberId='ZxHk7W53S_aHFdpxYc-_Jw'), 
> value=ApiMessageAndVersion(ConsumerGroupCurrentMemberAssignmentValue(memberEpoch=854659,
>  previousMemberEpoch=854633, state=0, 
> assignedPartitions=[TopicPartitions(topicId=9lL1aTMuSC22QAXsHgzhew, 
> partitions=[1, 2]), TopicPartitions(topicId=RHKM682KQYyOfF1XsOSF1A, 
> partitions=[0]), TopicPartitions(topicId=rKx9q1JmS1uP-ug_cj56ug, 
> partitions=[0]), TopicPartitions(topicId=I7EtFwesTRubnj-VHClqbQ, 
> partitions=[2]), TopicPartitions(topicId=ydAln6IUTZe-od9UUkn3rg, 
> partitions=[2])], partitionsPendingRevocation=[]) at version 0)) from 
> __consumer_offsets-4 at offset 3889549 with producer id -1 and producer epoch 
> -1 failed..}}
> {{java.lang.IllegalStateException: Cannot set the epoch of 
> RHKM682KQYyOfF1XsOSF1A-0 to 854659 because the partition is still owned at 
> epoch 853490}}
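The race can be reproduced in a toy model. The Java sketch below is hypothetical and heavily simplified: it is not the group coordinator's code, it tracks ownership by member id rather than by epoch, and every name in it ({{CompactionLoadRace}}, {{AssignmentRecord}}, {{compact}}, {{load}}) is invented for illustration. It assumes records are sorted by offset, that compaction keeps only the latest record per key below a clean boundary and leaves the head above it untouched, and that the loader switches from the old segments to the swapped-in log at a given offset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the KAFKA-19862 race; not Kafka's coordinator code.
class CompactionLoadRace {

    // One assignment record: keyed by memberId, value is the member's full partition list.
    record AssignmentRecord(long offset, String memberId, List<String> partitions) {}

    // Replays records and rejects one that assigns a partition still owned by another member.
    static final class OwnershipState {
        private final Map<String, String> ownerByPartition = new HashMap<>();
        private final Map<String, List<String>> partitionsByMember = new HashMap<>();

        void replay(AssignmentRecord r) {
            // Validate before mutating, so a rejected record leaves the state unchanged.
            for (String p : r.partitions()) {
                String owner = ownerByPartition.get(p);
                if (owner != null && !owner.equals(r.memberId())) {
                    throw new IllegalStateException("Cannot assign " + p + " to " + r.memberId()
                            + " at offset " + r.offset() + " because it is still owned by " + owner);
                }
            }
            // The new record replaces the member's previous assignment entirely.
            for (String p : partitionsByMember.getOrDefault(r.memberId(), List.of())) {
                ownerByPartition.remove(p);
            }
            for (String p : r.partitions()) {
                ownerByPartition.put(p, r.memberId());
            }
            partitionsByMember.put(r.memberId(), r.partitions());
        }

        String ownerOf(String partition) {
            return ownerByPartition.get(partition);
        }
    }

    // Keeps the latest record per key below cleanEnd; records at or above cleanEnd (the head) are kept as-is.
    static List<AssignmentRecord> compact(List<AssignmentRecord> log, long cleanEnd) {
        Map<String, Long> latestOffset = new HashMap<>();
        for (AssignmentRecord r : log) {
            if (r.offset() < cleanEnd) {
                latestOffset.put(r.memberId(), r.offset());
            }
        }
        List<AssignmentRecord> out = new ArrayList<>();
        for (AssignmentRecord r : log) {
            if (r.offset() >= cleanEnd || latestOffset.get(r.memberId()).longValue() == r.offset()) {
                out.add(r);
            }
        }
        return out;
    }

    // Replays offsets below swapOffset from the old log, then the remaining offsets from the new log.
    static OwnershipState load(List<AssignmentRecord> oldLog, List<AssignmentRecord> newLog, long swapOffset) {
        OwnershipState state = new OwnershipState();
        for (AssignmentRecord r : oldLog) {
            if (r.offset() < swapOffset) state.replay(r);
        }
        for (AssignmentRecord r : newLog) {
            if (r.offset() >= swapOffset) state.replay(r);
        }
        return state;
    }

    static List<AssignmentRecord> exampleLog() {
        return List.of(
                new AssignmentRecord(0, "A", List.of("X")),        // A is assigned X
                new AssignmentRecord(1, "A", List.of()),           // A is unassigned X
                new AssignmentRecord(2, "B", List.of("X")),        // B is assigned X
                new AssignmentRecord(3, "A", List.of("Y")),        // later record for A supersedes offsets 0 and 1
                new AssignmentRecord(4, "B", List.of("X", "Z")));  // uncompacted head
    }

    public static void main(String[] args) {
        List<AssignmentRecord> log = exampleLog();
        List<AssignmentRecord> compacted = compact(log, 4);
        System.out.println(load(log, log, 0).ownerOf("X"));             // B
        System.out.println(load(compacted, compacted, 0).ownerOf("X")); // B
        try {
            load(log, compacted, 1); // offset 0 from the old log, the rest from the compacted log
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Loading either the fully uncompacted log or the fully compacted log succeeds. Only the mixed load fails: it reads "A is assigned X" from the old segment, then finds that the compacted segment no longer contains "A is unassigned X" (superseded by A's record at offset 3), so "B is assigned X" is rejected.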



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
