[ https://issues.apache.org/jira/browse/KAFKA-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Jacot reassigned KAFKA-16147: ----------------------------------- Assignee: David Jacot > Partition is assigned to two members at the same time > ----------------------------------------------------- > > Key: KAFKA-16147 > URL: https://issues.apache.org/jira/browse/KAFKA-16147 > Project: Kafka > Issue Type: Sub-task > Reporter: Emanuele Sabellico > Assignee: David Jacot > Priority: Major > Attachments: broker1.log, broker2.log, broker3.log, librdkafka.log, > server.properties, server1.properties, server2.properties > > > While running [test 0113 of > librdkafka|https://github.com/confluentinc/librdkafka/blob/8b6357f872efe2a5a3a2fd2828e4133f85e6b023/tests/0113-cooperative_rebalance.cpp#L2384], > subtest _u_multiple_subscription_changes_ have received this error saying > that a partition is assigned to two members at the same time. > {code:java} > Error: C_6#consumer-9 is assigned rdkafkatest_rnd550f20623daba04c_0113u_2 [0] > which is already assigned to consumer C_5#consumer-8 {code} > I've reconstructed this sequence: > C_5 SUBSCRIBES TO T1 > {noformat} > %7|1705403451.561|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id > "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 6, group instance id > "(null)", current assignment "", subscribe topics > "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]"{noformat} > C_5 ASSIGNMENT CHANGES TO T1-P7, T1-P8, T1-P12 > {noformat} > [2024-01-16 12:10:51,562] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw > transitioned from CurrentAssignment(memberEpoch=6, previousMemberEpoch=0, > targetMemberEpoch=6, state=assigning, assignedPartitions={}, > partitionsPendingRevocation={}, > partitionsPendingAssignment={IKXGrFR1Rv-Qes7Ummas6A=[3, 12]}) to > CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, > targetMemberEpoch=14, state=stable, > assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, > partitionsPendingRevocation={}, partitionsPendingAssignment={}). > (org.apache.kafka.coordinator.group.GroupMetadataManager){noformat} > > C_5 RECEIVES TARGET ASSIGNMENT > {noformat} > %7|1705403451.565|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat response received target assignment > "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], > (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{noformat} > > C_5 ACKS TARGET ASSIGNMENT > {noformat} > %7|1705403451.566|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id > "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id > "NULL", current assignment > "rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[7], > rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[8], > rdkafkatest_rnd5a91902462d61c2e_0113u_1(IKXGrFR1Rv+Qes7Ummas6A)[12]", > subscribe topics "rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]" > %7|1705403451.567|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat response received target assignment > "(null)(IKXGrFR1Rv+Qes7Ummas6A)[7], (null)(IKXGrFR1Rv+Qes7Ummas6A)[8], > (null)(IKXGrFR1Rv+Qes7Ummas6A)[12]"{noformat} > > C_5 SUBSCRIBES TO T1,T2: T1 partitions are revoked, 5 T2 partitions are > pending > {noformat} > %7|1705403452.612|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id > "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id > "NULL", current assignment "NULL", subscribe topics > "rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], > rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]" > [2024-01-16 12:10:52,615] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw updated > its subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, > rdkafkatest_rnd5a91902462d61c2e_0113u_1]. > (org.apache.kafka.coordinator.group.GroupMetadataManager) > [2024-01-16 12:10:52,616] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw > transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, > targetMemberEpoch=14, state=stable, > assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, > partitionsPendingRevocation={}, partitionsPendingAssignment={}) to > CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, > targetMemberEpoch=16, state=revoking, assignedPartitions={}, > partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, > partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}). > (org.apache.kafka.coordinator.group.GroupMetadataManager) > %7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat response received target assignment ""{noformat} > C_5 FINISHES REVOCATION > {noformat} > %7|1705403452.618|CGRPJOINSTATE|C_5#consumer-8| [thrd:main]: Group > "rdkafkatest_rnd53b4eb0c2de343_0113u" changed join state wait-assign-call -> > steady (state up){noformat} > C_5 ACKS REVOCATION, RECEIVES T2-P0,T2-P1,T2-P2 > > {noformat} > %7|1705403452.618|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id > "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id > "NULL", current assignment "", subscribe topics "NULL" > [2024-01-16 12:10:52,618] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw > transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, > targetMemberEpoch=16, state=revoking, assignedPartitions={}, > partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, > partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}) to > CurrentAssignment(memberEpoch=16, previousMemberEpoch=14, > targetMemberEpoch=16, state=assigning, > assignedPartitions={EnZMikZURKiUoxZf0rozaA=[0, 1, 2]}, > partitionsPendingRevocation={}, > partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[8, 9]}). > (org.apache.kafka.coordinator.group.GroupMetadataManager) > %7|1705403452.620|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat response received target assignment > "(null)(EnZMikZURKiUoxZf0rozaA)[0], (null)(EnZMikZURKiUoxZf0rozaA)[1], > (null)(EnZMikZURKiUoxZf0rozaA)[2]" > %7|1705403452.621|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat of member id "RaTCu6RXQH-FiSl95iZzdw", group id > "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 16, group instance id > "NULL", current assignment > "rdkafkatest_rnd550f20623daba04c_0113u_2(EnZMikZURKiUoxZf0rozaA)[0], > rdkafkatest_rnd550f20623daba04c_0113u_2(EnZMikZURKiUoxZf0rozaA)[1], > rdkafkatest_rnd550f20623daba04c_0113u_2(EnZMikZURKiUoxZf0rozaA)[2]", > subscribe topics "NULL" > %7|1705403452.622|HEARTBEAT|C_5#consumer-8| [thrd:main]: GroupCoordinator/1: > Heartbeat response received target assignment > "(null)(EnZMikZURKiUoxZf0rozaA)[0], (null)(EnZMikZURKiUoxZf0rozaA)[1], > (null)(EnZMikZURKiUoxZf0rozaA)[2]" > {noformat} > LATER C_6 SUBSCRIBES TO T2,T1, IT RECEIVES T1-P9,T1-P13,T1-P15 AND T2-P0. > T2-P0 is assigned without being revoked from C_5. > {noformat} > %7|1705403453.612|HEARTBEAT|C_6#consumer-9| [thrd:main]: GroupCoordinator/1: > Heartbeat of member id "ss189cxiQ4q_cyNB1aNrfA", group id > "rdkafkatest_rnd53b4eb0c2de343_0113u", generation id 14, group instance id > "NULL", current assignment "NULL", subscribe topics > "rdkafkatest_rnd550f20623daba04c_0113u_2((null))[-1], > rdkafkatest_rnd5a91902462d61c2e_0113u_1((null))[-1]" > [2024-01-16 12:10:53,612] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Member ss189cxiQ4q_cyNB1aNrfA updated > its subscribed topics to: [rdkafkatest_rnd550f20623daba04c_0113u_2, > rdkafkatest_rnd5a91902462d61c2e_0113u_1]. > (org.apache.kafka.coordinator.group.GroupMetadataManager) > [2024-01-16 12:10:53,613] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Bumped group epoch to 17. > (org.apache.kafka.coordinator.group.GroupMetadataManager) > [2024-01-16 12:10:53,614] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Computed a new target assignment for > epoch 17 with 'uniform' assignor: {SWlBlUvBQpqdl93hD6SJOA=Assignment(error=0, > partitions={IKXGrFR1Rv-Qes7Ummas6A=[2, 4, 5]}, > metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 > lim=0 cap=0])), RaTCu6RXQH-FiSl95iZzdw=Assignment(error=0, > partitions={EnZMikZURKiUoxZf0rozaA=[1, 2, 8, 9, 10]}, > metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 > lim=0 cap=0])), mhzGxxF2R-Oa3BQGotzx7A=Assignment(error=0, > partitions={EnZMikZURKiUoxZf0rozaA=[3, 4, 5, 6, 7]}, > metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 > lim=0 cap=0])), i7qqdVw4S0KZR5J5Z0XvoQ=Assignment(error=0, > partitions={IKXGrFR1Rv-Qes7Ummas6A=[1, 3, 7]}, > metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 > lim=0 cap=0])), d6cbfWsfTt-7XdYU23MNaw=Assignment(error=0, > partitions={IKXGrFR1Rv-Qes7Ummas6A=[10, 11, 14]}, > metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 > lim=0 cap=0])), ss189cxiQ4q_cyNB1aNrfA=Assignment(error=0, > partitions={IKXGrFR1Rv-Qes7Ummas6A=[8, 13, 15], EnZMikZURKiUoxZf0rozaA=[0]}, > metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 > lim=0 cap=0])), 2vIQ0F6NRlOd9RgnVKtuDg=Assignment(error=0, > partitions={IKXGrFR1Rv-Qes7Ummas6A=[0, 6, 9, 12]}, > metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 > lim=0 cap=0])), pOvUx45IT-CJDvBQlfvVbQ=Assignment(error=0, > partitions={EnZMikZURKiUoxZf0rozaA=[11, 12, 13, 14, 15]}, > metadata=VersionedMetadata(version=0, metadata=java.nio.HeapByteBuffer[pos=0 > lim=0 cap=0]))}. (org.apache.kafka.coordinator.group.GroupMetadataManager) > [2024-01-16 12:10:53,614] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Member ss189cxiQ4q_cyNB1aNrfA > transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=9, > targetMemberEpoch=14, state=stable, > assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[13, 15]}, > partitionsPendingRevocation={}, partitionsPendingAssignment={}) to > CurrentAssignment(memberEpoch=17, previousMemberEpoch=14, > targetMemberEpoch=17, state=stable, > assignedPartitions={IKXGrFR1Rv-Qes7Ummas6A=[8, 13, 15], > EnZMikZURKiUoxZf0rozaA=[0]}, partitionsPendingRevocation={}, > partitionsPendingAssignment={}). > (org.apache.kafka.coordinator.group.GroupMetadataManager) > [0m%7|1705403453.616|HEARTBEAT|C_6#consumer-9| [thrd:main]: > GroupCoordinator/1: Heartbeat response received target assignment > "(null)(IKXGrFR1Rv+Qes7Ummas6A)[8], (null)(IKXGrFR1Rv+Qes7Ummas6A)[13], > (null)(IKXGrFR1Rv+Qes7Ummas6A)[15], > (null)(EnZMikZURKiUoxZf0rozaA)[0]"{noformat} > I also see in consecutive broker logs that the same partition T2-P0, but also > T2-P1 and T2-p2, is assigned to C_5 (RaTCu6RXQH-FiSl95iZzdw) and pending > revocation from a third one (member id mhzGxxF2R-Oa3BQGotzx7A) > {noformat} > [2024-01-16 12:10:52,618] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Member RaTCu6RXQH-FiSl95iZzdw > transitioned from CurrentAssignment(memberEpoch=14, previousMemberEpoch=6, > targetMemberEpoch=16, state=revoking, assignedPartitions={}, > partitionsPendingRevocation={IKXGrFR1Rv-Qes7Ummas6A=[7, 8, 12]}, > partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 8, 9]}) to > CurrentAssignment(memberEpoch=16, previousMemberEpoch=14, > targetMemberEpoch=16, state=assigning, > assignedPartitions={EnZMikZURKiUoxZf0rozaA=[0, 1, 2]}, > partitionsPendingRevocation={}, > partitionsPendingAssignment={EnZMikZURKiUoxZf0rozaA=[8, 9]}). > (org.apache.kafka.coordinator.group.GroupMetadataManager) > [2024-01-16 12:10:52,620] INFO [GroupCoordinator id=1 > topic=__consumer_offsets partition=7] [GroupId > rdkafkatest_rnd53b4eb0c2de343_0113u] Member mhzGxxF2R-Oa3BQGotzx7A > transitioned from CurrentAssignment(memberEpoch=15, previousMemberEpoch=11, > targetMemberEpoch=15, state=stable, > assignedPartitions={EnZMikZURKiUoxZf0rozaA=[0, 1, 2, 3, 4, 5, 6, 7]}, > partitionsPendingRevocation={}, partitionsPendingAssignment={}) to > CurrentAssignment(memberEpoch=15, previousMemberEpoch=11, > targetMemberEpoch=16, state=revoking, > assignedPartitions={EnZMikZURKiUoxZf0rozaA=[3, 4, 5, 6, 7]}, > partitionsPendingRevocation={EnZMikZURKiUoxZf0rozaA=[0, 1, 2]}, > partitionsPendingAssignment={}). > (org.apache.kafka.coordinator.group.GroupMetadataManager){noformat} > NOTE: error handling in my current branch is disabled because it needs to be > integrated, there is an assert that fails the tests if any error happens, so > effects of retriable errors can be excluded. > The coordinator was broker1. > I attach the broker properties too. -- This message was sent by Atlassian Jira (v8.20.10#820010)