[ https://issues.apache.org/jira/browse/KAFKA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Lalafaryan resolved KAFKA-10253.
-------------------------------------------
    Resolution: Not A Problem

It seems we have found the problem: it is related to the {{config.storage.topic}} topic, which was created with 3 partitions but should *always be created with a single partition.*

*_config.storage.topic_*

The name of the topic where connector and task configuration data are stored. This must be the same for all Workers with the same group.id. Kafka Connect will upon startup attempt to automatically create this topic with a single-partition and compacted cleanup policy to avoid losing data, but it will simply use the topic if it already exists. If you choose to create this topic manually, always create it as a compacted topic with a single partition and a high replication factor (3x or more).

> Kafka Connect gets into an infinite rebalance loop
> --------------------------------------------------
>
> Key: KAFKA-10253
> URL: https://issues.apache.org/jira/browse/KAFKA-10253
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 2.5.0
> Reporter: Konstantin Lalafaryan
> Priority: Blocker
>
> Hello everyone!
>
> We are running a kafka-connect cluster (3 workers), and very often it gets into
> an infinite rebalance loop.
>
> {code:java}
> 2020-07-09 08:51:25,731 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,731 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,733 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,735 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655831
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,735 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655831 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,735 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,735 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,736 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,739 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655832
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,739 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655832 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,739 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,739 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,740 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,742 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655833
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,742 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655833 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,742 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,742 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,744 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,746 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655834
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,746 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655834 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,746 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,746 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,748 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,750 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655835
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,750 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655835 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,750 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,750 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,751 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,754 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655836
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,754 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655836 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,754 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,754 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,755 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,757 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655837
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,757 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655837 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,757 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,757 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,759 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,761 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655838
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,761 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655838 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,761 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,761 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,763 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,766 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655839
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,766 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation 305655839 with protocol version 2
> and got assignment: Assignment{error=1,
> leader='connect-1-0008abc5-a152-42fe-a697-a4a4641f72bb',
> leaderUrl='http://10.20.30.221:8083/', offset=12, connectorIds=[],
> taskIds=[], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with
> rebalance delay: 0
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,766 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Rebalance started
> (org.apache.kafka.connect.runtime.distributed.WorkerCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,766 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] (Re-)joining group
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,768 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Was selected to perform assignments, but do not have latest
> config found in sync request. Returning an empty configuration to trigger
> re-sync.
> (org.apache.kafka.connect.runtime.distributed.IncrementalCooperativeAssignor)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,771 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Successfully joined group with generation 305655840
> (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
> [DistributedHerder-connect-1-1]
> 2020-07-09 08:51:25,771 INFO [Worker clientId=connect-1, groupId=
> kafka-connect] Joined group at generation
> {code}
> It is happening on all 3 workers.
>
> On the broker side we can see the following:
> {code:java}
> 2020-07-09 16:39:46,260 INFO [GroupCoordinator 0]: Preparing to rebalance
> group kafka-connect in state PreparingRebalance with old generation 311127279
> (__consumer_offsets-7) (reason: Updating metadata for member
> connect-1-bdf1c132-3311-493a-894a-5fffcac41ec7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-0]
> 2020-07-09 16:39:46,261 INFO [GroupCoordinator 0]: Stabilized group
> kafka-connect generation 311127280 (__consumer_offsets-7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-5]
> 2020-07-09 16:39:46,262 INFO [GroupCoordinator 0]: Assignment received from
> leader for group kafka-connect for generation 311127280
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-1]
> 2020-07-09 16:39:46,265 INFO [GroupCoordinator 0]: Preparing to rebalance
> group kafka-connect in state PreparingRebalance with old generation 311127280
> (__consumer_offsets-7) (reason: Updating metadata for member
> connect-1-bdf1c132-3311-493a-894a-5fffcac41ec7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-1]
> 2020-07-09 16:39:46,266 INFO [GroupCoordinator 0]: Stabilized group
> kafka-connect generation 311127281 (__consumer_offsets-7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-6]
> 2020-07-09 16:39:46,267 INFO [GroupCoordinator 0]: Assignment received from
> leader for group kafka-connect for generation 311127281
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-1]
> 2020-07-09 16:39:46,270 INFO [GroupCoordinator 0]: Preparing to rebalance
> group kafka-connect in state PreparingRebalance with old generation 311127281
> (__consumer_offsets-7) (reason: Updating metadata for member
> connect-1-bdf1c132-3311-493a-894a-5fffcac41ec7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-7]
> 2020-07-09 16:39:46,271 INFO [GroupCoordinator 0]: Stabilized group
> kafka-connect generation 311127282 (__consumer_offsets-7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-6]
> 2020-07-09 16:39:46,272 INFO [GroupCoordinator 0]: Assignment received from
> leader for group kafka-connect for generation 311127282
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-1]
> 2020-07-09 16:39:46,275 INFO [GroupCoordinator 0]: Preparing to rebalance
> group kafka-connect in state PreparingRebalance with old generation 311127282
> (__consumer_offsets-7) (reason: Updating metadata for member
> connect-1-bdf1c132-3311-493a-894a-5fffcac41ec7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-3]
> 2020-07-09 16:39:46,276 INFO [GroupCoordinator 0]: Stabilized group
> kafka-connect generation 311127283 (__consumer_offsets-7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-7]
> 2020-07-09 16:39:46,277 INFO [GroupCoordinator 0]: Assignment received from
> leader for group kafka-connect for generation 311127283
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-5]
> 2020-07-09 16:39:46,280 INFO [GroupCoordinator 0]: Preparing to rebalance
> group kafka-connect in state PreparingRebalance with old generation 311127283
> (__consumer_offsets-7) (reason: Updating metadata for member
> connect-1-bdf1c132-3311-493a-894a-5fffcac41ec7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-5]
> 2020-07-09 16:39:46,281 INFO [GroupCoordinator 0]: Stabilized group
> kafka-connect generation 311127284 (__consumer_offsets-7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-7]
> 2020-07-09 16:39:46,282 INFO [GroupCoordinator 0]: Assignment received from
> leader for group kafka-connect for generation 311127284
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-3]
> 2020-07-09 16:39:46,285 INFO [GroupCoordinator 0]: Preparing to rebalance
> group kafka-connect in state PreparingRebalance with old generation 311127284
> (__consumer_offsets-7) (reason: Updating metadata for member
> connect-1-bdf1c132-3311-493a-894a-5fffcac41ec7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-1]
> 2020-07-09 16:39:46,286 INFO [GroupCoordinator 0]: Stabilized group
> kafka-connect generation 311127285 (__consumer_offsets-7)
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-4]
> 2020-07-09 16:39:46,287 INFO [GroupCoordinator 0]: Assignment received from
> leader for group kafka-connect for generation 311127285
> (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-7]
> {code}
>
> Any feedback is appreciated!
> Thanks!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
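
The requirements quoted in the resolution for config.storage.topic (a single partition, a compacted cleanup policy, and a replication factor of 3 or more) can be expressed as a small check. The following is only a minimal Python sketch: TopicSettings and check_config_storage_topic are hypothetical names, not part of Kafka or Kafka Connect, and the values are assumed to have been looked up separately (for example from a topic description), not fetched by this code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopicSettings:
    """Hypothetical holder for topic settings that were looked up elsewhere."""
    partitions: int
    cleanup_policy: str
    replication_factor: int


def check_config_storage_topic(settings: TopicSettings) -> List[str]:
    """Return a list of deviations from the documented requirements.

    An empty list means the settings match what the quoted documentation
    asks for: one partition, compaction, replication factor >= 3.
    """
    problems: List[str] = []
    if settings.partitions != 1:
        problems.append(
            f"expected exactly 1 partition, found {settings.partitions}"
        )
    # Strict on purpose (an assumption of this sketch): only a pure
    # "compact" policy is accepted.
    if settings.cleanup_policy.strip() != "compact":
        problems.append(
            f"expected cleanup policy 'compact', found '{settings.cleanup_policy}'"
        )
    if settings.replication_factor < 3:
        problems.append(
            f"expected replication factor >= 3, found {settings.replication_factor}"
        )
    return problems


if __name__ == "__main__":
    # The situation described in this issue: 3 partitions instead of 1.
    for problem in check_config_storage_topic(TopicSettings(3, "compact", 3)):
        print(problem)
```

With the settings from this issue (3 partitions), the check reports only the partition count as a deviation.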