wehbi created KAFKA-10654:
-----------------------------

Summary: connector has failed, but worker status was ok
Key: KAFKA-10654
URL: https://issues.apache.org/jira/browse/KAFKA-10654
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Affects Versions: 2.1.1
Environment: Kafka distribution: Confluent CE
             Kafka version: kafka_2.12-5.4.0-ccs.jar
Reporter: wehbi

Hello,

We are using the Kafka Mongo sink connector (see the configuration below), with multiple connectors on multiple topics. Recently one of the connectors stopped working, while the others continued to operate normally within the same worker.

Looking into the connector logs (see the extract below), we can see that the Kafka topic leader was not available. The worker service status was running (systemctl service). Restarting the worker service solved the problem.

Why was the connector not able to recover automatically? How can we monitor and detect this failure?

For information:
Kafka distribution: Confluent CE
Kafka version: kafka_2.12-5.4.0-ccs.jar
{"class":"com.mongodb.kafka.connect.MongoSinkConnector","type":"sink","version":"1.0.1"}

We have distributed workers. I checked the task status before restarting the worker, and it reported the task as running (not failed). I also tried pause/resume for the task, but it had no effect.

We are already monitoring the connector metrics (using Prometheus/Grafana), and they never detected the task failure. All metrics indicate that everything is fine.

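Since neither the task status nor the metrics flagged the problem, one possible stopgap for detection is to scan the worker log for the ConsumerCoordinator warning that appears in the extract below ("The following subscribed topics are not assigned to any members"). The following is only a minimal sketch under that assumption: the function name `find_unassigned_topics` and the `UnassignedTopics` record are made up for illustration, and the regular expression is based solely on the log format seen in this report, so it may need adjusting for other log layouts or Kafka versions.

```python
import re
from typing import Iterable, List, NamedTuple


class UnassignedTopics(NamedTuple):
    """One occurrence of the 'not assigned to any members' warning (hypothetical record)."""
    timestamp: str
    client_id: str
    group_id: str
    topics: List[str]


# Pattern derived only from the log extract in this report; it is an assumption
# that other workers and versions log the warning in the same shape.
_UNASSIGNED = re.compile(
    r"\[(?P<ts>[^\]]+)\] WARN \[Consumer clientId=(?P<client>[^,]+), "
    r"groupId=(?P<group>[^\]]+)\] The following subscribed topics are "
    r"not assigned to any members: \[(?P<topics>[^\]]*)\]"
)


def find_unassigned_topics(lines: Iterable[str]) -> List[UnassignedTopics]:
    """Return one record per log line that reports subscribed topics with no assignee."""
    hits = []
    for line in lines:
        match = _UNASSIGNED.search(line)
        if match is None:
            continue
        # The warning lists topics comma-separated inside square brackets.
        topics = [t.strip() for t in match.group("topics").split(",") if t.strip()]
        hits.append(
            UnassignedTopics(
                timestamp=match.group("ts"),
                client_id=match.group("client"),
                group_id=match.group("group"),
                topics=topics,
            )
        )
    return hits
```

Such a function could be fed the lines of the Connect worker log from a periodic job, raising an alert whenever it returns a non-empty list. It only reports that the warning was logged; it does not tell whether the consumer recovered afterwards, so it would complement rather than replace a check on consumer group lag.
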
----------------------------------- connector logs -----------------
[2020-09-25 11:53:52,352] WARN [Consumer clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] Received unknown topic or partition error in fetch for partition RAMOWNER.ADHERENT-1 (org.apache.kafka.clients.consumer.internals.Fetcher:1246)
[2020-09-25 11:53:52,353] WARN [Consumer clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] Received unknown topic or partition error in fetch for partition RAMOWNER.ADHERENT-4 (org.apache.kafka.clients.consumer.internals.Fetcher:1246)
[2020-09-25 11:53:52,353] WARN [Consumer clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] Received unknown topic or partition error in fetch for partition RAMOWNER.ADHERENT-7 (org.apache.kafka.clients.consumer.internals.Fetcher:1246)
[2020-09-25 11:53:52,365] WARN [Consumer clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] Error while fetching metadata with correlation id 20822125 : {RAMOWNER.ADHERENT=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient:1063)
[2020-09-25 11:53:52,374] INFO [Consumer clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] Revoke previously assigned partitions RAMOWNER.ADHERENT-3, RAMOWNER.ADHERENT-2, RAMOWNER.ADHERENT-1, RAMOWNER.ADHERENT-0, RAMOWNER.ADHERENT-7, RAMOWNER.ADHERENT-6, RAMOWNER.ADHERENT-5, RAMOWNER.ADHERENT-4 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:286)
[2020-09-25 11:53:52,472] WARN [Consumer clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] Error while fetching metadata with correlation id 20822127 : {RAMOWNER.ADHERENT=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient:1063)
[2020-09-25 11:53:52,472] WARN [Consumer clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] The following subscribed topics are not assigned to any members: [RAMOWNER.ADHERENT] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:570)
[2020-09-25 11:53:52,597] WARN [Consumer clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] Error while fetching metadata with correlation id 20822129 : {RAMOWNER.ADHERENT=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient:1063)

topics = [RAMOWNER.ADHERENT]
topics = [RAMOWNER.ADHERENT]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)