[ https://issues.apache.org/jira/browse/KAFKA-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582972#comment-17582972 ]
Sagar Rao commented on KAFKA-14000:
-----------------------------------

I took a look at this today. Looking at the trace logs, the messages all belong to the same task, and all of them would be sent to the same partition. But the logs seem jumbled up, in the sense that older generations are read after newer ones. I also see that the tasks were moved around from one worker to another. So, were there task failures or rebalances happening?

The connector status push to the status topic happens asynchronously, and deletion is not considered for a safe put. I think that if tasks are moved around, they would be assigned to different producers, and the ordering guarantees could be broken even within a single partition.

One approach could be to sort by generation id for the same task. The issue there is that delete messages won't have any generation set in the Kafka messages (as they are null). So we won't know at what point the delete happened.

> Kafka-connect standby server shows empty tasks list
> ---------------------------------------------------
>
>                 Key: KAFKA-14000
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14000
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 2.6.0
>            Reporter: Xinyu Zou
>            Assignee: Sagar Rao
>            Priority: Major
>         Attachments: kafka-connect-trace.log
>
>
> I'm using Kafka-connect in distributed mode. There are two servers, one
> active and one standby. The standby server sometimes shows an empty tasks
> list in the status REST API response.
> curl host:8443/connectors/name1/status
> {code:java}
> {
>   "connector": {
>     "state": "RUNNING",
>     "worker_id": "1.2.3.4:10443"
>   },
>   "name": "name1",
>   "tasks": [],
>   "type": "source"
> } {code}
> I enabled TRACE logging and checked. As required, the connect-status topic
> is set to cleanup.policy=compact. But messages in the topic are not
> compacted immediately; they are compacted at a specific interval. So there
> is usually more than one message with the same key. E.g. when kafka-connect
> is launched, there is no connector running. Then we start a new connector.
> There will then be two messages in the connect-status topic:
> status-task-name1 : state=RUNNING, workerId='10.251.170.166:10443',
> generation=100
> status-task-name1 : _<empty>_
> Please check the log file [^kafka-connect-trace.log]. We can see that the
> task status was removed in the end. But the empty status was actually not
> the newest message in the connect-status topic.
>
> When reading status from the connect-status topic, it doesn't sort messages
> by generation.
> [https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerRecords.java]
> So I think this could be improved. We can either sort the messages after
> poll or compare the generation value before we choose the correct status
> message.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
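
For illustration, the "compare generation before choosing the status message" idea discussed above could look roughly like the sketch below. This is not Kafka Connect's actual code: the StatusReplay and Status classes, the replay method, and the UNASSIGNED/101 sample values are all hypothetical and only model the records described in this issue. It also shows the open problem from the comment: a delete (null value) has no generation, so it cannot be ordered against other records and is simply applied in read order here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are not Kafka Connect classes.
final class StatusReplay {

    /** One record read from the status topic. A null state models a delete (tombstone). */
    static final class Status {
        final String key;
        final String state;    // null for a tombstone
        final int generation;  // meaningless when state is null

        Status(String key, String state, int generation) {
            this.key = key;
            this.state = state;
            this.generation = generation;
        }

        boolean isTombstone() {
            return state == null;
        }
    }

    /**
     * Replays records in read order. A non-delete record is ignored when the
     * record already held for the same key has a higher generation.
     */
    static Map<String, Status> replay(List<Status> records) {
        Map<String, Status> latest = new HashMap<>();
        for (Status record : records) {
            if (record.isTombstone()) {
                // Assumption: a tombstone carries no generation, so there is
                // nothing to compare; this sketch applies it as read.
                latest.remove(record.key);
                continue;
            }
            Status current = latest.get(record.key);
            if (current == null || record.generation >= current.generation) {
                latest.put(record.key, record);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // The generation-100 record is read after the generation-101 record.
        List<Status> records = Arrays.asList(
            new Status("status-task-name1", "RUNNING", 101),
            new Status("status-task-name1", "UNASSIGNED", 100));
        Map<String, Status> latest = replay(records);
        System.out.println(latest.get("status-task-name1").state);
    }
}
```

With this check, a stale record that is read late no longer overwrites a newer one. A late tombstone would still clear the status, which is the case described in this issue, so the tombstone would need to carry ordering information some other way for this approach to cover it.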