[ https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804874#comment-15804874 ]
Edoardo Comar edited comment on KAFKA-4441 at 1/6/17 4:16 PM:
--------------------------------------------------------------

For the {{UnderReplicatedPartitions}} metric, the Gauge defined inside {{ReplicaManager}} needs to be able to make a check like
{{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}}

The current startup ordering inside {{KafkaServer}} has the {{ReplicaManager}} start before the {{KafkaController}}.
Could the order be reversed?

Otherwise, the {{ReplicaManager}} could be assigned a {{DeletionChecker}} function after the {{KafkaController}} has started. This would be minimally disruptive to the current code.

[~ijuma] [~junrao] any preferences?


was (Author: ecomar):
    For the {{UnderReplicatedPartitions}} metrics, the Gauge defined inside {{ReplicaManager}} needs to be able to make a check
{{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}}

We will follow up with another PR


> Kafka Monitoring is incorrect during rapid topic creation and deletion
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-4441
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4441
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.0.0, 0.10.0.1
>            Reporter: Tom Crayford
>            Assignee: Edoardo Comar
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics trigger when rapidly creating and deleting topics in a
> tight loop, although the actual causes of the metrics firing are from topics
> that are undergoing creation/deletion, and the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous
> state machine:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>
> I believe the fix to this is relatively simple - we need to make the metrics
> know that a topic is currently undergoing deletion or creation, and only
> include topics that are "stable"
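
To make the {{DeletionChecker}} option concrete, here is a minimal sketch in Java rather than the Scala of the Kafka core. It is not the actual Kafka code: {{ReplicaManagerSketch}}, {{PartitionState}}, {{setDeletionChecker}} and {{underReplicatedPartitionCount}} are hypothetical names made up for illustration, and the checker is modelled as a plain {{Predicate<String>}} on the topic name. The assumption is that the controller, once started, hands over something equivalent to {{deleteTopicManager.isTopicQueuedUpForDeletion(topic)}}; until then the default checker reports no topic as queued for deletion, so the gauge behaves as it does today.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class DeletionCheckerSketch {

    /** Hypothetical stand-in for a partition: topic name plus replica counts. */
    static final class PartitionState {
        final String topic;
        final int assignedReplicas;
        final int inSyncReplicas;

        PartitionState(String topic, int assignedReplicas, int inSyncReplicas) {
            this.topic = topic;
            this.assignedReplicas = assignedReplicas;
            this.inSyncReplicas = inSyncReplicas;
        }

        boolean isUnderReplicated() {
            return inSyncReplicas < assignedReplicas;
        }
    }

    /** Simplified stand-in for ReplicaManager; NOT the real Kafka class. */
    static final class ReplicaManagerSketch {
        // Default used before the controller has started:
        // no topic is treated as queued for deletion.
        private volatile Predicate<String> deletionChecker = topic -> false;

        /** Assumed to be called once the controller is up (hypothetical hook). */
        void setDeletionChecker(Predicate<String> checker) {
            this.deletionChecker = checker;
        }

        /** Value the gauge would report: under-replicated partitions of topics not being deleted. */
        int underReplicatedPartitionCount(List<PartitionState> partitions) {
            Predicate<String> checker = deletionChecker; // read the volatile field once
            int count = 0;
            for (PartitionState p : partitions) {
                if (p.isUnderReplicated() && !checker.test(p.topic)) {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) {
        List<PartitionState> partitions = Arrays.asList(
            new PartitionState("stable-topic", 3, 2),   // under-replicated, not being deleted
            new PartitionState("doomed-topic", 3, 1),   // under-replicated, queued for deletion
            new PartitionState("healthy-topic", 3, 3)); // fully replicated

        ReplicaManagerSketch replicaManager = new ReplicaManagerSketch();
        System.out.println("before checker: "
            + replicaManager.underReplicatedPartitionCount(partitions));

        // Stand-in for the controller's view of topics queued for deletion.
        Set<String> queuedForDeletion = new HashSet<>(Arrays.asList("doomed-topic"));
        replicaManager.setDeletionChecker(queuedForDeletion::contains);
        System.out.println("after checker: "
            + replicaManager.underReplicatedPartitionCount(partitions));
    }
}
```

With this shape the startup order in {{KafkaServer}} would not have to change: the gauge can be registered early and only starts excluding topics once a checker has been assigned. Whether a settable function or a reversed startup order is preferable is exactly the open question above.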