[https://issues.apache.org/jira/browse/KAFKA-203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443283#comment-13443283]
Jun Rao commented on KAFKA-203:
-------------------------------
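I propose that we add/keep the following set of metrics. Did I miss anything?
Each entry gives the metric name followed, in parentheses, by its type (meter,
hist, timer, or gauge) and the level(s) it is reported at (e.g. total, per topic,
per partition, or split by follower/non-follower).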
Server side:
A. Requests:
A1. produceRequestRate (meter, total)
A2. fetchRequestRate (meter, follower/non-follower)
A3. getMetadataRate (meter, total)
A4. getOffsetRate (meter, total)
A5. leaderAndISRRate (meter, total)
A6. stopReplicaRate (meter, total)
A7. produceRequestSizeHist (hist, total)
A8. fetchResponseSizeHist (hist, total)
A9. produceFailureRate (meter, topic/total)
A10. fetchFailureRate (meter, topic/total)
A11. produceRequestTime (timer, total)
A12. fetchRequestTime (timer, total)
A13. messagesInRate (meter, topic/total)
A14. messagesOutRate (meter, topic/total)
A15. messagesBytesInRate (meter, topic/total)
A16. messagesBytesOutRate (meter, topic/total)
B. Log:
B1. logFlushTime (timer, total)
C. Purgatory:
Produce:
C1. expiredRequestMeter (meter, partition/total)
C2. satisfactionTimeHist (hist, total)
Fetch:
C3. expiredRequestMeter (meter, follower/non-follower)
C4. satisfactionTimeHist (hist, follower/non-follower)
Both:
C5. delayedRequests (gauge, Fetch/Produce)
D. ReplicaManager:
D1. leaderPartitionCounts (gauge, total)
D2. underReplicatedPartitionCounts (|ISR| < replication factor, gauge, total)
D3. ISRExpandRate (meter, partition/total)
D4. ISRShrinkRate (meter, partition/total)
E. Controller:
E1. requestRate (meter, total)
E2. requestTimeHist (hist, total)
E3. controllerActiveCount (gauge, total)
Clients:
F. Producer:
F1. messageRate (meter, topic/total)
F2. byteRate (meter, topic/total)
F3. droppedEventRate (meter, total)
F4. requestRate (meter, total)
F5. requestSizeHist (hist, total)
F6. requestTimeHist (hist, total)
F7. resendRate (meter, total)
F8. failedSendRate (meter, total)
F9. getMetadataRate (meter, total)
G. Consumer:
G1. messageRate (meter, topic/total)
G2. byteRate (meter, topic/total)
G3. requestRate (meter, total)
G4. requestSizeHist (hist, total)
G5. requestTimeHist (hist, total)
G6. lagInBytes (gauge, partition)
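To make the notation above concrete, here is a minimal Java sketch of two ideas from the list. The class and method names are hypothetical and do not come from Kafka or Coda Hale's package. The first is a meter reported at "topic/total" level, where each event is marked on its topic's meter and on one aggregate meter. The second is D2 expressed as a gauge computed on read from |ISR| and the replication factor. The meter tracks only a mean rate, and the caller passes in timestamps so the example stays deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; class and method names are illustrative, not Kafka's API.
final class MetricsSketch {

    // Minimal meter: an event count plus the mean rate since creation.
    // (Production meters usually also keep moving-average rates; omitted here.)
    static final class Meter {
        private final AtomicLong count = new AtomicLong();
        private final long startNanos;

        Meter(long startNanos) { this.startNanos = startNanos; }

        void mark(long n) { count.addAndGet(n); }

        long count() { return count.get(); }

        // Events per second between creation and nowNanos (caller supplies the clock).
        double meanRate(long nowNanos) {
            long elapsed = nowNanos - startNanos;
            return elapsed <= 0 ? 0.0 : count.get() * 1e9 / elapsed;
        }
    }

    // "topic/total": each event updates its topic's meter and one aggregate meter.
    static final class TopicMeters {
        private final Map<String, Meter> perTopic = new ConcurrentHashMap<>();
        private final long startNanos;
        private final Meter total;

        TopicMeters(long startNanos) {
            this.startNanos = startNanos;
            this.total = new Meter(startNanos);
        }

        void mark(String topic, long n) {
            // Topic meters are created lazily but share the parent's start time,
            // so per-topic and total rates are measured over the same interval.
            perTopic.computeIfAbsent(topic, t -> new Meter(startNanos)).mark(n);
            total.mark(n);
        }

        Meter topic(String topic) { return perTopic.get(topic); }

        Meter total() { return total; }
    }

    // Replication state of one partition, as needed by D2.
    static final class PartitionState {
        final int replicationFactor;
        final int isrSize;

        PartitionState(int replicationFactor, int isrSize) {
            this.replicationFactor = replicationFactor;
            this.isrSize = isrSize;
        }
    }

    // D2 as a gauge: evaluated when read, not incremented as events happen.
    static int underReplicatedPartitionCount(List<PartitionState> partitions) {
        int n = 0;
        for (PartitionState p : partitions) {
            if (p.isrSize < p.replicationFactor) n++;
        }
        return n;
    }
}
```

Because D2 is modeled as a gauge rather than a meter, nothing needs updating when the ISR changes. The count comes from current partition state whenever it is read.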
Also, I propose that we remove the following metrics, since they are either not
very useful or redundant.
Purgatory:
Produce:
* caughtUpFollowerFetchRequest (meter, partition/total): not very useful
* followerCatchupTime (hist, total): not very useful
* throughputMeter (meter, partition/total): same as bytesIn
* satisfiedRequestMeter (meter, total): not very useful
Fetch:
* satisfiedRequestMeter (meter, total): not very useful
* throughputMeter (meter, partition/total): same as bytesOut
Both:
* satisfactionRate (meter, Fetch/Produce): not very useful
* expirationRate (meter, Fetch/Produce/topic): already covered at the Produce/Fetch level
> Improve Kafka internal metrics
> ------------------------------
>
> Key: KAFKA-203
> URL: https://issues.apache.org/jira/browse/KAFKA-203
> Project: Kafka
> Issue Type: New Feature
> Components: core
> Affects Versions: 0.8
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Labels: tools
>
> Currently, metrics in Kafka use old-school JMX directly. This makes
> adding metrics a pain. It would be good to do one of the following:
> 1. Convert to Coda Hale's metrics package
> (https://github.com/codahale/metrics)
> 2. Write a simple metrics package
> The new metrics package should make metrics easier to add and work with, and
> should package up the common logic of keeping windowed gauges, histograms,
> counters, etc. JMX should be just one output of this.
> The advantage of the Coda Hale package is that it already exists, so we don't
> need to write it. The downsides are that (1) it introduces another client
> dependency, which can cause conflicts, and (2) it seems a bit heavy on design.
> The good news is that the metrics-core package doesn't seem to bring in many
> dependencies, though the Scala wrapper seems to require Scala 2.9. I am also a
> little skeptical of its approach to histograms: it samples instead of bucketing,
> though that may be okay.
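The sampling-versus-bucketing point can be illustrated with a minimal Java sketch of both strategies. The names are hypothetical, and this is not metrics-core's implementation. A bucketed histogram counts every value into fixed, pre-chosen ranges. A sampling histogram keeps a bounded uniform random sample (reservoir sampling, Algorithm R) and computes quantiles from that sample only.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch contrasting two histogram strategies; not metrics-core's code.
final class HistogramSketch {

    // Bucketing: fixed inclusive upper bounds, one counter per bucket plus an
    // overflow bucket. Every value is counted, in fixed memory.
    static final class BucketedHistogram {
        private final long[] upperBounds;
        private final long[] counts;

        BucketedHistogram(long... upperBounds) {
            this.upperBounds = upperBounds.clone();
            Arrays.sort(this.upperBounds);
            this.counts = new long[this.upperBounds.length + 1]; // last = overflow
        }

        void update(long value) {
            int i = 0;
            while (i < upperBounds.length && value > upperBounds[i]) i++;
            counts[i]++;
        }

        long count(int bucket) { return counts[bucket]; }
    }

    // Sampling: a uniform reservoir of at most `size` values (Algorithm R).
    static final class ReservoirHistogram {
        private final long[] reservoir;
        private final Random random;
        private long seen = 0;

        ReservoirHistogram(int size, long seed) {
            this.reservoir = new long[size];
            this.random = new Random(seed);
        }

        void update(long value) {
            seen++;
            if (seen <= reservoir.length) {
                reservoir[(int) (seen - 1)] = value;           // fill phase: keep everything
            } else {
                long j = (long) (random.nextDouble() * seen); // roughly uniform in [0, seen)
                if (j < reservoir.length) reservoir[(int) j] = value; // kept with prob size/seen
            }
        }

        int sampleSize() { return (int) Math.min(seen, reservoir.length); }

        // Nearest-rank quantile (q in [0, 1]) over the current sample only.
        long quantile(double q) {
            int n = sampleSize();
            if (n == 0) throw new IllegalStateException("no values recorded");
            long[] sorted = Arrays.copyOf(reservoir, n);
            Arrays.sort(sorted);
            int idx = (int) Math.ceil(q * n) - 1;
            return sorted[Math.max(0, Math.min(n - 1, idx))];
        }
    }
}
```

Bucketing never drops an observation, but it requires choosing bucket bounds up front and resolves quantiles only to bucket width. Sampling handles any value range in fixed memory, but its quantiles are estimates whose accuracy depends on the reservoir size.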
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira