[ https://issues.apache.org/jira/browse/KAFKA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711463#comment-16711463 ]
ASF GitHub Bot commented on KAFKA-7704:
---------------------------------------

junrao closed pull request #5998: KAFKA-7704: MaxLag.Replica metric is reported incorrectly
URL: https://github.com/apache/kafka/pull/5998

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/core/src/main/scala/kafka/server/AbstractFetcherThread.scala b/core/src/main/scala/kafka/server/AbstractFetcherThread.scala
index 2cee83c2e74..02158fac879 100755
--- a/core/src/main/scala/kafka/server/AbstractFetcherThread.scala
+++ b/core/src/main/scala/kafka/server/AbstractFetcherThread.scala
@@ -272,10 +272,10 @@ abstract class AbstractFetcherThread(name: String,
                     partitionData)
 
                   logAppendInfoOpt.foreach { logAppendInfo =>
-                    val nextOffset = logAppendInfo.lastOffset + 1
+                    val validBytes = logAppendInfo.validBytes
+                    val nextOffset = if (validBytes > 0) logAppendInfo.lastOffset + 1 else currentFetchState.fetchOffset
                     fetcherLagStats.getAndMaybePut(topicPartition).lag = Math.max(0L, partitionData.highWatermark - nextOffset)
 
-                    val validBytes = logAppendInfo.validBytes
                     // ReplicaDirAlterThread may have removed topicPartition from the partitionStates after processing the partition data
                     if (validBytes > 0 && partitionStates.contains(topicPartition)) {
                       // Update partitionStates only if there is no exception during processPartitionData

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> kafka.server.ReplicaFetechManager.MaxLag.Replica metric is reported incorrectly
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-7704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7704
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 2.1.0
>            Reporter: Yu Yang
>            Assignee: huxihx
>            Priority: Major
>         Attachments: Screen Shot 2018-12-03 at 4.33.35 PM.png, Screen Shot 2018-12-05 at 10.13.09 PM.png
>
>
> We recently deployed Kafka 2.1 and noticed a jump in the
> kafka.server.ReplicaFetcherManager.MaxLag.Replica metric. At the same time,
> there were no under-replicated partitions in the cluster.
> Initial analysis shows that Kafka 2.1.0 does not report the metric correctly
> for topics that have no incoming traffic right now but had traffic earlier.
> For those topics, ReplicaFetcherManager takes maxLag to be the latest offset.
> For instance, we have a topic named `test_topic`:
> {code}
> [root@kafkabroker03002:/mnt/kafka/test_topic-0]# ls -l
> total 8
> -rw-rw-r-- 1 kafka kafka 10485760 Dec 4 00:13 00000000099043947579.index
> -rw-rw-r-- 1 kafka kafka 0 Sep 23 03:01 00000000099043947579.log
> -rw-rw-r-- 1 kafka kafka 10 Dec 4 00:13 00000000099043947579.snapshot
> -rw-rw-r-- 1 kafka kafka 10485756 Dec 4 00:13 00000000099043947579.timeindex
> -rw-rw-r-- 1 kafka kafka 4 Dec 4 00:13 leader-epoch-checkpoint
> {code}
> Kafka reports ReplicaFetcherManager.MaxLag.Replica as 99043947579.
> !Screen Shot 2018-12-03 at 4.33.35 PM.png|width=720px!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
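
For readers following the lag arithmetic in the diff above, here is a minimal standalone sketch of the calculation before and after the patch. AppendInfo and LagSketch are hypothetical stand-ins and not Kafka's real classes; only the field names lastOffset, validBytes, fetchOffset and highWatermark come from the diff. The sketch also assumes that an empty fetch response produces lastOffset = -1, which would be consistent with the reported lag of 99043947579 but is not stated in the issue itself.

```scala
// Hypothetical, simplified stand-in for the two fields the patch reads.
final case class AppendInfo(lastOffset: Long, validBytes: Int)

object LagSketch {
  // Pre-patch arithmetic: the next offset is always lastOffset + 1,
  // even when the fetch returned no bytes.
  def lagBefore(info: AppendInfo, highWatermark: Long): Long =
    math.max(0L, highWatermark - (info.lastOffset + 1))

  // Post-patch arithmetic: when nothing was appended (validBytes == 0),
  // fall back to the offset the fetcher is currently positioned at.
  def lagAfter(info: AppendInfo, fetchOffset: Long, highWatermark: Long): Long = {
    val nextOffset = if (info.validBytes > 0) info.lastOffset + 1 else fetchOffset
    math.max(0L, highWatermark - nextOffset)
  }
}
```

Under the assumption above, an idle but caught-up partition with fetchOffset and highWatermark both at 99043947579 gives LagSketch.lagBefore(AppendInfo(-1L, 0), 99043947579L) equal to 99043947579, while LagSketch.lagAfter(AppendInfo(-1L, 0), 99043947579L, 99043947579L) is 0. When data was actually appended, both versions agree.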