[ 
https://issues.apache.org/jira/browse/KAFKA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Yang updated KAFKA-7704:
---------------------------
    Description: 
We recently deployed Kafka 2.1 and noticed a jump in the 
kafka.server.ReplicaFetcherManager.MaxLag.Replica metric. At the same time, 
there are no under-replicated partitions. 

Initial analysis showed that Kafka 2.1.0 does not report this metric correctly 
for topics that have no incoming traffic right now but had traffic earlier. 
For those topics, ReplicaFetcherManager reports maxLag as the latest 
offset. 

For instance, we have a topic *test_topic*: 

{code}
[root@kafkabroker03002:/mnt/kafka/test_topic-0]# ls -l
total 8
-rw-rw-r-- 1 kafka kafka 10485760 Dec  4 00:13 00000000099043947579.index
-rw-rw-r-- 1 kafka kafka        0 Sep 23 03:01 00000000099043947579.log
-rw-rw-r-- 1 kafka kafka       10 Dec  4 00:13 00000000099043947579.snapshot
-rw-rw-r-- 1 kafka kafka 10485756 Dec  4 00:13 00000000099043947579.timeindex
-rw-rw-r-- 1 kafka kafka        4 Dec  4 00:13 leader-epoch-checkpoint
{code}

Kafka reports ReplicaFetcherManager.MaxLag.Replica as 99043947579.
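
The reported value can be cross-checked against the directory listing above. Kafka names each segment file after the segment's base offset, zero-padded to 20 digits, and a 0-byte .log file means the log end offset equals that base offset. The following is a minimal sketch of that check, not part of Kafka; the helper names (segment_base_offset, matches_segment_offset) are hypothetical and exist only for this illustration.

```python
import re

# Segment files are named <20-digit zero-padded base offset>.<suffix>.
SEGMENT_NAME = re.compile(r"^(\d{20})\.(log|index|timeindex|snapshot)$")


def segment_base_offset(filename):
    """Return the base offset encoded in a segment file name, or None."""
    match = SEGMENT_NAME.match(filename)
    return int(match.group(1)) if match else None


def matches_segment_offset(reported_max_lag, filenames):
    """True if the reported lag equals the base offset of any listed segment."""
    offsets = {segment_base_offset(name) for name in filenames} - {None}
    return reported_max_lag in offsets


# File names copied from the test_topic-0 listing in this report.
listing = [
    "00000000099043947579.index",
    "00000000099043947579.log",
    "00000000099043947579.snapshot",
    "00000000099043947579.timeindex",
    "leader-epoch-checkpoint",
]

print(matches_segment_offset(99043947579, listing))
```

A match here suggests the metric is carrying an offset rather than a difference between two offsets, which is consistent with the analysis above but does not by itself identify the cause.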

 !Screen Shot 2018-12-03 at 4.33.35 PM.png|width=720px! 



  was:
We deployed Kafka 2.1 and noticed a jump in the 
kafka.server.ReplicaFetcherManager.MaxLag.Replica metric. At the same time, 
there are no under-replicated partitions. 

Initial analysis showed that Kafka 2.1.0 does not report this metric correctly 
for topics that have no incoming traffic right now but had traffic earlier. 
For those topics, ReplicaFetcherManager reports maxLag as the latest 
offset. 

For instance, we have a topic *test_topic*: 

{code}
[root@kafkabroker03002:/mnt/kafka/test_topic-0]# ls -l
total 8
-rw-rw-r-- 1 kafka kafka 10485760 Dec  4 00:13 00000000099043947579.index
-rw-rw-r-- 1 kafka kafka        0 Sep 23 03:01 00000000099043947579.log
-rw-rw-r-- 1 kafka kafka       10 Dec  4 00:13 00000000099043947579.snapshot
-rw-rw-r-- 1 kafka kafka 10485756 Dec  4 00:13 00000000099043947579.timeindex
-rw-rw-r-- 1 kafka kafka        4 Dec  4 00:13 leader-epoch-checkpoint
{code}

Kafka reports ReplicaFetcherManager.MaxLag.Replica as 99043947579.

 !Screen Shot 2018-12-03 at 4.33.35 PM.png|width=720px! 




> kafka.server.ReplicaFetcherManager.MaxLag.Replica metric is reported 
> incorrectly
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-7704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7704
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 2.1.0
>            Reporter: Yu Yang
>            Priority: Major
>         Attachments: Screen Shot 2018-12-03 at 4.33.35 PM.png
>
>
> We recently deployed Kafka 2.1 and noticed a jump in the 
> kafka.server.ReplicaFetcherManager.MaxLag.Replica metric. At the same time, 
> there are no under-replicated partitions. 
> Initial analysis showed that Kafka 2.1.0 does not report this metric correctly 
> for topics that have no incoming traffic right now but had traffic earlier. 
> For those topics, ReplicaFetcherManager reports maxLag as the 
> latest offset. 
> For instance, we have a topic *test_topic*: 
> {code}
> [root@kafkabroker03002:/mnt/kafka/test_topic-0]# ls -l
> total 8
> -rw-rw-r-- 1 kafka kafka 10485760 Dec  4 00:13 00000000099043947579.index
> -rw-rw-r-- 1 kafka kafka        0 Sep 23 03:01 00000000099043947579.log
> -rw-rw-r-- 1 kafka kafka       10 Dec  4 00:13 00000000099043947579.snapshot
> -rw-rw-r-- 1 kafka kafka 10485756 Dec  4 00:13 00000000099043947579.timeindex
> -rw-rw-r-- 1 kafka kafka        4 Dec  4 00:13 leader-epoch-checkpoint
> {code}
> Kafka reports ReplicaFetcherManager.MaxLag.Replica as 99043947579.
>  !Screen Shot 2018-12-03 at 4.33.35 PM.png|width=720px! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
