[jira] [Commented] (KAFKA-7704) kafka.server.ReplicaFetechManager.MaxLag.Replica metric is reported incorrectly

2018-12-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711463#comment-16711463
 ] 

ASF GitHub Bot commented on KAFKA-7704:
---

junrao closed pull request #5998: KAFKA-7704: MaxLag.Replica metric is reported 
incorrectly
URL: https://github.com/apache/kafka/pull/5998
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/core/src/main/scala/kafka/server/AbstractFetcherThread.scala b/core/src/main/scala/kafka/server/AbstractFetcherThread.scala
index 2cee83c2e74..02158fac879 100755
--- a/core/src/main/scala/kafka/server/AbstractFetcherThread.scala
+++ b/core/src/main/scala/kafka/server/AbstractFetcherThread.scala
@@ -272,10 +272,10 @@ abstract class AbstractFetcherThread(name: String,
   partitionData)
 
 logAppendInfoOpt.foreach { logAppendInfo =>
-  val nextOffset = logAppendInfo.lastOffset + 1
+  val validBytes = logAppendInfo.validBytes
+  val nextOffset = if (validBytes > 0) logAppendInfo.lastOffset + 1 else currentFetchState.fetchOffset
   fetcherLagStats.getAndMaybePut(topicPartition).lag = Math.max(0L, partitionData.highWatermark - nextOffset)
 
-  val validBytes = logAppendInfo.validBytes
   // ReplicaDirAlterThread may have removed topicPartition from the partitionStates after processing the partition data
   if (validBytes > 0 && partitionStates.contains(topicPartition)) {
 // Update partitionStates only if there is no exception during processPartitionData
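
To make the effect of the change concrete, here is a minimal sketch of the lag computation before and after the patch. The case classes are simplified, hypothetical stand-ins for the Kafka types touched by the diff, reduced to the fields the computation reads. The sample values assume an idle partition whose follower is already caught up; the high watermark is taken from the metric value in the report, and `lastOffset = -1` for an empty append follows the PR description.

```scala
// Hypothetical, simplified stand-ins for the types used in AbstractFetcherThread.
final case class LogAppendInfo(lastOffset: Long, validBytes: Int)
final case class FetchState(fetchOffset: Long)
final case class PartitionData(highWatermark: Long)

object LagSketch {
  // Before the patch: nextOffset is always derived from lastOffset,
  // so an empty append (lastOffset = -1) yields nextOffset = 0.
  def lagBefore(info: LogAppendInfo, data: PartitionData): Long = {
    val nextOffset = info.lastOffset + 1
    math.max(0L, data.highWatermark - nextOffset)
  }

  // After the patch: when no valid bytes were appended, fall back to the
  // offset the follower is currently fetching from.
  def lagAfter(info: LogAppendInfo, state: FetchState, data: PartitionData): Long = {
    val nextOffset =
      if (info.validBytes > 0) info.lastOffset + 1 else state.fetchOffset
    math.max(0L, data.highWatermark - nextOffset)
  }

  // Assumed sample values: idle partition, follower caught up with the leader.
  val emptyAppend: LogAppendInfo = LogAppendInfo(lastOffset = -1L, validBytes = 0)
  val state: FetchState = FetchState(fetchOffset = 99043947579L)
  val data: PartitionData = PartitionData(highWatermark = 99043947579L)
}
```

Under these assumptions, `LagSketch.lagBefore(emptyAppend, data)` evaluates to the full high watermark, which matches the symptom in the report, while `LagSketch.lagAfter(emptyAppend, state, data)` evaluates to 0. For a non-empty append both functions give the same result.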


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> kafka.server.ReplicaFetechManager.MaxLag.Replica metric is reported 
> incorrectly
> ---
>
>                 Key: KAFKA-7704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7704
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 2.1.0
>            Reporter: Yu Yang
>            Assignee: huxihx
>            Priority: Major
>         Attachments: Screen Shot 2018-12-03 at 4.33.35 PM.png, Screen Shot 
> 2018-12-05 at 10.13.09 PM.png
>
>
> We recently deployed Kafka 2.1 and noticed a jump in the 
> kafka.server.ReplicaFetcherManager.MaxLag.Replica metric. At the same time, 
> there are no under-replicated partitions in the cluster. 
> Initial analysis shows that Kafka 2.1.0 does not report the metric correctly 
> for topics that have no incoming traffic right now but had traffic earlier. 
> For those topics, ReplicaFetcherManager considers the maxLag to be the 
> latest offset. 
> For instance, we have a topic named `test_topic`: 
> {code}
> [root@kafkabroker03002:/mnt/kafka/test_topic-0]# ls -l
> total 8
> -rw-rw-r-- 1 kafka kafka 10485760 Dec  4 00:13 099043947579.index
> -rw-rw-r-- 1 kafka kafka        0 Sep 23 03:01 099043947579.log
> -rw-rw-r-- 1 kafka kafka       10 Dec  4 00:13 099043947579.snapshot
> -rw-rw-r-- 1 kafka kafka 10485756 Dec  4 00:13 099043947579.timeindex
> -rw-rw-r-- 1 kafka kafka        4 Dec  4 00:13 leader-epoch-checkpoint
> {code}
> Kafka reports ReplicaFetcherManager.MaxLag.Replica as 99043947579.
>  !Screen Shot 2018-12-03 at 4.33.35 PM.png|width=720px! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7704) kafka.server.ReplicaFetechManager.MaxLag.Replica metric is reported incorrectly

2018-12-05 Thread Yu Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711009#comment-16711009
 ] 

Yu Yang commented on KAFKA-7704:


[~huxi_2b], [~junrao] I verified that 
https://github.com/apache/kafka/pull/5998 does fix the MaxLag metric issue. 
Thanks for the quick fix!



[jira] [Commented] (KAFKA-7704) kafka.server.ReplicaFetechManager.MaxLag.Replica metric is reported incorrectly

2018-12-04 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709589#comment-16709589
 ] 

Jun Rao commented on KAFKA-7704:


[~yuyang08], could you try the PR and see if it fixes the issue? Thanks,



[jira] [Commented] (KAFKA-7704) kafka.server.ReplicaFetechManager.MaxLag.Replica metric is reported incorrectly

2018-12-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708142#comment-16708142
 ] 

ASF GitHub Bot commented on KAFKA-7704:
---

huxihx opened a new pull request #5998: KAFKA-7704: MaxLag.Replica metric is 
reported incorrectly
URL: https://github.com/apache/kafka/pull/5998
 
 
   On the follower side, when the `LogAppendInfo` retrieved from the leader is 
empty, `fetcherLagStats` records the wrong lag: `nextOffset` is zero in this 
case, which actually means there is no lag. The lag should therefore be set to 
0 if `nextOffset` is 0 or `logAppendInfo.lastOffset` is -1.
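
The rule described above can be sketched as follows. This is only an illustration of the wording in this PR description, with a hypothetical helper name and plain `Long` parameters instead of the real Kafka types; the diff quoted in this thread when the pull request was closed uses a different condition (`validBytes > 0`, falling back to the current fetch offset).

```scala
object ProposedRule {
  // Hypothetical helper: lag as described in the PR message.
  // With nextOffset = lastOffset + 1, "nextOffset is 0" and
  // "lastOffset is -1" are the same condition.
  def lag(lastOffset: Long, highWatermark: Long): Long = {
    val nextOffset = lastOffset + 1
    if (nextOffset == 0L) 0L // empty append: treat as no lag
    else math.max(0L, highWatermark - nextOffset)
  }
}
```

With this rule, an empty append such as `ProposedRule.lag(-1L, 99043947579L)` reports a lag of 0 instead of the full high watermark.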
   
   



