[jira] [Comment Edited] (KAFKA-12713) Report "REAL" follower/consumer fetch latency

2021-08-16 Thread Kai Huang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399940#comment-17399940
 ] 

Kai Huang edited comment on KAFKA-12713 at 8/16/21, 6:48 PM:
-

[~ijuma] I would like to follow up on this ticket and continue the discussion. 
I replied to the 
[discussion|https://lists.apache.org/thread.html/r7f82dde9133bf9d3a8b688ca7ae02ad761c52e7b79212c9247b276c5%40%3Cdev.kafka.apache.org%3E]
 thread and illustrated how the fetch latency metric should work in 
[KIP-736|https://cwiki.apache.org/confluence/display/KAFKA/KIP-736%3A+Report+the+true+end+to+end+fetch+latency].
 Could you please take a look and see if that answers your question?



> Report "REAL" follower/consumer fetch latency
> -
>
> Key: KAFKA-12713
> URL: https://issues.apache.org/jira/browse/KAFKA-12713
> Project: Kafka
> Issue Type: Bug
> Reporter: Ming Liu
> Assignee: Kai Huang
> Priority: Major
>
> Fetch latency is an important metric for monitoring cluster 
> performance. With acks=all, produce latency is affected primarily by 
> broker fetch latency.
> However, the currently reported fetch latency does not reflect the true fetch 
> latency, because a fetch request sometimes has to stay in purgatory and wait up to 
> replica.fetch.wait.max.ms when data is not available. This greatly skews the 
> reported P50, P99, etc. 
> I would like to propose a KIP to track the real fetch latency for both 
> broker followers and consumers. 
>  
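To make the effect on percentiles concrete, here is a small worked example. It is only an illustration: the class and method names are hypothetical, the sample values are made up, and the 500 ms wait is an assumed value for replica.fetch.wait.max.ms. It compares the P50 and P99 of the reported latency with the same percentiles after the purgatory wait is subtracted.

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names come from Kafka.
final class FetchLatencySkew {

    // Nearest-rank percentile computed over a sorted copy of the samples.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Subtract the time spent waiting in purgatory from each reported latency.
    static long[] effective(long[] totalMs, long[] waitMs) {
        long[] out = new long[totalMs.length];
        for (int i = 0; i < totalMs.length; i++) {
            out[i] = totalMs[i] - waitMs[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up samples: four of six fetches waited the assumed 500 ms for data.
        long[] totalMs = {4, 503, 502, 505, 5, 501};
        long[] waitMs = {0, 500, 500, 500, 0, 500};
        long[] effectiveMs = effective(totalMs, waitMs);
        System.out.println("reported  P50/P99 = "
                + percentile(totalMs, 50) + "/" + percentile(totalMs, 99));
        System.out.println("effective P50/P99 = "
                + percentile(effectiveMs, 50) + "/" + percentile(effectiveMs, 99));
    }
}
```

With these samples the reported percentiles are dominated by the wait rather than by the work the broker actually did, which is the distortion the description refers to.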



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-12713) Report "REAL" follower/consumer fetch latency

2021-04-26 Thread Ming Liu (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331590#comment-17331590
 ] 

Ming Liu edited comment on KAFKA-12713 at 4/27/21, 3:25 AM:


The idea is:
 # Add waitTimeMs to FetchResponse.
 # In processResponseCallback() of handleFetchRequest, set waitTimeMs to 
the time spent in purgatory.
 # In FetcherStats, add a new meter that tracks the fetch latency with 
waitTimeMs deducted. 

In FetchLatency, we should also report a time called TotalEffectiveTime = 
TotalTime - RemoteTime. 
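The steps above can be sketched roughly as follows. This is a minimal sketch, not Kafka code: the types below only borrow the names used in this comment, a plain count and sum stand in for a real meter, and clamping the result at zero is an added assumption to guard against clock skew.

```java
// Hypothetical sketch; these types mirror names from the comment above
// and are not Kafka's actual classes.
final class FetchResponseSketch {
    // Steps 1 and 2: the broker reports how long the request sat in purgatory.
    final long waitTimeMs;

    FetchResponseSketch(long waitTimeMs) {
        this.waitTimeMs = waitTimeMs;
    }
}

final class FetcherStatsSketch {
    private long count;
    private long sumMs;

    // Step 3: record the round-trip latency with the reported wait deducted.
    // Clamping at zero is an assumption, not part of the proposal.
    void record(long sendMs, long receiveMs, FetchResponseSketch response) {
        long effectiveMs = Math.max(0L, (receiveMs - sendMs) - response.waitTimeMs);
        count++;
        sumMs += effectiveMs;
    }

    // Mean of the recorded effective latencies; 0.0 when nothing was recorded.
    double meanMs() {
        return count == 0 ? 0.0 : (double) sumMs / count;
    }

    // TotalEffectiveTime = TotalTime - RemoteTime, as proposed for FetchLatency.
    static long totalEffectiveTime(long totalTimeMs, long remoteTimeMs) {
        return totalTimeMs - remoteTimeMs;
    }
}
```

For example, a fetch sent at t=1000 ms and answered at t=1510 ms with waitTimeMs=500 would be recorded as 10 ms rather than 510 ms.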

Created KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-736%3A+Track+the+real+fetch+latency





[jira] [Comment Edited] (KAFKA-12713) Report "REAL" follower/consumer fetch latency

2021-04-26 Thread Ming Liu (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17331590#comment-17331590
 ] 

Ming Liu edited comment on KAFKA-12713 at 4/27/21, 12:17 AM:
-

The idea is:
 # Add waitTimeMs to FetchResponse.
 # In processResponseCallback() of handleFetchRequest, set waitTimeMs to 
the time spent in purgatory.
 # In FetcherStats, add a new meter that tracks the fetch latency with 
waitTimeMs deducted. 

In FetchLatency, we should also report a time called TotalEffectiveTime = 
TotalTime - RemoteTime. 

Let me know if you have any suggestions or feedback. I would like to propose a KIP for this 
change. 


was (Author: mingaliu):
The idea is:

0. Add waitTimeMs to Request().

1. In the DelayedFetch class (a DelayedOperation), add code to track the actual 
wait time. 

2. In processResponseCallback() of handleFetchRequest, add a waitTimeMs 
parameter that is passed in from DelayedFetch and used to set 
request.waitTimeMs.

3. In updateRequestMetrics(), if waitTimeMs is non-zero, deduct it 
from RemoteTime and TotalTime.

Let me know if you have any suggestions or feedback. I would like to propose a KIP for this 
change. 
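
This earlier, request-side variant could look roughly like the sketch below. It is an illustration under assumptions, not the broker's code: the classes only reuse the names mentioned in steps 0 to 3, timestamps are passed in explicitly instead of being read from a clock, and updateRequestMetrics() returns the adjusted values instead of updating real metrics.

```java
// Hypothetical sketch of steps 0-3; not Kafka's actual classes or signatures.
final class RequestSketch {
    long waitTimeMs;          // step 0: new field on the request
    final long remoteTimeMs;
    final long totalTimeMs;

    RequestSketch(long remoteTimeMs, long totalTimeMs) {
        this.remoteTimeMs = remoteTimeMs;
        this.totalTimeMs = totalTimeMs;
    }
}

final class DelayedFetchSketch {
    private final long createdMs;

    DelayedFetchSketch(long createdMs) {
        this.createdMs = createdMs;
    }

    // Step 1: track how long the fetch actually waited before completing.
    long actualWaitMs(long completedMs) {
        return Math.max(0L, completedMs - createdMs);
    }
}

final class RequestMetricsSketch {
    // Step 2: the callback receives the wait time and stores it on the request.
    static void processResponseCallback(RequestSketch request, long waitTimeMs) {
        request.waitTimeMs = waitTimeMs;
    }

    // Step 3: deduct a non-zero wait from RemoteTime and TotalTime.
    // Returns {adjustedRemoteTimeMs, adjustedTotalTimeMs}.
    static long[] updateRequestMetrics(RequestSketch request) {
        long remoteMs = request.remoteTimeMs;
        long totalMs = request.totalTimeMs;
        if (request.waitTimeMs != 0) {
            remoteMs -= request.waitTimeMs;
            totalMs -= request.waitTimeMs;
        }
        return new long[] {remoteMs, totalMs};
    }
}
```

Compared with the FetchResponse-based approach, this version adjusts the broker's own request metrics and does not expose the wait time to the follower or consumer.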
