[jira] [Updated] (KAFKA-8656) Kafka Consumer Record Latency Metric

Sean Glover (JIRA) Thu, 11 Jul 2019 09:26:12 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sean Glover updated KAFKA-8656:
-------------------------------
    Description: 
Consumer lag is a useful metric to monitor how many records are queued to be 
processed.  We can look at individual lag per partition or we may aggregate 
metrics. For example, we may want to monitor what the maximum lag of any 
particular partition in our consumer subscription so we can identify hot 
partitions, caused by an insufficient producing partitioning strategy.  We may 
want to monitor a sum of lag across all partitions so we have a sense as to our 
total backlog of messages to consume. Lag in offsets is useful when you have a 
good understanding of your messages and processing characteristics, but it 
doesn’t tell us how far behind _in time_ we are.  This is known as wait time in 
queueing theory, or more informally it’s referred to as latency.

The latency of a message can be defined as the difference between when that 
message was first produced to when the message is received by a consumer.  The 
latency of records in a partition correlates with lag, but a larger lag doesn’t 
necessarily mean a larger latency. For example, a topic consumed by two 
separate application consumer groups A and B may have similar lag, but 
different latency per partition.  Application A is a consumer which performs 
CPU intensive business logic on each message it receives. It’s distributed 
across many consumer group members to handle the load quickly enough, but since 
its processing time is slower, it takes longer to process each message per 
partition.  Meanwhile, Application B is a consumer which performs a simple ETL 
operation to land streaming data in another system, such as HDFS. It may have 
similar lag to Application A, but because it has a faster processing time its 
latency per partition is significantly less.

If the Kafka Consumer reported a latency metric it would be easier to build 
Service Level Agreements (SLAs) based on non-functional requirements of the 
streaming system.  For example, the system must never have a latency of greater 
than 10 minutes. This SLA could be used in monitoring alerts or as input to 
automatic scaling solutions.

[KIP-488|https://cwiki.apache.org/confluence/display/KAFKA/488%3A+Kafka+Consumer+Record+Latency+Metric]

 

  was:
Consumer lag is a useful metric to monitor how many records are queued to be 
processed.  We can look at individual lag per partition or we may aggregate 
metrics. For example, we may want to monitor what the maximum lag of any 
particular partition in our consumer subscription so we can identify hot 
partitions, caused by an insufficient producing partitioning strategy.  We may 
want to monitor a sum of lag across all partitions so we have a sense as to our 
total backlog of messages to consume. Lag in offsets is useful when you have a 
good understanding of your messages and processing characteristics, but it 
doesn’t tell us how far behind _in time_ we are.  This is known as wait time in 
queueing theory, or more informally it’s referred to as latency.

The latency of a message can be defined as the difference between when that 
message was first produced to when the message is received by a consumer.  The 
latency of records in a partition correlates with lag, but a larger lag doesn’t 
necessarily mean a larger latency. For example, a topic consumed by two 
separate application consumer groups A and B may have similar lag, but 
different latency per partition.  Application A is a consumer which performs 
CPU intensive business logic on each message it receives. It’s distributed 
across many consumer group members to handle the load quickly enough, but since 
its processing time is slower, it takes longer to process each message per 
partition.  Meanwhile, Application B is a consumer which performs a simple ETL 
operation to land streaming data in another system, such as HDFS. It may have 
similar lag to Application A, but because it has a faster processing time its 
latency per partition is significantly less.

If the Kafka Consumer reported a latency metric it would be easier to build 
Service Level Agreements (SLAs) based on non-functional requirements of the 
streaming system.  For example, the system must never have a latency of greater 
than 10 minutes. This SLA could be used in monitoring alerts or as input to 
automatic scaling solutions.

[KIP-488|[https://cwiki.apache.org/confluence/display/KAFKA/488%3A+Kafka+Consumer+Record+Latency+Metric]]

 


> Kafka Consumer Record Latency Metric
> ------------------------------------
>
>                 Key: KAFKA-8656
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8656
>             Project: Kafka
>          Issue Type: New Feature
>          Components: metrics
>            Reporter: Sean Glover
>            Assignee: Sean Glover
>            Priority: Major
>
> Consumer lag is a useful metric to monitor how many records are queued to be 
> processed.  We can look at individual lag per partition or we may aggregate 
> metrics. For example, we may want to monitor what the maximum lag of any 
> particular partition in our consumer subscription so we can identify hot 
> partitions, caused by an insufficient producing partitioning strategy.  We 
> may want to monitor a sum of lag across all partitions so we have a sense as 
> to our total backlog of messages to consume. Lag in offsets is useful when 
> you have a good understanding of your messages and processing 
> characteristics, but it doesn’t tell us how far behind _in time_ we are.  
> This is known as wait time in queueing theory, or more informally it’s 
> referred to as latency.
> The latency of a message can be defined as the difference between when that 
> message was first produced to when the message is received by a consumer.  
> The latency of records in a partition correlates with lag, but a larger lag 
> doesn’t necessarily mean a larger latency. For example, a topic consumed by 
> two separate application consumer groups A and B may have similar lag, but 
> different latency per partition.  Application A is a consumer which performs 
> CPU intensive business logic on each message it receives. It’s distributed 
> across many consumer group members to handle the load quickly enough, but 
> since its processing time is slower, it takes longer to process each message 
> per partition.  Meanwhile, Application B is a consumer which performs a 
> simple ETL operation to land streaming data in another system, such as HDFS. 
> It may have similar lag to Application A, but because it has a faster 
> processing time its latency per partition is significantly less.
> If the Kafka Consumer reported a latency metric it would be easier to build 
> Service Level Agreements (SLAs) based on non-functional requirements of the 
> streaming system.  For example, the system must never have a latency of 
> greater than 10 minutes. This SLA could be used in monitoring alerts or as 
> input to automatic scaling solutions.
> [KIP-488|https://cwiki.apache.org/confluence/display/KAFKA/488%3A+Kafka+Consumer+Record+Latency+Metric]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Updated] (KAFKA-8656) Kafka Consumer Record Latency Metric

Reply via email to