Tzu-Li (Gordon) Tai created FLINK-6109:
------------------------------------------

             Summary: Add "consumer lag" report metric to FlinkKafkaConsumer
                 Key: FLINK-6109
                 URL: https://issues.apache.org/jira/browse/FLINK-6109
             Project: Flink
          Issue Type: New Feature
          Components: Kafka Connector, Streaming Connectors
            Reporter: Tzu-Li (Gordon) Tai


This is a feature discussed in this ML: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Telling-if-a-job-has-caught-up-with-Kafka-td12261.html.

As discussed, we can expose two kinds of "consumer lag" metrics for this:

 - "current consumer lag for partition": the current difference between the 
latest offset and the last collected record record of a partition. This metric 
is calculated and updated at a configurable interval. This metric basically 
serves as an indicator of how the consumer is keeping up with the head of 
partitions. I propose to name this `currentOffsetLag`.

 - "Consumer lag of last checkpoint": the difference between the latest offset 
and the offset stored in the checkpoint of a partition. This metric is only 
updated when checkpoints are completed. It serves as an indicator of how much 
data may need to be replayed in case of a failure. I propose to name this 
`lastCheckpointedOffsetLag`.

The granularity of the metric is per-FlinkKafkaConsumer, and independent of the 
consumer group.id used (the offset used to calculate consumer lag is the 
internal offset state of the FlinkKafkaConsumer, not the consumer group's 
committed offsets in Kafka).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to