Philip Nee created KAFKA-18217:
----------------------------------

             Summary: Slow HWM/LSO update might have subtle effect on the 
consumer lag reporting
                 Key: KAFKA-18217
                 URL: https://issues.apache.org/jira/browse/KAFKA-18217
             Project: Kafka
          Issue Type: Improvement
          Components: clients, consumer
            Reporter: Philip Nee


We've discovered that the consumer lag metrics appear spiky for the 
AsyncKafkaConsumer.  We examined how the HWM/LSO is updated and measured the 
update cadence of the two consumers using the local examples. TL;DR - Consumer 
lag metrics can sometimes be off due to KAFKA-18216 and slow HWM/LSO updates.

 

Context: The Fetcher records consumer lag multiple times between two 
consecutive HWM/LSO updates.  The more frequently the HWM/LSO is updated, the 
more accurate each lag measurement is, because

lag = HWM/LSO - fetch position
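
To illustrate why the update cadence matters, here is a toy model of the formula above. It is not Kafka client code: the function name, the step-based timeline, and the offsets are all assumptions made for illustration. The log end grows steadily and the consumer keeps pace, so the true lag is constant, but the lag is computed against a cached HWM/LSO that is refreshed only every few steps.

```python
def reported_lags(true_hwm, fetch_pos, update_every):
    """Toy model (hypothetical, not the Kafka Fetcher).

    true_hwm[i] and fetch_pos[i] are the broker-side HWM/LSO and the
    consumer fetch position at step i. The consumer only refreshes its
    cached HWM/LSO every `update_every` steps, but records lag at every step.
    Returns (reported, actual) lag lists.
    """
    cached_hwm = true_hwm[0]
    reported, actual = [], []
    for step, (hwm, pos) in enumerate(zip(true_hwm, fetch_pos)):
        if step % update_every == 0:
            cached_hwm = hwm              # HWM/LSO update arrives
        reported.append(cached_hwm - pos)  # lag = HWM/LSO - fetch position
        actual.append(hwm - pos)           # lag against the real log end
    return reported, actual


# Producer and consumer both advance 1000 offsets per step,
# so the true lag stays at 4000 throughout.
steps = 8
true_hwm = [4000 + 1000 * i for i in range(steps)]
fetch_pos = [1000 * i for i in range(steps)]

slow_reported, actual = reported_lags(true_hwm, fetch_pos, update_every=4)
fast_reported, _ = reported_lags(true_hwm, fetch_pos, update_every=1)

print("actual       :", actual)
print("slow updates :", slow_reported)
print("fast updates :", fast_reported)
```

In this model the slowly updated consumer reports a sawtooth (4000, 3000, 2000, 1000, then back to 4000) even though the true lag never changes, while updating at every step reproduces the true lag exactly.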

The elementary statistics below show the behavioral differences between the two 
consumer implementations.  The data will vary with the platform running these 
tests, so this is for the reader's reference only. (These are the outputs of my 
custom script.)  Both runs measure producing and consuming 200 million records.

 

AsyncKafkaConsumer

Updating 7179 HWM/LSO
Average HWM/LSO increment: 3589.99
Standard deviation of increment: 2381.07
Average number of 'recording lag' count: 7.69
Standard deviation of 'recording lag' count: 4.66

 

ClassicKafkaConsumer

Updating 58418 HWM/LSO
Average HWM/LSO increment: 1223.02
Standard deviation of increment: 532.52
Average 'recording lag' count: 2.95
Standard deviation of 'recording lag' count: 1.10
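
The script that produced the figures above is not included in this report. As a rough sketch of how such statistics could be derived, the following assumes a hypothetical event stream of ("hwm", offset) and ("lag", value) tuples in log order. The event format, the function name, and the use of the population standard deviation are all assumptions, not the original method.

```python
from statistics import mean, pstdev


def summarize(events):
    """Summarize HWM/LSO update cadence from a hypothetical event stream.

    events: iterable of ("hwm", offset) or ("lag", value) tuples in log order.
    Requires at least two "hwm" events. Lag recordings before the first
    update or after the last one are ignored, since they do not fall
    between two updates.
    """
    increments = []   # HWM/LSO growth between consecutive updates
    lag_counts = []   # number of lag recordings between consecutive updates
    last_hwm = None
    lags_since_update = 0
    updates = 0
    for kind, value in events:
        if kind == "hwm":
            updates += 1
            if last_hwm is not None:
                increments.append(value - last_hwm)
                lag_counts.append(lags_since_update)
            last_hwm = value
            lags_since_update = 0
        elif kind == "lag":
            lags_since_update += 1
    return {
        "updates": updates,
        "avg_increment": mean(increments),
        "std_increment": pstdev(increments),
        "avg_lag_count": mean(lag_counts),
        "std_lag_count": pstdev(lag_counts),
    }


# Tiny made-up stream: three updates with 2 and then 1 lag recordings between them.
sample = [
    ("hwm", 100), ("lag", 50), ("lag", 40),
    ("hwm", 300), ("lag", 30),
    ("hwm", 400),
]
stats = summarize(sample)
print(stats)
```

A higher average 'recording lag' count means more lag samples are taken against the same cached HWM/LSO, which is the situation the AsyncKafkaConsumer numbers describe.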



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
