[ https://issues.apache.org/jira/browse/KAFKA-19484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003877#comment-18003877 ]
George Wu commented on KAFKA-19484:
-----------------------------------

https://github.com/apache/kafka/pull/20129

> Tiered Storage Quota Metrics can stop reporting
> -----------------------------------------------
>
>                 Key: KAFKA-19484
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19484
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.9.0, 4.0.0
>         Environment: Ubuntu 22, Amazon Corretto Java 17
>            Reporter: George Wu
>            Priority: Minor
>
> It is possible for tiered storage throttle metrics (introduced as a part of
> [KIP-956|https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas])
> to stop reporting if the relevant tiered storage operation (copy/fetch) goes
> idle for longer than the sensor expiry timeout of one hour.
>
> RemoteLogManager maintains a static reference to the sensors used for metric
> reporting. This is a problem because the default sensor expiry time is one
> hour and nothing is responsible for handling expired sensors. If the sensors
> expire, RemoteLogManager will continue recording through its static
> references to sensor objects that have already been cleaned up by the
> ExpireSensorTask, so the metrics are no longer reported.
>
> This issue tends to affect fetch metrics far more than copy metrics because
> the copy sensors don't go idle unless the topics stop being produced to. In
> contrast, backfilling from the earliest offset using tiered storage is a
> fairly common use case, so the fetch sensors can sit idle between backfills.
>
> *Reproduction*
> * Generate some amount of tiered storage fetch traffic on a topic. Confirm
> the remote-fetch-throttle-time-avg/max metrics are being reported.
> * Remove the consumer workload that triggers the tiered storage fetch
> traffic. Wait for one hour (the sensor expiration period).
> * Generate some more tiered storage fetch traffic. The metric will no longer
> report.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
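
The failure mode described in the report can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: Sensor, Registry, expireIdle and isReported are hypothetical stand-ins written for this example and are not Kafka's classes or API. The only details taken from the report are the one-hour expiry and the long-lived reference to the sensor. The final step (looking the sensor up again by name) is shown as one possible remedy and is not necessarily what the linked pull request does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Kafka's classes.
public class StaleSensorSketch {

    /** A sensor that remembers when it was last recorded to. */
    static final class Sensor {
        final String name;
        long lastRecordMs;
        double lastValue;

        Sensor(String name, long nowMs) {
            this.name = name;
            this.lastRecordMs = nowMs;
        }

        void record(double value, long nowMs) {
            lastValue = value;
            lastRecordMs = nowMs;
        }
    }

    /** A registry that only reports sensors it still holds. */
    static final class Registry {
        static final long EXPIRY_MS = 60 * 60 * 1000L; // one hour, as in the report
        private final Map<String, Sensor> sensors = new HashMap<>();

        // Returns the registered sensor, creating it if it is absent.
        Sensor sensor(String name, long nowMs) {
            return sensors.computeIfAbsent(name, n -> new Sensor(n, nowMs));
        }

        // Plays the role of the expiry task: drop sensors idle longer than EXPIRY_MS.
        void expireIdle(long nowMs) {
            sensors.values().removeIf(s -> nowMs - s.lastRecordMs > EXPIRY_MS);
        }

        boolean isReported(String name) {
            return sensors.containsKey(name);
        }
    }

    /** Returns {reported before idle, reported via stale reference, reported after re-lookup}. */
    static boolean[] run() {
        Registry registry = new Registry();
        String name = "remote-fetch-throttle-time";

        long t0 = 0L;
        Sensor cached = registry.sensor(name, t0); // long-lived reference, as in the report
        cached.record(5.0, t0);
        boolean before = registry.isReported(name);

        long t1 = t0 + Registry.EXPIRY_MS + 1;     // idle for just over one hour
        registry.expireIdle(t1);
        cached.record(7.0, t1);                    // succeeds, but the registry has dropped the sensor
        boolean stale = registry.isReported(name);

        registry.sensor(name, t1).record(7.0, t1); // re-resolving by name registers a fresh sensor
        boolean relookup = registry.isReported(name);

        return new boolean[] { before, stale, relookup };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("reported before idle:       " + r[0]);
        System.out.println("reported via stale ref:     " + r[1]);
        System.out.println("reported after re-lookup:   " + r[2]);
    }
}
```

In this sketch the second record() call does not fail. The value is simply written to an object the registry no longer tracks, which matches the silent loss of the metric seen in the reproduction steps.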