[ https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364847#comment-15364847 ]
ASF GitHub Bot commented on KAFKA-3857:
---------------------------------------

GitHub user kiranptivo opened a pull request:

    https://github.com/apache/kafka/pull/1593

    KAFKA-3857 Additional log cleaner metrics

    Fixes KAFKA-3857

    Changes proposed in this pull request:

    The following additional log cleaner metrics have been added.
    1. num-runs: Cumulative number of successful log cleaner runs since
       the last broker restart.
    2. last-run-time: Time of the last log cleaner run.
    3. num-filthy-logs: Number of filthy logs. A nonzero value for an
       extended period of time indicates that the cleaner has not been
       able to clean the logs.

    A note on num-filthy-logs: it is incremented whenever a filthy topic
    partition is added to the inProgress HashMap, and decremented once
    the cleaning succeeds or is aborted.

    Note that the existing LogCleaner code does not provide a metric to
    check whether a clean operation succeeded. There is an inProgress
    HashMap with topicPartition => LogCleaningInProgress entries, but an
    entry is removed from the HashMap even when the clean operation
    throws an exception. So I added the num-filthy-logs metric to
    differentiate a successful log clean from one that ended in an
    exception.

    The code is ready. I have tested and verified the JMX metrics. There
    is one case I couldn't test, though: the one where numFilthyLogs is
    decremented in 'resumeCleaning(...)' at LogCleanerManager.scala
    line 188. It seems to be part of the workflow that aborts the
    cleaning of a particular partition. Any ideas on how to test this
    scenario?
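The num-filthy-logs bookkeeping described above can be sketched roughly as follows. This is only an illustrative model in Python, not the Scala code from the pull request: the class FilthyLogTracker and its methods grab, done and abort are hypothetical names, and the abort path is simplified to a single call. It shows why the counter carries information that the inProgress map alone does not: after a failed clean the map entry is gone, but the counter stays nonzero.

```python
class FilthyLogTracker:
    """Hypothetical model of the num-filthy-logs bookkeeping (not Kafka code)."""

    def __init__(self):
        self.in_progress = {}      # topic-partition -> cleaning state
        self.num_filthy_logs = 0   # value the gauge would report

    def grab(self, tp):
        # A filthy partition is picked up for cleaning.
        self.in_progress[tp] = "in-progress"
        self.num_filthy_logs += 1

    def done(self, tp, succeeded):
        # The map entry is removed whether or not cleaning threw,
        # but the counter only drops when the clean succeeded.
        if self.in_progress.pop(tp, None) is not None and succeeded:
            self.num_filthy_logs -= 1

    def abort(self, tp):
        # Aborting a clean also releases the partition and the counter.
        if self.in_progress.pop(tp, None) is not None:
            self.num_filthy_logs -= 1


tracker = FilthyLogTracker()
tracker.grab("topic-0")
tracker.done("topic-0", succeeded=True)    # clean succeeded
tracker.grab("topic-1")
tracker.done("topic-1", succeeded=False)   # clean threw an exception
tracker.grab("topic-2")
tracker.abort("topic-2")                   # clean was aborted

print(tracker.in_progress, tracker.num_filthy_logs)  # {} 1
```

In this model the map is empty after all three runs, yet the counter still reports 1 because of the failed clean of "topic-1", which is the situation the metric is meant to expose.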
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/TiVo/kafka log_cleaner_jmx_metrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1593

----
commit f00de412f6b1f6568adef479687ae0df789f9c96
Author: Kiran Pillarisetty <pillarise...@pillarisetty-mbpr.tivo.com>
Date:   2016-06-14T17:40:26Z

    Create a couple of additional Log Cleaner JMX metrics

    log-clean-last-run: Log cleaner's last run time
    log-clean-runs: Number of log cleaner runs.

commit 7dc7511ee2b6d3cdf9df0c366fe23bf34d062a54
Author: Kiran Pillarisetty <pillarise...@tivo.com>
Date:   2016-06-14T20:24:00Z

    Created a couple of additional Log Cleaner JMX metrics

    log-clean-last-run: a metric to track last log cleaner run (unix timestamp)
    log-clean-runs: a metric to track number of log cleaner runs

    Committer: Kiran Pillarisetty <pillarise...@tivo.com>

commit 7f1214ff1118103dd639df717e988a22bad8033d
Author: Kiran Pillarisetty <pillarise...@tivo.com>
Date:   2016-07-01T22:14:57Z

    Add additional JMX metric to track successful cleaning of a log segment

commit 1ac346bb37008312e41035167dbfd75803595cd6
Author: Kiran Pillarisetty <pillarise...@tivo.com>
Date:   2016-07-01T22:17:25Z

    Add additional JMX metric to track successful cleaning of a log segment

commit 4f08d875e05c35bd7d7c849584b8b029031f884b
Author: Kiran Pillarisetty <pillarise...@tivo.com>
Date:   2016-07-05T22:23:20Z

    Metric name updated to num-filthy-logs. Metric incremented as it is
    grabbed for cleaning, and decremented once the cleaning is done, or
    if the cleaning is aborted

commit cd887c05bf1d56b7566c5b72b3ddf3bcdfb70898
Author: Kiran Pillarisetty <pillarise...@tivo.com>
Date:   2016-07-05T23:31:32Z

    Changed a metric name (number-of-runs to num-runs).
    Removed an extra \n around line 164.
    It is not present in the trunk

----

> Additional log cleaner metrics
> ------------------------------
>
>                 Key: KAFKA-3857
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3857
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics:
> 1. Time of last log cleaner run
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent,
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not
> differentiate an idle log cleaner from a dead log cleaner. It would be useful
> to have the above two metrics added, to indicate whether log cleaner is alive
> (and successfully cleaning) or not.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)