[ https://issues.apache.org/jira/browse/KAFKA-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947484#comment-16947484 ]
ASF GitHub Bot commented on KAFKA-8725:
---------------------------------------

stanislavkozlovski commented on pull request #7475: KAFKA-8725: Improve LogCleanerManager#grabFilthiestLog error handling
URL: https://github.com/apache/kafka/pull/7475

KAFKA-7215 improved the log cleaner error handling to mitigate thread death but missed one case. Exceptions in `grabFilthiestCompactedLog` still cause the thread to die.

This patch improves handling to ensure that errors in that function still mark a partition as uncleanable and do not crash the thread.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Improve LogCleaner error handling when failing to grab the filthiest log
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-8725
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8725
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Stanislav Kozlovski
>            Assignee: Stanislav Kozlovski
>            Priority: Major
>
> https://issues.apache.org/jira/browse/KAFKA-7215 improved error handling in
> the log cleaner with the goal of not having the whole thread die when an
> exception happens, but rather mark the partition that caused it as
> uncleanable and continue cleaning the error-free partitions.
> Unfortunately, the current code can still bubble up an exception and cause
> the thread to die when an error happens before we can grab the filthiest log
> and start cleaning it.
> At that point, we don't have a clear reference to the
> log that caused the exception and chose to throw an IllegalStateException -
> [https://github.com/apache/kafka/blob/39bcc8447c906506d63b8df156cf90174bbb8b78/core/src/main/scala/kafka/log/LogCleaner.scala#L346]
> (as seen in https://issues.apache.org/jira/browse/KAFKA-8724)
> Essentially, exceptions in `grabFilthiestCompactedLog` still cause the thread
> to die. This can be further improved by trying to catch what log caused the
> exception in the aforementioned function

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
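
For readers following the issue, the idea described above (remember which log failed during selection, mark only that partition as uncleanable, and keep the cleaner running) can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from the pull request: `CleanerSketch`, `PartitionCleaningException`, `dirtyRatio`, `grabFilthiestLog`, and `cleanAll` are all hypothetical names, and a negative dirty ratio is used only as a stand-in for a log that fails when inspected.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class CleanerSketch {
    /** Hypothetical exception that remembers which partition failed. */
    static class PartitionCleaningException extends RuntimeException {
        final String partition;

        PartitionCleaningException(String partition, Throwable cause) {
            super("Failed while inspecting " + partition, cause);
            this.partition = partition;
        }
    }

    // partition -> dirty ratio; a negative ratio stands in for a corrupt log.
    final Map<String, Double> logs = new LinkedHashMap<>();
    final Set<String> uncleanable = new HashSet<>();
    final List<String> cleaned = new ArrayList<>();

    // Simulates per-log inspection that may throw for a bad log.
    double dirtyRatio(String partition) {
        double ratio = logs.get(partition);
        if (ratio < 0) {
            throw new IllegalStateException("corrupt log");
        }
        return ratio;
    }

    // Picks the dirtiest remaining log. A failure on one log is rethrown
    // with that log's partition attached, so the caller knows what to skip.
    Optional<String> grabFilthiestLog() {
        String best = null;
        double bestRatio = 0.0;
        for (String p : logs.keySet()) {
            if (uncleanable.contains(p) || cleaned.contains(p)) {
                continue;
            }
            double ratio;
            try {
                ratio = dirtyRatio(p);
            } catch (RuntimeException e) {
                throw new PartitionCleaningException(p, e);
            }
            if (best == null || ratio > bestRatio) {
                best = p;
                bestRatio = ratio;
            }
        }
        return Optional.ofNullable(best);
    }

    // Cleaner loop: a selection failure marks one partition uncleanable
    // and the loop continues instead of letting the thread die.
    void cleanAll() {
        while (true) {
            try {
                Optional<String> next = grabFilthiestLog();
                if (!next.isPresent()) {
                    return;
                }
                cleaned.add(next.get()); // stand-in for the real cleaning work
            } catch (PartitionCleaningException e) {
                uncleanable.add(e.partition);
            }
        }
    }

    public static void main(String[] args) {
        CleanerSketch sketch = new CleanerSketch();
        sketch.logs.put("a", 0.2);
        sketch.logs.put("b", -1.0); // simulated bad log
        sketch.logs.put("c", 0.9);
        sketch.cleanAll();
        System.out.println(sketch.cleaned);     // [c, a]
        System.out.println(sketch.uncleanable); // [b]
    }
}
```

In this sketch the loop terminates because every iteration either cleans a log, marks one as uncleanable, or finds nothing left to do. Whether the actual patch wraps the failure in a dedicated exception type or tracks the current log another way is not stated in this message.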