Krzysztof Piecuch created KAFKA-15885:
-----------------------------------------
Summary: Reduce lock contention when cleaning topics
Key: KAFKA-15885
URL: https://issues.apache.org/jira/browse/KAFKA-15885
Project: Kafka
Issue Type: Improvement
Components: log cleaner
Reporter: Krzysztof Piecuch
Somewhat similar to KAFKA-14213: a couple of subroutines require the same
lock, which throttles compaction speed and limits parallelism.
There are a few problems here:
# LogCleanerManager.grabFilthiestCompactedLog - iterates over the list of
partitions multiple times, all of it while holding a lock
# LogCleanerManager.grabFilthiestCompactedLog doesn't cache anything and
returns only 1 item at a time - the method is invoked every time a cleaner
thread asks for a new partition to compact
# LogCleanerManager.checkCleaningAborted - a quick check which (see the
sketch after this list):
## shares a lock with grabFilthiestCompactedLog
## is executed every time a LogCleaner reads bufsize bytes of data to compact
# LogCleaner's bufsize is limited to 1 GiB / (number of log cleaner threads)
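To make the contention concrete, here's a minimal sketch of the locking
pattern described in 1.-3. - Java with simplified, illustrative types, not
the actual Kafka code (the real LogCleanerManager is Scala):
{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the current locking pattern; NOT the actual Kafka code.
class CoarseGrainedCleanerManager {
    record PartitionLog(String topicPartition, double dirtyRatio) {}

    private final Object lock = new Object();            // one lock guards everything
    private final List<PartitionLog> logs = new ArrayList<>();
    private final Set<String> abortedCleans = new HashSet<>();

    // Invoked by each cleaner thread every time it picks up a new partition:
    // a full scan over all partitions, entirely under the shared lock.
    PartitionLog grabFilthiestCompactedLog() {
        synchronized (lock) {
            PartitionLog filthiest = null;
            for (PartitionLog log : logs) {
                if (filthiest == null || log.dirtyRatio() > filthiest.dirtyRatio()) {
                    filthiest = log;
                }
            }
            return filthiest;
        }
    }

    // Invoked once per bufsize read by every cleaner thread, yet it takes the
    // same lock as the expensive scan above - this is the contention point.
    void checkCleaningAborted(String topicPartition) {
        synchronized (lock) {
            if (abortedCleans.contains(topicPartition)) {
                throw new IllegalStateException("cleaning aborted for " + topicPartition);
            }
        }
    }
}
{code}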
Here's a scenario where this design falls short:
* I have 15k partitions
* all of which need to be compacted fairly often, but compacting each of them
doesn't take a lot of time
* Most of the CPU time spent by cleaner threads goes to
grabFilthiestCompactedLog
** so the other cleaners can't do anything, since they need to acquire the
same lock to read data to compact, as per 3.1. and 3.2.
** because of 4., the other log cleaners run out of work to do almost as soon
as grabFilthiestCompactedLog is called
* Negative performance scaling - increasing the number of log cleaner threads
decreases each cleaner's bufsize, which makes them hammer the lock mentioned
in 3.1. and 3.2. even more often (see the worked example below)
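To put numbers on the negative scaling, using the 1 GiB / (number of threads)
formula from 4.: with 8 cleaner threads each buffer is 128 MiB, so compacting
10 GiB of data triggers roughly 80 checkCleaningAborted calls; with 32 threads
the buffer shrinks to 32 MiB and the same 10 GiB triggers roughly 320 calls -
4x the threads means 4x the lock acquisitions for the same volume of data.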
I suggest:
* making LogCleanerManager use more fine-grained locking (e.g. an RW lock for
the checkCleaningAborted data structures, sketched below) to decrease the
effect of negative performance scaling
* making LogCleanerManager.grabFilthiestCompactedLog faster on average:
** we don't need grabFilthiestCompactedLog to be 100% accurate
** we can try caching candidates for the "filthiest log" and re-calculating
the cache every 1 minute or so
** we can change the algorithm to probabilistic sampling (get 100 topics and
pick the worst one?) or even round-robin
* Alternatively, we could allow LogCleaner's bufsize to take values higher
than the current 1 GiB / (number of log cleaner threads) cap
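Here's a minimal sketch of the first two ideas combined - again Java with
hypothetical names, not actual Kafka internals: the abort flags get their own
read-write lock, the candidate list stands in for a periodically
re-calculated cache, and selection is done by sampling:
{code:java}
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch only - class and method names are illustrative, not Kafka's internals.
class FineGrainedCleanerManager {
    record PartitionLog(String topicPartition, double dirtyRatio) {}

    // Abort flags get a dedicated read-write lock: the hot per-buffer check
    // takes only the read lock, so cleaner threads no longer serialize on it.
    private final ReadWriteLock abortLock = new ReentrantReadWriteLock();
    private final Set<String> abortedCleans = new HashSet<>();

    // Immutable snapshot of candidates, re-published periodically (stands in
    // for the "re-calculate the cache every 1 minute or so" idea above).
    private volatile List<PartitionLog> candidates = List.of();

    void checkCleaningAborted(String topicPartition) {
        abortLock.readLock().lock();          // shared: many readers in parallel
        try {
            if (abortedCleans.contains(topicPartition)) {
                throw new IllegalStateException("cleaning aborted for " + topicPartition);
            }
        } finally {
            abortLock.readLock().unlock();
        }
    }

    void abortCleaning(String topicPartition) {
        abortLock.writeLock().lock();         // exclusive: rare mutation path
        try {
            abortedCleans.add(topicPartition);
        } finally {
            abortLock.writeLock().unlock();
        }
    }

    // Probabilistic selection: sample up to sampleSize candidates at random
    // and return the dirtiest of the sample - O(sampleSize) per call instead
    // of a full scan under a lock, and not guaranteed to be 100% accurate.
    PartitionLog grabFilthiestCompactedLog(int sampleSize) {
        List<PartitionLog> snapshot = candidates;   // lock-free volatile read
        if (snapshot.isEmpty()) {
            return null;
        }
        PartitionLog best = null;
        for (int i = 0; i < sampleSize; i++) {
            PartitionLog c = snapshot.get(ThreadLocalRandom.current().nextInt(snapshot.size()));
            if (best == null || c.dirtyRatio() > best.dirtyRatio()) {
                best = c;
            }
        }
        return best;
    }
}
{code}
The write lock is taken only on the rare abort path, so the per-buffer check
scales with the number of reader threads instead of fighting over one mutex;
the trade-off of sampling is that we sometimes pick a less-than-filthiest
log, which is fine given that grabFilthiestCompactedLog doesn't need to be
100% accurate.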