[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189684#comment-17189684 ] Jun Rao commented on KAFKA-8764: [~wushujames]: Yes, reassigned replica can have the same issue. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1 > > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189628#comment-17189628 ] James Cheng commented on KAFKA-8764: [~junrao], in the comment https://issues.apache.org/jira/browse/KAFKA-8764?focusedCommentId=17013081=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17013081 above, you said {quote}Currently, the log cleaner runs independently on each replica. If a follower has been down for some time, it is possible that the leader has already cleaned the data and left holes in some log segments. When those segments get replicated to the follower, the follower will clean the same data again and potentially hit the above issue. {quote} Would this also happen if you added a new follower? For example, if I have a compacted topic on Brokers 1 2 3 where compaction has already happened, and then I use kafka-reassign-partitions to move the brokers 4 5 6, it sounds like compaction will happen independently on brokers 4 5 6, and will run into this issue. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Fix For: 2.2.3, 2.5.0, 2.3.2, 2.4.1 > > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015432#comment-17015432 ] ASF GitHub Bot commented on KAFKA-8764: --- mumrah commented on pull request #7932: KAFKA-8764: LogCleanerManager endless loop while compacting/clea URL: https://github.com/apache/kafka/pull/7932 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013187#comment-17013187 ] Tomislav Rajakovic commented on KAFKA-8764: --- [~jnadler] for now, try to "help" LogCleaner by following solution steps from above. Idea is to move it's state file (cleaner-offset-checkpoint inside topic-partition folder) on all brokers that experiencing issue. Once when next LogSegment becomes available for cleaning, LogCleaner would fix himself and continue to work as expected (unless you'll have new record "holes", but that is, as [~junrao] stated, is rare event happening in edge cases). Advanced solution is to checkout kafka 2.4.0 from github, apply patch attached to this issue, rebuild kafka and run patched version of kafka with your data, and issue, hopefully, should be gone. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013179#comment-17013179 ] ASF GitHub Bot commented on KAFKA-8764: --- trajakovic commented on pull request #7932: KAFKA-8764: LogCleanerManager endless loop while compacting/clea URL: https://github.com/apache/kafka/pull/7932 This PR fixes LogCleaner's endless loop while clearing LogSegemnts with holes. In rare cases, when clearing LogSegments with missing records, LogCleaner was unable to progress resulting with high CPU usage, high disk read/writes and excessive cleaner logs (if enabled). This PR addresses such situation by skipping missing record(s) and, as result, avoiding endless loop while clearing such Logs. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013081#comment-17013081 ] Jun Rao commented on KAFKA-8764: [~trajakovic]: Thanks for providing the additional info. That makes sense now. Currently, the log cleaner runs independently on each replica. If a follower has been down for some time, it is possible that the leader has already cleaned the data and left holes in some log segments. When those segments get replicated to the follower, the follower will clean the same data again and potentially hit the above issue. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013073#comment-17013073 ] Tomislav Rajakovic commented on KAFKA-8764: --- Hy [~junrao]. This issue happened on relatively busy kafka cluster, but I've never tinkered with kafka files, or did any manual actions. And what's more indicative, this issue first time happened on follower brokers/replicas (ISRs), while master broker for topic-partition didn't "feel" same effect. Maybe origin of the problem was "first" ever cleanup on master broker, leaving holes, but that's just my guess. And yes, I'm gonna make PR, just need to see "first good PR" and Contribution guideline. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013058#comment-17013058 ] Jun Rao commented on KAFKA-8764: [~trajakovic]: Thanks for the investigation. Normally, the offset map is built from the dirty portion of the log that shouldn't contain holes in offsets. If segment [0,233) misses offset 232, it means that this segment has been cleaned and the dirty offset should have moved to 233 after the first round of cleaning. Was the dirty offset ever reset manually? In any case, I agree that it would be better to make the code more defensive. Could you submit the patch as a PR (details can be found in [https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes])? > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013052#comment-17013052 ] Jeff Nadler commented on KAFKA-8764: I just attached a graph of CPU usage in a small, 3-node cluster that shows this issue. You can see that for all 3 nodes a massive increase in CPU usage when the log cleaner is going nuts - the high CPU periods correspond to tons of log entries in 'log-cleaner.log', and high CPU usage of the 'kafka-log-clean' thread. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012952#comment-17012952 ] Tomislav Rajakovic commented on KAFKA-8764: --- I think that this [^kafka2.4.0-KAFKA-8764.patch] resolves LogCleaner issue(s). > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted time index >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012683#comment-17012683 ] Tomislav Rajakovic commented on KAFKA-8764: --- [~jnadler] and [~seva.f] so it seems that they still didn't fix this issue, or it happens in rare condition(s). I know that in time when I was solving this issue, I've tested it with multiple versions (of 2.x.x) and all had same problem. Although I'm not so much fluent in Scala, I could probably compile latest version and give some hints about patching this issue out (got some clues back then when issue occurred). > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012667#comment-17012667 ] Seva Feldman commented on KAFKA-8764: - Hi, We have the exact same issue with __consumer_offset compacted topic which kills our consumer groups. Thanks, [~trajakovic], for the solution on manually update *cleaner-offset-checkpoint file.* BR > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009262#comment-17009262 ] Jeff Nadler commented on KAFKA-8764: I wanted to throw my .02 in here: We are seeing the same issue, also with compact topics. The compact topics in question are quite low traffic. LogCleaner is running flat-out, with only a few ms between attempts to clean a single topic-partition. The log cleaner thread is consuming almost an entire core: {code:java} 14857 kafka 20 0 8894844 2.187g 12396 R 87.5 56.8 47:04.92 kafka-log-clean {code} We're running 2.4.0, openjdk11, happy to provide any add'l info to help with this issue. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912204#comment-16912204 ] Tomislav Rajakovic commented on KAFKA-8764: --- Btw. *solution* for this issue, while I'm waiting for someone smart to find/fix the bug, is to manually update *cleaner-offset-checkpoint.* h2. Steps: # Stop broker(s) with described issue # find partition(s) that bugged LogCleaner ** good place for search them is log-cleaner.log output, something like endless lines of this: ** [2019-08-21 09:33:32,060] INFO Cleaner 0: Building offset map for log br_domain_match-22 for 1 segments in offset range [0, 180015). (kafka.log.LogCleaner) ** [2019-08-21 09:33:32,167] INFO Cleaner 0: Building offset map for log br_domain_match-22 for 1 segments in offset range [179960, 180015). (kafka.log.LogCleaner) ** [2019-08-21 09:33:32,195] INFO Cleaner 0: Building offset map for log br_domain_match-22 for 1 segments in offset range [0, 180015). (kafka.log.LogCleaner) ** [2019-08-21 09:33:32,440] INFO Cleaner 0: Building offset map for log br_domain_match-22 for 1 segments in offset range [179960, 180015). (kafka.log.LogCleaner) ** ... ** in this case, partition br_domain_match-22 is troubled one, and LogCleaner flip-flops between [0, 180015), [179960, 180015), [0, 180015), # remember/writeup last offset for partition in ranges (in my case [179960, 180015), so it's *NEXT_CLEANING_OFFSET=180015*) # open cleaner-offset-checkpoint file (placed at root of log.dirs folder), and find line with topic-partition ** in my case, it's topic *br_domain_match*, partition *22*, so line contains *br_domain_match 22 X* ** replace *X* with *NEXT_CLEANING_OFFSET* (from point 3.) ** save file # and repeat that for each partition with issue # start broker and watch cleaner log file, with little luck, everything should be ok > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} >
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912178#comment-16912178 ] Tomislav Rajakovic commented on KAFKA-8764: --- Well it happened again on totally separated environment (production this time), same version of kafka (2.2.1). Issue started after I've removed one broker from cluster, wipe out data, and return it to cluster. When checked log segment (kafka.tools.DumpLogSegments) of troublesome partition, last offset was 179959, while next (active) segment was rolled with 180015 offset. So log cleaner again was jumping: * Building offset map for log X for 1 segments in offset range [0, 180015) * Building offset map for log X for 1 segments in offset range [179960, 180015) * Building offset map for log X for 1 segments in offset range [0, 180015) * ...etc Since I'm not familiar with LOG structure, I don't know if empty offsets (from [179960, 180015)) in 0 segment are allowed, and if it's problem in LogCleaner, LOG structure or discrepancy between them. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. >