[jira] [Updated] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomislav Rajakovic updated KAFKA-8764: -- Affects Version/s: 2.4.0 > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > > And such log keeps repeating
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012683#comment-17012683 ] Tomislav Rajakovic commented on KAFKA-8764: --- [~jnadler] and [~seva.f] so it seems that they still didn't fix this issue, or it happens in rare condition(s). I know that in time when I was solving this issue, I've tested it with multiple versions (of 2.x.x) and all had same problem. Although I'm not so much fluent in Scala, I could probably compile latest version and give some hints about patching this issue out (got some clues back then when issue occurred). > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/0
[jira] [Comment Edited] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012683#comment-17012683 ] Tomislav Rajakovic edited comment on KAFKA-8764 at 1/10/20 10:24 AM: - [~jnadler] and [~seva.f] so it seems that they still didn't fix this issue, or it happens in rare condition(s). I know that in time when I was solving this issue, I've tested it with multiple versions (of 2.x.x) and all had same problem. Although I'm not so much fluent in Scala, I could probably compile latest version and give some hints about patching this issue out (got some clues back then when issue occurred). Btw. voting for this issue might get faster help/attention ;). was (Author: trajakovic): [~jnadler] and [~seva.f] so it seems that they still didn't fix this issue, or it happens in rare condition(s). I know that in time when I was solving this issue, I've tested it with multiple versions (of 2.x.x) and all had same problem. Although I'm not so much fluent in Scala, I could probably compile latest version and give some hints about patching this issue out (got some clues back then when issue occurred). > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO
[jira] [Updated] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomislav Rajakovic updated KAFKA-8764: -- Attachment: kafka2.4.0-KAFKA-8764.patch > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.L
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012952#comment-17012952 ] Tomislav Rajakovic commented on KAFKA-8764: --- I think that this [^kafka2.4.0-KAFKA-8764.patch] resolves LogCleaner issue(s). > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted time index > /var/lib/kafka/
[jira] [Issue Comment Deleted] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomislav Rajakovic updated KAFKA-8764: -- Comment: was deleted (was: I think that this [^kafka2.4.0-KAFKA-8764.patch] resolves LogCleaner issue(s).) > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: kafka2.4.0-KAFKA-8764.patch, > kafka2.4.0-KAFKA-8764.patch, log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Del
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013073#comment-17013073 ] Tomislav Rajakovic commented on KAFKA-8764: --- Hy [~junrao]. This issue happened on relatively busy kafka cluster, but I've never tinkered with kafka files, or did any manual actions. And what's more indicative, this issue first time happened on follower brokers/replicas (ISRs), while master broker for topic-partition didn't "feel" same effect. Maybe origin of the problem was "first" ever cleanup on master broker, leaving holes, but that's just my guess. And yes, I'm gonna make PR, just need to see "first good PR" and Contribution guideline. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:1
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013187#comment-17013187 ] Tomislav Rajakovic commented on KAFKA-8764: --- [~jnadler] for now, try to "help" LogCleaner by following solution steps from above. Idea is to move it's state file (cleaner-offset-checkpoint inside topic-partition folder) on all brokers that experiencing issue. Once when next LogSegment becomes available for cleaning, LogCleaner would fix himself and continue to work as expected (unless you'll have new record "holes", but that is, as [~junrao] stated, is rare event happening in edge cases). Advanced solution is to checkout kafka 2.4.0 from github, apply patch attached to this issue, rebuild kafka and run patched version of kafka with your data, and issue, hopefully, should be gone. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1, 2.4.0 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Labels: patch > Attachments: Screen Shot 2020-01-10 at 8.38.25 AM.png, > kafka2.4.0-KAFKA-8764.patch, kafka2.4.0-KAFKA-8764.patch, > log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /va
[jira] [Created] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
Tomislav Rajakovic created KAFKA-8764: - Summary: LogCleanerManager endless loop while compacting/cleaning segments Key: KAFKA-8764 URL: https://issues.apache.org/jira/browse/KAFKA-8764 Project: Kafka Issue Type: Bug Components: log cleaner Affects Versions: 2.2.1 Environment: docker base image: openjdk:8-jre-alpine base image, kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz Reporter: Tomislav Rajakovic Attachments: log-cleaner-bug-reproduction.zip {{LogCleanerManager stuck in endless loop while clearing segments for one partition resulting with many log outputs and heavy disk read/writes/IOPS.}} Issue appeared on follower brokers, and it happens on every (new) broker if partition assignment is changed. Original issue setup: * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB * 5 zookeepers * topic created with config: ** name = "backup_br_domain_squad" partitions = 36 replication_factor = 3 config = { "cleanup.policy" = "compact" "min.compaction.lag.ms" = "8640" "min.cleanable.dirty.ratio" = "0.3" } Log excerpt: {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:53,895] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,896] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,896] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:53,964] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,964] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:53,964] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:54,032] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,032] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,032] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:54,101] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,101] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,101] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} {{[2019-08-07 12:10:54,173] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,173] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}} {{[2019-08-07 12:10:54,173] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}} And such log keeps repeating forever. I've been able to extract segment files from (running) leader broker, and replicated same behaviour locally. h1. h1. *Reproduction setup:* * start single broker kafka_2.12-2.2.1 * create topic ** {{./bin/kafka-topics.sh --bootstrap-server *__BOOTSTRAP_SERVER__*:9092 --create --topic backup_br_domain_squad --partitions 1 --replication-factor 1 --config cleanup.policy=compact --conf
[jira] [Updated] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomislav Rajakovic updated KAFKA-8764: -- Affects Version/s: 2.3.0 > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,173] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,173] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > > And such log keeps repeating foreve
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903102#comment-16903102 ] Tomislav Rajakovic commented on KAFKA-8764: --- h1. Clue: By further investigation, it looks like: * segment has end (existing) offset at 231 * record with offset 232 is missing?! * while (active) segment 0233 has first offset 233 When LogCleaner starts, it first looks for [0,233) to cleanup, but while building buildOffsetMap, it ends up with offset 231 (that's last written offset in segment), and new endOffset = 231 + 1 = 232, and this is written in cleaner-offset-checkpoint file. Next time, LogCleaner starts with [232,233) to cleanup, but while building buildOffsetMap, it ends up with offset -1, since 232 is non-existing in segment, and new endOffset= -1 + 1 = 0, and this is written in cleaner-offset-checkpoint. h1. Elaboration: By looking at *kafka_2.12_2.2.1* source code, * all this happens inside LogCleaner.doClean (lines: 492-520), where buildOffsetMap(line: 501) in second clean run (probably also in first run, since 232 is non-existing offset) builds wrong offsetMap since reading * segment.log.readInto(readBuffer, position) (line: 899) and val records = MemoryRecords.readableRecords(readBuffer) (line: 905) * yields with offset 231 record, and * for (batch <- records.batches.asScala) (line: 909-934) is skipped, while * map: OffsetMap remains unfilled with default lastOffset = -1 which is later used to determine endOffset for next cleaning (thus endOffset = -1 + 1 = 0, later written in cleaner-offset-checkpoint) h1. Summary: Since last record offset in segment is 231 (232 is missing) and next rolled segment is 0233, LogCleaner never progress past segment ending in endless loop cleaning segment with *cleaner-offset-checkpoint* 0->232->0->232causing high disk read/write/iops operations. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segme
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912178#comment-16912178 ] Tomislav Rajakovic commented on KAFKA-8764: --- Well it happened again on totally separated environment (production this time), same version of kafka (2.2.1). Issue started after I've removed one broker from cluster, wipe out data, and return it to cluster. When checked log segment (kafka.tools.DumpLogSegments) of troublesome partition, last offset was 179959, while next (active) segment was rolled with 180015 offset. So log cleaner again was jumping: * Building offset map for log X for 1 segments in offset range [0, 180015) * Building offset map for log X for 1 segments in offset range [179960, 180015) * Building offset map for log X for 1 segments in offset range [0, 180015) * ...etc Since I'm not familiar with LOG structure, I don't know if empty offsets (from [179960, 180015)) in 0 segment are allowed, and if it's problem in LogCleaner, LOG structure or discrepancy between them. > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,032] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,101] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,101] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912204#comment-16912204 ] Tomislav Rajakovic commented on KAFKA-8764: --- Btw. *solution* for this issue, while I'm waiting for someone smart to find/fix the bug, is to manually update *cleaner-offset-checkpoint.* h2. Steps: # Stop broker(s) with described issue # find partition(s) that bugged LogCleaner ** good place for search them is log-cleaner.log output, something like endless lines of this: ** [2019-08-21 09:33:32,060] INFO Cleaner 0: Building offset map for log br_domain_match-22 for 1 segments in offset range [0, 180015). (kafka.log.LogCleaner) ** [2019-08-21 09:33:32,167] INFO Cleaner 0: Building offset map for log br_domain_match-22 for 1 segments in offset range [179960, 180015). (kafka.log.LogCleaner) ** [2019-08-21 09:33:32,195] INFO Cleaner 0: Building offset map for log br_domain_match-22 for 1 segments in offset range [0, 180015). (kafka.log.LogCleaner) ** [2019-08-21 09:33:32,440] INFO Cleaner 0: Building offset map for log br_domain_match-22 for 1 segments in offset range [179960, 180015). (kafka.log.LogCleaner) ** ... ** in this case, partition br_domain_match-22 is troubled one, and LogCleaner flip-flops between [0, 180015), [179960, 180015), [0, 180015), # remember/writeup last offset for partition in ranges (in my case [179960, 180015), so it's *NEXT_CLEANING_OFFSET=180015*) # open cleaner-offset-checkpoint file (placed at root of log.dirs folder), and find line with topic-partition ** in my case, it's topic *br_domain_match*, partition *22*, so line contains *br_domain_match 22 X* ** replace *X* with *NEXT_CLEANING_OFFSET* (from point 3.) ** save file # and repeat that for each partition with issue # start broker and watch cleaner log file, with little luck, everything should be ok > LogCleanerManager endless loop while compacting/cleaning segments > - > > Key: KAFKA-8764 > URL: https://issues.apache.org/jira/browse/KAFKA-8764 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.3.0, 2.2.1 > Environment: docker base image: openjdk:8-jre-alpine base image, > kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz >Reporter: Tomislav Rajakovic >Priority: Major > Attachments: log-cleaner-bug-reproduction.zip > > > {{LogCleanerManager stuck in endless loop while clearing segments for one > partition resulting with many log outputs and heavy disk read/writes/IOPS.}} > > Issue appeared on follower brokers, and it happens on every (new) broker if > partition assignment is changed. > > Original issue setup: > * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers > * log directory is (AWS) EBS mounted PV, gp2 (ssd) kind of 750GiB > * 5 zookeepers > * topic created with config: > ** name = "backup_br_domain_squad" > partitions = 36 > replication_factor = 3 > config = { > "cleanup.policy" = "compact" > "min.compaction.lag.ms" = "8640" > "min.cleanable.dirty.ratio" = "0.3" > } > > > Log excerpt: > {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,895] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,896] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:53,964] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted offset index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:53,964] INFO Deleted time index > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. > (kafka.log.LogSegment)}} > {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, > dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}} > {{[2019-08-07 12:10:54,032] INFO Deleted log > /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. > (kafka.log.LogSegme
[jira] [Commented] (KAFKA-7329) Continuous warning message of LEADER_NOT_AVAILABLE
[ https://issues.apache.org/jira/browse/KAFKA-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229373#comment-17229373 ] Tomislav Rajakovic commented on KAFKA-7329: --- Is there any reason why is this? Running kafka 2.6.0 (i docker/k8s), and still seeing endless logs when querying non-existing topic with java kafka drivers 2.6.0. > Continuous warning message of LEADER_NOT_AVAILABLE > -- > > Key: KAFKA-7329 > URL: https://issues.apache.org/jira/browse/KAFKA-7329 > Project: Kafka > Issue Type: Bug > Components: consumer, producer >Affects Versions: 2.0.0 > Environment: macOS - High Sierra; Java 1.8 >Reporter: Vasudevan Seshadri >Priority: Major > > I am running kafka version kafka_2.11-2.0.0. I have followed the instruction > mentioned in quick start and was able to run zookeeper and server (broker > with id=0) without any issues. Note: I have NOT changed any config file > entries. Everything is same as downloaded by zip file > I also have created two topics as "test" and "topic_test" > Issue: Whenever I run producer or consumer and try to publish or consume on > any of the above topics, following error is thrown continuously/non-stop: > [2018-08-22 22:36:34,380] WARN [Producer clientId=console-producer] Error > while fetching metadata with correlation id 1 : > \{topic_test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient) > [2018-08-22 22:36:34,474] WARN [Producer clientId=console-producer] Error > while fetching metadata with correlation id 2 : > \{topic_test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient) > [2018-08-22 22:36:34,579] WARN [Producer clientId=console-producer] Error > while fetching metadata with correlation id 3 : > \{topic_test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient) > [2018-08-22 22:36:34,685] WARN [Producer clientId=console-producer] Error > while fetching metadata with correlation id 4 : > \{topic_test=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient) > Am I missing any settings? -- This message was sent by Atlassian Jira (v8.3.4#803005)