[jira] [Commented] (KAFKA-9824) Consumer loses partition offset and resets post 2.4.1 version upgrade
[ https://issues.apache.org/jira/browse/KAFKA-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083924#comment-17083924 ]

Seva Feldman commented on KAFKA-9824:
-------------------------------------

I've seen it happen when an AWS spot node (consumer) was reclaimed. Can it be related to forceful consumer termination?

On Tue, Apr 14, 2020 at 2:20 AM Jason Gustafson (Jira)

--
Regards,
Seva Feldman

> Consumer loses partition offset and resets post 2.4.1 version upgrade
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-9824
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9824
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: Nitay Kufert
>            Priority: Major
>         Attachments: image-2020-04-06-13-14-47-014.png
>
> Hello,
> Around 2 weeks ago we upgraded our Kafka clients & brokers to 2.4.1 (from 2.3.1),
> and we started noticing a troubling behavior that we didn't see before:
>
> Without apparent reason, a specific partition on a specific consumer loses
> its offset and starts re-consuming the entire partition from the beginning
> (according to the retention).
>
> Messages appearing on the consumer (client):
> {quote}Apr 5, 2020 @ 14:54:47.327 INFO sonic-fire-attribution [Consumer clientId=consumer-fireAttributionConsumerGroup4-2, groupId=fireAttributionConsumerGroup4] Resetting offset for partition trackingSolutionAttribution-48 to offset 1216430527.
> {quote}
> {quote}Apr 5, 2020 @ 14:54:46.797 INFO sonic-fire-attribution [Consumer clientId=consumer-fireAttributionConsumerGroup4-2, groupId=fireAttributionConsumerGroup4] Fetch offset 1222791071 is out of range for partition trackingSolutionAttribution-48
> {quote}
> These are the logs from the brokers at the same time (searched for
> "trackingSolutionAttribution-48" OR "fireAttributionConsumerGroup4"):
> {quote}Apr 5, 2020 @ 14:54:46.801 INFO Writing producer snapshot at offset 1222791065
>
> Apr 5, 2020 @ 14:54:46.801 INFO Writing producer snapshot at offset 1222791065
>
> Apr 5, 2020 @ 14:54:46.801 INFO Rolled new log segment at offset 1222791065 in 0 ms.
>
> Apr 5, 2020 @ 14:54:04.400 INFO BrokerId 1033 is no longer a coordinator for the group fireAttributionConsumerGroup4. Proceeding cleanup for other alive groups
>
> Apr 5, 2020 @ 14:54:04.400 INFO BrokerId 1033 is no longer a coordinator for the group fireAttributionConsumerGroup4. Proceeding cleanup for other alive groups
> {quote}
> Another way to see the same thing, from our monitoring (DD) on the partition offset:
> !image-2020-04-06-13-14-47-014.png|width=530,height=152!
> The recovery you are seeing is after I ran a partition offset reset manually
> (using kafka-consumer-groups.sh --bootstrap-server localhost:9092 --topic
> trackingSolutionAttribution:57 --group fireAttributionConsumerGroup4
> --reset-offsets --to-datetime 'SOME DATE').
>
> Any idea what could be causing this? It has happened to us at least 5 times
> since the upgrade, and before that, I don't remember it ever happening to us.
>
> Topic config is set to default, except the retention, which is manually set to 4320.
> The topic has 60 partitions & a replication factor of 2.
>
> Consumer config:
> {code:java}
> ConsumerConfig values:
> allow.auto.create.topics = true
> auto.commit.interval.ms = 5000
> auto.offset.reset = earliest
> bootstrap.servers = [..]
> check.crcs = true
> client.dns.lookup = default
> client.id =
> client.rack =
> connections.max.idle.ms = 54
> default.api.timeout.ms = 6
> enable.auto.commit = true
> exclude.internal.topics = true
> fetch.max.bytes = 52428800
> fetch.max.wait.ms = 500
> fetch.min.bytes = 1
> group.id = fireAttributionConsumerGroup4
> group.instance.id = null
> heartbeat.interval.ms = 1
> interceptor.classes = []
> internal.leave.group.on.close = true
> isolation.level = read_uncommitted
> key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
> max.partition.fetch.bytes = 1048576
> max.poll.interval.ms = 30
> max.poll.records = 500
> metadata.max.age.ms = 30
> metric.reporters = []
> metrics.num.samples = 2
> metrics.recording.level = INFO
> metrics.sample.window.ms = 3
> partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
> receive.buffer.bytes = 65536
> reconnect.backoff.max.ms = 1000
> reconnect.backoff.ms = 50
> request.timeout.ms = 3
> retry.backoff.ms = 100
> sasl.client.callback.handler.class = null
> sasl.jaas.config = null
> sasl.kerberos.kinit.cmd = /usr/bin/kinit
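[Editor's note] The manual recovery the reporter describes can be sketched as below. This is a minimal sketch, not the reporter's exact procedure: it assumes a broker reachable at localhost:9092 and reuses the group/topic names from the report; the timestamp construction with GNU date is an illustration of the format --to-datetime expects. Note that kafka-consumer-groups.sh only applies a reset when --execute is passed, and the group must have no active members at that moment.

```shell
# Sketch of the manual offset reset described in the report.
# Assumptions: local broker on localhost:9092; group/topic names from the report.
BOOTSTRAP=localhost:9092
GROUP=fireAttributionConsumerGroup4
TOPIC_PARTITION=trackingSolutionAttribution:57   # topic:partition to reset

# Build an ISO-8601 timestamp for --to-datetime (GNU date; here, 2 hours back).
RESET_TS=$(date -u -d '2 hours ago' '+%Y-%m-%dT%H:%M:%S.000')
echo "$RESET_TS"

# Preview the plan first (--dry-run), then apply it (--execute).
# Guarded so the sketch is a no-op where the Kafka CLI is not installed.
if command -v kafka-consumer-groups.sh >/dev/null 2>&1; then
  kafka-consumer-groups.sh --bootstrap-server "$BOOTSTRAP" \
    --group "$GROUP" --topic "$TOPIC_PARTITION" \
    --reset-offsets --to-datetime "$RESET_TS" --dry-run
  kafka-consumer-groups.sh --bootstrap-server "$BOOTSTRAP" \
    --group "$GROUP" --topic "$TOPIC_PARTITION" \
    --reset-offsets --to-datetime "$RESET_TS" --execute
fi
```

Resetting by datetime rather than to a fixed offset is what the reporter used, since it lets you rewind to just before the incident without knowing the exact committed offset that was lost.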
[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments
[ https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012667#comment-17012667 ]

Seva Feldman commented on KAFKA-8764:
-------------------------------------

Hi,

We have the exact same issue with the __consumer_offsets compacted topic, which kills our consumer groups.

Thanks, [~trajakovic], for the solution of manually updating the *cleaner-offset-checkpoint* file.

BR

> LogCleanerManager endless loop while compacting/cleaning segments
> ------------------------------------------------------------------
>
>                 Key: KAFKA-8764
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8764
>             Project: Kafka
>          Issue Type: Bug
>          Components: log cleaner
>    Affects Versions: 2.3.0, 2.2.1
>         Environment: docker base image: openjdk:8-jre-alpine base image,
> kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz
>            Reporter: Tomislav Rajakovic
>            Priority: Major
>         Attachments: log-cleaner-bug-reproduction.zip
>
> {{LogCleanerManager gets stuck in an endless loop while cleaning segments for one partition, resulting in many log outputs and heavy disk reads/writes/IOPS.}}
>
> The issue appeared on follower brokers, and it happens on every (new) broker if
> the partition assignment is changed.
>
> Original issue setup:
> * kafka_2.12-2.2.1 deployed as a statefulset on kubernetes, 5 brokers
> * log directory is an (AWS) EBS-mounted PV, gp2 (ssd) kind, 750GiB
> * 5 zookeepers
> * topic created with config:
> ** name = "backup_br_domain_squad"
> partitions = 36
> replication_factor = 3
> config = {
> "cleanup.policy" = "compact"
> "min.compaction.lag.ms" = "8640"
> "min.cleanable.dirty.ratio" = "0.3"
> }
>
> Log excerpt:
> {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,895] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted time index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,173] INFO Deleted log /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted. (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,173] INFO Deleted offset index /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted. (kafka.log.LogSegment)}}
>
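[Editor's note] The *cleaner-offset-checkpoint* workaround the commenter thanks [~trajakovic] for can be sketched as below, against a synthetic checkpoint file rather than a live broker. The sketch assumes the file layout is: a version line, an entry-count line, then one "topic partition offset" entry per line — verify this against your broker's actual file before editing, and only edit it with the broker stopped. The topic, partition, and offset values here are made up for illustration.

```shell
# Sketch: drop a stuck partition's entry from cleaner-offset-checkpoint so the
# log cleaner recomputes its starting point on the next run.
# Assumed layout: version line, entry count, then "topic partition offset" lines.
CKPT=$(mktemp)
cat > "$CKPT" <<'EOF'
0
2
__consumer_offsets 14 123456
backup_br_domain_squad 14 987654
EOF

STUCK_TOPIC=backup_br_domain_squad   # illustrative values from the report
STUCK_PART=14

# Keep every entry except the stuck one, then rewrite the two header lines
# (version stays 0, entry count becomes the number of remaining entries).
ENTRIES=$(mktemp)
awk -v t="$STUCK_TOPIC" -v p="$STUCK_PART" \
  'NR > 2 && !($1 == t && $2 == p)' "$CKPT" > "$ENTRIES"
{ echo 0; wc -l < "$ENTRIES" | tr -d ' '; cat "$ENTRIES"; } > "$CKPT"
cat "$CKPT"
```

Removing the entry (rather than hand-editing an offset into it) is the conservative choice under these assumptions: the cleaner then treats the partition as never cleaned and rebuilds its checkpoint itself, instead of trusting a value a human typed.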