[jira] [Commented] (KAFKA-9824) Consumer loses partition offset and resets post 2.4.1 version upgrade

2020-04-15 Thread Seva Feldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083924#comment-17083924
 ] 

Seva Feldman commented on KAFKA-9824:
--------------------------------------

I've seen it happen when an AWS spot node (consumer) was reclaimed. Can it
be related to forceful consumer termination?

On Tue, Apr 14, 2020 at 2:20 AM Jason Gustafson (Jira) wrote:



-- 
Regards,
Seva Feldman


> Consumer loses partition offset and resets post 2.4.1 version upgrade
> ----------------------------------------------------------------------
>
> Key: KAFKA-9824
> URL: https://issues.apache.org/jira/browse/KAFKA-9824
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.4.1
>Reporter: Nitay Kufert
>Priority: Major
> Attachments: image-2020-04-06-13-14-47-014.png
>
>
> Hello,
>  Around 2 weeks ago we upgraded our Kafka clients & brokers to 2.4.1 (from 
> 2.3.1), 
>  and we started noticing a troubling behavior that we didn't see before:
>   
>  Without any apparent reason, a specific partition on a specific consumer loses 
> its offset and starts re-consuming the entire partition from the beginning 
> (according to the retention).
>   
>  Messages appearing on the consumer (client):
> {quote}Apr 5, 2020 @ 14:54:47.327 INFO sonic-fire-attribution [Consumer 
> clientId=consumer-fireAttributionConsumerGroup4-2, 
> groupId=fireAttributionConsumerGroup4] Resetting offset for partition 
> trackingSolutionAttribution-48 to offset 1216430527.
> {quote}
> {quote}Apr 5, 2020 @ 14:54:46.797 INFO sonic-fire-attribution [Consumer 
> clientId=consumer-fireAttributionConsumerGroup4-2, 
> groupId=fireAttributionConsumerGroup4] Fetch offset 1222791071 is out of 
> range for partition trackingSolutionAttribution-48
> {quote}
> Those are the logs from the brokers at the same time (searched for 
> "trackingSolutionAttribution-48" OR "fireAttributionConsumerGroup4")
> {quote}Apr 5, 2020 @ 14:54:46.801 INFO Writing producer snapshot at offset 
> 1222791065
>   
>  Apr 5, 2020 @ 14:54:46.801 INFO Writing producer snapshot at offset 
> 1222791065
>   
>  Apr 5, 2020 @ 14:54:46.801 INFO Rolled new log segment at offset 1222791065 
> in 0 ms.
>   
>  Apr 5, 2020 @ 14:54:04.400 INFO BrokerId 1033 is no longer a coordinator for 
> the group fireAttributionConsumerGroup4. Proceeding cleanup for other alive 
> groups
>   
>  Apr 5, 2020 @ 14:54:04.400 INFO BrokerId 1033 is no longer a coordinator for 
> the group fireAttributionConsumerGroup4. Proceeding cleanup for other alive 
> groups
> {quote}
> Another way to see the same thing, from our monitoring (DD) on the partition 
> offset:
> !image-2020-04-06-13-14-47-014.png|width=530,height=152!
> The recovery you are seeing is after I ran a partition offset reset manually 
> (using kafka-consumer-groups.sh --bootstrap-server localhost:9092 --topic 
> trackingSolutionAttribution:57 --group fireAttributionConsumerGroup4 
> --reset-offsets --to-datetime 'SOME DATE')
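> For anyone scripting the same recovery, roughly the equivalent can be done through the plain Java consumer API instead of the CLI. This is only a sketch: the bootstrap address and the timestamp are placeholders, the topic/partition are taken from the command above, and it assumes the group has no active members while the offset is overwritten:
> {code:java}
> import java.util.Collections;
> import java.util.Properties;
> import org.apache.kafka.clients.consumer.ConsumerConfig;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.clients.consumer.OffsetAndMetadata;
> import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
> import org.apache.kafka.common.TopicPartition;
> import org.apache.kafka.common.serialization.ByteArrayDeserializer;
>
> public class ResetOffsetToTimestamp {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
>         props.put(ConsumerConfig.GROUP_ID_CONFIG, "fireAttributionConsumerGroup4");
>         props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
>         props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
>         props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
>
>         TopicPartition tp = new TopicPartition("trackingSolutionAttribution", 57);
>         long targetTimestamp = 1586000000000L; // placeholder: epoch millis to rewind to
>
>         try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>             consumer.assign(Collections.singletonList(tp));
>             // Find the earliest offset whose record timestamp is >= targetTimestamp.
>             OffsetAndTimestamp found =
>                 consumer.offsetsForTimes(Collections.singletonMap(tp, targetTimestamp)).get(tp);
>             if (found != null) {
>                 // Overwrite the group's committed offset for this partition.
>                 consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(found.offset())));
>             }
>         }
>     }
> }
> {code}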
>   
>  Any idea what could be causing this? We have had it happen at least 5 times 
> since the upgrade, and before that, I don't remember it ever happening to us.
>   
>  The topic config is set to the defaults, except the retention, which is manually set 
> to 4320.
>  The topic has 60 partitions & a replication factor of 2. 
>   
>  Consumer config:
> {code:java}
> ConsumerConfig values:
>   allow.auto.create.topics = true
>   auto.commit.interval.ms = 5000
>   auto.offset.reset = earliest
>   bootstrap.servers = [..]
>   check.crcs = true
>   client.dns.lookup = default
>   client.id =
>   client.rack =
>   connections.max.idle.ms = 540000
>   default.api.timeout.ms = 60000
>   enable.auto.commit = true
>   exclude.internal.topics = true
>   fetch.max.bytes = 52428800
>   fetch.max.wait.ms = 500
>   fetch.min.bytes = 1
>   group.id = fireAttributionConsumerGroup4
>   group.instance.id = null
>   heartbeat.interval.ms = 10000
>   interceptor.classes = []
>   internal.leave.group.on.close = true
>   isolation.level = read_uncommitted
>   key.deserializer = class 
> org.apache.kafka.common.serialization.ByteArrayDeserializer
>   max.partition.fetch.bytes = 1048576
>   max.poll.interval.ms = 300000
>   max.poll.records = 500
>   metadata.max.age.ms = 300000
>   metric.reporters = []
>   metrics.num.samples = 2
>   metrics.recording.level = INFO
>   metrics.sample.window.ms = 30000
>   partition.assignment.strategy = [class 
> org.apache.kafka.clients.consumer.RangeAssignor]
>   receive.buffer.bytes = 65536
>   reconnect.backoff.max.ms = 1000
>   reconnect.backoff.ms = 50
>   request.timeout.ms = 30000
>   retry.backoff.ms = 100
>   sasl.client.callback.handler.class = null
>   sasl.jaas.config = null
>   sasl.kerberos.kinit.cmd = /usr/bin/kinit
>

[jira] [Commented] (KAFKA-8764) LogCleanerManager endless loop while compacting/cleaning segments

2020-01-10 Thread Seva Feldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-8764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012667#comment-17012667
 ] 

Seva Feldman commented on KAFKA-8764:
--------------------------------------

Hi,

We have the exact same issue with the __consumer_offsets compacted topic, which kills 
our consumer groups. Thanks, [~trajakovic], for the workaround of manually updating the 
*cleaner-offset-checkpoint* file.
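In case it helps anyone else hitting this: as far as I can tell, the cleaner-offset-checkpoint 
file sits in each log directory and is plain text, with a version line, an entry count, and then 
one "topic partition offset" line per partition the cleaner has progressed through. The values 
below are purely illustrative, not taken from a real broker:
{code}
0
2
__consumer_offsets 11 48273492
backup_br_domain_squad 14 0
{code}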

BR

> LogCleanerManager endless loop while compacting/cleaning segments
> ------------------------------------------------------------------
>
> Key: KAFKA-8764
> URL: https://issues.apache.org/jira/browse/KAFKA-8764
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.3.0, 2.2.1
> Environment: docker base image: openjdk:8-jre-alpine, 
> kafka from http://ftp.carnet.hr/misc/apache/kafka/2.2.1/kafka_2.12-2.2.1.tgz
>Reporter: Tomislav Rajakovic
>Priority: Major
> Attachments: log-cleaner-bug-reproduction.zip
>
>
> {{LogCleanerManager}} gets stuck in an endless loop while cleaning segments for one 
> partition, resulting in many log outputs and heavy disk reads/writes/IOPS.
>  
> The issue appeared on follower brokers, and it happens on every (new) broker if 
> the partition assignment is changed.
>  
> Original issue setup:
>  * kafka_2.12-2.2.1 deployed as statefulset on kubernetes, 5 brokers
>  * log directory is an (AWS) EBS-mounted PV of the gp2 (SSD) kind, 750GiB
>  * 5 zookeepers
>  * topic created with config:
>  ** name = "backup_br_domain_squad"
> partitions = 36
> replication_factor = 3
> config = {
>  "cleanup.policy" = "compact"
>  "min.compaction.lag.ms" = "86400000"
>  "min.cleanable.dirty.ratio" = "0.3"
> }
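> For reproduction, the same topic can also be created programmatically. This is only a sketch mirroring the settings listed above; the bootstrap address is a placeholder:
> {code:java}
> import java.util.Collections;
> import java.util.HashMap;
> import java.util.Map;
> import java.util.Properties;
> import org.apache.kafka.clients.admin.AdminClient;
> import org.apache.kafka.clients.admin.AdminClientConfig;
> import org.apache.kafka.clients.admin.NewTopic;
>
> public class CreateBackupTopic {
>     public static void main(String[] args) throws Exception {
>         Properties props = new Properties();
>         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
>
>         Map<String, String> configs = new HashMap<>();
>         configs.put("cleanup.policy", "compact");
>         configs.put("min.compaction.lag.ms", "86400000");
>         configs.put("min.cleanable.dirty.ratio", "0.3");
>
>         try (AdminClient admin = AdminClient.create(props)) {
>             // 36 partitions, replication factor 3, as listed above.
>             NewTopic topic = new NewTopic("backup_br_domain_squad", 36, (short) 3).configs(configs);
>             admin.createTopics(Collections.singletonList(topic)).all().get();
>         }
>     }
> }
> {code}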
>  
>  
> Log excerpt:
> {{[2019-08-07 12:10:53,895] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,895] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,896] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:53,964] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,031] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,032] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,101] INFO Deleted time index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.timeindex.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,173] INFO [Log partition=backup_br_domain_squad-14, 
> dir=/var/lib/kafka/data/topics] Deleting segment 0 (kafka.log.Log)}}
> {{[2019-08-07 12:10:54,173] INFO Deleted log 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.log.deleted.
>  (kafka.log.LogSegment)}}
> {{[2019-08-07 12:10:54,173] INFO Deleted offset index 
> /var/lib/kafka/data/topics/backup_br_domain_squad-14/.index.deleted.
>  (kafka.log.LogSegment)}}
>