[
https://issues.apache.org/jira/browse/KAFKA-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Varsha Abhinandan updated KAFKA-10313:
--------------------------------------
Priority: Major (was: Minor)
> Out of range offset errors leading to offset reset
> --------------------------------------------------
>
> Key: KAFKA-10313
> URL: https://issues.apache.org/jira/browse/KAFKA-10313
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Affects Versions: 2.2.2
> Reporter: Varsha Abhinandan
> Priority: Major
>
> Hi,
>
> We have been occasionally noticing offset resets happening on the Kafka
> consumer because of offset out of range error. However, I don't see any
> errors in the broker logs. No logs related to leader-election, replica lag,
> Kafka broker pod restarts or anything. (just info logs were enabled in the
> prod environment).
>
> It appeared from the logs that the out of range error was because of the
> fetch offset being larger than the offset range on the broker. Noticed this
> happening multiple times on different consumers, stream apps in the prod
> environment. So, it doesn't seem like an application bug and more like a bug
> in the KafkaConsumer. Would like to understand the cause for such errors.
>
> Also, none of the offset reset options are desirable. Choosing "earliest"
> creates a sudden huge lag (we have a retention of 24hours) and choosing
> "latest" leads to data loss (the records produced between the out of range
> error and when offset reset happens on the consumer). So, wondering if it is
> better for the Kafka client to separate out 'auto.offset.reset' config for
> just offset not found. For, out of range error maybe the Kafka client can
> automatically reset the offset to latest if the fetch offset is higher to
> prevent data loss. Also, automatically reset it to earliest if the fetch
> offset is lesser than the start offset.
>
>
> Following are the logs on the consumer side :
>
> {noformat}
> [2020-07-17T08:46:00,322Z] [INFO ] [pipeline-thread-12
> ([prd453-19-event-upsert]-bo-pipeline-12)]
> [o.a.k.c.consumer.internals.Fetcher] [Consumer
> clientId=bo-indexer-group-prd453-19-on-c19-bo-indexer-upsert-blue-5d665bcbb7-dnvkh-pid-1-kafka-message-source-id-544,
> groupId=bo-indexer-group-prd453-19] Fetch offset 476383711 is out of range
> for partition prd453-19-event-upsert-32, resetting offset
> [2020-07-17T08:46:00,330Z] [INFO ] [pipeline-thread-12
> ([prd453-19-event-upsert]-bo-pipeline-12)]
> [o.a.k.c.consumer.internals.Fetcher] [Consumer
> clientId=bo-indexer-group-prd453-19-on-c19-bo-indexer-upsert-blue-5d665bcbb7-dnvkh-pid-1-kafka-message-source-id-544,
> groupId=bo-indexer-group-prd453-19] Resetting offset for partition
> prd453-19-event-upsert-32 to offset 453223789.
> {noformat}
> Broker logs for the partition :
> {noformat}
> [2020-07-17T07:40:12,082Z] [INFO ] [kafka-scheduler-4] [kafka.log.Log]
> [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Found deletable
> segments with base offsets [452091893] due to retention time 86400000ms breach
> [2020-07-17T07:40:12,082Z] [INFO ] [kafka-scheduler-4] [kafka.log.Log]
> [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Scheduling log
> segment [baseOffset 452091893, size 1073741693] for deletion.
> [2020-07-17T07:40:12,083Z] [INFO ] [kafka-scheduler-4] [kafka.log.Log]
> [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Incrementing log
> start offset to 453223789
> [2020-07-17T07:41:12,083Z] [INFO ] [kafka-scheduler-7] [kafka.log.Log]
> [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Deleting segment
> 452091893
> [2020-07-17T07:41:12,114Z] [INFO ] [kafka-scheduler-7]
> [kafka.log.LogSegment] Deleted log
> /data/kafka/prd453-19-event-upsert-32/00000000000452091893.log.deleted.
> [2020-07-17T07:41:12,114Z] [INFO ] [kafka-scheduler-7]
> [kafka.log.LogSegment] Deleted offset index
> /data/kafka/prd453-19-event-upsert-32/00000000000452091893.index.deleted.
> [2020-07-17T07:41:12,114Z] [INFO ] [kafka-scheduler-7]
> [kafka.log.LogSegment] Deleted time index
> /data/kafka/prd453-19-event-upsert-32/00000000000452091893.timeindex.deleted.
> [2020-07-17T07:52:31,836Z] [INFO ] [data-plane-kafka-request-handler-3]
> [kafka.log.ProducerStateManager] [ProducerStateManager
> partition=prd453-19-event-upsert-32] Writing producer snapshot at offset
> 475609786
> [2020-07-17T07:52:31,836Z] [INFO ] [data-plane-kafka-request-handler-3]
> [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka]
> Rolled new log segment at offset 475609786 in 1 ms.{noformat}
>
> {noformat}
> [2020-07-17T09:05:12,075Z] [INFO ] [kafka-scheduler-2] [kafka.log.Log]
> [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Found deletable
> segments with base offsets [453223789] due to retention time 86400000ms breach
> [2020-07-17T09:05:12,075Z] [INFO ] [kafka-scheduler-2] [kafka.log.Log]
> [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Scheduling log
> segment [baseOffset 453223789, size 1073741355] for deletion.
> [2020-07-17T09:05:12,075Z] [INFO ] [kafka-scheduler-2] [kafka.log.Log]
> [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Incrementing log
> start offset to 454388428
> [2020-07-17T09:06:12,075Z] [INFO ] [kafka-scheduler-6] [kafka.log.Log]
> [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Deleting segment
> 453223789
> [2020-07-17T09:06:12,108Z] [INFO ] [kafka-scheduler-6]
> [kafka.log.LogSegment] Deleted log
> /data/kafka/prd453-19-event-upsert-32/00000000000453223789.log.deleted.
> [2020-07-17T09:06:12,108Z] [INFO ] [kafka-scheduler-6]
> [kafka.log.LogSegment] Deleted offset index
> /data/kafka/prd453-19-event-upsert-32/00000000000453223789.index.deleted.
> [2020-07-17T09:06:12,108Z] [INFO ] [kafka-scheduler-6]
> [kafka.log.LogSegment] Deleted time index
> /data/kafka/prd453-19-event-upsert-32/00000000000453223789.timeindex.deleted.
> [2020-07-17T09:06:56,682Z] [INFO ] [data-plane-kafka-request-handler-1]
> [kafka.log.ProducerStateManager] [ProducerStateManager
> partition=prd453-19-event-upsert-32] Writing producer snapshot at offset
> 476681303
> [2020-07-17T09:06:56,683Z] [INFO ] [data-plane-kafka-request-handler-1]
> [kafka.log.Log] [Log partition=prd453-19-event-upsert-32, dir=/data/kafka]
> Rolled new log segment at offset 476681303 in 1 ms.
> {noformat}
>
> Kafka version : 2.2.2
> Number of brokers : 9
> Log retention of the topic : 24 hours
> Few broker configs :
> min.insync.replicas = 1
> default.replication.factor = 3
> unclean.leader.election.enable = false
> auto.leader.rebalance.enable = true
> log.cleaner.delete.retention.ms = 86400000
> log.cleaner.enable = true
> log.retention.hours = 24
> replica.high.watermark.checkpoint.interval.ms = 5000
> replica.lag.time.max.ms = 25000
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)