[ https://issues.apache.org/jira/browse/KAFKA-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17196402#comment-17196402 ]

Andrey Polyakov commented on KAFKA-10313:
-----------------------------------------

Possibly a duplicate of KAFKA-9543?

> Out of range offset errors leading to offset reset
> --------------------------------------------------
>
>                 Key: KAFKA-10313
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10313
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 2.2.2
>            Reporter: Varsha Abhinandan
>            Priority: Major
>
> Hi,
>
> We have occasionally been noticing offset resets on the Kafka consumer 
> caused by offset-out-of-range errors. However, I don't see any errors in 
> the broker logs: no logs related to leader election, replica lag, or 
> Kafka broker pod restarts (only INFO logging was enabled in the prod 
> environment).
>
> The logs indicate that the out-of-range error occurred because the fetch 
> offset was larger than the offset range on the broker. We noticed this 
> happening multiple times on different consumers and stream apps in the 
> prod environment, so it doesn't seem like an application bug and looks 
> more like a bug in the KafkaConsumer. We would like to understand the 
> cause of such errors.
>
> Also, none of the offset reset options are desirable. Choosing "earliest" 
> suddenly creates a huge lag (we have a retention of 24 hours), and 
> choosing "latest" leads to data loss (the records produced between the 
> out-of-range error and the moment the offset reset happens on the 
> consumer). So, we wonder whether it would be better for the Kafka client 
> to use the 'auto.offset.reset' config only for the offset-not-found case. 
> For an out-of-range error, the Kafka client could automatically reset the 
> offset to latest if the fetch offset is higher than the end offset, to 
> prevent data loss, and reset it to earliest if the fetch offset is lower 
> than the start offset.
>
> The following are the logs on the consumer side:
> {noformat}
> [2020-07-17T08:46:00,322Z] [INFO ] [pipeline-thread-12 ([prd453-19-event-upsert]-bo-pipeline-12)] [o.a.k.c.consumer.internals.Fetcher] [Consumer clientId=bo-indexer-group-prd453-19-on-c19-bo-indexer-upsert-blue-5d665bcbb7-dnvkh-pid-1-kafka-message-source-id-544, groupId=bo-indexer-group-prd453-19] Fetch offset 476383711 is out of range for partition prd453-19-event-upsert-32, resetting offset
> [2020-07-17T08:46:00,330Z] [INFO ] [pipeline-thread-12 ([prd453-19-event-upsert]-bo-pipeline-12)] [o.a.k.c.consumer.internals.Fetcher] [Consumer clientId=bo-indexer-group-prd453-19-on-c19-bo-indexer-upsert-blue-5d665bcbb7-dnvkh-pid-1-kafka-message-source-id-544, groupId=bo-indexer-group-prd453-19] Resetting offset for partition prd453-19-event-upsert-32 to offset 453223789.
> {noformat}
> Broker logs for the partition:
> {noformat}
> [2020-07-17T07:40:12,082Z]  [INFO ]  [kafka-scheduler-4]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Found deletable segments with base offsets [452091893] due to retention time 86400000ms breach
> [2020-07-17T07:40:12,082Z]  [INFO ]  [kafka-scheduler-4]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Scheduling log segment [baseOffset 452091893, size 1073741693] for deletion.
> [2020-07-17T07:40:12,083Z]  [INFO ]  [kafka-scheduler-4]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Incrementing log start offset to 453223789
> [2020-07-17T07:41:12,083Z]  [INFO ]  [kafka-scheduler-7]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Deleting segment 452091893
> [2020-07-17T07:41:12,114Z]  [INFO ]  [kafka-scheduler-7]  [kafka.log.LogSegment]  Deleted log /data/kafka/prd453-19-event-upsert-32/00000000000452091893.log.deleted.
> [2020-07-17T07:41:12,114Z]  [INFO ]  [kafka-scheduler-7]  [kafka.log.LogSegment]  Deleted offset index /data/kafka/prd453-19-event-upsert-32/00000000000452091893.index.deleted.
> [2020-07-17T07:41:12,114Z]  [INFO ]  [kafka-scheduler-7]  [kafka.log.LogSegment]  Deleted time index /data/kafka/prd453-19-event-upsert-32/00000000000452091893.timeindex.deleted.
> [2020-07-17T07:52:31,836Z]  [INFO ]  [data-plane-kafka-request-handler-3]  [kafka.log.ProducerStateManager]  [ProducerStateManager partition=prd453-19-event-upsert-32] Writing producer snapshot at offset 475609786
> [2020-07-17T07:52:31,836Z]  [INFO ]  [data-plane-kafka-request-handler-3]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Rolled new log segment at offset 475609786 in 1 ms.
> {noformat}
>  
> {noformat}
> [2020-07-17T09:05:12,075Z]  [INFO ]  [kafka-scheduler-2]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Found deletable segments with base offsets [453223789] due to retention time 86400000ms breach
> [2020-07-17T09:05:12,075Z]  [INFO ]  [kafka-scheduler-2]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Scheduling log segment [baseOffset 453223789, size 1073741355] for deletion.
> [2020-07-17T09:05:12,075Z]  [INFO ]  [kafka-scheduler-2]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Incrementing log start offset to 454388428
> [2020-07-17T09:06:12,075Z]  [INFO ]  [kafka-scheduler-6]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Deleting segment 453223789
> [2020-07-17T09:06:12,108Z]  [INFO ]  [kafka-scheduler-6]  [kafka.log.LogSegment]  Deleted log /data/kafka/prd453-19-event-upsert-32/00000000000453223789.log.deleted.
> [2020-07-17T09:06:12,108Z]  [INFO ]  [kafka-scheduler-6]  [kafka.log.LogSegment]  Deleted offset index /data/kafka/prd453-19-event-upsert-32/00000000000453223789.index.deleted.
> [2020-07-17T09:06:12,108Z]  [INFO ]  [kafka-scheduler-6]  [kafka.log.LogSegment]  Deleted time index /data/kafka/prd453-19-event-upsert-32/00000000000453223789.timeindex.deleted.
> [2020-07-17T09:06:56,682Z]  [INFO ]  [data-plane-kafka-request-handler-1]  [kafka.log.ProducerStateManager]  [ProducerStateManager partition=prd453-19-event-upsert-32] Writing producer snapshot at offset 476681303
> [2020-07-17T09:06:56,683Z]  [INFO ]  [data-plane-kafka-request-handler-1]  [kafka.log.Log]  [Log partition=prd453-19-event-upsert-32, dir=/data/kafka] Rolled new log segment at offset 476681303 in 1 ms.
> {noformat}
>  
> Kafka version: 2.2.2
> Number of brokers: 9
> Log retention of the topic: 24 hours
> A few broker configs:
> min.insync.replicas = 1
> default.replication.factor = 3
> unclean.leader.election.enable = false
> auto.leader.rebalance.enable = true
> log.cleaner.delete.retention.ms = 86400000
> log.cleaner.enable = true
> log.retention.hours = 24
> replica.high.watermark.checkpoint.interval.ms = 5000
> replica.lag.time.max.ms = 25000
>  
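The reset policy proposed in the quoted report (reset toward whichever end of the valid range the fetch offset fell out of, instead of one fixed `auto.offset.reset` target) can be sketched as plain logic, independent of the Kafka client. This is a hypothetical helper illustrating the suggestion, not existing KafkaConsumer behavior; the start offset below is taken from the logs above, while the end offset is only illustrative.

```java
/**
 * Sketch of the reset policy suggested in the report: pick the reset
 * target based on which side of the valid offset range the fetch offset
 * fell out of. Hypothetical helper, not part of the Kafka client API.
 */
public class OffsetResetPolicy {

    /** Returns the offset to resume from for a possibly out-of-range fetch offset. */
    static long resolveOutOfRange(long fetchOffset, long logStartOffset, long logEndOffset) {
        if (fetchOffset > logEndOffset) {
            // Fetch offset beyond the end of the log: resume from the latest
            // valid offset rather than jumping back to earliest, which would
            // create a sudden huge lag under a 24-hour retention.
            return logEndOffset;
        }
        if (fetchOffset < logStartOffset) {
            // Fetch offset already deleted by retention: resume from the
            // earliest still-available offset rather than latest, which
            // would silently skip readable records.
            return logStartOffset;
        }
        return fetchOffset; // in range, nothing to reset
    }

    public static void main(String[] args) {
        long logStart = 453223789L; // log start offset from the broker logs above
        long logEnd = 476681303L;   // illustrative end offset (stand-in value)
        System.out.println(resolveOutOfRange(400000000L, logStart, logEnd)); // 453223789 (earliest)
        System.out.println(resolveOutOfRange(500000000L, logStart, logEnd)); // 476681303 (latest)
        System.out.println(resolveOutOfRange(460000000L, logStart, logEnd)); // 460000000 (in range)
    }
}
```

An application can approximate this today by setting `auto.offset.reset=none` and handling the resulting out-of-range exception itself, seeking to the beginning or end offsets as appropriate.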



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
