[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047711#comment-17047711 ]

Rafał Boniecki edited comment on KAFKA-9543 at 2/28/20 3:08 PM:
----------------------------------------------------------------

I cannot reproduce it in my development environment. A couple of facts to add to 
what Brian wrote:
 * This indeed does not happen on every segment rollover, but when it does happen 
it is always on a segment rollover.
 * We have no compacted topics in our production cluster, so topic type doesn't 
matter.
 * No topic in our production environment starts at offset 0, so this doesn't 
matter either.
 * The topic where we definitely saw this happen has about 5 MB/s of traffic (so 
not that much traffic).
 * The "Fetch offset ... is out of range for partition" message is always about an 
offset "from the future": the broker does not have this offset in its log (at 
least according to the data we gather from JMX). This suggests that offsets may be 
cached incorrectly, or that the cache update has a race condition. Also note that 
before the reset the client had 0 lag (you can see this in my attached 
screenshot), so this is probably crucial to reproducing the bug: you have to be 
reading the top of the log all or most of the time to hit it.
 * We tested this in our development environment, where we generated about 5 MB/s 
of traffic (using kafka-producer-perf-test.sh) and read it back, as it was being 
written, with a consumer configured identically to the one in production (see the 
sketch after this list). The test ran for 3 days non-stop; we looked for offset 
resets and there were none.
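
For reference, below is a minimal sketch of the consumer side of that test. It is 
only an illustration, not our exact Logstash configuration: the bootstrap address, 
group id, topic name and the auto.offset.reset value are assumptions.
{code:java}
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class HeadOfLogReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption: broker address
        props.put("group.id", "repro-group");                // assumption: consumer group name
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // The reset policy decides where the consumer lands after an
        // "out of range" fetch; set it to whatever production uses.
        props.put("auto.offset.reset", "latest");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("stats")); // assumption: topic name
            while (true) {
                // Poll continuously so the consumer stays at the head of the log
                // while kafka-producer-perf-test.sh writes ~5 MB/s into the topic.
                consumer.poll(Duration.ofMillis(100));
            }
        }
    }
}
{code}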



> Consumer offset reset after new segment rolling
> -----------------------------------------------
>
>                 Key: KAFKA-9543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9543
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Rafał Boniecki
>            Priority: Major
>         Attachments: Untitled.png
>
>
> After upgrading from Kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer 
> offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 
> [2020-02-12T11:12:58,402][INFO 
> ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer 
> clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of 
> range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO  
> [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, 
> dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code}
> All resets are perfectly correlated with new segments being rolled at the broker: 
> the segment is rolled first, then, a couple of milliseconds later, the reset 
> occurs on the consumer. Attached is a Grafana graph with consumer lag per 
> partition. All sudden spikes in lag are offset resets due to this bug.
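
To double-check the "offset from the future" observation against a live broker, one 
can compare the consumer's position for the affected partition with the 
broker-reported end offset. The sketch below is only an illustration: the bootstrap 
address is an assumption, while the group id and partition come from the log lines 
above.
{code:java}
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class PositionVsEndOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: broker address
        props.put("group.id", "logstash");                // group id from the consumer log above
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition tp = new TopicPartition("stats", 5); // partition from the log lines above
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            long position = consumer.position(tp); // next offset the consumer would fetch
            Map<TopicPartition, Long> end = consumer.endOffsets(Collections.singletonList(tp));
            // If position is greater than the end offset, the consumer is asking for
            // an offset the broker does not have yet (the "offset from the future" case).
            System.out.printf("position=%d endOffset=%d%n", position, end.get(tp));
        }
    }
}
{code}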



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
