[ https://issues.apache.org/jira/browse/KAFKA-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084605#comment-17084605 ]

Jason Gustafson commented on KAFKA-9543:
----------------------------------------

In KAFKA-9838, I fixed what I thought was a minor race condition with 
truncation. Since there is no truncation involved with a segment roll, I 
thought it was unrelated. After thinking a little more about the edge case it 
uncovered, it seems possible that it explains this issue as well. When we roll a 
segment, if the log end offset is equal to the high watermark, then both will 
be updated to point to the new segment. The problem is that the leader also 
keeps a cache of pointers to offsets which may refer to the older segment. It 
is possible that a leader update following the roll would leave the high 
watermark pointing to the older segment. This was not a problem prior to 2.4 
because we always looked up the position of the high watermark on every fetch.
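
To make the cached pointer concrete, here is a minimal sketch in plain Java. The 
names (OffsetMetadata, HwMetadataSketch, resolve) are made up for illustration and 
are not the broker's actual classes; the point is the difference between resolving 
the high watermark's segment at fetch time, as pre-2.4 brokers effectively did, and 
trusting cached metadata that can keep referencing a pre-roll segment:

{code:java}
// A toy model only; OffsetMetadata, segments, and resolve() are made-up names,
// not the broker's actual classes.
import java.util.TreeMap;

public class HwMetadataSketch {
    // An offset plus the base offset of the segment it was resolved against.
    record OffsetMetadata(long offset, long segmentBaseOffset) {}

    // Segments keyed by their base offset.
    static final TreeMap<Long, String> segments = new TreeMap<>();

    // "Pre-2.4 style": find the segment for an offset at fetch time.
    static OffsetMetadata resolve(long offset) {
        return new OffsetMetadata(offset, segments.floorKey(offset));
    }

    public static void main(String[] args) {
        segments.put(0L, "segment-0");
        // Pointer cached while offset 10 still belonged to the active segment 0.
        OffsetMetadata cachedHw = new OffsetMetadata(10L, 0L);

        segments.put(10L, "segment-10");  // the log rolls at offset 10

        System.out.println("cached:   " + cachedHw);      // still references segment base 0
        System.out.println("resolved: " + resolve(10L));  // resolves to segment base 10
    }
}
{code}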

As an example, let's say we have a single segment which begins at offset 0, and 
suppose that log end offset = 10 and high watermark = 5. The sequence of events is 
like this (a toy code sketch of the same sequence follows the states):

t1: Initial state
log end offset = (offset = 10, segment = 0)
high watermark = (offset = 5, segment = 0)

t2: Log is rolled
log end offset = (offset = 10, segment = 10)
high watermark = (offset = 5, segment = 0)

t3: Leader calls `maybeIncrementHighWatermark` to update high watermark to 10, 
but with a reference to the old segment:
log end offset = (offset = 10, segment = 10)
high watermark = (offset = 10, segment = 0)
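
The toy sketch below (made-up names again, not the real kafka.log.Log logic) 
replays t1 through t3: the simplified high watermark update only validates the 
offset value, so metadata carrying a pre-roll segment reference slips through 
unchanged:

{code:java}
// Toy replay of t1..t3 above; RollSketch and maybeIncrementHighWatermark here are
// simplified stand-ins, not the real broker logic.
public class RollSketch {
    record OffsetMetadata(long offset, long segmentBaseOffset) {}

    static OffsetMetadata logEndOffset;
    static OffsetMetadata highWatermark;

    // Simplified update: only the offset value is validated, the segment
    // reference inside the metadata is taken as-is.
    static void maybeIncrementHighWatermark(OffsetMetadata newHw) {
        if (newHw.offset() > highWatermark.offset() && newHw.offset() <= logEndOffset.offset())
            highWatermark = newHw;
    }

    public static void main(String[] args) {
        // t1: one segment starting at 0, log end offset = 10, high watermark = 5
        logEndOffset  = new OffsetMetadata(10L, 0L);
        highWatermark = new OffsetMetadata(5L, 0L);

        // t2: the log is rolled; the log end offset now points at segment 10
        logEndOffset = new OffsetMetadata(10L, 10L);

        // t3: the leader advances the high watermark using a cached pre-roll pointer
        maybeIncrementHighWatermark(new OffsetMetadata(10L, 0L));

        System.out.println("leo = " + logEndOffset);   // offset 10, segment base 10
        System.out.println("hw  = " + highWatermark);  // offset 10, segment base 0 (the old segment)
    }
}
{code}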

I verified with a simple test case that the log can get into this state. Prior 
to the fix in KAFKA-9838, a fetch from the high watermark with the log in this 
state would result in a read of all the data from segment 10. So that opens the 
door to a race condition like the following (a sketch of these steps comes after 
the list):

1. Say one record is appended and we update the log end offset:
log end offset = (offset = 11, segment = 10)
high watermark = (offset = 10, segment = 0)

2. Now say that one more record is appended to segment 10 at offset 11, but the 
log end offset is not immediately updated.
3. A fetch at offset 10 returns the two new records at offsets 10 and 11 because 
of the bug above.
4. A second fetch at offset 12 now returns an out-of-range error because the log 
end offset is still 11.
5. Finally, the log end offset is updated to 12.
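
Here is a toy reproduction of those five steps. The names and the "unbounded read" 
branch are stand-ins for the behavior described above (the stale high watermark 
metadata failing to bound the read in segment 10), not the actual 2.4 read path:

{code:java}
// Toy reproduction of the race; FetchRaceSketch and its fetch() are simplified
// stand-ins for the 2.4 read path described above, not actual broker code.
import java.util.ArrayList;
import java.util.List;

public class FetchRaceSketch {
    record OffsetMetadata(long offset, long segmentBaseOffset) {}

    static final long SEGMENT_10_BASE = 10L;
    static final List<Long> segment10 = new ArrayList<>(List.of(10L)); // record at offset 10
    static long logEndOffset = 11L;                                    // step 1
    static OffsetMetadata highWatermark = new OffsetMetadata(10L, 0L); // stale segment reference

    static List<Long> fetch(long offset) {
        if (offset > logEndOffset)
            throw new IllegalStateException(
                "OFFSET_OUT_OF_RANGE: fetch at " + offset + ", log end offset is " + logEndOffset);
        if (highWatermark.segmentBaseOffset() < SEGMENT_10_BASE)
            return List.copyOf(segment10);  // hw metadata points at an older segment: read not bounded
        return segment10.stream()
            .filter(o -> o >= offset && o < highWatermark.offset())
            .toList();                      // intended behavior: nothing at or above the hw is returned
    }

    public static void main(String[] args) {
        segment10.add(11L);                 // step 2: record 11 appended, log end offset not yet bumped
        System.out.println(fetch(10L));     // step 3: [10, 11] instead of nothing
        try {
            System.out.println(fetch(12L)); // step 4
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // out of range while log end offset is still 11
        }
        logEndOffset = 12L;                 // step 5: too late, the consumer has already reset
    }
}
{code}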

This is a devilishly tricky scenario to hit in a test case. I was able to do it, 
but only by introducing an artificial delay into the append logic. Still, I think 
this is probably on the right track, since it explains how it is possible to hit 
an out-of-range error with only a segment roll and also why versions older than 
2.4 are not affected.

Unfortunately, the patch for KAFKA-9838 did not make it into 2.5.0, which was just 
released today. However, I've merged it into the 
[2.4|https://github.com/apache/kafka/commit/e1f18df7f6615109b6cc77b66d9be37b09256a0a]
 and 
[2.5|https://github.com/apache/kafka/commit/3b17fecc9b6af5f896ce2df4b8b6ce23cfd40f17]
 branches. If anyone running into this problem is willing to try one of these 
patches and report back, I'd appreciate it. I will think a little more about 
whether the patch fixes all problems that can be caused when the high watermark 
points to the wrong segment (I think it does, but I'm not 100% sure).

> Consumer offset reset after new segment rolling
> -----------------------------------------------
>
>                 Key: KAFKA-9543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9543
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: RafaƂ Boniecki
>            Priority: Major
>         Attachments: Untitled.png, image-2020-04-06-17-10-32-636.png
>
>
> After upgrading from Kafka 2.1.1 to 2.4.0, I'm experiencing unexpected consumer 
> offset resets.
> Consumer:
> {code:java}
> 2020-02-12T11:12:58.402+01:00 hostname 4a2a39a35a02 
> [2020-02-12T11:12:58,402][INFO 
> ][org.apache.kafka.clients.consumer.internals.Fetcher] [Consumer 
> clientId=logstash-1, groupId=logstash] Fetch offset 1632750575 is out of 
> range for partition stats-5, resetting offset
> {code}
> Broker:
> {code:java}
> 2020-02-12 11:12:58:400 CET INFO  
> [data-plane-kafka-request-handler-1][kafka.log.Log] [Log partition=stats-5, 
> dir=/kafka4/data] Rolled new log segment at offset 1632750565 in 2 ms.{code}
> All resets are perfectly correlated with rolling new segments on the broker: the 
> segment is rolled first, then, a couple of ms later, the reset occurs on the 
> consumer. Attached is a Grafana graph with consumer lag per partition. All sudden 
> spikes in lag are offset resets due to this bug.



