[ 
https://issues.apache.org/jira/browse/KAFKA-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801663#comment-17801663
 ] 

Satish Duggana commented on KAFKA-16073:
----------------------------------------

Thanks [~hzh0425@apache] for filing the JIRA with a detailed description. 

Let me summarize the scenario from the JIRA description with an example. 
Let me know if I am missing anything here. 
Assume that each segment holds exactly one offset in this example.

log start offset              0
log end offset                10
local log start offset        4
fetch offset                  6
new local log start offset    7

Deletion based on the retention configs starts and eventually updates the 
local log start offset to 7.

There is a race condition here: LocalLog first removes the segments for 
offsets 4, 5, and 6 from its segments list and only afterwards updates the 
local-log-start-offset. A fetch for offset 6 that is served concurrently may 
fail with OffsetOutOfRangeException if the in-memory segments have already 
been removed from LocalLog but the local-log-start-offset has not yet been 
updated to 7. When the fetch executes this 
[code|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1866]
, the condition fetch offset (6) < local-log-start-offset does not hold, 
because it still sees the old local-log-start-offset (4).
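
To make the interleaving concrete, here is a minimal single-threaded sketch 
of the three states a fetch for offset 6 can observe. It is not Kafka code: 
the class, the Outcome enum, and all method names are hypothetical, and the 
assumption that a fetch below the local-log-start-offset is redirected to a 
remote read is my simplified reading of the check linked above.

```java
import java.util.TreeMap;

// Hypothetical, simplified model of the scenario; these are not Kafka's classes.
class StaleLocalLogStartOffsetSketch {
    enum Outcome { LOCAL_READ, REMOTE_READ, OFFSET_OUT_OF_RANGE }

    // One segment per offset, keyed by base offset, as assumed in the example.
    private final TreeMap<Long, String> localSegments = new TreeMap<>();
    private long localLogStartOffset;

    StaleLocalLogStartOffsetSketch(long localLogStartOffset, long logEndOffset) {
        for (long offset = localLogStartOffset; offset < logEndOffset; offset++) {
            localSegments.put(offset, "segment-" + offset);
        }
        this.localLogStartOffset = localLogStartOffset;
    }

    // Deletion step 1: drop the in-memory segments below the new start offset.
    void removeSegmentsBelow(long newLocalLogStartOffset) {
        localSegments.headMap(newLocalLogStartOffset).clear();
    }

    // Deletion step 2: publish the new local-log-start-offset.
    void updateLocalLogStartOffset(long newLocalLogStartOffset) {
        this.localLogStartOffset = newLocalLogStartOffset;
    }

    // Simplified fetch path: try the local segments first; if the offset is
    // not there, treat it as a remote read only when it lies below the
    // currently visible local-log-start-offset.
    Outcome fetch(long fetchOffset) {
        if (localSegments.containsKey(fetchOffset)) {
            return Outcome.LOCAL_READ;
        }
        return fetchOffset < localLogStartOffset
                ? Outcome.REMOTE_READ
                : Outcome.OFFSET_OUT_OF_RANGE;
    }

    public static void main(String[] args) {
        StaleLocalLogStartOffsetSketch log = new StaleLocalLogStartOffsetSketch(4L, 10L);
        System.out.println(log.fetch(6L)); // before deletion starts
        log.removeSegmentsBelow(7L);
        System.out.println(log.fetch(6L)); // segments gone, start offset still 4
        log.updateLocalLogStartOffset(7L);
        System.out.println(log.fetch(6L)); // start offset is now 7
    }
}
```

Running main prints LOCAL_READ, OFFSET_OUT_OF_RANGE, REMOTE_READ. The middle 
line is the window described above: the segment for offset 6 is gone, but 
6 < 4 is false, so the fetch is treated as out of range.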


> Kafka Tiered Storage Bug: Consumer Fetch Error Due to Delayed 
> localLogStartOffset Update During Segment Deletion
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-16073
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16073
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, Tiered-Storage
>    Affects Versions: 3.6.1
>            Reporter: hzh0425
>            Assignee: hzh0425
>            Priority: Major
>              Labels: KIP-405, kip-405, tiered-storage
>             Fix For: 3.6.1
>
>
> The identified bug in Apache Kafka's tiered storage feature involves a 
> delayed update of {{localLogStartOffset}} in the 
> {{UnifiedLog.deleteSegments}} method, impacting consumer fetch operations. 
> When segments are deleted from the log's memory state, the 
> {{localLogStartOffset}} isn't promptly updated. Concurrently, 
> {{ReplicaManager.handleOffsetOutOfRangeError}} checks if a consumer's fetch 
> offset is less than the {{localLogStartOffset}}. If it's greater, Kafka 
> erroneously sends an {{OffsetOutOfRangeException}} to the consumer.
> In a specific concurrent scenario, imagine sequential offsets: {{offset1 < 
> offset2 < offset3}}. A client requests data at {{offset2}}. While a 
> background deletion process removes segments from memory, it hasn't yet 
> updated the {{localLogStartOffset}} from {{offset1}} to {{offset3}}. 
> Consequently, when the fetch offset ({{offset2}}) is evaluated against 
> the stale {{offset1}} in {{ReplicaManager.handleOffsetOutOfRangeError}}, 
> it incorrectly triggers an {{OffsetOutOfRangeException}}. This issue 
> arises from the out-of-sync update of {{localLogStartOffset}}, leading to 
> incorrect handling of consumer fetch requests and potential data access 
> errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
