[jira] [Commented] (KAFKA-15609) Corrupted index uploaded to remote tier

Divij Vaidya (Jira) Thu, 19 Oct 2023 03:51:04 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17777186#comment-17777186
 ]


Divij Vaidya commented on KAFKA-15609:
--------------------------------------

Hey Luke,
The corruption we observed occurred during disk-full or process-crash 
scenarios. The ticket and corresponding fix at 
https://issues.apache.org/jira/browse/KAFKA-15401 by my co-worker should be 
sufficient to address that problem. We don't have complete data, but from what 
we have, the corruption consists of data being truncated or missing from the 
index, which leads to failures during sanityCheck().

Also, we have another ticket, https://issues.apache.org/jira/browse/KAFKA-15612, 
to decide whether we need to flush the index or not. The conversation above 
has been inconclusive because we have conflicting opinions on whether an mmap 
needs a flush (we all agree that user data files don't need a flush, since 
the OS page cache provides a read-after-write guarantee). We can close this one 
and continue the discussion in the ticket I linked here.

> Corrupted index uploaded to remote tier
> ---------------------------------------
>
>                 Key: KAFKA-15609
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15609
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.6.0
>            Reporter: Divij Vaidya
>            Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes in the 
> remote tier. One such situation is covered at 
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another 
> possible case of corruption.
> Potential cause of index corruption:
> We want to ensure that the file we pass to the RSM plugin contains all the 
> data present in the MappedByteBuffer, i.e. we should have flushed the 
> MappedByteBuffer to the file using force(). In Kafka, when we close a 
> segment, indexes are flushed asynchronously [1]. Hence, when we pass the file 
> to the RSM, it might not yet contain the flushed data, and we may end up 
> uploading indexes which haven't been flushed yet. 
> Ideally, the contract should enforce that we force-flush the content of the 
> MappedByteBuffer before we hand the file to the RSM. This will ensure that 
> indexes are not corrupted or incomplete.
> [1] 
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
>  
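
The force-before-handoff idea in the description above can be sketched in Java 
roughly as follows. This is not Kafka's code: the class IndexFlushSketch, the 
method writeEntriesAndForce, and the 8-byte entry layout are hypothetical names 
and assumptions made for illustration. Only FileChannel.map() and 
MappedByteBuffer.force() are standard java.nio calls. Whether such a force() is 
actually required is the open question tracked in KAFKA-15612.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IndexFlushSketch {

    // Assumed layout for this sketch: each entry is two 4-byte ints.
    static final int ENTRY_SIZE = 8;

    // Hypothetical helper: writes index entries through a memory map and
    // forces them to the underlying file before returning, so that a caller
    // which then hands the file to another component (for example an upload
    // plugin) does so only after the explicit flush has completed.
    static void writeEntriesAndForce(Path indexFile, int[] relativeOffsets, int[] positions)
            throws IOException {
        long size = (long) relativeOffsets.length * ENTRY_SIZE;
        try (FileChannel channel = FileChannel.open(indexFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping in READ_WRITE mode grows the file to `size` if needed.
            MappedByteBuffer mmap = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (int i = 0; i < relativeOffsets.length; i++) {
                mmap.putInt(relativeOffsets[i]);
                mmap.putInt(positions[i]);
            }
            // Explicitly write the mapped region back to the storage device
            // instead of relying on an asynchronous flush happening later.
            mmap.force();
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexFile = Files.createTempFile("sketch", ".index");
        writeEntriesAndForce(indexFile, new int[] {0, 50}, new int[] {0, 4096});
        System.out.println("index file size in bytes: " + Files.size(indexFile));
    }
}
```

The point of the sketch is only the ordering: force() is called before the 
method returns, so nothing downstream sees the file until the flush is done.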



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
