[ https://issues.apache.org/jira/browse/KAFKA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782229#comment-17782229 ]
Alexandre Dupriez edited comment on KAFKA-15609 at 11/2/23 5:19 PM:
--------------------------------------------------------------------

The nature of a memory mapping, private or shared, has visibility implications between processes, but within a single process read-after-write consistency should always be guaranteed. "Flushing" a memory-mapped file to the block device can be initiated with the {{msync}} syscall, but that operation is not needed for the visibility guarantees questioned in this ticket. A succinct description of memory mapping can be found in {_}Understanding the Linux Kernel, Third Edition{_} (O'Reilly), pages 657-668.

> Corrupted index uploaded to remote tier
> ---------------------------------------
>
>                 Key: KAFKA-15609
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15609
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.6.0
>            Reporter: Divij Vaidya
>            Priority: Minor
>
> While testing Tiered Storage, we have observed corrupt indexes being present
> in the remote tier. One such situation is covered in
> https://issues.apache.org/jira/browse/KAFKA-15401. This Jira presents another
> possible case of corruption.
>
> Potential cause of index corruption:
> We want to ensure that the file we pass to the RSM plugin contains all the
> data present in the MappedByteBuffer, i.e. we should have flushed the
> MappedByteBuffer to the file using force(). In Kafka, when we close a
> segment, indexes are flushed asynchronously [1]. It is therefore possible
> that, when we pass the file to the RSM, it does not yet contain the flushed
> data, and we may end up uploading indexes which haven't been flushed yet.
> Ideally, the contract should require that we force-flush the content of the
> MappedByteBuffer before we give the file to the RSM. This would ensure that
> uploaded indexes are not corrupted or incomplete.
>
> [1]
> [https://github.com/apache/kafka/blob/4150595b0a2e0f45f2827cebc60bcb6f6558745d/core/src/main/scala/kafka/log/UnifiedLog.scala#L1613]
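The contract proposed in the ticket (force the mapped buffer, then hand the file to the RSM) can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not Kafka's code: the class `FlushBeforeUpload` and the methods `writeIndexAndForce` and `readBack` are hypothetical, and the "index" is just a run of 8-byte values rather than Kafka's real index format. In line with the comment above, `force()` here is about getting dirty pages onto the storage device; on Linux, a read of the same file from within the same process already sees the mapped writes through the shared page cache.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class FlushBeforeUpload {

    // Hypothetical helper: writes `entries` as 8-byte values into a memory-mapped
    // file, forces the mapping to storage, and returns the path that would then
    // be handed to the (hypothetical) upload step.
    public static Path writeIndexAndForce(Path file, long[] entries) throws IOException {
        int size = entries.length * Long.BYTES;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // map() grows the file if the region extends past its current end.
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (long e : entries) {
                mmap.putLong(e);
            }
            // Write dirty pages of the mapping to the device (on Linux, typically
            // via msync(2)). Needed for durability, not for same-process visibility.
            mmap.force();
        }
        return file;
    }

    // Reads the file back with ordinary (non-mapped) I/O, as an uploader would.
    public static long[] readBack(Path file) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(file));
        long[] out = new long[buf.remaining() / Long.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getLong();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("index", ".idx");
        tmp.toFile().deleteOnExit();
        Path ready = writeIndexAndForce(tmp, new long[] {0L, 4096L, 8192L});
        System.out.println(Arrays.toString(readBack(ready))); // prints [0, 4096, 8192]
    }
}
```

Placing `force()` before returning the path makes "flushed" a precondition of the upload step itself, instead of relying on the asynchronous flush scheduled when the segment is closed.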