[ 
https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769071#comment-17769071
 ] 

Divij Vaidya commented on KAFKA-15169:
--------------------------------------

Hey Arpit

Asserting the sanity of the index (or any file on disk) is an expensive 
operation. Hence, we have to strike a balance between when we assert sanity and 
when we trust that the file on disk is not corrupted.

For logs, we verify the CRC checksum when storing data on disk, and after that 
we assume that files on disk will not get corrupted, i.e. we consider transfer 
over the network a possible source of corruption, but not a file sitting on 
disk. Applying the same reasoning to this cache: index files may be corrupted 
when we fetch them from the remote store, so we perform a sanity check at that 
point, but once they are stored on disk, we assume that they will not be 
corrupted.
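
To make the policy concrete, here is a minimal sketch of "check once on fetch, 
trust afterwards". It is not the RemoteIndexCache code: the class and method 
names (FetchTimeSanityCheck, storeAfterCheck, readCached) are hypothetical, and 
a CRC32 comparison merely stands in for whatever sanity check applies to the 
file being cached.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical illustration only; not the RemoteIndexCache implementation.
class FetchTimeSanityCheck {

    // CRC32 is an assumed stand-in for the (expensive) sanity check.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Runs once, when the bytes arrive from the remote store:
    // validate first, and only persist the file if the check passes.
    static Path storeAfterCheck(byte[] fetched, long expectedCrc,
                                Path cacheDir, String fileName) throws IOException {
        if (crc32(fetched) != expectedCrc) {
            throw new IOException("Fetched file failed sanity check: " + fileName);
        }
        return Files.write(cacheDir.resolve(fileName), fetched);
    }

    // Runs on every later access: the file on disk is trusted,
    // so the bytes are returned without re-validation.
    static byte[] readCached(Path cachedFile) throws IOException {
        return Files.readAllBytes(cachedFile);
    }
}
```

The cost of the check is paid only in storeAfterCheck. A file that is modified 
on disk afterwards would be returned by readCached as-is, which is exactly the 
risk being accepted here.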

The case you mention assumes that a file sitting on disk may get corrupted, but 
that is a risk we choose to accept in Kafka, given the tradeoff mentioned 
above. It is an acceptable risk by design.

> Add tests for RemoteIndexCache
> ------------------------------
>
>                 Key: KAFKA-15169
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15169
>             Project: Kafka
>          Issue Type: Test
>            Reporter: Satish Duggana
>            Assignee: Arpit Goyal
>            Priority: Major
>              Labels: KIP-405
>             Fix For: 3.7.0
>
>
> Follow-up from 
> https://github.com/apache/kafka/pull/13275#discussion_r1257490978



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
