[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786929#comment-17786929 ] Divij Vaidya commented on KAFKA-15169: -- I backported this to 3.6.1 to ease backporting of a bug fix on top of this commit. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0, 3.6.1 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771713#comment-17771713 ] Arpit Goyal commented on KAFKA-15169: - [~divijvaidya] [~satish.duggana] [~showuon] Added few test cases Please review the PR https://github.com/apache/kafka/pull/14482/files > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770491#comment-17770491 ] Arpit Goyal commented on KAFKA-15169: - > Another one that comes to my mind is if we have a > ".delete" file on the disk (because files with delete > suffix are removed async) and another one with the same name is created (it > would occur second time the entry with same key gets invalidated). It should > be a no-op in that case (instead of throwing an error). [~divijvaidya] This should be similar to what you mentioned long back >Another test: 1. Create an index and verify file exists 2. Invalidate the index from cache which will mark it for cleanup i.e. change suffix to .deleted 3. Try to fetch the same index from cache, cache will not have it and re-fetch it. 4. Invalidate this index. 5. If the cleaner thread is slow, the file in step 2 will still exist 6. When we try to move this new file at step 4 to the same name with deleted, verify that it doesn't fail and instead overwrites the file. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769486#comment-17769486 ] Divij Vaidya commented on KAFKA-15169: -- > And adding Other test cases of what we discussed i will cover it is as part > of this JIRA. WDYT ? Sounds good. Please feel free to add additional tests that you think are not covered. Another one that comes to my mind is if we have a ".delete" file on the disk (because files with delete suffix are removed async) and another one with the same name is created (it would occur second time the entry with same key gets invalidated). It should be a no-op in that case (instead of throwing an error). > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769459#comment-17769459 ] Arpit Goyal commented on KAFKA-15169: - [~divijvaidya] [~satish.duggana] I discovered a bug while writing test cases for inconsistencies between disk storage and RemoteIndexCache. Please have a look. https://issues.apache.org/jira/browse/KAFKA-15511 > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769425#comment-17769425 ] Arpit Goyal commented on KAFKA-15169: - [~divijvaidya] As per the code RemoteIndexCache never retries if file gets corrupted after remote storage fetch. I will create a separate ticket to track this enhancement and which indirectly cover1st test case we discussed. Other test cases of what we discussed i will cover it is as part of this JIRA. WDYT ? > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769119#comment-17769119 ] Divij Vaidya commented on KAFKA-15169: -- First some correction on what I said above. > The case you mention assumes that file sitting on disk may get corrupted but > that is a risk we choose to accept in Kafka, The files sitting on disk do actually get corrupted. We know of such cases when the disk gets full and sometimes leaves the indexes in an inconsistent state. We perform a restart on disk full case and hence, we can assume that during the lifecycle of a broker, files sitting on disk will not get corrupted. But on restart, we should definitely perform a check. Next, for test case 1, it validates recovery if the index fetched from remote was corrupted during network transfer, i.e. 1. we call getIndexEntry 2. It throws corrupt index exception( This exception will be thrown after fetching from remote storage ) at "index.sanityCheck();" (line 361) 3. I haven't looked at how we are handling it, but ideally the system should retry fetch from remote and this time it should succeed (no corruption during transfer), the test should validate that a retry occur and it is successful. Next, for test case 2, the test you mentioned sounds a nice addition. It validates the situation where we have a file on disk but it's not in cache. In such case, we should add cache entry from the file if it is correct else try to fetch from remote. You are right in assuming that this case code never occur (because ideally if a file exist on disk, it should have a corresponding entry in cache already), but this code is a fail safe scenario in case we are accidentally left with an inconsistency between the file on disk and in-memory cache. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769103#comment-17769103 ] Arpit Goyal commented on KAFKA-15169: - [~divijvaidya] Just to confirm what I understood the flow of the first test case 1. we call getIndexEntry 2. It throws corrupt storage exception( This exception will be thrown after fetching from remote storage ) i.e. {code:java} Utils.atomicMoveWithFallback(tmpIndexFile.toPath(), indexFile.toPath(), false); index = readIndex.apply(indexFile); // throws remote Storage exception {code} 3. We call getIndexEntry again 4. This time file already exist on disk , it will log the corrupted error 5. It will refetch from remote storage and passes the sanity check. The test case is basically to test the flow when corrupted file already exist on disk ? > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769086#comment-17769086 ] Arpit Goyal commented on KAFKA-15169: - Thanks [~divijvaidya] I have two questions based on the code walkthrough {code:java} private T loadIndexFile(File file, RemoteLogSegmentMetadata remoteLogSegmentMetadata, Function fetchRemoteIndex, Function readIndex) throws IOException { File indexFile = new File(cacheDir, file.getName()); T index = null; if (Files.exists(indexFile.toPath())) { try { index = readIndex.apply(indexFile); } catch (CorruptRecordException ex) { log.info("Error occurred while loading the stored index file {}", indexFile.getPath(), ex); } } if (index == null) { File tmpIndexFile = new File(indexFile.getParentFile(), indexFile.getName() + RemoteIndexCache.TMP_FILE_SUFFIX); try (InputStream inputStream = fetchRemoteIndex.apply(remoteLogSegmentMetadata)) { Files.copy(inputStream, tmpIndexFile.toPath(), StandardCopyOption.REPLACE_EXISTING); } Utils.atomicMoveWithFallback(tmpIndexFile.toPath(), indexFile.toPath(), false); index = readIndex.apply(indexFile); } return index; } {code} In the RemoteIndexCache (loadIndexFile) function 1. First we check if file exists on the disk and do a sanityCheck. I believe this part of code will never be executed as it occurs only when there is a cache miss operation. 2. As per the first test case it would through Corrupt record exception at the later part of the code where we fetch it from remote segment and doing a sanityCheck {code:java} Utils.atomicMoveWithFallback(tmpIndexFile.toPath(), indexFile.toPath(), false); index = readIndex.apply(indexFile); {code} I was believing the first test case was related to file already exist on the disk and then call getIndexEntry 1. Create a empty/corrupt file on disk 2. Call getIndexEntry 3. It throws record corrupted action 4. In the next line it fetches from remote storage and restore the file. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769072#comment-17769072 ] Divij Vaidya commented on KAFKA-15169: -- Separately, while you are writing tests for this cache, see if you would be interested to fix https://issues.apache.org/jira/browse/KAFKA-15481 as well. We would ideally like to write a test for the scenario mentioned in that ticket which fails prior to the fix and succeeds after it. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769071#comment-17769071 ] Divij Vaidya commented on KAFKA-15169: -- Hey Arpit Asserting the sanity of the index (or any files on disk) is an expensive operation. Hence, we have to strike a balance on when do we assert sanity vs. trust that the file is not corrupted on disk. For logs, we perform CRC checksum while storing data on disk and after that the assumption is that files on disk will not get corrupted, i.e. we consider transfer over the network a possible culprit for corruption but don't consider that a file sitting on disk will get corrupted. Extending the same analogy to this cache, when we fetch the index files from remote store, they may be corrupted, so we perform a sanity check, but once stored on disk, we assume that files will not be corrupted. The case you mention assumes that file sitting on disk may get corrupted but that is a risk we choose to accept in Kafka, given the tradeoff mentioned above. Hence, the case you mentioned is an acceptable risk by design. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769006#comment-17769006 ] Arpit Goyal commented on KAFKA-15169: - [~divijvaidya] [~satish.duggana] In the test case 1 which you mentioned to test for the corrupted scenario , I discovered we are not handling one more use case which can be problematic ? Example 1. we call getIndexEntry 2. It succeeded 3. Somehow the offsetIndex file is corrupted 4. we call getIndexEntry 5. As it is already in the cache , it would always return the corrupted file without validating the sanityCheck ? > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768372#comment-17768372 ] Arpit Goyal commented on KAFKA-15169: - [~divijvaidya] In the first test case Step no 3 Do we support this functionality fetchAndCreateIndex ? As per the codebase there is only getIndexEntry function which calls remote storage only if there is a miss ? > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768352#comment-17768352 ] Arpit Goyal commented on KAFKA-15169: - [~divijvaidya] I have started working on it . Can you help me understand what does "fetchAndCreateIndex' means . There is no such function exists in the remotestoragemanager class. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Arpit Goyal >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767639#comment-17767639 ] Arpit Goyal commented on KAFKA-15169: - [~divijvaidya] [~isding_l] [~satish.duggana] I am picking it up as I see no activity on this from July. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Lan Ding >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766318#comment-17766318 ] Arpit Goyal commented on KAFKA-15169: - [~isding_l] Can i pick this up ? > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Lan Ding >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761548#comment-17761548 ] Satish Duggana commented on KAFKA-15169: Moving it to 3.7.0 as we are near code freeze and it is not a blocker. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Lan Ding >Priority: Major > Labels: KIP-405 > Fix For: 3.7.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747904#comment-17747904 ] Divij Vaidya commented on KAFKA-15169: -- [~isding_l] did you get a chance to work on this ticket yet? > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Assignee: Lan Ding >Priority: Major > Labels: KIP-405 > Fix For: 3.6.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742602#comment-17742602 ] Lan Ding commented on KAFKA-15169: -- Sure, I will add tests for RemoteIndexCache in days. > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Priority: Major > Labels: KIP-405 > Fix For: 3.6.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15169) Add tests for RemoteIndexCache
[ https://issues.apache.org/jira/browse/KAFKA-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742391#comment-17742391 ] Divij Vaidya commented on KAFKA-15169: -- [~isding_l] would one you be interested in picking this up? > Add tests for RemoteIndexCache > -- > > Key: KAFKA-15169 > URL: https://issues.apache.org/jira/browse/KAFKA-15169 > Project: Kafka > Issue Type: Test >Reporter: Satish Duggana >Priority: Major > Labels: KIP-405 > Fix For: 3.6.0 > > > Follow-up from > https://github.com/apache/kafka/pull/13275#discussion_r1257490978 -- This message was sent by Atlassian Jira (v8.20.10#820010)