[ https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782492#comment-17782492 ]
Arpit Goyal commented on KAFKA-15388:
-------------------------------------

[~divijvaidya] [~christo_lolov] I need more help understanding the use case on this line:
https://github.com/satishd/kafka/blob/46c96f4868d51c84b43003bbb80bc07297016912/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1339

Let's say we are fetching the data for offset k:
1. We find the leader epoch for the requested offset k.
2. Using the leader epoch and offset k, we look up the corresponding RemoteLogSegmentMetadata.
3. Using the RemoteLogSegmentMetadata and the offset index, we find the position of the highest possible entry less than the requested offset k.
4. Using the start position from step 3, we fetch the remote segment input stream from RemoteStorageManager.

At this point we try to find the record batch that contains the requested offset. But, IMO, the same use case arises here: if it is a historically compacted topic and the record batch's last offset is a compacted one, shouldn't we return the ideal batch instead of an empty result?

> Handle topics that were having compaction as retention earlier are changed to
> delete only retention policy and onboarded to tiered storage.
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15388
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15388
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Satish Duggana
>            Assignee: Arpit Goyal
>            Priority: Blocker
>             Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
>
> (1) Does not have a problem with compacted topics.
> Compacted segments are uploaded and their metadata claims they contain
> offsets from the baseOffset of the segment until the next segment's
> baseOffset. There are no gaps in offsets.
>
> (2) Does not have a problem if a customer is querying offsets which do not
> exist within a segment, but there are offsets after the queried offset within
> the same segment. *However, it does have a problem when the next available
> offset is in a subsequent segment.*
>
> (3) For data deleted via DeleteRecords there is no problem. For data deleted
> via retention there is no problem.
>
> *I believe the proper solution to (2) is to make tiered storage continue
> looking for the next greater offset in subsequent segments.*
>
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
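
The solution proposed for path (2) in the description, continuing the search for the next greater offset in subsequent segments, could be sketched roughly as below. This is only a minimal in-memory illustration under stated assumptions: `Batch`, `Segment`, and `NextBatchLookup.findFirstBatch` are hypothetical, simplified stand-ins and are not Kafka's `RecordBatch`, `RemoteLogSegmentMetadata`, or the actual `RemoteLogManager` code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a record batch: only its offset range is modelled.
final class Batch {
    final long baseOffset;
    final long lastOffset;

    Batch(long baseOffset, long lastOffset) {
        this.baseOffset = baseOffset;
        this.lastOffset = lastOffset;
    }
}

// Hypothetical stand-in for a remote segment. After compaction the surviving
// batches may not cover every offset the segment metadata claims to contain.
final class Segment {
    final long baseOffset;
    final List<Batch> batches; // surviving batches, in ascending offset order

    Segment(long baseOffset, List<Batch> batches) {
        this.baseOffset = baseOffset;
        this.batches = batches;
    }
}

final class NextBatchLookup {
    private NextBatchLookup() {}

    /**
     * Returns the first batch whose lastOffset is >= targetOffset, starting in
     * segments.get(startIndex) and, if that segment has no such batch,
     * continuing into the following segments instead of giving up.
     */
    static Optional<Batch> findFirstBatch(List<Segment> segments, int startIndex, long targetOffset) {
        for (int i = startIndex; i < segments.size(); i++) {
            for (Batch batch : segments.get(i).batches) {
                if (batch.lastOffset >= targetOffset) {
                    return Optional.of(batch);
                }
            }
            // No batch in this segment reaches targetOffset: fall through to the next one.
        }
        return Optional.empty();
    }
}
```

In this sketch, a fetch for an offset that was compacted away at the tail of one segment resolves to the first surviving batch of a later segment; an empty result is returned only when no later batch exists at all.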