[jira] [Commented] (KAFKA-15388) Handle topics that were having compaction as retention earlier are changed to delete only retention policy and onboarded to tiered storage.

Arpit Goyal (Jira) Fri, 03 Nov 2023 02:45:04 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782492#comment-17782492
 ]


Arpit Goyal commented on KAFKA-15388:
-------------------------------------

[~divijvaidya] [~christo_lolov] I need some help understanding the use case 
at this line: 
https://github.com/satishd/kafka/blob/46c96f4868d51c84b43003bbb80bc07297016912/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1339

Let's say we are fetching the data for offset k:

1. We find the leader epoch for the requested offset k. 
2. Using the leader epoch and offset k, we find the corresponding 
RemoteLogSegmentMetadata. 
3. Using the RemoteLogSegmentMetadata and the offset index, we find the 
position of the highest possible entry less than the requested offset k. 
4. Using the start position from step 3, we fetch remotesegInputStream 
from RemoteStorageManager. 

At this point we try to find the record batch whose offset range contains 
offset k. 
I think the same use case arises here. If the topic was historically 
compacted and offset k was removed by compaction, should we return the next 
available batch instead of an empty result?
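
To make the question concrete, here is a minimal sketch of the two possible 
behaviours in the batch search after step 4. It is not taken from 
RemoteLogManager: the Batch class and both method names are hypothetical, 
and the batches are modelled only by their base and last offsets.

```java
import java.util.List;
import java.util.Optional;

// Illustration only: Batch, findContaining and findFirstAtOrAfter are
// hypothetical names, not Kafka APIs.
class RemoteBatchLookupSketch {

    /** Stand-in for a record batch; only its offset range matters here. */
    static final class Batch {
        final long baseOffset;
        final long lastOffset;

        Batch(long baseOffset, long lastOffset) {
            this.baseOffset = baseOffset;
            this.lastOffset = lastOffset;
        }
    }

    /** Strict lookup: succeeds only if some batch's range contains the offset. */
    static Optional<Batch> findContaining(List<Batch> batches, long offset) {
        for (Batch b : batches) {
            if (b.baseOffset <= offset && offset <= b.lastOffset) {
                return Optional.of(b);
            }
        }
        return Optional.empty();
    }

    /**
     * Lenient lookup: returns the first batch whose last offset is at or after
     * the requested offset, so a gap left by compaction is skipped.
     * Assumes the batches are ordered by offset, as they are within a segment.
     */
    static Optional<Batch> findFirstAtOrAfter(List<Batch> batches, long offset) {
        for (Batch b : batches) {
            if (b.lastOffset >= offset) {
                return Optional.of(b);
            }
        }
        return Optional.empty();
    }
}
```

With batches covering offsets 0-4 and 10-14 and a request for k = 7, the 
strict lookup finds nothing while the lenient one returns the 10-14 batch. 
For k = 20 both come back empty, which is the case where the next available 
offset would have to be looked up in a subsequent segment.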

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15388
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15388
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Satish Duggana
>            Assignee: Arpit Goyal
>            Priority: Blocker
>             Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offsets from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offsets after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
