[ 
https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779426#comment-17779426
 ] 

Divij Vaidya commented on KAFKA-15388:
--------------------------------------

Hey [~goyarpit] 

I would suggest the following steps for you to handle this ticket.

1. Understand how Kafka, without Tiered Storage, handles fetch requests for 
historically compacted data (i.e. compaction was turned on, it compacted some 
data, and then compaction was turned off, but the data remains). The expected 
behaviour is: if offset 10 has been compacted away and the next available 
offset is 12, Kafka should return 12.
2. Write a test to mimic the situation described in the ticket description, 
i.e. the last offset in a segment gets compacted away. Observe that the test 
fails.
3. Fix the TS fetch part of the code to continue fetching data beyond the end 
of the segment if the offset is not present.
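
To make the expected behaviour in steps 1 and 3 concrete, here is a rough 
sketch of the lookup. It is only an in-memory model, not Kafka code: 
SegmentLookupSketch, Segment and nextAvailableOffset are hypothetical names 
made up for illustration, and the real fix would live in the tiered storage 
fetch path.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.TreeMap;

/** Hypothetical in-memory model of log segments; these are NOT Kafka's real classes. */
final class SegmentLookupSketch {

    /** A segment starts at baseOffset but may have lost offsets to compaction. */
    static final class Segment {
        final long baseOffset;
        final List<Long> presentOffsets; // offsets that survived compaction

        Segment(long baseOffset, List<Long> presentOffsets) {
            this.baseOffset = baseOffset;
            this.presentOffsets = presentOffsets;
        }

        /** First surviving offset >= target in this segment, if any. */
        OptionalLong firstOffsetAtOrAfter(long target) {
            return presentOffsets.stream()
                    .mapToLong(Long::longValue)
                    .filter(o -> o >= target)
                    .min();
        }
    }

    private final TreeMap<Long, Segment> segmentsByBaseOffset = new TreeMap<>();

    void add(Segment segment) {
        segmentsByBaseOffset.put(segment.baseOffset, segment);
    }

    /**
     * Finds the next available offset >= fetchOffset. The search starts at the
     * segment whose base offset is at or below fetchOffset and keeps walking
     * into later segments when the current one has nothing left to return.
     */
    OptionalLong nextAvailableOffset(long fetchOffset) {
        Long startKey = segmentsByBaseOffset.floorKey(fetchOffset);
        if (startKey == null) {
            startKey = segmentsByBaseOffset.ceilingKey(fetchOffset);
        }
        if (startKey == null) {
            return OptionalLong.empty(); // no segments at all
        }
        for (Segment segment : segmentsByBaseOffset.tailMap(startKey, true).values()) {
            OptionalLong found = segment.firstOffsetAtOrAfter(fetchOffset);
            if (found.isPresent()) {
                return found;
            }
            // Nothing at or after fetchOffset here: continue with the next segment.
        }
        return OptionalLong.empty();
    }
}
```

With one segment based at 0 holding offsets 0, 3 and 9, and a following 
segment based at 12 holding 12 and 13, a lookup for offset 10 in this model 
skips the first segment and resolves to 12, which matches the behaviour 
described in step 1.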

Let me know if you have questions. [~christo_lolov] has also done some research 
on it and we will be happy to answer your questions.

> Handle topics that were having compaction as retention earlier are changed to 
> delete only retention policy and onboarded to tiered storage. 
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15388
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15388
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Satish Duggana
>            Assignee: Arpit Goyal
>            Priority: Blocker
>             Fix For: 3.7.0
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>  
> There are 3 paths I looked at:
>  * When data is moved to remote storage (1)
>  * When data is read from remote storage (2)
>  * When data is deleted from remote storage (3)
> (1) Does not have a problem with compacted topics. Compacted segments are 
> uploaded and their metadata claims they contain offsets from the baseOffset of 
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not 
> exist within a segment, but there are offsets after the queried offset within 
> the same segment. *However, it does have a problem when the next available 
> offset is in a subsequent segment.*
> (3) For data deleted via DeleteRecords there is no problem. For data deleted 
> via retention there is no problem.
>  
> *I believe the proper solution to (2) is to make tiered storage continue 
> looking for the next greater offset in subsequent segments.*
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
