[ https://issues.apache.org/jira/browse/KAFKA-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786299#comment-17786299 ]
Arpit Goyal commented on KAFKA-15388:
-------------------------------------

Hey [~divijvaidya] I was able to reproduce the issue locally.

*Configuration*

1. Create a topic named test20 with partition 0.

2. Set segment.bytes to 100 for quicker rolling of segments:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config segment.bytes=100
{code}

3. Enable the compact cleanup policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config cleanup.policy=compact
{code}

4. *Produce some messages:*
{code:java}
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test20 --property "parse.key=true" --property "key.separator=:"
>1:1
>1:2
>1:3
>1:4
>1:5
>1:6
>1:7
>1:8
>1:9
>1:10
>1:11
>1:12
>1:13
>1:14
>1:15
>1:16
{code}

5. When we consume from the topic without remote storage enabled, only the latest records for key 1 survive compaction:
{code:java}
(base) ➜  kafka git:(trunk) ✗ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test20 --offset 0 --partition 0 --property print.offset=true
Offset:14       15
Offset:15       16
{code}

6. Disable the compact policy:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --delete-config cleanup.policy
{code}

7. Enable remote storage for the topic:
{code:java}
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test20 --add-config remote.storage.enable=true
{code}

8. The rolled-over segments have been moved to the tiered storage topic. Attached a screenshot for reference (tieredtopicconfiglist).

9. Since this was a historically compacted topic with only key 1, most of the log segments have zero bytes; please refer to the (tieredtopicconfiglist) screenshot. Only 14.log and 15.log contain data.

10. When we try to fetch from remote storage, the read path looks for the first batch of the segment, assuming one always exists. For an empty segment the batch lookup always returns null (0 >= 0 - 11):
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/FileLogInputStream.java#L65
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1411
https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1277

11. This seems to be a critical bug, and the behavior should be in line with the local log fetch mechanism: if the segment yields no batch, it should fetch from the next higher segment and continue until the end offset. Should I create a separate bug for this?

> Handle topics that were having compaction as retention earlier are changed to
> delete only retention policy and onboarded to tiered storage.
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15388
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15388
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Satish Duggana
>            Assignee: Arpit Goyal
>            Priority: Blocker
>             Fix For: 3.7.0
>
>         Attachments: Screenshot 2023-11-15 at 3.47.54 PM.png,
> tieredtopicloglist.png
>
>
> Context: [https://github.com/apache/kafka/pull/13561#discussion_r1300055517]
>
> There are 3 paths I looked at:
> * When data is moved to remote storage (1)
> * When data is read from remote storage (2)
> * When data is deleted from remote storage (3)
>
> (1) Does not have a problem with compacted topics. Compacted segments are
> uploaded and their metadata claims they contain offsets from the baseOffset of
> the segment until the next segment's baseOffset. There are no gaps in offsets.
> (2) Does not have a problem if a customer is querying offsets which do not
> exist within a segment, but there are offsets after the queried offset within
> the same segment. *However, it does have a problem when the next available
> offset is in a subsequent segment.*
>
> (3) For data deleted via DeleteRecords there is no problem. For data deleted
> via retention there is no problem.
>
> *I believe the proper solution to (2) is to make tiered storage continue
> looking for the next greater offset in subsequent segments.*
>
> Steps to reproduce the issue:
> {code:java}
> // TODO (christo)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
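The skip-forward lookup proposed in point 11 of the comment (and in the issue's suggested solution to (2)) can be sketched in simplified form. This is a hypothetical model, not actual Kafka code: `RemoteSegment` and `findSegmentWithData` are made-up names, and the real implementation would resolve segments via the remote log metadata rather than a plain list.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the fix for KAFKA-15388: when the segment that
// "covers" the fetch offset has no batches (everything was compacted away),
// keep scanning subsequent segments instead of returning a null batch.
public class EmptySegmentSkipSketch {

    // A segment here is just its base offset plus a flag saying whether any
    // batch survived compaction. Segments are assumed sorted by base offset.
    record RemoteSegment(long baseOffset, boolean hasBatches) {}

    // Return the first segment at or after fetchOffset that actually
    // contains data, or empty if no such segment exists up to the log end.
    static Optional<RemoteSegment> findSegmentWithData(List<RemoteSegment> sorted,
                                                       long fetchOffset) {
        for (int i = 0; i < sorted.size(); i++) {
            RemoteSegment seg = sorted.get(i);
            boolean coversOffset = seg.baseOffset() <= fetchOffset
                    && (i + 1 == sorted.size()
                        || sorted.get(i + 1).baseOffset() > fetchOffset);
            boolean afterOffset = seg.baseOffset() > fetchOffset;
            if ((coversOffset || afterOffset) && seg.hasBatches()) {
                return Optional.of(seg); // skip empty segments, keep scanning
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Mirrors the reproduction above: the early segments were compacted
        // empty; only 14.log and 15.log still contain data.
        List<RemoteSegment> segments = List.of(
                new RemoteSegment(0, false),
                new RemoteSegment(5, false),
                new RemoteSegment(14, true),
                new RemoteSegment(15, true));

        // A fetch at offset 0 lands on an empty segment; the lookup skips
        // forward to the segment at base offset 14 instead of failing.
        System.out.println(findSegmentWithData(segments, 0).orElseThrow());
    }
}
```

With the current code, the same fetch would find a null first batch in the empty segment and abort; the sketch shows the intended behavior of continuing to the next segment that has data.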