[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828319#comment-17828319 ]
Luke Chen edited comment on KAFKA-16385 at 3/19/24 12:57 PM:
-------------------------------------------------------------
[~divijvaidya], I was thinking about the use case you mentioned:

_I have set max segment size to be 1 GB and I have a topic with low ingress traffic. I want to expire data in my log every 1 day due to compliance requirement. But the partition doesn't receive 1GB of data in one day and hence, my active segment will never become eligible for expiration._

OK, so even if we adopt option 2, we still cannot guarantee that all data older than the 1-day limit gets deleted. Say a new record arrives right before the retention thread starts its check. In that case, the segment is not eligible for expiration even though it contains data older than 1 day, which breaks the contract of retention.ms.

Again, I don't know which behavior we want here, so I'd like to hear more comments from the community/experts.
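To make the race concrete, here is a minimal sketch of the "option 2" eligibility rule in Java. The class and method names are made up for illustration and are not Kafka's actual internals: because the active segment is always skipped, a write landing just before the retention pass keeps day-old data alive.

```java
// Hypothetical sketch of the race described above; names are illustrative,
// not Kafka internals.
public class RetentionRaceSketch {
    static final long RETENTION_MS = 24L * 60 * 60 * 1000; // retention.ms = 1 day

    // "Option 2": never delete the active segment. A record arriving right
    // before the retention thread runs keeps the segment active, so records
    // older than retention.ms inside it survive the pass.
    static boolean eligibleForDeletion(boolean isActive, long oldestRecordTs, long now) {
        return !isActive && (now - oldestRecordTs) > RETENTION_MS;
    }

    public static void main(String[] args) {
        long now = RETENTION_MS + 1; // a day-old record sits at ts = 0
        System.out.println(eligibleForDeletion(true, 0L, now));  // prints false: active segment is kept
        System.out.println(eligibleForDeletion(false, 0L, now)); // prints true: inactive segment is deleted
    }
}
```

The sketch shows the trade-off in the comment above: skipping the active segment avoids rolling it prematurely, but a single late write can hold expired data past retention.ms.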
> Segment is rolled before segment.ms or segment.bytes breached
> -------------------------------------------------------------
>
>                 Key: KAFKA-16385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16385
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.5.1, 3.7.0
>            Reporter: Luke Chen
>            Assignee: Kuan Po Tseng
>            Priority: Major
>
> Steps to reproduce:
> 0. Start a broker with `log.retention.check.interval.ms=1000` to speed up the test.
> 1. Create a topic with the config: segment.ms=7days, retention.ms=1sec.
> 2. Send a record "aaa" to the topic.
> 3. Wait for 1 second.
> Will this segment be rolled? I thought not, but my test shows it does roll:
> {code:java}
> [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. (kafka.log.LocalLog)
> [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote producer snapshot at offset 1 with 1 producer ids in 1 ms. (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to log retention time 1000ms breach based on the largest record timestamp in the segment (kafka.log.UnifiedLog)
> {code}
> The segment is rolled due to the log retention time 1000ms breach, which is unexpected.
> Tested in v3.5.1; it has the same issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
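The reproduction steps quoted above can be sketched with the stock Kafka CLI tools. The bootstrap address, topic name, and local single-broker setup are assumptions; segment.ms=7days is expressed in milliseconds since topic configs take ms values.

```shell
# 0. In the broker's server.properties, speed up the retention pass:
#    log.retention.check.interval.ms=1000

# 1. Create a topic with segment.ms = 7 days (604800000 ms), retention.ms = 1 s
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t2 \
  --partitions 2 \
  --config segment.ms=604800000 \
  --config retention.ms=1000

# 2. Send a record "aaa" to the topic
echo "aaa" | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic t2

# 3. Wait, then watch the broker log for an unexpected
#    "Rolled new log segment" followed by "Deleting segment"
sleep 2
```

If the bug reproduces, the broker log shows the active segment being rolled and deleted even though neither segment.ms nor segment.bytes was breached.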