[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828319#comment-17828319 ]
Luke Chen edited comment on KAFKA-16385 at 3/19/24 12:57 PM:
-------------------------------------------------------------
[~divijvaidya], I was thinking about the use case you mentioned:

_I have set max segment size to be 1 GB and I have a topic with low ingress traffic. I want to expire data in my log every 1 day due to compliance requirement. But the partition doesn't receive 1GB of data in one day and hence, my active segment will never become eligible for expiration._

OK, so even if we adopt option 2, we still cannot guarantee that all data older than the 1-day limit gets deleted. Say a new record arrives right before the retention thread starts its check. In that case, the segment is not eligible for expiration even though it contains data older than 1 day, which breaks the contract of retention.ms.

Again, I don't know which behavior we want here, so I'd like to hear more comments from the community/experts.
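To make the race concrete, here is a minimal sketch of the "option 2" eligibility rule in Java. The class and method names are made up for illustration and are not Kafka's actual internals: because the active segment is always skipped, a write landing just before the retention pass keeps day-old data alive.

```java
// Hypothetical sketch of the race described above; names are illustrative,
// not Kafka internals.
public class RetentionRaceSketch {
    static final long RETENTION_MS = 24L * 60 * 60 * 1000; // retention.ms = 1 day

    // "Option 2": never delete the active segment. A record arriving right
    // before the retention thread runs keeps the segment active, so records
    // older than retention.ms inside it survive the pass.
    static boolean eligibleForDeletion(boolean isActive, long oldestRecordTs, long now) {
        return !isActive && (now - oldestRecordTs) > RETENTION_MS;
    }

    public static void main(String[] args) {
        long now = RETENTION_MS + 1; // a day-old record sits at ts = 0
        System.out.println(eligibleForDeletion(true, 0L, now));  // prints false: active segment is kept
        System.out.println(eligibleForDeletion(false, 0L, now)); // prints true: inactive segment is deleted
    }
}
```

The sketch shows the trade-off in the comment above: skipping the active segment avoids rolling it prematurely, but a single late write can hold expired data past retention.ms.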
> Segment is rolled before segment.ms or segment.bytes breached
> -------------------------------------------------------------
>
>                 Key: KAFKA-16385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16385
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.5.1, 3.7.0
>            Reporter: Luke Chen
>            Assignee: Kuan Po Tseng
>            Priority: Major
>
> Steps to reproduce:
> 0. Start a broker with `log.retention.check.interval.ms=1000` to speed up the test.
> 1. Create a topic with the config: segment.ms=7days, retention.ms=1sec.
> 2. Send a record "aaa" to the topic.
> 3. Wait for 1 second.
> Will this segment be rolled? I thought not, but my test shows it does roll:
> {code:java}
> [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. (kafka.log.LocalLog)
> [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote producer snapshot at offset 1 with 1 producer ids in 1 ms. (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to log retention time 1000ms breach based on the largest record timestamp in the segment (kafka.log.UnifiedLog)
> {code}
> The segment is rolled due to the log retention time 1000ms breach, which is unexpected.
> Tested in v3.5.1; it has the same issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
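The reproduction steps quoted above can be sketched with the stock Kafka CLI tools. The bootstrap address, topic name, and local single-broker setup are assumptions; segment.ms=7days is expressed in milliseconds since topic configs take ms values.

```shell
# 0. In the broker's server.properties, speed up the retention pass:
#    log.retention.check.interval.ms=1000

# 1. Create a topic with segment.ms = 7 days (604800000 ms), retention.ms = 1 s
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t2 \
  --partitions 2 \
  --config segment.ms=604800000 \
  --config retention.ms=1000

# 2. Send a record "aaa" to the topic
echo "aaa" | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic t2

# 3. Wait, then watch the broker log for an unexpected
#    "Rolled new log segment" followed by "Deleting segment"
sleep 2
```

If the bug reproduces, the broker log shows the active segment being rolled and deleted even though neither segment.ms nor segment.bytes was breached.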