[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828247#comment-17828247 ]
Divij Vaidya commented on KAFKA-16385:
--------------------------------------

[~showuon] I must be missing something here, but the current behaviour looks correct to me.

Let's consider a use case from an Apache Kafka user: I have set the max segment size to 1 GB and I have a topic with low ingress traffic. I want to expire data in my log every day due to a compliance requirement. But the partition doesn't receive 1 GB of data in one day, and hence my active segment will never become eligible for expiration.

Now, the user can set segment.ms = 1 day to force a rotation even when the segment is not full. This should satisfy the use case. But how do we define the behaviour when the expiration configuration is less than the roll configuration? We have two options:

Option 1: Ignore the expiration config if it is less than the rotation config
Option 2: The expiration config overrides the rotation config

Option 1 prioritizes an internal configuration (ideally a user shouldn't need to know about segments etc. in a log) over a functional config (the user wants to expire data). This requires users to know about inner details of logs, such as the presence of a segment or an index. At Apache Kafka, we have chosen option 2, i.e. we prioritize a user-facing functionality config (expiration config) over an internal config (rotation config).

Thoughts?

> Segment is rolled before segment.ms or segment.bytes breached
> -------------------------------------------------------------
>
>                 Key: KAFKA-16385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16385
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.5.1, 3.7.0
>            Reporter: Luke Chen
>            Assignee: Kuan Po Tseng
>            Priority: Major
>
> Steps to reproduce:
> 0. Start up a broker with `log.retention.check.interval.ms=1000` to speed up the test.
> 1. Create a topic with the config: segment.ms=7days , retention.ms=1sec .
> 2. Send a record "aaa" to the topic
> 3. Wait for 1 second
> Will this segment be rolled? I thought no.
> But what I have tested is it will roll:
> {code:java}
> [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. (kafka.log.LocalLog)
> [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote producer snapshot at offset 1 with 1 producer ids in 1 ms. (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to log retention time 1000ms breach based on the largest record timestamp in the segment (kafka.log.UnifiedLog)
> {code}
> The segment is rolled due to log retention time 1000ms breached, which is unexpected.
> Tested in v3.5.1, it has the same issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
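
For illustration, option 2 from the comment ("expiration config overrides rotation config") can be sketched as a small decision function. This is a simplified model, not Kafka's actual implementation: the class and method names (`RetentionSketch`, `rollDue`, `retentionBreached`, `activeSegmentIsRolled`) are hypothetical, and it only considers the time-based settings segment.ms and retention.ms, ignoring segment.bytes and every other roll trigger. The 1.5 s delay used in `main` is an assumed value, not taken from the report.

```java
// Simplified model of option 2: a breached retention.ms forces the active
// segment to roll even though segment.ms has not elapsed.
// All names here are hypothetical; this is not Kafka's code.
final class RetentionSketch {

    /** True when the segment is older than segment.ms (time-based roll). */
    static boolean rollDue(long nowMs, long segmentCreatedMs, long segmentMs) {
        return nowMs - segmentCreatedMs > segmentMs;
    }

    /**
     * True when the newest record in the segment is older than retention.ms.
     * A negative retention.ms means no time limit, so it never breaches.
     */
    static boolean retentionBreached(long nowMs, long largestRecordTimestampMs, long retentionMs) {
        return retentionMs >= 0 && nowMs - largestRecordTimestampMs > retentionMs;
    }

    /** Option 2: roll when either the roll config or the expiration config says so. */
    static boolean activeSegmentIsRolled(long nowMs,
                                         long largestRecordTimestampMs,
                                         long segmentCreatedMs,
                                         long segmentMs,
                                         long retentionMs) {
        return rollDue(nowMs, segmentCreatedMs, segmentMs)
                || retentionBreached(nowMs, largestRecordTimestampMs, retentionMs);
    }

    public static void main(String[] args) {
        long sevenDaysMs = 7L * 24 * 60 * 60 * 1000; // segment.ms=7days from the report
        long retentionMs = 1000L;                    // retention.ms=1sec from the report
        long recordTs = 1710832992125L;              // largestRecordTimestamp from the quoted log
        long now = recordTs + 1500;                  // assumed: check runs 1.5 s after the record

        // segment.ms is far from elapsed, but retention.ms is breached, so this prints "true".
        System.out.println(activeSegmentIsRolled(now, recordTs, recordTs, sevenDaysMs, retentionMs));
    }
}
```

Under this model the scenario in the report (segment.ms=7days, retention.ms=1sec, one record, wait more than a second) yields a roll, which matches the quoted log output. Under option 1, `activeSegmentIsRolled` would return only `rollDue(...)`.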