[ 
https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828247#comment-17828247
 ] 

Divij Vaidya commented on KAFKA-16385:
--------------------------------------

[~showuon] I must be missing something here but the current behaviour looks 
correct to me.

Let's consider a use case from an Apache Kafka user:
I have set the max segment size to 1 GB and I have a topic with low ingress 
traffic. I want to expire data in my log every day due to a compliance 
requirement. But the partition doesn't receive 1 GB of data in one day, and 
hence my active segment will never become eligible for expiration. 

Now, the user can set segment.ms = 1 day to force a rotation even when the 
segment is not full. This should satisfy the use case. But how do we define 
the behaviour when the expiration configuration is less than the roll 
configuration?
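
To make the low-traffic case concrete, here is a minimal sketch of the roll 
decision, assuming a segment is rolled only when it reaches segment.bytes or 
its age reaches segment.ms. The class and method names are hypothetical, the 
200 MB daily volume is an assumed figure, and the broker's real check 
considers more conditions than these two.

```java
import java.time.Duration;

public class SegmentRollSketch {

    // Hypothetical helper, not Kafka's code: models only the size-based and
    // time-based roll conditions discussed in this comment.
    static boolean shouldRoll(long segmentSizeBytes, long segmentAgeMs,
                              long segmentBytes, long segmentMs) {
        return segmentSizeBytes >= segmentBytes || segmentAgeMs >= segmentMs;
    }

    public static void main(String[] args) {
        long segmentBytes = 1024L * 1024L * 1024L;         // segment.bytes = 1 GB
        long oneDayMs = Duration.ofDays(1).toMillis();     // segment.ms = 1 day
        long sevenDaysMs = Duration.ofDays(7).toMillis();  // segment.ms = 7 days

        // Assumed figure: the partition received about 200 MB in one day.
        long sizeAfterOneDay = 200L * 1024L * 1024L;

        // With segment.ms = 7 days, neither limit is reached after one day.
        System.out.println(shouldRoll(sizeAfterOneDay, oneDayMs, segmentBytes, sevenDaysMs));

        // With segment.ms = 1 day, the time limit forces a roll.
        System.out.println(shouldRoll(sizeAfterOneDay, oneDayMs, segmentBytes, oneDayMs));
    }
}
```

In this model the first call returns false and the second returns true, which 
is the forced rotation described above.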

We have two options:
Option 1: Ignore expiration config if it is less than rotation config
Option 2: Expiration config overrides rotation config

Option 1 prioritizes an internal configuration (ideally a user shouldn't know 
about segments etc. in a log) over a functional config (the user wants to 
expire data). This requires users to know about inner details of logs, such 
as the presence of a segment or an index.

At Apache Kafka, we have chosen option 2, i.e. prioritize a user-facing 
functional config (expiration config) over an internal config (rotation 
config).
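
The two options can be contrasted in a small sketch. This is only a model of 
the decision, not the broker's implementation: the class, enum and method 
names are hypothetical, and the "now" value is an assumed time 1.5 seconds 
after the record timestamp taken from the log in the issue description.

```java
public class ExpirationPolicySketch {

    // Hypothetical names for the two options discussed above.
    enum Policy { IGNORE_EXPIRATION_BELOW_ROTATION, EXPIRATION_OVERRIDES_ROTATION }

    // Returns true if the active segment is treated as expired under the policy.
    static boolean activeSegmentExpires(Policy policy, long nowMs,
                                        long largestRecordTimestampMs,
                                        long retentionMs, long segmentMs) {
        boolean retentionBreached = nowMs - largestRecordTimestampMs > retentionMs;
        if (!retentionBreached) {
            return false;
        }
        switch (policy) {
            case IGNORE_EXPIRATION_BELOW_ROTATION:
                // Option 1: retention is ignored when it is shorter than segment.ms.
                return retentionMs >= segmentMs;
            case EXPIRATION_OVERRIDES_ROTATION:
                // Option 2: a retention breach is enough, whatever segment.ms is.
                return true;
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        long segmentMs = 7L * 24 * 60 * 60 * 1000;  // segment.ms = 7 days
        long retentionMs = 1000L;                   // retention.ms = 1 second
        long recordTs = 1710832992125L;             // largestRecordTimestamp from the issue log
        long nowMs = recordTs + 1500L;              // assumed: checked 1.5 s later

        for (Policy p : Policy.values()) {
            System.out.println(p + " -> "
                + activeSegmentExpires(p, nowMs, recordTs, retentionMs, segmentMs));
        }
    }
}
```

With the values from the issue, option 1 yields false and option 2 yields 
true, which matches the roll and delete seen in the reported log.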

Thoughts?

> Segment is rolled before segment.ms or segment.bytes breached
> -------------------------------------------------------------
>
>                 Key: KAFKA-16385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16385
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.5.1, 3.7.0
>            Reporter: Luke Chen
>            Assignee: Kuan Po Tseng
>            Priority: Major
>
> Steps to reproduce:
> 0. Startup a broker with `log.retention.check.interval.ms=1000` to speed up 
> the test.
> 1. Create a topic with the config: segment.ms=7days, retention.ms=1sec.
> 2. Send a record "aaa" to the topic
> 3. Wait for 1 second
> Will this segment be rolled? I thought not.
> But in my test, it is rolled:
> {code:java}
> [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, 
> dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. 
> (kafka.log.LocalLog)
> [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote 
> producer snapshot at offset 1 with 1 producer ids in 1 ms. 
> (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, 
> dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, 
> lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to 
> log retention time 1000ms breach based on the largest record timestamp in the 
> segment (kafka.log.UnifiedLog)
> {code}
> The segment is rolled because the 1000 ms log retention time was breached, 
> which is unexpected.
> Tested in v3.5.1; it has the same issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
