[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847889#comment-17847889 ] Chia-Ping Tsai commented on KAFKA-16385: {quote} worst case {{deletion time = retention.ms + segment.ms}} {quote} The record timestamp can be defined by users. Hence, the deletion time will be the end of world if the timestamp is a huge number (ya, the data from the future) :) > Segment is rolled before segment.ms or segment.bytes breached > - > > Key: KAFKA-16385 > URL: https://issues.apache.org/jira/browse/KAFKA-16385 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.5.1, 3.7.0 >Reporter: Luke Chen >Assignee: Kuan Po Tseng >Priority: Major > Fix For: 3.8.0 > > > Steps to reproduce: > 0. Startup a broker with `log.retention.check.interval.ms=1000` to speed up > the test. > 1. Creating a topic with the config: segment.ms=7days , segment.bytes=1GB, > retention.ms=1sec . > 2. Send a record "aaa" to the topic > 3. Wait for 1 second > Will this segment will rolled? I thought no. > But what I have tested is it will roll: > {code:java} > [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, > dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. > (kafka.log.LocalLog) > [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote > producer snapshot at offset 1 with 1 producer ids in 1 ms. > (org.apache.kafka.storage.internals.log.ProducerStateManager) > [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, > dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, > lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to > log retention time 1000ms breach based on the largest record timestamp in the > segment (kafka.log.UnifiedLog) > {code} > The segment is rolled due to log retention time 1000ms breached, which is > unexpected. > Tested in v3.5.1, it has the same issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
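The "Deleting segment" log line quoted in the ticket can be checked by hand. The following is a minimal sketch, not Kafka's actual code: the helper name `is_time_expired` and the strict `>` comparison are assumptions made for illustration, and the UTC+8 offset of the broker log clock is also an assumption (it is consistent with `lastModifiedTime=1710832993131`, which is 07:23:13.131 UTC).

```python
from datetime import datetime, timedelta, timezone

RETENTION_MS = 1000  # retention.ms=1sec from the reproduction steps

# Copied from the "Deleting segment" log line in the ticket.
LARGEST_RECORD_TIMESTAMP_MS = 1710832992125

# Assumption: the broker log timestamps are wall-clock time in UTC+8.
LOG_TZ = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert an aware datetime to integer epoch milliseconds."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def is_time_expired(now_ms: int, largest_timestamp_ms: int, retention_ms: int) -> bool:
    """Hypothetical helper: a segment is expired once its newest record is older than retention_ms."""
    return now_ms - largest_timestamp_ms > retention_ms


# Time at which the deletion was logged: [2024-03-19 15:23:13,925]
deletion_logged_at_ms = to_epoch_ms(datetime(2024, 3, 19, 15, 23, 13, 925000, tzinfo=LOG_TZ))
age_ms = deletion_logged_at_ms - LARGEST_RECORD_TIMESTAMP_MS
expired = is_time_expired(deletion_logged_at_ms, LARGEST_RECORD_TIMESTAMP_MS, RETENTION_MS)
```

Under these assumptions the newest record is 1800 ms old when the deletion is logged, which exceeds the 1000 ms retention, so the check is independent of `segment.ms` and `segment.bytes`.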
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847873#comment-17847873 ]

Kamal Chandraprakash commented on KAFKA-16385:

Adding the comment for posterity:

> OK, so, even if we adopt option 2, we still cannot guarantee that all the data
> past the 1 day limit will get deleted. Let's say, right before the retention
> thread starts to check, a new record arrives. Then, in this case, this segment
> won't be eligible for expiration even though it contains data over 1 day old.
> And it still breaks the contract of retention.ms.

If segment.ms is configured to be 1 day, then all segments, whether active or stale, get rotated once a day and become eligible for deletion by the log cleaner thread. The deletion may not be exact: in the worst case, {{deletion time = retention.ms + segment.ms}}.
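The worst-case bound {{deletion time = retention.ms + segment.ms}} can be illustrated with a small timeline. This is only a sketch under simplifying assumptions: records carry broker-clock timestamps, the segment keeps receiving records until it is rolled at `segment.ms`, and the delay added by `log.retention.check.interval.ms` is ignored. The function name is hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000


def worst_case_record_age_ms(retention_ms: int, segment_ms: int) -> int:
    """Age of the oldest record in a segment at the moment the segment becomes deletable.

    Assumes the first record arrives at t=0 and the last record arrives
    just before the segment is rolled at t=segment_ms.
    """
    first_record_ts = 0
    last_record_ts = first_record_ts + segment_ms
    # The segment is only eligible once its *newest* record has expired.
    eligible_at = last_record_ts + retention_ms
    return eligible_at - first_record_ts


age_ms = worst_case_record_age_ms(retention_ms=DAY_MS, segment_ms=DAY_MS)
age_days = age_ms / DAY_MS
```

With both settings at 1 day, the oldest record in the segment can therefore live for about 2 days before it is removed.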
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830230#comment-17830230 ]

Kuan Po Tseng commented on KAFKA-16385:

Gentle ping [~chia7712], [~jeqo]. I've filed another JIRA ticket, KAFKA-16414, to discuss the different behavior between `retention.ms` and `retention.bytes`. Many thanks :)
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830200#comment-17830200 ]

Chia-Ping Tsai commented on KAFKA-16385:

{quote} I, as well, agree that we should include the active segment on the retention checks; but would like to also discuss whether we should align active segment rotation for size-based retention as well. {quote}

I'm on the same page. Consistent behavior can reduce incorrect usage of the retention configuration.

{quote} That might need more discussion if we want to align their behavior. Before we conclude this discussion, we should document these differences so users don't get confused. I've address a PR in https://github.com/apache/kafka/pull/15588 and add more description mentioned in this JIRA discussion. {quote}

[~brandboat] As you are starting the work, could you please file another ticket to open the discussion for retention.bytes? We need to reach consensus on the behavior change.
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830199#comment-17830199 ]

Kuan Po Tseng commented on KAFKA-16385:

{quote} I, as well, agree that we should include the active segment on the retention checks; but would like to also discuss whether we should align active segment rotation for size-based retention as well. {quote}

Thank you [~jeqo]. Indeed, there is some inconsistent behavior between retention.ms and retention.bytes regarding the expiration of active segments. That might need more discussion if we want to align their behavior. Before we conclude this discussion, we should document these differences so users don't get confused. I've opened a PR at https://github.com/apache/kafka/pull/15588 and added more of the description mentioned in this JIRA discussion.
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829682#comment-17829682 ]

Jorge Esteban Quilcate Otoya commented on KAFKA-16385:

> I admit my original understanding of `retention.ms` was that it only takes
> effect on the inactive segments.

I'm happy that I'm in great company on this one and I wasn't the only one who believed that the active segment was not considered in retention cleanups :)

I started a discussion thread[1] earlier today on a related topic: yes, retention.ms takes precedence over segment.ms and log.roll.ms and rolls a new (empty) active segment when the max segment timestamp matches the condition. But retention.bytes doesn't follow a similar path: given the current condition[2] for size-based rotation, it always forces the log to keep at least 1 (active) segment _unless_ (and this is something I discovered on this thread[3], mentioned by [~chia7712]) the segment size is equal to zero.

I, as well, agree that we should include the active segment on the retention checks; but would like to also discuss whether we should align active segment rotation for size-based retention as well.

[1] [https://lists.apache.org/thread/s9xp17dpx21wqh9gp42kbvb4m93vvb23]
[2] [https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583]
[3] https://issues.apache.org/jira/browse/KAFKA-16385?focusedCommentId=17828281=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17828281
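The asymmetry between time-based and size-based retention described in this comment can be sketched with a toy model. This is a simplified illustration, not the code in `UnifiedLog.scala`: the `Segment` class and both function names are hypothetical, and the size rule is modelled as "delete a segment only while the log would still exceed retention.bytes afterwards", using the `retention.bytes=0` case mentioned elsewhere in the thread as the exception.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    size_bytes: int
    largest_timestamp_ms: int


def time_breached(segments: List[Segment], now_ms: int, retention_ms: int) -> List[Segment]:
    """Leading segments whose newest record is older than retention_ms (active segment included)."""
    deletable = []
    for segment in segments:
        if now_ms - segment.largest_timestamp_ms > retention_ms:
            deletable.append(segment)
        else:
            break
    return deletable


def size_breached(segments: List[Segment], retention_bytes: int) -> List[Segment]:
    """Leading segments that can go while the remaining log still holds at least retention_bytes."""
    excess = sum(segment.size_bytes for segment in segments) - retention_bytes
    deletable = []
    for segment in segments:
        if excess - segment.size_bytes >= 0:
            excess -= segment.size_bytes
            deletable.append(segment)
        else:
            break
    return deletable


# One idle active segment, as in the ticket (size=71 bytes).
log = [Segment(size_bytes=71, largest_timestamp_ms=0)]

by_time = time_breached(log, now_ms=2000, retention_ms=1000)
by_size_small_limit = size_breached(log, retention_bytes=1)
by_size_zero_limit = size_breached(log, retention_bytes=0)
```

In this model the idle active segment is selected by the time check, is kept by the size check for any positive limit, and is selected by the size check only when the limit is zero.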
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828654#comment-17828654 ]

Kuan Po Tseng commented on KAFKA-16385:

{quote} [~brandboat] , are you clear what you should do for this ticket? Please let us know if you have any question. {quote}

Thanks, I'm still poking around the source code, but it sounds like we should document the behavior mentioned in this JIRA ticket. If I have any questions, I'll consult with you all again. Huge thanks!
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828620#comment-17828620 ]

Luke Chen commented on KAFKA-16385:

Thanks for the response, Jun, and thanks for the summary, Chia-Ping. [~brandboat], are you clear on what you should do for this ticket? Please let us know if you have any questions.
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828594#comment-17828594 ]

Chia-Ping Tsai commented on KAFKA-16385:

{quote} One potential way to improve this is to use the timestamp index to find the cutoff offset in the active segment and move the logStartOffset to that point. We need to understand if there is any additional I/O impact because of this. {quote}

Not sure whether it is a worthwhile improvement. We should not encourage users to expect that the cleanup can delete segments accurately. In particular, users can define their own timestamps, so expired records could still exist even if we move the logStartOffset. For example: (non-expired record has offset=2, timestamp=100) and (expired record has offset=3, timestamp=90).

{quote} As you observed, the current implementation is a bit weird since it depends on whether there are new records or not. {quote}

That probably makes sense: the segment is NOT expired as it has new records :)

In short, the implementation of retention.ms could roll and then delete the active segment. We should improve the documentation for such a scenario.
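The out-of-order example in this comment can be made concrete. The sketch below is illustrative only: the records at offsets 0 and 1 and the cutoff value of 95 are hypothetical additions around the two records from the comment, and `first_unexpired_offset` is an invented helper, not a Kafka API.

```python
from typing import List, Tuple

# (offset, timestamp) pairs; offsets 2 and 3 come from the comment,
# offsets 0 and 1 are hypothetical older records added for illustration.
records: List[Tuple[int, int]] = [(0, 80), (1, 85), (2, 100), (3, 90)]

CUTOFF_TIMESTAMP = 95  # hypothetical: records with a smaller timestamp count as expired


def first_unexpired_offset(log: List[Tuple[int, int]], cutoff: int) -> int:
    """Offset of the first record that is not expired, scanning in offset order."""
    for offset, timestamp in log:
        if timestamp >= cutoff:
            return offset
    # Every record is expired: the start offset moves past the end of the log.
    return log[-1][0] + 1


new_log_start_offset = first_unexpired_offset(records, CUTOFF_TIMESTAMP)

# Records that are still readable after the move but are already expired.
leftover_expired = [
    offset
    for offset, timestamp in records
    if offset >= new_log_start_offset and timestamp < CUTOFF_TIMESTAMP
]
```

Because offsets are contiguous and timestamps are not guaranteed to be monotonic, the start offset stops at offset 2 and the expired record at offset 3 stays in the log.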
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828506#comment-17828506 ]

Jun Rao commented on KAFKA-16385:

[~showuon] : Yes, the retention by time is supposed to cover the active segment too. As you observed, the current implementation is a bit weird since it depends on whether there are new records or not. One potential way to improve this is to use the timestamp index to find the cutoff offset in the active segment and move the logStartOffset to that point. We need to understand if there is any additional I/O impact because of this.
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828319#comment-17828319 ]

Luke Chen commented on KAFKA-16385:

[~divijvaidya], I was thinking about the use case you mentioned:

_I have set max segment size to be 1 GB and I have a topic with low ingress traffic. I want to expire data in my log every 1 day due to compliance requirement. But the partition doesn't receive 1GB of data in one day and hence, my active segment will never become eligible for expiration._

OK, so, even if we adopt option 2, we still cannot guarantee that all the data past the 1 day limit will get deleted. Let's say that, right before the retention thread starts its check, a new record arrives. In this case, the segment won't be eligible for expiration even though it contains data over 1 day old. And that breaks the contract of retention.ms.

Again, I don't know which is the expected behavior we want. So I'd like to hear more comments from the community/experts.
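The race described in this comment follows from the check using the largest record timestamp in the segment. A minimal sketch, assuming a 1 day `retention.ms`, broker-clock timestamps, and a hypothetical `segment_expired` helper (not a Kafka API):

```python
from typing import List

DAY_MS = 24 * 60 * 60 * 1000
RETENTION_MS = DAY_MS


def segment_expired(record_timestamps_ms: List[int], now_ms: int, retention_ms: int) -> bool:
    """Hypothetical check: only the newest record in the segment decides eligibility."""
    return now_ms - max(record_timestamps_ms) > retention_ms


now_ms = 10 * DAY_MS

# A segment holding only a record that is two days old is eligible.
idle_segment = [now_ms - 2 * DAY_MS]
idle_expired = segment_expired(idle_segment, now_ms, RETENTION_MS)

# The same segment after one more record arrives 1 ms before the check.
busy_segment = idle_segment + [now_ms - 1]
busy_expired = segment_expired(busy_segment, now_ms, RETENTION_MS)
```

In this model the two-day-old record is kept in the second case, because the freshly appended record resets the segment's largest timestamp.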
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828296#comment-17828296 ]

Luke Chen commented on KAFKA-16385:

Thanks Chia-Ping. I admit my original understanding of `retention.ms` was that it only takes effect on the inactive segments. So, in my example of `segment.ms=7days`, `retention.ms=1sec`, my expectation is that the segment will be rolled after 7 days or once its size exceeds segment.bytes, and only then will the segment be eligible for deletion. But from Divij's explanation, I agree the definition of `retention.ms` is more like "the oldest record allowed to appear in the log". If that's the case, then I could be wrong and we should improve the doc. Otherwise, this is a bug: the active (idle) segment should not be rolled even though its maxTimestamp has passed retention.ms.
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828281#comment-17828281 ]

Chia-Ping Tsai commented on KAFKA-16385:

It seems to me the scenario is about "should we roll the idle segment even though it is the active one?" The expiry-based cleanup normally skips the active segment since the max timestamp of the active segment keeps updating. However, if we stop pushing data to the active segment, it has a chance of being viewed as an expired segment and cleaned up by the "kafka-log-retention" thread. I feel that is option 2 mentioned by [~divijvaidya].

Also, there is a similar scenario if we set `retention.bytes` to zero.
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828251#comment-17828251 ] Luke Chen commented on KAFKA-16385: --- [~divijvaidya], I agree with you in Kafka, we don't have clear definition/documentation about: _How do we define the behaviour when expiration configuration is less than roll configuration?_ Obviously, the current behavior goes with option 2 in your description. If that's the expected behavior, we should document it to avoid confusion. From current document about `segment.ms`/`segment.bytes` you will think this is the only factor to affect the segment roll timing, while it's not true. If we decide to go with Option 2, we should at least improve the document. That said, I agree option 2 is what we should choose. But before we made final decision, I'd like to hear some more experts' comments about this. [~ijuma] [~junrao], thoughts about this? > Segment is rolled before segment.ms or segment.bytes breached > - > > Key: KAFKA-16385 > URL: https://issues.apache.org/jira/browse/KAFKA-16385 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.5.1, 3.7.0 >Reporter: Luke Chen >Assignee: Kuan Po Tseng >Priority: Major > > Steps to reproduce: > 0. Startup a broker with `log.retention.check.interval.ms=1000` to speed up > the test. > 1. Creating a topic with the config: segment.ms=7days , retention.ms=1sec . > 2. Send a record "aaa" to the topic > 3. Wait for 1 second > Will this segment will rolled? I thought no. > But what I have tested is it will roll: > {code:java} > [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, > dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. > (kafka.log.LocalLog) > [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote > producer snapshot at offset 1 with 1 producer ids in 1 ms. 
> (org.apache.kafka.storage.internals.log.ProducerStateManager) > [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, > dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, > lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to > log retention time 1000ms breach based on the largest record timestamp in the > segment (kafka.log.UnifiedLog) > {code} > The segment is rolled due to log retention time 1000ms breached, which is > unexpected. > Tested in v3.5.1, it has the same issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
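The numbers in the quoted "Deleting segment" log line can be checked by hand. The sketch below does that arithmetic under one loud assumption: the log line carries no time zone, so the broker is assumed here to have logged in local time at UTC+8. The variable names are illustrative only and are not Kafka identifiers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)

# Assumption: the broker logged in local time at UTC+8; the log line has no zone.
LOG_TZ = timezone(timedelta(hours=8))

# Values copied from the "Deleting segment" log line in the issue.
deletion_logged_at = datetime(2024, 3, 19, 15, 23, 13, 925000, tzinfo=LOG_TZ)
largest_record_timestamp_ms = 1710832992125
last_modified_time_ms = 1710832993131
retention_ms = 1000

# Convert the log timestamp to epoch milliseconds with exact integer arithmetic.
deletion_ms = (deletion_logged_at - EPOCH) // ONE_MS

since_largest_record_ms = deletion_ms - largest_record_timestamp_ms
since_last_modified_ms = deletion_ms - last_modified_time_ms

print(since_largest_record_ms, since_largest_record_ms > retention_ms)  # 1800 True
print(since_last_modified_ms, since_last_modified_ms > retention_ms)    # 794 False
```

Under that assumption, 1800 ms had elapsed since the largest record timestamp, which exceeds retention.ms=1000, while only 794 ms had passed since lastModifiedTime. This is consistent with the log message saying the breach was "based on the largest record timestamp in the segment".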
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828247#comment-17828247 ] Divij Vaidya commented on KAFKA-16385: -- [~showuon] I must be missing something here, but the current behaviour looks correct to me. Let's consider a use case from an Apache Kafka user: I have set the max segment size to 1 GB and I have a topic with low ingress traffic. I want to expire data in my log every day due to a compliance requirement. But the partition doesn't receive 1 GB of data in one day, and hence my active segment will never become eligible for expiration. Now, the user can set segment.ms = 1 day to force a rotation even when the segment is not full. This should satisfy the use case. But how do we define the behaviour when expiration configuration is less than roll configuration? We have two options: Option 1: Ignore the expiration config if it is less than the rotation config. Option 2: The expiration config overrides the rotation config. Option 1 prioritizes an internal configuration (ideally a user shouldn't know about segments etc. in a log) over a functional config (the user wants to expire data). This requires users to know about inner details of logs, such as the presence of a segment or index. At Apache Kafka, we have chosen option 2, i.e. we prioritize a user-facing functional config (expiration config) over an internal config (rotation config). Thoughts? > Segment is rolled before segment.ms or segment.bytes breached > - > > Key: KAFKA-16385 > URL: https://issues.apache.org/jira/browse/KAFKA-16385 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.5.1, 3.7.0 >Reporter: Luke Chen >Assignee: Kuan Po Tseng >Priority: Major > > Steps to reproduce: > 0. Startup a broker with `log.retention.check.interval.ms=1000` to speed up > the test. > 1. Creating a topic with the config: segment.ms=7days , retention.ms=1sec . > 2. Send a record "aaa" to the topic > 3. Wait for 1 second > Will this segment will rolled? I thought no. 
> But what I have tested is it will roll: > {code:java} > [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, > dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. > (kafka.log.LocalLog) > [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote > producer snapshot at offset 1 with 1 producer ids in 1 ms. > (org.apache.kafka.storage.internals.log.ProducerStateManager) > [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, > dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, > lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to > log retention time 1000ms breach based on the largest record timestamp in the > segment (kafka.log.UnifiedLog) > {code} > The segment is rolled due to log retention time 1000ms breached, which is > unexpected. > Tested in v3.5.1, it has the same issue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
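The two options in Divij Vaidya's comment can be made concrete with a small model. This is only a sketch, not Kafka's implementation: the `Segment` class, the function names, and the `option` switch are hypothetical, and the retention test (now minus the largest record timestamp exceeds `retention.ms`) is an assumption inferred from the deletion log message quoted in the issue. The 1 GB segment size comes from the comment's example, and the 1.8 s gap is illustrative.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Segment:
    created_ms: int            # hypothetical: when the active segment was created
    largest_timestamp_ms: int  # largest record timestamp in the segment
    size_bytes: int


def roll_due(seg, now_ms, segment_ms, segment_bytes):
    """True when the roll config alone (segment.ms / segment.bytes) would close the segment."""
    return now_ms - seg.created_ms >= segment_ms or seg.size_bytes >= segment_bytes


def retention_breached(seg, now_ms, retention_ms):
    """True when the newest record in the segment is older than retention.ms."""
    return now_ms - seg.largest_timestamp_ms > retention_ms


def active_segment_expires(seg, now_ms, segment_ms, segment_bytes, retention_ms, option):
    """Decide whether the active segment is rolled and deleted under each option."""
    breached = retention_breached(seg, now_ms, retention_ms)
    if option == 1:
        # Option 1: expiration is ignored until the roll config closes the segment.
        return breached and roll_due(seg, now_ms, segment_ms, segment_bytes)
    if option == 2:
        # Option 2: expiration overrides the roll config.
        return breached
    raise ValueError("option must be 1 or 2")


# Values from the issue: segment.ms=7 days, retention.ms=1 s, one 71-byte segment.
seg = Segment(created_ms=1710832992125, largest_timestamp_ms=1710832992125, size_bytes=71)
now_ms = seg.largest_timestamp_ms + 1800  # illustrative: any gap above 1000 ms behaves the same
config = dict(segment_ms=7 * DAY_MS, segment_bytes=1024 ** 3, retention_ms=1000)

print(active_segment_expires(seg, now_ms, option=1, **config))  # False
print(active_segment_expires(seg, now_ms, option=2, **config))  # True
```

In this model, option 2 reproduces the reported behaviour: the segment expires even though neither segment.ms nor segment.bytes has been reached, whereas option 1 would keep the record until the segment rolls on its own.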
[jira] [Commented] (KAFKA-16385) Segment is rolled before segment.ms or segment.bytes breached
[ https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828227#comment-17828227 ] Kuan Po Tseng commented on KAFKA-16385: --- I'm willing to take this over! Many thanks! > Segment is rolled before segment.ms or segment.bytes breached > - > > Key: KAFKA-16385 > URL: https://issues.apache.org/jira/browse/KAFKA-16385 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Luke Chen >Priority: Major > > Steps to reproduce: > 1. Creating a topic with the config: segment.ms=7days , retention.ms=1sec . > 2. Send a record "aaa" to the topic > 3. Wait for 1 second > Will this segment will rolled? I thought no. > But what I have tested is it will roll: > {code:java} > [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, > dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. > (kafka.log.LocalLog) > [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote > producer snapshot at offset 1 with 1 producer ids in 1 ms. > (org.apache.kafka.storage.internals.log.ProducerStateManager) > [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, > dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, > lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to > log retention time 1000ms breach based on the largest record timestamp in the > segment (kafka.log.UnifiedLog) > {code} > The segment is rolled due to log retention time 1000ms breached, which is > unexpected. -- This message was sent by Atlassian Jira (v8.20.10#820010)