[
https://issues.apache.org/jira/browse/KAFKA-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438911#comment-13438911
]
Neha Narkhede commented on KAFKA-475:
-------------------------------------
If you roll log segments based on retention time, it seems you can have only
one segment for that log at any point in time. If you want to roll 5-minute
segments, you can only keep 5 minutes' worth of data for that partition. In
contrast, if I choose size-based rolling and size-based retention, I can have
multiple log segments, each of a specific size. What seems desirable is for
time-based rolling + retention to behave the same way. I would imagine
applications wanting to roll segments every hour and retain 24 hours' worth
of data. This is an advantage for applications using getOffsetsBefore() to do
a time-indexed fetch of the data, since getOffsetsBefore() only returns
offsets at log segment granularity. It also gives applications a way to
reason about the time window of the data retained for a partition. One
potential downside is that you can end up creating a large number of log
segments for your partition if you choose too small a value for
log.file.time.ms. But this problem exists today with size-based log segment
rolling too, so we are not introducing any regression in behavior.
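
To make the proposal concrete, here is a rough toy model of a partition log
whose roll interval and retention window are configured separately. It is only
a sketch: TimeRolledLog, Segment, offsets_before and simulate are hypothetical
names, not Kafka classes or APIs, and segment creation time stands in for
whatever timestamp the broker would actually use for retention.

```python
from dataclasses import dataclass
from typing import List

HOUR_MS = 60 * 60 * 1000


@dataclass
class Segment:
    base_offset: int  # offset of the first message in the segment
    created_ms: int   # time at which the segment was rolled (assumption)


class TimeRolledLog:
    """Toy model of one partition's log; not Kafka's actual Log class."""

    def __init__(self, roll_interval_ms: int, retention_ms: int) -> None:
        self.roll_interval_ms = roll_interval_ms
        self.retention_ms = retention_ms
        self.segments: List[Segment] = []
        self.next_offset = 0

    def append(self, now_ms: int) -> None:
        # Roll when there is no segment yet or the active one is old enough.
        if (not self.segments
                or now_ms - self.segments[-1].created_ms >= self.roll_interval_ms):
            self.segments.append(Segment(self.next_offset, now_ms))
        self.next_offset += 1

    def cleanup(self, now_ms: int) -> None:
        # Retention is checked per segment, independently of the roll
        # interval. The active (last) segment is always kept.
        active = self.segments[-1:]
        kept = [s for s in self.segments[:-1]
                if now_ms - s.created_ms <= self.retention_ms]
        self.segments = kept + active

    def offsets_before(self, time_ms: int) -> List[int]:
        # Segment-granularity lookup: base offsets of retained segments
        # created at or before time_ms, newest first.
        return [s.base_offset for s in reversed(self.segments)
                if s.created_ms <= time_ms]


def simulate(hours: int = 48) -> TimeRolledLog:
    # One message every 10 minutes, hourly rolling, 24 hours of retention.
    log = TimeRolledLog(roll_interval_ms=HOUR_MS, retention_ms=24 * HOUR_MS)
    for minute in range(0, hours * 60, 10):
        now_ms = minute * 60 * 1000
        log.append(now_ms)
        log.cleanup(now_ms)
    return log
```

In this model, simulate() ends with 24 hourly segments rather than one, so a
time-indexed lookup can resolve to an hour boundary within the retained window.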
Other review comments:
1. Log
1.1 Rename currentMS to currentMs (follow the camel case convention).
1.2 How about renaming retentionMSInterval to retentionIntervalMs to be
consistent with the naming convention?
1.3 In maybeRoll, it looks like currentMS is only used to compute the time
difference. How about removing currentMS?
2. LogManager
2.1 This is unrelated to your patch, but let's also rename logRetentionMSMap
to logRetentionMsMap.
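
As an illustration of comment 1.3, one possible reading is that the roll check
can read the clock directly inside the comparison instead of keeping a
separate current-time variable. The sketch below is hypothetical and is not
taken from the patch: SketchLog, maybe_roll, segment_start_ms and the
injectable clock are all assumed names.

```python
import time
from typing import Callable


def now_ms() -> int:
    # Wall-clock time in milliseconds.
    return int(time.time() * 1000)


class SketchLog:
    """Hypothetical stand-in that only models the roll decision."""

    def __init__(self, roll_interval_ms: int,
                 clock: Callable[[], int] = now_ms) -> None:
        self.roll_interval_ms = roll_interval_ms
        self.clock = clock
        self.segment_start_ms = clock()
        self.rolls = 0

    def _roll(self) -> None:
        # Start a new segment and remember when it was started.
        self.segment_start_ms = self.clock()
        self.rolls += 1

    def maybe_roll(self) -> bool:
        # The time difference is computed inline; no current-time variable
        # is stored just for this comparison.
        if self.clock() - self.segment_start_ms > self.roll_interval_ms:
            self._roll()
            return True
        return False
```

Passing the clock in as a parameter is a choice made for this sketch so that
the roll decision can be exercised without waiting on real time.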
> Time based log segment rollout
> ------------------------------
>
> Key: KAFKA-475
> URL: https://issues.apache.org/jira/browse/KAFKA-475
> Project: Kafka
> Issue Type: New Feature
> Affects Versions: 0.7.1
> Reporter: Swapnil Ghike
> Assignee: Swapnil Ghike
> Labels: features
> Fix For: 0.7.2
>
> Attachments: kafka-475-v1.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Some applications might want their data to be deleted from the Kafka servers
> earlier than the default retention time.