[ https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596037#comment-15596037 ]

Jun Rao commented on KAFKA-4099:
--------------------------------

I had two use cases of time-based rolling in mind. The first one is for users 
who don't want to retain a message (say, sensitive data) in the log for too 
long. In this case, we want to be able to roll the log periodically based on 
time, so that the rolled segment's largest timestamp is frozen and the segment 
is deleted once the retention time limit has been reached. The second one is 
to let the log cleaner run sooner, since the cleaner never cleans the active 
segment. In both cases, we really just want to be able to roll the log at some 
predictable time interval. Different implementations can achieve this.

The issue with the current implementation is that if data whose timestamps 
oscillate (e.g., alternating old and current timestamps) are published at the 
same time, it causes the log to roll too quickly, which will surprise people. 
We can ask people to turn off time-based log rolling in most cases. However, 
the default roll interval is 7 days, and people could hit this issue before 
realizing it. In some rare cases, people may indeed want to configure 
time-based log rolling and may still send data with oscillating timestamps. It 
would be good if the underlying system could support this without any 
performance impact.
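To make the oscillating-timestamp problem concrete, here is a simplified, 
hypothetical model of rolling based on message timestamps; it is a sketch, not 
Kafka's LogSegment code. It assumes the roll check compares the largest 
timestamp in an incoming batch against the first timestamp recorded in the 
active segment, and that the first timestamp of the batch that triggers a roll 
becomes the new segment's first timestamp. The wall-clock fallback for messages 
without timestamps is omitted. All names (Segment, should_roll, append_all, 
ROLL_MS) are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DAY_MS = 24 * 3600 * 1000
ROLL_MS = 7 * DAY_MS  # the 7-day default roll interval


@dataclass
class Segment:
    first_ts: Optional[int] = None  # timestamp of the first message in the segment
    batches: List[List[int]] = field(default_factory=list)


def should_roll(seg: Segment, batch_max_ts: int, roll_ms: int) -> bool:
    """Hypothetical rule: roll when the incoming batch's largest timestamp is
    more than roll_ms past the segment's first timestamp."""
    if seg.first_ts is None:
        return False  # empty segment: nothing to roll
    return batch_max_ts - seg.first_ts > roll_ms


def append_all(batches: List[List[int]], roll_ms: int = ROLL_MS) -> List[Segment]:
    """Append batches of message timestamps; wall-clock time plays no part."""
    segments = [Segment()]
    for batch in batches:
        active = segments[-1]
        if should_roll(active, max(batch), roll_ms):
            active = Segment()
            segments.append(active)
        if active.first_ts is None:
            active.first_ts = batch[0]  # first message of the batch
        active.batches.append(batch)
    return segments


now = 100 * DAY_MS
old = now - 30 * DAY_MS
# Batches published together, each mixing an old and a current timestamp.
print(len(append_all([[old, now]] * 5)))  # 5: every batch after the first rolls
print(len(append_all([[now, now]] * 5)))  # 1: consistent timestamps, no rolls
```

In this model, every batch that mixes a 30-day-old timestamp with a current one 
starts a new segment, even though all batches arrive at the same moment, while 
batches with consistent timestamps stay in a single segment.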

As for a better implementation, the original approach of rolling based only on 
segment create time addresses both use cases in the common case, without the 
risk of rolling too frequently. The only catch is that the create time is reset 
when segments get moved (e.g., during partition relocation). However, that 
happens rarely. So, if we can't think of a better solution, this could be a 
safer implementation.
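For comparison, here is a sketch of the create-time approach, again a 
hypothetical model rather than Kafka's implementation: the roll decision looks 
only at how long ago, by wall clock, the active segment was created, so message 
timestamps (oscillating or not) cannot force a roll. CreateTimeLog, Seg, and 
their fields are illustrative names.

```python
from dataclasses import dataclass
from typing import List

DAY_MS = 24 * 3600 * 1000
ROLL_MS = 7 * DAY_MS  # the 7-day default roll interval


@dataclass
class Seg:
    created_ms: int  # wall-clock time when the segment was created
    count: int = 0   # number of messages appended


class CreateTimeLog:
    """Hypothetical model: roll purely on segment create time (wall clock)."""

    def __init__(self, now_ms: int, roll_ms: int = ROLL_MS) -> None:
        self.roll_ms = roll_ms
        self.segments: List[Seg] = [Seg(created_ms=now_ms)]

    def append(self, msg_ts: int, now_ms: int) -> None:
        # msg_ts is deliberately ignored: message timestamps never trigger a roll.
        active = self.segments[-1]
        if active.count > 0 and now_ms - active.created_ms > self.roll_ms:
            active = Seg(created_ms=now_ms)
            self.segments.append(active)
        active.count += 1


log = CreateTimeLog(now_ms=0)
for day in range(10):
    # One message per day, alternating a 30-day-old and a current timestamp.
    ts = (day - 30) * DAY_MS if day % 2 else day * DAY_MS
    log.append(msg_ts=ts, now_ms=day * DAY_MS)
print([s.count for s in log.segments])  # [8, 2]: one roll after more than 7 days
```

Here the roll happens on day 8, the first append more than 7 days after the 
segment was created, regardless of the timestamps. In this model, moving a 
segment would amount to setting created_ms to the current time, which is the 
reset mentioned above.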

> Change the time based log rolling to only based on the message timestamp.
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-4099
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4099
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamps and cause the 
> log segment to roll for each message. The fix is to change the log rolling 
> behavior to be based only on the message timestamp when the messages are in 
> message format 0.10.0 or above. If the first message in the segment does not 
> have a timestamp, we will fall back to using the wall-clock time for log rolling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
