[ 
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15447253#comment-15447253
 ] 

Jiangjie Qin commented on KAFKA-4099:
-------------------------------------

[~junrao] I have been thinking about this solution, and it still does not seem 
ideal. For some low-volume topics, if we roll the log based on the segment 
create time, then during partition relocation we may keep sensitive data for 
much longer than we wanted to, because all the data may end up in the same 
segment and the old data cannot be deleted while it shares a segment with the 
new data.
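
To make the retention concern concrete, here is a minimal sketch. It assumes a 
simplified model in which a segment is only ever deleted as a whole, and only 
once its newest message is older than the retention period. The function name 
and the numbers are hypothetical and are not taken from the Kafka code base.

```python
DAY_MS = 24 * 60 * 60 * 1000


def segment_deletable(timestamps_ms, now_ms, retention_ms):
    """Assumed model: a segment is deleted as a whole, and only once its
    newest message is older than the retention period."""
    return now_ms - max(timestamps_ms) > retention_ms


now = 100 * DAY_MS
retention = 7 * DAY_MS

# Messages copied during relocation: some are already past retention.
old = [now - 10 * DAY_MS, now - 9 * DAY_MS]
new = [now - 1 * DAY_MS]

# Create-time rolling on a low-volume topic: everything lands in one segment,
# so the expired messages are pinned by the fresh one.
mixed_segment_deletable = segment_deletable(old + new, now, retention)

# If the old messages had their own segment, it could be deleted right away.
old_only_deletable = segment_deletable(old, now, retention)
```

Under these assumptions the mixed segment is not deletable while the old-only 
segment is, which is the situation described above.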

It seems the root cause of the unnecessary log rolling is that we are comparing 
the timestamp in the message with the wall clock time, which makes log rolling 
wall clock sensitive. I am thinking maybe we should always use the timestamp in 
the message, i.e. we roll the log segment if the timestamp in the current 
message is greater than the timestamp of the first message in the segment by 
more than log.roll.ms. This approach is wall clock independent and should solve 
the problem. With the message.timestamp.difference.max.ms configuration, we can 
ensure that 1) the log segment will be rolled in a bounded time, and 2) no 
excessively large timestamp will be accepted and cause frequent log rolling.
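
The difference between the two roll checks can be sketched roughly as follows. 
This is an illustration only, not the broker implementation: the function names 
are hypothetical, roll_ms stands in for log.roll.ms, and a segment is modelled 
as nothing more than the timestamp of its first message.

```python
def should_roll_wall_clock(first_ts_ms, now_ms, roll_ms):
    # Check described as the root cause: first message timestamp vs. wall clock.
    return first_ts_ms is not None and now_ms - first_ts_ms > roll_ms


def should_roll_message_time(first_ts_ms, msg_ts_ms, roll_ms):
    # Proposed check: first message timestamp vs. current message timestamp.
    return first_ts_ms is not None and msg_ts_ms - first_ts_ms > roll_ms


def count_segments(timestamps_ms, roll_ms, now_ms=None):
    """Append messages in order and return how many segments get created.

    If now_ms is given, the wall clock check is used; otherwise the
    message timestamp check is used.
    """
    segments = 1
    first_ts = None  # timestamp of the first message in the active segment
    for ts in timestamps_ms:
        if now_ms is not None:
            roll = should_roll_wall_clock(first_ts, now_ms, roll_ms)
        else:
            roll = should_roll_message_time(first_ts, ts, roll_ms)
        if roll:
            segments += 1
            first_ts = None  # start a new, empty segment
        if first_ts is None:
            first_ts = ts
    return segments


# A relocated replica receives old messages long after they were produced.
old_messages = [0, 100, 200, 300, 1500]
roll_ms = 1000

wall_clock_segments = count_segments(old_messages, roll_ms, now_ms=1_000_000)
message_time_segments = count_segments(old_messages, roll_ms)
```

In this toy run the wall clock check opens a new segment for every message (5 
segments), while the message timestamp check rolls only once the timestamps 
span more than roll_ms (2 segments).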

What do you think?

> Change the time based log rolling to base on the file create time instead of 
> timestamp of the first message.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4099
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4099
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamps, which causes 
> the log segment to roll for each message. The fix is to change the log rolling 
> behavior back to being based on segment create time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
