Hi Becket, 1. I'm not entirely sure, log compaction by default only leaves the most recent key in. Are you proposing that we also could drop the unique key as well, depending on its timestamp? 2. Not sure if I'm qualified to answer that question. Would this be covered by existing broker logic that checks for a minimum difference between broker local time and the value of an incoming message's timestamp?
Thanks for getting KIP-32 in, it makes this KIP pretty simple to implement. On Fri, Feb 19, 2016 at 9:21 PM, Becket Qin <becket....@gmail.com> wrote: > Thanks for the KIP, Bill. I think it is a useful retention policy and does > not require big changes. > > I have a few questions: > 1. Does it have value to use this policy to log compaction as well? > Although by definition log compaction means always keeping the latest key, > but sometimes user might want to remove a super old key. > 2. If CreateTime is used, and a message whose timestamp is smaller > than *log.retention.min.timestamp > *is appended, should the server simply reject it or let it go and wait for > the current segment get deleted? Do we have any sanity check on > log.retention.min.timestamp against other existing configurations? > > As Jun pointed out, it seems work for CreateTime as well based on the > current implementation plan of time based log index. > > Thanks, > > Jiangjie (Becket) Qin > > > On Fri, Feb 19, 2016 at 4:02 PM, Jun Rao <j...@confluent.io> wrote: > > > Hi, Bill, > > > > I replied with the following comments earlier to the thread. Did you see > > that? > > > > Thanks for the proposal. A couple of comments. > > > > 1. It seems that this new policy should work for CreateTime as well. If a > > topic is configured with CreateTime, messages may not be added in strict > > order in the log. However, to build a time-based index, we will be > > maintaining the largest timestamp for all messages in a log segment. We > can > > delete a segment if its largest timestamp is less than > > log.retention.min.timestamp. This guarantees that no messages newer than > > log.retention.min.timestamp will be deleted, which is probably what the > > user wants. > > > > 2. Right now, the user can specify "delete" as the retention policy and a > > log segment will be deleted either when the size of a partition exceeds a > > threshold or the timestamp of a segment is older than a relative period > of > > time (say 7 days) from now. What you are proposing is not a new retention > > policy, but an additional check that will cause a segment to be deleted > > when the timestamp of a segment is older than an absolute timestamp? If > so, > > could you update the wiki accordingly? > > > > Thanks, > > > > Jun > > > > On Fri, Feb 19, 2016 at 2:57 PM, Bill Warshaw <wdwars...@gmail.com> > wrote: > > > > > Hello all, > > > > > > What is the next step with this proposal? The work for KIP-32 that it > > was > > > based off merged earlier today ( > https://github.com/apache/kafka/pull/764 > > , > > > thank you Becket). I have an implementation with tests, and I've > > confirmed > > > that it actually works in a live system. Is there more discussion that > > > needs to be had about this KIP, or should I start a VOTE thread? > > > > > > > > > > > > On Tue, Feb 16, 2016 at 5:06 PM, Jun Rao <j...@confluent.io> wrote: > > > > > > > Bill, > > > > > > > > Thanks for the proposal. A couple of comments. > > > > > > > > 1. It seems that this new policy should work for CreateTime as well. > > If a > > > > topic is configured with CreateTime, messages may not be added in > > strict > > > > order in the log. However, to build a time-based index, we will be > > > > maintaining the largest timestamp for all messages in a log segment. > We > > > can > > > > delete a segment if its largest timestamp is less than > > > > log.retention.min.timestamp. This guarantees that no messages newer > > than > > > > log.retention.min.timestamp will be deleted, which is probably what > the > > > > user wants. > > > > > > > > 2. Right now, the user can specify "delete" as the retention policy > > and a > > > > log segment will be deleted either when the size of a partition > > exceeds a > > > > threshold or the timestamp of a segment is older than a relative > period > > > of > > > > time (say 7 days) from now. What you are proposing is not a new > > retention > > > > policy, but an additional check that will cause a segment to be > deleted > > > > when the timestamp of a segment is older than an absolute timestamp? > If > > > so, > > > > could you update the wiki accordingly? > > > > > > > > Jun > > > > > > > > > > > > > > > > On Sat, Feb 13, 2016 at 3:23 PM, Bill Warshaw <wdwars...@gmail.com> > > > wrote: > > > > > > > > > Hello, > > > > > > > > > > That is a good catch, thanks for pointing it out. If this KIP is > > > > accepted, > > > > > we'd need to document this and make the log cleaner not run > > > > timestamp-based > > > > > deletion unless message.timestamp.type=LogAppendTime. > > > > > > > > > > On Sat, Feb 13, 2016 at 5:38 AM, Andrew Schofield < > > > > > andrew_schofield_j...@outlook.com> wrote: > > > > > > > > > > > This KIP is related to KIP-32, but I strikes me that it only > makes > > > > sense > > > > > > with one of the two proposed message timestamp types. If I > > understand > > > > > > correctly, message timestamps are only certain to be > monotonically > > > > > > increasing in the log if message.timestamp.type=LogAppendTime. > > > > > > > > > > > > > > > > > > > > > > > > Does timestamp-based auto-expiration require use of > > > > > > message.timestamp.type=LogAppendTime? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think this KIP is a good idea, but I think it relies on strict > > > > ordering > > > > > > of timestamps to be workable. > > > > > > > > > > > > > > > > > > > > > > > > Andrew Schofield > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Date: Fri, 12 Feb 2016 10:38:46 -0800 > > > > > > > Subject: Re: [DISCUSS] KIP-47 - Add timestamp-based log > deletion > > > > policy > > > > > > > From: n...@confluent.io > > > > > > > To: dev@kafka.apache.org > > > > > > > > > > > > > > Adding a timestamp based auto-expiration is useful and this > > > proposal > > > > > > makes > > > > > > > sense. Thx! > > > > > > > > > > > > > > On Wed, Feb 10, 2016 at 3:35 PM, Jay Kreps wrote: > > > > > > > > > > > > > >> I think this makes a lot of sense and won't be hard to > implement > > > and > > > > > > >> doesn't create too much in the way of new interfaces. > > > > > > >> > > > > > > >> -Jay > > > > > > >> > > > > > > >> On Tue, Feb 9, 2016 at 8:13 AM, Bill Warshaw wrote: > > > > > > >> > > > > > > >>> Hello, > > > > > > >>> > > > > > > >>> I just submitted KIP-47 for adding a new log deletion policy > > > based > > > > > on a > > > > > > >>> minimum timestamp of messages to retain. > > > > > > >>> > > > > > > >>> > > > > > > >>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy > > > > > > >>> > > > > > > >>> I'm open to any comments or suggestions. > > > > > > >>> > > > > > > >>> Thanks, > > > > > > >>> Bill Warshaw > > > > > > >>> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Thanks, > > > > > > > Neha > > > > > > > > > > > > > > > > > > > > >