The update looks good to me. Thanks Damian. Guozhang
On Fri, Aug 19, 2016 at 11:05 AM, Damian Guy <damian....@gmail.com> wrote: > Thanks Jun. > > Everyone - i've updated the KIP to use a comma-separated list of > cleanup.policies as suggested. I know we have already voted on this > proposal, so are there any objections to this change? > > Thanks, > Damian > > On Fri, 19 Aug 2016 at 18:38 Jun Rao <j...@confluent.io> wrote: > > > Damian, > > > > Yes, using comma-separated policies does seem more extensible for the > > future. If we want to adopt it, it's better to do it as part of this KIP. > > Perhaps you can just update the KIP and ask this thread to see if there > is > > any objections with the change. > > > > Thanks, > > > > Jun > > > > On Fri, Aug 19, 2016 at 10:01 AM, Damian Guy <damian....@gmail.com> > wrote: > > > > > Hi Grant, > > > > > > I apologise - I seemed to have skipped over Joel's email. > > > It is not something we considered, but seems valid. > > > I'm not sure if we should do it as part of this KIP or revisit it > if/when > > > we have more cleanup policies? > > > > > > Thanks, > > > Damian > > > > > > On Fri, 19 Aug 2016 at 15:57 Grant Henke <ghe...@cloudera.com> wrote: > > > > > > > Thanks for this KIP Damien. > > > > > > > > I know this vote has passed and I think its is okay as is, but I had > > > > similar thoughts as Joel about combining compaction policies. > > > > > > > > I'm just wondering if it makes sense to allow specifying multiple > > > > > comma-separated policies "compact,delete" as opposed to > > > > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever > come > > up > > > > > with more policies. The order could potentially indicate > precedence. > > > > > > > > > > > > > Out of curiosity was the option of supporting a list of policies > > > rejected? > > > > Is it something we may consider adding later but didn't want to > include > > > in > > > > the scope of this KIP? > > > > > > > > Thanks, > > > > Grant > > > > > > > > > > > > > > > > On Mon, Aug 15, 2016 at 7:25 PM, Joel Koshy <jjkosh...@gmail.com> > > wrote: > > > > > > > > > Thanks for the proposal. I'm +1 overall with a thought somewhat > > related > > > > to > > > > > Jun's comments. > > > > > > > > > > While there may not yet be a sensible use case for it, it should be > > (in > > > > > theory) legal to have compact_and_delete with size based retention > as > > > > well. > > > > > I'm just wondering if it makes sense to allow specifying multiple > > > > > comma-separated policies "compact,delete" as opposed to > > > > > "compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever > come > > up > > > > > with more policies. The order could potentially indicate > precedence. > > > > > Anyway, it is just a thought - it may end up being very confusing > for > > > > > users. > > > > > > > > > > @Jason - I agree this could be used to handle offset expiration as > > > well. > > > > We > > > > > can discuss that separately though; and if we do that we would want > > to > > > > also > > > > > deprecate the retention field in the commit requests. > > > > > > > > > > Joel > > > > > > > > > > On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy <damian....@gmail.com> > > > > wrote: > > > > > > > > > > > Thanks Jason. > > > > > > The log retention.ms will be set to a value that greater than > the > > > > window > > > > > > retention time. So as windows expire, they eventually get cleaned > > up > > > by > > > > > the > > > > > > broker. It doesn't matter if old windows are around for sometime > > > beyond > > > > > > their usefulness, more that they do eventually get removed and > the > > > log > > > > > > doesn't grow indefinitely (as it does now). > > > > > > > > > > > > Damian > > > > > > > > > > > > On Fri, 12 Aug 2016 at 20:25 Jason Gustafson <ja...@confluent.io > > > > > > wrote: > > > > > > > > > > > > > Hey Damian, > > > > > > > > > > > > > > That's true, but it would avoid the need to write tombstones > for > > > the > > > > > > > expired offsets I guess. I'm actually not sure it's a great > idea > > > > > anyway. > > > > > > > One thing we've talked about is not expiring any offsets as > long > > > as a > > > > > > group > > > > > > > is alive, which would require some custom expiration logic. > > There's > > > > > also > > > > > > > the fact that we'd risk expiring group metadata which is stored > > in > > > > the > > > > > > same > > > > > > > log. Having a builtin expiration mechanism may be more useful > for > > > the > > > > > > > compacted topics we maintain in Connect, but I think there too > we > > > > might > > > > > > > need some custom logic. For example, expiring connector configs > > > > purely > > > > > > > based on time doesn't seem like what we'd want. > > > > > > > > > > > > > > By the way, I wonder if you could describe the expected usage > in > > a > > > > > little > > > > > > > more detail in the KIP for those of us who are not as familiar > > with > > > > > Kafka > > > > > > > Streams. Is the intention to have the log retain only the most > > > recent > > > > > > > window? In that case, would you always set the log retention > time > > > to > > > > > the > > > > > > > window length? And I suppose a consumer would do a seek to the > > > start > > > > of > > > > > > the > > > > > > > window (as soon as KIP-33 is available) and consume from there > in > > > > order > > > > > > to > > > > > > > read the current state? > > > > > > > > > > > > > > Thanks, > > > > > > > Jason > > > > > > > > > > > > > > On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy < > > damian....@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > > Thanks Jun > > > > > > > > > > > > > > > > On Fri, 12 Aug 2016 at 16:41 Jun Rao <j...@confluent.io> > wrote: > > > > > > > > > > > > > > > > > Hi, Damian, > > > > > > > > > > > > > > > > > > I was just wondering if we should disable size-based > > retention > > > in > > > > > the > > > > > > > > > compact_and_delete mode. So, it sounds like that there > could > > > be a > > > > > use > > > > > > > > case > > > > > > > > > for that. Since by default, the size-based retention is > > > > infinite, I > > > > > > > think > > > > > > > > > we can just leave the KIP as it is. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy < > > > > damian....@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > The only concrete example i can think of is a case for > > > limiting > > > > > > disk > > > > > > > > > usage. > > > > > > > > > > Say, i had something like Connect running that was > tracking > > > > > changes > > > > > > > in > > > > > > > > a > > > > > > > > > > database. Downstream i don't really care about every > > change, > > > i > > > > > just > > > > > > > > want > > > > > > > > > > the latest values, so compaction could be enabled. > However, > > > the > > > > > > kafka > > > > > > > > > > cluster has limited disk space so we need to limit the > size > > > of > > > > > each > > > > > > > > > > partition. > > > > > > > > > > In a previous life i have done the same, just without > > > > compaction > > > > > > > turned > > > > > > > > > on. > > > > > > > > > > > > > > > > > > > > Besides, i don't think it costs us anything in terms of > > added > > > > > > > > complexity > > > > > > > > > to > > > > > > > > > > enable it for time & size based retention - the code > > already > > > > does > > > > > > > this > > > > > > > > > for > > > > > > > > > > us. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Damian > > > > > > > > > > > > > > > > > > > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede < > > > n...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Jun, > > > > > > > > > > > > > > > > > > > > > > The motivation for this KIP is to handle joins and > > windows > > > in > > > > > > Kafka > > > > > > > > > > > streams better and since Streams supports time-based > > > windows, > > > > > the > > > > > > > KIP > > > > > > > > > > > suggests combining time-based deletion and compaction. > > > > > > > > > > > > > > > > > > > > > > It might make sense to do the same for size-based > > windows, > > > > but > > > > > > can > > > > > > > > you > > > > > > > > > > > think of a concrete use case? If not, perhaps we can > come > > > > back > > > > > to > > > > > > > it. > > > > > > > > > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao < > > j...@confluent.io> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > >> Hi, Damian, > > > > > > > > > > >> > > > > > > > > > > >> Thanks for the proposal. It makes sense to use > > time-based > > > > > > deletion > > > > > > > > > > >> retention and compaction together, as you mentioned in > > the > > > > > > > KStream. > > > > > > > > > > >> > > > > > > > > > > >> Is there a use case where we want to combine > size-based > > > > > deletion > > > > > > > > > > retention > > > > > > > > > > >> and compaction together? > > > > > > > > > > >> > > > > > > > > > > >> Jun > > > > > > > > > > >> > > > > > > > > > > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy < > > > > > > damian....@gmail.com > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > >> > > > > > > > > > > >> > Hi Jason, > > > > > > > > > > >> > > > > > > > > > > > >> > Thanks for your input - appreciated. > > > > > > > > > > >> > > > > > > > > > > > >> > 1. Would it make sense to use this KIP in the > consumer > > > > > > > coordinator > > > > > > > > > to > > > > > > > > > > >> > > expire offsets based on the topic's retention > time? > > > > > > Currently, > > > > > > > > we > > > > > > > > > > >> have a > > > > > > > > > > >> > > periodic task which scans the full cache to check > > > which > > > > > > > offsets > > > > > > > > > can > > > > > > > > > > be > > > > > > > > > > >> > > expired, but we might be able to get rid of this > if > > we > > > > > had a > > > > > > > > > > callback > > > > > > > > > > >> to > > > > > > > > > > >> > > update the cache when a segment was deleted. > > > Technically > > > > > > > offsets > > > > > > > > > can > > > > > > > > > > >> be > > > > > > > > > > >> > > given their own expiration time, but it seems > > > > questionable > > > > > > > > whether > > > > > > > > > > we > > > > > > > > > > >> > need > > > > > > > > > > >> > > this going forward (the new consumer doesn't even > > > expose > > > > > it > > > > > > at > > > > > > > > the > > > > > > > > > > >> > moment). > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > The KIP in its current form isn't adding a callback. > > So > > > > > you'd > > > > > > > > still > > > > > > > > > > >> need to > > > > > > > > > > >> > scan the cache and remove any expired offsets, > however > > > you > > > > > > > > wouldn't > > > > > > > > > > send > > > > > > > > > > >> > the tombstone messages. > > > > > > > > > > >> > Having a callback sounds useful, though it isn't > clear > > > to > > > > me > > > > > > how > > > > > > > > you > > > > > > > > > > >> would > > > > > > > > > > >> > know which offsets to remove from the cache on > segment > > > > > > > deletion? I > > > > > > > > > > will > > > > > > > > > > >> > look into it. > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > 2. This KIP could also be useful for expiration in > > the > > > > > case > > > > > > > of a > > > > > > > > > > cache > > > > > > > > > > >> > > maintained on the client, but I don't see an > obvious > > > way > > > > > > that > > > > > > > > we'd > > > > > > > > > > be > > > > > > > > > > >> > able > > > > > > > > > > >> > > to leverage it since there's no indication to the > > > client > > > > > > when > > > > > > > a > > > > > > > > > > >> segment > > > > > > > > > > >> > has > > > > > > > > > > >> > > been deleted (unless they reload the cache from > the > > > > > > beginning > > > > > > > of > > > > > > > > > the > > > > > > > > > > >> > log). > > > > > > > > > > >> > > One approach I can think of would be to write > > > > > corresponding > > > > > > > > > > >> tombstones as > > > > > > > > > > >> > > necessary when a segment is removed, but that > seems > > > > pretty > > > > > > > > heavy. > > > > > > > > > > Have > > > > > > > > > > >> > you > > > > > > > > > > >> > > considered this problem? > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > We've not considered this and I'm not sure we want > to > > as > > > > > part > > > > > > of > > > > > > > > > this > > > > > > > > > > >> KIP. > > > > > > > > > > >> > > > > > > > > > > > >> > Thanks, > > > > > > > > > > >> > Damian > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy < > > > > > > > > damian....@gmail.com > > > > > > > > > > > > > > > > > > > > >> > wrote: > > > > > > > > > > >> > > > > > > > > > > > > >> > > > Hi, > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > We have created KIP 71: Enable log compaction > and > > > > > deletion > > > > > > > to > > > > > > > > > > >> co-exist` > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > > > > > > >> > > > 71%3A+Enable+log+compaction+ > > > and+deletion+to+co-exist > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > Please take a look. Feedback is appreciated. > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > Thank you > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Grant Henke > > > > Software Engineer | Cloudera > > > > gr...@cloudera.com | twitter.com/gchenke | > linkedin.com/in/granthenke > > > > > > > > > > -- -- Guozhang