Hi,

The only concrete example I can think of is limiting disk usage. Say I had something like Connect running that was tracking changes in a database. Downstream I don't really care about every change, I just want the latest values, so compaction could be enabled. However, the Kafka cluster has limited disk space, so we need to limit the size of each partition. In a previous life I have done the same, just without compaction turned on.
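To make that use case concrete, here is a toy sketch (Python, made-up record sizes) of the combined semantics being discussed: size-based retention first drops the oldest data once a partition exceeds its byte budget, and compaction then keeps only the latest value per key in what remains. This is purely illustrative; the broker actually works on whole log segments via the log cleaner, not individual records.

```python
# Toy model of combining size-based retention with compaction.
# Records are (key, value, size_in_bytes) tuples, oldest first.

def apply_retention(records, max_bytes):
    """Drop the oldest records until the total size fits the byte budget."""
    total = sum(size for _, _, size in records)
    kept = list(records)
    while kept and total > max_bytes:
        _, _, size = kept.pop(0)  # evict the oldest record first
        total -= size
    return kept

def compact(records):
    """Keep only the last occurrence of each key, preserving order."""
    last_index = {key: i for i, (key, _, _) in enumerate(records)}
    return [r for i, r in enumerate(records) if last_index[r[0]] == i]

log = [("a", 1, 10), ("b", 1, 10), ("a", 2, 10), ("c", 1, 10), ("b", 2, 10)]

# Compaction alone keeps the latest value per key, but old keys never
# expire, so the partition can still grow without bound:
print(compact(log))  # [('a', 2, 10), ('c', 1, 10), ('b', 2, 10)]

# Retention (20-byte budget) plus compaction bounds disk usage first,
# then deduplicates. Note key "a" expires entirely once its records age
# out of the budget - exactly the trade-off of delete + compact:
print(compact(apply_retention(log, 20)))  # [('c', 1, 10), ('b', 2, 10)]
```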
Besides, I don't think it costs us anything in terms of added complexity to enable it for both time- and size-based retention - the code already does this for us.

Thanks,
Damian

On Fri, 12 Aug 2016 at 05:30 Neha Narkhede <n...@confluent.io> wrote:

> Jun,
>
> The motivation for this KIP is to handle joins and windows in Kafka
> Streams better, and since Streams supports time-based windows, the KIP
> suggests combining time-based deletion and compaction.
>
> It might make sense to do the same for size-based windows, but can you
> think of a concrete use case? If not, perhaps we can come back to it.
>
> On Thu, Aug 11, 2016 at 3:08 PM Jun Rao <j...@confluent.io> wrote:
>
>> Hi, Damian,
>>
>> Thanks for the proposal. It makes sense to use time-based deletion
>> retention and compaction together, as you mentioned in the KStream.
>>
>> Is there a use case where we want to combine size-based deletion
>> retention and compaction together?
>>
>> Jun
>>
>> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy <damian....@gmail.com> wrote:
>>
>> > Hi Jason,
>> >
>> > Thanks for your input - appreciated.
>> >
>> > > 1. Would it make sense to use this KIP in the consumer coordinator
>> > > to expire offsets based on the topic's retention time? Currently, we
>> > > have a periodic task which scans the full cache to check which
>> > > offsets can be expired, but we might be able to get rid of this if
>> > > we had a callback to update the cache when a segment was deleted.
>> > > Technically offsets can be given their own expiration time, but it
>> > > seems questionable whether we need this going forward (the new
>> > > consumer doesn't even expose it at the moment).
>> >
>> > The KIP in its current form isn't adding a callback, so you'd still
>> > need to scan the cache and remove any expired offsets; however, you
>> > wouldn't send the tombstone messages.
>> > Having a callback sounds useful, though it isn't clear to me how you
>> > would know which offsets to remove from the cache on segment
>> > deletion. I will look into it.
>> >
>> > > 2. This KIP could also be useful for expiration in the case of a
>> > > cache maintained on the client, but I don't see an obvious way that
>> > > we'd be able to leverage it, since there's no indication to the
>> > > client when a segment has been deleted (unless they reload the
>> > > cache from the beginning of the log). One approach I can think of
>> > > would be to write corresponding tombstones as necessary when a
>> > > segment is removed, but that seems pretty heavy. Have you
>> > > considered this problem?
>> >
>> > We've not considered this, and I'm not sure we want to as part of
>> > this KIP.
>> >
>> > Thanks,
>> > Damian
>> >
>> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy <damian....@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > We have created KIP-71: Enable log compaction and deletion to
>> > > > co-exist
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist
>> > > >
>> > > > Please take a look. Feedback is appreciated.
>> > > >
>> > > > Thank you
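For reference, enabling both policies on a topic might look like the following, assuming the comma-separated `cleanup.policy` value the KIP proposes is adopted. The topic name, partition counts, and retention values here are purely illustrative:

```shell
# Hypothetical sketch: create a compacted topic that is also bounded by
# size and time, per KIP-71's proposed cleanup.policy=compact,delete.
kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic db-change-log \
  --partitions 4 --replication-factor 3 \
  --config cleanup.policy=compact,delete \
  --config retention.bytes=1073741824 \
  --config retention.ms=604800000
```

With this, each partition keeps at most the latest value per key, while retention caps disk usage at roughly 1 GB and seven days per partition.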