Hey Damian,

That's true, but it would avoid the need to write tombstones for the
expired offsets, I guess. I'm actually not sure it's a great idea anyway.
One thing we've talked about is not expiring any offsets as long as a group
is alive, which would require some custom expiration logic. There's also
the fact that we'd risk expiring group metadata, which is stored in the same
log. Having a built-in expiration mechanism may be more useful for the
compacted topics we maintain in Connect, but I think even there we might
need some custom logic. For example, expiring connector configs purely
based on time doesn't seem like what we'd want.

By the way, I wonder if you could describe the expected usage in a little
more detail in the KIP for those of us who are not as familiar with Kafka
Streams. Is the intention to have the log retain only the most recent
window? In that case, would you always set the log retention time to the
window length? And I suppose a consumer would do a seek to the start of the
window (as soon as KIP-33 is available) and consume from there in order to
read the current state?
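
To make sure I'm picturing the access pattern correctly, here is a rough
sketch of the read path I have in mind. This is purely my assumption, not
something taken from the KIP: it assumes the topic has both compaction and
time-based deletion enabled (the behavior this KIP proposes) with the
retention time set to the window length, and it uses an offsetsForTimes-style
lookup as a stand-in for whatever timestamp-based seek KIP-33 ends up
enabling. The topic name, partition, and window length are placeholders.

  import java.util.Collections;
  import java.util.Map;
  import java.util.Properties;
  import java.util.concurrent.TimeUnit;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
  import org.apache.kafka.common.TopicPartition;

  public class WindowReader {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:9092");
          props.put("key.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              TopicPartition tp = new TopicPartition("windowed-changelog", 0);
              consumer.assign(Collections.singletonList(tp));

              // Assumption: the topic's retention time equals the window length,
              // so the log holds roughly the most recent window's worth of records.
              long windowLengthMs = TimeUnit.HOURS.toMillis(1);
              long windowStart = System.currentTimeMillis() - windowLengthMs;

              // Find the first offset at or after the window start, seek there,
              // and then consume to the end of the log to rebuild the current state.
              Map<TopicPartition, OffsetAndTimestamp> offsets =
                  consumer.offsetsForTimes(Collections.singletonMap(tp, windowStart));
              OffsetAndTimestamp start = offsets.get(tp);
              if (start != null) {
                  consumer.seek(tp, start.offset());
              } else {
                  // No record at or after the window start; fall back to the
                  // beginning of whatever is left in the log.
                  consumer.seekToBeginning(Collections.singletonList(tp));
              }

              // poll() loop omitted: read until caught up to the log end offset.
          }
      }
  }

Is something along those lines what you have in mind?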

Thanks,
Jason

On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy <damian....@gmail.com> wrote:

> Thanks Jun
>
> On Fri, 12 Aug 2016 at 16:41 Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Damian,
> >
> > I was just wondering if we should disable size-based retention in the
> > compact_and_delete mode. So, it sounds like that there could be a use case
> > for that. Since by default, the size-based retention is infinite, I think
> > we can just leave the KIP as it is.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy <damian....@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > The only concrete example i can think of is a case for limiting disk usage.
> > > Say, i had something like Connect running that was tracking changes in a
> > > database. Downstream i don't really care about every change, i just want
> > > the latest values, so compaction could be enabled. However, the kafka
> > > cluster has limited disk space so we need to limit the size of each
> > > partition.
> > > In a previous life i have done the same, just without compaction turned on.
> > >
> > > Besides, i don't think it costs us anything in terms of added complexity to
> > > enable it for time & size based retention - the code already does this for
> > > us.
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede <n...@confluent.io> wrote:
> > >
> > > > Jun,
> > > >
> > > > The motivation for this KIP is to handle joins and windows in Kafka
> > > > streams better and since Streams supports time-based windows, the KIP
> > > > suggests combining time-based deletion and compaction.
> > > >
> > > > It might make sense to do the same for size-based windows, but can you
> > > > think of a concrete use case? If not, perhaps we can come back to it.
> > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao <j...@confluent.io> wrote:
> > > >
> > > >> Hi, Damian,
> > > >>
> > > >> Thanks for the proposal. It makes sense to use time-based deletion
> > > >> retention and compaction together, as you mentioned in the KStream.
> > > >>
> > > >> Is there a use case where we want to combine size-based deletion
> > > >> retention and compaction together?
> > > >>
> > > >> Jun
> > > >>
> > > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy <damian....@gmail.com> wrote:
> > > >>
> > > >> > Hi Jason,
> > > >> >
> > > >> > Thanks for your input - appreciated.
> > > >> >
> > > >> > > 1. Would it make sense to use this KIP in the consumer coordinator to
> > > >> > > expire offsets based on the topic's retention time? Currently, we have
> > > >> > > a periodic task which scans the full cache to check which offsets can
> > > >> > > be expired, but we might be able to get rid of this if we had a callback
> > > >> > > to update the cache when a segment was deleted. Technically offsets can
> > > >> > > be given their own expiration time, but it seems questionable whether we
> > > >> > > need this going forward (the new consumer doesn't even expose it at the
> > > >> > > moment).
> > > >> > >
> > > >> >
> > > >> > The KIP in its current form isn't adding a callback. So you'd still need
> > > >> > to scan the cache and remove any expired offsets, however you wouldn't
> > > >> > send the tombstone messages.
> > > >> > Having a callback sounds useful, though it isn't clear to me how you would
> > > >> > know which offsets to remove from the cache on segment deletion? I will
> > > >> > look into it.
> > > >> >
> > > >> >
> > > >> > > 2. This KIP could also be useful for expiration in the case of a cache
> > > >> > > maintained on the client, but I don't see an obvious way that we'd be
> > > >> > > able to leverage it since there's no indication to the client when a
> > > >> > > segment has been deleted (unless they reload the cache from the
> > > >> > > beginning of the log). One approach I can think of would be to write
> > > >> > > corresponding tombstones as necessary when a segment is removed, but
> > > >> > > that seems pretty heavy. Have you considered this problem?
> > > >> > >
> > > >> > >
> > > >> > We've not considered this and I'm not sure we want to as part of this KIP.
> > > >> >
> > > >> > Thanks,
> > > >> > Damian
> > > >> >
> > > >> >
> > > >> > >
> > > >> > >
> > > >> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy <damian....@gmail.com> wrote:
> > > >> > >
> > > >> > > > Hi,
> > > >> > > >
> > > >> > > > We have created KIP 71: Enable log compaction and deletion to co-exist
> > > >> > > >
> > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > > >> > > >
> > > >> > > > Please take a look. Feedback is appreciated.
> > > >> > > >
> > > >> > > > Thank you
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
