Thanks for the proposal. I'm +1 overall with a thought somewhat related to
Jun's comments.

While there may not yet be a sensible use case for it, it should be (in
theory) legal to have compact_and_delete with size based retention as well.
I'm just wondering if it makes sense to allow specifying multiple
comma-separated policies "compact,delete" as opposed to
"compact_and_delete" or "x_and_y_and_z" or "y_and_z" if we ever come up
with more policies. The order could potentially indicate precedence.
Anyway, it is just a thought - it may end up being very confusing for users.

@Jason - I agree this could be used to handle offset expiration as well. We
can discuss that separately though; and if we do that we would want to also
deprecate the retention field in the commit requests.

Joel

On Mon, Aug 15, 2016 at 2:07 AM, Damian Guy <damian....@gmail.com> wrote:

> Thanks Jason.
> The log retention.ms will be set to a value that greater than the window
> retention time. So as windows expire, they eventually get cleaned up by the
> broker. It doesn't matter if old windows are around for sometime beyond
> their usefulness, more that they do eventually get removed and the log
> doesn't grow indefinitely (as it does now).
>
> Damian
>
> On Fri, 12 Aug 2016 at 20:25 Jason Gustafson <ja...@confluent.io> wrote:
>
> > Hey Damian,
> >
> > That's true, but it would avoid the need to write tombstones for the
> > expired offsets I guess. I'm actually not sure it's a great idea anyway.
> > One thing we've talked about is not expiring any offsets as long as a
> group
> > is alive, which would require some custom expiration logic. There's also
> > the fact that we'd risk expiring group metadata which is stored in the
> same
> > log. Having a builtin expiration mechanism may be more useful for the
> > compacted topics we maintain in Connect, but I think there too we might
> > need some custom logic. For example, expiring connector configs purely
> > based on time doesn't seem like what we'd want.
> >
> > By the way, I wonder if you could describe the expected usage in a little
> > more detail in the KIP for those of us who are not as familiar with Kafka
> > Streams. Is the intention to have the log retain only the most recent
> > window? In that case, would you always set the log retention time to the
> > window length? And I suppose a consumer would do a seek to the start of
> the
> > window (as soon as KIP-33 is available) and consume from there in order
> to
> > read the current state?
> >
> > Thanks,
> > Jason
> >
> > On Fri, Aug 12, 2016 at 8:48 AM, Damian Guy <damian....@gmail.com>
> wrote:
> >
> > > Thanks Jun
> > >
> > > On Fri, 12 Aug 2016 at 16:41 Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, Damian,
> > > >
> > > > I was just wondering if we should disable size-based retention in the
> > > > compact_and_delete mode. So, it sounds like that there could be a use
> > > case
> > > > for that. Since by default, the size-based retention is infinite, I
> > think
> > > > we can just leave the KIP as it is.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Aug 12, 2016 at 12:10 AM, Damian Guy <damian....@gmail.com>
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The only concrete example i can think of is a case for limiting
> disk
> > > > usage.
> > > > > Say, i had something like Connect running that was tracking changes
> > in
> > > a
> > > > > database. Downstream i don't really care about every change, i just
> > > want
> > > > > the latest values, so compaction could be enabled. However, the
> kafka
> > > > > cluster has limited disk space so we need to limit the size of each
> > > > > partition.
> > > > > In a previous life i have done the same, just without compaction
> > turned
> > > > on.
> > > > >
> > > > > Besides, i don't think it costs us anything in terms of added
> > > complexity
> > > > to
> > > > > enable it for time & size based retention - the code already does
> > this
> > > > for
> > > > > us.
> > > > >
> > > > > Thanks,
> > > > > Damian
> > > > >
> > > > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede <n...@confluent.io>
> > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > The motivation for this KIP is to handle joins and windows in
> Kafka
> > > > > > streams better and since Streams supports time-based windows, the
> > KIP
> > > > > > suggests combining time-based deletion and compaction.
> > > > > >
> > > > > > It might make sense to do the same for size-based windows, but
> can
> > > you
> > > > > > think of a concrete use case? If not, perhaps we can come back to
> > it.
> > > > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao <j...@confluent.io>
> wrote:
> > > > > >
> > > > > >> Hi, Damian,
> > > > > >>
> > > > > >> Thanks for the proposal. It makes sense to use time-based
> deletion
> > > > > >> retention and compaction together, as you mentioned in the
> > KStream.
> > > > > >>
> > > > > >> Is there a use case where we want to combine size-based deletion
> > > > > retention
> > > > > >> and compaction together?
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy <
> damian....@gmail.com
> > >
> > > > > wrote:
> > > > > >>
> > > > > >> > Hi Jason,
> > > > > >> >
> > > > > >> > Thanks for your input - appreciated.
> > > > > >> >
> > > > > >> > 1. Would it make sense to use this KIP in the consumer
> > coordinator
> > > > to
> > > > > >> > > expire offsets based on the topic's retention time?
> Currently,
> > > we
> > > > > >> have a
> > > > > >> > > periodic task which scans the full cache to check which
> > offsets
> > > > can
> > > > > be
> > > > > >> > > expired, but we might be able to get rid of this if we had a
> > > > > callback
> > > > > >> to
> > > > > >> > > update the cache when a segment was deleted. Technically
> > offsets
> > > > can
> > > > > >> be
> > > > > >> > > given their own expiration time, but it seems questionable
> > > whether
> > > > > we
> > > > > >> > need
> > > > > >> > > this going forward (the new consumer doesn't even expose it
> at
> > > the
> > > > > >> > moment).
> > > > > >> > >
> > > > > >> >
> > > > > >> > The KIP in its current form isn't adding a callback. So you'd
> > > still
> > > > > >> need to
> > > > > >> > scan the cache and remove any expired offsets, however you
> > > wouldn't
> > > > > send
> > > > > >> > the tombstone messages.
> > > > > >> > Having a callback sounds useful, though it isn't clear to me
> how
> > > you
> > > > > >> would
> > > > > >> > know which offsets to remove from the cache on segment
> > deletion? I
> > > > > will
> > > > > >> > look into it.
> > > > > >> >
> > > > > >> >
> > > > > >> > > 2. This KIP could also be useful for expiration in the case
> > of a
> > > > > cache
> > > > > >> > > maintained on the client, but I don't see an obvious way
> that
> > > we'd
> > > > > be
> > > > > >> > able
> > > > > >> > > to leverage it since there's no indication to the client
> when
> > a
> > > > > >> segment
> > > > > >> > has
> > > > > >> > > been deleted (unless they reload the cache from the
> beginning
> > of
> > > > the
> > > > > >> > log).
> > > > > >> > > One approach I can think of would be to write corresponding
> > > > > >> tombstones as
> > > > > >> > > necessary when a segment is removed, but that seems pretty
> > > heavy.
> > > > > Have
> > > > > >> > you
> > > > > >> > > considered this problem?
> > > > > >> > >
> > > > > >> > >
> > > > > >> > We've not considered this and I'm not sure we want to as part
> of
> > > > this
> > > > > >> KIP.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Damian
> > > > > >> >
> > > > > >> >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy <
> > > damian....@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > > > Hi,
> > > > > >> > > >
> > > > > >> > > > We have created KIP 71: Enable log compaction and deletion
> > to
> > > > > >> co-exist`
> > > > > >> > > >
> > > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > >> > > > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > > > > >> > > >
> > > > > >> > > > Please take a look. Feedback is appreciated.
> > > > > >> > > >
> > > > > >> > > > Thank you
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to