I'd like to revive this voting thread now that KIP-33 has merged, which
leaves this KIP completely unblocked.  I have a working branch off of trunk
with my proposed fix, including tests.
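
For anyone joining the thread late, here is a minimal sketch of the idea in Python. It is not the code in the branch, only an illustration under stated assumptions: the `Segment` type, its `largest_timestamp_ms` field, and the `deletable_segments` helper are hypothetical names, and the sketch assumes deletion only removes a prefix of the log and never touches the last (active) segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a log segment; not a real Kafka class."""
    base_offset: int
    largest_timestamp_ms: int  # newest record timestamp in the segment


def deletable_segments(segments: List[Segment], min_timestamp_ms: int) -> List[Segment]:
    """Return the oldest segments whose records are all older than the cutoff.

    Assumptions: `segments` is ordered oldest to newest, the last segment is
    the active one and is always kept, and deletion stops at the first
    segment that still holds a record at or after the cutoff.
    """
    deletable = []
    for segment in segments[:-1]:
        if segment.largest_timestamp_ms >= min_timestamp_ms:
            break
        deletable.append(segment)
    return deletable
```

With segments whose largest timestamps are 100, 200 and 300, a cutoff of 250 would select the first two segments and leave the third in place.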

On Mon, May 9, 2016 at 8:30 PM Guozhang Wang <wangg...@gmail.com> wrote:

> Jay, Bill:
>
> I'm thinking of one general use case for using timestamp rather than offset
> for log deletion: expiration handling in data replication. When the source
> data store decides to expire some data records based on their timestamps,
> today we need to configure the corresponding Kafka changelog topic for
> compaction and actively send a tombstone for each expired record. Since
> expiration usually happens for a batch of records at once, this can generate
> heavy tombstone traffic. For example, I think LI's data replication for
> Espresso is seeing similar issues, and they are just not sending tombstones
> at all.
>
> With a timestamp-based log deletion policy, this can be handled by simply
> setting the current expiration timestamp; ideally, though, one would
> configure this topic with both log compaction and log deletion enabled.
> From that point of view, I feel that the current KIP still has value and
> should be accepted.
>
> Guozhang
>
>
> On Mon, May 2, 2016 at 2:37 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
>
> > Yes, I'd agree that offset is a more precise configuration than timestamp.
> > If there was a way to set a partition-level configuration, I would rather
> > use log.retention.min.offset than timestamp.  If you have an approach in
> > mind I'd be open to investigating it.
> >
> > On Mon, May 2, 2016 at 5:33 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > Gotcha, good point. But barring that limitation, you agree that that
> > > makes more sense?
> > >
> > > -Jay
> > >
> > > On Mon, May 2, 2016 at 2:29 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > >
> > > > The problem with offset as a config option is that offsets are
> > > > partition-specific, so we'd need a per-partition config.  This would
> > > > work for our particular use case, where we have single-partition topics,
> > > > but for multiple-partition topics it would delete from all partitions
> > > > based on a global topic-level offset.
> > > >
> > > > On Mon, May 2, 2016 at 4:32 PM, Jay Kreps <j...@confluent.io> wrote:
> > > >
> > > > > I think you are saying you considered a kind of trim() api that would
> > > > > synchronously chop off the tail of the log starting from a given offset.
> > > > > That would be one option, but what I was saying was slightly different:
> > > > > in the proposal you have where there is a config that controls retention
> > > > > that the user would update, wouldn't it make more sense for this config
> > > > > to be based on offset rather than timestamp?
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Mon, May 2, 2016 at 12:53 PM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > >
> > > > > > 1.  Initially I looked at using the actual offset, by adding a call to
> > > > > > AdminUtils to just delete anything in a given topic/partition to a given
> > > > > > offset.  I ran into a lot of trouble here trying to work out how the
> > > > > > system would recognize that every broker had successfully deleted that
> > > > > > range from the partition before returning to the client.  If we were ok
> > > > > > treating this as a completely asynchronous operation I would be open to
> > > > > > revisiting this approach.
> > > > > >
> > > > > > 2.  For our use case, we would be updating the config every few hours
> > > > > > for a given topic, and there would not be a sizable number of
> > > > > > consumers.  I imagine that this would not scale well if someone was
> > > > > > adjusting this config very frequently on a large system, but I don't
> > > > > > know if there are any use cases where that would occur.  I imagine most
> > > > > > use cases would involve truncating the log after taking a snapshot or
> > > > > > doing some other expensive operation that didn't occur very frequently.
> > > > > >
> > > > > > On Mon, May 2, 2016 at 2:23 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > >
> > > > > > > Two comments:
> > > > > > >
> > > > > > >    1. Is there a reason to use physical time rather than offset? The
> > > > > > >    idea is for the consumer to say when it has consumed something so it
> > > > > > >    can be deleted, right? It seems like offset would be a much more
> > > > > > >    precise way to do this--i.e. the consumer says "I have checkpointed
> > > > > > >    state up to offset X you can get rid of anything prior to that".
> > > > > > >    Doing this by timestamp seems like it is just more error prone...
> > > > > > >    2. Is this mechanism practical to use at scale? It requires several
> > > > > > >    ZK writes per config change, so I guess that depends on how
> > > > > > >    frequently the consumers would update the value and how many
> > > > > > >    consumers there are...any thoughts on this?
> > > > > > >
> > > > > > > -Jay
> > > > > > >
> > > > > > > On Thu, Apr 28, 2016 at 8:28 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been
> > > > > > > > accepted and is in-progress.  I've updated the KIP (
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > > ).
> > > > > > > > I have a commit with the functionality for KIP-47 ready to go once
> > > > > > > > KIP-33 is complete; it's a fairly minor change.
> > > > > > > >
> > > > > > > > On Wed, Mar 9, 2016 at 8:42 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > > > > > >
> > > > > > > > > For convenience, the KIP is here:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> > > > > > > > >
> > > > > > > > > Do you mind updating the KIP with time formats we plan on supporting
> > > > > > > > > in the configuration?
> > > > > > > > >
> > > > > > > > > On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw <wdwars...@gmail.com> wrote:
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > I'd like to initiate the vote for KIP-47.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Bill Warshaw
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>
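
As a footnote to the offset-versus-timestamp discussion quoted above, the following rough Python sketch shows why a single topic-level offset behaves unevenly across partitions. Everything in it is illustrative: `records_deleted_by_offset` is a hypothetical helper, and it assumes every partition's log starts at offset 0 and that all records below the configured offset are removed.

```python
from typing import Dict


def records_deleted_by_offset(log_end_offsets: Dict[int, int], min_offset: int) -> Dict[int, int]:
    """Count records removed per partition for one topic-level offset cutoff.

    Assumptions: each log starts at offset 0, `log_end_offsets` maps a
    partition id to its next offset to be written, and every record with
    an offset below `min_offset` is deleted.
    """
    return {
        partition: min(end_offset, min_offset)
        for partition, end_offset in log_end_offsets.items()
    }
```

For a topic where partition 0 has 800 records and partition 1 has 5000, a cutoff of 1000 would empty partition 0 entirely while trimming only a fifth of partition 1. A timestamp cutoff does not have this problem, since timestamps are comparable across partitions.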
