On Tue, Jan 3, 2017 at 6:14 PM, Dong Lin <lindon...@gmail.com> wrote:

> Hey Ewen,
>
> Thanks for the review. As Radai explained, it would be complex in terms of
> user configuration if we were to use committed offset to decide data
> deletion. We need a way to specify which groups need to consume data of
> this partition. The broker will also need to consume the entire offsets
> topic in that approach which has some overhead. I don't think it is that
> hard to implement. But it will likely take more time to discuss that
> approach due to the new config and the server side overhead.
>
> We chose to put this API in AdminClient because the API is more like an
> administrative operation (such as listGroups, deleteTopics) than a consumer
> operation. It is not necessarily called by consumers only. For example, we
> can implement the "delete data before committed offset" approach by running
> an external service which calls the purgeDataBefore() API based on the
> committed offsets of consumer groups.
>
> I was not aware that AdminClient is not a public API. Even if it is not
> public now, I assume we plan to make it public in the future as part of
> KIP-4. Are we not making it public because its interface is not stable? If
> so, can we just tag this new API as not stable in the code?
>


The AdminClient planned for KIP-4 is a new Java-based implementation. It's
definitely confusing that both will be (could be?) named AdminClient, but
we've kept the existing Scala AdminClient out of the public API and have
not required KIPs for changes to it.

That said, I agree that if this type of API makes it into Kafka, having a
(new, Java-based) AdminClient method would definitely be a good idea. An
alternative path might be to have a Consumer-based implementation since
that seems like a very intuitive, natural way to use the protocol. I think
optimizing for the expected use case would be a good idea.
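
To make the external-service idea quoted above concrete, here is a minimal
sketch, not part of the KIP. It assumes the service already has the committed
offsets of each group in memory and is told which groups must have consumed a
partition before its data may be purged. The names safe_purge_offsets,
purge_consumed_data and the purge_data_before callable are hypothetical
stand-ins; the real purgeDataBefore() signature may differ.

```python
from typing import Callable, Dict, Iterable, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition id)
Offsets = Dict[TopicPartition, int]


def safe_purge_offsets(
    committed: Dict[str, Offsets],
    required_groups: Iterable[str],
) -> Offsets:
    """Return, per partition, the lowest offset committed by any required group.

    A partition is omitted if some required group has no committed offset
    for it, since nothing can then be purged safely.
    """
    groups = list(required_groups)
    if not groups:
        return {}

    # Collect every partition that at least one required group has committed.
    partitions = set()
    for group in groups:
        partitions.update(committed.get(group, {}))

    result: Offsets = {}
    for tp in partitions:
        offsets = [committed.get(group, {}).get(tp) for group in groups]
        if all(offset is not None for offset in offsets):
            result[tp] = min(offsets)
    return result


def purge_consumed_data(
    committed: Dict[str, Offsets],
    required_groups: Iterable[str],
    purge_data_before: Callable[[Offsets], None],
) -> Offsets:
    """Compute safe offsets and hand them to the (hypothetical) purge call."""
    targets = safe_purge_offsets(committed, required_groups)
    if targets:
        purge_data_before(targets)
    return targets
```

The explicit required_groups argument is the piece of user configuration the
thread mentions: something has to say which groups gate deletion.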

-Ewen


>
> Thanks,
> Dong
>
> On Tue, Jan 3, 2017 at 3:56 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > Dong,
> >
> > Looks like that's an internal link,
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+purgeDataBefore%28%29+API+in+AdminClient
> > is the right one.
> >
> > I have a question about one of the rejected alternatives:
> >
> > > Using committed offset instead of an extra API to trigger data purge
> > > operation.
> >
> > The KIP says this would be more complicated to implement. Why is that? I
> > think brokers would have to consume the entire offsets topic, but the data
> > stored in memory doesn't seem to change and applying this when updated
> > offsets are seen seems basically the same. It might also be possible to
> > make it work even with multiple consumer groups if that was desired
> > (although that'd require tracking more data in memory) as a generalization
> > without requiring coordination between the consumer groups. Given the
> > motivation, I'm assuming this was considered unnecessary since this
> > specifically targets intermediate stream processing topics.
> >
> > Another question is why expose this via AdminClient (which isn't public API
> > afaik)? Why not, for example, expose it on the Consumer, which is
> > presumably where you'd want access to it since the functionality depends on
> > the consumer actually having consumed the data?
> >
> > -Ewen
> >
> > On Tue, Jan 3, 2017 at 2:45 PM, Dong Lin <lindon...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > We created KIP-107 to propose adding a purgeDataBefore() API to
> > > AdminClient.
> > >
> > > Please find the KIP wiki in the link
> > > https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Kafka+purgeDataBefore%28%29+API+design+proposal.
> > > We would love to hear your comments and suggestions.
> > >
> > > Thanks,
> > > Dong
> > >
> >
>
