Hey Ewen,

Thanks for the review. As Radai explained, using the committed offset to
decide data deletion would complicate user configuration: we would need a
way to specify which groups must consume a partition's data before it can
be deleted. In that approach the broker would also need to consume the
entire offsets topic, which adds some overhead. I don't think it is that
hard to implement, but it would likely take longer to discuss because of
the new config and the server-side overhead.

We chose to put this API in AdminClient because it is more like an
administrative operation (such as listGroups, deleteTopics) than a consumer
operation. It is not necessarily called only by consumers. For example, we
can implement the "delete data before committed offset" approach by running
an external service which calls the purgeDataBefore() API based on the
committed offsets of consumer groups.
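
To make the external-service idea concrete, here is a rough sketch of the
offset calculation such a service might do. Everything in it is
hypothetical: the PurgeClient interface only stands in for the proposed
purgeDataBefore() call (whose final signature is not settled), partitions
are plain strings such as "topic-0", and the committed offsets are assumed
to have been fetched already. The rule it applies is an assumption of mine:
purge a partition only up to the smallest offset committed by all of the
groups that must consume it, and skip partitions where some group has not
committed yet.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommittedOffsetPurger {

    /** Hypothetical stand-in for the proposed AdminClient call; not a real Kafka interface. */
    interface PurgeClient {
        void purgeDataBefore(Map<String, Long> offsetsByPartition);
    }

    /**
     * For each partition, returns the smallest committed offset across all groups.
     * A partition is skipped if any group has no committed offset for it,
     * since purging it could delete data that group has not read yet.
     */
    static Map<String, Long> safePurgeOffsets(List<Map<String, Long>> committedByGroup) {
        Map<String, Long> result = new HashMap<>();
        if (committedByGroup.isEmpty()) {
            return result;
        }
        // A partition missing from the first group is missing from "all groups",
        // so iterating over the first group's partitions is sufficient.
        for (Map.Entry<String, Long> entry : committedByGroup.get(0).entrySet()) {
            String partition = entry.getKey();
            long min = entry.getValue();
            boolean committedByAll = true;
            for (Map<String, Long> group : committedByGroup) {
                Long offset = group.get(partition);
                if (offset == null) {
                    committedByAll = false;
                    break;
                }
                min = Math.min(min, offset);
            }
            if (committedByAll) {
                result.put(partition, min);
            }
        }
        return result;
    }

    /** One pass of the service: compute the safe offsets and hand them to the client. */
    static Map<String, Long> runOnce(List<Map<String, Long>> committedByGroup, PurgeClient client) {
        Map<String, Long> offsets = safePurgeOffsets(committedByGroup);
        if (!offsets.isEmpty()) {
            client.purgeDataBefore(offsets);
        }
        return offsets;
    }
}
```

The service would call runOnce() periodically with freshly fetched offsets.
Which groups to include is exactly the configuration question mentioned
above; here it is simply whatever list the caller passes in.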

I was not aware that AdminClient is not a public API. Even if it is not
public now, I assume we plan to make it public in the future as part of
KIP-4. Are we not making it public because its interface is not stable? If
so, can we just tag this new API as not stable in the code?

Thanks,
Dong

On Tue, Jan 3, 2017 at 3:56 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Dong,
>
> Looks like that's an internal link,
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+purgeDataBefore%28%29+API+in+AdminClient
> is the right one.
>
> I have a question about one of the rejected alternatives:
>
> > Using committed offset instead of an extra API to trigger data purge
> > operation.
>
> The KIP says this would be more complicated to implement. Why is that? I
> think brokers would have to consume the entire offsets topic, but the data
> stored in memory doesn't seem to change and applying this when updated
> offsets are seen seems basically the same. It might also be possible to
> make it work even with multiple consumer groups if that was desired
> (although that'd require tracking more data in memory) as a generalization
> without requiring coordination between the consumer groups. Given the
> motivation, I'm assuming this was considered unnecessary since this
> specifically targets intermediate stream processing topics.
>
> Another question is why expose this via AdminClient (which isn't public API
> afaik)? Why not, for example, expose it on the Consumer, which is
> presumably where you'd want access to it since the functionality depends on
> the consumer actually having consumed the data?
>
> -Ewen
>
> On Tue, Jan 3, 2017 at 2:45 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hi all,
> >
> > We created KIP-107 to propose addition of purgeDataBefore() API in
> > AdminClient.
> >
> > Please find the KIP wiki in the link
> > https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Kafka+purgeDataBefore%28%29+API+design+proposal.
> > We would love to hear your comments and suggestions.
> >
> > Thanks,
> > Dong
> >
>
