bq. describe topics by a regular expression on the server side

Should caution be taken if the regex doesn't filter anything ("*")?
Cheers

On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe <cmcc...@apache.org> wrote:

> As Jason wrote, this won't scale as the number of partitions increases.
> We already have users who have tens of thousands of topics, or more. If
> you multiply that by 100x over the next few years, you end up with this API
> returning full information about millions of topics, which clearly doesn't
> work.
>
> We discussed this a lot in the original KIP-117 DISCUSS thread which added
> the Java AdminClient. ListTopics and DescribeTopics were deliberately kept
> separate because we understood that eventually a single RPC would not be
> able to return information about all the topics in the cluster. So I have
> to vote -1 for this proposal as it stands.
>
> I do agree that adding a way to describe topics by a regular expression on
> the server side would be very useful. This would also fix a major
> scalability problem we have now, which is that when subscribing via a
> regular expression, clients need to fetch the full list of all topics in
> the cluster and filter locally.
>
> I think a regular expression library like re2 would be ideal for this
> purpose. re2 is standardized and language-agnostic (it's not tied only to
> Java). In contrast, Java regular expressions change with different releases
> of the JDK (there were some changes in Java 8, for example). Also, re2
> regular expressions are linear time, never exponential time. See
> https://github.com/google/re2j
>
> regards,
> Colin
>
>
> On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote:
> > The KIP looks good to me.
> > However, if there is willingness in the community to work on metadata
> > request with patterns, the feature proposed here and filtering by '*' or
> > '.*' would be redundant.
> >
> > Andras
> >
> >
> >
> > On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson <ja...@confluent.io> wrote:
> >
> > > Hey Manikumar,
> > >
> > > As Kafka begins to scale to larger and larger numbers of
> > > topics/partitions, I'm a little concerned about the scalability of APIs
> > > such as this. The API looks benign, but imagine you have a few million
> > > partitions. We already expose similar APIs in the producer and consumer,
> > > so probably not much additional harm to expose it in the AdminClient,
> > > but it would be nice to put a little thought into some longer-term
> > > options. We should be giving users an efficient way to select a smaller
> > > set of the topics they are interested in. We have always discussed
> > > adding some filtering support to the Metadata API. Perhaps now is a good
> > > time to reconsider this? We now have a convention for wildcard ACLs, so
> > > perhaps we can do something similar. Full regex support might be ideal
> > > given the consumer's subscription API, but that is more challenging.
> > > What do you think?
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Thu, Jul 12, 2018 at 2:35 PM, Harsha <ka...@harsha.io> wrote:
> > >
> > > > Very useful. LGTM.
> > > >
> > > > Thanks,
> > > > Harsha
> > > >
> > > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > > > > Hi all,
> > > > >
> > > > > I have created a KIP to add describe all topics API to AdminClient.
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> > > > >
> > > > > Please take a look.
> > > > >
> > > > > Thanks,
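
A minimal sketch of the client-side filtering described in the thread, where the client already holds every topic name and applies the pattern locally. It uses java.util.regex rather than re2j to stay self-contained, and the topic names and the filterTopics helper are hypothetical; a real client would first fetch the names from the cluster, which is the step that stops scaling.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TopicFilterSketch {

    // Hypothetical helper: keeps only the topic names that fully match the regex.
    // The caller must already have the complete topic list in memory.
    static List<String> filterTopics(List<String> allTopics, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return allTopics.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up topic names standing in for the full cluster listing.
        List<String> allTopics = List.of("orders.eu", "orders.us", "payments", "audit-log");

        // A selective pattern returns a small subset.
        System.out.println(filterTopics(allTopics, "orders\\..*"));

        // A match-all pattern such as ".*" returns every topic, so it gives
        // no relief from the "describe everything" problem.
        System.out.println(filterTopics(allTopics, ".*"));
    }
}
```

With a match-all pattern the result is the whole list, which is why a server-side regex filter would presumably still need some guard against unselective patterns.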