To resolve the metrics conflicts, we could alternatively have Kafka replace "." with a double underscore "__", if those conflicts are the primary reason for restricting topic names.
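
A minimal sketch of the idea, assuming metric names are derived from topic names by plain string replacement. The object and function names are hypothetical and are not Kafka's actual metrics code. It compares the single-underscore mapping described in the thread with the double-underscore one, using the two topic names from Gwen's example:

```scala
// Sketch only: hypothetical helpers, not Kafka's real metric-naming code.
object MetricNameSketch {
  // Mapping described in the thread: "." in a topic name becomes "_".
  def singleUnderscore(topic: String): String = topic.replace(".", "_")

  // Alternative suggested here: "." becomes "__".
  def doubleUnderscore(topic: String): String = topic.replace(".", "__")

  def main(args: Array[String]): Unit = {
    val topics = Seq("kafka_lab_2", "kafka.lab.2")

    // Both topics collapse to "kafka_lab_2" under the single-underscore mapping.
    println(topics.map(singleUnderscore).distinct.size) // prints 1

    // With "__" the two topics keep distinct metric names.
    println(topics.map(doubleUnderscore).distinct.size) // prints 2

    // Caveat: a topic that already contains "__" can still collide.
    println(doubleUnderscore("a.b") == doubleUnderscore("a__b")) // prints true
  }
}
```

The last line shows the limit of this approach: the mapping is still not reversible in general, so it only narrows the set of conflicting names rather than eliminating it.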
Guozhang

On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote:

> On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
> > Yeah, I have an actual customer who ran into this. Unfortunately,
> > inconsistencies in the way things are named are pretty common - just
> > look at Kafka's many CLI options.
> >
> > I don't think that supporting both and pointing at the docs with "I
> > told you so" when our metrics break is a good solution.
> >
>
> I agree, especially since we don't *already* have something in the docs
> indicating this will be an issue. I was flippant about the situation
> because I *wish* there was more careful consideration + naming policy in
> place, but I realize that doesn't always happen in practice. I guess I need
> to take Compatibility Czar more seriously :)
>
> I think the obvious practical options are as follows:
>
> 1. Kill support for "_". Piss off the entire set of people who currently
> use "_" anywhere in topic names.
> 2. Kill support for ".". Piss off the entire set of people who currently
> use "." anywhere in topic names.
> 3. Tell people they need to be careful about this issue. Piss off the set
> of people who use both "_" and "." *and* happen to have conflicting topic
> names. They will have some pain when they discover the issue and have to
> figure out how to move one of those topics over to a non-conflicting name.
> I'm going to claim that this group must be an *extremely* small fraction of
> users, which doesn't make it better to allow things to break for them, but
> at least gives us an idea of the scale of impact.
>
> (One other alternative suggested earlier was encoding metric names to
> account for differences; given the metric renaming mess in the last
> release, I'm extremely hesitant to suggest anything of the sort...)
>
> None of the options are ideal, but to me, 3 seems like the least painful.
> Both for us, and for the vast majority of users.
> It seems to me that the number of users that would complain about (1) or (2)
> drastically outweighs (3).
>
> At this point, I don't think it's practical to keep switching the rules
> about which characters are allowed and which aren't because the previous
> attempts haven't been successful -- it seems the rules have changed
> multiple times, whether intentionally or accidentally, such that any more
> changes will cause problems. At this point, I think we just need to accept
> being liberal in accepting the range of topic names that have been
> permitted so far and make the best of the situation, even if it means only
> being able to warn people of conflicts.
>
> Here's another alternative: how about being liberal with topic name
> characters, but upon topic creation we convert the name to the metric name
> and fail if there's a conflict with another topic? This is relatively
> expensive (requires getting the metric name of all other topics), but it
> avoids the bad situation we're encountering here (conflicting metrics),
> avoids getting into a persistent conflict (we kill topic creation when we
> detect the issue rather than noticing it when the metrics conflict
> happens), and keeps the vast majority of existing users happy (both _ and .
> work in topic names as long as you don't create topics with conflicting
> metric names).
>
> There are definitely details to be worked out (auto topic creation?), but
> it seems like a more realistic solution than to start disallowing _ or . in
> topic names.
>
> -Ewen
>
> > On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
> > <e...@confluent.io> wrote:
> > > I figure you'll probably see complaints no matter what change you make.
> > > Gwen, given that you raised this, another important question might be how
> > > many people you see using *both*. I'm guessing this question came up
> > > because you actually saw a conflict?
> > > But I'd imagine (or at least hope)
> > > that most organizations are mostly consistent about naming topics -- they
> > > standardize on one or the other.
> > >
> > > Since there's no "right" way to name them, I'd just leave it supporting
> > > both and document the potential conflict in metrics. And if people use both
> > > naming schemes, they probably deserve to suffer for their inconsistency :)
> > >
> > > -Ewen
> > >
> > > On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> > >
> > >> I find dots more common in my customer base, so I will definitely feel
> > >> the pain of removing them.
> > >>
> > >> However, "." are already used in metrics, file names, directories, etc
> > >> - so if we keep the dots, we need to keep code that translates them
> > >> and document the translation. Just banning "." seems more natural.
> > >> Also, as Grant mentioned, we'll probably have our own special usage
> > >> for "." down the line.
> > >>
> > >> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >> > I absolutely disagree with #2, Neha. That will break a lot of
> > >> > infrastructure within LinkedIn. That said, removing "." might break other
> > >> > people as well, but I think we should have a clearer idea of how much usage
> > >> > there is on either side.
> > >> >
> > >> > -Todd
> > >> >
> > >> > On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io> wrote:
> > >> >
> > >> >> "." seems natural for grouping topic names. +1 for 2) going forward only
> > >> >> without breaking previously created topics with "_" though that might
> > >> >> require us to patch the code somewhat awkwardly till we phase it out a
> > >> >> couple (purposely left vague to stay out of Ewen's wrath :-)) versions
> > >> >> later.
> > >> >>
> > >> >> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com>
> > >> >> wrote:
> > >> >>
> > >> >> > I don't think we should break existing topics. Just disallow new
> > >> >> > topics going forward.
> > >> >> >
> > >> >> > Agree that having both is horrible, but we should have a solution that
> > >> >> > fails when you run "kafka_topics.sh --create", not when you configure
> > >> >> > Ganglia.
> > >> >> >
> > >> >> > Gwen
> > >> >> >
> > >> >> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io> wrote:
> > >> >> > > Unfortunately '.' is pretty common too. I agree that it is perverse,
> > >> >> > > but people seem to do it. Breaking all the topics with '.' in the name
> > >> >> > > seems like it could be worse than combining metrics for people who have a
> > >> >> > > 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse,
> > >> >> > > no?).
> > >> >> > >
> > >> >> > > Where is our Dean of Compatibility, Ewen, on this?
> > >> >> > >
> > >> >> > > -Jay
> > >> >> > >
> > >> >> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >> >> > >
> > >> >> > >> My selfish point of view is that we do #1, as we use "_" extensively in
> > >> >> > >> topic names here :) I also happen to think it's the right choice,
> > >> >> > >> specifically because "." has more special meanings, as you noted.
> > >> >> > >>
> > >> >> > >> -Todd
> > >> >> > >>
> > >> >> > >> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com>
> > >> >> > >> wrote:
> > >> >> > >>
> > >> >> > >> > Unintentional side effect from allowing IP addresses in consumer client
> > >> >> > >> > IDs :)
> > >> >> > >> >
> > >> >> > >> > So the question is, what do we do now?
> > >> >> > >> >
> > >> >> > >> > 1) disallow "."
> > >> >> > >> > 2) disallow "_"
> > >> >> > >> > 3) find a reversible way to encode "." and "_" that won't break existing
> > >> >> > >> > metrics
> > >> >> > >> > 4) all of the above?
> > >> >> > >> >
> > >> >> > >> > btw. it looks like "." and ".." are currently valid. Topic names are
> > >> >> > >> > used for directories, right? this sounds like fun :)
> > >> >> > >> >
> > >> >> > >> > I vote for option #1, although if someone has a good idea for #3 it
> > >> >> > >> > will be even better.
> > >> >> > >> >
> > >> >> > >> > Gwen
> > >> >> > >> >
> > >> >> > >> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com> wrote:
> > >> >> > >> > > Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
> > >> >> > >> > >
> > >> >> > >> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >> >> > >> > >
> > >> >> > >> > >> This was definitely changed at some point after KAFKA-495. The question is
> > >> >> > >> > >> when and why.
> > >> >> > >> > >>
> > >> >> > >> > >> Here's the relevant code from that patch:
> > >> >> > >> > >>
> > >> >> > >> > >> ===================================================================
> > >> >> > >> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
> > >> >> > >> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
> > >> >> > >> > >> @@ -21,24 +21,21 @@
> > >> >> > >> > >>  import util.matching.Regex
> > >> >> > >> > >>
> > >> >> > >> > >>  object Topic {
> > >> >> > >> > >> +  val legalChars = "[a-zA-Z0-9_-]"
> > >> >> > >> > >>
> > >> >> > >> > >> -Todd
> > >> >> > >> > >>
> > >> >> > >> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com> wrote:
> > >> >> > >> > >>
> > >> >> > >> > >> > kafka.common.Topic shows that currently period is a valid character and I
> > >> >> > >> > >> > have verified I can use kafka-topics.sh to create a new topic with a
> > >> >> > >> > >> > period.
> > >> >> > >> > >> >
> > >> >> > >> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
> > >> >> > >> > >> > Topic.validate before writing to Zookeeper.
> > >> >> > >> > >> >
> > >> >> > >> > >> > Should period character support be removed? I was under the same impression
> > >> >> > >> > >> > as Gwen, that a period was used by many as a way to "group" topics.
> > >> >> > >> > >> >
> > >> >> > >> > >> > The code is pasted below since it's small:
> > >> >> > >> > >> >
> > >> >> > >> > >> > object Topic {
> > >> >> > >> > >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
> > >> >> > >> > >> >   private val maxNameLength = 255
> > >> >> > >> > >> >   private val rgx = new Regex(legalChars + "+")
> > >> >> > >> > >> >
> > >> >> > >> > >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
> > >> >> > >> > >> >
> > >> >> > >> > >> >   def validate(topic: String) {
> > >> >> > >> > >> >     if (topic.length <= 0)
> > >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
> > >> >> > >> > >> >     else if (topic.equals(".") || topic.equals(".."))
> > >> >> > >> > >> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
> > >> >> > >> > >> >     else if (topic.length > maxNameLength)
> > >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
> > >> >> > >> > >> >
> > >> >> > >> > >> >     rgx.findFirstIn(topic) match {
> > >> >> > >> > >> >       case Some(t) =>
> > >> >> > >> > >> >         if (!t.equals(topic))
> > >> >> > >> > >> >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > >> >> > >> > >> >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > >> >> > >> > >> >     }
> > >> >> > >> > >> >   }
> > >> >> > >> > >> > }
> > >> >> > >> > >> >
> > >> >> > >> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com> wrote:
> > >> >> > >> > >> >
> > >> >> > >> > >> > > I had to go look this one up again to make sure -
> > >> >> > >> > >> > > https://issues.apache.org/jira/browse/KAFKA-495
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > The only valid characters for topic names are alphanumeric, underscore, and
> > >> >> > >> > >> > > dash. A period is not supposed to be a valid character to use. If you're
> > >> >> > >> > >> > > seeing them, then one of two things has happened:
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > 1) You have topic names that are grandfathered in from before that patch
> > >> >> > >> > >> > > 2) The patch is not working properly and there is somewhere in the broker
> > >> >> > >> > >> > > that the standard is not being enforced.
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > -Todd
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org> wrote:
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com>
> > >> >> > >> > >> > > > wrote:
> > >> >> > >> > >> > > > > Hi Kafka Fans,
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > If you have one topic named "kafka_lab_2" and the other named
> > >> >> > >> > >> > > > > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
> > >> >> > >> > >> > > > > both, effectively making it impossible to monitor them properly.
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > The reason this happens is that using "." in topic names is pretty
> > >> >> > >> > >> > > > > common, especially as a way to group topics into data centers,
> > >> >> > >> > >> > > > > relevant apps, etc - basically a work-around to our current lack of
> > >> >> > >> > >> > > > > name spaces. However, most metric monitoring systems use "." to
> > >> >> > >> > >> > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
> > >> >> > >> > >> > > > > replaces the "." in the name with an underscore.
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > This generates good metric names, but creates the problem with name
> > >> >> > >> > >> > > > > collisions.
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > I'm wondering if it makes sense to simply limit the range of
> > >> >> > >> > >> > > > > characters permitted in a topic name and disallow "_"? Obviously
> > >> >> > >> > >> > > > > existing topics will need to remain as is, which is a bit awkward.
> > >> >> > >> > >> > > >
> > >> >> > >> > >> > > > Interesting problem! Many if not most users I personally am aware of
> > >> >> > >> > >> > > > use "_" as a separator in topic names. I am sure that many users would
> > >> >> > >> > >> > > > be quite surprised by this limitation. With that said, I am sure
> > >> >> > >> > >> > > > they'd transition accordingly.
> > >> >> > >> > >> > > >
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > If anyone has better backward-compatible solutions to this, I'm all ears
> > >> >> > >> > >> > > > > :)
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > Gwen
> > >> >> > >> > >> > > >
> > >> >> > >> > >> >
> > >> >> > >> > >> > --
> > >> >> > >> > >> > Grant Henke
> > >> >> > >> > >> > Solutions Consultant | Cloudera
> > >> >> > >> > >> > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >> >> > >> > >
> > >> >> > >> > > --
> > >> >> > >> > > Grant Henke
> > >> >> > >> > > Solutions Consultant | Cloudera
> > >> >> > >> > > ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >> >>
> > >> >> --
> > >> >> Thanks,
> > >> >> Neha
> > >
> > > --
> > > Thanks,
> > > Ewen
>
> --
> Thanks,
> Ewen

--
-- Guozhang
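
P.S. Ewen's alternative quoted above (convert the name to its metric name at topic creation and fail on a conflict) could look roughly like the sketch below. This assumes the "." to "_" mapping described in the thread and takes the existing topic names as a plain set; the object and function names are hypothetical and this is not Kafka's admin API.

```scala
// Sketch only: hypothetical names, not Kafka's AdminUtils or metrics code.
object TopicCreationCheckSketch {
  // Metric-name mapping described in the thread: "." becomes "_".
  def metricName(topic: String): String = topic.replace(".", "_")

  // Left(reason) if a differently named existing topic already maps to the
  // same metric name; Right(topic) if creation may proceed.
  def checkCreate(existing: Set[String], topic: String): Either[String, String] =
    existing.find(t => t != topic && metricName(t) == metricName(topic)) match {
      case Some(other) =>
        Left(s"topic '$topic' collides with '$other' on metric name '${metricName(topic)}'")
      case None =>
        Right(topic)
    }

  def main(args: Array[String]): Unit = {
    val existing = Set("kafka_lab_2", "orders")
    println(checkCreate(existing, "kafka.lab.2")) // rejected: same metric name as kafka_lab_2
    println(checkCreate(existing, "orders.eu"))   // accepted: no other topic maps to orders_eu
  }
}
```

The scan over all existing topics is the cost Ewen mentions, and auto topic creation would need to go through the same check.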