To resolve the metrics conflicts, we could alternatively have Kafka
replace "." with a double underscore "__", if that is the primary reason for
the topic name restrictions.
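
A minimal sketch of the difference, assuming the metric name is derived by a
plain string replacement. The object and method names here are hypothetical,
for illustration only, and are not taken from the Kafka code base. It also
includes a small helper for the check-at-creation idea quoted below.

```scala
object MetricNameSketch {
  // Behaviour described in the thread: "." becomes "_" in the metric name.
  def singleUnderscore(topic: String): String = topic.replace(".", "_")

  // Alternative suggested above: "." becomes "__" instead.
  def doubleUnderscore(topic: String): String = topic.replace(".", "__")

  // Returns the existing topics whose metric name equals the candidate's
  // metric name under the given mapping (hypothetical helper).
  def conflicts(candidate: String,
                existing: Set[String],
                toMetric: String => String): Set[String] = {
    val target = toMetric(candidate)
    existing.filter(t => t != candidate && toMetric(t) == target)
  }
}
```

With this mapping "kafka.lab.2" and "kafka_lab_2" no longer share a metric
name, but "a.b" and "a__b" still would, so it narrows the conflict rather
than removing it.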

Guozhang

On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
> > Yeah, I have an actual customer who ran into this. Unfortunately,
> > inconsistencies in the way things are named are pretty common - just
> > look at Kafka's many CLI options.
> >
> > I don't think that supporting both and pointing at the docs with "I
> > told you so" when our metrics break is a good solution.
> >
>
> I agree, especially since we don't *already* have something in the docs
> indicating this will be an issue. I was flippant about the situation
> because I *wish* there was more careful consideration + naming policy in
> place, but I realize that doesn't always happen in practice. I guess I need
> to take Compatibility Czar more seriously :)
>
> I think the obvious practical options are as follows:
>
> 1. Kill support for "_". Piss off the entire set of people who currently
> use "_" anywhere in topic names.
> 2. Kill support for ".". Piss off the entire set of people who currently
> use "." anywhere in topic names.
> 3. Tell people they need to be careful about this issue. Piss off the set
> of people who use both "_" and "." *and* happen to have conflicting topic
> names. They will have some pain when they discover the issue and have to
> figure out how to move one of those topics over to a non-conflicting name.
> I'm going to claim that this group must be an *extremely* small fraction of
> users, which doesn't make it better to allow things to break for them, but
> at least gives us an idea of the scale of impact.
>
> (One other alternative suggested earlier was encoding metric names to
> account for differences; given the metric renaming mess in the last
> release, I'm extremely hesitant to suggest anything of the sort...)
>
> None of the options are ideal, but to me, 3 seems like the least painful,
> both for us and for the vast majority of users. It seems to me that the
> number of users that would complain about (1) or (2) drastically outweighs
> the number that would complain about (3).
>
> At this point, I don't think it's practical to keep switching the rules
> about which characters are allowed and which aren't because the previous
> attempts haven't been successful -- it seems the rules have changed
> multiple times, whether intentionally or accidentally, such that any more
> changes will cause problems. At this point, I think we just need to accept
> being liberal in accepting the range of topic names that have been
> permitted so far and make the best of the situation, even if it means only
> being able to warn people of conflicts.
>
> Here's another alternative: how about being liberal with topic name
> characters, but upon topic creation we convert the name to the metric name
> and fail if there's a conflict with another topic? This is relatively
> expensive (requires getting the metric name of all other topics), but it
> avoids the bad situation we're encountering here (conflicting metrics),
> avoids getting into a persistent conflict (we kill topic creation when we
> detect the issue rather than noticing it when the metrics conflict
> happens), and keeps the vast majority of existing users happy (both _ and .
> work in topic names as long as you don't create topics with conflicting
> metric names).
>
> There are definitely details to be worked out (auto topic creation?), but
> it seems like a more realistic solution than to start disallowing _ or . in
> topic names.
>
> -Ewen
>
>
> >
> > On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
> > <e...@confluent.io> wrote:
> > > I figure you'll probably see complaints no matter what change you make.
> > > Gwen, given that you raised this, another important question might be how
> > > many people you see using *both*. I'm guessing this question came up
> > > because you actually saw a conflict? But I'd imagine (or at least hope)
> > > that most organizations are mostly consistent about naming topics -- they
> > > standardize on one or the other.
> > >
> > > Since there's no "right" way to name them, I'd just leave it supporting
> > > both and document the potential conflict in metrics. And if people use both
> > > naming schemes, they probably deserve to suffer for their inconsistency :)
> > >
> > > -Ewen
> > >
> > > On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com>
> > wrote:
> > >
> > >> I find dots more common in my customer base, so I will definitely feel
> > >> the pain of removing them.
> > >>
> > >> However, "." are already used in metrics, file names, directories, etc
> > >> - so if we keep the dots, we need to keep code that translates them
> > >> and document the translation. Just banning "." seems more natural.
> > >> Also, as Grant mentioned, we'll probably have our own special usage
> > >> for "." down the line.
> > >>
> > >> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com>
> wrote:
> > >> > I absolutely disagree with #2, Neha. That will break a lot of
> > >> > infrastructure within LinkedIn. That said, removing "." might break other
> > >> > people as well, but I think we should have a clearer idea of how much usage
> > >> > there is on either side.
> > >> >
> > >> > -Todd
> > >> >
> > >> >
> > >> > On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io>
> > >> wrote:
> > >> >
> > >> >> "." seems natural for grouping topic names. +1 for 2) going forward only
> > >> >> without breaking previously created topics with "_" though that might
> > >> >> require us to patch the code somewhat awkwardly till we phase it out a
> > >> >> couple (purposely left vague to stay out of Ewen's wrath :-)) versions
> > >> >> later.
> > >> >>
> > >> >> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <
> gshap...@cloudera.com
> > >
> > >> >> wrote:
> > >> >>
> > >> >> > I don't think we should break existing topics. Just disallow new
> > >> >> > topics going forward.
> > >> >> >
> > >> >> > Agree that having both is horrible, but we should have a solution that
> > >> >> > fails when you run "kafka-topics.sh --create", not when you configure
> > >> >> > Ganglia.
> > >> >> >
> > >> >> > Gwen
> > >> >> >
> > >> >> > On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io>
> > wrote:
> > >> >> > > Unfortunately '.' is pretty common too. I agree that it is perverse, but
> > >> >> > > people seem to do it. Breaking all the topics with '.' in the name seems
> > >> >> > > like it could be worse than combining metrics for people who have a
> > >> >> > > 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY perverse,
> > >> >> > > no?).
> > >> >> > >
> > >> >> > > Where is our Dean of Compatibility, Ewen, on this?
> > >> >> > >
> > >> >> > > -Jay
> > >> >> > >
> > >> >> > > On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <
> tpal...@gmail.com>
> > >> >> wrote:
> > >> >> > >
> > >> >> > >> My selfish point of view is that we do #1, as we use "_" extensively in
> > >> >> > >> topic names here :) I also happen to think it's the right choice,
> > >> >> > >> specifically because "." has more special meanings, as you noted.
> > >> >> > >>
> > >> >> > >> -Todd
> > >> >> > >>
> > >> >> > >>
> > >> >> > >> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <
> > >> gshap...@cloudera.com>
> > >> >> > >> wrote:
> > >> >> > >>
> > >> >> > >> > Unintentional side effect from allowing IP addresses in consumer client
> > >> >> > >> > IDs :)
> > >> >> > >> >
> > >> >> > >> > So the question is, what do we do now?
> > >> >> > >> >
> > >> >> > >> > 1) disallow "."
> > >> >> > >> > 2) disallow "_"
> > >> >> > >> > 3) find a reversible way to encode "." and "_" that won't break existing
> > >> >> > >> > metrics
> > >> >> > >> > 4) all of the above?
> > >> >> > >> >
> > >> >> > >> > btw. it looks like "." and ".." are currently valid. Topic names are
> > >> >> > >> > used for directories, right? this sounds like fun :)
> > >> >> > >> >
> > >> >> > >> > I vote for option #1, although if someone has a good idea for #3 it
> > >> >> > >> > will be even better.
> > >> >> > >> >
> > >> >> > >> > Gwen
> > >> >> > >> >
> > >> >> > >> >
> > >> >> > >> >
> > >> >> > >> > On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <
> > >> ghe...@cloudera.com>
> > >> >> > >> wrote:
> > >> >> > >> > > Found it was added here: https://issues.apache.org/jira/browse/KAFKA-697
> > >> >> > >> > >
> > >> >> > >> > > On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <
> > >> tpal...@gmail.com>
> > >> >> > >> wrote:
> > >> >> > >> > >
> > >> >> > >> > >> This was definitely changed at some point after KAFKA-495. The question is
> > >> >> > >> > >> when and why.
> > >> >> > >> > >>
> > >> >> > >> > >> Here's the relevant code from that patch:
> > >> >> > >> > >>
> > >> >> > >> > >>
> > >> >> > >> > >> ===================================================================
> > >> >> > >> > >> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
> > >> >> > >> > >> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
> > >> >> > >> > >> @@ -21,24 +21,21 @@
> > >> >> > >> > >>  import util.matching.Regex
> > >> >> > >> > >>
> > >> >> > >> > >>  object Topic {
> > >> >> > >> > >> +  val legalChars = "[a-zA-Z0-9_-]"
> > >> >> > >> > >>
> > >> >> > >> > >>
> > >> >> > >> > >>
> > >> >> > >> > >> -Todd
> > >> >> > >> > >>
> > >> >> > >> > >>
> > >> >> > >> > >> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <
> > >> >> ghe...@cloudera.com>
> > >> >> > >> > wrote:
> > >> >> > >> > >>
> > >> >> > >> > >> > kafka.common.Topic shows that currently period is a valid character and I
> > >> >> > >> > >> > have verified I can use kafka-topics.sh to create a new topic with a
> > >> >> > >> > >> > period.
> > >> >> > >> > >> >
> > >> >> > >> > >> > AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently uses
> > >> >> > >> > >> > Topic.validate before writing to Zookeeper.
> > >> >> > >> > >> >
> > >> >> > >> > >> > Should period character support be removed? I was under the same impression
> > >> >> > >> > >> > as Gwen, that a period was used by many as a way to "group" topics.
> > >> >> > >> > >> > The code is pasted below since it's small:
> > >> >> > >> > >> >
> > >> >> > >> > >> > object Topic {
> > >> >> > >> > >> >   val legalChars = "[a-zA-Z0-9\\._\\-]"
> > >> >> > >> > >> >   private val maxNameLength = 255
> > >> >> > >> > >> >   private val rgx = new Regex(legalChars + "+")
> > >> >> > >> > >> >
> > >> >> > >> > >> >   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
> > >> >> > >> > >> >
> > >> >> > >> > >> >   def validate(topic: String) {
> > >> >> > >> > >> >     if (topic.length <= 0)
> > >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be empty")
> > >> >> > >> > >> >     else if (topic.equals(".") || topic.equals(".."))
> > >> >> > >> > >> >       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
> > >> >> > >> > >> >     else if (topic.length > maxNameLength)
> > >> >> > >> > >> >       throw new InvalidTopicException("topic name is illegal, can't be longer than " + maxNameLength + " characters")
> > >> >> > >> > >> >
> > >> >> > >> > >> >     rgx.findFirstIn(topic) match {
> > >> >> > >> > >> >       case Some(t) =>
> > >> >> > >> > >> >         if (!t.equals(topic))
> > >> >> > >> > >> >           throw new InvalidTopicException("topic name " + topic + " is illegal, contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > >> >> > >> > >> >       case None => throw new InvalidTopicException("topic name " + topic + " is illegal,  contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> > >> >> > >> > >> >     }
> > >> >> > >> > >> >   }
> > >> >> > >> > >> > }
> > >> >> > >> > >> >
> > >> >> > >> > >> > On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <
> > >> >> tpal...@gmail.com>
> > >> >> > >> > wrote:
> > >> >> > >> > >> >
> > >> >> > >> > >> > > I had to go look this one up again to make sure -
> > >> >> > >> > >> > > https://issues.apache.org/jira/browse/KAFKA-495
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > The only valid characters for topic names are alphanumeric, underscore, and
> > >> >> > >> > >> > > dash. A period is not supposed to be a valid character to use. If you're
> > >> >> > >> > >> > > seeing them, then one of two things has happened:
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > 1) You have topic names that are grandfathered in from before that patch
> > >> >> > >> > >> > > 2) The patch is not working properly and there is somewhere in the broker
> > >> >> > >> > >> > > that the standard is not being enforced.
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > -Todd
> > >> >> > >> > >> > >
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <
> > >> >> > br...@apache.org>
> > >> >> > >> > >> wrote:
> > >> >> > >> > >> > >
> > >> >> > >> > >> > > > On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <
> > >> >> > >> > >> gshap...@cloudera.com>
> > >> >> > >> > >> > > > wrote:
> > >> >> > >> > >> > > > > Hi Kafka Fans,
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > If you have one topic named "kafka_lab_2" and the other named
> > >> >> > >> > >> > > > > "kafka.lab.2", the topic level metrics will be named kafka_lab_2 for
> > >> >> > >> > >> > > > > both, effectively making it impossible to monitor them properly.
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > The reason this happens is that using "." in topic names is pretty
> > >> >> > >> > >> > > > > common, especially as a way to group topics into data centers,
> > >> >> > >> > >> > > > > relevant apps, etc - basically a work-around to our current lack of
> > >> >> > >> > >> > > > > name spaces. However, most metric monitoring systems use "." to
> > >> >> > >> > >> > > > > annotate hierarchy, so to avoid issues around metric names, Kafka
> > >> >> > >> > >> > > > > replaces the "." in the name with an underscore.
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > This generates good metric names, but creates a problem with name
> > >> >> > >> > >> > > > > collisions.
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > I'm wondering if it makes sense to simply limit the range of
> > >> >> > >> > >> > > > > characters permitted in a topic name and disallow "_"? Obviously
> > >> >> > >> > >> > > > > existing topics will need to remain as is, which is a bit awkward.
> > >> >> > >> > >> > > >
> > >> >> > >> > >> > > > Interesting problem! Many if not most users I personally am aware of
> > >> >> > >> > >> > > > use "_" as a separator in topic names. I am sure that many users would
> > >> >> > >> > >> > > > be quite surprised by this limitation. With that said, I am sure
> > >> >> > >> > >> > > > they'd transition accordingly.
> > >> >> > >> > >> > > >
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > If anyone has better backward-compatible solutions to this, I'm all
> > >> >> > >> > >> > > > > ears :)
> > >> >> > >> > >> > > > >
> > >> >> > >> > >> > > > > Gwen
> > >> >> > >> > >> > > >
> > >> >> > >> > >> > >
> > >> >> > >> > >> >
> > >> >> > >> > >> >
> > >> >> > >> > >> >
> > >> >> > >> > >> > --
> > >> >> > >> > >> > Grant Henke
> > >> >> > >> > >> > Solutions Consultant | Cloudera
> > >> >> > >> > >> > ghe...@cloudera.com | twitter.com/gchenke |
> > >> >> > >> > linkedin.com/in/granthenke
> > >> >> > >> > >> >
> > >> >> > >> > >>
> > >> >> > >> > >
> > >> >> > >> > >
> > >> >> > >> > >
> > >> >> > >> > > --
> > >> >> > >> > > Grant Henke
> > >> >> > >> > > Solutions Consultant | Cloudera
> > >> >> > >> > > ghe...@cloudera.com | twitter.com/gchenke |
> > >> >> > linkedin.com/in/granthenke
> > >> >> > >> >
> > >> >> > >>
> > >> >> >
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Thanks,
> > >> >> Neha
> > >> >>
> > >>
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Ewen
> >
>
>
>
> --
> Thanks,
> Ewen
>



-- 
-- Guozhang
