Can we provide a tool so folks can "sync back" old topic names to the new
format, so their clusters aren't lopsided between naming formats?

~ Joestein
On Jul 11, 2015 1:33 PM, "Todd Palino" <tpal...@gmail.com> wrote:

> I tend to agree with this as a compromise at this point. The reality is
> that this is technical debt that has built up in the project, and it does
> not go away by documenting it, and it will only get worse.
>
> As pointed out, eliminating either character at this point is going to
> cause problems for someone. And unfortunately, Guozhang, converting to __
> doesn't really solve the problem either because that is still a valid topic
> name that could collide. It's less likely, but all it does is move the debt
> around a little.
>
> -Todd
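The point about "__" can be checked in a few lines. This is a minimal Scala sketch that assumes a hypothetical scheme replacing each "." with "__" when building metric names. The thread does not spell out the exact proposal, and `DoubleUnderscoreSketch` is an illustrative name.

```scala
object DoubleUnderscoreSketch {
  // Hypothetical metric-name scheme: replace each '.' with "__".
  def metricName(topic: String): String = topic.replace(".", "__")

  def main(args: Array[String]): Unit = {
    val dotted = "kafka.lab.2"
    val underscored = "kafka__lab__2" // also a legal topic name
    // Both topics end up with the same metric name.
    println(metricName(dotted) == metricName(underscored)) // true
  }
}
```

"__" is itself legal in a topic name, so this mapping is still not one-to-one. It only makes a collision less likely.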
>
> > On Jul 11, 2015, at 10:16 AM, Brock Noland <br...@apache.org> wrote:
> >
> > On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava
> > <e...@confluent.io> wrote:
> >> On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira
> >> <gshap...@cloudera.com> wrote:
> >>
> >>> Yeah, I have an actual customer who ran into this. Unfortunately,
> >>> inconsistencies in the way things are named are pretty common - just
> >>> look at Kafka's many CLI options.
> >>>
> >>> I don't think that supporting both and pointing at the docs with "I
> >>> told you so" when our metrics break is a good solution.
> >>
> >> I agree, especially since we don't *already* have something in the docs
> >> indicating this will be an issue. I was flippant about the situation
> >> because I *wish* there was more careful consideration + naming policy in
> >> place, but I realize that doesn't always happen in practice. I guess I need
> >> to take Compatibility Czar more seriously :)
> >>
> >> I think the obvious practical options are as follows:
> >>
> >> 1. Kill support for "_". Piss off the entire set of people who currently
> >> use "_" anywhere in topic names.
> >> 2. Kill support for ".". Piss off the entire set of people who currently
> >> use "." anywhere in topic names.
> >> 3. Tell people they need to be careful about this issue. Piss off the set
> >> of people who use both "_" and "." *and* happen to have conflicting topic
> >> names. They will have some pain when they discover the issue and have to
> >> figure out how to move one of those topics over to a non-conflicting name.
> >> I'm going to claim that this group must be an *extremely* small fraction of
> >> users, which doesn't make it better to allow things to break for them, but
> >> at least gives us an idea of the scale of impact.
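People in that small group can find their conflicting pairs ahead of time if they have the list of topic names, for example from the output of `kafka-topics.sh --list`. This is a minimal Scala sketch that assumes metric names are built by replacing "." with "_", as described elsewhere in this thread. `findMetricCollisions` is a hypothetical helper, not part of Kafka.

```scala
object MetricCollisionAudit {
  // Assumed metric-name derivation: each '.' becomes '_'.
  def metricName(topic: String): String = topic.replace('.', '_')

  // Groups topic names by derived metric name and keeps only the groups
  // shared by more than one distinct topic. Names are sorted first, and
  // groupBy keeps that order within each group.
  def findMetricCollisions(topics: Seq[String]): Map[String, Seq[String]] =
    topics.distinct.sorted
      .groupBy(t => metricName(t))
      .filter(_._2.size > 1)

  def main(args: Array[String]): Unit = {
    val topics = Seq("kafka_lab_2", "kafka.lab.2", "orders", "dc1.clicks")
    findMetricCollisions(topics).foreach { case (name, group) =>
      println(s"$name <- ${group.mkString(", ")}")
    }
    // prints: kafka_lab_2 <- kafka.lab.2, kafka_lab_2
  }
}
```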
> >>
> >> (One other alternative suggested earlier was encoding metric names to
> >> account for differences; given the metric renaming mess in the last
> >> release, I'm extremely hesitant to suggest anything of the sort...)
> >>
> >> None of the options is ideal, but to me, 3 seems like the least painful,
> >> both for us and for the vast majority of users. It seems to me that the
> >> number of users who would complain about (1) or (2) drastically outweighs
> >> the number affected by (3).
> >>
> >> At this point, I don't think it's practical to keep switching the rules
> >> about which characters are allowed and which aren't, because the previous
> >> attempts haven't been successful -- it seems the rules have changed
> >> multiple times, whether intentionally or accidentally, such that any more
> >> changes will cause problems. I think we just need to accept the range of
> >> topic names that have been permitted so far and make the best of the
> >> situation, even if it means only being able to warn people of conflicts.
> >>
> >> Here's another alternative: how about being liberal with topic name
> >> characters, but upon topic creation we convert the name to the metric name
> >> and fail if there's a conflict with another topic? This is relatively
> >> expensive (requires getting the metric name of all other topics), but it
> >> avoids the bad situation we're encountering here (conflicting metrics),
> >> avoids getting into a persistent conflict (we kill topic creation when we
> >> detect the issue rather than noticing it when the metrics conflict
> >> happens), and keeps the vast majority of existing users happy (both _ and .
> >> work in topic names as long as you don't create topics with conflicting
> >> metric names).
> >>
> >> There are definitely details to be worked out (auto topic creation?), but
> >> it seems like a more realistic solution than disallowing _ or . in topic
> >> names.
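The create-time check proposed above amounts to looking up the new topic's metric name among the metric names of existing topics. This is a minimal Scala sketch that assumes the same "." to "_" substitution and an in-memory set of existing topic names. A broker would have to read those names from ZooKeeper, which is the expense mentioned above. `checkCreate` is a hypothetical helper, not Kafka's API.

```scala
object CreateTimeCollisionCheck {
  // Assumed metric-name derivation: each '.' becomes '_'.
  def metricName(topic: String): String = topic.replace('.', '_')

  // Returns Left(error) if `newTopic` would share a metric name with a
  // different existing topic, otherwise Right(newTopic).
  def checkCreate(newTopic: String, existing: Set[String]): Either[String, String] =
    existing.find(t => t != newTopic && metricName(t) == metricName(newTopic)) match {
      case Some(clash) =>
        Left(s"Topic '$newTopic' collides with existing topic '$clash' " +
          s"(metric name '${metricName(newTopic)}')")
      case None => Right(newTopic)
    }

  def main(args: Array[String]): Unit = {
    val existing = Set("kafka_lab_2", "dc1.clicks")
    println(checkCreate("kafka.lab.2", existing)) // Left(...)
    println(checkCreate("dc2.clicks", existing))  // Right(dc2.clicks)
  }
}
```

Under this rule either "a.b" or "a_b" can be created, but not both.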
> >
> > I was thinking the same. Allow a.b or a_b, but not both a.b and a_b. This
> > seems like it will impact a trivial number of users and keep both the
> > "." and "_" camps happy.
> >
> >>
> >> -Ewen
> >>
> >>
> >>>
> >>> On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
> >>> <e...@confluent.io> wrote:
> >>>> I figure you'll probably see complaints no matter what change you make.
> >>>> Gwen, given that you raised this, another important question might be
> >>>> how many people you see using *both*. I'm guessing this question came up
> >>>> because you actually saw a conflict? But I'd imagine (or at least hope)
> >>>> that most organizations are mostly consistent about naming topics --
> >>>> they standardize on one or the other.
> >>>>
> >>>> Since there's no "right" way to name them, I'd just leave it supporting
> >>>> both and document the potential conflict in metrics. And if people use
> >>>> both naming schemes, they probably deserve to suffer for their
> >>>> inconsistency :)
> >>>>
> >>>> -Ewen
> >>>>
> >>>>> On Fri, Jul 10, 2015 at 3:28 PM, Gwen Shapira <gshap...@cloudera.com>
> >>>>> wrote:
> >>>>
> >>>>> I find dots more common in my customer base, so I will definitely feel
> >>>>> the pain of removing them.
> >>>>>
> >>>>> However, "." is already used in metrics, file names, directories, etc.,
> >>>>> so if we keep the dots, we need to keep code that translates them
> >>>>> and document the translation. Just banning "." seems more natural.
> >>>>> Also, as Grant mentioned, we'll probably have our own special usage
> >>>>> for "." down the line.
> >>>>>
> >>>>>> On Fri, Jul 10, 2015 at 2:12 PM, Todd Palino <tpal...@gmail.com>
> >>>>>> wrote:
> >>>>>> I absolutely disagree with #2, Neha. That will break a lot of
> >>>>>> infrastructure within LinkedIn. That said, removing "." might break
> >>>>>> other people as well, but I think we should have a clearer idea of how
> >>>>>> much usage there is on either side.
> >>>>>>
> >>>>>> -Todd
> >>>>>>
> >>>>>>
> >>>>>>> On Fri, Jul 10, 2015 at 2:08 PM, Neha Narkhede <n...@confluent.io>
> >>>>>>> wrote:
> >>>>>>
> >>>>>>> "." seems natural for grouping topic names. +1 for 2) going forward
> >>>>>>> only, without breaking previously created topics with "_", though that
> >>>>>>> might require us to patch the code somewhat awkwardly till we phase it
> >>>>>>> out a couple (purposely left vague to stay out of Ewen's wrath :-))
> >>>>>>> versions later.
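One way to read "going forward only" is to validate new names against a stricter pattern while accepting names that already exist. This is a minimal Scala sketch that assumes a hypothetical pattern, which simply drops "_" from the current character set. `validateName` is for illustration only. It also skips the empty-name, "."/"..", and length checks that Topic.validate performs.

```scala
object GrandfatheredValidation {
  // Hypothetical stricter pattern for new topics: '_' is no longer allowed.
  private val newTopicPattern = "[a-zA-Z0-9.\\-]+".r

  // Existing topics are grandfathered in; only names that do not exist yet
  // must match the stricter pattern (full-string match).
  def validateName(topic: String, existing: Set[String]): Boolean =
    existing.contains(topic) || newTopicPattern.pattern.matcher(topic).matches()

  def main(args: Array[String]): Unit = {
    val existing = Set("legacy_events")
    println(validateName("legacy_events", existing))  // true (grandfathered)
    println(validateName("new_events", existing))     // false
    println(validateName("dc1.new-events", existing)) // true
  }
}
```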
> >>>>>>>
> >>>>>>> On Fri, Jul 10, 2015 at 2:02 PM, Gwen Shapira <gshap...@cloudera.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I don't think we should break existing topics. Just disallow new
> >>>>>>>> topics going forward.
> >>>>>>>>
> >>>>>>>> Agree that having both is horrible, but we should have a solution that
> >>>>>>>> fails when you run "kafka-topics.sh --create", not when you configure
> >>>>>>>> Ganglia.
> >>>>>>>>
> >>>>>>>> Gwen
> >>>>>>>>
> >>>>>>>> On Fri, Jul 10, 2015 at 1:53 PM, Jay Kreps <j...@confluent.io>
> >>>>>>>> wrote:
> >>>>>>>>> Unfortunately '.' is pretty common too. I agree that it is perverse,
> >>>>>>>>> but people seem to do it. Breaking all the topics with '.' in the name
> >>>>>>>>> seems like it could be worse than combining metrics for people who
> >>>>>>>>> have a 'foo_bar' AND 'foo.bar' (and after all, having both is DEEPLY
> >>>>>>>>> perverse, no?).
> >>>>>>>>>
> >>>>>>>>> Where is our Dean of Compatibility, Ewen, on this?
> >>>>>>>>>
> >>>>>>>>> -Jay
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 10, 2015 at 1:32 PM, Todd Palino <tpal...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> My selfish point of view is that we do #1, as we use "_" extensively
> >>>>>>>>>> in topic names here :) I also happen to think it's the right choice,
> >>>>>>>>>> specifically because "." has more special meanings, as you noted.
> >>>>>>>>>>
> >>>>>>>>>> -Todd
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jul 10, 2015 at 1:30 PM, Gwen Shapira <gshap...@cloudera.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Unintentional side effect from allowing IP addresses in consumer
> >>>>>>>>>>> client IDs :)
> >>>>>>>>>>>
> >>>>>>>>>>> So the question is, what do we do now?
> >>>>>>>>>>>
> >>>>>>>>>>> 1) disallow "."
> >>>>>>>>>>> 2) disallow "_"
> >>>>>>>>>>> 3) find a reversible way to encode "." and "_" that won't break
> >>>>>>>>>>> existing metrics
> >>>>>>>>>>> 4) all of the above?
> >>>>>>>>>>>
> >>>>>>>>>>> btw. it looks like "." and ".." are currently valid. Topic names are
> >>>>>>>>>>> used for directories, right? This sounds like fun :)
> >>>>>>>>>>>
> >>>>>>>>>>> I vote for option #1, although if someone has a good idea for #3 it
> >>>>>>>>>>> will be even better.
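On #3: for an encoding to be reversible, "." and "_" must never produce the same output, and that requires an escape character. This is a minimal Scala sketch that assumes "_" as the escape, with "_" written as "__" and "." written as "_d". No code is a prefix of another, so decoding is unambiguous. The object and method names are hypothetical. This particular scheme still changes the metric names of existing topics that contain "." or "_", so on its own it does not meet the "won't break existing metrics" requirement.

```scala
object ReversibleMetricEncoding {
  // Hypothetical escape scheme: '_' -> "__", '.' -> "_d"; all other
  // characters are copied unchanged.
  def encode(topic: String): String = {
    val sb = new StringBuilder
    topic.foreach {
      case '_' => sb.append("__")
      case '.' => sb.append("_d")
      case c   => sb.append(c)
    }
    sb.toString
  }

  // Inverse of encode: '_' always starts a two-character escape.
  // Throws IllegalArgumentException on input encode could not produce.
  def decode(metric: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < metric.length) {
      val c = metric.charAt(i)
      if (c == '_') {
        require(i + 1 < metric.length, s"dangling escape in '$metric'")
        metric.charAt(i + 1) match {
          case '_'   => sb.append('_')
          case 'd'   => sb.append('.')
          case other => throw new IllegalArgumentException(s"bad escape '_$other' in '$metric'")
        }
        i += 2
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    println(encode("kafka_lab_2"))         // kafka__lab__2
    println(encode("kafka.lab.2"))         // kafka_dlab_d2
    println(decode(encode("kafka.lab.2"))) // kafka.lab.2
  }
}
```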
> >>>>>>>>>>>
> >>>>>>>>>>> Gwen
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jul 10, 2015 at 1:22 PM, Grant Henke <ghe...@cloudera.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> Found it was added here:
> >>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-697
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jul 10, 2015 at 3:18 PM, Todd Palino <tpal...@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> This was definitely changed at some point after KAFKA-495. The
> >>>>>>>>>>>>> question is when and why.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here's the relevant code from that patch:
> >>>>>>>>>>>>> ===================================================================
> >>>>>>>>>>>>> --- core/src/main/scala/kafka/utils/Topic.scala (revision 1390178)
> >>>>>>>>>>>>> +++ core/src/main/scala/kafka/utils/Topic.scala (working copy)
> >>>>>>>>>>>>> @@ -21,24 +21,21 @@
> >>>>>>>>>>>>> import util.matching.Regex
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> object Topic {
> >>>>>>>>>>>>> +  val legalChars = "[a-zA-Z0-9_-]"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Jul 10, 2015 at 1:02 PM, Grant Henke <ghe...@cloudera.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> kafka.common.Topic shows that the period is currently a valid
> >>>>>>>>>>>>>> character, and I have verified I can use kafka-topics.sh to create a
> >>>>>>>>>>>>>> new topic with a period.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> AdminUtils.createOrUpdateTopicPartitionAssignmentPathInZK currently
> >>>>>>>>>>>>>> uses Topic.validate before writing to ZooKeeper.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Should period character support be removed? I was under the same
> >>>>>>>>>>>>>> impression as Gwen, that a period was used by many as a way to
> >>>>>>>>>>>>>> "group" topics.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The code is pasted below since it's small:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> object Topic {
> >>>>>>>>>>>>>>   val legalChars = "[a-zA-Z0-9\\._\\-]"
> >>>>>>>>>>>>>>   private val maxNameLength = 255
> >>>>>>>>>>>>>>   private val rgx = new Regex(legalChars + "+")
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>   val InternalTopics = Set(OffsetManager.OffsetsTopicName)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>   def validate(topic: String) {
> >>>>>>>>>>>>>>     if (topic.length <= 0)
> >>>>>>>>>>>>>>       throw new InvalidTopicException("topic name is illegal, can't be empty")
> >>>>>>>>>>>>>>     else if (topic.equals(".") || topic.equals(".."))
> >>>>>>>>>>>>>>       throw new InvalidTopicException("topic name cannot be \".\" or \"..\"")
> >>>>>>>>>>>>>>     else if (topic.length > maxNameLength)
> >>>>>>>>>>>>>>       throw new InvalidTopicException("topic name is illegal, can't be longer than " +
> >>>>>>>>>>>>>>         maxNameLength + " characters")
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>     rgx.findFirstIn(topic) match {
> >>>>>>>>>>>>>>       case Some(t) =>
> >>>>>>>>>>>>>>         if (!t.equals(topic))
> >>>>>>>>>>>>>>           throw new InvalidTopicException("topic name " + topic + " is illegal, " +
> >>>>>>>>>>>>>>             "contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> >>>>>>>>>>>>>>       case None =>
> >>>>>>>>>>>>>>         throw new InvalidTopicException("topic name " + topic +
> >>>>>>>>>>>>>>           " is illegal,  contains a character other than ASCII alphanumerics, '.', '_' and '-'")
> >>>>>>>>>>>>>>     }
> >>>>>>>>>>>>>>   }
> >>>>>>>>>>>>>> }
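To compare the current rule with the KAFKA-495 rule, here is a self-contained Scala sketch of the same checks. It makes three substitutions: IllegalArgumentException replaces Kafka's InvalidTopicException, the OffsetManager dependency is dropped, and a full-string match replaces the findFirstIn comparison. The last change is equivalent for a single character-class pattern like this one. `TopicValidationSketch` and `isValid` are hypothetical names.

```scala
import scala.util.matching.Regex

object TopicValidationSketch {
  val CurrentLegalChars = "[a-zA-Z0-9\\._\\-]" // as in kafka.common.Topic above
  val Kafka495LegalChars = "[a-zA-Z0-9_-]"     // from the KAFKA-495 patch above
  private val MaxNameLength = 255

  // Same checks as Topic.validate, with IllegalArgumentException standing in
  // for InvalidTopicException and a full-string regex match.
  def validate(topic: String, legalChars: String = CurrentLegalChars): Unit = {
    val rgx = new Regex(legalChars + "+")
    if (topic.isEmpty)
      throw new IllegalArgumentException("topic name is illegal, can't be empty")
    else if (topic == "." || topic == "..")
      throw new IllegalArgumentException("topic name cannot be \".\" or \"..\"")
    else if (topic.length > MaxNameLength)
      throw new IllegalArgumentException(
        s"topic name is illegal, can't be longer than $MaxNameLength characters")
    else if (!rgx.pattern.matcher(topic).matches())
      throw new IllegalArgumentException(
        s"topic name $topic is illegal, contains a character outside $legalChars")
  }

  // Convenience wrapper: true if validate accepts the name.
  def isValid(topic: String, legalChars: String = CurrentLegalChars): Boolean =
    try { validate(topic, legalChars); true }
    catch { case _: IllegalArgumentException => false }
}
```

With this sketch, `isValid("kafka.lab.2")` is true under the current pattern and false under `Kafka495LegalChars`. That matches what Grant and Todd describe.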
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 2:50 PM, Todd Palino <tpal...@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I had to go look this one up again to make sure -
> >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-495
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The only valid characters in topic names are alphanumerics,
> >>>>>>>>>>>>>>> underscore, and dash. A period is not supposed to be a valid
> >>>>>>>>>>>>>>> character to use. If you're seeing them, then one of two things has
> >>>>>>>>>>>>>>> happened:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1) You have topic names that are grandfathered in from before that
> >>>>>>>>>>>>>>> patch
> >>>>>>>>>>>>>>> 2) The patch is not working properly and there is somewhere in the
> >>>>>>>>>>>>>>> broker where the standard is not being enforced.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:13 PM, Brock Noland <br...@apache.org>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 11:34 AM, Gwen Shapira <gshap...@cloudera.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> Hi Kafka Fans,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If you have one topic named "kafka_lab_2" and the other named
> >>>>>>>>>>>>>>>>> "kafka.lab.2", the topic-level metrics will be named kafka_lab_2
> >>>>>>>>>>>>>>>>> for both, effectively making it impossible to monitor them properly.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The reason this happens is that using "." in topic names is pretty
> >>>>>>>>>>>>>>>>> common, especially as a way to group topics into data centers,
> >>>>>>>>>>>>>>>>> relevant apps, etc., basically as a work-around for our current lack
> >>>>>>>>>>>>>>>>> of namespaces. However, most metric monitoring systems use "." to
> >>>>>>>>>>>>>>>>> annotate hierarchy, so to avoid issues around metric names, Kafka
> >>>>>>>>>>>>>>>>> replaces the "." in the name with an underscore.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> This generates good metric names, but creates the problem of name
> >>>>>>>>>>>>>>>>> collisions.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'm wondering if it makes sense to simply limit the range of
> >>>>>>>>>>>>>>>>> characters permitted in a topic name and disallow "_"? Obviously
> >>>>>>>>>>>>>>>>> existing topics will need to remain as is, which is a bit awkward.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Interesting problem! Many, if not most, of the users I am personally
> >>>>>>>>>>>>>>>> aware of use "_" as a separator in topic names. I am sure that many
> >>>>>>>>>>>>>>>> users would be quite surprised by this limitation. With that said, I
> >>>>>>>>>>>>>>>> am sure they'd transition accordingly.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If anyone has better backward-compatible solutions to this, I'm all
> >>>>>>>>>>>>>>>>> ears :)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Gwen
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Grant Henke
> >>>>>>>>>>>>>> Solutions Consultant | Cloudera
> >>>>>>>>>>>>>> ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >>>>>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Thanks,
> >>>>>>> Neha
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Thanks,
> >>>> Ewen
> >>
>