In a recent thread on this mailing list ("Limiting the blast radius of
OTel..."), several people once again suggested that Prometheus should
just allow the dot (`.`) as a regular character in metric and label
names and be done with it. I responded that we have discussed this
topic countless times, always with the result of not doing it
(yet). Of course, we are free to reopen the discussion as often as
anyone wishes (and in fact, one argument in the past was that we
should first introduce full UTF-8 capabilities via quoting and see how
it goes, and then we can still consider "graduating" selected
characters to regular characters that can be used without quoting).

However, the reason for this mail is that I also said that I won't
reiterate all the points made over and over again. After that, an
individual approached me and asked where they could read up about
those points, and I realized that they are hard to find in documented
form. (My vague memory was that I already wrote a mail like this in
the past, but I cannot find it anymore, and the relevant notes from
dev-summits are not detailed and structured enough to serve as a
reference.)

Therefore, I'll reiterate all those points one more time so that we
don't have to do it again in the future. Please amend this list if you
find any omissions. [In this list, I also tried to say something about
the relevance of each point. This is marked by square brackets.]

1. Probably the oldest reason is a plan for a short-form notation of
   the job label. `requests_total{job="api"}` could be written as
   `requests_total.api`. This originates from an ancient internal
   Google practice. [I don't think that this point has any relevance
   anymore. The job label is now considered way less special than
   traditionally. Additionally, the short form would only work if the
   value of the job label follows the same character restrictions as
   names, which would cause confusion for sure when it doesn't.]

2. In the early years of Prometheus, the statsd/Graphite stack was
   very relevant. Dots play a very special role there. In contrast,
   even if we had allowed dots in Prometheus names from the beginning,
   they would just have been characters like all the
   others. Superficially, it would have looked like better
   interoperability, but it would not have lived up to its implied
   promises, because Graphite-style globbing would not have worked,
   the metrics would not have had an actual hierarchy like in the
   Graphite data model etc. [This point is much weaker nowadays
   because most users are probably more familiar with the
   Prometheus-style label based data model than with the hierarchical
   Graphite data model. I wouldn't expect much confusion because of
   that. However, this point still illustrates the fundamental problem
   of turning a character that is part of the actual syntax and
   arguably even a real operator in one system into "just another
   character" in an opaque string in the other system, where the
   syntactic meaning only exists as a convention among humans. This is
   also relevant for some of the other points below.]

3. Naming is a hard problem, as we all know. Many of the early
   Prometheus contributors had rich experience with running complex
   systems at scale. They all got burned by the fact that our brains
   are really bad at remembering if something was named `foo-bar-baz`
   or `foo_bar_baz` or `foo.bar.baz` or `foo/bar/baz` (or even
   `foo_bar.baz`), especially in the heat of fighting an
   outage. Following the "simple, light-weight, opinionated" paradigm
   (once more many thanks to Julius for expressing it so concisely
   recently), Prometheus decided to have one and only one separator
   character. In addition, this one separator character isn't really
   special in a lot of languages, so names from the Prometheus
   ecosystem would translate into names in other contexts easily
   (initially and practically most relevant for Go templating, but the
   idea works in a much wider scope). (One might come up with the
   counter argument that Prometheus also allows `:` as a
   separator. That's indeed a deviation from the fundamental idea. `:`
   is meant only for rules, but that's just a convention and not
   enforced by syntax. However, it has worked quite well for all those
   years, presumably because people rarely use `:` as a separator
   character by accident.) OTel semantic conventions are the
   antithesis of this: They introduce two different separator
   characters with a slightly different meaning (`.` for "namespaces",
   but they aren't really namespaces, more about that below). And they
   use a character that has a special meaning in a lot of
   languages. (Coming back to the Go templating example:
   `$labels.service_instance_id` is valid,
   `$labels.service.instance.id` is not. It forces you to jump through
   hoops and write `index $labels "service.instance.id"`. Similar
   issues will occur in many other languages.) [This might appear a
   minor annoyance to many, but in my experience, it creates a great
   deal of peace of mind in the long run. This is also a good example
   of why it is useful to mark `.` as special by requiring the quoting
   syntax. If we allowed `.` as a regular character, it would
   inevitably show up even in use cases that are untouched by OTel's
   semantic conventions, defeating the idea of "one and only one
   separator character". In a way, the effort of quoting protects
   regular Prometheus users from the `.` "pollution". Or in other
   words: By allowing `.` as a regular character, we would make the
   life of regular Prometheus users harder to accommodate OTel needs
   originating from a questionable decision.]

4. Much more vague than (1), but there have been thoughts about
   "proper" namespaces for a long time. The weird namespace concept in
   client_golang is a witness from the distant past, but that
   namespacing appears more like a joke in hindsight and never got
   traction. By now, it has become more of an annoyance we want to get
   rid of (but ironically, it is very similar to the "namespace"
   concept of OTel's semantic conventions). What makes a namespace
   "proper"?  Maybe it's about the ability to be "inside" a namespace
   so you don't have to add the prefix or suffix all the time. Or it's
   about the namespaces to be indexed ("apply this query only to
   metrics in that namespace"). But most importantly, a namespace must
   come with an unambiguous syntax, which mostly boils down to having
   a namespace operator. The most common namespace operator is
   probably `.`, and that has been a good reason to reserve it in
   Prometheus. OTel's semantic conventions claim to use `.` for
   namespacing, too, but it's not an operator, it's just a
   convention. Which leads to weird stipulations like this one
   (https://opentelemetry.io/docs/specs/semconv/general/attribute-naming/):
   "Names SHOULD NOT coincide with namespaces. For example if
   service.instance.id is an attribute name then it is no longer valid
   to have an attribute named service.instance because
   service.instance is already a namespace. Because of this rule be
   careful when choosing names: every existing name prohibits
   existence of an equally named namespace in the future, and vice
   versa: any existing namespace prohibits existence of an equally
   named attribute key in the future." If `.` were a real namespace
   operator, you simply would not have this problem. [It's obviously a
   weak claim to block a feature in the present to keep open the
   option for a vaguely planned feature in the future. Furthermore, we
   could just use another character for the namespace
   operator. (Although C++ style `::` wouldn't work because `:` is
   already a regular character in names. And having a "weird"
   namespace operator next to `.` as a regular character will be
   confusing.) Still, I think proper namespacing would be so nice that
   we shouldn't dismiss it easily. In this context, it's doubly
   annoying that we cannot even just interpret the `.` coming from
   OTel as a "true" namespace operator because OTel _also_ allows it
   as a regular character. You can never know if a `.` coming from
   OTel is meant as a namespace separator, so you would treat it as
   such in your powerful namespace-enabled backend, or if it is just
   part of the name.]

5. Prometheus has been plagued by magic suffixes from the very
   beginning. In my understanding of Prometheus history, suffixing
   components of a summary or a histogram with `_count`, `_sum`,
   `_bucket` was a means to get an MVP running. I think it was a
   mistake to reify this concept (originally constrained to TSDB and
   PromQL) by letting it leak into the exposition format. OpenMetrics
   made things worse by introducing more magic suffixes (`_info` got
   introduced, and `_total` got a promotion from a mere recommendation
   to another magic suffix, and arguably the same happened to the
   unit). The problem with magic suffixes is namespace pollution: It
   prevents usage of any of the magic suffixes in metric names. Or to
   be precise: It's even worse, the usage is not really technically
   forbidden. You can still do it, but then you might run into
   surprising and confusing namespace collisions that might very well
   show up in the worst of moments. A way out of this is to use a
   separator character for the magic suffixes that is _not_ a valid
   character in names otherwise. And now guess which character comes
   to mind for that... [This is another point in the category "future
   feature has a hard time blocking a feature proposal for the
   present". This was concretely considered when OpenMetrics was
   designed, but it got rejected by the OpenMetrics team. So there is
   a non-zero chance that it will be on the table again when we try to
   "fix" OpenMetrics. A counter point is again that another character
   could be used, at the price of using something that is much less
   intuitive to learn and read.]

6. Native histograms introduced the first instance of a "structured
   metric", but more could happen in the future. Accessing "fields" in
   this structure is currently done by bespoke functions in PromQL
   (`histogram_count(request_latency_seconds)`,
   `histogram_sum(request_latency_seconds)`), but it would be a quite
   obvious alternative to allow something like
   `request_latency_seconds.count` and `request_latency_seconds.sum`,
   which is actually more than just syntactic sugar because we could
   implement "field access" as a different thing from "function
   call". The latter is an evaluation, it changes the timestamp, and
   it cannot be used in a range selector
   (`histogram_count(request_latency_seconds)[5m]` is invalid syntax),
   while we could make `request_latency_seconds.count[5m]` valid
   without changing the language fundamentals. [This is much more
   tangible than (4) and (5), but not decided yet, so it still asks us
   to reject a concrete feature request in the name of a possible
   future feature. And again, there are other ways of implementing
   this, avoiding the usage of a `.` operator, at the price of not
   doing the most obvious.]

In summary, (4), (5), and (6) are all more or less vague, but they are
close to my heart, as I'm continuously thinking about future
improvements of PromQL in particular and Prometheus in general. I
should also note that it isn't clear if all three ideas can be
combined or if they are actually mutually exclusive. So the argument
is not so much "dots will kill three possible features at the same
time", but more like "even though those ideas are somewhat vague and
possibly mutually exclusive, there are so many of them that it's
likely we will implement at least one of them in the not too far
future". (1) is IMHO irrelevant. (2) nicely illustrates a bigger
fundamental problem, but the concrete reference to Graphite is mostly
historical. Which leaves us with (3) as the most generally applicable
argument to be made, at least if we cut out the visionary part (that
might just live in my head).

As the final point, I would like to circle back to the very beginning
of the mail: The last consensus on `.` was that we want to implement
the full UTF-8 support via quoting first and see how it plays out "in
the wild". Only then can we see how well it really works (or how
badly), and based on that, we can make a much better trade-off about
the damage and benefits of introducing `.` as a regular character. I
would propose to do exactly that and wait for a bit longer (hopefully
just a few months) rather than pushing for a decision now.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in
