In a recent thread on this mailing list ("Limiting the blast radius of OTel..."), several people once again suggested that Prometheus should just allow the dot (`.`) as a regular character in metric and label names and be done with it. I responded that we have discussed this topic countless times, always with the result of not doing it (yet). Of course, we are free to reopen the discussion as often as anyone wishes (and in fact, one argument in the past was that we should first introduce full UTF-8 capabilities via quoting and see how it goes, and then we can still consider "graduating" selected characters to regular characters that can be used without quoting).

However, the reason for this mail is that I also said that I won't reiterate all the points made over and over again. After that, an individual approached me and asked where they could read up about those points, and I realized that they are hard to find in documented form. (My vague memory was that I already wrote a mail like this in the past, but I cannot find it anymore, and the relevant notes from dev-summits are not detailed and structured enough to serve as a reference.)

Therefore, I'll reiterate all those points one more time so that we don't have to do it again in the future. Please amend this list if you find any omissions. [In this list, I also tried to say something about the relevance of each point. This is marked by square brackets.]

1. Probably the oldest reason is a plan for a short-form notation of the job label. `requests_total{job="api"}` could be written as `requests_total.api`. This originates from an ancient internal Google practice.

[I don't think that this point has any relevance anymore. The job label is now considered way less special than traditionally. Additionally, the short form would only work if the value of the job label follows the same character restrictions as names, which would cause confusion for sure when it doesn't.]

2. In the early years of Prometheus, the statsd/Graphite stack was very relevant. Dots play a very special role there. In contrast, even if we had allowed dots in Prometheus names from the beginning, they would just have been characters like all the others. Superficially, it would have looked like better interoperability, but it would not have lived up to its implied promises, because Graphite-style globbing would not have worked, the metrics would not have had an actual hierarchy like in the Graphite data model, etc.

[This point is much weaker nowadays because most users are probably more familiar with the Prometheus-style label-based data model than with the hierarchical Graphite data model.
I wouldn't expect much confusion because of that. However, this point still illustrates the fundamental problem of turning a character that is part of the actual syntax and arguably even a real operator in one system into "just another character" in an opaque string in the other system, where the syntactic meaning only exists as a convention among humans. This is also relevant for some of the other points below.]

3. Naming is a hard problem, as we all know. Many of the early Prometheus contributors had rich experience with running complex systems at scale. They all got burned by the fact that our brains are really bad at remembering if something was named `foo-bar-baz` or `foo_bar_baz` or `foo.bar.baz` or `foo/bar/baz` (or even `foo_bar.baz`), especially in the heat of fighting an outage. Following the "simple, light-weight, opinionated" paradigm (once more many thanks to Julius for having expressed it so concisely recently), Prometheus decided to have one and only one separator character. In addition, this one separator character isn't really special in a lot of languages, so names from the Prometheus ecosystem would translate into names in other contexts easily (initially and practically most relevant for Go templating, but the idea works in a much wider scope).

(One might come up with the counterargument that Prometheus also allows `:` as a separator. That's indeed a deviation from the fundamental idea. `:` is meant only for rules, but that's just a convention and not enforced by syntax. However, it has worked quite well for all those years, presumably because people rarely use `:` as a separator character by accident.)

OTel semantic conventions are the antithesis of this: They introduce two different separator characters with a slightly different meaning (`.` for "namespaces", but they aren't really namespaces, more about that below). And they use a character that has a special meaning in a lot of languages.
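
To illustrate what "special meaning" implies in practice, here is a minimal Python sketch. The label names and values are made up, and `SimpleNamespace` merely stands in for any object whose fields are accessed with `.`; it is not how Prometheus or OTel represent labels.

```python
from types import SimpleNamespace

# Hypothetical label sets; names and values are invented for illustration.
underscore_labels = SimpleNamespace(**{"service_instance_id": "abc123"})
dotted_labels = SimpleNamespace(**{"service.instance.id": "abc123"})


def is_plain_identifier(name: str) -> bool:
    """Return True if `name` can be written as a bare Python identifier."""
    return name.isidentifier()


# An underscore-separated name works with ordinary attribute access.
plain_value = underscore_labels.service_instance_id

# A dotted name cannot be written directly: `dotted_labels.service.instance.id`
# is parsed as chained attribute access and already fails at `.service`.
try:
    dotted_labels.service
    chained_access_works = True
except AttributeError:
    chained_access_works = False

# The workaround is a lookup by quoted string.
quoted_value = getattr(dotted_labels, "service.instance.id")
```

The same pattern (bare access for underscore names, a quoted lookup for dotted names) shows up wherever `.` is an operator of the host language.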

(Coming back to the Go templating example: `$labels.service_instance_id` is valid, `$labels.service.instance.id` is not. It forces you to jump through hoops and write `index $labels "service.instance.id"`. Similar issues will occur in many other languages.)

[This might appear a minor annoyance to many, but in my experience, having a single separator creates a great deal of peace of mind in the long run. This is also a good example of why it is useful to mark `.` as special by requiring the quoting syntax. If we allowed `.` as a regular character, it would inevitably show up even in use cases that are untouched by OTel's semantic conventions, defeating the idea of "one and only one separator character". In a way, the effort of quoting protects regular Prometheus users from the `.` "pollution". Or in other words: By allowing `.` as a regular character, we would make the life of regular Prometheus users harder to accommodate OTel needs originating from a questionable decision.]

4. Much more vague than (1), but there have been thoughts about "proper" namespaces for a long time. The weird namespace concept in client_golang is a witness from the distant past, but that namespacing appears more like a joke in hindsight and never got traction. By now, it has become more of an annoyance we want to get rid of (but ironically, it is very similar to the "namespace" concept of OTel's semantic conventions).

What makes a namespace "proper"? Maybe it's about the ability to be "inside" a namespace so you don't have to add the prefix or suffix all the time. Or it's about the namespaces being indexed ("apply this query only to metrics in that namespace"). But most importantly, a namespace must come with an unambiguous syntax, which mostly boils down to having a namespace operator. The most common namespace operator is probably `.`, and that has been a good reason to reserve it in Prometheus. OTel's semantic conventions claim to use `.` for namespacing, too, but it's not an operator, it's just a convention.
Which leads to weird stipulations like this one (https://opentelemetry.io/docs/specs/semconv/general/attribute-naming/):

"Names SHOULD NOT coincide with namespaces. For example if service.instance.id is an attribute name then it is no longer valid to have an attribute named service.instance because service.instance is already a namespace. Because of this rule be careful when choosing names: every existing name prohibits existence of an equally named namespace in the future, and vice versa: any existing namespace prohibits existence of an equally named attribute key in the future."

If `.` were a real namespace operator, you simply would not have this problem.

[It's obviously a weak argument to block a feature in the present in order to keep open the option for a vaguely planned feature in the future. Furthermore, we could just use another character for the namespace operator. (Although C++ style `::` wouldn't work because `:` is already a regular character in names. And having a "weird" namespace operator next to `.` as a regular character would be confusing.) Still, I think proper namespacing would be so nice that we shouldn't dismiss it easily. In this context, it's doubly annoying that we cannot even just interpret the `.` coming from OTel as a "true" namespace operator because OTel _also_ allows it as a regular character. You can never know whether a `.` coming from OTel is meant as a namespace separator (in which case you would treat it as such in your powerful namespace-enabled backend) or is just part of the name.]

5. Prometheus has been plagued by magic suffixes from the very beginning. In my understanding of Prometheus history, suffixing components of a summary or a histogram with `_count`, `_sum`, `_bucket` was a means to get an MVP running. I think it was a mistake to reify this concept (originally constrained to TSDB and PromQL) by letting it leak into the exposition format.
OpenMetrics made things worse by introducing more magic suffixes (`_info` got introduced, and `_total` got a promotion from a mere recommendation to another magic suffix, and arguably the same happened to the unit). The problem with magic suffixes is namespace pollution: It prevents usage of any of the magic suffixes in metric names. Or to be precise: It's even worse, because the usage is not really technically forbidden. You can still do it, but then you might run into surprising and confusing namespace collisions that might very well show up in the worst of moments. A way out of this is to use a separator character for the magic suffixes that is _not_ a valid character in names otherwise. And now guess which character comes to mind for that...

[This is another point in the category "future feature has a hard time blocking a feature proposal for the present". This was concretely considered when OpenMetrics was designed, but it got rejected by the OpenMetrics team. So there is a non-zero chance that it will be on the table again when we try to "fix" OpenMetrics. A counterpoint is again that another character could be used, at the price of using something that is much less intuitive to learn and read.]

6. Native histograms introduced the first instance of a "structured metric", but more could happen in the future. Accessing "fields" in this structure is currently done by bespoke functions in PromQL (`histogram_count(request_latency_seconds)`, `histogram_sum(request_latency_seconds)`), but it would be a quite obvious alternative to allow something like `request_latency_seconds.count` and `request_latency_seconds.sum`, which is actually more than just syntactic sugar because we could implement "field access" as a different thing from "function call".
The latter is an evaluation, it changes the timestamp, and it cannot be used in a range selector (`histogram_count(request_latency_seconds)[5m]` is invalid syntax), while we could make `request_latency_seconds.count[5m]` valid without changing the language fundamentals.

[This is much more tangible than (4) and (5), but not decided yet, so it still asks us to reject a concrete feature request in the name of a possible future feature. And again, there are other ways of implementing this, avoiding the usage of a `.` operator, at the price of not doing the most obvious.]

In summary, (4), (5), and (6) are all more or less vague, but they are close to my heart, as I'm continuously thinking about future improvements of PromQL in particular and Prometheus in general. I should also note that it isn't clear if all three ideas can be combined or if they are actually mutually exclusive. So the argument is not so much "dots will kill three possible features at the same time", but more like "even though those ideas are somewhat vague and possibly mutually exclusive, there are so many of them that it's likely we will implement at least one of them in the not too distant future". (1) is IMHO irrelevant. (2) nicely illustrates a bigger fundamental problem, but the concrete reference to Graphite is mostly historical. Which leaves us with (3) as the most generally applicable argument to be made, at least if we cut out the visionary part (that might just live in my head).

As the final point, I would like to circle back to the very beginning of the mail: The last consensus on `.` was that we want to implement the full UTF-8 support via quoting first and see how it plays out "in the wild". Only then can we see how well it really works (or how badly), and based on that, we can make a much better trade-off about the damage and benefits of introducing `.` as a regular character.
I would propose to do exactly that and wait for a bit longer (hopefully just a few months) rather than pushing for a decision now.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in
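
As an illustration of point (5) above, the following is a minimal Python sketch of the collision problem. It is a toy model under stated assumptions, not how Prometheus or OpenMetrics are implemented: the helper names (`series_names`, `collisions`) and the metric names are hypothetical, and only the three histogram components named in the mail are modeled.

```python
# Components of a classic histogram, as named in point (5).
HISTOGRAM_COMPONENTS = ("bucket", "sum", "count")


def series_names(metric: str, kind: str, sep: str = "_") -> set:
    """Return the series names a metric expands into (hypothetical helper).

    A histogram yields one name per component, joined with `sep`;
    any other kind of metric keeps its own name.
    """
    if kind == "histogram":
        return {metric + sep + component for component in HISTOGRAM_COMPONENTS}
    return {metric}


def collisions(metrics: dict, sep: str = "_") -> set:
    """Return the series names that are produced by more than one metric."""
    seen = set()
    duplicates = set()
    for name, kind in metrics.items():
        for series in series_names(name, kind, sep):
            if series in seen:
                duplicates.add(series)
            else:
                seen.add(series)
    return duplicates


# A histogram plus an unrelated counter whose name happens to end in `_count`.
metrics = {
    "rpc_latency_seconds": "histogram",
    "rpc_latency_seconds_count": "counter",
}

with_underscore = collisions(metrics)        # {"rpc_latency_seconds_count"}
with_dot = collisions(metrics, sep=".")      # set()
```

With `_` as the suffix separator, the counter collides with the histogram's generated count series. With a separator that cannot appear in a regular name (here `.`), no user-chosen name can ever coincide with a generated one.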