Re: [prometheus-developers] The Future of Classic Histograms; Moving to (Custom) Native Histograms?

2024-06-12 Thread Bjoern Rabenstein
Here is my idea for a deprecation plan:

1. On the server side, including PromQL:

A future PromQL server should still be able to recognize classic
histograms when scraped, but only to convert them to native histograms
with custom buckets (NHCB). From that point on (storage, query, remote
write), it should have no notion of classic histograms anymore.

This is also a reason why I think a "reverse transparency" layer to
query NHCB as if they were classic histograms is undesirable. We want
to get rid of the query patterns for classic histograms. (Another
reason is that it will be really hard to implement such a "reverse
transparency" layer reliably.)


2. On the instrumentation side, including exposition formats:

In direct instrumentation, if you need a histogram, you should default
to a native histogram with a standard exponential schema (which
guarantees mergeability across time and space and is compatible with
OTel's exponential histogram). Only if you need custom bucket
boundaries for some reason should you use an NHCB. Generally, I think
the libraries can even keep their existing API for that. If you
instrument a histogram in the classic way, the API doesn't actually
tell you that this will result in a classic histogram. It just asks
you for the bucket boundaries, and then you call `Observe` as you do
for a native histogram, too. Hence, future libraries can just expose
an NHCB in that case.
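
To make this concrete, here is a minimal Go sketch using client_golang
(the metric name and bucket boundaries are made up for illustration;
the point is only that nothing in the instrumented code says "classic"
or "native"):

    package main

    import "github.com/prometheus/client_golang/prometheus"

    // The API only asks for bucket boundaries and for observations. Whether
    // the library exposes this as a classic histogram or, in the future, as
    // an NHCB is entirely up to the library.
    var requestDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name:    "http_request_duration_seconds",
        Help:    "Duration of HTTP requests.",
        Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1, 2.5},
        // For the exponential-schema default, recent client_golang versions
        // would instead set e.g. NativeHistogramBucketFactor: 1.1 and leave
        // Buckets empty.
    })

    func main() {
        prometheus.MustRegister(requestDuration)
        requestDuration.Observe(0.42) // identical call in both cases
    }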

If you translate a histogram from a 3rd party source, you use a
suitable flavor of native histograms as the translation target. In the
unlikely case that the source histogram fits the exponential bucketing
schema, use a regular exponential native histogram. For specific types
of 3rd party histograms (e.g. DDSketch, but there are many more), we
might implement additional schemas of native histograms that directly
accommodate them. And finally, if nothing else fits, you use an NHCB.

The exposition formats of the future should be able to represent all
flavors of native histograms, so that we don't need to expose classic
histograms anymore.

_However_, the existing Prometheus exposition formats are so
ubiquitous by now that I don't think they will ever die. For as long
as technically feasible, Prometheus servers should be able to
understand old exposition formats. Which circles back to the very
beginning: Any Prometheus server should still understand classic
histograms, but convert them into NHCB on scrape.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Limiting the blast radius of OTel / UTF-8 support for normal Prometheus users?

2024-06-05 Thread Bjoern Rabenstein
On 05.06.24 18:07, 'Fabian Stäber' via Prometheus Developers wrote:
> 
> So, is the prefered solution to keep things as they are, i.e. keep
> replacing dots with underscores?

I don't think the purpose of the survey was to find a "preferred
solution". First of all, this is a technical decision, not a
democratic one. And even if it were, an online survey is inherently
biased.

The idea behind the survey was (I hope) to get a broad idea of what
people find surprising or annoying, what they expect, what they like,
... and then we can use those inputs in a responsible fashion to
inform decisions.

> > why allow two different separator characters if they have no
> > semantic difference (no true namespacing).
> 
> This argument seems to resonate with the Prometheus team. If this is the
> main concern, we don't solve it by allowing dots in quotes. We solve this
> by replacing dots with underscores.

As discussed before, this solution has issues because you might run
into name collisions, and it is hard to match a name from one side of
the conversion wall to the corresponding name on the other side.

The previous discussion led to the conclusion that we want to allow
all of UTF-8, because OTel does, but that everything that is not a
valid conventional Prometheus name will require quoting.

We kept open the option of later allowing more characters in the
unquoted names, after we have seen how the quoting goes.

> From the survey it looks like most users prefer the current naming scheme 
> as well:
> 
> [image: screenshot_2024-06-05_18:04:05_908234003.png]
> [image: screenshot_2024-06-05_18:04:14_304430186.png]

The people in the survey were confronted with the various quoting
schemas without any context being provided. This can only give us some idea
about people's gut feeling, but not much more.

> Shall we just drop the idea of adding UTF-8 support?

I don't understand the jump to this conclusion. OTel still supports all
of UTF-8 in names. If somebody names a metric in Chinese or Cyrillic,
we cannot convert it to "__". That's the whole point. We
need UTF-8 support _anyway_. So let's do it and see how it goes before
running the umpteenth reiteration of "can we just allow dots in metric
names".

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



[prometheus-developers] Collected reasons why Prometheus doesn't allow dot as a regular character in metric and label names

2024-05-29 Thread Bjoern Rabenstein
In a recent thread on this mailing list ("Limiting the blast radius of
OTel..."), several people once again suggested that Prometheus should
just allow the dot (`.`) as a regular character in metric and label
names and be done with it. I responded that we have discussed this
topic countless times, always with the result of not doing it
(yet). Of course, we are free to reopen the discussion as often as
anyone wishes (and in fact, one argument in the past was that we
should first introduce full UTF-8 capabilities via quoting and see how
it goes, and then we can still consider "graduating" selected
characters to regular characters that can be used without quoting).

However, the reason for this mail is that I also said that I won't
reiterate all the points made over and over again. After that, an
individual approached me and asked where they could read up about
those points, and I realized that they are hard to find in documented
form. (My vague memory was that I already wrote a mail like this in
the past, but I cannot find it anymore, and the relevant notes from
dev-summits are not detailed and structured enough to serve as a
reference.)

Therefore, I'll reiterate all those points one more time so that we
don't have to do it again in the future. Please amend this list if you
find any omissions. [In this list, I also tried to say something about
the relevance of each point. This is marked by square brackets.]

1. Probably the oldest reason is a plan for a short-form notation of
   the job label. `requests_total{job="api"}` could be written as
   `requests_total.api`. This originates from an ancient internal
   Google practice. [I don't think that this point has any relevance
   anymore. The job label is now considered way less special than
   traditionally. Additionally, the short form would only work if the
   value of the job label follows the same character restrictions as
   names, which would cause confusion for sure when it doesn't.]

2. In the early years of Prometheus, the statsd/Graphite stack was
   very relevant. Dots play a very special role there. In contrast,
   even if we had allowed dots in Prometheus names from the beginning,
   they would just have been characters as all the
   others. Superficially, it would have looked like better
   interoperability, but it would not have lived up to its implied
   promises, because Graphite-style globbing would not have worked,
   the metrics would not have had an actual hierarchy like in the
   Graphite data model etc. [This point is much weaker nowadays
   because most users are probably more familiar with the
   Prometheus-style label based data model than with the hierarchical
   Graphite data model. I wouldn't expect much confusion because of
   that. However, this point still illustrates the fundamental problem
   of turning a character that is part of the actual syntax and
   arguably even a real operator in one system into "just another
   character" in an opaque string in the other system, where the
   syntactic meaning only exists as a convention among humans. This is
   also relevant for some of the other points below.]

3. Naming is a hard problem, as we all know. Many of the early
   Prometheus contributors had rich experience with running complex
   systems at scale. They all got burned by the fact that our brains
   are really bad at remembering if something was named `foo-bar-baz`
   or `foo_bar_baz` or `foo.bar.baz` or `foo/bar/baz` (or even
   `foo_bar.baz`), especially in the heat of fighting an
   outage. Following the "simple, light-weight, opinionated" paradigm
   (once more, many thanks to Julius for expressing it so concisely
   recently), Prometheus decided to have one and only one separator
   character. In addition, this one separator character isn't really
   special in a lot of languages, so names from the Prometheus
   ecosystem would translate into names in other contexts easily
   (initially and practically most relevant for Go templating, but the
   idea works in a much wider scope). (One might come up with the
   counter argument that Prometheus also allows `:` as a
   separator. That's indeed a deviation from the fundamental idea. `:`
   is meant only for rules, but that's just a convention and not
   enforced by syntax. However, it has worked quite well for all those
   years, presumably because people rarely use `:` as a separator
   character by accident.) OTel semantic conventions are the
   antithesis of this: They introduce two different separator
   characters with a slightly different meaning (`.` for "namespaces",
   but they aren't really namespaces, more about that below). And they
   use a character that has a special meaning in a lot of
   languages. (Coming back to the Go templating example:
   `$labels.service_instance_id` is valid,
   `$labels.service.instance.id` is not. It forces you to jump through
   hoops and write `index $labels "service.instance.id"`. Similar
   issues will occur in many other 

Re: [prometheus-developers] Limiting the blast radius of OTel / UTF-8 support for normal Prometheus users?

2024-05-28 Thread Bjoern Rabenstein
I'm trying to keep things short, as all of this had been discussed
at length before.

WRT "how to explain UTF-8 support to users": I actually don't think
this is a huge problem. I would frame it "this is like file
names". You can use blanks and slashes in Unix file names, and if you
do, it requires weird quoting or escaping, but that's not a huge
problem in practice. People just don't use them if they care. And if
they have to interact with other file sources, where blanks are
common, they cope. And yes, that means that names from OTel semantic
conventions will always be considered weird, but that's a problem of
OTel, not all the other languages where a dot has a special
meaning. Segue to the next paragraph...

WRT the dot in OTel semantic conventions: Personally, I'm more
convinced than ever that it was a grave mistake to use dots in the
semantic conventions. I understand the history thereof, but the moment
that OTel self-declared as the overarching standard for all kinds of
telemetry, they should have realized that using a character that has a
special meaning or is even an operator in so many languages is a
really really bad idea. This is not just PromQL specific. Originally,
I thought it was infeasible to change the semantic conventions at this
point, but by now, that's exactly what I think OTel should do. If the
dot were an actual operator in OTel (let's say a separator of actual
1st class namespaces) rather than just a convention within a
technically opaque string, I could see some merit. But as it is not,
it's just annoying and has no benefits whatsoever.

Despite having said all of that, I don't realistically expect that
OTel is going to change the semantic conventions. So next question is
how to deal with it. There are many reasons why it's a bad idea to
allow the dot in Prometheus metric names, most of which weren't
mentioned in this thread. I won't enumerate them all again. We can do
that if we really want to open that can of worms again. Segue to the
next paragraph...

In all the discussions we had before, my impression was that the
consensus (in the spirit of RFC 7282) was to not add the dot to the
characters that don't require quoting. As the saying goes, in OSS, a
"no" is temporary and a "yes" is forever. So we can re-open this
debate as often as anyone wishes. If the result is different at some
point in the future, so be it. It's unlikely that I will change my
mind (in fact, as alluded to above, I'm more convinced than ever that
Prometheus should resist the urge). But that doesn't necessarily
prevent an RFC-7282-style consensus. (Or we could also just have a
vote, like in the old days, although that should be a last resort.)
Despite the opinions expressed so far, I would doubt that I'm the only
one who will be opposed.

Julius has previously described quite nicely how OTel conventions and
practices creep into the Prometheus ecosystem, undermining original
properties of Prometheus as "simple, light-weight, and
opinionated". The whole quoting syntax that opened this thread is for
me a way of allowing what OTel needs but also of containing the damage
and keeping things in the original spirit for normal Prometheus users. Maybe another
thing to include when explaining the syntax to normal Prometheus
users.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



[prometheus-developers] dev-summit this Thursday

2024-05-21 Thread Bjoern Rabenstein
Hi Prometheans,

This is a reminder that we have our monthly dev-summit this week:
  Thursday, 2024-05-23, 15:00 UTC

The link to join is https://meet.google.com/hmj-eyrv-fhr .
If you have trouble joining, please yell in #prometheus-dev on the
CNCF Slack. (We've had trouble in the past with admitting guests to
Google Meets...)

The dev-summit is also on our public events calendar; see
https://prometheus.io/community/ for details on how to access it.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Re: metrics TTL for pushgateway

2024-04-19 Thread Bjoern Rabenstein
On 19.04.24 01:33, John Yu wrote:
> I'm thinking, why can't we deploy an additional prom as an agent to receive 
> data, and then write to the core prom remotely after receiving the data?
> Although I know that this will lead to breaking away from the pull model to 
> a certain extent, it is undeniable that we do have push scenarios in 
> metrics monitoring. Having it seems to be a better solution to push 
> scenarios. At least in my opinion, increasing ttl will be better than pgw. 
> better?

Remote-writing into a vanilla Prometheus server has its own set of
problems, but it's safe to say that it's less of an abuse than using
the Pushgateway to turn Prometheus into a push-based metrics
collection system.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Native Histograms and Pushgateway Backwards Compatibility

2024-04-11 Thread Bjoern Rabenstein
On 10.04.24 11:40, 'Fabian Stäber' via Prometheus Developers wrote:
> 
> I'm wondering about backwards compatibility: What happens if I push a 
> histogram (by default it's both classic and native representation) to an 
> old Pushgateway that does not have native histogram support? If the native 
> histogram representation just gets silently dropped that would be perfect, 
> then I can just do it and not care about backwards compatibility. Is that 
> the case, or could there be compatibility issues with older Pushgateways?

The native histogram support for PGW is tracked in
https://github.com/prometheus/pushgateway/issues/515

IIRC the only thing that is really needed (and what Jan is currently
working on) is essentially UI.

The "backend" support always worked. So if you push a protobuf
message, it will be stored as-is, and also exposed as you would
expect. Even the oldest PGW releases therefore implicitly support
native histograms. If you push a protobuf with classic and native
buckets, both will be stored and both will be exposed.
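
To illustrate, here is a rough Go sketch of such a push with
client_golang (the Pushgateway URL, job name, and metric are made up;
`NativeHistogramBucketFactor` needs a reasonably recent client_golang,
and whether the push actually goes out as protobuf depends on the
library version and its configured exposition format):

    package main

    import (
        "log"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/push"
    )

    func main() {
        // A histogram that carries both classic buckets and native (sparse)
        // buckets in its protobuf representation.
        dur := prometheus.NewHistogram(prometheus.HistogramOpts{
            Name:                        "backup_duration_seconds",
            Help:                        "Duration of the backup job.",
            Buckets:                     prometheus.DefBuckets,
            NativeHistogramBucketFactor: 1.1,
        })
        dur.Observe(33.7)

        // If this reaches the Pushgateway as a protobuf message, it is stored
        // as-is and re-exposed with both representations on /metrics.
        if err := push.New("http://pushgateway.example.org:9091", "backup_job").
            Collector(dur).
            Push(); err != nil {
            log.Fatal(err)
        }
    }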

That's the current understanding. If you encounter any other behavior,
please let us know (maybe in the aforementioned issue).
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Protobuf support

2024-03-06 Thread Bjoern Rabenstein
Disclaimer: I'm not the expert for remote-write/-read, and only do a
little with the protobuf stuff. But since you didn't get an answer
in a week, I'm trying now.

On 29.02.24 20:38, Clément Jean wrote:
> 
> I finally go around to do this and the unit tests seem to be passing. You 
> can find what I did here: 
> https://github.com/Clement-Jean/prometheus-move-to-proto.

Thank you. This looks very unconventional with separate patch
files. Wouldn't it be more straightforward to just fork
prometheus/prometheus and work in the normal git style in that fork?

> I still have few questions:
> - For the benchmarking, should I try running the benchmark on the current 
> version of prometheus and then run it on my version, or is there a better 
> way to do this?

That sounds like the way to go.

> - Is `make bench_tsdb` enough or should I run other benchmarks?

`make bench_tsdb` is a rather special benchmark for certain TSDB
aspects. I doubt it touches protobuf at all.

We usually use the benchmarking framework as built into the Go
toolchain, i.e. you run `go test -bench NameOfBenchmarkToRun`, usually
also using the `-benchmem` flag to see memory allocation stats.

Bryan referred to
https://github.com/prometheus/prometheus/blob/122f9506e9c6/storage/remote/queue_manager_test.go#L872
previously. You would run that benchmark in the storage/remote
directory by typing `go test -bench BenchmarkSampleSend -benchmem`.
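
If you haven't written a Go benchmark before, this is the general
shape (a self-contained stand-in, not the actual BenchmarkSampleSend;
it just marshals a small struct so that it compiles on its own):

    // In some *_test.go file; run with `go test -bench Marshal -benchmem`.
    package example

    import (
        "encoding/json"
        "testing"
    )

    type sample struct {
        Value     float64 `json:"value"`
        Timestamp int64   `json:"timestamp"`
    }

    func BenchmarkMarshal(b *testing.B) {
        s := sample{Value: 1.5, Timestamp: 1709740800000}
        b.ReportAllocs() // report allocation stats, like -benchmem does
        for i := 0; i < b.N; i++ {
            if _, err := json.Marshal(s); err != nil {
                b.Fatal(err)
            }
        }
    }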

> - For forward/backward compat, I'm not sure how I should go about it. Could 
> you clarify a little bit?

It's important that the protobuf messages on the wire are still
encodable and decodable by older and newer versions of the code. So I
believe what Bryan was referring to is that you set up a Prometheus
server with and without your changes and let them send remote write
and remote read to each other.

In different news: On this same mailing list, somebody else
(mircodezo...@gmail.com) is also working on the same topic, see thread
titled "Migrating away from github.com/gogo/protobuf". They have run a
bunch of benchmarks already, and their approach to replace
gogo-protobuf has run into some issues. I suggest you two join forces
and exchange your experiences. If you want a chat-like medium to
discuss things, you should know that most of the dev conversation
happens on the CNCF Slack these days (channel #prometheus-dev). (There
is also a #prometheus-dev on IRC, but I'm afraid Slack has
successfully sucked most devs into its black hole.)

Hope that helps a bit.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Migrating away from github.com/gogo/protobuf

2024-03-06 Thread Bjoern Rabenstein
On 02.03.24 08:00, Mirco wrote:
> Any reason samples are still encoded as a list of pairs? Using two arrays 
> would both reduce the number of objects and allow using varint encoding for 
> timestamps.

I assume you are referring to the protobuf messages for TimeSeries and Sample?

I'm not an expert for remote write (and even less so for remote read),
but I think it's safe to say that a change in the protobuf layout
would mean a new major version of the protocols. v2 is just underway,
so that would require the next major version bump to v3, which would
be a big deal given how widely used the protocol is.

Having said that, at least for remote write, there are usually not a
lot of samples in a TimeSeries message. The most common number is
AFAIK one. Mileage might vary for remote read, but that's also far
less used.

WRT varint: In my understanding of protobuf, varint will always be
used for an int64, even if it is a field in another message.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Migrating away from github.com/gogo/protobuf

2024-02-15 Thread Bjoern Rabenstein
Thanks for doing this.

Beyond benchmarks, two general concerns:

- How unsafe is `enableunsafedecode=true`? I spot-checked the csproto
  code, and the risk seems to be on the side of the user code,
  i.e. luckily there isn't any unsafe input, but I'm wondering how
  easily we'll introduce bugs in that way. What's the gain by using
  the flag vs. not using it? (FTR, the Prometheus code itself uses the
  same trick with `yoloString`, see the sketch below, but that has also
  been frowned upon...)

- How confident can we be that csproto will be consistently
  maintained? It seems to be mostly the work of a single person, and
  it is sponsored by a single company (which has been around for at
  least 13 years and has 7k employees, so it is probably not
  disappearing tomorrow).
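
For reference, the `yoloString` trick mentioned above is roughly the
following (a sketch, not the exact Prometheus code), which also shows
why it is easy to get burned by it:

    package main

    import (
        "fmt"
        "unsafe"
    )

    // yoloString reinterprets a byte slice as a string without copying. It is
    // only safe as long as the byte slice is never mutated while the string
    // is still in use - exactly the kind of invariant that is easy to break
    // by accident.
    func yoloString(b []byte) string {
        return *(*string)(unsafe.Pointer(&b))
    }

    func main() {
        buf := []byte("http_requests_total")
        s := yoloString(buf)
        fmt.Println(s) // http_requests_total
        buf[0] = 'X'   // silently changes s, too
        fmt.Println(s) // Xttp_requests_total
    }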

Not saying these are blockers, just trying to come to an informed
decision.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Re: Call for maintainers: Exporter team

2024-01-25 Thread Bjoern Rabenstein
On 17.01.24 11:28, Matt Doughty wrote:
> I've gotten so much from prometheus that I would be happy to help, but
> I haven't been involved.  Just let me know if there is anything I can
> do.

There is plenty to do.

The best is, as I like to say, if you "can scratch your own itch" -
work on something that you would find useful yourself.

Generally, you can look at the open issues in the various Prometheus
repos. Sometimes we tag good first issues as such (but less often than
we should). Avoid those tagged as "not as easy as it looks".

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



[prometheus-developers] Proposing George Krajcsovits as maintainer for native histogram related code in prometheus/prometheus

2024-01-25 Thread Bjoern Rabenstein
I hereby propose George Krajcsovits AKA Krajo as a maintainer for the
native histogram related code in prometheus/prometheus. Krajo has
contributed a lot of native histogram code, but more importantly, he
has contributed substantially to reviewing other contributors' native
histogram code, up to a point where I was merely rubberstamping the
PRs he had already reviewed. I'm confident that he is ready to be
granted commit rights as outlined for non-team-member maintainers in
the "Maintainers" section of the governance:
https://prometheus.io/governance/#maintainers

See https://github.com/prometheus/prometheus/pull/13466 for the PR
implementing this change, which also includes formalizing my own
maintainership for native histograms.

This proposal is subject to lazy consensus, so please voice any
objections in the next few days.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Re: Call for maintainers: Exporter team

2024-01-17 Thread Bjoern Rabenstein
On 15.01.24 10:00, gitperr wrote:
> 
> Currently, I have some MRs waiting for review/help in different parts of 
> the project.
> https://github.com/prometheus/golang-builder/pull/239
> https://github.com/prometheus/node_exporter/pull/2833

Thanks for your contributions.

The first PR seems to be best reviewed by @SuperQ, whom you have
already mentioned in the PR. It probably fell through the cracks over
the holidays. I have assigned the PR to SuperQ, hoping he'll notice
that. (He should also be reading this mailing list. Last resort would
be to try to catch him on IRC or Slack. But let's give him a few
days.)

The second PR is still marked as a draft, which essentially signals
that you don't want a review yet. Marking the PR ready for review
should (ideally) get you a review soon. If you need some interactive
help to get the PR ready, it might be a good idea to try the developer
channels on IRC or Slack, or add an item to the Contributor Office
Hour agenda. All of this is described in more detail in the
"Contributing" section on https://prometheus.io/community/

Thanks again.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] [Feature/Proposal] Concurrent evaluation of independent rules

2023-11-08 Thread Bjoern Rabenstein
On 28.10.23 04:32, Danny Kopping wrote:
> 
> The feature is hidden behind a feature-flag, but I would argue that we can 
> drop the flag and simply set --rules.max-concurrent-evals=0 as default which 
> is functionally equivalent to not having any concurrency at all (the 
> current behaviour); double opt-in feels unnecessary.

Just a high level note about feature flags: The opt-in part is only
one reason to use a feature flag. The other is that it clearly marks a
feature as experimental. If we just introduced
`--rules.max-concurrent-evals`, people would inevitably use it
assuming it's a stable feature. Now imagine that the whole thing turns
out to be a bad idea and we remove the feature again: those users
would see an unexpected breaking change.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Protobuf support

2023-10-18 Thread Bjoern Rabenstein
On 05.10.23 17:48, Clément Jean wrote:
> I'm not entirely sure yet because I'm new to contributing to prometheus. If 
> there is any use cases that you guys already discussed around Protobuf, I'd 
> be happy to help.

If you are really deep into protobuf, there is definitely one big and
fat issue to solve: We are still using gogo-protobuf in
prometheus/prometheus, which has good performance properties, but is
unmaintained. The plan has been for a while to migrate to another
protobuf implementation that performs similarly well. Here is the
discussion on this mailing list:
https://groups.google.com/g/prometheus-developers/c/uFWRyqZaQis/m/1OOGT7s5AwAJ

And here is a branch that contains a PoC of migrating to the vitess
protoc plugin:
https://github.com/austince/prometheus/tree/feat/drop-gogo
(Note that it is more than two years old, and I'm not sure if the
vitess plugin performs well enough. But it could be a starting point.)

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Protobuf support

2023-10-04 Thread Bjoern Rabenstein
On 03.10.23 18:06, Clément Jean wrote:
> 
> I met with Richard Hartmann at KubeCon Shanghai and he mentioned that the 
> team is interested in adding support for Protobuf. I'm here to see what I 
> can do to help. If you have any recommendation on how to get started, I'd 
> be happy to start contributing.

Thanks for your interest in contributing to Prometheus.

Could you provide more details about what kind of protobuf support you
would like to add? Prometheus already supports protobuf for scraping,
remote write, and remote read.

In general, you could check out issues on the many Prometheus
repositories to get inspiration about what's needed. Ideally, you pick an
issue that also scratches one of your own itches. If it is tagged as
"good first issue", even better. I would avoid the issues tagged as
"not as easy as it looks" until you have gathered more experience.

Cheers,
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] In-person dev-summit this Saturday in Berlin

2023-09-30 Thread Bjoern Rabenstein
The dev summit has started, and here is the link for online
participation: https://meet.google.com/uvm-fcst-bsc

See you there.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



[prometheus-developers] In-person dev-summit this Saturday in Berlin

2023-09-26 Thread Bjoern Rabenstein
Dear Prometheans,

as you might have noticed, this week PromCon is happening in Berlin,
see https://promcon.io .

Following our ancient traditions, there will be an in-person developer
summit on the day after PromCon, which is Saturday (2023-09-30). We'll
start 06:30 UTC (08:30 CEST) with coffee etc. The real work will
commence at 07:00 UTC (09:00 CEST). We'll have time until 15:00 UTC
(17:00 CEST).

Please check out the dev-summit agenda (where you can also suggest new
agenda items):
https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit

If we can make it work at the venue, we'll provide an option to join
online. I'll update this thread the moment I can provide you with a
link, which might be just when the summit starts (or never if we fail
to set it up - sorry for the "best effort" approach, our priority is
the in-person experience).

If you are in Berlin and would like to attend in person, please
contact Bartek Płotka, either on the CNCF Slack ( @bwplotka ) or via
email ( bwplo...@gmail.com ), so that we can add you to the guest list
etc.

Looking forward to seeing you in Berlin later this week,
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] client_golang: updating client_model dependency to 0.4.0

2023-07-18 Thread Bjoern Rabenstein
On 17.07.23 16:47, Daniel Swarbrick wrote:
> 
> But now we get to the ugly stuff. There are several testable examples in 
> the client_golang package, which compare the example code's text marshaled 
> metrics output to the golden "// Output:" text. As you've probably guessed, 
> these now all fail. Even if the golden text were to be updated to the new 
> text marshal format, they would still fail *intermittently*, since the 
> format is non-deterministic. Some of these tests can be fixed by ensuring 
> that the example outputs "compacted" text, using a function such as the 
> compact() 
> function in the legacy protobuf library. Other tests are trickier, because 
> they are comparing entire HTTP response bodies, albeit with a protobuf 
> error message part way through (which are *also* no longer deterministic).
> 
> I've done some initial work in 
> https://github.com/dswarbrick/client_golang/tree/client_model-0.4.0, but 
> would appreciate if anybody has some bright ideas how to handle the 
> testable examples.

Hi Daniel, I think your analysis is spot on. For classical text, we
can do stuff to compare in different ways, but for the Go example
tests, we need text output, so we have to do something pretty
involved. (The current way of writing those example tests was really
driven by pragmatism – use the easiest way to get a more or less
readable and deterministic output, and the latter doesn't work
anymore.) Unfortunately, I couldn't come up with any bright ideas for
an easy way out.
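
For readers who haven't used Go's testable examples: they are the
mechanism Daniel describes. A minimal, self-contained illustration
(nothing client_golang specific):

    // In some *_test.go file. `go test` runs the function and compares its
    // stdout verbatim against the "// Output:" comment. If the printed
    // output is non-deterministic, the comparison fails intermittently.
    package example

    import "fmt"

    func ExampleGreeting() {
        fmt.Println("hello, metrics")
        // Output:
        // hello, metrics
    }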

Thanks for tackling this.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-07-12 Thread Bjoern Rabenstein
On 12.07.23 10:10, E wrote:
> I think optional TTL per time series is a good idea. It might have several
> use cases, it doesn't break anything, and it shouldn't be too hard to make.
> So why not?

Because all the use cases discussed so far have turned out to be
anti-patterns we don't want to support. This topic was brought up
multiple times at dev-summits etc., and the outcome was always the
same.

> I might have used this feature to trigger short-lived alerts with arbitrary
> text in a label, something I wouldn't do without TTL because it would
> require a cleanup.

I don't quite understand that use case, but feel free to flesh it out
a bit more and propose it as a topic for the dev-summit by adding it
to the agenda:
https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit?pli=1
 

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-07-12 Thread Bjoern Rabenstein
On 11.07.23 15:23, 'Braden Schaeffer' via Prometheus Developers wrote:
> They could live for 5s or 1 hour.

The whole idea of a Prometheus counter doesn't really make sense for a
job that lives for just 5s, if you are scraping every 15s or every
minute or so.

And a job that lives for 1 hour should be scraped directly.

So in the first case, using a counter doesn't make sense, and in the
second case using the Pushgateway doesn't make sense.

> Does it really matter what you send to pushgateway?  It supports
> counters so why not push them?

We could be stricter and just reject counters being pushed to the
Pushgateway, but that would be a breaking change. Historically, the
metric type information in Prometheus was (and to a good part still
is) some kind of "weak typing", so no hard restrictions were imposed
(you can apply `rate` to a gauge or `delta` to a counter without
Prometheus complaining about it).

Also, it feels natural to count "records backed up by the daily
database backup job" in a counter and push it to the
Pushgateway. However, when it arrives on your Prometheus server, it
doesn't really behave as a counter. Summing those values up across
instances is really painful with PromQL, and the reason for that is
that we are essentially handling events here, for which Prometheus as
a whole wasn't really designed.

If you really have to use Prometheus for that case, the "least bad"
solutions I know of are statsd with the statsd_exporter (
https://github.com/prometheus/statsd_exporter ) or the
prom-aggregation-gateway
( https://github.com/zapier/prom-aggregation-gateway ).

A TTL doesn't really address the fundamental problem. It might enable
a very brittle solution that is worse than the solutions that are
already available.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-07-01 Thread Bjoern Rabenstein
On 29.06.23 08:47, 'Braden Schaeffer' via Prometheus Developers wrote:
> It's the same as calculating the total incoming request rate of N pods in a
> deployment: sum(rate(grpc_request_count{service=foo}[5m]))

I'm surprised that you seem to push a counter metric to the
Pushgateway.

I would say the intended use case for the Pushgateway is that a
batch job pushes its metrics upon completion. That means you only ever
have one value of those metrics, so a `rate` on those would always
result in zero.

Are you perhaps pushing multiple times during the runtime of your
batch jobs? That would be weird indeed for a PGW use case. Why don't
you just scrape your jobs normally then?

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-06-29 Thread Bjoern Rabenstein
On 14.06.23 13:10, 'Braden Schaeffer' via Prometheus Developers wrote:
> 
> The most basic example, two batch jobs that produce the same metrics (grpc 
> or http metrics). This is not just `last_completed_at` or something as I 
> have seen before where its the same metric being updated over and over 
> agin. You have to include a label that identifies these jobs as different 
> so that metrics like gRPC request rates can be calculated correctly. In the 
> kubernetes world this usually means pod ID. Simple enough until you have 
> 1000s of these pod IDs compounded by other labels.

I don't fully understand what you are trying to do. Could you explain
what metrics you are pushing exactly, and what PromQL expressions you
are using to "correctly calculate a gRPC request rate"?

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] New Prometheus client library for Delphi

2023-03-27 Thread Bjoern Rabenstein
On 26.03.23 07:40, Marco Breveglieri wrote:
> 
> But let's get to the point: is there any path that I could walk to get this 
> client listed in the "Unofficial third-party client libraries" page 
> (https://prometheus.io/docs/instrumenting/clientlibs/) on Prometheus 
> website?

Create a PR against the https://github.com/prometheus/docs repo. (The
file in question is
https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/clientlibs.md
 .)

Thanks for your work on the client library.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Should Alertmanager be more tolerant of templating errors?

2023-02-09 Thread Bjoern Rabenstein
On 07.02.23 05:57, 'George Robinson' via Prometheus Developers wrote:
> 
> While I appreciate the responsibility of writing correct templates is on 
> the user, I have also been considering whether Alertmanager should be more 
> tolerant of template errors, and attempt to send some kind of notification 
> when this happens. For example, falling back to the default template that 
> we have high confidence of being correct.

I think that makes sense. The fall-back template could call out very
explicitly that the intended template failed to expand and therefore
you get a replacement, maybe even with the error message of the
attempt to expand the original template.
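
A minimal Go sketch of that fallback idea (this is not Alertmanager
code, just the general shape using text/template; names are made up):

    package main

    import (
        "bytes"
        "fmt"
        "text/template"
    )

    // The fallback is simple enough that we trust it to always expand; it
    // reports the error of the user template instead of swallowing the alert.
    var fallback = template.Must(template.New("fallback").Parse(
        "ALERT (template error, using fallback): {{.Error}}\nLabels: {{.Labels}}\n"))

    func render(userTmpl string, labels map[string]string) string {
        var buf bytes.Buffer
        t, err := template.New("user").Option("missingkey=error").Parse(userTmpl)
        if err == nil {
            err = t.Execute(&buf, labels)
        }
        if err == nil {
            return buf.String()
        }
        buf.Reset()
        _ = fallback.Execute(&buf, struct {
            Error  error
            Labels map[string]string
        }{err, labels})
        return buf.String()
    }

    func main() {
        labels := map[string]string{"alertname": "HighLatency", "severity": "page"}
        fmt.Print(render("{{.alertname}} is firing\n", labels))  // expands fine
        fmt.Print(render("{{.alert_name}} is firing\n", labels)) // falls back
    }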

But I'm not really an Alertmanager expert. And despite having a lot
of historical context about Prometheus in general, I don't remember
anything specific about error handling in alert templates.

I only remember that trying out an alert "in production" is really
hard since you need to trigger it. And if the moment you notice that
your template doesn't work is also the moment when your alert is
supposed to fire, that's really bad.

So better test tooling might help here, but even if we had that, I
think there should be a safe fall-back so that no alert is ever
swallowed because of a templating error.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Reserved labels in client libraries

2023-02-09 Thread Bjoern Rabenstein
On 08.02.23 09:50, Chris Sinjakli wrote:
> 
> Am I right in thinking that client libraries generally don't reserve any 
> label names outside of ones prefixed with double underscore? I'm curious if 
> this is something that's changed over time in other clients and we've not 
> kept up, or if at some point we went in a different direction from the rest 
> of the libraries.

I don't think anything is strictly reserved. As Bryan Boreham
mentioned on the issue, Prometheus simply prefixes `instance` and
`job` with `exported_` (unless `honor_labels` is set to true), so that
is handled deliberately.

Then there are other labels that have an implied meaning in certain
contexts, like `le` for (conventional) histogram buckets or `quantile`
for pre-calculated quantiles in summaries. But again, there is no hard
rule to not use them anywhere else.

In other words: It's helpful if instrumentation libraries don't let
you use a `quantile` label on a summary or an `le` label on a
histogram, but it's totally fine to use those labels on a counter or
gauge (or `quantile` on a histogram and `le` on a summary).
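
A small client_golang illustration of that (metric names made up; if I
remember the library behavior correctly, the `le` check makes the
histogram case panic when the metric is created):

    package main

    import (
        "fmt"

        "github.com/prometheus/client_golang/prometheus"
    )

    func main() {
        // Fine: "le" has no special meaning on a counter.
        c := prometheus.NewCounterVec(prometheus.CounterOpts{
            Name: "weird_events_total",
            Help: "Shows that an le label is allowed here.",
        }, []string{"le"})
        c.WithLabelValues("whatever").Inc()
        fmt.Println("counter with le label: ok")

        // Refused: "le" would clash with the bucket label of a histogram.
        defer func() {
            if r := recover(); r != nil {
                fmt.Println("histogram with le label refused:", r)
            }
        }()
        prometheus.NewHistogramVec(prometheus.HistogramOpts{
            Name: "weird_duration_seconds",
            Help: "This will be rejected.",
        }, []string{"le"}).WithLabelValues("whatever").Observe(1)
    }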

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



[prometheus-developers] Brainstorming doc about quoting for metric and label names

2023-02-01 Thread Bjoern Rabenstein
Hi Prometheans,

at the last in-person dev-summit in Munich, we talked a lot about how
to allow arbitrary UTF-8 characters in metric and label names (or
maybe just allow a few like "." and "/"...).

We also had some whiteboarding and brainstorming in the breaks, which
isn't reflected in the dev-summit notes. And obviously, there have
been a lot of discussions on various channels, private and public.

What started as an attempt to document the brainstorming that happened
at the dev-summit evolved into a summary of the current state of all
those discussions, as far as I have noticed. Please have a look:
https://docs.google.com/document/d/1yFj5QSd1AgCYecZ9EJ8f2t4OgF2KBZgJYVde-uzVEtI/edit

Cheers,
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Is the +Inf bucket optional?

2023-02-01 Thread Bjoern Rabenstein
On 17.01.23 10:01, 'Fabian Stäber' via Prometheus Developers wrote:
> 
> I'm having trouble figuring out if the +Inf bucket is optional.
> 
> The client_model protobuf says it's optional:
> 
> 
> https://github.com/prometheus/client_model/blob/63fb9822ca3ba7a4ba5184071fb8f2ea000a99ef/io/prometheus/client/metrics.proto#LL71C89-L71C114

Yes, it's optional in the protobuf format.

> The OpenMetrics protobuf says buckets are optional
> 
> https://github.com/OpenObservability/OpenMetrics/blob/1386544931307dff279688f332890c31b6c5de36/proto/openmetrics_data_model.proto#L140

I guess this part of the OM protobuf spec hasn't been checked with the
same scrutiny as most of the OM text format.

> The OpenMetrics specification says "Histogram MetricPoints MUST have one
> bucket with an +Inf threshold.".
> 
> https://github.com/OpenObservability/OpenMetrics/blob/1386544931307dff279688f332890c31b6c5de36/specification/OpenMetrics.md

Yes, that's pretty clear, and I guess the intention is to make as few
moving parts optional as possible.

Before OM, the +Inf bucket was considered redundant as it is
essentially the same as the _count. There is the weird edge case of
NaN observations, where one might argue they should be counted in
_count but not in the +Inf bucket, but this case is of fairly low
practical relevance (and I read the OM spec as "still count them in
the +Inf bucket").  I would assume Prometheus will happily tolerate
the absence of the +Inf bucket during ingestion and simply create it
from the _count value.
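
To spell out that last point, here is an illustrative sketch (not the
actual Prometheus ingestion code) of reconstructing a missing +Inf
bucket from _count:

    package main

    import (
        "fmt"
        "math"
    )

    type bucket struct {
        upperBound float64 // inclusive upper bound of the bucket
        cumulative uint64  // cumulative count of observations <= upperBound
    }

    // ensureInfBucket appends a +Inf bucket derived from the histogram's
    // _count if the exposition did not contain one. (It ignores the NaN edge
    // case discussed above.)
    func ensureInfBucket(buckets []bucket, count uint64) []bucket {
        if n := len(buckets); n > 0 && math.IsInf(buckets[n-1].upperBound, +1) {
            return buckets // +Inf bucket already present
        }
        return append(buckets, bucket{upperBound: math.Inf(+1), cumulative: count})
    }

    func main() {
        // Buckets for le="0.1" and le="1", but no le="+Inf" in the exposition.
        b := []bucket{{0.1, 5}, {1, 9}}
        fmt.Println(ensureInfBucket(b, 12)) // [{0.1 5} {1 9} {+Inf 12}]
    }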

> So, does it depend on the format (required in text format, optional in
> protobuf)?

Yes, that sounds about right.

OM text format is clear about requiring it. OM protobuf is weird, as
you have discovered. The old Prometheus formats (both text and proto)
are blissfully underspecified. For a precise answer, we would need to
double-check what Prometheus is doing upon ingestion. I'm sure that
Prometheus 1.x added the +Inf time series in any case, but Prometheus
2.x might stick closer to what it sees in the text exposition (and
thus not create the +Inf time series if it is missing in the
exposition).

> Bonus question: What about buckets for native histograms, are they optional?

Yes. A native histogram with zero observations will also have zero
buckets.

Which leads to the interesting edge case of how to recognize if a
histogram is meant to be a native one or not if it hasn't received any
observations yet, see
https://github.com/prometheus/client_golang/issues/1127 

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



[prometheus-developers] Some ideas about native histograms and the text format

2023-01-04 Thread Bjoern Rabenstein
Dear Prometheans,

I've just chatted to a few of you about options for the text format
representation of native histograms. Here is a doc with the ideas we
discussed:
https://docs.google.com/document/d/1w6GhLmDYKkkNLsyPWkC3TGhW3Rs1G73j7E_nG7bhlw0/edit?usp=sharing

This is especially relevant if you work on one of the instrumentation
libraries that won't support protobuf any time soon. Maybe the ideas
inspire you. Should you plan to work on this, let's coordinate so that
the Prometheus server will be able to understand what your library
will expose, and so that we don't run into different directions at the
same time.

Enjoy,
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Windows Exporter License

2022-12-20 Thread Bjoern Rabenstein
On 13.12.22 11:19, Julien Pivotto wrote:
> 
> In this sense, I think we should ask for an exception to the GB with the
> following arguments:

Silence means consent, right?

But just in case you are wondering if anyone has read your message, I
think it makes complete sense, and I fully agree with your points.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Changing consensus on HTTP headers

2022-12-06 Thread Bjoern Rabenstein
On 06.12.22 23:15, Julien Pivotto wrote:
> 
> https://github.com/prometheus/prometheus/issues/1724
> 
> Quoting Brian in 2016:
> > The question here is how complex do we want to allow scraping protocol
> > to be, and how complex a knot are we willing to let users tie themselves
> > in via the core configuration? Are we okay with making it easy for a
> > scrape not to be quickly testable via a browser? At some point we have
> > to tell users to use a proxy server to handle the more obscure use
> > cases, rather than drawing their complexity into Prometheus.
> > 
> > As far as I'm aware the use case here relates to a custom auth solution
> > with a non-recommended network setup. It's not unlikely that the next
> > request in this vein would be to make these relabelable, and as this is
> > an auth-related request, per discussion on #1176 we're not going to do
> > that. I think we'd need a stronger use case to justify adding this
> > complexity.
> 
> I do think that Brian's comments on authorization and security are still
> valid, and I don't plan to add headers support to relabeling - such as I
> don't plan to add relabeling for basic auth and other autorisation
> methods.

Thank you very much. Yes, this all makes sense. I.e. no plans for
support via relabeling, but allow users to do their special thing in
special cases via the config, even if that also opens up the
possibility to build a foot gun. (BTW, I'm a fan of clearly
documenting the dragons, so don't just add the config option, but put
a warning sign next to it describing the typical pitfalls, like creating
metric endpoints that are inaccessible to browsers.)

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Y4/bTp4k4mlqSHWg%40mail.rabenste.in.


Re: [prometheus-developers] [VOTE] Promote Windows Exporter as an official exporter

2022-12-06 Thread Bjoern Rabenstein
On 06.12.22 23:05, Julien Pivotto wrote:
> 
> I had two things in mind when calling the actual move to a vote:
> 
> 1. I considered that the Windows exporter had a large community of users
>and that taking the decision in public would be good for the community.
>Some people might object for various reasons, so it made sense to me
>to do it in public.
> 2. I felt that the one week delay would serve us best than the few days
>that lazy consensus allows. This gives a proper date to end the vote
>and people who want to react know that there is a deadline. I don't
>expect us to reach 1/2 of the Prometheus team voting within a week,
>because it's not a majority vote. I wanted this to be time-framed
>somehow.

I would claim that both can be achieved by mailing this list
(prometheus-developers@) with the proposal, stating that feedback is
welcome, and that lazy consensus is assumed if there are no objections
within a week.

In any case, thanks for sharing your reasoning. It helps me to
understand the context.

And maybe I'm just traumatized from the past when we had to use votes
frequently because consensus was impossible to reach.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Y4/ZJmby9WP5Q0WH%40mail.rabenste.in.


Re: [prometheus-developers] [VOTE] Promote Windows Exporter as an official exporter

2022-12-06 Thread Bjoern Rabenstein
YES

For the record:

While I agree with the proposal, I do not think it requires a
vote. I'm mostly mentioning it here so that nobody will see this as a
precedent or a role model that we will require votes from now on on
every new repository in the Prometheus GH org or something.

I think of moving a repository into the Prometheus GH org as a
technical decision. But even if it is considered non-technical, I
don't read the governance as "any non-technical decision needs a
vote". A vote is only needed if a team member "deems it
necessary". Now that might be exactly the case here, but then I would
like to understand why. I do not expect any controversies here. And
even if they happened, I would only consider a vote after they have
shown up, not proactively.

In yet other words: I think voting is mostly meant for formal
decisions (governance changes, team membership changes). Other
decisions should only be voted on as a last resort. If that happens a
lot, it points towards a problem. We have been there, and luckily got
out of it. From that perspective, I would prefer if we did not call
votes lightly.

On 05.12.22 11:44, Julien Pivotto wrote:
> Dear Prometheans,
> 
> As per our governance [1], "any matter that needs a decision [...] may
> be called to a vote by any member if they deem it necessary."
> 
> I am therefore calling a vote to promote Prometheus-community's Windows
> Exporter [2] to Prometheus GitHub org, to make it an official exporter.
> 
> Official exporters are exporters under the Prometheus github org, listed
> as official on Prometheus.io and available under the Downloads page.
> 
> This would provide recognition and credibility to the exporter and its
> contributors, which have provided a large amount of work in the last
> years, and built a huge community.
> 
> It would make it easier for users to find and use the exporter, as it
> would be listed on the Prometheus website and promoted on the other
> official channels - such as our announce mailing list.
> 
> Anyone interested is encouraged to participate in this vote and this
> discussion. As per our governance, only votes from the team members will
> be counted.
> 
> Vote is open for 1 week - until December 12.
> 
> [1] https://prometheus.io/governance/
> [2] https://github.com/prometheus-community/windows_exporter
> 
> -- 
> Julien Pivotto
> @roidelapluie
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to prometheus-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-developers/Y43Lmr2%2Bb2fk8YSz%40nixos.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Y4%2BEYUpvs50o4u1W%40mail.rabenste.in.


Re: [prometheus-developers] Changing consensus on HTTP headers

2022-12-06 Thread Bjoern Rabenstein
On 28.11.22 11:29, Julien Pivotto wrote:
> 
> However, I have crafted a pull request that changes that consensus and
> makes HTTP headers configurable in the common HTTP client, with some
> reserved headers.

For findability: https://github.com/prometheus/common/pull/416

> What does the community & team members think about this?

Personally, I have no strong opinion on this.

However, since we apparently reached a consensus previously not to do
this, could you perhaps remind everyone what the reasoning behind that
consensus was?

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Y4%2BB0vFK0f2qxims%40mail.rabenste.in.


Re: [prometheus-developers] Name for new histograms

2022-11-14 Thread Bjoern Rabenstein
On 13.11.22 21:21, 'Fabian Stäber' via Prometheus Developers wrote:
> 
> I'm struggling with the name "native histograms" while implementing it in
> client_java because it does not imply a good name for the old histograms
> (maybe non-native histograms?).

I usually used the term "conventional histograms". I think it works
well, but it might not age well. (In 10 years, the native histograms
will feel pretty conventional to most users. On the other hand,
perhaps the old histograms are so rarely used them that we won't need
a name for them so often...)

Besides "non-native" (which I think is a fine name), you could call
them "emulated" or "legacy". We can see what name will stick.

> What do you think of "dynamic histograms" for the new ones and "static
> histograms" for the old ones? Or is it too late to open the naming
> discussion?

You can always open the naming discussion (again), but it was already
confusing to change from  "sparse histogram" to "native histograms". I
expect throwing in yet another term will multiply the confusion.

Having said that, I like "dynamic" much more than "sparse", but it
still suffers from the problem that it describes just a subset of the
features, while "native" describes the fundamental change that enabled
all the features that the other names describe ("sparse", "dynamic",
"high-res", "exponential", ...)

The only problem I have with the term "native" is that the old
histograms were already "native" in the old protobuf format, only that
Prometheus never delivered that promise when ingesting.

As a different thought: "static histograms" might still be a good name
for the old histograms, even if the new ones are called "native"
rather than "dynamic".

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Y3Ihf54ALS/goQol%40mail.rabenste.in.


Re: [prometheus-developers] Prometheus client_model protobuf question

2022-11-14 Thread Bjoern Rabenstein
On 04.11.22 22:16, 'Fabian Stäber' via Prometheus Developers wrote:
> 
> Now, I'm not sure how to set the MetricFamily.name field for Counter
> metrics.
> 
> Should I use the name including the "_total" suffix here or without the
> "_total" suffix?

Short answer: Include the _total suffix.

Longer answer: OpenMetrics mandates removing the _total suffix from the
metric family name. That's a breaking change from the original Prometheus
format, both text and protobuf. (Which implies that you need to remove
the suffix when implementing the OpenMetrics protobuf format.) While
the change makes sense from a certain consistency perspective, I
personally think it's not worth the breakage it causes in many cases
(among them the collision of two standard Go metrics exposed by
prometheus/client_golang). It is one of a number of adoption hurdles,
and I would love to see this decision revisited.
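
To make the difference concrete, here is a minimal sketch (assuming the
Go bindings of prometheus/client_model, imported as `dto`) of a counter
MetricFamily following the classic Prometheus protobuf convention, i.e.
with the _total suffix kept in the family name. Under the OpenMetrics
rule, the family name would be "http_requests" instead.

package main

import (
	"fmt"

	dto "github.com/prometheus/client_model/go"
	"google.golang.org/protobuf/proto"
)

func main() {
	// Classic Prometheus convention: the family name keeps the _total suffix.
	mf := &dto.MetricFamily{
		Name: proto.String("http_requests_total"),
		Help: proto.String("Total number of HTTP requests."),
		Type: dto.MetricType_COUNTER.Enum(),
		Metric: []*dto.Metric{{
			Counter: &dto.Counter{Value: proto.Float64(42)},
		}},
	}
	fmt.Println(mf.GetName()) // http_requests_total
}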

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Y3IdZ1oFqVKagR2e%40mail.rabenste.in.


Re: [prometheus-developers] Exemplars for _count in Summaries

2022-10-18 Thread Bjoern Rabenstein
On 06.10.22 14:45, 'Fabian Stäber' via Prometheus Developers wrote:
> 
> Great question from the CNCF Slack: What's the reason why we don't allow 
> Exemplars for _count in Summary metrics?
> 
> What do you think? Any reason why Exemplars don't work in _count in 
> Summaries? Would that be something we could consider supporting?

The _count of a Summary _and_ the _count of a Histogram (both
conventional as well as the new native ones) are each essentially a counter
within the larger "structured" metric of a Summary/Histogram.

From that perspective, it should have the option of attaching an
exemplar, just as a regular Counter has.

My speculation why it doesn't in OpenMetrics:

In an OM Histogram, the +Inf bucket fulfills exactly the same function
as the _count (the spec says: "The +Inf bucket counts all requests."). So
if you would like an exemplar on the _count of a Histogram, you can just
as well use an exemplar on the +Inf bucket.

That obviously doesn't help in the case of a Summary, but I guess the
rationale is that Histograms are generally to be preferred over
Summaries, and Summaries therefore didn't get the thorough treatment when
it came to exemplars.


However, even if you really dislike the precalculated quantiles in
Summaries, there is still the case of a Summary without quantiles. I
think adding exemplars to such a Summary is as much needed as adding
exemplars to any regular Counter.
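
For illustration, a minimal sketch of what already works today for
Histograms in client_golang, assuming its ExemplarObserver interface:
the exemplar ends up on the matching bucket in the OpenMetrics
exposition, and via the +Inf bucket that effectively also covers the
_count, as described above. A comparable hook for the _count of a
Summary is what's missing.

package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	hist := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "request_duration_seconds",
		Help:    "Duration of requests.",
		Buckets: prometheus.DefBuckets,
	})
	prometheus.MustRegister(hist)

	// Histograms implement ExemplarObserver, so an exemplar can be
	// attached to an individual observation. Summaries do not offer
	// an equivalent for their _count.
	hist.(prometheus.ExemplarObserver).ObserveWithExemplar(
		0.42, prometheus.Labels{"trace_id": "abc123"},
	)
}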

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Y06klFkG1yubFGmE%40mail.rabenste.in.


[prometheus-developers] Dev summit this Thursday

2022-09-20 Thread Bjoern Rabenstein
Hi Prometheans,

We finally have a (virtual) dev-summit again this Thursday 2022-09-22,
15:00 UTC, see
https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit

Unfortunately, I have to help out with an all-day event run by my
employer and very likely won't be able to attend. Please feel free to
also discuss the (many) agenda items in the backlog that I have
authored, as long as there are any participants interested in it.

Have fun.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/YynqIrdTsTUjW%2B3O%40mail.rabenste.in.


Re: [prometheus-developers] Re: Prometheus Events and Calendar

2022-09-12 Thread Bjoern Rabenstein
Oops. Various balls got dropped here, including not responding to this
email thread in a timely fashion. I am very sorry.

We had a discussion in the Prometheus team today, and we came up with a plan
to organize these meetings in a more reliable fashion. Stay tuned...

On 24.08.22 22:20, Alolita Sharma wrote:
> +1 I've also tried to join some of the project contributor office hours 
> unsuccessfully. I would like to see some update on the meeting notes or 
> calendar notification to participants if the meeting is going be cancelled. 
> 
> On Wednesday, August 24, 2022 at 6:06:01 AM UTC-7 kwiesm...@google.com 
> wrote:
> 
> > Hey there,
> > I've tried to join a few of the meetings (mainly contributor office hours) 
> > recently and always give up after 5 minutes.
> > Sometimes the document is updated with a cancellation note at some point, 
> > but I wonder if somebody who owns the calendar could cancel the events at 
> > least an hour ahead of time if it's known the meeting won't happen.
> >
> > It makes it a bit hard for potential contributors to know what's going on 
> > and how to talk to the team.
> >
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to prometheus-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-developers/97c35114-30cf-41c9-beb7-2479080c3153n%40googlegroups.com.


-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Yx/AWOykEDzkpmma%40mail.rabenste.in.


Re: [prometheus-developers] Inf buckets in Native Histograms

2022-09-07 Thread Bjoern Rabenstein
On 06.09.22 02:50, 'Fabian Stäber' via Prometheus Developers wrote:
> 
> Looking at client_golang, it seems you can observe math.Inf(), and bucket 
> index math.MaxInt32 is used to represent the Inf bucket.
> 
> https://github.com/prometheus/client_golang/blob/95cf173f1965388665dcb2a28971f35af280e3a5/prometheus/histogram.go#L589-L590
> 
> I'm wondering how to represent the Inf bucket as a BucketSpan in protobuf.
> Initially I set the offset to current index minus previous index, but 
> obviously that doesn't work if the current index is MaxInt32.
> 
> Any ideas?

Yeah, very good question. And definitely something that needs to get
ironed out before coming up with a final spec for Native Histograms.

In practice, I think, observations of ±Inf will be irrelevant. They set
the sum of observations to ±Inf, too (or even to NaN if it was +Inf
before and then -Inf is observed or vice versa), thereby rendering the
sum useless.

My idea so far was to put observations of ±Inf and even NaN in no
bucket at all, let them "ruin" the sum of observations (setting it to
±Inf or NaN as appropriate), and increment the count of observations
as usual. In that way, the difference between observations in buckets
and observations in the count would account for all those
observations. The downside is that you cannot distinguish between the
three types of "weird" observations (+Inf, -Inf, NaN). On the other
hand, I don't think we should add a whole lot of costly plumbing
throughout the stack to store them separately.
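
A toy sketch of that idea (just to illustrate the bookkeeping, not the
actual client_golang implementation, and with the bucket index simply
passed in instead of being derived from the schema):

package main

import (
	"fmt"
	"math"
)

// toyHistogram is a deliberately simplified stand-in for a native histogram.
type toyHistogram struct {
	count   uint64
	sum     float64
	buckets map[int]uint64 // key: bucket index according to the schema
}

// observe puts ±Inf and NaN observations into no bucket at all, but
// still increments the count and lets them "ruin" the sum.
func (h *toyHistogram) observe(v float64, bucketIndex int) {
	h.count++
	h.sum += v
	if math.IsInf(v, 0) || math.IsNaN(v) {
		return
	}
	h.buckets[bucketIndex]++
}

func main() {
	h := &toyHistogram{buckets: map[int]uint64{}}
	h.observe(1.5, 3)           // regular observation
	h.observe(math.Inf(1), 0)   // +Inf: counted, no bucket, sum becomes +Inf
	fmt.Println(h.count, h.sum) // 2 +Inf
}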

From a completionist's perspective, observations of very large
positive or negative numbers should be treated similarly to very small
observations, i.e. by adding an "overflow bucket" (or even two, for
negative and positive observations separately) similar to the zero
bucket we already have.

The reason for not doing it so far is mainly pragmatic: While it is
easy to accidentally create values close to zero (whether from
some calculation or from actual physical measurements), it is far less
likely (but not impossible, of course) to accidentally create numbers
with a very large absolute value of up to ±Inf.

This assumption might not hold, and that's exactly why the Native
Histograms are marked as experimental. We can still correct those
things if needed.

> Not sure if this is covered in client_golang either 
> https://github.com/prometheus/client_golang/blob/95cf173f1965388665dcb2a28971f35af280e3a5/prometheus/histogram.go#L1272-L1280

Yeah, that's weird. I filed
https://github.com/prometheus/client_golang/issues/1131 to investigate
more closely.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Yxjfqp8TaxOuxtjx%40mail.rabenste.in.


Re: [prometheus-developers] When to merge native histograms into main branch?

2022-08-10 Thread Bjoern Rabenstein
On 01.08.22 06:28, Ganesh Vernekar wrote:
> 
> We have come up with a checklist of items with P0-P4 priorities for the 
> native histogram work (See here 
> 
> ).
> 
> The native histogram work currently lives in `sparshistogram` branches in 
> prometheus/prometheus, client_golang, client_model.
> 
> We would like to merge this into `main` branch as an experimental, opt-in 
> feature, after we are done with P0 and P1 items, to get it in the hands of 
> users and get early feedback. Remaining work will follow after that.
> 
> Please let us know if you think something is missing from our checklist, if 
> priority numbers look wrong somewhere, or any concerns around native 
> histograms in general.

Thanks for the feedback so far.

I plan to convert the items from the document above into actual GH
issues over the next days (so that we can assign them to individuals
and report progress, attach PRs, discuss details, ... in a better way
than in a Google Doc). I intend to create a milestone in the relevant
repositories (mostly prometheus/prometheus, but also
prometheus/common, prometheus/client_golang and possible other
instrumentation libraries) to clearly separate the histogram related
issues from the usual flow of issues (e.g. a P0 histogram issue is
very different from a P0 issue in main).

If you have any concerns (especially if you are a maintainer of an
affected repository) please let me know ASAP.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/YvO%2BU2NAFiMNeA76%40mail.rabenste.in.


Re: [prometheus-developers] What if Prometheus to Scape Anything from Anywhere with embedded Zero Trust?

2022-06-14 Thread Bjoern Rabenstein
On 10.06.22 17:48, Rudford Hamon wrote:
> Yes :) What would be the best approach to see adoption and letting the
> community collectively know/try?

I guess you did the right thing already. A web search for "openziti
prometheus" gives tons of relevant results and discussions.

This list (prometheus-developers@) is aimed at the developers of
Prometheus (which seemed appropriate at first as the initial
discussion was around a zitified Prometheus binary). If you are more
interested in talking to _users_ of Prometheus (to help them with the
tunnel sidecar), the sister list prometheus-us...@googlegroups.com
might be a better fit. And there are more community channels, see
https://prometheus.io/community/ .

Pitching a commercial product there is frowned upon, but as long as
you are sticking to an OSS project like OpenZiti, and your posts stay
relevant and to the point, I would assume it's OK to spread the news
via those channels.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Yqhp4%2BgfizIOL080%40mail.rabenste.in.


Re: [prometheus-developers] What if Prometheus to Scape Anything from Anywhere with embedded Zero Trust?

2022-06-08 Thread Bjoern Rabenstein
On 01.06.22 11:28, Rudford Hamon wrote:
>
> For example, OpenZiti has a tunneler embedded with zero trust that
> can be used to connect (upstream or downstream) with a few lines of
> code. Super flexible and doesn't require any changes to your
> binaries. Also, as you mentioned, this will give the Prometheus
> family "end-users" an option to use whatever they feel is best for
> them. At least with the OpenZiti tunneler, the connection will be
> free with layer 7 security on the back-end. Once the Prometheus
> community becomes more familiar with embedded zero trust, the family
> may consider embedding zero trust at the application level so the
> project can scrape anything from anywhere without any exposers for
> both, Prometheus and end-users.

That sounds good. In that way, we can see what the adoption is without
requiring any changes on the Prometheus side.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/YqD1lbZEcITFWx5K%40mail.rabenste.in.


Re: [prometheus-developers] What if Prometheus to Scape Anything from Anywhere with embedded Zero Trust?

2022-05-31 Thread Bjoern Rabenstein
On 23.05.22 15:24, Rudford Hamon wrote:
> 
> So, I am part an open source project called OpenZiti, where you can embed 
> zero trust networking into anything (apps-to-apps, server-to-apps, 
> server-to-server, etc) and be completely invisible while using basic 
> community internet. VPNs, Bastions, or jump servers, including old school 
> firewalls are NOT required. 
> 
> At OpenZiti, we use Prometheus and love the project just as much as 
> everyone else. Since we love embedded zero trust security and scrapping 
> data via Prometheus, we did a "zitification" test to see what the world 
> would look like if Prometheus was able to do its magic with embedded zero 
> trust and be completely invisible and scrape anything/anywhere without 
> inherently risky vulnerabilities. 

I'm not an expert in network security, so please pardon my possibly
imprecise use of jargon, but it sounds to me like OpenZiti is a VPN where
you link the VPN parts directly into the software that uses the VPN.

The Prometheus project traditionally hasn't even tried to address
network security. We decided to delegate it to other components,
partially because Prometheus is already complex enough, partially
because we, as an OSS project, lack capacity and qualification to deal
with the network security aspects. We went as far as even refusing TLS
to be part of Prometheus components. Since TLS is so ubiquitious by
now and essentially seen as part of the network stack, we eventually
decided to support TLS directly rather than asking our users to set up
revers proxies, sidecars, etc. to add TLS support.

The latter gives you an idea what the threshold is where we would
consider linking network security related code directly into the
upstream projects.

Our users have very different approaches how to secure their networks
and how to organize metrics scraping, and I believe that will be the
case for the foreseeable future. (I should mention here that
cross-cluster scraping is considered a rare exception in the general
Prometheus deployment model.) Many might prefer a modular solution
that doesn't require changing all involved binaries with an SDK.

A "zitification" of the upstream Prometheus server (and presumably all
the other components of the Prometheus stack) seems to serve a fairly
niche une case at this moment. You are of course free to offer
"zitified" components, but as long as OpenZiti isn't even remotely as
ubiquitious as TLS, I cannot really imagine 1st class support in the
upstream Prometheus repositories.

That's just my initial thoughts based on a possibly incomplete
understanding of OpenZiti. Happy to hear the thoughts of other
Prometheus developers and of course more explanations from your side.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/YpYDzdEsG8su1qPY%40mail.rabenste.in.


Re: [prometheus-developers] RFC: The implementation of labels.Labels

2022-05-25 Thread Bjoern Rabenstein
On 12.05.22 17:07, Bryan Boreham wrote:
> 
> If there is interest I will make a PR containing just the above change, no 
> change to the structure labels.Labels itself as yet.

I'm definitely interested. ;o)

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Yo5N9oaGkUTL4v1I%40mail.rabenste.in.


Re: [prometheus-developers] OpenMetrics Summary Created Timestamp

2022-05-25 Thread Bjoern Rabenstein
On 24.05.22 15:54, Brian Brazil wrote:
> On Tue, 24 May 2022 at 15:38, 'Fabian Stäber' via Prometheus Developers <
> prometheus-developers@googlegroups.com> wrote:
> 
> >
> > I'm wondering: Is this how the Created timestamp for Summary metrics is
> > supposed to be implemented?
> >
> 
> Unless I messed up, it is implemented correctly in client_java.

And IIRC, it's the same way in client_golang. That's just the way it
is. The pre-calculated quantiles are not very "Promethean", as you
cannot really do anything with them at query time. The count and the
sum in the Summary (and additionally the bucket counts in a Histogram)
can be used for any desired time range (in the `rate` or `increase`
function). They are just different beasts.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Yo5LgGiputmsuPvW%40mail.rabenste.in.


Re: [prometheus-developers] Help the Alertmanager repository

2022-03-22 Thread Bjoern Rabenstein
On 15.03.22 07:32, 'Josh Abreu Mesa' via Prometheus Developers wrote:
> Dear Prometheus Developers,
> 
> Now that I finally have some long-term dedicated time to spend on the 
> Alertmanager, I'd like to know which area(s) do you consider that need help 
> the most?

I have some vague TODO notes here on my side around escaping and matchers:

- I believe `amtool` isn't yet using the new matchers.

- I'm pretty sure that labels in the UI aren't rendered with the
  appropriate escaping. Matchers render better in this regard, but
  some improvements could be done. And the preview of silenced alerts
  seems to have some escaping problems, too.

The above is fairly vague, just quoting my notes, but I guess I could
find out what my exact thoughts were back then. We could sit together
and nail it down.
https://github.com/prometheus/alertmanager/issues/1913 seems to be a
related issue.


A long-time favorite of mine is to make alert groups linkable, so that
you can deep-link from notifications to the alert group that created
them. Relevant issues:
- https://github.com/prometheus/alertmanager/issues/211
- https://github.com/prometheus/alertmanager/issues/868


Finally, there has been a lot of discussion and issues about when
exactly to send out a resolved notification, especially when silences
enter the game. We even discussed it at a recent dev-summit, with the
outcome that we should think more fundamentally about the semantics of
muting in Alertmanager in general, i.e. write a design doc about it
(marked as TODO on
https://prometheus.io/docs/introduction/design-doc/, titled "Semantics
of muting in Alertmanager"). The relevant section in the dev-summits
notes, including links to related issues:
https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit?pli=1#bookmark=id.xlg5vi22bgrw


Happy to meet for more brainstorming and clarifications.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/YjmyDGzzyEeEG0gs%40mail.rabenste.in.


[prometheus-developers] Save the date: in-person dev-summit as part of KubeCon + CloudNativeCon Europe

2022-03-11 Thread Bjoern Rabenstein
Hi Prometheus developers,

Finally, we will have an in-person developer summit again. It will be
a pre-conference event prior to KubeCon + CloudNativeCon Europe in
Valencia, Spain. It will happen Monday, 2022-05-16, 08:00–15:00 CEST
(06:00-13:00 UTC). We will get a room at the conference venue, and a
KubeCon ticket is required to attend. (In the unlikely case that you
are in Valencia at that time but do not plan to attend KubeCon, I hope
we can find a solution. One could be to apply for an in-person
scholarship. But you have to do so soon, the deadline is this Sunday,
see
https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/attend/scholarships/
.)

We'll try to set up a Google meet so that you can also participate
remotely, but that will be best effort. The focus will be on the
in-person experience.

More details will be announced closer to the date.

Note that this is an event for Prometheus developers (rather than
users). It's similar to our monthly online dev-summit, but much
longer. It will hopefully allow deeper discussions and closer
interactions than the online format. We might even have time to break
out into smaller groups for some hacking.

If you cannot wait, remember that we still have the online
dev-summits. One on 2022-03-24 and another one on 2022-04-28. See our
public calendar for details:
https://calendar.google.com/calendar/u/0/embed?src=prometheus.io_bdf9qgm081nrd0fe32g3olsld0%40group.calendar.google.com
 

Take care,
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/YiugXj9t0WF65KV2%40mail.rabenste.in.


Re: [prometheus-developers] Context and Timeouts in client_golang

2022-02-16 Thread Bjoern Rabenstein
Hi Joe,

I think you got the below quite right in general. The most relevant
prior discussion I know of is starting at this GH comment (hidden in
the issue about not using channels in the Collector interface):
https://github.com/prometheus/client_golang/issues/228#issuecomment-475214970

Fundamentally, I think the addition of contexts is straightforward
and indeed very desirable, for all the reasons you listed, and maybe
even more.

The reason why it hasn't happened yet is that it requires changes to
so many parts of the whole library. So we end up with a lot of very
similar almost-duplicates of functions/methods, which pollutes the
namespace, confuses the user, and is generally a design smell (in
particular in interfaces), all because we have to keep backwards
compatibility for this incredibly widely used library.

So the plan was to do it all in the glorious v2 rewrite (for which I
piled up a lot of those changes, i.e. design changes or additional features
that would require weird wrinkles to avoid being breaking changes; see
the v2 milestone:
https://github.com/prometheus/client_golang/milestone/2 ).

However, for various reasons, the progress on the v2 rewrite
stalled. We have new maintainers now (@bwplotka and @kakkoyon on GH),
and it's their call how to proceed here. My gut feeling is that they
would rather add more features to v1, even if that requires wrinkles,
than expedite a v2 rewrite. Perhaps they will follow up
here. Otherwise, try to get in touch with them in some other
way. Maybe file a feature request or start a discussion on GH.
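
Just to make the direction more tangible, here is a minimal sketch of
how a handler wrapper could translate the scrape timeout header into a
context deadline. Note that `withScrapeTimeout` is a made-up name and
not an existing client_golang API; the missing piece is plumbing the
request context from here down through Gatherer/Collector, which is
exactly what your proposal is about.

package main

import (
	"context"
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// withScrapeTimeout reads the X-Prometheus-Scrape-Timeout-Seconds header
// that Prometheus sets on scrapes and attaches a matching deadline to the
// request context, so that context-aware collectors could stop early.
func withScrapeTimeout(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if v := r.Header.Get("X-Prometheus-Scrape-Timeout-Seconds"); v != "" {
			if secs, err := strconv.ParseFloat(v, 64); err == nil && secs > 0 {
				ctx, cancel := context.WithTimeout(r.Context(), time.Duration(secs*float64(time.Second)))
				defer cancel()
				r = r.WithContext(ctx)
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	http.Handle("/metrics", withScrapeTimeout(promhttp.Handler()))
	http.ListenAndServe(":8080", nil)
}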

On 09.02.22 20:05, Joe Adams wrote:
> I would like to propose adding context to the client_golang 
> promhttp.Handler functions, through a new set of functions in the promhttp 
> package. I believe that there was some discussion around this, but not for 
> a long time. 
> 
> The goal of this change is for exporters other metric providers to have an 
> opportunity to understand cancellation. The cancellation could be through 
> the http.Request.Context() being cancelled when the http request is closed 
> from the client, but Prometheus also provides a header 
> "X-Prometheus-Scrape-Timeout-Seconds" on scrape requests. Today, if an 
> exporter has an expensive collector, there is no way to know that the 
> results will be thrown away and that collection can be stopped in the event 
> that the scrape from Prometheus has timed out.[1]
> 
> I have only begun to dig into the work that would be required to propagate 
> the context the whole way down through a Registry and I can already tell 
> that it would not be trivial. The general idea would be to add context 
> versions of promhttp.Handler() and the supporting functions that would pull 
> the context from the http.Request and optionally create a child context 
> with the timeout/deadline from the X-Prometheus-Scrape-Timeout-Seconds 
> header. Downstream, the prometheus.Registry would also need to understand 
> context. I think this may only need an additional GatherCtx() func, but the 
> downstream prometheus.Collector interface would probably need a 
> context-aware counterpart.
> 
> I believe that this is all possible without any breaking changes, but I 
> have not researched enough to know for sure. I want to put this out to the 
> community and maintainers to get some feedback before spending too much 
> time trying to make these changes.
> 
> Joe Adams
> @sysadmind
> 
> 1. https://github.com/prometheus-community/postgres_exporter/pull/558
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to prometheus-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-developers/25a3ef44-6f02-492f-8f6c-28383cd0d6d8n%40googlegroups.com.


-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/Yg0HL0oyZHm3yu5%2B%40mail.rabenste.in.


[prometheus-developers] Brainstorming doc for Histogram JSON format

2022-02-15 Thread Bjoern Rabenstein
Hi devs,

A crucial piece of the puzzle missing in the Sparse Histogram PoC (you
can find it in the
https://github.com/prometheus/prometheus/tree/sparsehistogram branch)
is the representation of the new histograms in the JSON returned by
the Prometheus query API. I created an exploratory document
summarizing my own thoughts and amendments thereof after some
discussions:

https://docs.google.com/document/d/1Efu0LX-fgNWix6ehfeCR0FzeWtHvftWFNoy7cYW9nqU/edit

It's open for commenting, so if you have thoughts to add, please do so.
We are still mostly experimenting, so this is not quite a "proper"
design doc yet. It should merely enable us to get the PoC ready so
that we can play with it, which will then feed back into a detailed
design doc for the "final" implementation (or let's say the one we con
merge into main ;), a similar approach as the one taken for the PromQL
changes.

Enjoy,
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/YgunpT/8%2B0yoqPrW%40mail.rabenste.in.


Re: [prometheus-developers] [VOTE] Rename blackbox_exporter to prober

2022-01-25 Thread Bjoern Rabenstein
NO

Not feeling strongly, actually. In general, I'm not sure if the
renaming itself will have a positive or negative effect (pretty much
along the lines of Marcelo's mail). Adding to that the overhead and
confusion that come with any renaming, I end up with a net negative.

I explicitly don't want to make the call myself if "blackbox" has
negative connotations. From what I know and feel, it doesn't. More on
the contrary. But I think that call has to be made by those who would
be affected if there were negative connotations.

More generally, I think it is a good idea to avoid metaphors in
technical terms where possible, as attractive as they often
appear. It's the nature of metaphors that they can be understood in
various ways, and some of those might be unintended, distracting, or
even insulting. That's why I think the move from "master/slave" to
"primary/secondary" (or similar) is a great idea. However, "blackbox"
is as much a metaphor as "prober".

On 20.01.22 15:41, Julien Pivotto wrote:
> Dear Prometheans,
> 
> As per our governance, I'd like to cast a vote to rename the Blackbox
> Exporter to Prober.
> This vote is based on the following thread:
> https://groups.google.com/g/prometheus-developers/c/advMjgmJ1E4/m/A0abCsUrBgAJ
> 
> Any Prometheus team member is eligible to vote, and votes for the
> community are welcome too, but do not formally count in the result.
> 
> Here is the content of the vote:
> 
> > We want to rename Blackbox Exporter to Prober.
> 
> I explicitly leave out the "how" out of this vote. If this vote passes,
> a specific issue will be created in the blackbox exporter repository
> explaining how I plan to work and communicate on this change. I will
> make sure that enough time passes so that as many people as possible can
> give their input on the "how".
> 
> The vote is open until February 3rd. If the vote comes positive before
> next week's dev summit, the "how" can also be discussed during the dev
> summit, and I would use that discussion as input for the previously
> mentioned github issue.
> 
> -- 
> Julien Pivotto
> @roidelapluie
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to prometheus-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-developers/20220120144119.GA522055%40hydrogen.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/YfAmVFJfmSB%2Bn3HV%40mail.rabenste.in.



Re: [prometheus-developers] Evolving remote APIs

2021-12-01 Thread Bjoern Rabenstein
On 25.11.21 10:35, Fabian Reinartz wrote:
> 
> The point on TSDB becoming more structured is interesting – how firm are 
> these plans at this point? Any rough timelines?

I hope there will be a PoC for histograms in two or three months. It's
hard to estimate how long it will take after that to get to a mature
implementation that can be part of a proper Prometheus release.

But that's only histograms, i.e. changing the hardcoded "every sample
is a timestamped float" to a hardcoded "every sample is either a
timestamped float or a timestamped histogram". My hope is that this
change will teach us how we can go one step further in the future and
generalize handling of structured sample data.

So yeah, it's at least three steps away, and timelines are hard to
predict.
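
To illustrate what "either a timestamped float or a timestamped
histogram" means on the type level, here is a toy sketch (not the actual
TSDB types, just the shape of the change):

package main

import "fmt"

// toyHistogram stands in for the structured histogram value; the real
// thing carries schema, bucket spans, bucket counts, sum, count, etc.
type toyHistogram struct {
	Count uint64
	Sum   float64
}

// sample is the toy union: exactly one of V or H is meaningful.
type sample struct {
	T int64         // timestamp in ms
	V float64       // set for float samples
	H *toyHistogram // set for histogram samples
}

func main() {
	samples := []sample{
		{T: 1000, V: 42},
		{T: 2000, H: &toyHistogram{Count: 3, Sum: 1.5}},
	}
	for _, s := range samples {
		if s.H != nil {
			fmt.Println(s.T, "histogram:", s.H.Count, s.H.Sum)
		} else {
			fmt.Println(s.T, "float:", s.V)
		}
	}
}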

> My first hunch would've been to explore integrating directly at the 
> scraping layer to directly stream OpenMetrics (or a proto-equivalent) from 
> there, backed by a separate, per-write-target WAL.
> This wouldn't constrain it by the currently supported storage data model 
> and generally decouple the two aspects, which also seems more aligned with 
> recent developments like the agent mode.
> Any thoughts on that general direction?

Yes, this would be more in line with an "agent" or "collector"
model. However, it would kick in earlier in the ingestion pipeline
than the current Prometheus agent (or Grafana agent, FWIW) and
therefore would need to reimplement certain parts (while the
Prometheus agent, broadly simplified, just takes things away, but
doesn't really change or add anything fundamental): Obviously, it
would need a completely new WAL and the ingestion into it. It would even affect
the parser because the Prometheus 2.x parser shortcuts directly into
the flat internal TSDB data model.

Ironically, the idea is similar to the very early attempt of remote
write (pre 1.x), which was closer to the scraping layer. Also, prior
to Prometheus 2.x, parsing was separate from flattening the data
model, with the intention of enabling an easy migration to a TSDB
supporting a structured data model.

Back then, one reason to not go further down that path was the
requirement of also remote-write the result of recording
rules. Recording rules act on data in the TSDB and write data to the
TSDB, so they are closely linked to the data model of the
TSDB. In the spirit of "one day we will just enable the TSDB to handle
structured data", I would have preferred to go the extra mile and
convert the output of recording rules back into the data model of the
exposition format (similar to how we did it (imperfectly) for
federation), but the general consensus was to move remote-write away
from the scraping layer and closer to the TSDB layer (which might have
been a key to the success of remote-write).

That same reasoning is still relevant today, and this might touch the
concerns Julien has expressed: If users use Prometheus (or a
Prometheus-like agent) just to collect metrics into the metrics
solution of a vendor, things work out just fine. But if recording (or
alerting) rules come into the game, things get a bit awkward. Even if
we funneled the result of recording rules back into the future
scrape-layer-centric remote-write somehow, it will still feel a bit
like a misfit, and users might think it's better to not do rule
evaluation in Prometheus anymore but move this kind of processing into
the scope of the metrics vendor (which could be one that is
Prometheus-compatible, which would at least keep the rules portable,
but in many cases, it would be a very different system). From a
pessimistic perspective, one might say this whole approach reduces
Prometheus to service discovery and scraping. Everything from the
parser on will be new or different.

As a Prometheus developer, I would prefer that users utilize a much
larger part of what Prometheus offers today. I also see (and always
have seen) the need for structured data (and metadata, in case that
isn't implied). That's why I want to evolve the internal Prometheus
data model including the one used in the TSDB, and to evolve the
remote write/read protocols with it.

That's an idealistic perspective, of course, and similar to the
remote-write protocol as we know it, a more pragmatic approach might
be necessary to yield working results in time. But perhaps this time,
designs could take into account the vision above so that later, all
the pieces of the puzzle can fall into place rather than moving the
vision even farther out of reach.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20211201122425.GS3668%40jahnn.


Re: [prometheus-developers] Evolving remote APIs

2021-11-22 Thread Bjoern Rabenstein
On 18.11.21 16:36, 'Fabian Reinartz' via Prometheus Developers wrote:
> 
> A central issue is that the remote APIs expose the Prometheus storage data
> model. It is notably different from the Prometheus/OpenMetrics
> instrumentation model and discards most of the structure known at scrape
> time.
> Structured data is critical to store and query data more effectively and
> translate it to different underlying storage data models. With the current
> API however the structure is very challenging and sometimes impossible to
> restore.

Thanks for picking this up. These were precisely the concerns when
remote write was sketched out in 2016 – and one of the reasons to mark
it explicitly as experimental. “Sadly” (and also unsurprisingly),
everyone jumped on the experimental specification, and a whole
industry has evolved around it, so that we are essentially required to
go for a v2 to address the concerns now.

I can add from the Prometheus side that things are finally moving
towards storing structured data natively in the TSDB, namely with the
work on the new histograms. I expect that the same work will open up
possibilities for more structured data and also for richer and better
integrated meta-data. The implications for remote-write are twofold:
For one, those changes motivate to change remote-write along with
them. On the other hand, it also enables Prometheus to support a more
structured remote-write protocol in the first place.

(Interestingly, before remote-write, Prometheus had federation, and it
deliberately uses the same format as for scraping. The plan back then
was to “soon” enable the Prometheus TSDB to support all the structure
and meta-data in the exposition format. But that hasn't happened yet,
and federation still exposes all metrics as flat "untyped" metrics
without any meta-data.)

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20211122121647.GT3660%40jahnn.


Re: [prometheus-developers] Feature request: Setting a custom lookback delta for a query

2021-11-22 Thread Bjoern Rabenstein
As a short- to mid-term remedy, I like the idea of a lookback-delta
per query.

Long-term, I would prefer if we could get rid of the lookback delta
altogether. We want richer metadata anyway, and part of it could be
when a series starts and ends and what the configured scrape interval
was. Once we know that, we can create query strategies so that we can
always return the most recent valid sample (if there is any) without
generally killing query performance.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20211122115833.GS3660%40jahnn.


Re: [prometheus-developers] Enable auto merge

2021-11-22 Thread Bjoern Rabenstein
On 16.11.21 14:17, Levi Harrison wrote:
> 
> Recently, I've come across a few instances where auto-merge would have been 
> helpful and was wondering if consensus had been reached here.

My impression was that the discussion dwindled because it became
increasingly unclear what we are even discussing (the term
"auto-merge" seems to be heavily overloaded).

Perhaps you should precisely define what you are proposing when you
say "auto-merge" and then start a new discussion thread with that. Or
have a quick chat with your co-maintainer Julien (assuming you are
talking about enabling auto-merge for prometheus/prometheus) and then
directly call for a consensus.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20211122114329.GR3660%40jahnn.


Re: [prometheus-developers] Updating the Prometheus "Roadmap" page

2021-11-10 Thread Bjoern Rabenstein
On 02.11.21 14:48, Julius Volz wrote:
> 
> As Björn pointed out somewhere, our roadmap page at
> https://prometheus.io/docs/introduction/roadmap/ is pretty outdated. I'd
> encourage everyone who is working on great new roadmap items (sparse
> histograms are an obvious candidate, but maybe something else as well?) to
> incorporate them there so that it reflects reality again.
> 
> Also, at least some of the points there are somewhat stale or semi-done
> (like TLS+auth or adoption of OpenMetrics), but not sure if they are done
> enough to be removed?

I'm not so sure if we should even keep the roadmap page. The design
doc page reflects quite nicely what we are currently working
on. You can even get quite a good impression of which parts are just
vague visions right now and which ones are more concretely sketched
out or even actively being worked on.

And then Julien mentioned some kind of mission statement recently. I
like the idea, as it would let us codify overarching goals on a higher
level than individual technical designs and decisions.

In between those two, perhaps the roadmap is just not needed
anymore. (Or it could be a page quoting that "mission statement" and
then refer to the design doc page.)

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/2020133434.GH3660%40jahnn.


[prometheus-developers] HEADS-UP: Packages will be moved out of prometheus/prometheus/pkg

2021-10-14 Thread Bjoern Rabenstein
Dear Prometheus developers,

On 2021-08-22, the Prometheus dev summit expressed a consensus to
deprecate the `pkg` directory, see
https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit?pli=1#heading=h.80qrixk0sjgv

The `pkg` directory is used in some (but not all) repositories in the
Prometheus GH org as a root directory for some (but not necessarily
all) Go packages in a repository. The discussion of the pros and cons
is nuanced (see the notes and recording of the dev summit), but the
outcome is that we desire consistency between repositories and want to
gravitate towards _not_ using the `pkg` directory. Please do not add
more packages to any existing `pkg` directory, and move packages out
of the directory at your convenience.

We plan to do the latter soon for prometheus/prometheus, and this mail
intends to be a heads-up for those using the affected packages as
libraries from outside the prometheus/prometheus repository.

The packages `gate`, `logging`, `modtimevfs`,  `pool`, and `runtime`
will be moved to the (already existing) `util` directory.

The packages `exemplar`, `labels`, `relabel`, `rulefmt`, `textparse`,
`timestamp`, and `value` will be moved to the (newly created) `model`
directory.

It will happen in this pull request:
https://github.com/prometheus/prometheus/pull/9478
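
For library users, the change boils down to an import-path update. A
minimal sketch, assuming the new locations listed above (please
double-check against the pull request for the final layout):

package main

import (
	"fmt"

	// Old location (before the move):
	//   "github.com/prometheus/prometheus/pkg/labels"
	// New location (after the move, per the list above):
	"github.com/prometheus/prometheus/model/labels"
)

func main() {
	// The package API itself stays the same; only the import path changes.
	ls := labels.FromStrings("job", "node", "instance", "localhost:9100")
	fmt.Println(ls.String())
}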

Apologies for any hassle this may cause and thank you very much for
your understanding,
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20211014115746.GF20744%40jahnn.


[prometheus-developers] Explaratory document: Sparse Histograms and PromQL

2021-10-07 Thread Bjoern Rabenstein
Dear Prometheans,

As you might know, we are busily working on implementing the new
sparse high-res histograms (commonly called “Sparse Histograms”).

Progress can be tracked in the `sparsehistogram` branch of various
repos:
- https://github.com/prometheus/prometheus/tree/sparsehistogram
- https://github.com/prometheus/client_golang/tree/sparsehistogram
- https://github.com/prometheus/client_model/tree/sparsehistogram

We have a PoC implementation of instrumentation, ingestion, storing in
the TSDB, and (raw) retrieval from the TSDB.

We have “run it in production”, at least almost (it was just a dev
cluster, but it was close enough to the real world...), with very
promising results, to be presented at the upcoming PromCon by Ganesh
and Dieter, see https://sched.co/mGK9 .

The missing piece of the puzzle is how to query Sparse Histograms with
PromQL and how to efficiently get the data over into graphing
frontends like Grafana to create awesome high-res heatmaps like the
one we hacked together and tweeted about:
https://twitter.com/_codesome/status/1414483704521498630

These last 1609.34m will be difficult to navigate, so I thought
perhaps I shouldn't start with a design doc but with an “exploratory
document”:
https://docs.google.com/document/d/1ch6ru8GKg03N02jRjYriurt-CZqUVY09evPg6yKTA1s/edit

This is not meant to suggest a particular solution, but to present the
options I see, in some detail, but not down to implementation level. I
hope it will seed a discussion, which might even add more options to
the pool, but should in any case inform us enough to then write a
proper design doc.

So please go ahead, read it and give feedback (comments are open for
everyone).

Thank you very much.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20211007141613.GS9442%40jahnn.


Re: [prometheus-developers] regarding missing some timeseries data differs in millisecond in prometheus

2021-10-07 Thread Bjoern Rabenstein
On 06.10.21 23:18, Prince wrote:
> I am monitoring an application. For which Prometheus server is scraping the 
> matrices at the endpoint along with the timestamp.
> But in prometheus graph It is not getting displayed for timestamps that are 
> differ in millisecond. For eg:
> metric_name{name:"abc"} val1  1633586593322// 2021-10-07T06:03:13.322Z
> metric_name{name:"abc"} val2  1633586593578//2021-10-07T06:03:13.578Z
> metric_name{name:"abc"} val3  1633586593983//2021-10-07T06:03:13.983Z
> metric_name{name:"abc"} val4  1633586594322//2021-10-07T06:03:14.322Z
> metric_name{name:"abc"} val5  1633586595322//2021-10-07T06:03:15.322Z
> 
> In The Prometheus graph, it is not showing the second and third time-series 
> data(the second and third are occurring at the same second as the first one 
> but in millisecond is different).  It is showing the first, fourth, and 
> fifth time-series data.

Note that this question has been crossposted to the prometheus-users
mailing list, see
https://groups.google.com/g/prometheus-users/c/rjeTjwd2Bxs/m/Vu1ulPnMAAAJ?utm_medium=email_source=footer

This question is more appropriate on the prometheus-users mailing
list, so I recommend continuing the discussion over there.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20211007111650.GP9442%40jahnn.


Re: [prometheus-developers] Change the focus of our docs section

2021-10-07 Thread Bjoern Rabenstein
On 05.10.21 11:22, Richard Hartmann wrote:
> 
> Within Docs WG[1] we realized that we're blocking ourselves. We're
> front-loading huge amounts of work which no one of us can
> realistically perform in the site setup. On the content side, we carry
> huge PRs which are always out of date and a pain to review - without a
> public way to find and use them as they're targetting a branch
> deploying to an obscure netlify instance, not prometheus.io.
> [...]
> As such, we would like to propose flipping it around: Optimize docs/
> for new and intermediate users, and do reference at reasonable effort.

I appreciate a clean and concise reference documentation.

But if keeping it perfectly clean is the enemy of some good
incremental doc improvements, I don't mind being less strict about it.

I trust the Docs WG to find the right balance here.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20211007110905.GN9442%40jahnn.


Re: [prometheus-developers] what's experimental .. again

2021-09-16 Thread Bjoern Rabenstein
Thanks, Julien, for bringing this to the mailing list, and apologies
for my late reply.

Despite the long time I needed to reply, only one other reply (by
l.mierzwa) has happened. Not sure if that means there is not much
interest in the topic, or everyone else agrees with Julien.

Anyway, here is my take:

Before we had feature flags, we already had the option of introducing
features declared as experimental and thus not covered by our semantic
versioning.

Before we had feature flags, we already had the option of hiding risky
features or those that introduced additional resource usage behind
config settings or flags (e.g. --storage.tsdb... flags for overlapping
blocks or WAL compression and many more examples). And of course,
there was nothing keeping us from turning breaking changes into features
that needed to be turned on explicitly via a flag or a config setting.

Julien's ideas about feature flags are the following:
> In my vision, and that's how I acted as maintainer, feature flags
> should be used when:
> 
> - We change an existing behaviour. [...]
> - We introduce very risky features, that introduce additional memory /
>   storage requirements.

If that's the case, why did we introduce feature flags at all? Nothing
really changed, right?

I think we introduced feature flags for more reasons, and those were
crucial for the liberating effect on our velocity:

(1) We shied away from experimental features because we got burned by
too many users using experimental features without being aware of
them being experimental and thus being angry at us if we broke
them (or, in reverse, we being reluctant to change an experimental
feature because too many users were already relying on it). And in
fairness, it is hard as a user to keep track of what features are
experimental if we have many of them. Feature flags make it very
explicit to the user if they are using an experimental feature and
which.

(2) We did not want an explosion of flags or config settings to gate
features or behaviors (as had happened in v1.x). Feature flags are
a lightweight alternative (because they are not separate flags but
just a comma separated list, which also implies that we don't have
to keep old flags around as no-ops once the experimental feature
is declared stable).

(3) We often got caught in long-winded discussions if a certain
feature is even desirable, if it perhaps encourages anti-patterns
or discourages best practices, etc. A feature flag is both light
weight but also very explicit that the feature is not yet
recommended/endorsed. It allows us to shortcut the long-winded
discussion and just try something out without throwing our users
under the bus.

(4) Even the question of whether a feature is actually breaking is more or
less hard to answer (obligatory reference: https://xkcd.com/1172/
). Feature flags allow us to postpone that discussion to the point
where we consider graduating a feature to stable.
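
To make point (2) above concrete: enabling several experimental
features stays a single flag with a comma-separated value rather than a
collection of individual flags, e.g. (picking two feature names from
the flags quoted further below):

  prometheus --enable-feature=promql-at-modifier,exemplar-storage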

In sum, it's all about "worry less, use more feature flags". But that
only works if we are liberal with using feature flags. Being
restrictive about the cases when to use feature flags will create a
whole new type of long-winded discussion (whether a particular feature
deserves a feature flag or not), and worse, it might just subtly bring
back all those blockers above (if we consciously or subconsciously
avoid the discussion of whether that feature deserves a feature flag, we are
back to square one).

> In general, I think it does not benefit users to launch Prometheus with
> lots of feature flags. Our users should be able to assess the risk they
> take by using a feature, without always requiring feature flags.
> Especially for relatively small features like atan2. There is no
> intention to drop atan2 in Prometheus 2.x anyway, just we might find a
> better way to call it.
> 
> I try to draw a line between what's a useful feature flag, and where
> just marking experimental in documentation is fine. Prometheus is very
> conservative anyway, and I value the continuity of our features,
> including the "experimental" ones.
> 
> Just to give you an idea, if we had a very strong feature flags policy
> in Prometheus, here is what it could have looked like, based on
> https://prometheus.io/docs/prometheus/latest/stability/#api-stability-guarantees
> 
> --enable-feature=promql-at-modifier
> --enable-feature=expand-external-labels
> --enable-feature=promql-negative-offset
> --enable-feature=remote-write-receiver
> --enable-feature=exemplar-storage
> --enable-feature=body-size-limit
> --enable-feature=relabel-intervals
> --enable-feature=remote-read
> --enable-feature=https-basic-auth
> --enable-feature=web-ui
> --enable-feature=service-discovery-k8s
> --enable-feature=service-discovery-consul
> --enable-feature=remote-write-retry-on-429
> --enable-feature=target-limit
> 
> And that's what I want to avoid.

I think this whole line of argument is a bit of a red herring.

Re: [prometheus-developers] Combining multiple metric to show on one graph in Prometheus

2021-09-16 Thread Bjoern Rabenstein
On 13.09.21 03:20, Prince wrote:
> 
> I want to combine two metrics in such a way that both should be displayed 
> on one graph in Prometheus. For eg:
>  metric_one{label1:value1} 123
>  metrci_two{label2:value2} 345
>   
> I want these metrics should be displayed on one chart in Prometheus.

That's not easily possible in the native Prometheus UI, but you can
use a 3rd party dashboarding tool like Grafana, where this is just
normal.

Having said that, there is an ancient GH issue to add the feature to
the Prometheus UI directly:
https://github.com/prometheus/prometheus/issues/39

In yet different news, this is the mailing list for Prometheus
development. If you have question about using Prometheus, the
Prometheus-users mailing list is much more suitable, see
https://groups.google.com/forum/#!forum/prometheus-users . There are
also other community channels to get support, see
https://prometheus.io/community/

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210916142749.GD3692%40jahnn.


Re: [prometheus-developers] Adding timestamps to Gauge Metric

2021-09-02 Thread Bjoern Rabenstein
On 31.08.21 04:27, Prince wrote:
> 
> for NewMetricWithTimestamp(time.Time,metric)  Is it compulsory that the 
> time should be in UTC?

Short answer: No.

The Go `time.Time` type includes the time zone. It can use any time
zone. The library will then use the capabilities of the Go `time.Time`
type to convert it into Unix-time, as required by the exposition
format, which is independent of time zones.
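
To illustrate, here is a minimal sketch (the metric name, value, and
timestamp are made up; the prometheus.* calls are the actual
client_golang API):

  package example

  import (
      "time"

      "github.com/prometheus/client_golang/prometheus"
  )

  func exampleMetric() prometheus.Metric {
      desc := prometheus.NewDesc(
          "example_last_run_timestamp_seconds",
          "Example gauge exposed with an explicit timestamp.",
          nil, nil,
      )
      m := prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 42)

      // Any time zone works. The timestamp ends up as Unix time in the
      // exposition format either way.
      loc, _ := time.LoadLocation("Europe/Berlin")
      t := time.Date(2021, 8, 31, 13, 27, 0, 0, loc)
      return prometheus.NewMetricWithTimestamp(t, m)
  }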

In different news: Using the Prometheus instrumentation libraries to
instrument your code counts as using Prometheus and should be
discussed on the Prometheus users mailing list, see
https://groups.google.com/forum/#!forum/prometheus-users

This here is the mailing list to discuss development of the various
Prometheus components itself.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210902160551.GP1307%40jahnn.


Re: [prometheus-developers] Deprecating https://github.com/prometheus/nagios_plugins (?)

2021-08-18 Thread Bjoern Rabenstein
On 18.08.21 12:03, Julien Pivotto wrote:
> We should graveyard it. If you are concerned about redirecting people to
> the fork, we should transfer the ownership of the repo.

Yeah, but transferring the ownership will require quite a risky dance
(because the fork has already existed for a while, see my original mail).

In any case, we now have diverging opinions about merely archiving
vs. graveyarding. I'll archive the repo now in any case, because that
doesn't exclude graveyarding later, but "stops the bleeding".

Personally, I'm not very concerned about the redirect, but if we don't
do the redirect, we should graveyard only after a good while (if we
want to do it at all).

Bottom line, I guess: Let's discuss the graveyarding in a few months
time.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210818105417.GT3669%40jahnn.


Re: [prometheus-developers] Deprecating https://github.com/prometheus/nagios_plugins (?)

2021-08-18 Thread Bjoern Rabenstein
On 17.08.21 10:54, Matthias Rampke wrote:
> I think the no-magic route is better. You can also archive the repo[0] to
> make it clear that it's read only (with this GitHub feature, do we still
> need to graveyard anything ourselves?

That's a good idea. I'll do just that.

In general, I think it still makes sense to graveyard repos if we want
to reduce visibility, e.g. because continuing to use the code would be
harmful, but in this case, where you could still use the old code but
should go for the fork for new features etc., archiving seems
precisely the right thing to do.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210818095542.GS3669%40jahnn.


[prometheus-developers] Deprecating https://github.com/prometheus/nagios_plugins (?)

2021-08-13 Thread Bjoern Rabenstein
Hi,

More than a year ago, I added a pointer from
https://github.com/prometheus/nagios_plugins (the "old repo") to its
fork https://github.com/magenta-aps/check_prometheus_metric (the "new
repo"), see https://github.com/prometheus/nagios_plugins/pull/26 .

I've never heard any complaints about the new plugin, so I think it's
about time to properly deprecate the old repo.

First of all: Does anyone have any objections?

Assuming we can go forward with it: What do you think is the best
procedure? Ideally, we would redirect from the old to the new
repo. However, that's not as easy as it looks. So far, I think this
would require the following gymnastics:

- Delete the new repo.
- Transfer the ownership of the old repo to magenta-aps with
  the same name as the (deleted) new repo.
- Replay all the commits that happened in the new repo to the
  transferred repo to make it appear like the new repo before,
  just not as a fork.

Does anyone have a better idea?

And if not, should we really do that or would it be better to apply less
magic, just put a big and fat deprecation warning onto the old repo,
and graveyard it after another half year or so?

Any feedback welcome.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210813155716.GE3669%40jahnn.


Re: [prometheus-developers] How about adding some anomaly detection functions in the promql/function.go file

2021-06-30 Thread Bjoern Rabenstein
On 25.06.21 01:27, Shandong Dong wrote:
> Ok, I will try the PR first. Can I know what‘s the concern of "Personally, 
> I'm still not sure if that's a sustainable approach. "?

We had a handful of requests in the past to add specific advanced
statistics functions. In one case, a function was actually added, see
https://prometheus.io/docs/prometheus/latest/querying/functions/#holt_winters

The problem with the latter is that it was actually not the variety of
Holt-Winters that most people wanted. A lot of misunderstanding
happened because of that. My impression (though I might be proven
wrong) is that this is a rarely used PromQL function. But now we have
to support it at least until the next major release.

That latter problem will be avoided by feature flags. But if each of
the five to ten people who requested new functions now adds on average
two to three new functions, we end up with about 20 new functions, all
with the same potential of being misunderstood. Many
might be overlapping, so any new function needs to be reviewed for
overlap with existing ones. Even if they are all behind feature flags,
they will require a lot of code with potential interaction with
existing code and with each other, so there is some maintenance
overhead.

Eventually, reviewing and acceptance of even more functions behind
feature flags will slow down. So we are back at square one. And the
multitude of experimental functions will make it harder for users to
find the right one to try out. Which in turn will make it harder to
identify the actually generally useful functions and "graduate"
them. Realistically, there will be small groups of users liking
subsets of functions, but rarely a function that either everyone needs
and likes or nobody does.

It feels a bit like the attempt to create a Python interpreter for
data science that doesn't understand modules and instead tries to have
all required functions built-in. That's hardly a reasonable
approach. And that's why my personal idea is that Prometheus either
has to keep stating that it is only meant for basic mathematical
operations on metrics, or it has to provide some kind of "scripting
interface" to allow custom mathematical "libraries" for users with
special requirements.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210630212438.GO11559%40jahnn.


Re: [prometheus-developers] How about adding some anomaly detection functions in the promql/function.go file

2021-06-24 Thread Bjoern Rabenstein
On 24.06.21 00:05, 董善东 wrote:
> hi,all
> In the existing prometheus version, the anomaly detection still relies 
> fully on the rules setting. We find that it is inconvenient to set and hard 
> to maintain in practical use.
> So I propose to add some statistical analysis functions to provide better 
> and stronger AD ability. 

Yeah, that's a frequent request. Unfortunately, there are so many
statistical analysis functions that we can hardly just add them all.

So far, the usual recommendation is to extract data from Prometheus
via the HTTP API and feed it to a fully-fledged statistics tool.

Obviously, that doesn't help you with alerts (which you probably want
to keep within Prometheus).

At the second-to-last dev summit (2021-05-27), we discussed the use
case.

Outcome was the following:
* We want to explore supporting analytics use cases within PromQL behind
  a feature flag
* We are open to wrapping other languages, e.g. R, Fortran, SciPython,
  given an accepted design doc

See also notes here:
https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit?ts=6036b8e0=1#heading=h.sa2f6aem9wdt

So I guess you could just implement the functions you like and put
them into a PR, locked behind a feature flag.

Personally, I'm still not sure if that's a sustainable
approach. Perhaps integrating some scripting engine to allow
user-defined functions might be better. But we'll see…

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210624211449.GC11559%40jahnn.


Re: [prometheus-developers] Requirements / Best Practices to use Prometheus Metrics for Serverless environments

2021-06-24 Thread Bjoern Rabenstein
On 22.06.21 11:26, Tobias Schmidt wrote:
> 
> Last night I was wondering if there are any other common interfaces
> available in serverless environments and noticed that all products by AWS
> (Lambda) and GCP (Functions, Run) at least provide the option to handle log
> streams, sometimes even log files on disk. I'm currently thinking about
> experimenting with an approach where containers log metrics to stdout /
> some file, get picked up by the serverless runtime and written to some log
> stream. Another service "loggateway" (or otherwise named) would then stream
> the logs, aggregate them and either expose them on the common /metrics
> endpoint or push them with remote write right away to a Prometheus instance
> hosted somewhere (like Grafana Cloud).

Perhaps I'm missing something, but isn't that
https://github.com/google/mtail ?

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210624210908.GB11559%40jahnn.


Re: [prometheus-developers] Requirements / Best Practices to use Prometheus Metrics for Serverless environments

2021-06-18 Thread Bjoern Rabenstein
On 15.06.21 20:59, Bartłomiej Płotka wrote:
> 
> Let's now talk about FaaS/Serverless.

Excellent! That's my 2nd favorite topic after histograms. (And while I
provably talked about histograms as my favorite topic since early
2015, I have only started to talk about FaaS/Serverless as an
important gap to fill in the Prometheus story since 2018.)

I think "true FaaS" means that the function calls are
lightweight. The additional overhead of sending anything over the
network defeats that purpose. So similar to what has been said
before, and what Bartek has already nicely worked out, I think the
metrics have to be managed by the FaaS runtime, in the same path as
billing is managed.

And that's, of course, what cloud providers are doing, and it's also a
formidable way of locking their customers into their own metrics and
monitoring system.

And that's in turn precisely where I think Prometheus can use its
weight. Prometheus has already proven that cloud providers can
essentially not get away with ignoring it, and even halfhearted
integrations won't be enough. With more or less native Prometheus
support by cloud providers, it might actually just require a small
step to come to some convention how to collect and present FaaS
metrics in a "Promethean" way. If all cloud providers do it the same
way, the lock-in is gone.

I think it would be very valuable to study what OpenFaaS has already
done: https://docs.openfaas.com/architecture/metrics/

In the simplest case, we could just say: Please, dear cloud providers,
please expose exactly the same metrics for general benefit. If there
is anything to improve with the OpenFaaS approach, I'm sure they will
be delighted to get help. (Spontaneously, I'm missing a way to define
custom metrics, e.g. how many records a function call has processed.)


> * Suggestion to use event aggregation proxy
> * Pushgateway improvements for serverless cases

Despite all of what I said above, I think there _are_ quite a few users
of FaaS who have fairly heavyweight function calls. For them, pushing
counter increments etc. via the network might actually be more
convenient than funneling metrics through the FaaS runtime. This is
then just another use-case of the "distributed counter" idea, which
the Pushgateway quite prominently is not catering for. As discussed
in the thread linked above and at countless other places, I strongly
recommend to not shoehorn the Pushgateway into this use-case, but
create a separate project for it, which would be designed from the
beginning for this use-case. Perhaps
weaveworks/prom-aggregation-gateway is just that. I haven't studied it
in detail yet. In a way, we need "statsd done right". Again, I would
suggest to look what others have already done. For example, there are
tons of statsd users out there. What have they done in the last years
to overcome the known shortcomings? Perhaps statsd instrumentation and
the Prometheus statsd exporter just needs a bit of development in that
way to make it a viable solution.

> I think the main problem appears if those FaaS runtimes are short-living
> workloads that automatically spins up only to run some functions (batch
> jobs). In some way, this is then a problem of short-living jobs and the
> design of those workloads.
> 
> For those short-living jobs, we again see users try to use the push model.
> I think there is room to either streamline those initiatives OR propose
> an alternative. A quick idea, yolo... why not killing the job after the
> first successful scrape (detecting usage on /metric path)?

Ugh, that doesn't sound right. I think this problem should be solved
within the FaaS runtime in the way they prefer. Cloud providers need
billing in any case (they want to make money after all), so they have
already solved reliable metrics collection for that. They just need to
hook in a simple exporter to present Prometheus metrics. See how
OpenFaaS has done it. Knative seems to have gone down the OTel path,
but that could be seen as an implementation detail. If they in the end
expose a /metrics endpoint with the desired metrics for Prometheus to
scrape, all is good. It's just a terribly overengineered exporter,
effectively. (o;

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210618221656.GS3670%40jahnn.


Re: [prometheus-developers] Add metric for scrape timeout

2021-06-09 Thread Bjoern Rabenstein
On 06.06.21 09:56, Christian Galsterer wrote:
> There are metrics for the actual scrape duration but currently there are no 
> metrics for the scrape timeouts. Adding metrics for the scrape timeout 
> would it make possible to monitor and alert on scrape timeouts without 
> hard-coding the timeouts in the PromQL queries but the new metric can be 
> used.

Sounds like a good idea at first glance, but note that this would be
yet another metric that gets automatically added to every single
target. I think we have to be careful when doing so.

Your proposal mirrors a part of the configuration into metrics. That
is sometimes a neat thing to do, but it has to be enjoyed responsibly.

In this case, you want to specifically alert on scrape timeouts (or, I
guess, approaching them). The same argument could be made to alert on
exceeding (or approaching) the sample limit. So we need a new scrape
metric for the `sample_limit` configuration setting, too. The same is
true for all the other limits: `label_limit`,
`label_name_length_limit`, `label_value_length_limit`,
`target_limit`. So we have to add _six_ new metrics. Also, I had a
bunch of situations where I would have liked to know the intended
scrape interval of a series (rather than guessing it from the spacing
I could see in the samples of the series). So yet another metric for
the configured scrape interval. Things are getting out of control
here...

The question is, of course, why you would like to alert on scrape
timeout specifically. There are many possible reasons why a scrape
fails. Generally, I would recommend just alerting on `up` being zero
too often. If that alert fires, you can then check out the Prometheus
server in question and investigate _why_ the scrapes are failing.
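
For illustration, such a generic alerting rule could look roughly like
this (just a sketch; the threshold and window are made up):

  - alert: TargetDown
    expr: avg_over_time(up[15m]) < 0.9
    for: 15m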

Interestingly, we have a metric
`prometheus_rule_group_interval_seconds` for the configured evaluation
interval of a rule group. Note, however, that this is not a synthetic
metric injected alongside the evaluation result of the rule, but only
exposed by the `/metrics` endpoint of Prometheus itself. That's only
one metric per rule group, and it's exposed for meta-monitoring, which
could be on a separate server, so it doesn't "pollute" the normal
metrics.

In summary, I'm pretty sure we shouldn't add half a dozen synthetic
metrics for each target to mirror its configuration into metrics. But
perhaps we could add more metrics for meta-monitoring. Have a look at
the already existing metrics beginning with
`prometheus_target_...`. There is for example
`prometheus_target_scrapes_exceeded_sample_limit_total`, but note that
this is just one metric for the whole server. It's mostly meant to get
a specific alert if _any_ targets run into the sample limit. Perhaps
the same could be done for timeouts as
`prometheus_target_scrapes_exceeded_scrape_timeout`.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210609162547.GO3670%40jahnn.


Re: [prometheus-developers] Alerting rule for gauge metric with new label value

2021-06-09 Thread Bjoern Rabenstein
[Redirecting this to prometheus-users@ and bcc'ing
prometheus-developers@ because it is about using Prometheus, not
developing it.]

On 05.06.21 10:00, karthik reddy wrote:
> 
> 
> Hello Developers,
> 
> Please let me know how to create alerting rule such that, whenever 
> Prometheus scrapes a gauge metric with a new label value from Pushgateway, 
> I need to check that value range and raise an alert if it is out of range.
> 
> For example:
> I want to alert if file_size>100 for newly added files, id is different and 
> random for each file
> file_size{job=”pushgateway”, id=F234} 80 (in GB)
> file_size{job=”pushgateway”, id=F129} 40 (in GB)
> 
> whenever new file_size(job=”pushgateway”, id=F787} 23 is added to 
> Prometheus, I should be check 23>100? and send an alert mail such that, 
> “file with id F787 size exceeded”.

I think you could craft something with `absent` and `offset` so that
the alert only fires if the corresponding time series wasn't there a
certain amount of time ago.
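
As a sketch of that idea (using `offset` together with the `unless`
operator rather than `absent`; the threshold and the one-hour lookback
are made up), the alert expression would only match series that did not
exist an hour ago:

  file_size > 100 unless (file_size offset 1h)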

However, this all smells quite event-driven: Pushing something like an
event to the Pushgateway, then creating a one-shot alert based on that
"event"... Perhaps you are shoehorning Prometheus into something it's
not good at? A Prometheus alert is usually something that keeps firing
for as long as the alerting condition persists. Are files larger than
100GiB suddenly fine once they have been around for a while? (And how
long is that "a while"?)

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210609154423.GN3670%40jahnn.


[prometheus-developers] New maintainers for prometheus/client_golang

2021-06-01 Thread Bjoern Rabenstein
Hi Prometheans,

I'm retiring as a maintainer of prometheus/client_golang, focusing on
other projects within the Prometheus ecosystem (like, who would have
guessed, the new histograms...).

Please welcome the new maintainers:
* Bartłomiej Płotka  @bwplotka
* Kemal Akkoyun  @kakkoyun

If you want to see the two in action talking about client_golang,
click here: https://www.youtube.com/watch?v=LU6D5cNeHks

Cheers.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210601150408.GL2608%40jahnn.


Re: [prometheus-developers] count_values string formatting

2021-05-06 Thread Bjoern Rabenstein
On 06.05.21 16:41, Julien Pivotto wrote:
> On 06 May 14:01, Bjoern Rabenstein wrote:
> > Initially, I intuitively thought we should do what Julien has now
> > proposed, too. However, in the course of the discussion, I then
> > convinced myself that Tristan's approach makes more sense.
> > 
> > So maybe we all have to go through these stages. (o:
> 
> 
> I am splitting the end user experience from the implementation.
> 
> We could pass the labelname + "\0" + formatting  to the aggr function,
> which would simulate Tristan's approach without being complex for the
> user.

I think Tristan's approach is easier for the user.

A separate formatting string suggests that it is, well, a general formatting
string. But we only want a number and a single character. Plus, it's
pretty much a power-user feature, so an additional optional parameter
is a bit too prominent for my taste.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210506145348.GN7498%40jahnn.


Re: [prometheus-developers] count_values string formatting

2021-05-06 Thread Bjoern Rabenstein
Initially, I intuitively thought we should do what Julien has now
proposed, too. However, in the course of the discussion, I then
convinced myself that Tristan's approach makes more sense.

So maybe we all have to go through these stages. (o:
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210506120141.GH7498%40jahnn.


Re: [prometheus-developers] count_values string formatting

2021-05-04 Thread Bjoern Rabenstein
As you can guess from my comments on the issue, I like the
idea. Formatting is a problem if you put a number into a string. And
while I doubt that `FormatFloat` has any guarantees of always
formatting in the same way across Go versions and platforms, at least
we can help the user a bit here.

On 04.05.21 09:38, Tristan Colgate wrote:
> 
> In the github issue referenced about, it has been suggested that
> count_values arguments could be used to control the value format.
> We've suggested allowing a fmt.Printf style (actually
> strconv.FormatFloat), specification to be appended to the label name,
> perhaps via a comma. e.g.
> 
> count_values("le,g.2")

I think that would work. We just need any separator that is not a
legal character of a label name. Which one we pick is a matter of
taste. How about `%` to connect it to the printf-style formatting?

So the format could be "[%[number]char]".

`number` becomes the `prec` parameter of `FormatFloat` (with default
value -1), and `char` becomes the `fmt` parameter of `FormatFloat`.

If nothing is specified, we fall back to `%f`, which is the current
behavior, so we don't change current usage.
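
To make the mapping concrete, a small Go sketch (only
`strconv.FormatFloat` is real here; the parsing of the label-name
suffix would of course still have to be implemented):

  package main

  import (
      "fmt"
      "strconv"
  )

  func main() {
      v := 0.123456
      // A "%2g" suffix would map to fmt='g', prec=2:
      fmt.Println(strconv.FormatFloat(v, 'g', 2, 64)) // "0.12"
      // No suffix keeps the current behavior, fmt='f', prec=-1:
      fmt.Println(strconv.FormatFloat(v, 'f', -1, 64)) // "0.123456"
  }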

> In addition, to support the bucket labels produced for OpenMetrics
> histograms, we could use 'o' to facilitate OpenMetrics compatible
> formatting (this is %g with s potential additional .0 appended to
> exact integer values). 'o' is unused by FormatFloat.

+1

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210504142026.GB2645%40jahnn.


Re: [prometheus-developers] FR/Discuss: Enable non-collection of metrics based on curried labels

2021-03-31 Thread Bjoern Rabenstein
On 30.03.21 17:42, Aaron Gable wrote:
> 
> Suppose I have a `CounterVec` that I call `requestsCounter`, with labels 
> `method` (whose values are only `GET` and `POST`) and `success` (whose 
> values are only `true` and `false`). Maybe I have a set of unittests which 
> implement a purposefully-broken handler, and I'd like to assert that we 
> incremented the counter for unsuccessful requests the appropriate number of 
> times, _regardless_ of whether those requests were GETs or POSTs.

Very generally, I'd break this down, following the philosophy of
letting a unit test just exercise the code it is supposed to test.

With your current approach, you are almost doing an end-to-end test
by invoking the whole `Collect` and `Write` machinery. I guess that's
fine for true end-to-end tests. (See
https://pkg.go.dev/github.com/prometheus/client_golang@v1.10.0/prometheus/testutil
for utilities to write those tests.) In your case, as you said, you
want to know if "we incremented the counter for unsuccessful requests
the appropriate number of times". Ideally, you inject a CounterVec
mock that doesn't really do anything with the metrics, but just
records that the right counter child was retrieved and the appropriate
increments were performed on it. Go doesn't lend itself to this kind
of monkey-patching, and the client_golang library misses good support
for this kind of approach right now, which has to do with certain
design problems that cannot be fixed without a breaking change, see
https://github.com/prometheus/client_golang/issues/230 . But it's
still possible. In your case, your code probably would have to act on
an interface, which is designed in a way that both `CounterVec` as
well as your injected mock are implementing it.
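
A sketch of what that could look like (all names here are made up; only
`*prometheus.CounterVec` and `prometheus.Counter` are the real library
types):

  package example

  import (
      "strconv"

      "github.com/prometheus/client_golang/prometheus"
  )

  // requestCounter is the small interface the handler code depends on.
  // *prometheus.CounterVec satisfies it, and a test can inject a mock
  // that merely records which child was retrieved and how often it was
  // incremented.
  type requestCounter interface {
      WithLabelValues(lvs ...string) prometheus.Counter
  }

  type handler struct {
      requests requestCounter
  }

  func (h *handler) handle(method string, success bool) {
      h.requests.WithLabelValues(method, strconv.FormatBool(success)).Inc()
  }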

> Unfortunately this fails, because the curried `MetricVec` still sends *all* 
> metrics to the `.Collect()` channel, even those that would be excluded by 
> the curried labels.

Yeah, that's another design problem of the library. The excuse is that
currying was introduced relatively late in the design process. In an
ideal world (and in v2 of the library), the curried CounterVec
wouldn't even have a `Collect` method.

You could also follow the "inject a mock" approach half-way and
inject a CounterVec that only contains the relevant metrics instead of
using the full CounterVec with all the other metrics plus currying.

> The only real solution that I see here is to perform the currying ourselves 
> -- rather than passing a curried `MetricVec` into the `AssertMetricEquals` 
> function, pass a set of `Labels` into it and do the filtering ourselves 
> based on the contents of `iom.Label`. This would be functionally the same 
> as the "very hacky and weird" workaround in #834.

Yes, in case you want to stick with the "let's collect the metrics and
inspect the protobuf" approach, that's probably the most
straightforward way to go.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210331203601.GU2627%40jahnn.


Re: [prometheus-developers] [VOTE] Allow environment variable expansion on external label values

2021-03-26 Thread Bjoern Rabenstein
On 25.03.21 23:06, Julien Pivotto wrote:
> Hereby I am calling a vote to allow the expansion on environment
> variables in the prometheus configuration file.
> Because it can be seen as an override of a previous vote[1], I am calling a
> new vote for this specific part.
> 
> The consensus in the dev summit is:
> 
> We will allow substitution of ENV variables into label values in the
> external_label configuration block only, behind an experimental feature
> flag.

YES

And thanks, Julien, for catching this. We should not forget that the
discussions and consensus finding at the dev-summit, as useful as they
are, are informal decisions and in no way comparable to a formal vote.

Which is the perfect opportunity to apologize for my needlessly
aggressive reaction after Richi emphasized the weight of a dev-summit
consensus. What triggered me was the memory of multiple occasions
where I made an effort to bring novel and nuanced technical arguments
to the table, only to get shot down by “That is not what we decided at
the dev summit.”  Which is especially painful if it was “decided” at a
dev summit where I wasn't even present.

> The vote is open for a week (until April 2nd), or until we have 9 ayes or 9 
> noes.
> Any Prometheus team member is eligible to vote[2].
> 
> 1: 
> https://groups.google.com/g/prometheus-developers/c/tSCa4ukhtUw/m/J-j0bSEYCQAJ
> 2: https://prometheus.io/governance/

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210326184356.GN2773%40jahnn.




[prometheus-developers] Prometheus dev summit TODAY

2021-03-25 Thread Bjoern Rabenstein
Hi Prometheans,

Sorry for the very late notice. The idea was to send out this
announcement earlier.

We have the March developer summit today, at 15:00 UTC (i.e. in 30m).
Use https://meet.google.com/hmj-eyrv-fhr to join the Google Meet.
Or even better, have a look at our public calendar, which will always
show you public Prometheus events, even if we forget to announce them
here:
https://calendar.google.com/calendar/u/0/embed?src=prometheus.io_bdf9qgm081nrd0fe32g3olsld0%40group.calendar.google.com

And if you cannot make it, don't despair. We'll record the event and
publish it on Youtube.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210325143507.GG2773%40jahnn.


Re: [prometheus-developers] Add collector for database/sql#DBStats

2021-03-23 Thread Bjoern Rabenstein
On 23.03.21 14:31, Julien Pivotto wrote:
> 
> I find what we do with Java quite nice:
> 
> https://github.com/prometheus/client_java

Could you be more specific what you are referring to?

> Unfortunately for golang, it will be more difficult for at least 2
> releases, the time for go mod lazy loading to come and make its youth
> sicknesses. let's not forget client_golang is one of the most used
> go libraries out there.

Is this meant as a concern that adding more collectors will blow up
the number of Go modules involved?

In case of the proposed collector for the database/sql, there won't be
any new dependencies, so that wouldn't be a concern.

Or am I reading you wrong here?


In any case, I get the following from the responses so far:

- Yes, it would be fine to have more collectors for more or less
  standardized things in client_golang. We do have something similar
  in client_java already.

- We could put those into a `collectors` package or similar. I
  actually wanted to do that for the existing collectors, but I
  couldn't move most of them with the current package layout as that
  would create circular dependencies. So we could also put them into
  the normal `client_golang/prometheus` package, given that we have
  already four other `New...Collector` constructors. Please discuss!
  (o:

- For the SQL collector specifically, whoever implements it should
  look through the three known implementations so far (and perhaps
  find out if there are even more) and distill the best design and
  feature set out of them. (Note that we might also just want to add a
  collector for core features and point power users to a more specific
  implementation, cf. how `NewBuildInfoCollector` points to
  https://github.com/povilasv/prommod .)
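
To make the last point a bit more concrete, here is a minimal sketch of
what such a collector could look like (the names and the two example
metrics are made up; the real thing would cover all of `sql.DBStats`
and still needs the design work described above):

  package collectors

  import (
      "database/sql"

      "github.com/prometheus/client_golang/prometheus"
  )

  type dbStatsCollector struct {
      db           *sql.DB
      maxOpenConns *prometheus.Desc
      openConns    *prometheus.Desc
  }

  func newDBStatsCollector(db *sql.DB, dbName string) *dbStatsCollector {
      labels := prometheus.Labels{"db_name": dbName}
      return &dbStatsCollector{
          db: db,
          maxOpenConns: prometheus.NewDesc(
              "go_sql_max_open_connections",
              "Maximum number of open connections to the database.",
              nil, labels,
          ),
          openConns: prometheus.NewDesc(
              "go_sql_open_connections",
              "The number of established connections, both in use and idle.",
              nil, labels,
          ),
      }
  }

  func (c *dbStatsCollector) Describe(ch chan<- *prometheus.Desc) {
      ch <- c.maxOpenConns
      ch <- c.openConns
  }

  // Collect reads database/sql's DBStats on every scrape and turns them
  // into const metrics.
  func (c *dbStatsCollector) Collect(ch chan<- prometheus.Metric) {
      stats := c.db.Stats()
      ch <- prometheus.MustNewConstMetric(
          c.maxOpenConns, prometheus.GaugeValue, float64(stats.MaxOpenConnections))
      ch <- prometheus.MustNewConstMetric(
          c.openConns, prometheus.GaugeValue, float64(stats.OpenConnections))
  }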

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210323194554.GV2773%40jahnn.


Re: [prometheus-developers] Prometheus builds: Try Earthly?

2021-03-18 Thread Bjoern Rabenstein
On 17.03.21 11:43, Vlad A. Ionescu wrote:
> 
> Not sure if this is the right place for this question. Wondering if anyone
> is interested in trying https://github.com/earthly/earthly for the
> Prometheus build.
> 
> Earthly could help with reproducing CI failures locally (via containers)
> and for performing multiple isolated integration tests in parallel.
> 
> It works well on top of Circle CI.

The Prometheus project has its own organically grown build system, see
https://github.com/prometheus/promu , and a rather elaborate CircleCI
setup. Not saying that's a perfect solution, but any new solution
needs to meet that bar, plus justify the effort and friction of
changing by some added relevant value.

From that perspective, perhaps the first step should be to clarify
what would improve and what would change, and then convince
stakeholders that they actually want and need the improvements.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210318154321.GE2773%40jahnn.


[prometheus-developers] PromCon CfP deadline this Friday! (was: PromCon 2021 virtEUl CfP now open)

2021-03-01 Thread Bjoern Rabenstein
Hi all,

In contrast to what was announced previously (see below), the deadline
for the PromCon online 2021 will be this Friday, 2021-03-05.
See https://promcon.io/2021-online/submit/ .

On 09.02.21 14:42, Richard Hartmann wrote:
> Dear all,
> 
> * CfP runs until 2021-04-02
> * Event co-hosted with KubeCon EU in week of May 3rd
> * We shall make the best of the online-only format
> 
> https://docs.google.com/forms/d/1mKYcdSIw02Dq_uP2gKOiHoOnOQr4W5jwXqdT_Zsk41Y
> 
> 
> Best,
> Richard
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to prometheus-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-users/CAD77%2BgS189V9BZ2aDJG4E_2c16SDXYZ51zWLNFZrjM2kvS%2BwUQ%40mail.gmail.com.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210301190728.GN2754%40jahnn.


[prometheus-developers] Configuration: Should we generally enable reading secrets from files?

2021-02-18 Thread Bjoern Rabenstein
Hi Prometheans,

Container orchestration platforms like Kubernetes offer secrets
management. K8s provides those secrets directly to the Kubelet, or via
environment variables, or as files in a volume that containers can
mount, see
https://kubernetes.io/docs/concepts/configuration/secret/#overview-of-secrets
for details.

Good arguments have been made why secrets in environment variables are
problematic. In the Prometheus ecosystem, we have mostly converged on
using files in the scenario described here. That works just fine for
the password of HTTP basic auth, the bearer token, TLS certificates,
and probably more. However, there are a bunch of secrets in config
files (in particular for Prometheus itself and for the Alertmanager)
that _must_ be provided in the config file itself. (Search for
`<secret>` in the documentation of a config file to find all secrets.)
If you want to leverage the K8s secrets management for those, you have
to jump through hoops, i.e. set up an init container that creates a
config on the fly before starting the actual Prometheus or
Alertmanager binary.
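
For illustration, this is the existing pattern where we do allow it,
sketched for the HTTP basic auth password in a scrape config (the path
and names are made up):

  scrape_configs:
    - job_name: "example"
      basic_auth:
        username: "admin"
        # Instead of an inline `password: ...`, the secret can come from
        # a file, e.g. a mounted Kubernetes secret volume:
        password_file: /etc/prometheus/secrets/basic-auth-password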

My inner minister for consistency tells me we should either allow all
secrets to be provided in a file or none. My inner minister for user
experience tells me we can hardly make users jump through those hoops
for the secrets where we currently allow files.

So what do you think about generally providing a `xxx_file` config
option where we currently just allow `xxx`? There are a lot
of those, but maybe it's the way to go?

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210218144952.GF2747%40jahnn.


[prometheus-developers] RFC: better histograms for Prometheus

2021-02-10 Thread Bjoern Rabenstein
Hi Prometheans,

tl;dr: I have just published a draft of a design document about
better histograms for Prometheus and would appreciate your feedback:
https://docs.google.com/document/d/1cLNv3aufPZb3fNfaJgdaRBZsInZKKIHo9E6HinJVbpM/edit?usp=sharing
 


As many of you might know, I have been (in-)famously working on
histograms for Prometheus from the very beginning of the Promethean
era. As Goutham has recently found out, I even mentioned histograms as
my favorite Prometheus topic during the very first conference talk
about Prometheus ever! (To be precise: It was SRECon Europe on
2015-05-14, during the Q&A, when no less than Brendan Burns asked
about the topic.)

What we currently have in Prometheus was only ever a prototype, at
least from my perspective. (o:

In an ideal world, I would have sat down back in 2015 and created the
document linked above. Too many distractions by other interesting or
even urgent things, I guess. To get the whole narrative, you could
check out my recent "histogram trilogy of talks" (which will also give
you the gist of the design document):

https://fosdem.org/2020/schedule/event/histograms/
https://promcon.io/2019-munich/talks/prometheus-histograms-past-present-and-future/
https://www.youtube.com/watch?v=HG7uzON-IDM

The bad news is that even after all those years, most of the work
still has to be done. Every layer of the Prometheus stack has to
change, which needs a coordinated effort. That's precisely the reason
why I created the design document, which you could also call an
RFC. After collecting your feedback, I hope to be able to evolve it
into something we can agree on as the way forward, serving as a master
plan to align the many detailed efforts that will have to follow.

I hope you will enjoy the read, at least somewhat...
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210210182124.GP2725%40jahnn.


Re: [prometheus-developers] Docker images from scratch

2021-02-01 Thread Bjoern Rabenstein
On 31.01.21 17:32, Ben Kochie wrote:
> Another option is we could fully build our own busybox binary, with the
> necessary fixes.
> 
> I'm somewhat in favor of going distroless. With a large number of users
> using our container images in Kubernetes, it's less necessary to include
> busybox, as they can attach userspace sidecar containers.

I guess distroless would also simplify the question of how to include
all required licenses, simply by requiring a whole lot less of them.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20210201140259.GE3906%40jahnn.


Re: [prometheus-developers] Enable auto merge

2020-12-22 Thread Bjoern Rabenstein
On 22.12.20 17:17, Julien Pivotto wrote:
> Approve and auto merge are different. Auto merge is another value of the merge
> button, next to squash etc.

OK, that's different then. I have no objections to an additional
button that essentially says "merge this once the tests have passed,
and, if applicable, the necessary approval has been given".

(There are too many auto-merge scripts and actions floating around.)

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20201222170649.GA29629%40jahnn.


Re: [prometheus-developers] Enable auto merge

2020-12-22 Thread Bjoern Rabenstein
On 16.12.20 21:33, Julien Pivotto wrote:
> 
> Can we enable the new github feature, auto-merge, in prometheus
> repositories?
> 
> It waits for everything to be green before merging.

Auto-merge assumes that all tests being green and one valid approval mean
"please merge". But I don't think that's true. I often approve a PR to
express "looks good to me but others might still chime in". That could
be the maintainer of the repo (or some other person specifically
qualified to review the PR). Those people should have the final call.

Or in other words: Having approval and merge as separate
human-initiated steps models the semantics just right, IMHO.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20201222155935.GL17627%40jahnn.


Re: [prometheus-developers] Multiple metrics path for Prometheus

2020-12-14 Thread Bjoern Rabenstein
There is probably some nuance in arguing if and when this is a good
idea and when not.

But in fact, the famous
https://github.com/kubernetes/kube-state-metrics is doing it. It's not
using different paths but different ports, which is kind of similar.

On the Prometheus side, however, you need separate scrape
targets. There is currently no way of "iterating" through multiple
ports or paths of a target. From Prometheus's point of view, a
different port, a different path, or a different host all amount to
the same thing: they define a different target. (And that probably
won't change anytime soon.)
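
For illustration -- a minimal sketch with made-up hostnames, ports, and
paths -- "separate targets" in the config would look something like
this: one scrape config per path, one target entry per port:

  scrape_configs:
    - job_name: 'app-default-path'
      metrics_path: /metrics
      static_configs:
        - targets: ['app.example.com:8080', 'app.example.com:8081']
    - job_name: 'app-other-path'
      metrics_path: /other-metrics
      static_configs:
        - targets: ['app.example.com:8080']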

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20201214183220.prbv7alrfzainisr%40jahnn.


Re: [prometheus-developers] Lazy consensus: Merging options

2020-12-03 Thread Bjoern Rabenstein
On 03.12.20 14:15, Ben Kochie wrote:
> I'd like to adjust our defaults for GitHub merging settings:
> 
> Right now, we allow all three modes for PR merges.
> * Merge commits
> * Squash merging
> * Rebase merging
> 
> Proposal: Remove rebase merging (aka fast-forward merges) so that we stick to
> merge/squash and merge.

Clearly, merge commits and squashing happen most often.

I can see the occasional need for a rebase.

Even though I am a firm advocate of avoiding commit rewriting whenever
reasonably possible, I do think we need to keep the other options
around.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20201203174707.GI3432%40jahnn.


Re: [prometheus-developers] converting prometheus.Metric to data suitable for ingesting by prometheus write protocol.

2020-12-03 Thread Bjoern Rabenstein
On 02.12.20 02:15, Alexey Lesovsky wrote:
> I have metrics received from collector over channel like "chan<-
> prometheus.Metric".
> The 'prometheus.Metric' is an interface with Desc and Write methods from '
> github.com/prometheus/client_golang/prometheus' package.
> 
> I'd like to send these metrics into another Prometheus using its remote write
> protocol. My question is how to convert received 'prometheus.Metric' to
> 'prompb.Timeseries'
> I tried to play with Write method but had no success.

Those are really very different things. The types in the
github.com/prometheus/client_golang/prometheus package are for
instrumenting code and exposing metrics to be scraped by Prometheus
servers or other scrapers that understand the Prometheus exposition
format.

The Prometheus remote write protocol is a completely different beast
and used by the Prometheus server to send metrics to remote storage.

The two even have a very different data model: Structured and typed
metrics in the former case, just flat timestamped floating point
numbers in the latter. Single-point-in-time representation in the
former case, a notion of time series in the latter.
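
That said, if you really want to bridge the two anyway, here is a rough
sketch (my own improvisation, not a supported conversion path; names
like toTimeSeries or example_gauge are made up). The idea is to gather
the metrics via a prometheus.Registry into dto.MetricFamily protobufs
and then build prompb.TimeSeries from them by hand. Only gauges and
counters are handled below; a histogram or summary would each fan out
into several flat series, which is exactly the mismatch described
above:

  package main

  import (
      "fmt"
      "time"

      "github.com/prometheus/client_golang/prometheus"
      dto "github.com/prometheus/client_model/go"
      "github.com/prometheus/prometheus/prompb"
  )

  // toTimeSeries converts gauge and counter metrics from gathered
  // MetricFamilies into remote-write style TimeSeries. Histograms and
  // summaries are skipped in this sketch.
  func toTimeSeries(mfs []*dto.MetricFamily, tsMillis int64) []prompb.TimeSeries {
      var out []prompb.TimeSeries
      for _, mf := range mfs {
          for _, m := range mf.Metric {
              var v float64
              switch {
              case m.Gauge != nil:
                  v = m.Gauge.GetValue()
              case m.Counter != nil:
                  v = m.Counter.GetValue()
              default:
                  continue
              }
              labels := []prompb.Label{{Name: "__name__", Value: mf.GetName()}}
              for _, lp := range m.Label {
                  labels = append(labels, prompb.Label{Name: lp.GetName(), Value: lp.GetValue()})
              }
              out = append(out, prompb.TimeSeries{
                  Labels:  labels,
                  Samples: []prompb.Sample{{Value: v, Timestamp: tsMillis}},
              })
          }
      }
      return out
  }

  func main() {
      reg := prometheus.NewRegistry()
      g := prometheus.NewGauge(prometheus.GaugeOpts{Name: "example_gauge", Help: "made-up example"})
      reg.MustRegister(g)
      g.Set(42)

      mfs, err := reg.Gather() // []*dto.MetricFamily
      if err != nil {
          panic(err)
      }
      fmt.Println(toTimeSeries(mfs, time.Now().UnixMilli()))
  }

Note that this loses the type and help information and collapses
everything into flat float samples -- which is the point I was trying
to make above.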

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20201203173740.GH3432%40jahnn.


Re: [prometheus-developers] Delta usage issues?

2020-10-14 Thread Bjoern Rabenstein
On 13.10.20 01:22, linux...@gmail.com wrote:
> I want to get the difference between the current time and the past 5 minutes,
> but I tried two methods and couldn’t get it
> 
> 1. delta(isphone{name="qq",exname!~"test|test1"}[5m]) 
> 
> 2. sum_over_time(isphone{name="qq",exname!~"test|test1"}[5m]) - sum_over_time
> (isphone{name="qq",exname!~"test|test1"}[5m])offset 5m )
> 
> If I execute sum_over_time(isphone{name="qq",exname!~"test|test1"}[5m])
> directly, the data can be displayed normally, but I can’t get the difference
> between now and five minutes ago. Can anyone have a way? ?

I'd say the `offset` has to modify the selector directly:

  sum_over_time(isphone{name="qq",exname!~"test|test1"}[5m])
-
  sum_over_time(isphone{name="qq",exname!~"test|test1"}[5m] offset 5m)

In different news, this question is more a fit for the
prometheus-users mailing list:
https://groups.google.com/forum/#!forum/prometheus-users

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20201014142931.GH3395%40jahnn.


Re: [prometheus-developers] pushgateway new release to address jquery css vuln. when?

2020-10-01 Thread Bjoern Rabenstein
On 23.09.20 10:48, Don450 wrote:
> My question is, when will the next release of  pushgateway?
> https://coderelease.io/github/repository/prometheus/pushgateway  
> 
> The need is to address security concern jquery < 3.5.0 (pushgateway v1.2.0
> release has jquery-3.4.1) CSS vuln.
> 
> This change has already been merged into master (updated to jquery-3.5.1)
> https://github.com/prometheus/pushgateway/commit/
> 3056a39317756d7225dbb1c88765e83091915211 

AFAIK, the Pushgateway doesn't use any of the vulnerable
functionality, so I wanted to batch up the next release with other
changes. Those never really materialized, and now it's 6 months since
the last release. I'll just cut a release today.

Thanks for the reminder.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20201001104450.GC29792%40jahnn.


Re: [prometheus-developers] Prometheus Persistence Volume(Kubernetes)

2020-09-23 Thread Bjoern Rabenstein
On 21.09.20 00:27, prabin...@gmail.com wrote:
> Memory Consumed by prometheus is keep on increasing day by day . Though Number
> of  Targets are same.

Yes, Prometheus uses as much RAM as possible for mmap'ing, making your
queries faster.

> What is is the persistence volume of prometheus , or how we can evaluate the
> same and limit our memory consumtion.

You don't have to. When the OS needs the RAM, it will simply take it
away from Prometheus.

There is a certain amount of memory that Prometheus needs to exist. If
you don't have enough for that, it will OOM.

You can look at
https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
, although a lot has been optimized over the last 1.5 years since that
article was published.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200923145747.GH29792%40jahnn.


Re: [prometheus-developers] Remove /api/v2 of Prometheus

2020-08-27 Thread Bjoern Rabenstein
On 27.08.20 16:40, Julien Pivotto wrote:
> 
> Do we want to add in the 2.21 release notes that we will remove it in
> 2.22?

Good idea. But make clear that this is a weird API that probably next
to nobody is actually using. Along the lines of: "In the unlikely case
that you use the HTTP API v2 (list the endpoint), please note that we
will remove this experimental API in the next minor release 2.22."

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200827144238.GK2356%40jahnn.


Re: [prometheus-developers] Docker Hub's upcoming 100 pulls per user/IP limit

2020-08-27 Thread Bjoern Rabenstein
On 26.08.20 15:58, Bartłomiej Płotka wrote:
> Quay does not plan anything like this, so we could go for quay if we want.

I hope the Quay plans won't change once most of the universe has
switched from Docker Hub to Quay and the resulting huge traffic
increase attracts the attention of some people higher up the ranks at
Bluehat...

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200827144003.GJ2356%40jahnn.


Re: [prometheus-developers] Docker Hub's upcoming 100 pulls per user/IP limit

2020-08-27 Thread Bjoern Rabenstein
On 26.08.20 12:07, Julius Volz wrote:
> For a start, I filled out dummy answers (without submitting) to get to see all
> the subsequent pages of the application form. You can see all the questions
> they ask here:
> 
> https://docs.google.com/document/d/123fdfSGk5_tjdXAE0G1CeVcIpMBy9JBthwG0lwYjMXc
> /edit?usp=sharing
> 
> I could fill it out to the best of my ability, but want to give people a 
> chance
> to see the questions in case they have opinions on some of them.

The questions seem mostly harmless.

What I'm wondering more is whether the open-source plans will actually
allow anonymous or free-riding users to pull our images without
limits. I got the impression that Docker Hub is not so much trying to
limit how often a particular image is pulled (or images from a
particular project), but rather how often a particular user or IP
address pulls any images.

If their plan is to "incentivize" users to create a paid user account,
they won't be keen on allowing unlimited access to images from popular
open-source projects (because those are probably the most frequently
pulled images in the first place).

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200827143734.GI2356%40jahnn.


Re: [prometheus-developers] Optimizing Histogram Buckets Format

2020-06-22 Thread Bjoern Rabenstein
Hi Bashar,

Thanks for your thoughts and ideas.

I more or less agree with all of them, namely:

- We need sparse histograms.

- Cumulative buckets, despite some tactical advantages, are
  problematic for sparse histograms, while on the other hand a bit of
  math can always emulate dropping buckets or allow Apdex calculation
  (see the sketch right below this list).

- We require a new type of histogram in the exposition format that is
  incompatible with the existing format.

- A repetitive representation of buckets in the exposition format is
  problematic, and becomes more problematic with more buckets. And no,
  compression isn't solving that magically.
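
(To make the "bit of math" in the second point concrete -- just a
sketch: with non-cumulative buckets, the count of observations <= b is
simply the sum of all bucket counts whose upper bound is <= b, and an
Apdex score with target T then falls out as

  apdex = ( count(<=T) + ( count(<=4T) - count(<=T) ) / 2 ) / total

so dropping the cumulative representation loses nothing here.)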

I really want histograms to be cheap enough so that they can be
partitioned at will (by status code, path, ...) while still maintaining
a high resolution.

Your approach goes several steps towards this goal.

BUT (and here comes the big "but") it will not go far enough. What we
need, even with sparse histograms, is a histogram implementation that
is efficient enough to support hundreds of buckets in a single
histogram at a cost comparable to or even lower than what we have to
pay now for our existing ~10 bucket histograms. I expect that to
require quite invasive changes not only to the exposition format but
also to the way we store histograms in the TSDB and ultimately how we
represent and process them in PromQL.

Now you could say, why not iterate and slowly approach the goal. That
would be totally fine with experimental software, and I can only
encourage you to play with your approach in an experimental fork. But
we cannot really have those incremental changes in the mainline
Prometheus releases as people will use them in production and then
require backwads compatible support. We cannot really have dozens of
mutually incompatibly ways of dealing with histograms in the released
Prometheus components.

That's why I've been experimenting for a while. I'm currently writing
up a design doc suggesting a plan for the changes we need throughout
the stack. It will not be a precise and perfect solution, but it will
sketch out the direction along which we can then work together towards
a solution. It will take a while before things have stabilized enough
to have them in the regular Prometheus releases. And that's a shame
because in the meantime, people are left with the existing solution
for their production uses – or they can go down the path of adopting
one of the experimental, half-baked solutions (of which there are more
than just yours) to solve their most pressing problems, at the price
of incompatibility with the future "proper" solution.

I'm currently very focused on getting that design doc done because it
will set the stage for further discussions and the foundation of an
informed decision on which way to go.

Stay tuned, I'll publish it here on this list, hopefully very soon.
-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200622210954.GT3365%40jahnn.


Re: [prometheus-developers] [VOTE] Allow Kelvin as temperature unit in some cases

2020-06-02 Thread Bjoern Rabenstein
Quoting the governance: “A vote may be called and closed early if
enough votes have come in one way so that further votes cannot change
the final decision.”

This vote is about the following proposal: “Allow Kelvin as a base
unit in certain cases and update our documented recommendation and the
linter code accordingly.”

We have now 11 YES votes vs. 1 NO vote with 8 team members not having
voted yet, so that I call the vote and close it early. The proposal
has passed.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200602151526.GP2326%40jahnn.


Re: [prometheus-developers] [VOTE] Allow listing non-SNMP exporters for devices that can already be monitored via the SNMP Exporter

2020-05-29 Thread Bjoern Rabenstein
On 28.05.20 21:30, Julius Volz wrote:
> 
> I therefore call a vote for the following proposal:
> 
> Allow adding exporters to https://prometheus.io/docs/instrumenting/exporters/
>  although the devices or applications that they export data for can already be
> monitored via SNMP (and thus via the SNMP Exporter). This proposal does not
> affect other criteria that we may use in deciding whether to list an exporter
> or not.

YES

It would obviously be better if those exporter listing decisions would
"just work" with best judgement and we didn't need to vote about
individual guidelines. However, the discussion in
https://github.com/prometheus/docs/pull/1640 circled back to the SNMP
Exporter argument multiple times. The single person on the one side of
the argument explained their concerns, they were considered, but
failed to convince. With the room leaning so obviously to the other
side, one might ask why that circling back had to happen. The vote can
help here to prune at least one branch of the meandering
discussion. In particular with the often used reasoning that "that's
how we did it before", it's good to know if perhaps "that's not how we
want to do it in the future".

Having said that, I do believe that we should have a more fundamental
discussion about revising "our" criteria of accepting exporter
listings. My impression is that the way it is done right now doesn't
represent our collective intentions very well. Even worse, I am fairly
certain that the process is partially defeating its purpose. In
particular, instead of encouraging the community to join efforts, we
are causing even more fragmentation. Which is really tragic, given how
much time and effort Brian invests in the review work. Kickstarting
such a discussion has been on my agenda for a long time, but given how
my past attempts to move the needle went, it appeared to be a quite
involved effort, for which I'm lacking the capacity. (Others told me
similar things, which reminds me of the "capitulation" topic in
RFC7282, where people cease to express their point of view because
"they don't have the energy to argue against it". Votes, like this
particular one, might then just be an attempt to get out of the many
branches and loops created by persistently upholding objections that
most of the room considers addressed already.)


-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200529150058.GS2326%40jahnn.


Re: [prometheus-developers] Temperature histograms (was: Allow Kelvin as temperature unit in some cases)

2020-05-29 Thread Bjoern Rabenstein
On 29.05.20 14:42, Julien Pivotto wrote:
> 
> That said, I don't see why a fridge company couldn't have SLA on fridges
> temperature? (99.95% below -21°C; 99.99% below -18°C).

It would still be a gauge. And then you run `quantile_over_time` to
measure your SLO.

Note that you don't need a histogram to sample observations made in
Prometheus scrapes. You need a histogram if the monitored target
itself has to sample observations.
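
For the hypothetical fridge SLA above, that could look something like
this (metric name and window are made up):

  quantile_over_time(0.9995, fridge_temperature_celsius[28d]) <= -21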

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/20200529125720.GR2326%40jahnn.

