Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-07-12 Thread Bjoern Rabenstein
On 12.07.23 10:10, E wrote:
> I think optional TTL per time series is a good idea. It might have several
> use cases, it doesn't break anything, and it shouldn't be too hard to make.
> So why not?

Because all the use cases discussed so far have turned out to be
anti-patterns we don't want to support. This topic was brought up
multiple times at dev-summits etc., and the outcome was always the
same.

> I might have used this feature to trigger short-lived alerts with arbitrary
> text in a label, something I wouldn't do without TTL because it would
> require a cleanup.

I don't quite understand that use case, but feel free to flesh it out
a bit more and propose it as a topic for the dev-summit by adding it
to the agenda:
https://docs.google.com/document/d/11LC3wJcVk00l8w5P3oLQ-m3Y37iom6INAMEu2ZAGIIE/edit?pli=1
 

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-developers+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-developers/ZK6TOzCck1MSMuW0%40mail.rabenste.in.


Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-07-12 Thread Bjoern Rabenstein
On 11.07.23 15:23, 'Braden Schaeffer' via Prometheus Developers wrote:
> They could live for 5s or 1 hour.

The whole idea of a Prometheus counter doesn't really make sense for a
job that lives for just 5s, if you are scraping every 15s or every
minute or so.

And a job that lives for 1 hour should be scraped directly.

So in the first case, using a counter doesn't make sense, and in the
second case using the Pushgateway doesn't make sense.
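
A rough sketch of the first case, assuming scrapes happen at a fixed
interval and the job can only be observed while it is running. The
function name and its parameters are made up for illustration; this is
not Prometheus code.

```python
def scrapes_during_lifetime(lifetime_s, interval_s, first_scrape_offset_s=0):
    """Count scrapes at offset, offset + interval, ... that happen while
    the job is alive, i.e. at a time t with 0 <= t < lifetime_s."""
    count = 0
    t = first_scrape_offset_s
    while t < lifetime_s:
        count += 1
        t += interval_s
    return count


# A 5s job scraped every 15s is seen once at best, and not at all if the
# first scrape happens to land after the job has already exited.
print(scrapes_during_lifetime(5, 15))      # 1
print(scrapes_during_lifetime(5, 15, 7))   # 0

# A 1h job scraped every 15s yields plenty of samples.
print(scrapes_during_lifetime(3600, 15))   # 240
```

With at most one sample there are never two points to compute a rate
from, while the one-hour job produces hundreds of samples when scraped
directly.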

> Does it really matter what you send to pushgateway?  It supports
> counters so why not push them?

We could be stricter and just reject counters being pushed to the
Pushgateway, but that would be a breaking change. Historically, the
metric type information in Prometheus was (and to a good part still
is) some kind of "weak typing", so no hard restrictions were imposed
(you can apply `rate` to a gauge or `delta` to a counter without
Prometheus complaining about it).

Also, it feels natural to count "records backed up by the daily
database backup job" in a counter and push it to the
Pushgateway. However, when it arrives on your Prometheus server, it
doesn't really behave as a counter. Summing those values up across
instances is really painful with PromQL, and the reason for that is
that we are essentially handling events here, for which Prometheus as
a whole wasn't really designed.

If you really have to use Prometheus for that case, the "least bad"
solutions I know of are statsd with the statsd-exporter (
https://github.com/prometheus/statsd_exporter ) or the
prom-aggregation-gateway
( https://github.com/zapier/prom-aggregation-gateway ).
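
The difference between overwriting and aggregating pushes can be
sketched as follows. This is a conceptual illustration only, assuming
that an aggregating gateway adds up counter values pushed with
identical labels. The class names are hypothetical, and none of this is
code from the Pushgateway or from either project above.

```python
from collections import defaultdict


class OverwritingStore:
    """Keeps only the most recent value pushed for a label set."""

    def __init__(self):
        self.series = {}

    def push(self, labels, value):
        self.series[frozenset(labels.items())] = value


class AggregatingStore:
    """Adds every pushed value to the running total for a label set."""

    def __init__(self):
        self.series = defaultdict(float)

    def push(self, labels, value):
        self.series[frozenset(labels.items())] += value


labels = {"job": "db_backup", "metric": "records_backed_up"}
key = frozenset(labels.items())

overwriting = OverwritingStore()
aggregating = AggregatingStore()
for records in (100, 250):  # two runs of the same batch job
    overwriting.push(labels, records)
    aggregating.push(labels, records)

print(overwriting.series[key])  # 250
print(aggregating.series[key])  # 350.0
```

In the overwriting case, the first run's count is lost unless every run
gets its own label value, which is where the ever-growing set of series
comes from.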

A TTL doesn't really address the fundamental problem. It might enable
a very brittle solution that is worse than the solutions that are
already available.

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-07-12 Thread E
I think optional TTL per time series is a good idea. It might have 
several use cases, it doesn't break anything, and it shouldn't be too 
hard to make. So why not?
I might have used this feature to trigger short-lived alerts with 
arbitrary text in a label, something I wouldn't do without TTL because 
it would require a cleanup.


--
Best regards,
Evgeniy Yunkin

On 11/07/2023 22:23, 'Braden Schaeffer' via Prometheus Developers wrote:
> They could live for 5s or 1 hour.  Does it really matter what you send
> to pushgateway? It supports counters so why not push them?
>
> A TTL is all we need here.
>
> On Sat, Jul 1, 2023 at 5:32 PM Bjoern Rabenstein wrote:
>
> > On 29.06.23 08:47, 'Braden Schaeffer' via Prometheus Developers wrote:
> > > It's the same as calculating the total incoming request rate of N pods in a
> > > deployment: sum(rate(grpc_request_count{service=foo}[5m]))
> >
> > 🤔 I'm surprised that you seem to push a counter metric to the
> > Pushgateway.
> >
> > I would say the intended use case for the Pushgateway is that a
> > batch job pushes its metrics upon completion. That means you only ever
> > have one value of those metrics, so a `rate` on those would always
> > result in zero.
> >
> > Are you perhaps pushing multiple times during the runtime of your
> > batch jobs? That would be weird indeed for a PGW use case. Why don't
> > you just scrape your jobs normally then?
> >
> > --
> > Björn Rabenstein
> > [PGP-ID] 0x851C3DA17D748D03
> > [email] bjo...@rabenste.in



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-07-11 Thread 'Braden Schaeffer' via Prometheus Developers
They could live for 5s or 1 hour.  Does it really matter what you send to
pushgateway? It supports counters so why not push them?

A TTL is all we need here.

On Sat, Jul 1, 2023 at 5:32 PM Bjoern Rabenstein wrote:

> On 29.06.23 08:47, 'Braden Schaeffer' via Prometheus Developers wrote:
> > It's the same as calculating the total incoming request rate of N pods in a
> > deployment: sum(rate(grpc_request_count{service=foo}[5m]))
>
> 🤔 I'm surprised that you seem to push a counter metric to the
> Pushgateway.
>
> I would say the intended use case for the Pushgateway is that a
> batch job pushes its metrics upon completion. That means you only ever
> have one value of those metrics, so a `rate` on those would always
> result in zero.
>
> Are you perhaps pushing multiple times during the runtime of your
> batch jobs? That would be weird indeed for a PGW use case. Why don't
> you just scrape your jobs normally then?
>
> --
> Björn Rabenstein
> [PGP-ID] 0x851C3DA17D748D03
> [email] bjo...@rabenste.in
>



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-07-01 Thread Bjoern Rabenstein
On 29.06.23 08:47, 'Braden Schaeffer' via Prometheus Developers wrote:
> It's the same as calculating the total incoming request rate of N pods in a
> deployment: sum(rate(grpc_request_count{service=foo}[5m]))

🤔 I'm surprised that you seem to push a counter metric to the
Pushgateway.

I would say the intended use case for the Pushgateway is that a
batch job pushes its metrics upon completion. That means you only ever
have one value of those metrics, so a `rate` on those would always
result in zero.
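
A small sketch of why that is, assuming a heavily simplified `rate`
that only divides the increase between the first and last sample by the
elapsed time (the real PromQL function also handles counter resets and
extrapolation). The function name and the sample data are invented for
illustration.

```python
def simple_rate(samples):
    """Per-second increase between the first and last sample.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Returns None if there are fewer than two samples.
    """
    if len(samples) < 2:
        return None
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# The batch job pushed 42 once on completion. Every later scrape of the
# Pushgateway (here every 15s for 5 minutes) returns that same value.
pushed_once = [(t, 42.0) for t in range(0, 300, 15)]
print(simple_rate(pushed_once))  # 0.0
```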

Are you perhaps pushing multiple times during the runtime of your
batch jobs? That would be weird indeed for a PGW use case. Why don't
you just scrape your jobs normally then?

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-06-29 Thread 'Braden Schaeffer' via Prometheus Developers
It's the same as calculating the total incoming request rate of N pods in a
deployment: sum(rate(grpc_request_count{service=foo}[5m]))

For our use case, we have N kubernetes jobs being spun up pushing many
metrics, like grpc_request_count, to pushgateway. If I want to know the
rate of outgoing requests, I have to have unique labels on the metric for
prometheus to accurately calculate the request rate of the service as a
whole. Without them they just overwrite each other. For pushgateway,
though, those unique ids are never GC'd like they are in prometheus.

So, every day a service creating 2 pods every 5 minutes and writing just a
single grpc_request_count metric with 20 label values to pushgateway will
have a cardinality of (2 pod ids * 288 times a day * 20 grpc_request_count
labels). That's a cardinality of 11k series per day for that one
metric from that one service that will not be GC'd by pushgateway until I
restart it.
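
The arithmetic above, spelled out as a short sketch. The variable names
are illustrative; the numbers are the ones from the example.

```python
MINUTES_PER_DAY = 24 * 60

pods_per_run = 2            # pods created each time the service runs
run_interval_minutes = 5    # a new run every 5 minutes
label_combinations = 20     # label values on grpc_request_count

runs_per_day = MINUTES_PER_DAY // run_interval_minutes
new_series_per_day = pods_per_run * runs_per_day * label_combinations

print(runs_per_day)         # 288
print(new_series_per_day)   # 11520
```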

On Thu, Jun 29, 2023 at 7:03 AM Bjoern Rabenstein wrote:

> On 14.06.23 13:10, 'Braden Schaeffer' via Prometheus Developers wrote:
> >
> > The most basic example, two batch jobs that produce the same metrics (grpc
> > or http metrics). This is not just `last_completed_at` or something as I
> > have seen before where it's the same metric being updated over and over
> > again. You have to include a label that identifies these jobs as different
> > so that metrics like gRPC request rates can be calculated correctly. In the
> > kubernetes world this usually means pod ID. Simple enough until you have
> > 1000s of these pod IDs compounded by other labels.
>
> I don't fully understand what you are trying to do. Could you explain
> what metrics you are pushing exactly, and what PromQL expressions you
> are using to "correctly calculate a gRPC request rate"?
>
> --
> Björn Rabenstein
> [PGP-ID] 0x851C3DA17D748D03
> [email] bjo...@rabenste.in
>



Re: [prometheus-developers] Prometheus Pushgateway TTL

2023-06-29 Thread Bjoern Rabenstein
On 14.06.23 13:10, 'Braden Schaeffer' via Prometheus Developers wrote:
> 
> The most basic example, two batch jobs that produce the same metrics (grpc 
> or http metrics). This is not just `last_completed_at` or something as I 
> have seen before where it's the same metric being updated over and over
> again. You have to include a label that identifies these jobs as different
> so that metrics like gRPC request rates can be calculated correctly. In the 
> kubernetes world this usually means pod ID. Simple enough until you have 
> 1000s of these pod IDs compounded by other labels.

I don't fully understand what you are trying to do. Could you explain
what metrics you are pushing exactly, and what PromQL expressions you
are using to "correctly calculate a gRPC request rate"?

-- 
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in



[prometheus-developers] Prometheus Pushgateway TTL

2023-06-14 Thread 'Braden Schaeffer' via Prometheus Developers
I wanted to reopen this discussion because I am having a very difficult 
time understanding how pushgateway can be the suggested solution for batch 
job metric collection, yet simultaneously batch jobs are not a great use 
case example for why metric TTLs are needed in push gateway.

The most basic example: two batch jobs that produce the same metrics (grpc
or http metrics). This is not just `last_completed_at` or something as I
have seen before where it's the same metric being updated over and over
again. You have to include a label that identifies these jobs as different
so that metrics like gRPC request rates can be calculated correctly. In the 
kubernetes world this usually means pod ID. Simple enough until you have 
1000s of these pod IDs compounded by other labels.

By now we all know those metrics are going to stay around forever, but I 
don't understand why the answer to this problem is "this is not a good
use case". For push gateway? For TTL? What am I doing wrong? 

I've got a pipeline and library code streamlined for prometheus metric 
collection and the only solution I have seen offered at all is "use 
statsd". No. That's silly. I need new clients and two ways of defining 
metrics in code to account for each potential storage solution. Two APIs. 
etc.

Can someone please help me understand why pushgateway's existence is not 
reason enough to implement TTL?
