Re: [prometheus-users] Guaranteed ingestion of metrics with historical timestamps

2022-06-18 Thread Ben Kochie
For this use case, what they likely want is Prometheus in agent mode,
which uses remote write and can therefore buffer samples and catch up
after an outage.
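
A minimal sketch of that setup (the URL and target are placeholders, not
from this thread):

  # prometheus.yml -- agent mode: scrape locally, remote-write upstream
  scrape_configs:
    - job_name: example
      static_configs:
        - targets: ['localhost:9100']

  remote_write:
    - url: https://central-prometheus.example.com/api/v1/write

  # Agent mode is a feature flag (Prometheus 2.32+):
  # prometheus --enable-feature=agent --config.file=prometheus.yml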

On Sat, Jun 18, 2022, 1:13 PM Stuart Clark wrote:

> On 14/06/2022 18:32, Jeremy Collette wrote:
>
> Hello,
>
> We have written a custom exporter that exposes metrics with explicit
> timestamps, which Prometheus periodically scrapes. In the case where
> Prometheus becomes temporarily unavailable, these metric samples will be
> cached in the exporter until they are scraped, causing affected metrics to
> age.
>
> I understand that if a metric is older than a certain threshold, it will
> be rejected by Prometheus with the message: "Error on ingesting samples
> that are too old or are too far into the future".
>
> I'm trying to understand if there are any guarantees surrounding the
> ingestion of historical metrics. Is there some metric sample age that is
> guaranteed to be recent enough to be ingested? For example, are samples
> with timestamps within the last hour always going to be considered recent?
> Within the last five minutes?
>
> According to this previous thread, "Error on ingesting samples that are
> too old", MR seems to indicate that metrics as old as 1 second can be
> dropped due to being too old. Is this interpretation correct? If so, is
> there any way to ensure metrics with timestamps won't be dropped for
> being too old?
>
> The use of timestamps on exposed metrics is only appropriate in a few
> very specific cases. The main use case for adding a timestamp is when
> you are scraping metrics into Prometheus that were sourced from another
> existing metrics system (for example, the Cloudwatch Exporter). You also
> mention your exporter caching things until they are scraped, which also
> sounds inadvisable: an exporter's behaviour shouldn't change depending
> on the requests being received (or not received).
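>
> For reference, a sample with an explicit timestamp in the text
> exposition format looks like this (metric name and values are made up);
> the trailing integer is the timestamp in milliseconds since the Unix
> epoch:
>
>   some_metric{source="cloudwatch"} 1027 1655210000000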
>
> An exporter is expected to return metrics that reflect "now", in the
> same way that a directly instrumented application would be expected to
> return the current state of the metrics it maintains in memory. For a
> simple exporter the normal mechanism is for an incoming request to
> trigger the generation of the metrics. For example, with something like
> the MySQL Exporter a request triggers a query on the connected database,
> and the information returned is converted into Prometheus metrics. In
> some situations fetching information from the underlying system can be
> quite resource intensive or slow. In that case a common design is to
> decouple the fetching from the request handling: fetch the information
> on a periodic timer and store it in memory, then have the request
> handler read and return that stored information - returning the same
> values for every request until the next fetch cycle. In none of these
> standard scenarios would you expect timestamps to be attached to the
> returned metrics.
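>
> A minimal sketch of that decoupled design in Go (illustrative only, not
> from this thread; the metric, port, and backend query are made up, and
> fetchQueueDepth is a stub for the expensive call):
>
>   package main
>
>   import (
>       "log"
>       "net/http"
>       "time"
>
>       "github.com/prometheus/client_golang/prometheus"
>       "github.com/prometheus/client_golang/prometheus/promhttp"
>   )
>
>   // Gauge holding the most recently fetched value.
>   var queueDepth = prometheus.NewGauge(prometheus.GaugeOpts{
>       Name: "myapp_queue_depth",
>       Help: "Depth of the backend work queue (hypothetical metric).",
>   })
>
>   // fetchQueueDepth stands in for a slow or expensive query.
>   func fetchQueueDepth() float64 { return 42 }
>
>   func main() {
>       prometheus.MustRegister(queueDepth)
>
>       // Fetch on a periodic timer, decoupled from scrape requests.
>       go func() {
>           for {
>               queueDepth.Set(fetchQueueDepth())
>               time.Sleep(30 * time.Second)
>           }
>       }()
>
>       // Each scrape returns the cached value with no explicit
>       // timestamp - Prometheus stamps samples at scrape time.
>       http.Handle("/metrics", promhttp.Handler())
>       log.Fatal(http.ListenAndServe(":9100", nil))
>   }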
>
> It would be good to hear a bit more about what you are trying to do, as
> it is highly likely that timestamps are not the right option for your
> use case and should simply be dropped.
>
> --
> Stuart Clark


[prometheus-users] Central pushgateway: limits/sizing for pushgateway

2022-06-18 Thread DerekLai Devops
We have been using Prometheus to monitor the AKS clusters of our various
departments. Each department has its own AKS cluster for ease of
management/billing.

Now the users would like to set up Pushgateways to help with getting
metrics from jobs which are short-lived and can't be scraped. Management
would like to set up just one "central" Pushgateway. We have a total of
about 10 Prometheus instances.
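
For reference, a short-lived job pushes its metrics to the Pushgateway
over HTTP; this is essentially the standard example from the Pushgateway
docs (the host name is a placeholder):

  echo "some_metric 3.14" | curl --data-binary @- \
    http://pushgateway.example.com:9091/metrics/job/some_job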

I know this is probably not recommended due to potential single point of
failure/bottleneck issues. However, I just wanted to find out the limits
in case the reverse is true - that we should set up multiple Pushgateways
due to performance issues. I'm still trying to get a sense from the users
of how many batch jobs there are and how frequently they run.

Is there any kind of sizing guide for the pushgateway itself?


Thanks,

Derek


Re: [prometheus-users] Re: Does Prometheus recommend exposing 2M timeseries per scrape endpoint?

2022-06-18 Thread Matthias Rampke
One place where time series of this magnitude from a single target are
unfortunately common is kube-state-metrics (KSM). On a large cluster, I
see almost 1M metrics. Those are relatively cheap because they are nearly
constant and compress well, but I believe there was quite some work in
that project to make scraping work well from the target side. This
includes playing with compression - depending on your network it may be
faster to stream uncompressed than to compress and decompress.

In summary, 2M time series from a single target is unusual but not
without precedent. Look at KSM for the issues they encountered and
possible solutions.

/MR

On Tue, Jun 14, 2022 at 2:44 PM l.mi...@gmail.com wrote:

> Total number of time series scraped would be more important I think, so
> you also need to know how many targets you'll have. I had Prometheus
> servers scraping 20-30M time series in total, and that was eating pretty
> much all memory on a server with 256GB of RAM.
>
> In general when doing capacity planning we expect 4KB of memory per time
> series for base Go memory, and then we need to double that for the
> garbage collector (you can try tweaking the GOGC env variable to trade
> some CPU for less GC memory overhead). With 25M time series, 4KB per
> series means 100GB of Go allocations, and 200GB to account for the
> garbage collector, which usually fits in 256GB. But we run a huge number
> of services, so Prometheus scrapes lots of targets and gets a small
> number of metrics from each. You want to scrape 2M from a single target,
> which means Prometheus will have to request, read and parse a huge
> response body; this might require more peak memory and it might be slow,
> so your scrape interval would have to allow for that.
>
> Another thing to remember is churn - if your time series have labels
> that keep changing all the time then you might run out of memory, since
> everything that Prometheus scrapes (even only once) stays in memory
> until it persists data to disk, which is by default every 2h AFAIR. If
> the list of values of your APN label is not a fixed set and you keep
> seeing random values over time, then that will accumulate in memory, so
> your capacity planning would have to take into account how many unique
> values of APN (and other labels) there are and whether this is going to
> grow over time. That's assuming you want to stick with a single
> Prometheus instance; if you can shard your scrapes then you can scale
> horizontally.
>
> It's always hard to give a concrete answer to a question like this since
> it all depends, but it's usually a matter of having enough memory; CPU
> is typically (in my environment at least) less important.
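>
> As a rough worked version of the numbers above (the GOGC semantics are
> standard Go; the value 50 below is just an illustration):
>
>   25,000,000 series x 4KB       = ~100GB of live heap
>   x2 GC headroom (GOGC=100)     = ~200GB -> fits a 256GB host
>
>   # More frequent GC for ~1.5x instead of 2x heap headroom:
>   GOGC=50 prometheus --config.file=prometheus.yml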
> On Tuesday, 14 June 2022 at 12:13:24 UTC+1 vteja...@gmail.com wrote:
>
>> I have a use case where a particular service (that can be horizontally
>> scaled to a desired replica count) exposes 2 million time series.
>> Prometheus might need huge resources to scrape such a service (which is
>> expected), but I'm not sure if there is a recommendation from the
>> community on instrumentation best practices and the maximum count to
>> expose.
>>
>> Thanks,
>> Teja
>>
