Re: [prometheus-users] Immediately pull metrics from target

2023-03-30 Thread Bo Liu
Sorry, forwarding this to the mailing list manually.

Liu Bo wrote on Friday, 31 March 2023 at 09:53:

> Stuart Clark  writes:
>
> > On 2023-03-30 12:48, Ben Kochie wrote:
> >> No. As Brian says, it's intentional that this is not possible in order
> >> to avoid load spikes.
> >
> > And as Ben mentioned earlier, the normal scrape intervals are usually
> > 15 or 30 seconds for normal metrics, or 1-2 minutes for slower use
> > cases. Therefore you'd only have to wait a short amount of time before
> > metrics start appearing - although often you need to wait for at least
> > a few scrapes to be able to do things like looking at counter increase
> > rates, etc.
>
> Thank you guys, I got it now; that helps me a lot.
>
> --
> Liu Bo
>



Re: [prometheus-users] Re: How to define metric type as a variable

2023-03-30 Thread Brian Candler
> #metrics(5min) /#metrics(30 mins) > 50

Thinks: if you're only interested in the *number* of timeseries for each 
metric name, then you can do

count by (__name__) ({__name__=~".+"})

(warning: potentially expensive query if you have many timeseries). Then 
you could move the metric name into a label:

label_replace(count by (__name__) ({__name__=~".+"}), "metric", "$1", "__name__", "(.+)") * 1

At that point, you have something you could alert on. Example: find metrics 
which have at least 1% more timeseries than they did 30 minutes ago:

  (label_replace(count by (__name__) ({__name__=~".+"}), "metric", "$1", "__name__", "(.+)") * 1)
/ (label_replace(count by (__name__) ({__name__=~".+"} offset 30m), "metric", "$1", "__name__", "(.+)") * 1)
> 1.01

This won't detect *completely new* metrics which appear, but you could have 
a separate rule for these, e.g. (untested):

  (label_replace(count by (__name__) ({__name__=~".+"}), "metric", "$1", "__name__", "(.+)") * 1)
unless
  (label_replace(count by (__name__) ({__name__=~".+"} offset 30m), "metric", "$1", "__name__", "(.+)") * 1)

Or to detect *every* new timeseries, including new timeseries for existing 
metrics:

{__name__=~".+"} unless {__name__=~".+"} offset 30m
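
For reference, a rough (and equally untested) sketch of how the 30-minute growth check might be wrapped into an alerting rule - the group name, alert name, "for" duration and severity label below are just placeholders:

groups:
  - name: timeseries-growth            # placeholder group name
    rules:
      - alert: MetricSeriesCountGrew   # placeholder alert name
        # Same expression as above: fire when a metric has >1% more
        # timeseries than it did 30 minutes ago.
        expr: |
          (label_replace(count by (__name__) ({__name__=~".+"}), "metric", "$1", "__name__", "(.+)") * 1)
            / (label_replace(count by (__name__) ({__name__=~".+"} offset 30m), "metric", "$1", "__name__", "(.+)") * 1)
            > 1.01
        for: 5m                        # arbitrary; tune to taste
        labels:
          severity: warning
        annotations:
          summary: "Series count for {{ $labels.metric }} grew by more than 1% in 30 minutes"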

On Friday, 24 March 2023 at 08:41:12 UTC Agarwal ,Naveen wrote:

> Thanks Brian. Insightful.  
>
> Sent from Outlook for Android 
> --
> *From:* promethe...@googlegroups.com  on 
> behalf of Brian Candler 
> *Sent:* Friday, March 24, 2023 1:05:30 PM
> *To:* Prometheus Users 
> *Subject:* [prometheus-users] Re: How to define metric type as a variable 
>  
> No, because binary operators like division are designed to work between 
> different metrics (with the same set of labels, but different metric name), 
> e.g. 
>
> node_filesystem_avail_bytes / node_filesystem_size_bytes
>
> You can however generate your alerting rules programmatically: make a 
> script that writes out a rules file, then hits the reload endpoint.
>
> On Friday, 24 March 2023 at 00:57:16 UTC Agarwal ,Naveen wrote:
>
> Hi:
>
> Our Prometheus database contains around 5k+ unique types of metrics. Over 
> time, we have defined alerting rules to detect deviations. 
>
> However, given the growing number of metrics, it is becoming difficult to 
> expand the alerting rules. 
>
> Generally we are interested in increases/decreases of values in the metrics 
> when compared to a previous time interval. Keeping this in mind, is it 
> possible to write a query where the metric name is not specified and it 
> instead picks up all metric names available in the database in sequence? 
>
>
> e.g. #metrics(5min) /#metrics(30 mins) > 50
> where all unique metric names are picked from the database. 
>
> Thanks, 
> Naveen
>
> Sent from Outlook for Android 
>



Re: [prometheus-users] Immediately pull metrics from target

2023-03-30 Thread Stuart Clark

On 2023-03-30 12:48, Ben Kochie wrote:

No. As Brian says, it's intentional that this is not possible in order
to avoid load spikes.


And as Ben mentioned earlier, the normal scrape intervals are usually 
15 or 30 seconds for normal metrics, or 1-2 minutes for slower use cases. 
Therefore you'd only have to wait a short amount of time before metrics 
start appearing - although often you need to wait for at least a few 
scrapes to be able to do things like looking at counter increase rates, 
etc.
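
As a rough sketch of the kind of per-job interval override being discussed (the job name and target below are just placeholders), something like this in prometheus.yml keeps the wait for a newly added target short even if other jobs scrape more slowly:

global:
  scrape_interval: 1m            # default used by jobs that don't override it
scrape_configs:
  - job_name: new-service        # placeholder job name
    scrape_interval: 15s         # first samples appear within ~15s of the config (re)load
    static_configs:
      - targets:
          - "192.0.2.10:9100"    # placeholder target address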


--
Stuart Clark



Re: [prometheus-users] Immediately pull metrics from target

2023-03-30 Thread Ben Kochie
No. As Brian says, it's intentional that this is not possible in order to
avoid load spikes.

On Thu, Mar 30, 2023, 12:46 Liu Bo  wrote:

> Ben Kochie  writes:
>
> > Prometheus is optimized to scrape at least every 1m. The typical
> > scrape interval is 15s.
>
> Thanks, so there's no way to actively ask Prometheus to pull the
> target's metrics?
>
> --
> Liu Bo
>



Re: [prometheus-users] Immediately pull metrics from target

2023-03-30 Thread Brian Candler
AIUI, it spreads them over the scrape interval. If it didn't, then when you 
add a whole load of new targets, they would all be scraped simultaneously 
in huge regular bursts.

On Thursday, 30 March 2023 at 11:46:06 UTC+1 Liu Bo wrote:

> Ben Kochie  writes:
>
> > Prometheus is optimized to scrape at least every 1m. The typical
> > scrape interval is 15s.
>
> Thanks, so there's no way to actively ask Prometheus to pull the
> target's metrics?
>
> -- 
> Liu Bo
>



Re: [prometheus-users] Immediately pull metrics from target

2023-03-30 Thread Liu Bo
Ben Kochie  writes:

> Prometheus is optimized to scrape at least every 1m. The typical
> scrape interval is 15s.

Thanks, so there's no way to actively ask Prometheus to pull the
target's metrics?

-- 
Liu Bo



Re: [prometheus-users] Immediately pull metrics from target

2023-03-30 Thread Ben Kochie
Prometheus is optimized to scrape at least every 1m. The typical scrape
interval is 15s.

On Thu, Mar 30, 2023 at 12:01 PM Liu Bo  wrote:

>
> Hi guys, can I make Prometheus pull metrics from a target immediately
> after adding that target to the config?
>
> For now I have to wait scrape_interval after I add a target to the config;
> for example, if I set scrape_interval to 5m, then I have to wait five
> minutes before I can query metrics from the newly added target.
>
>
> --
> Liu Bo
>



[prometheus-users] Immediately pull metrics from target

2023-03-30 Thread Liu Bo


Hi guys, can I make Prometheus pull metrics from a target immediately
after adding that target to the config?

For now I have to wait scrape_interval after I add a target to the config;
for example, if I set scrape_interval to 5m, then I have to wait five
minutes before I can query metrics from the newly added target.


-- 
Liu Bo



Re: [prometheus-users] Separate endpoint for aggregate metrics?

2023-03-30 Thread Brian Candler
Right: high cardinality is bad. But what matters is the number of 
timeseries which have ingested a point in the last ~2 hours. Whether each 
of those time series ingests 1 data point or 10,000 data points in 2 hours 
makes almost no difference.  Therefore, scraping them less frequently 
doesn't fix the high cardinality problem at all.

You need to avoid those labels in Prometheus, so that the total number of 
timeseries (unique combinations of metric name + set of labels) is within a 
reasonable range.
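
If it's useful, a rough sketch of one way to keep an eye on that total (the group name, alert name, threshold and durations below are only examples) is to alert on Prometheus's own head-series gauge:

groups:
  - name: cardinality                # placeholder group name
    rules:
      - alert: TooManyActiveSeries   # placeholder alert name
        # prometheus_tsdb_head_series is the number of series currently held in the head block
        expr: prometheus_tsdb_head_series > 1e6   # example threshold only
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus is holding {{ $value }} active series"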

What I would suggest is that you push the high cardinality data into a 
different system like Loki.  You can then either use LogQL queries to 
derive the lower-cardinality metrics to go into Prometheus, or export them 
separately.  (Instead of Loki you could also use Elasticsearch/Opensearch, 
or a SQL database, or whatever)

Then you get the best of both worlds: fast timeseries data and querying 
from Prometheus, and the full data in Loki for deeper analysis.

Note that Loki streams are defined by labels, and you can use the same 
low-cardinality labels that you will use for Prometheus. Hence doing 
searches across timespans of raw data for a given set of labels still 
performs better than a "brute force" log scan *à la* grep.

For some use cases you may also find "exemplars" to be useful in 
Prometheus. These let you store *one* example detailed event which was 
counted in a bucket, against that bucket. There's a short 5 minute overview 
here: https://www.youtube.com/watch?v=TzFjweKACMY&t=1644s

On Thursday, 30 March 2023 at 06:19:07 UTC+1 Kevin Z wrote:

> This label scales as users interact with our server and create new 
> accounts. It is problematic because it is currently added to all 
> metrics.
>
> On Monday, March 27, 2023 at 1:39:57 AM UTC-7 Stuart Clark wrote:
>
>> On 2023-03-25 07:30, Kevin Z wrote: 
>> > Hi, 
>> > 
>> > We have a server that has a high cardinality of metrics, mainly due to 
>> > a label that is tagged on the majority of the metrics. However, most 
>> > of our dashboards/queries don't use this label, and just use aggregate 
>> > queries. There are specific scenarios where we would need to debug and 
>> > sort based on the label, but this doesn't happen that often. 
>> > 
>> > Is it a common design pattern to separate out two metrics endpoints, 
>> > one for aggregates, one for labelled metrics, with different scrape 
>> > intervals? This way we could limit the impact of the high cardinality 
>> > time series, by scraping the labelled metrics less frequently. 
>> > 
>> > Couple of follow-up questions: 
>> > - When a query that uses the aggregate metric comes in, does it matter 
>> > that the data is potentially duplicated between the two endpoints? How 
>> > do we ensure that it doesn't try loading all the different time series 
>> > with the label and then aggregating, and instead directly use the 
>> > aggregate metric itself? 
>> > - How could we make sure this new setup is more efficient than the old 
>> > one? What criteria/metrics would be best (query evaluation time? 
>> > amount of data ingested?) 
>> > 
>>
>> You certainly could split things into two endpoints and scrape at 
>> different intervals, however it is likely to make little if any 
>> difference. From the Prometheus side, data points within a time series 
>> are very low impact. So for your aggregate endpoint you might be 
>> scraping every 30 seconds and the full data every 2 minutes (the slowest 
>> available scrape interval), meaning there are 4x fewer data points, which 
>> has very little memory impact. 
>>
>> You mention that there is a high cardinality - that is the thing which 
>> you need to fix, as that will be having the impact. You say there is a 
>> problematic label applied to most of the metrics. Can it be removed? 
>> What makes it problematic? 
>>
>> -- 
>> Stuart Clark 
>>
>
