Re: [prometheus-users] Re: Query data is empty when step is 1h in query_range api

Brian Candler Sat, 18 Mar 2023 01:43:02 -0700

> I try to set step to 1m and prometheus give me metrics back with per
> minute.
> How prometheus group data to per minute?
> I mean I set scrape_interval to 1h and it should be just one metric in
> an hour right?

No, that's not right.

When you call the query_range API, you give it an *instant vector* 
query, a start time, end time and step.  Prometheus evaluates this 
expression repeatedly at different times across the requested range: t, 
t+s, t+2s, t+3s etc.

The value of a metric at any particular time "t" is defined to be the most 
recent value of that metric *at or before* time t, looking back in time up 
to --query.lookback-delta [default 5m] to find the most recent value.

It's necessary to work this way if you think about it.  Suppose you give a 
query which works across multiple timeseries, like sum(foo) where there are 
multiple timeseries of the metric "foo" (with different labels).  Those 
values will almost certainly have been sampled at different points in 
time.  To sum them, you have to pick all their values at a common point in 
time, which is the time of the result.

Hence the result of such a query_range is the data resampled at the step 
interval.
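
To make that concrete, here is a minimal sketch of the evaluation model 
described above. It is plain Python, not Prometheus code. The function names 
are made up for illustration, and it ignores staleness markers and the exact 
boundary handling of the lookback window.

```python
from bisect import bisect_right

# Assumption: mirrors the default --query.lookback-delta of 5m, in seconds.
LOOKBACK = 5 * 60


def value_at(samples, t, lookback=LOOKBACK):
    """Most recent sample value at or before t, or None if none is recent enough.

    samples is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in samples]
    i = bisect_right(timestamps, t) - 1  # index of the last sample with ts <= t
    if i < 0:
        return None
    ts, value = samples[i]
    if t - ts > lookback:  # too old: treated as "no value" at time t
        return None
    return value


def query_range(samples, start, end, step, lookback=LOOKBACK):
    """Evaluate value_at() at start, start+step, ... up to and including end."""
    points = []
    t = start
    while t <= end:
        v = value_at(samples, t, lookback)
        if v is not None:
            points.append((t, v))
        t += step
    return points


# Hypothetical series scraped once an hour.
hourly = [(0, 10.0), (3600, 11.0)]

# step=60: the first sample is repeated for 5 minutes, then there is a gap.
per_minute = query_range(hourly, 0, 3600, 60)

# step=3600, with evaluation times 30 minutes after each scrape: nothing found.
misaligned = query_range(hourly, 1800, 5400, 3600)
```

In this model, a 1m step returns the hourly sample several times over (once 
per step while it is still inside the lookback window), and a 1h step whose 
evaluation times fall more than 5 minutes after each scrape returns nothing.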

If you want to access the raw data in the database, then you can send a 
*range vector* expression, e.g. foo[24h], to the *query* endpoint: 
https://prometheus.io/docs/prometheus/latest/querying/api/#instant-queries

This will give you the raw data in that period (i.e. the 24 hours up to the 
evaluation time), and each data point will have its original timestamp.  
However, there are only a very few queries that can be built this way: 
basically just plain metrics.  If you try to generate a range vector from a 
more complex expression, you'll have to build a subquery 
<https://prometheus.io/docs/prometheus/latest/querying/basics/#subquery>, 
which again involves sweeping an instant query across a range with a fixed 
step: e.g. sum(foo)[24h:1m]
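
As a rough sketch of the raw-data approach: the Python below builds the 
request URL for the instant query endpoint and flattens a "matrix" response 
into rows. The helper names and the base URL are assumptions for 
illustration only, and error handling is omitted.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, expr, time=None):
    """Build a URL for GET /api/v1/query with the given expression."""
    params = {"query": expr}
    if time is not None:
        params["time"] = str(time)  # evaluation time; omitted means "now"
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode(params)


def raw_samples(response):
    """Flatten a decoded 'matrix' response into (labels, timestamp, value) rows."""
    data = response["data"]
    if data["resultType"] != "matrix":
        raise ValueError("expected a range vector result, got " + data["resultType"])
    rows = []
    for series in data["result"]:
        for ts, value in series["values"]:
            # Sample values arrive as strings in the JSON; timestamps as numbers.
            rows.append((series["metric"], float(ts), float(value)))
    return rows


def fetch_raw(base_url, expr, time=None):
    """Fetch and flatten; needs a reachable Prometheus server, so not called here."""
    with urllib.request.urlopen(instant_query_url(base_url, expr, time)) as resp:
        return raw_samples(json.load(resp))


# Assumed local server address, purely for illustration.
url = instant_query_url("http://localhost:9090", "foo[24h]")
```

Each row keeps the timestamp at which the sample was actually scraped, 
rather than a timestamp on the step grid.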

> Yeah, we don't worry about storage; actually we are worried about
> bandwidth cost, because we are trying to get Kubernetes metrics (mainly
> for cost) from customers' cloud clusters (GCP, AWS etc.), and generate a
> cost report for our customer.
> The report's minimum time step is one hour

In that case I would be inclined to:

- set up an hourly scrape using a cronjob and curl
- get curl to write the data to a file
- get Prometheus to scrape the contents of this file (e.g. using 
node_exporter textfile collector, or just serve the file using a webserver 
like Apache)

Prometheus can then scrape this data at 2 minute intervals.
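
If you'd rather not use curl, the fetch-and-write step could be sketched in 
Python along these lines and run from the hourly cronjob. The function names, 
URL and paths are hypothetical. The one detail worth copying is writing to a 
temporary file and renaming it into place, so that a scrape never sees a 
half-written file.

```python
import os
import tempfile
import urllib.request


def write_atomically(path, data):
    """Write bytes to path via a temporary file in the same directory plus rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The ".tmp" suffix keeps the partial file from matching "*.prom".
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates the file as 0600; relax it so another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic when both paths are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def fetch_to_file(target_url, path, timeout=30):
    """Fetch a metrics page once and store it; intended to be run hourly from cron."""
    with urllib.request.urlopen(target_url, timeout=timeout) as resp:
        write_atomically(path, resp.read())


# Hypothetical usage (target URL and textfile directory are assumptions):
# fetch_to_file("https://customer.example/metrics",
#               "/var/lib/node_exporter/textfile/customer.prom")
```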

If you're using the textfile collector, then it also exposes a metric 
giving the mtime of the file.  This allows you to write alerting rules to 
detect when a file hasn't been updated for more than a certain amount of 
time (say, more than 90 minutes).

Alternatively: perhaps Prometheus isn't the right tool for the job here. 
You might be better off putting your hourly reports into a SQL database, or 
something which stores events like Loki or Elasticsearch.

On Saturday, 18 March 2023 at 08:18:40 UTC Brian Candler wrote:

> [forwarding message manually: for some reason it arrived at mine but not 
> in the list]
>
> On 18/03/2023 07:43, Liu Bo wrote:
> > Brian Candler <b.ca...@pobox.com> writes:
> >
> >> You should never use any scrape interval greater than 2 minutes.
> >> Prometheus considers data more than 5 minutes old as "stale", unless
> >> you've tweaked other settings (and it's not recommended).
> >>
> >> https://www.robustperception.io/keep-it-simple-scrape_interval-id/
> > Thanks, I just know it today and didn't find it's document before,
> > missed that I think.
> >
> >> Don't worry about storage. Prometheus compresses extremely efficiently,
> >> so even if you scrape the same data 30 times, the delta between them is
> >> zero and it will take almost no storage space at all.
> > Yeah, we don't worry about storage; actually we are worried about
> > bandwidth cost, because we are trying to get Kubernetes metrics (mainly
> > for cost) from customers' cloud clusters (GCP, AWS etc.), and generate a
> > cost report for our customer.
> > The report's minimum time step is one hour, so I set scrape_interval to
> > 1h, and I think neither we nor our customer can accept the bandwidth cost
> > from Prometheus if scrape_interval is 1m or 2m.
> >
> > And I found that I can use function last_over_time to get last metric in
> > an hour and ignore this's metric specific timestamp.
> > I set last_over_time range to 1h, and then set Step to 1h, then got
> > one metric per hour, and if I want get one metric per day I can set Step
> > to 24h, it seems work.
> >
> > But now I have some confused about this Step parameter, document here:
> > https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries
> >
> > I try to set step to 1m and prometheus give me metrics back with per
> > minute.
> > How prometheus group data to per minute?
> > I mean I set scrape_interval to 1h and it should be just one metric in
> > an hour right?
> >
>
