[prometheus-users] Re: 1 minute scrape, [1m] == no data ([90s] has data)

Brian Candler Fri, 02 Oct 2020 00:38:28 -0700

On Thursday, 1 October 2020 20:21:43 UTC+1, Laurent Demailly wrote:
>
> I described the problem in detail in 
> https://github.com/prometheus/prometheus/issues/8001 (which was closed, 
> but see details there)
>
> In short the default install from helm has 1 minute scrape which makes the 
> istio and kiali dashboards "empty" because they use [1m] in the query
>


You need to use at least [2m] for a rate query, if scraping at 1 minute 
intervals.
 

> Given a 1m scrape means the data is on average 30s old, I don't think 
> returning "no data" for queries is a very useful behavior, even if I was 
> scraping every 10 minutes I expect that any resolution I ask would be 
> extrapolated - but I guess I have the wrong expectations?
>
> Can someone talk me through why "no data" is the right answer for [1m] 
> while there is data for [90s]
>

rate() calculates the average rate between the *first* and *last* data 
points in the given time window.
irate() calculates the average rate between the *last two* data points in 
the given time window.

Both use the timestamps of the actual stored data points to calculate the 
rate, i.e. (v2-v1)/(t2-t1)    (**)
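
As a rough illustration of the (v2-v1)/(t2-t1) calculation, here is a small 
Python sketch.  It is not the Prometheus implementation: the function names 
simple_rate and simple_irate and the sample values are made up for this 
example, and it ignores counter resets and any boundary extrapolation.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)

def simple_rate(samples: List[Sample]) -> Optional[float]:
    """Average rate between the first and last samples in the window."""
    if len(samples) < 2:
        return None  # fewer than two points: no answer
    (t1, v1), (t2, v2) = samples[0], samples[-1]
    return (v2 - v1) / (t2 - t1)

def simple_irate(samples: List[Sample]) -> Optional[float]:
    """Rate between the last two samples in the window."""
    if len(samples) < 2:
        return None  # fewer than two points: no answer
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)

# Hypothetical window contents: three scrapes, one minute apart.
window = [(0.0, 100.0), (60.0, 160.0), (120.0, 280.0)]
print(simple_rate(window))      # 1.5
print(simple_irate(window))     # 2.0
print(simple_rate(window[:1]))  # None
```

With a single point in the window, both functions return None, which is the 
"no data" case.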

However, you need at least two data points to get an answer.  If your data 
is scraped at 1 minute intervals, then a 1-minute window will only ever 
contain one data point.  A 90-second window will sometimes contain two data 
points (in which case a rate is available), or one data point (in which 
case there is no answer).  If you graph this, the line will have gaps; to 
draw a point at time T, the rate shown is for the window between T-90 and 
T, which sometimes exists, and sometimes doesn't.
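
The window arithmetic above can be checked with a small Python simulation.  
This is only a sketch under stated assumptions: scrapes are perfectly evenly 
spaced (real scrapes jitter), the window is treated as (T-window, T], and 
the 7-second offset and evaluation times are arbitrary made-up values.

```python
def samples_in_window(scrape_times, t, window):
    """Scrape timestamps falling in the assumed window (t - window, t]."""
    return [s for s in scrape_times if t - window < s <= t]

# Hypothetical scrapes every 60s, starting at an arbitrary 7s offset.
scrape_times = [7 + 60 * k for k in range(20)]

# Evaluate every 15s, as a graph with a 15s step would.
eval_times = range(300, 900, 15)

def distinct_counts(window):
    """Set of sample counts seen across all evaluation times."""
    return {len(samples_in_window(scrape_times, t, window)) for t in eval_times}

print(distinct_counts(60))   # {1}    -> never two points, so never a rate
print(distinct_counts(90))   # {1, 2} -> a rate only some of the time (gaps)
print(distinct_counts(120))  # {2}    -> two points at every evaluation time
```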

This is maybe surprising at first.  But it is consistent with how range 
vectors behave elsewhere: for example, count_over_time(foo[1m]) will tell 
you the number of data points *within the window*.

When you do an instant query, the value of a metric at query time T is the 
nearest *previous* value of the metric.  So you might have expected 
rate(foo[1m]) to take the value of foo at the end of the window, and the 
value of foo at the start of the window, and calculate the rate between 
those.  But that's not how it works, for several reasons.  One is that it 
would have to look backwards *before* the start of the window to find the 
previous value (an instant query, by default, looks back up to 5 minutes).  
Another is because the rate would bounce up and down as points enter and 
leave the window, whereas prometheus calculates an accurate rate between 
two timestamped values.
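
To make the "nearest previous value" lookup concrete, here is a Python 
sketch of how an instant query could pick a value.  The function name 
instant_value and the sample data are hypothetical, and the exact handling 
of the lookback boundary is an assumption; only the 5-minute default comes 
from the description above.

```python
from typing import List, Optional, Tuple

def instant_value(samples: List[Tuple[float, float]], t: float,
                  lookback: float = 300.0) -> Optional[float]:
    """Most recent sample value at or before t, within the lookback period."""
    # Assumed boundary handling: samples with t - lookback <= ts <= t qualify.
    candidates = [(ts, v) for ts, v in samples if t - lookback <= ts <= t]
    if not candidates:
        return None  # nothing recent enough: the series has no value at t
    return max(candidates)[1]  # value of the latest qualifying timestamp

# Hypothetical samples as (timestamp in seconds, value).
samples = [(7.0, 1.0), (67.0, 2.0)]
print(instant_value(samples, 100.0))  # 2.0  (sample at 67s is the nearest previous)
print(instant_value(samples, 400.0))  # None (last sample is older than 5 minutes)
```

A range selector such as foo[1m] does no such lookback: it only returns the 
points whose timestamps fall inside the window.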

(**) That is a simplified description, because there is additional work to 
handle counter resets.  Basically, only periods of time within the window 
where the counter is not decreasing are considered, and an average rate is 
calculated from these.
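
One possible way to picture the reset handling is the Python sketch below.  
It is an illustration, not the Prometheus code: the name rate_with_resets 
and the sample values are invented, and it assumes that a drop in value 
means the counter restarted from zero, so the post-reset value counts as 
increase.  A stricter reading of the description above would skip the 
interval containing the reset entirely.

```python
from typing import List, Optional, Tuple

def rate_with_resets(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Average per-second rate over the window, tolerating counter resets."""
    if len(samples) < 2:
        return None  # need at least two points
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Assumption: the counter restarted from zero, so everything
            # it has counted since the restart is new increase.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Hypothetical counter that resets between the 60s and 120s scrapes.
window = [(0.0, 100.0), (60.0, 130.0), (120.0, 10.0), (180.0, 40.0)]
print(rate_with_resets(window))  # (30 + 10 + 30) / 180, roughly 0.389
```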

For a slightly longer description, see:
https://groups.google.com/d/topic/prometheus-users/W6YjNVhhKRc/discussion
https://groups.google.com/d/topic/prometheus-users/InggG_a5JTY/discussion
