*> The idea behind the rate() implementation is to not give the impression 
that the counter has actually been consistently increasing by 2 per second 
over the entire 5 minute input window if the series likely just started 
somewhere under the window (meaning that the rate was 0 or non-existent 
before that).*

Hmm... so I guess it's a kind of histogram thing, where the area of a bar 
(= width x height) implies the total quantity - in this case meaning 
increase(), which is rate() times period. It's estimating "how much did X 
increase" and taking into account that the counter cannot have been 
negative.
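
A minimal sketch of that zero-crossing estimate, assuming the counter grew linearly at the slope seen between the first and last samples. The function name and arguments are made up for illustration; this is not the Prometheus code, only the arithmetic from the two examples in this thread.

```python
def duration_to_zero(first_value, last_value, sampled_interval):
    """Seconds before the first sample at which a linearly growing
    counter would have been at zero (hypothetical helper)."""
    increase = last_value - first_value
    if increase <= 0:
        raise ValueError("need a positive increase to extrapolate backwards")
    # Multiply before dividing so the two thread examples stay exact.
    return sampled_interval * first_value / increase


# '_ _ _ 17 137' with a 60 s interval: zero crossing 8.5 s before the first sample
print(duration_to_zero(17, 137, 60))      # 8.5
# '_ _ _ 1017 1137': zero crossing would be far outside the 5m window
print(duration_to_zero(1017, 1137, 60))   # 508.5
```

In the first case the estimated zero point falls inside the window, so it limits how far back the increase is extrapolated; in the second it does not.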

However, looking at the other example where the counter goes from 1017 to 
1137 in 60 seconds (an increase of 120, with no zero crossing in the 
window): it extends by half a sample interval on each side, giving a range 
of 120 seconds, and proportionately scales up the increase from 120 to 240. 
It then assigns that entire increase to the window period of 300 seconds.  
Using rate = increase / window size, calculating the rate gives 240 / 300 = 
0.8, rather than what I was expecting (2, which is the slope).
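
Here is a rough Python sketch that reproduces both numbers from this thread. It is a simplified reading of the extrapolation logic as discussed here, not the actual extrapolatedRate() source: it ignores counter resets, and the 1.1 threshold and the boundary handling are assumptions from my understanding of Prometheus 2.x. The function name and the (timestamp, value) tuple format are invented for the example.

```python
def extrapolated_rate(samples, range_start, range_end):
    """Sketch of rate() over [range_start, range_end] in seconds.

    samples: list of (timestamp_seconds, value), oldest first,
    at least two points, no counter resets (assumption).
    """
    (first_t, first_v), (last_t, last_v) = samples[0], samples[-1]
    increase = last_v - first_v
    sampled_interval = last_t - first_t
    avg_interval = sampled_interval / (len(samples) - 1)

    duration_to_start = first_t - range_start
    duration_to_end = range_end - last_t

    # A counter cannot have been negative: never extrapolate back past
    # the point where a straight line would have hit zero.
    if increase > 0 and first_v >= 0:
        duration_to_zero = sampled_interval * first_v / increase
        duration_to_start = min(duration_to_start, duration_to_zero)

    # Extend to the boundary only if it is close; otherwise extend by
    # half an average sample interval (assumed threshold: 1.1 intervals).
    threshold = avg_interval * 1.1
    extended = sampled_interval
    extended += duration_to_start if duration_to_start < threshold else avg_interval / 2
    extended += duration_to_end if duration_to_end < threshold else avg_interval / 2

    # Scale the increase to the extended span, then divide by the FULL window.
    return increase * (extended / sampled_interval) / (range_end - range_start)


# eval_time 4m30s with a 5m range gives the window -30 s .. 270 s;
# the samples sit at 180 s and 240 s.
print(round(extrapolated_rate([(180, 17), (240, 137)], -30, 270), 4))      # 0.6567
print(round(extrapolated_rate([(180, 1017), (240, 1137)], -30, 270), 4))   # 0.8
```

The last line of the function is the surprising part: the extrapolated increase is divided by the whole 300 second window, not by the 98.5 or 120 seconds it was extrapolated over.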

That logic is, well, surprising to me.  I guess the question is, would I be 
more surprised to see increase(foo[5m]) equal to 600, given only those two 
data points?

In the past I have noticed rate graphs in Grafana behaving strangely for 
the first few samples of a new timeseries (being scraped from an SNMP 
device), and now I kind of understand it.

*> Btw. rate() hasn't always behaved like this. Here's a super old issue 
(that I actually made a lengthy comment on) and a PR by Björn to address 
it:*

Thanks for the links. I can understand the issue there: if a counter only 
increments occasionally, e.g.

(0 0 0 0) 0 1 1 1 1 1 2 2 2 2 2

and you are unlucky enough to pick up only the "0 1" at the start of the 
timeseries, you incorrectly extrapolate the rate to 1 / (sample interval).  
I don't think you'd need to worry about this if the difference between the 
values is 2 or more: if the counter has incremented by N, then the average 
interval between those events must be somewhere between (sample interval / 
(N-1)) and (sample interval / N).
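
To put numbers on the "0 1" case, here is a small sketch using the example series above. The 60 second sample interval is an assumption for illustration, and naive_rate is a made-up helper that just takes the slope between the first and last sample with no extrapolation.

```python
from fractions import Fraction

SAMPLE_INTERVAL = 60  # seconds; assumed scrape interval, not from the thread


def naive_rate(values, interval=SAMPLE_INTERVAL):
    """Slope between first and last sample, as an exact fraction per second."""
    elapsed = interval * (len(values) - 1)
    return Fraction(values[-1] - values[0], elapsed)


series = [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

first_pair = naive_rate(series[:2])   # only "0 1" is visible
whole = naive_rate(series)            # the full series

print(first_pair)          # 1/60
print(whole)               # 1/300
print(first_pair / whole)  # 5
```

So with only the first two points, the slope overstates the longer-run rate by a factor of five for this series.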

On Saturday, 14 October 2023 at 15:35:12 UTC+1 Julius Volz wrote:

> Hi Brian,
>
> On Fri, Oct 13, 2023 at 6:51 PM 'Brian Candler' via Prometheus Users <
> promethe...@googlegroups.com> wrote:
>
>> I can reproduce the results by extracting parts of the extrapolatedRate() 
>> function, but I still don't understand what it's doing.
>>
>> https://go.dev/play/p/ua3XGLEA0cI    # _ _ _ 17 137
>> https://go.dev/play/p/NoJ3FoKLeLG    # _ _ _ 1017 1137
>>
>> Taking the first one:
>> - It extrapolates backwards to find where the zero crossing ought to be 
>> (if it started linearly from zero). In the case of _ _ _ 17 137, this gives 
>> a time 8.5 seconds earlier, so the time between first and end points grows 
>> from 60 to 68.5 seconds.
>> - It then adds half the average interval, making 98.5 seconds.
>> - It then scales the true increase (120) by 98.5 / 60, where 60 is the 
>> true gap between the samples, and divides by 300, which is the full width 
>> of the range.
>>
>> 120 * (98.5/60) / 300 = .6566666666
>>
>> This calculates the extrapolated increase from time (t1 - 8.5) to (t2 + 
>> 30), but then it seems to be treating this increase as if it had been 
>> spread over the whole 5 minutes, rather than over 98.5 seconds?
>>
>
> Yes, I think this is intended even for rate() and embarrassingly a detail 
> I missed in my video about rates as well (
> https://www.youtube.com/watch?v=7uy_yovtyqw). That is, even for rate() it 
> is not always enough to take the first and last (extrapolated) samples and 
> look at the slope between them to get to the output per-second rate. It 
> only works when rate() extrapolates all the way to the window boundaries, 
> which it does not do in your example.
>
> The idea behind the rate() implementation is to not give the impression 
> that the counter has actually been consistently increasing by 2 per second 
> over the entire 5 minute input window if the series likely just started 
> somewhere under the window (meaning that the rate was 0 or non-existent 
> before that). So in that case we do want to smear out the actual increase 
> over the requested window range.
>
> Btw. rate() hasn't always behaved like this. Here's a super old issue 
> (that I actually made a lengthy comment on) and a PR by Björn to address it:
>
> - Issue: https://github.com/prometheus/prometheus/issues/581
> - PR: https://github.com/prometheus/prometheus/pull/1295
>  
> I hope I didn't miss anything else in your points, since I didn't math 
> myself through everything in detail.
>
> On Friday, 13 October 2023 at 12:31:38 UTC+1 Brian Candler wrote:
>>
>>> (I copy pasted the wrong bit of screen from when I had eval_time: 5m30s, 
>>> but it doesn't affect the result)
>>>
>>> On Friday, 13 October 2023 at 12:30:16 UTC+1 Brian Candler wrote:
>>>
>>>> I've been using "promtool test rules" to see how rate() behaves with a 
>>>> timeseries that has recently started, and I am struggling to understand it.
>>>>
>>>> What would you think the following test should return?
>>>>
>>>> ```
>>>> evaluation_interval: 1m
>>>>
>>>> tests:
>>>>   - input_series:
>>>>       - series: foo
>>>>         values: '_ _ _ 17 137'
>>>>     interval: 1m
>>>>
>>>>     promql_expr_test:
>>>>       - expr: rate(foo[5m])
>>>>         eval_time: 4m30s
>>>>         exp_samples:
>>>>           - value: 2   # I expected: increase of 120 in 60 seconds
>>>>             labels: ""
>>>> ```
>>>>
>>>> Result (with prometheus 2.45.0):
>>>>
>>>> ```
>>>>   FAILED:
>>>>     expr: "rate(foo[5m])", time: 5m30s,
>>>>         exp: {} 2E+00
>>>>         got: {} 6.566666666666666E-01
>>>> ```
>>>>
>>>> That result is 197/300, and I have no idea how it derives this value!
>>>>
>>>> Now change the input data to:
>>>>
>>>> ```
>>>>         values: '_ _ _ 1017 1137'
>>>> ```
>>>>
>>>> and the result is 0.8 (=240/300, 192/240 or 48/60) - even though the 
>>>> input values are still 120 apart.
>>>>
>>>> Any clues as to how it gets these results?
>>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to prometheus-use...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/44e16f4a-9fcb-44fe-ad73-985cec076502n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/prometheus-users/44e16f4a-9fcb-44fe-ad73-985cec076502n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>
> -- 
> Julius Volz
> PromLabs - promlabs.com
>

