Yup, PromQL thinks there's a small dip in the data, though I'm not sure why.
I took your raw values:

225201
225226
225249
225262
225278
225310
225329
225363
225402
225437
225466
225492
225529
225555
225595

$ awk '{print $1-225201}' values
0
25
48
61
77
109
128
162
201
236
265
291
328
354
394

I'm not seeing the reset there.
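For completeness, you can check the per-interval increases directly rather than the cumulative offsets; a counter reset would show up as a negative delta. (A rough sketch with your values inlined instead of read from a file:)

```shell
# Print the delta between each pair of consecutive samples.
# A counter reset would appear here as a negative number.
printf '%s\n' 225201 225226 225249 225262 225278 225310 225329 225363 \
  225402 225437 225466 225492 225529 225555 225595 |
awk 'NR > 1 { print $1 - prev } { prev = $1 }'
```

Every delta is positive, so there's no reset visible in the raw samples.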

One thing I noticed: your scrape interval is 60 seconds and you are doing a
rate(counter[1m]). This is not going to work reliably, because rate() needs
at least two samples inside the range window, and with a 60-second interval
you are likely to get only one. Prometheus uses millisecond-precision
timestamps, so if you have samples at these times:

5.335
65.335
125.335

Then when you evaluate rate(counter[1m]) at time 120 (Grafana attempts to
align queries to even minutes for consistency), the only sample inside the
window is the one at 65.335, and rate() cannot compute anything from a
single sample.

You need at least rate(counter[2m]) to reliably cover two samples and avoid
this problem.
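To make that concrete, here's a rough sketch (plain shell/awk, treating the range window as (t-60, t]) of which samples land in each aligned 1m window when scrapes arrive every 60s at a 0.335s offset, as in your timestamps:

```shell
# Count how many samples fall inside each minute-aligned 1m window.
for t in 60 120 180; do
  count=0
  for s in 5.335 65.335 125.335; do
    # a sample is inside the window if t-60 < s <= t
    if awk -v s="$s" -v t="$t" 'BEGIN { exit !(s > t - 60 && s <= t) }'; then
      count=$((count + 1))
    fi
  done
  echo "window ending at $t: $count sample(s)"
done
# → each window ends up with exactly 1 sample
```

Since rate() needs at least two points in the window, every one of these evaluations comes back empty.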

On Wed, Apr 6, 2022 at 2:45 PM Sam Rose <[email protected]> wrote:

> I just learned about the resets() function and applying it does seem to
> show that a reset occurred:
>
> {
>   "request": {
>     "url":
> "api/datasources/proxy/1/api/v1/query_range?query=resets(counter[1m])&start=1649239200&end=1649240100&step=60",
>     "method": "GET",
>     "hideFromInspector": false
>   },
>   "response": {
>     "status": "success",
>     "data": {
>       "resultType": "matrix",
>       "result": [
>         {
>           "metric": {/* redacted */},
>           "values": [
>             [1649239200, "0"],
>             [1649239260, "0"],
>             [1649239320, "0"],
>             [1649239380, "0"],
>             [1649239440, "0"],
>             [1649239500, "0"],
>             [1649239560, "0"],
>             [1649239620, "0"],
>             [1649239680, "0"],
>             [1649239740, "1"],
>             [1649239800, "0"],
>             [1649239860, "0"],
>             [1649239920, "0"],
>             [1649239980, "0"],
>             [1649240040, "0"],
>             [1649240100, "0"]
>           ]
>         }
>       ]
>     }
>   }
> }
>
> I don't quite understand how, though.
> On Wednesday, April 6, 2022 at 1:40:12 PM UTC+1 Sam Rose wrote:
>
>> Hi there,
>>
>> We're seeing really large spikes when using the `rate()` function on some
>> of our metrics. I've been able to isolate a single time series that
>> displays this problem, which I'm going to call `counter`. I haven't
>> attached the actual metric labels here, but all of the data you see here is
>> from `counter` over the same time period.
>>
>> This is the raw data, as obtained through a request to /api/v1/query:
>>
>> {
>>     "data": {
>>         "result": [
>>             {
>>                 "metric": {/* redacted */},
>>                 "values": [
>>                     [1649239253.4, "225201"],
>>                     [1649239313.4, "225226"],
>>                     [1649239373.4, "225249"],
>>                     [1649239433.4, "225262"],
>>                     [1649239493.4, "225278"],
>>                     [1649239553.4, "225310"],
>>                     [1649239613.4, "225329"],
>>                     [1649239673.4, "225363"],
>>                     [1649239733.4, "225402"],
>>                     [1649239793.4, "225437"],
>>                     [1649239853.4, "225466"],
>>                     [1649239913.4, "225492"],
>>                     [1649239973.4, "225529"],
>>                     [1649240033.4, "225555"],
>>                     [1649240093.4, "225595"]
>>                 ]
>>             }
>>         ],
>>         "resultType": "matrix"
>>     },
>>     "status": "success"
>> }
>>
>> The next query is taken from the Grafana query inspector, because for
>> reasons I don't understand I can't get Prometheus to give me any data when
>> I issue the same query to /api/v1/query_range directly. It is the same
>> query as above, but wrapped in rate(counter[1m]):
>>
>>     "request": {
>>         "url":
>> "api/datasources/proxy/1/api/v1/query_range?query=rate(counter[1m])&start=1649239200&end=1649240100&step=60",
>>         "method": "GET",
>>         "hideFromInspector": false
>>     },
>>     "response": {
>>         "status": "success",
>>         "data": {
>>             "resultType": "matrix",
>>             "result": [
>>                 {
>>                     "metric": {/* redacted */},
>>                     "values": [
>>                         [1649239200, "0"],
>>                         [1649239260, "0"],
>>                         [1649239320, "0"],
>>                         [1649239380, "0"],
>>                         [1649239440, "0"],
>>                         [1649239500, "0"],
>>                         [1649239560, "0"],
>>                         [1649239620, "0"],
>>                         [1649239680, "0"],
>>                         [1649239740, "9391.766666666665"],
>>                         [1649239800, "0"],
>>                         [1649239860, "0"],
>>                         [1649239920, "0"],
>>                         [1649239980, "0"],
>>                         [1649240040, "0.03333333333333333"],
>>                         [1649240100, "0"]
>>                     ]
>>                 }
>>             ]
>>         }
>>     }
>> }
>>
>> Given the gradual increase in the underlying counter, I have two
>> questions:
>>
>> 1. How come the rate is 0 for all except 2 datapoints?
>> 2. How come there is one enormous datapoint in the rate query, that is
>> seemingly unexplained in the raw data?
>>
>> For 2 I've seen in other threads that the explanation is an unintentional
>> counter reset, caused by scrapes a millisecond apart that make the counter
>> appear to go down for a single scrape interval. I don't think I see this in
>> our raw data, though.
>>
>> We're using Prometheus version 2.26.0, revision
>> 3cafc58827d1ebd1a67749f88be4218f0bab3d8d, go version go1.16.2.
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/c1b7568b-f7f9-4edc-943a-22412658975fn%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/c1b7568b-f7f9-4edc-943a-22412658975fn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

