I would recommend using the `$__rate_interval` magic variable in Grafana.
Note that Grafana assumes a default scrape interval of 15s in the datasource
settings.

If most of your data is scraped at 60s intervals, you can change that value
in the Grafana datasource settings.
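
As far as I recall from the Grafana documentation, `$__rate_interval` works
out to roughly max($__interval + scrape interval, 4 * scrape interval). Here
is a minimal sketch of that arithmetic; the function and parameter names are
hypothetical and the formula is my reading of the docs, not Grafana's code:

```python
def rate_interval(interval_s: float, scrape_interval_s: float) -> float:
    """Approximate $__rate_interval in seconds (assumed formula, see above)."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


# Datasource left at the default 15s scrape interval, panel step of 60s.
print(rate_interval(60, 15))  # 75
# Datasource scrape interval changed to 60s, panel step of 60s.
print(rate_interval(60, 60))  # 240
```

With the setting left at 15s, the range would be too short for data scraped
every 60s, which is why the datasource setting matters.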

If you want to be able to view 1m resolution rates, I recommend reducing
your scrape interval to 15s (that is, scraping more often). This makes sure
you have several samples in the rate window, which helps Prometheus better
handle true counter resets and lost scrapes.
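
To illustrate the point about samples per rate window, here is a rough
sketch that counts how many scrape timestamps fall inside the look-back
window of a query evaluated at time 120. The 60s timestamps are the ones
from my earlier message; the 15s series and the helper name are made up
for illustration. Exact boundary handling differs between Prometheus
versions, but no sample sits on a boundary here:

```python
def samples_in_window(timestamps, eval_time, range_s):
    """Return the timestamps inside the window of range_s ending at eval_time."""
    return [t for t in timestamps if eval_time - range_s <= t <= eval_time]


# 60s scrape interval, timestamps taken from the example in this thread.
scrapes_60s = [5.335, 65.335, 125.335]
print(samples_in_window(scrapes_60s, 120, 60))   # [65.335]
print(len(samples_in_window(scrapes_60s, 120, 120)))  # 2

# Hypothetical 15s scrape interval starting at the same offset.
scrapes_15s = [5.335 + 15 * i for i in range(9)]
print(len(samples_in_window(scrapes_15s, 120, 60)))  # 4
```

rate() needs at least two samples in the window to return anything, so the
1m window over 60s data is the only case here that gives no result.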

On Wed, Apr 6, 2022 at 2:56 PM Sam Rose <[email protected]> wrote:

> Thanks for the heads up! We've flip flopped a bit between using 1m or 2m.
> 1m seems to work reliably enough to be useful in most situations, but I'll
> probably end up going back to 2m after this discussion.
>
> I don't believe that helps with the reset problem though, right? I retried
> the queries using 2m instead of 1m and they still exhibit the same problem.
>
> Is there any more data I can get you to help debug the problem? We see
> this happen multiple times per day, and it's making it difficult to monitor
> our systems in production.
>
> On Wednesday, April 6, 2022 at 1:53:26 PM UTC+1 [email protected] wrote:
>
>> Yup, PromQL thinks there's a small dip in the data. I'm not sure why, though.
>> I took your raw values:
>>
>> 225201
>> 225226
>> 225249
>> 225262
>> 225278
>> 225310
>> 225329
>> 225363
>> 225402
>> 225437
>> 225466
>> 225492
>> 225529
>> 225555
>> 225595
>>
>> $ awk '{print $1-225201}' values
>> 0
>> 25
>> 48
>> 61
>> 77
>> 109
>> 128
>> 162
>> 201
>> 236
>> 265
>> 291
>> 328
>> 354
>> 394
>>
>> I'm not seeing the reset there.
>>
>> One thing I noticed: your data interval is 60 seconds and you are doing a
>> rate(counter[1m]). This is not going to work reliably, because you are
>> unlikely to have two samples in the same 1m range window. Prometheus
>> uses millisecond timestamps, so suppose you have samples at these
>> times:
>>
>> 5.335
>> 65.335
>> 125.335
>>
>> Then you do a rate(counter[1m]) at time 120 (Grafana attempts to align
>> queries to even minutes for consistency), the only sample you'll get back
>> is 65.335.
>>
>> You need to do rate(counter[2m]) in order to avoid problems.
>>
>> On Wed, Apr 6, 2022 at 2:45 PM Sam Rose <[email protected]> wrote:
>>
>>> I just learned about the resets() function and applying it does seem to
>>> show that a reset occurred:
>>>
>>> {
>>>   "request": {
>>>     "url":
>>> "api/datasources/proxy/1/api/v1/query_range?query=resets(counter[1m])&start=1649239200&end=1649240100&step=60",
>>>     "method": "GET",
>>>     "hideFromInspector": false
>>>   },
>>>   "response": {
>>>     "status": "success",
>>>     "data": {
>>>       "resultType": "matrix",
>>>       "result": [
>>>         {
>>>           "metric": {/* redacted */},
>>>           "values": [
>>>             [
>>>               1649239200,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239260,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239320,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239380,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239440,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239500,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239560,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239620,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239680,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239740,
>>>               "1"
>>>             ],
>>>             [
>>>               1649239800,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239860,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239920,
>>>               "0"
>>>             ],
>>>             [
>>>               1649239980,
>>>               "0"
>>>             ],
>>>             [
>>>               1649240040,
>>>               "0"
>>>             ],
>>>             [
>>>               1649240100,
>>>               "0"
>>>             ]
>>>           ]
>>>         }
>>>       ]
>>>     }
>>>   }
>>> }
>>>
>>> I don't quite understand how, though.
>>> On Wednesday, April 6, 2022 at 1:40:12 PM UTC+1 Sam Rose wrote:
>>>
>>>> Hi there,
>>>>
>>>> We're seeing really large spikes when using the `rate()` function on
>>>> some of our metrics. I've been able to isolate a single time series that
>>>> displays this problem, which I'm going to call `counter`. I haven't
>>>> attached the actual metric labels here, but all of the data you see here is
>>>> from `counter` over the same time period.
>>>>
>>>> This is the raw data, as obtained through a request to /api/v1/query:
>>>>
>>>> {
>>>>     "data": {
>>>>         "result": [
>>>>             {
>>>>                 "metric": {/* redacted */},
>>>>                 "values": [
>>>>                     [
>>>>                         1649239253.4,
>>>>                         "225201"
>>>>                     ],
>>>>                     [
>>>>                         1649239313.4,
>>>>                         "225226"
>>>>                     ],
>>>>                     [
>>>>                         1649239373.4,
>>>>                         "225249"
>>>>                     ],
>>>>                     [
>>>>                         1649239433.4,
>>>>                         "225262"
>>>>                     ],
>>>>                     [
>>>>                         1649239493.4,
>>>>                         "225278"
>>>>                     ],
>>>>                     [
>>>>                         1649239553.4,
>>>>                         "225310"
>>>>                     ],
>>>>                     [
>>>>                         1649239613.4,
>>>>                         "225329"
>>>>                     ],
>>>>                     [
>>>>                         1649239673.4,
>>>>                         "225363"
>>>>                     ],
>>>>                     [
>>>>                         1649239733.4,
>>>>                         "225402"
>>>>                     ],
>>>>                     [
>>>>                         1649239793.4,
>>>>                         "225437"
>>>>                     ],
>>>>                     [
>>>>                         1649239853.4,
>>>>                         "225466"
>>>>                     ],
>>>>                     [
>>>>                         1649239913.4,
>>>>                         "225492"
>>>>                     ],
>>>>                     [
>>>>                         1649239973.4,
>>>>                         "225529"
>>>>                     ],
>>>>                     [
>>>>                         1649240033.4,
>>>>                         "225555"
>>>>                     ],
>>>>                     [
>>>>                         1649240093.4,
>>>>                         "225595"
>>>>                     ]
>>>>                 ]
>>>>             }
>>>>         ],
>>>>         "resultType": "matrix"
>>>>     },
>>>>     "status": "success"
>>>> }
>>>>
>>>> The next query is taken from the Grafana query inspector, because for
>>>> reasons I don't understand I can't get Prometheus to give me any data when
>>>> I issue the same query to /api/v1/query_range. The query is the same as the
>>>> above query, but wrapped in a rate([1m]):
>>>>
>>>> {
>>>>     "request": {
>>>>         "url":
>>>> "api/datasources/proxy/1/api/v1/query_range?query=rate(counter[1m])&start=1649239200&end=1649240100&step=60",
>>>>         "method": "GET",
>>>>         "hideFromInspector": false
>>>>     },
>>>>     "response": {
>>>>         "status": "success",
>>>>         "data": {
>>>>             "resultType": "matrix",
>>>>             "result": [
>>>>                 {
>>>>                     "metric": {/* redacted */},
>>>>                     "values": [
>>>>                         [
>>>>                             1649239200,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239260,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239320,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239380,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239440,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239500,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239560,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239620,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239680,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239740,
>>>>                             "9391.766666666665"
>>>>                         ],
>>>>                         [
>>>>                             1649239800,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239860,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239920,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649239980,
>>>>                             "0"
>>>>                         ],
>>>>                         [
>>>>                             1649240040,
>>>>                             "0.03333333333333333"
>>>>                         ],
>>>>                         [
>>>>                             1649240100,
>>>>                             "0"
>>>>                         ]
>>>>                     ]
>>>>                 }
>>>>             ]
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> Given the gradual increase in the underlying counter, I have two
>>>> questions:
>>>>
>>>> 1. How come the rate is 0 for all except 2 datapoints?
>>>> 2. How come there is one enormous datapoint in the rate query, that is
>>>> seemingly unexplained in the raw data?
>>>>
>>>> For 2 I've seen in other threads that the explanation is an
>>>> unintentional counter reset, caused by scrapes a millisecond apart that
>>>> make the counter appear to go down for a single scrape interval. I don't
>>>> think I see this in our raw data, though.
>>>>
>>>> We're using Prometheus version 2.26.0, revision
>>>> 3cafc58827d1ebd1a67749f88be4218f0bab3d8d, go version go1.16.2.
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/prometheus-users/c1b7568b-f7f9-4edc-943a-22412658975fn%40googlegroups.com
>>> <https://groups.google.com/d/msgid/prometheus-users/c1b7568b-f7f9-4edc-943a-22412658975fn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
