I appreciate your time. I’ve logged off for the day but will get back to you tomorrow with more data.
To answer the question I can: we aren’t using any proxy software to my knowledge. We use the https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus Helm chart (version 13.8.0) hooked up to store data in AWS’s managed Prometheus product. That said, we do run it in StatefulSet mode with 2 replicas. I wonder if that’s causing problems. On Wed, 6 Apr 2022, at 19:49, Brian Candler wrote: > Are you going through any middleware or proxy, like promxy? > > rate(foo[1m]) should definitely give no answer at all, when the timeseries > data is sampled at 1 minute intervals. > > Here is a working query_range for rate[1m] where the scrape interval is 15s: > > # curl -Ssg > 'http://localhost:9090/api/v1/query_range?query=rate(ifHCInOctets{instance="gw1",ifName="ether1"}[60s])&start=1649264340&end=1649264640&step=60' > | python3 -m json.tool > { > "status": "success", > "data": { > "resultType": "matrix", > "result": [ > { > "metric": { > "ifIndex": "16", > "ifName": "ether1", > "instance": "gw1", > "job": "snmp", > "module": "mikrotik_secret", > "netbox_type": "device" > }, > "values": [ > [ > 1649264340, > "578.6444444444444" > ], > [ > 1649264400, > "651.4222222222221" > ], > [ > 1649264460, > "135.17777777777778" > ], > [ > 1649264520, > "1699.4888888888888" > ], > [ > 1649264580, > "441.5777777777777" > ], > [ > 1649264640, > "39768.08888888888" > ] > ] > } > ] > } > } > > But if I make exactly the same query but with rate[15s] then there are no > answers: > > # curl -Ssg > 'http://localhost:9090/api/v1/query_range?query=rate(ifHCInOctets{instance="gw1",ifName="ether1"}[15s])&start=1649264340&end=1649264640&step=60' > | python3 -m json.tool > { > "status": "success", > "data": { > "resultType": "matrix", > "result": [] > } > } > > I think the real reason for your problem is hidden; you're obfuscating the > query and metric names, and I suspect it's hidden behind that. 
Sorry, I > can't help you any further given what I can see, but hopefully you have an > idea where you can look further. > > On Wednesday, 6 April 2022 at 18:45:10 UTC+1 [email protected] wrote: >> Hey Brian, >> >> In the original post I put the output of the raw time series as gathered the >> way you suggest. I'll copy it again below: >> >> { >> "data": { >> "result": [ >> { >> "metric": {/* redacted */}, >> "values": [ >> [ >> 1649239253.4, >> "225201" >> ], >> [ >> 1649239313.4, >> "225226" >> ], >> [ >> 1649239373.4, >> "225249" >> ], >> [ >> 1649239433.4, >> "225262" >> ], >> [ >> 1649239493.4, >> "225278" >> ], >> [ >> 1649239553.4, >> "225310" >> ], >> [ >> 1649239613.4, >> "225329" >> ], >> [ >> 1649239673.4, >> "225363" >> ], >> [ >> 1649239733.4, >> "225402" >> ], >> [ >> 1649239793.4, >> "225437" >> ], >> [ >> 1649239853.4, >> "225466" >> ], >> [ >> 1649239913.4, >> "225492" >> ], >> [ >> 1649239973.4, >> "225529" >> ], >> [ >> 1649240033.4, >> "225555" >> ], >> [ >> 1649240093.4, >> "225595" >> ] >> ] >> } >> ], >> "resultType": "matrix" >> }, >> "status": "success" >> } >> >> The query was of the form `counter[15m]` at a given time. I don't see >> duplicate scrape data in there. >> >> The version of prometheus is 2.26.0, revision >> 3cafc58827d1ebd1a67749f88be4218f0bab3d8d, go version go1.16.2. >> On Wednesday, April 6, 2022 at 6:13:10 PM UTC+1 Brian Candler wrote: >>> What version of prometheus are you running? >>> >>> With prometheus, rate(counter[1m]) should give you no results at all when >>> you are scraping at 1 minute intervals - unless something has changed very >>> recently (I'm running 2.33.4). So this is a big red flag. 
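[Editor's note: Brian's "big red flag" above — that rate(counter[1m]) over 1-minute scrapes should return nothing — follows from rate() needing at least two samples inside the lookback window. A minimal sketch of that rule; this is a simplified model (real Prometheus also extrapolates toward the window boundaries, handles counter resets, and boundary inclusivity differs across versions), and the sample values are made up:]

```python
# rate() needs at least two samples inside the lookback window; with a 60s
# scrape interval a [1m] window usually holds only one sample, so the query
# returns an empty result. Simplified model; sample data is hypothetical.

def naive_rate(samples, eval_time, window):
    """samples: list of (timestamp, value) pairs; per-second rate or None."""
    in_window = [(t, v) for t, v in samples if eval_time - window < t <= eval_time]
    if len(in_window) < 2:
        return None  # mirrors the empty result set PromQL returns
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)

scrapes_15s = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
print(naive_rate(scrapes_15s, 60, 60))  # 2.0 -- several samples fall in the window
print(naive_rate(scrapes_15s, 60, 15))  # None -- one sample fits, like rate[15s] above
```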
>>> Now, for driving the query API, you should be able to do it like this: >>> >>> # curl -Ssg >>> 'http://localhost:9090/api/v1/query?query=ifHCInOctets{instance="gw1",ifName="ether1"}[60s]' >>> | python3 -m json.tool >>> >>> { >>> "status": "success", >>> "data": { >>> "resultType": "matrix", >>> "result": [ >>> { >>> "metric": { >>> "__name__": "ifHCInOctets", >>> "ifIndex": "16", >>> "ifName": "ether1", >>> "instance": "gw1", >>> "job": "snmp", >>> "module": "mikrotik_secret", >>> "netbox_type": "device" >>> }, >>> "values": [ >>> [ >>> 1649264595.241, >>> "117857843410" >>> ], >>> [ >>> 1649264610.241, >>> "117858063821" >>> ], >>> [ >>> 1649264625.241, >>> "117858075769" >>> ] >>> ] >>> } >>> ] >>> } >>> } >>> >>> There I gave a range vector of 60 seconds, and I got 3 data points because >>> I'm scraping at 15 second intervals, so only 3 points fell within the time >>> window of (current time) and (current time - 60s) >>> >>> Sending a query_range will sample the data at intervals. Only an actual >>> range vector query (as shown above) will show you *all* the data points in >>> the time series, wherever they lie. >>> >>> I think you should do this. My guess - and it's only a guess at the moment >>> - is that there are multiple points being received for the same timeseries, >>> and this is giving your spike. This could be due to overlapping scrape >>> jobs for the same timeseries, or relabelling removing some distinguishing >>> label, or some HA setup which is scraping the same timeseries multiple >>> times but not adding external labels to distinguish them. >>> >>> I do have some evidence for my guess. 
If you are storing the same data >>> points twice, this will give you the rate of zero most of the time, when >>> doing rate[1m], because there are two adjacent identical points most of the >>> time (whereas if there were only a single data point, you'd get no rate at >>> all). And you'll get a counter spike if two data points get transposed. >>> >>> On Wednesday, 6 April 2022 at 14:37:57 UTC+1 [email protected] wrote: >>>> Here's the query inspector output from Grafana for rate(counter[2m]). It >>>> makes the answer to question 1 in my original post more clear. You're >>>> right, the graph for 1m is just plain wrong. We do still see the reset, >>>> though. >>>> >>>> { >>>> "request": { >>>> "url": >>>> "api/datasources/proxy/1/api/v1/query_range?query=rate(counter[2m])&start=1649239200&end=1649240100&step=60", >>>> >>>> "method": "GET", >>>> "hideFromInspector": false >>>> }, >>>> "response": { >>>> "status": "success", >>>> "data": { >>>> "resultType": "matrix", >>>> "result": [ >>>> { >>>> "metric": {/* redacted */}, >>>> "values": [ >>>> [ >>>> 1649239200, >>>> "0.2871886897537781" >>>> ], >>>> [ >>>> 1649239260, >>>> "0.3084619260318357" >>>> ], >>>> [ >>>> 1649239320, >>>> "0.26591545347572043" >>>> ], >>>> [ >>>> 1649239380, >>>> "0.2446422171976628" >>>> ], >>>> [ >>>> 1649239440, >>>> "0.13827603580737463" >>>> ], >>>> [ >>>> 1649239500, >>>> "0.1701858902244611" >>>> ], >>>> [ >>>> 1649239560, >>>> "0.3403717804489222" >>>> ], >>>> [ >>>> 1649239620, >>>> "0.20209574464154753" >>>> ], >>>> [ >>>> 1649239680, >>>> "0.3616450167269798" >>>> ], >>>> [ >>>> 1649239740, >>>> "2397.9404664989347" >>>> ], >>>> [ >>>> 1649239800, >>>> "2397.88728340824" >>>> ], >>>> [ >>>> 1649239860, >>>> "0.3084619260318357" >>>> ], >>>> [ >>>> 1649239920, >>>> "0.27655207161474926" >>>> ], >>>> [ >>>> 1649239980, >>>> "0.39355487114406623" >>>> ], >>>> [ >>>> 1649240040, >>>> "0.27655207161474926" >>>> ], >>>> [ >>>> 1649240100, >>>> "0.43610134370018155" >>>> ] >>>> ] >>>> } 
>>>> ] >>>> } >>>> } >>>> } >>>> On Wednesday, April 6, 2022 at 2:34:59 PM UTC+1 Sam Rose wrote: >>>>> We do see a graph with rate(counter[1m]). It even looks pretty close to >>>>> what we see with rate(counter[2m]). We definitely scrape every 60 >>>>> seconds, double checked our config to make sure. >>>>> >>>>> The exact query was `counter[15m]`. Counter is >>>>> `django_http_responses_total_by_status_total` in reality, with a long >>>>> list of labels attached to ensure I'm selecting a single time series. >>>>> >>>>> I didn't realise Grafana did that, thank you for the advice. >>>>> >>>>> I feel like we're drifting away from the original problem a little bit. >>>>> Can I get you any additional data to make the original problem easier to >>>>> debug? >>>>> >>>>> On Wednesday, April 6, 2022 at 2:31:27 PM UTC+1 Brian Candler wrote: >>>>>> If you are scraping at 1m intervals, then you definitely need >>>>>> rate(counter[2m]). That's because rate() needs at least two data points >>>>>> to fall within the range window. I would be surprised if you see any >>>>>> graph at all with rate(counter[1m]). >>>>>> >>>>>> > This is the raw data, as obtained through a request to /api/v1/query >>>>>> >>>>>> What is the *exact* query you gave? Hopefully it is a range vector >>>>>> query, like counter[15m]. A range vector expression sent to the simple >>>>>> query endpoint gives you the raw data points with their raw timestamps >>>>>> from the database. >>>>>> >>>>>> > and then we configure the minimum value of it to 1m per-graph >>>>>> >>>>>> Just in case you haven't realised: to set a minimum value of 1m, you >>>>>> must set the data source scrape interval (in Grafana) to 15s - since >>>>>> Grafana clamps the minimum value to 4 x Grafana-configured data source >>>>>> scrape interval. >>>>>> >>>>>> Therefore if you are actually scraping at 1m intervals, and you want the >>>>>> minimum of $__rate_interval to be 2m, then you must set the Grafana data >>>>>> source interval to 30s. 
This is weird, but it is what it is. >>>>>> https://github.com/grafana/grafana/issues/32169 >>>>>> >>>>>> On Wednesday, 6 April 2022 at 14:07:13 UTC+1 [email protected] wrote: >>>>>>> We do make use of that variable, and then we configure the minimum >>>>>>> value of it to 1m per-graph. I didn't realise you could configure this >>>>>>> per-datasource, thanks for pointing that out! >>>>>>> >>>>>>> We did used to scrape at 15s intervals but we're using AWS's managed >>>>>>> prometheus workspaces, and each data point costs money, so we brought >>>>>>> it down to 1m intervals. >>>>>>> >>>>>>> I'm not sure I understand the relationship between scrape interval and >>>>>>> counter resets, especially considering there doesn't appear to be a >>>>>>> counter reset in the raw data of the time series in question. >>>>>>> >>>>>>> You mentioned "true counter reset", does prometheus have some internal >>>>>>> distinction between types of counter reset? >>>>>>> On Wednesday, April 6, 2022 at 2:03:40 PM UTC+1 [email protected] wrote: >>>>>>>> I would recommend using the `$__rate_interval` magic variable in >>>>>>>> Grafana. Note that Grafana assumes a default interval of 15s in the >>>>>>>> datasource settings. >>>>>>>> >>>>>>>> If your data is mostly 60s scrape intervals, you can configure this >>>>>>>> setting in the Grafana datasource settings. >>>>>>>> >>>>>>>> If you want to be able to view 1m resolution rates, I recommend >>>>>>>> increasing your scrape interval to 15s. This makes sure you have >>>>>>>> several samples in the rate window. This helps Prometheus better >>>>>>>> handle true counter resets and lost scrapes. >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Apr 6, 2022 at 2:56 PM Sam Rose <[email protected]> wrote: >>>>>>>>> Thanks for the heads up! We've flip flopped a bit between using 1m or >>>>>>>>> 2m. 1m seems to work reliably enough to be useful in most situations, >>>>>>>>> but I'll probably end up going back to 2m after this discussion. 
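[Editor's note: the clamping Brian describes can be made concrete. As I understand it from the Grafana documentation and grafana/grafana#32169, `$__rate_interval` is computed as max(panel interval + scrape interval, 4 × scrape interval), using the scrape interval configured in the *datasource settings*, not the real one. A hedged sketch of that formula:]

```python
# Hedged sketch of Grafana's $__rate_interval calculation, assuming the
# documented formula max(interval + scrape_interval, 4 * scrape_interval),
# where scrape_interval is what the datasource settings claim (seconds).

def grafana_rate_interval(panel_interval_s: int, datasource_scrape_interval_s: int) -> int:
    return max(panel_interval_s + datasource_scrape_interval_s,
               4 * datasource_scrape_interval_s)

# Default datasource setting of 15s clamps $__rate_interval to at least 1m:
print(grafana_rate_interval(15, 15))  # 60
# Setting the datasource to 30s raises the minimum to 2m, as suggested above:
print(grafana_rate_interval(30, 30))  # 120
```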
>>>>>>>>> >>>>>>>>> I don't believe that helps with the reset problem though, right? I >>>>>>>>> retried the queries using 2m instead of 1m and they still exhibit the >>>>>>>>> same problem. >>>>>>>>> >>>>>>>>> Is there any more data I can get you to help debug the problem? We >>>>>>>>> see this happen multiple times per day, and it's making it difficult >>>>>>>>> to monitor our systems in production. >>>>>>>>> On Wednesday, April 6, 2022 at 1:53:26 PM UTC+1 [email protected] >>>>>>>>> wrote: >>>>>>>>>> Yup, PromQL thinks there's a small dip in the data. I'm not sure why >>>>>>>>>> tho. I took your raw values: >>>>>>>>>> >>>>>>>>>> 225201 >>>>>>>>>> 225226 >>>>>>>>>> 225249 >>>>>>>>>> 225262 >>>>>>>>>> 225278 >>>>>>>>>> 225310 >>>>>>>>>> 225329 >>>>>>>>>> 225363 >>>>>>>>>> 225402 >>>>>>>>>> 225437 >>>>>>>>>> 225466 >>>>>>>>>> 225492 >>>>>>>>>> 225529 >>>>>>>>>> 225555 >>>>>>>>>> 225595 >>>>>>>>>> >>>>>>>>>> $ awk '{print $1-225201}' values >>>>>>>>>> 0 >>>>>>>>>> 25 >>>>>>>>>> 48 >>>>>>>>>> 61 >>>>>>>>>> 77 >>>>>>>>>> 109 >>>>>>>>>> 128 >>>>>>>>>> 162 >>>>>>>>>> 201 >>>>>>>>>> 236 >>>>>>>>>> 265 >>>>>>>>>> 291 >>>>>>>>>> 328 >>>>>>>>>> 354 >>>>>>>>>> 394 >>>>>>>>>> >>>>>>>>>> I'm not seeing the reset there. >>>>>>>>>> >>>>>>>>>> One thing I noticed, your data interval is 60 seconds and you are >>>>>>>>>> doing a rate(counter[1m]). This is not going to work reliably, >>>>>>>>>> because you are likely to not have two samples in the same step >>>>>>>>>> window. This is because Prometheus uses millisecond timestamps, so >>>>>>>>>> if you have timestamps at these times: >>>>>>>>>> >>>>>>>>>> 5.335 >>>>>>>>>> 65.335 >>>>>>>>>> 125.335 >>>>>>>>>> >>>>>>>>>> Then you do a rate(counter[1m]) at time 120 (Grafana attempts to >>>>>>>>>> align queries to even minutes for consistency), the only sample >>>>>>>>>> you'll get back is 65.335. >>>>>>>>>> >>>>>>>>>> You need to do rate(counter[2m]) in order to avoid problems. 
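[Editor's note: the millisecond-offset example above can be checked directly. With 60s scrapes landing at 5.335, 65.335, 125.335 and an evaluation step aligned to t=120, a 1m window catches only one sample; a sketch of which samples fall inside a lookback window (boundary handling simplified):]

```python
# Which raw samples land inside a PromQL lookback window (eval_time - window,
# eval_time]? With 60s scrapes offset by 5.335s, a rate[1m] evaluated at the
# aligned step t=120 sees a single sample, so rate() has nothing to work with.

def samples_in_window(timestamps, eval_time, window):
    return [t for t in timestamps if eval_time - window < t <= eval_time]

scrape_times = [5.335, 65.335, 125.335]
print(samples_in_window(scrape_times, 120, 60))   # [65.335] -> rate[1m] is empty
print(samples_in_window(scrape_times, 120, 120))  # [5.335, 65.335] -> rate[2m] works
```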
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Apr 6, 2022 at 2:45 PM Sam Rose <[email protected]> wrote: >>>>>>>>>>> I just learned about the resets() function and applying it does >>>>>>>>>>> seem to show that a reset occurred: >>>>>>>>>>> >>>>>>>>>>> { >>>>>>>>>>> "request": { >>>>>>>>>>> "url": >>>>>>>>>>> "api/datasources/proxy/1/api/v1/query_range?query=resets(counter[1m])&start=1649239200&end=1649240100&step=60", >>>>>>>>>>> "method": "GET", >>>>>>>>>>> "hideFromInspector": false >>>>>>>>>>> }, >>>>>>>>>>> "response": { >>>>>>>>>>> "status": "success", >>>>>>>>>>> "data": { >>>>>>>>>>> "resultType": "matrix", >>>>>>>>>>> "result": [ >>>>>>>>>>> { >>>>>>>>>>> "metric": {/* redacted */}, >>>>>>>>>>> "values": [ >>>>>>>>>>> [ >>>>>>>>>>> 1649239200, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239260, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239320, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239380, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239440, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239500, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239560, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239620, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239680, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239740, >>>>>>>>>>> "1" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239800, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239860, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239920, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649239980, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649240040, >>>>>>>>>>> "0" >>>>>>>>>>> ], >>>>>>>>>>> [ >>>>>>>>>>> 1649240100, >>>>>>>>>>> "0" >>>>>>>>>>> ] >>>>>>>>>>> ] >>>>>>>>>>> } >>>>>>>>>>> ] >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> I don't quite understand how, though. 
>>>>>>>>>>> On Wednesday, April 6, 2022 at 1:40:12 PM UTC+1 Sam Rose wrote: >>>>>>>>>>>> Hi there, >>>>>>>>>>>> >>>>>>>>>>>> We're seeing really large spikes when using the `rate()` function >>>>>>>>>>>> on some of our metrics. I've been able to isolate a single time >>>>>>>>>>>> series that displays this problem, which I'm going to call >>>>>>>>>>>> `counter`. I haven't attached the actual metric labels here, but >>>>>>>>>>>> all of the data you see here is from `counter` over the same time >>>>>>>>>>>> period. >>>>>>>>>>>> >>>>>>>>>>>> This is the raw data, as obtained through a request to >>>>>>>>>>>> /api/v1/query: >>>>>>>>>>>> >>>>>>>>>>>> { >>>>>>>>>>>> "data": { >>>>>>>>>>>> "result": [ >>>>>>>>>>>> { >>>>>>>>>>>> "metric": {/* redacted */}, >>>>>>>>>>>> "values": [ >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239253.4, >>>>>>>>>>>> "225201" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239313.4, >>>>>>>>>>>> "225226" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239373.4, >>>>>>>>>>>> "225249" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239433.4, >>>>>>>>>>>> "225262" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239493.4, >>>>>>>>>>>> "225278" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239553.4, >>>>>>>>>>>> "225310" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239613.4, >>>>>>>>>>>> "225329" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239673.4, >>>>>>>>>>>> "225363" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239733.4, >>>>>>>>>>>> "225402" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239793.4, >>>>>>>>>>>> "225437" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239853.4, >>>>>>>>>>>> "225466" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239913.4, >>>>>>>>>>>> "225492" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239973.4, >>>>>>>>>>>> "225529" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649240033.4, >>>>>>>>>>>> "225555" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649240093.4, >>>>>>>>>>>> "225595" >>>>>>>>>>>> ] >>>>>>>>>>>> ] 
>>>>>>>>>>>> } >>>>>>>>>>>> ], >>>>>>>>>>>> "resultType": "matrix" >>>>>>>>>>>> }, >>>>>>>>>>>> "status": "success" >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> The next query is taken from the Grafana query inspector, because >>>>>>>>>>>> for reasons I don't understand I can't get Prometheus to give me >>>>>>>>>>>> any data when I issue the same query to /api/v1/query_range. The >>>>>>>>>>>> query is the same as the above query, but wrapped in a rate([1m]): >>>>>>>>>>>> >>>>>>>>>>>> "request": { >>>>>>>>>>>> "url": >>>>>>>>>>>> "api/datasources/proxy/1/api/v1/query_range?query=rate(counter[1m])&start=1649239200&end=1649240100&step=60", >>>>>>>>>>>> "method": "GET", >>>>>>>>>>>> "hideFromInspector": false >>>>>>>>>>>> }, >>>>>>>>>>>> "response": { >>>>>>>>>>>> "status": "success", >>>>>>>>>>>> "data": { >>>>>>>>>>>> "resultType": "matrix", >>>>>>>>>>>> "result": [ >>>>>>>>>>>> { >>>>>>>>>>>> "metric": {/* redacted */}, >>>>>>>>>>>> "values": [ >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239200, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239260, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239320, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239380, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239440, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239500, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239560, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239620, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239680, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239740, >>>>>>>>>>>> "9391.766666666665" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239800, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239860, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239920, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649239980, >>>>>>>>>>>> "0" >>>>>>>>>>>> ], 
>>>>>>>>>>>> [ >>>>>>>>>>>> 1649240040, >>>>>>>>>>>> "0.03333333333333333" >>>>>>>>>>>> ], >>>>>>>>>>>> [ >>>>>>>>>>>> 1649240100, >>>>>>>>>>>> "0" >>>>>>>>>>>> ] >>>>>>>>>>>> ] >>>>>>>>>>>> } >>>>>>>>>>>> ] >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> Given the gradual increase in the underlying counter, I have two >>>>>>>>>>>> questions: >>>>>>>>>>>> >>>>>>>>>>>> 1. How come the rate is 0 for all except 2 datapoints? >>>>>>>>>>>> 2. How come there is one enormous datapoint in the rate query, >>>>>>>>>>>> that is seemingly unexplained in the raw data? >>>>>>>>>>>> >>>>>>>>>>>> For 2 I've seen in other threads that the explanation is an >>>>>>>>>>>> unintentional counter reset, caused by scrapes a millisecond apart >>>>>>>>>>>> that make the counter appear to go down for a single scrape >>>>>>>>>>>> interval. I don't think I see this in our raw data, though. >>>>>>>>>>>> >>>>>>>>>>>> We're using Prometheus version 2.26.0, revision >>>>>>>>>>>> 3cafc58827d1ebd1a67749f88be4218f0bab3d8d, go version go1.16.2. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> You received this message because you are subscribed to the Google >>>>>>>>>>> Groups "Prometheus Users" group. >>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>>>> send an email to [email protected]. >>>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>> https://groups.google.com/d/msgid/prometheus-users/c1b7568b-f7f9-4edc-943a-22412658975fn%40googlegroups.com.

