I just learned about the resets() function, and applying it does seem to show that a reset occurred:
{
"request": {
"url":
"api/datasources/proxy/1/api/v1/query_range?query=resets(counter[1m])&start=1649239200&end=1649240100&step=60",
"method": "GET",
"hideFromInspector": false
},
"response": {
"status": "success",
"data": {
"resultType": "matrix",
"result": [
{
"metric": {/* redacted */},
"values": [
[
1649239200,
"0"
],
[
1649239260,
"0"
],
[
1649239320,
"0"
],
[
1649239380,
"0"
],
[
1649239440,
"0"
],
[
1649239500,
"0"
],
[
1649239560,
"0"
],
[
1649239620,
"0"
],
[
1649239680,
"0"
],
[
1649239740,
"1"
],
[
1649239800,
"0"
],
[
1649239860,
"0"
],
[
1649239920,
"0"
],
[
1649239980,
"0"
],
[
1649240040,
"0"
],
[
1649240100,
"0"
]
]
}
]
}
}
}
I don't quite understand how a reset could have occurred, though: the raw counter values below only ever increase.
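To check my own understanding, here is a minimal Python sketch of how I believe Prometheus treats a decrease inside the window: any drop between adjacent samples counts as a reset, and the post-reset value is added back in full. The sample values are invented (not our redacted series), and real rate() also extrapolates to the window boundaries, which this sketch skips:

```python
def resets(samples):
    """Count decreases between adjacent (timestamp, value) samples --
    this is what the resets() function reports."""
    return sum(1 for (_, prev), (_, cur) in zip(samples, samples[1:]) if cur < prev)

def increase(samples):
    """Total increase over the samples, assuming any decrease means the
    counter restarted from zero, so the post-reset sample is added in full."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total

def rate(samples):
    """Per-second rate over the sampled span (no boundary extrapolation)."""
    return increase(samples) / (samples[-1][0] - samples[0][0])

# A normal 1m window: counter climbs smoothly from 225363 to 225402.
normal = [(0.0, 225363.0), (60.0, 225402.0)]
print(resets(normal), rate(normal))   # 0 resets, roughly 0.65/s

# Same window, but a second scrape 1ms later returned a slightly *older*
# value. That looks like a reset, so nearly the whole counter value is
# re-added to the increase, inflating the rate enormously.
glitch = [(0.0, 225363.0), (0.001, 225362.0), (60.0, 225402.0)]
print(resets(glitch), rate(glitch))   # 1 reset, thousands per second
```

So even though the dip would only be 1 in the raw values, the computed rate jumps by roughly counter/60, which could explain a spike of the magnitude we saw without any visible anomaly at our scrape resolution.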
On Wednesday, April 6, 2022 at 1:40:12 PM UTC+1 Sam Rose wrote:
> Hi there,
>
> We're seeing really large spikes when using the `rate()` function on some
> of our metrics. I've been able to isolate a single time series that
> displays this problem, which I'm going to call `counter`. I haven't
> attached the actual metric labels here, but all of the data you see here is
> from `counter` over the same time period.
>
> This is the raw data, as obtained through a request to /api/v1/query:
>
> {
> "data": {
> "result": [
> {
> "metric": {/* redacted */},
> "values": [
> [
> 1649239253.4,
> "225201"
> ],
> [
> 1649239313.4,
> "225226"
> ],
> [
> 1649239373.4,
> "225249"
> ],
> [
> 1649239433.4,
> "225262"
> ],
> [
> 1649239493.4,
> "225278"
> ],
> [
> 1649239553.4,
> "225310"
> ],
> [
> 1649239613.4,
> "225329"
> ],
> [
> 1649239673.4,
> "225363"
> ],
> [
> 1649239733.4,
> "225402"
> ],
> [
> 1649239793.4,
> "225437"
> ],
> [
> 1649239853.4,
> "225466"
> ],
> [
> 1649239913.4,
> "225492"
> ],
> [
> 1649239973.4,
> "225529"
> ],
> [
> 1649240033.4,
> "225555"
> ],
> [
> 1649240093.4,
> "225595"
> ]
> ]
> }
> ],
> "resultType": "matrix"
> },
> "status": "success"
> }
>
> The next query is taken from the Grafana query inspector, because for
> reasons I don't understand I can't get Prometheus to give me any data when
> I issue the same query to /api/v1/query_range. The query is the same as the
> above query, but wrapped in rate(counter[1m]):
>
> "request": {
> "url":
> "api/datasources/proxy/1/api/v1/query_range?query=rate(counter[1m])&start=1649239200&end=1649240100&step=60",
> "method": "GET",
> "hideFromInspector": false
> },
> "response": {
> "status": "success",
> "data": {
> "resultType": "matrix",
> "result": [
> {
> "metric": {/* redacted */},
> "values": [
> [
> 1649239200,
> "0"
> ],
> [
> 1649239260,
> "0"
> ],
> [
> 1649239320,
> "0"
> ],
> [
> 1649239380,
> "0"
> ],
> [
> 1649239440,
> "0"
> ],
> [
> 1649239500,
> "0"
> ],
> [
> 1649239560,
> "0"
> ],
> [
> 1649239620,
> "0"
> ],
> [
> 1649239680,
> "0"
> ],
> [
> 1649239740,
> "9391.766666666665"
> ],
> [
> 1649239800,
> "0"
> ],
> [
> 1649239860,
> "0"
> ],
> [
> 1649239920,
> "0"
> ],
> [
> 1649239980,
> "0"
> ],
> [
> 1649240040,
> "0.03333333333333333"
> ],
> [
> 1649240100,
> "0"
> ]
> ]
> }
> ]
> }
> }
> }
>
> Given the gradual increase in the underlying counter, I have two questions:
>
> 1. How come the rate is 0 for all except 2 datapoints?
> 2. How come there is one enormous datapoint in the rate query that is
> seemingly unexplained by the raw data?
>
> For 2 I've seen in other threads that the explanation is an unintentional
> counter reset, caused by scrapes a millisecond apart that make the counter
> appear to go down for a single scrape interval. I don't think I see this in
> our raw data, though.
>
> We're using Prometheus version 2.26.0, revision
> 3cafc58827d1ebd1a67749f88be4218f0bab3d8d, go version go1.16.2.
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/c1b7568b-f7f9-4edc-943a-22412658975fn%40googlegroups.com.

