Re: [prometheus-users] Re: up query

Brian Candler Wed, 17 Aug 2022 07:53:51 -0700

Incidentally, there is another way to slice this, which may or may not be 
helpful.


If you tell Grafana to query Prometheus for the simple query "up", you can 
then get Grafana itself to calculate the average, or minimum, or maximum 
over the time range it has queried:
[image: img1.png]
This can be useful in stat panels, where the displayed value changes
dynamically with the time period you have selected in Grafana (e.g. if you
select a particular 6-hour window, you want to show the average over those 6
hours).  The stat panel's calculation defaults to "Last", i.e. the most recent value.
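To make those calculations concrete, here is a small Python sketch that collapses a series of `up` samples into a single number. The sample values and the `reduce_series` helper are hypothetical. It only loosely mirrors Grafana's Last/Mean/Min/Max calculations and does not reproduce Grafana's code. Because `up` is 1 for a successful scrape and 0 for a failed one, the mean is the fraction of successful scrapes in the selected range.

```python
from statistics import mean

# Hypothetical `up` samples for one instance over the selected dashboard range,
# as (unix_timestamp, value) pairs in time order. 1 = scrape OK, 0 = scrape failed.
samples = [(0, 1), (60, 1), (120, 0), (180, 0), (240, 1)]

def reduce_series(points, calc):
    """Collapse a series to one number, loosely mirroring Grafana stat-panel
    calculations (a simplified illustration, not Grafana's implementation)."""
    values = [v for _, v in points]
    if calc == "last":
        return values[-1]
    if calc == "mean":
        return mean(values)  # for 0/1 data: fraction of successful scrapes
    if calc == "min":
        return min(values)
    if calc == "max":
        return max(values)
    raise ValueError(f"unsupported calculation: {calc}")

for calc in ("last", "mean", "min", "max"):
    print(calc, reduce_series(samples, calc))
```

Selecting a narrower time range in Grafana roughly amounts to passing fewer points to the calculation. That is why the stat value follows the dashboard's time picker.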

But we are now moving into the realm of Grafana, and this is a mailing list 
for Prometheus.  Grafana has its own community discussion forum, so 
questions about Grafana are best asked there.

On Wednesday, 17 August 2022 at 15:12:26 UTC+1 Brian Candler wrote:

> If you want servers that have been down for 30 days, then I thought it 
> should be obvious you need max_over_time(up[30d]) == 0  ... but perhaps it 
> isn't as obvious as I thought.
>
> Let me break that query down into parts:
>
> up[30d]   :   returns a *range vector* containing all data points for the 
> timeseries with metric name "up" from T - 30 days to T (where T is the 
> evaluation time, i.e. the point on the X axis)
>
> By "timeseries" I mean a distinct combination of metric name and labels, e.g.
> up{instance="foo"}
> up{instance="bar"}
> are two different timeseries.  They happen to share the same metric name 
> ("up") but they are recording an independent sequence of measurements.
>
> Think of the range vector as a two-dimensional grid: there are N different 
> timeseries, each with M data points over that period. The data collected 
> and stored in the TSDB might look like this:
>
> up{instance="foo"}  v1 . . . v2 . . . v3 . . .
> up{instance="bar"}  . . v4 . . . v5 . . . v6 .
>                     -------------------------> time
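The grid can be modelled in a few lines of Python. This sketch rests on simplifying assumptions and is not how the Prometheus TSDB stores data. Each series is keyed by its full label set, with the metric name included under the `__name__` label. The timestamps and values are made up. `range_select` is a hypothetical helper that mimics a selector like `up[30d]`. Whether a sample lying exactly on the window's start is included can depend on the Prometheus version, so the sketch simply uses a window that excludes its start and includes its end.

```python
DAY = 86400  # seconds

def series_key(**labels):
    """Identify a series by its metric name plus all labels, ignoring order."""
    return frozenset(labels.items())

# Hypothetical stored samples: series identity -> time-ordered (timestamp, value) pairs.
tsdb = {
    series_key(__name__="up", instance="foo"): [(0, 1), (4 * DAY, 1), (8 * DAY, 1)],
    series_key(__name__="up", instance="bar"): [(2 * DAY, 0), (6 * DAY, 0), (10 * DAY, 0)],
}

def range_select(db, metric, eval_time, window):
    """Mimic a range selector such as up[30d] evaluated at eval_time: for each
    series with the given metric name, keep the samples with
    eval_time - window < t <= eval_time. Series with no samples in the
    window are left out. Rows are series, columns are samples."""
    result = {}
    for key, points in db.items():
        if dict(key).get("__name__") != metric:
            continue
        selected = [(t, v) for t, v in points if eval_time - window < t <= eval_time]
        if selected:
            result[key] = selected
    return result

rv = range_select(tsdb, "up", eval_time=10 * DAY, window=7 * DAY)
for key, points in rv.items():
    print(dict(key)["instance"], points)
```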
>
> Then:
> max_over_time(...)  :  for each timeseries in the range vector, picks the 
> highest value.  This returns an *instant vector*, i.e. a single value for 
> every timeseries, which is the maximum of each.
>
> up{instance="foo"}  max(v1, v2, v3)
> up{instance="bar"}  max(v4, v5, v6)
>
> Each of those values is the maximum value of its timeseries over the 30-day
> period.
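Continuing the same idea, `max_over_time` turns each row of the range vector into one number. `== 0` then acts as a filter that keeps only the series whose result is 0. In the Python sketch below, the data and helper names are hypothetical. The filter selects only the server that failed every scrape in the window. For contrast, the sketch also applies `min_over_time`, which is a real PromQL function. `min_over_time(up[X]) == 0` would match any server that failed at least one scrape, which is the "down for even one minute" behaviour.

```python
def max_over_time(range_vector):
    """Highest sample value per series: range vector -> instant vector.
    Series with no samples in the window produce no output."""
    return {key: max(v for _, v in pts) for key, pts in range_vector.items() if pts}

def min_over_time(range_vector):
    """Lowest sample value per series."""
    return {key: min(v for _, v in pts) for key, pts in range_vector.items() if pts}

def eq_filter(instant_vector, scalar):
    """Mimic the filtering comparison `expr == scalar`: series whose value
    differs are dropped from the result (not set to 0)."""
    return {key: v for key, v in instant_vector.items() if v == scalar}

# Hypothetical `up` samples (timestamp, value) for three servers within the window.
range_vector = {
    "always_down": [(t, 0) for t in range(5)],
    "briefly_down": [(0, 1), (1, 1), (2, 0), (3, 1), (4, 1)],
    "always_up": [(t, 1) for t in range(5)],
}

down_throughout = eq_filter(max_over_time(range_vector), 0)     # max_over_time(up[X]) == 0
down_at_least_once = eq_filter(min_over_time(range_vector), 0)  # min_over_time(up[X]) == 0
print(sorted(down_throughout))
print(sorted(down_at_least_once))
```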
>
> Now, you've chosen to draw a graph of this expression, but it's important 
> to realise that the graph itself doesn't need to be over 30 days.  When you 
> draw a graph of an expression, it will sweep across the evaluation time, 
> evaluating the expression repeatedly at different instants in time over the 
> given period.
>
> Let's say, for example, you set the graph range to be 1 week, but you are 
> graphing max_over_time(up[30d]) == 0
>
> What will you get?  This will be a series of points.  Let's imagine the 
> graph only had one point per day. Considering the position of each point on 
> the time axis:
> Aug 17: shows if the server has been down from (Aug 17 - 30 days) to (Aug 17)
> Aug 16: shows if the server has been down from (Aug 16 - 30 days) to (Aug 16)
> ...
> Aug 10: shows if the server has been down from (Aug 10 - 30 days) to (Aug 10)
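The sweep can be simulated directly. The Python sketch below assumes a hypothetical server that is scraped once an hour and has been failing since day 10. It evaluates `max_over_time(up[30d]) == 0` once per day over a one-week graph range, giving eight daily points (days 35 to 42, like Aug 10 to Aug 17 above), and prints the result at each point. The helper name and the data are made up. The window excludes its start and includes its end, as a simplifying convention.

```python
HOUR = 3600
DAY = 24 * HOUR
WINDOW = 30 * DAY  # the [30d] in max_over_time(up[30d])

# Hypothetical scrape history for one server: scraped hourly from day 0,
# healthy (up=1) until day 10, failing (up=0) from then on.
OUTAGE_START = 10 * DAY
samples = [(t, 1 if t < OUTAGE_START else 0) for t in range(0, 43 * DAY, HOUR)]

def down_for_whole_window(points, eval_time, window):
    """Evaluate max_over_time(up[window]) == 0 at a single instant: True if
    every sample in (eval_time - window, eval_time] is 0. PromQL would return
    nothing for an empty window; this sketch reports that case as False."""
    in_window = [v for t, v in points if eval_time - window < t <= eval_time]
    return bool(in_window) and max(in_window) == 0

# A graph is the same expression evaluated repeatedly, here once per day.
for day in range(35, 43):
    print(f"day {day}: {down_for_whole_window(samples, day * DAY, WINDOW)}")
```

Only days 40 to 42 come out true. Before day 40, the 30-day window still reaches back into the period when the server was up.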
>
> In fact, for your purposes (asking, has the server been down for the *last 
> 30 days*?) you don't need to draw a graph at all!  In which case, if you 
> turn on the "Instant" switch in Grafana it will only ask Prometheus to 
> evaluate the expression for the current instant, which makes the query much 
> faster and cheaper.
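The difference roughly corresponds to two endpoints of the Prometheus HTTP API. `/api/v1/query` evaluates an expression at one instant (`time`). `/api/v1/query_range` evaluates it at every `step` between `start` and `end`, which is what a graph needs. The Python sketch below only builds the request URLs and does not contact a server. The Prometheus address and timestamps are hypothetical. It also counts how many evaluations a one-week graph with a 60-second step implies. Each of those evaluations scans 30 days of samples per series.

```python
from urllib.parse import urlencode

PROMETHEUS = "http://localhost:9090"  # hypothetical server address
EXPR = "max_over_time(up[30d]) == 0"

def instant_query_url(expr, eval_time):
    """Evaluate `expr` once, at `eval_time` (Unix seconds)."""
    return f"{PROMETHEUS}/api/v1/query?" + urlencode({"query": expr, "time": eval_time})

def range_query_url(expr, start, end, step):
    """Evaluate `expr` at start, start+step, ... up to end."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{PROMETHEUS}/api/v1/query_range?" + urlencode(params)

def evaluation_count(start, end, step):
    """Number of instants a range query evaluates the expression at."""
    return (end - start) // step + 1

now = 1_660_694_400  # hypothetical evaluation time (Unix seconds)
week = 7 * 24 * 3600
print(instant_query_url(EXPR, now))
print(range_query_url(EXPR, now - week, now, 60))
print(evaluation_count(now - week, now, 60))  # one-week graph, 60 s step
```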
>
> This is then an ideal query to use in a dashboard, where you just want to 
> show a list of servers that have been down for the last 30 days.  You don't 
> care, for example, whether 2 days ago they had been down for the 30 days
> before that point, do you?  Yet that is basically what a graph of that
> expression tells you: at each point in time, whether the server was down
> for the previous 30 days.
>
> On Wednesday, 17 August 2022 at 14:09:42 UTC+1 chembakay...@gmail.com 
> wrote:
>
>> [image: up.PNG]
>> This is the query I am using, and the graph above covers 30 days. The
>> server has been down for the last day. I want the servers that have been
>> down for the whole 30 days.
>> On Wednesday, 17 August 2022 at 12:55:48 UTC+5:30 Brian Candler wrote:
>>
>>> Extraordinary claims require extraordinary evidence.
>>>
>>> I don't believe there's a bug in prometheus: I believe there's a bug in 
>>> how you are using it.  But unless you show the data, there's no way to 
>>> demonstrate this.
>>>
>>> On Wednesday, 17 August 2022 at 04:36:43 UTC+1 chembakay...@gmail.com 
>>> wrote:
>>>
>>>>
>>>> Yeah. I only want servers that have been down for the whole two days: the
>>>> value should be zero (0) throughout the last 'X' days.
>>>>
>>>> But max_over_time is giving me servers that were down for even one minute
>>>> in the last 'X' days.
>>>>
>>>> Thanks & regards,
>>>> Bharath kumar.
>>>> On Tuesday, 16 August 2022 at 20:27:30 UTC+5:30 Stuart Clark wrote:
>>>>
>>>>> On 2022-08-16 15:08, BHARATH KUMAR wrote: 
>>>>> > hello, 
>>>>> > 
>>>>> > max_over_time(up[2d]) == 0 is showing me servers that went down for
>>>>> > even 1 minute in the last two days, which I don't want. I only want
>>>>> > servers that have been completely unreachable for the whole of the
>>>>> > last "X" days.
>>>>> > 
>>>>>
>>>>> So you only want it if every single scrape failed over the past 2
>>>>> days?
>>>>>
>>>>> Try sum() instead of max_over_time(). 
>>>>>
>>>>> -- 
>>>>> Stuart Clark 
>>>>>
>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/73553053-b710-4864-9718-5d77b1e0af4an%40googlegroups.com.
