Re: [prometheus-users] Re: up query

2022-08-27 Thread Brian Candler
On Saturday, 27 August 2022 at 13:33:33 UTC+1 Brian Candler wrote:

> You want to know how long have they been down? Do the same as you did with 
> node_boot_time_seconds:
>
> time() - max_over_time((time() * up)[24h:]) unless up == 1
>

On reflection, I think the following is a slightly better version:

time() - max_over_time((time() * up)[24h:]) and up == 0

The only difference is that if you remove a target from the scrape job, this 
version also suppresses its value, whereas the "unless" version keeps 
reporting it until its last samples fall out of the 24h window.  That is: 
you'll only be told that the machine has been down for N seconds if the 
machine is still being scraped *and* is still down.
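
To make the difference between the two versions concrete, here is a rough Python model of how "unless" and "and" match on labels. It is only a sketch: the instance names, the downtime figures and the helper functions are invented for illustration and are not part of Prometheus.

```python
# Toy model of PromQL set operators on instant vectors, keyed by instance label.
# All instance names and values below are hypothetical.

def unless(left, right):
    """Keep left-hand elements whose labels have no match on the right."""
    return {k: v for k, v in left.items() if k not in right}

def and_(left, right):
    """Keep left-hand elements whose labels also appear on the right."""
    return {k: v for k, v in left.items() if k in right}

# Seconds since each instance was last seen up (what the 24h subquery yields).
# "gone" still has old samples inside the window, so it still has a value.
downtime = {"foo": 120.0, "bar": 0.0, "gone": 7200.0}

# Current value of `up`. "gone" was removed from the scrape job: no sample now.
up_now = {"foo": 0, "bar": 1}

up_eq_1 = {k: v for k, v in up_now.items() if v == 1}
up_eq_0 = {k: v for k, v in up_now.items() if v == 0}

with_unless = unless(downtime, up_eq_1)  # keeps "foo" and the removed "gone"
with_and = and_(downtime, up_eq_0)       # keeps only "foo"

print(with_unless)
print(with_and)
```

The removed target survives "unless up == 1" because it simply has no current "up" sample to match against, while "and up == 0" needs a current sample with value 0.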

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/8d561a39-393b-4c36-8866-8fc0ee066408n%40googlegroups.com.


Re: [prometheus-users] Re: up query

2022-08-27 Thread Brian Candler
That's a different thing.

node_boot_time_seconds is a metric that says when the host itself thinks it 
booted. That is not necessarily the same as whether the host has been "up" 
or "down" from the point of view of Prometheus, which classes "up" as a 
successful scrape.  For example, the host could have been running fine, but 
the network was down: you'll get up == 0 during the network outage, but 
node_boot_time_seconds will not have changed.

Question: are you generating alerts when these machines go down?  If you 
are, then the answer is easy: there's a metric ALERTS_FOR_STATE where the 
value is the time that the alert started. See:
https://jaanhio.me/blog/visualizing-alerts-metrics-grafana/

(You could always add alerting rules which send out no alerts: add a label 
that identifies them as a silent alert, and match this label in your 
alertmanager routing rules to route them to an empty receiver)

Otherwise, assuming the node is currently down (i.e. up == 0), I think you 
are looking for either:
* the last time at which up == 1
* the last time at which up changed from 1 to 0

However, getting this answer directly through a prometheus query is not 
easy. You can graph the transitions from "up" to "down":

up == 0 and up offset 5m == 1

But you want the timestamp of the last transition. There is a function 
last_over_time(...) which gets you the last available value, but 
timestamp(last_over_time(...)) doesn't tell you its timestamp.

To the best of my knowledge, you need a trick like:

timestamp(up) and up==1

or more simply, since we know up=0 or 1 only:

time() * up

Then you can sweep this over a range and pick the maximum value, which must 
be the most recent, since time increases monotonically (and it will give 
zero if the machine has been down over the whole period):

max_over_time((time() * up)[24h:])
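
If the trick is not obvious, it can be modelled in a few lines of Python. This is a simplified sketch, not how Prometheus evaluates the query internally; the sample data and the function name are made up.

```python
def last_up_time(samples):
    """samples: list of (eval_time_seconds, up_value) pairs for one series.

    Mirrors max_over_time((time() * up)[range:]): multiply each evaluation
    time by up (0 or 1) and take the maximum of the products.
    """
    return max(t * up for t, up in samples)

# Hypothetical series evaluated every 60s: up until t=180, down afterwards.
samples = [(60, 1), (120, 1), (180, 1), (240, 0), (300, 0)]

last_seen = last_up_time(samples)  # time of the most recent up == 1 sample
now = 300
down_for = now - last_seen         # seconds since it was last seen up

# A series that was down for the whole window yields zero.
always_down = [(60, 0), (120, 0), (180, 0)]
never_up = last_up_time(always_down)

print(last_seen, down_for, never_up)
```

Because the products are either 0 or the evaluation time itself, the maximum is always the latest instant at which up was 1.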

Note: This is a fairly expensive query, so make sure you only evaluate it 
at a single instant.  If you're doing this in Prometheus web interface 
select "Table", not "Graph".  If you're doing this in Grafana, turn on the 
"Instant" switch.

Want to limit the result to just machines which are down *now*?

max_over_time((time() * up)[24h:]) unless up == 1

You want to know how long they have been down? Do the same as you did with 
node_boot_time_seconds:

time() - max_over_time((time() * up)[24h:]) unless up == 1

This query gets more expensive as you increase the time range covered. If 
you're not too worried about full accuracy, e.g. the approximate number of 
hours that the machine has been down is OK, then you can use a larger 
evaluation step in the subquery:

time() - max_over_time((time() * up)[30d:1h]) unless up == 1
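
As a rough illustration of the cost trade-off, the number of inner evaluations per series is approximately the range divided by the step. The sketch below assumes the default subquery step is a 1 minute global evaluation interval; adjust it if your configuration differs. The helper name is made up.

```python
def subquery_points(range_seconds, step_seconds):
    """Approximate number of inner evaluations per series for [range:step]."""
    return range_seconds // step_seconds

DAY = 24 * 3600

# Assumption: an omitted step ("[24h:]") means a 1 minute evaluation interval.
day_at_1m = subquery_points(DAY, 60)
month_at_1m = subquery_points(30 * DAY, 60)
month_at_1h = subquery_points(30 * DAY, 3600)

print(day_at_1m, month_at_1m, month_at_1h)
```

The price of the larger step is precision: with a 1h step the "last seen up" time is only known to within about an hour.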

Hopefully, this has given some ideas about how flexible and powerful PromQL 
is.  Here are some links about PromQL I've bookmarked over time, in case 
they are useful (I haven't tested they all still work):


Re: [prometheus-users] Re: up query

2022-08-25 Thread BHARATH KUMAR
Thanks, Brian. It really helped me. 

I want to find the Downtime of the instance in a similar way to how we will 
find the up time of the instance.

Up time : time() - node_boot_time_seconds{instance=~"$instance"}

Is there any metric in node exporter so that we can find the downtime of 
the instance?

On Wednesday, 24 August 2022 at 16:57:32 UTC+5:30 Brian Candler wrote:

> On Wednesday, 24 August 2022 at 11:43:15 UTC+1 chembakay...@gmail.com 
> wrote:
>
>> (max_over_time(up[60s]) == bool 0) * ((up offset 61s == bool 1) * 
>> count(up[60s]) OR vector(1)) ---> query
>>
>> But the above query threw me an error as below:
>>
>> bad_data: 1:73: parse error: expected type instant vector in aggregation 
>> expression, got range vector
>>
> That expression is junk, and you didn't say where you got it from apart 
> from "some blog".
>
>> What am I missing here? How can I achieve this solution like "find the 
>> instances that have been completely in down state for last X days"
>>
>
> Can you explain why the answer I gave before is not usable for you?  I 
> have already told you that:
>
> max_over_time(up[30d]) == 0
>
> will give you a list of all instances which have been down continuously for 
> the last 30 days, and that seems to be what you keep asking for.  I have 
> tested it, it works:
>
> [image: img1.png]
> That is a table of machines which have been down for 30 days continuously.
>
> Note that this is a query that you should run at a single instant (the 
> current time), not one that you make a graph from.  In Grafana, turn the 
> "instant" toggle on to get this behaviour.
>
> [image: img2.png]
>
> You'll just get a set of single data points, which is a list of all the 
> machines that have been down continuously from (now - 30 days) to (now).
>
> You probably want to change the visualisation to a table, or some other 
> panel type. Graph isn't what you want here, since the result only has data 
> for a single point in time.  That is: those machines, which *at the current 
> time* have been down for 30 days before *the current time*.  The reference 
> point is the current time only; you don't want to sweep this query over 
> previous times.
>
>



Re: [prometheus-users] Re: up query

2022-08-24 Thread BHARATH KUMAR
I found a blog post via Google that says:

If you want to count the time spent in down state, this becomes more 
complicated because you have to detect the switch from 1 to 0 which count 
for 1min and the subsequent down state until the first switch back from 0 
to 1.

It could be something along the lines of:

(max_over_time(up[60s]) == bool 0) * ((up offset 61s == bool 1) * 
count(up[60s]) OR vector(1)) ---> query

But the above query threw me an error as below:

bad_data: 1:73: parse error: expected type instant vector in aggregation 
expression, got range vector


What am I missing here? How can I achieve this solution like "find the 
instances that have been completely in down state for last X days"?

Thanks & regards,

Bharath Kumar.


Re: [prometheus-users] Re: up query

2022-08-17 Thread Brian Candler
Incidentally, there is another way to slice this, which may or may not be 
helpful.

If you tell Grafana to query Prometheus for the simple query "up", you can 
then get Grafana itself to calculate the average, or minimum, or maximum 
over the time range it has queried:
[image: img1.png]
This can be useful in stat panels, where the stat panel dynamically changes 
for the time period you have selected in Grafana (e.g. if you select a 
particular 6 hour window, you want to show the average over those 6 
hours).  It defaults to "Last", i.e. the most recent value.

But we are now moving into the realm of Grafana, and this is a mailing list 
for Prometheus.  Grafana has its own community discussion forum, so 
questions about Grafana are best asked there.


Re: [prometheus-users] Re: up query

2022-08-17 Thread Brian Candler
If you want servers that have been down for 30 days, then I thought it 
should be obvious you need max_over_time(up[30d]) == 0  ... but perhaps it 
isn't as obvious as I thought.

Let me break that query down into parts:

up[30d]   :   returns a *range vector* containing all data points for the 
timeseries with metric name "up" from T - 30 days to T (where T is the 
evaluation time, i.e. the point on the X axis)

By "timeseries" I mean a distinct combination of metric name and labels, e.g.
up{instance="foo"}
up{instance="bar"}
are two different timeseries.  They happen to share the same metric name 
("up") but they are recording an independent sequence of measurements.

Think of the range vector as a two-dimensional grid: there are N different 
timeseries, each with M data points over that period. The data collected 
and stored in the TSDB might look like this:

up{instance="foo"}  v1 . . . v2 . . . v3 . . .
up{instance="bar"}  . . v4 . . . v5 . . . v6 .
-> time

Then:
max_over_time(...)  :  for each timeseries in the range vector, picks the 
highest value.  This returns an *instant vector*, i.e. a single value for 
every timeseries, which is the maximum of each.

up{instance="foo"}  v3
up{instance="bar"}  v5

Each of those values is the maximum value of the timeseries, over the 30 
day period.
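
For anyone who finds code easier to follow than prose, here is a small Python model of that step. It is a sketch of the idea only; the data and the function are invented and this is not Prometheus code.

```python
def max_over_time(range_vector):
    """range_vector: {series: [(timestamp, value), ...]} -> {series: max value}.

    Collapses each series in the grid to a single value, i.e. turns a
    range vector into an instant vector.
    """
    return {
        series: max(value for _, value in points)
        for series, points in range_vector.items()
        if points
    }

# Hypothetical range vector: foo had one successful scrape, bar had none.
range_vector = {
    'up{instance="foo"}': [(0, 0), (60, 1), (120, 0)],
    'up{instance="bar"}': [(30, 0), (90, 0), (150, 0)],
}

instant_vector = max_over_time(range_vector)

# The "== 0" filter keeps only series whose maximum was 0 over the whole window.
down_whole_window = {s: v for s, v in instant_vector.items() if v == 0}

print(instant_vector)
print(down_whole_window)
```

A single successful scrape anywhere in the window is enough to lift the maximum to 1 and drop the series from the filtered result.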

Now, you've chosen to draw a graph of this expression, but it's important 
to realise that the graph itself doesn't need to be over 30 days.  When you 
draw a graph of an expression, it will sweep across the evaluation time, 
evaluating the expression repeatedly at different instants in time over the 
given period.

Let's say, for example, you set the graph range to be 1 week, but you are 
graphing max_over_time(up[30d]) == 0

What will you get?  This will be a series of points.  Let's imagine the 
graph only had one point per day. Considering the position of each point on 
the time axis:
Aug 17: shows if the server has been down from (Aug 17 - 30 days) to (Aug 17)
Aug 16: shows if the server has been down from (Aug 16 - 30 days) to (Aug 16)
...
Aug 10: shows if the server has been down from (Aug 10 - 30 days) to (Aug 10)

In fact, for your purposes (asking, has the server been down for the *last 
30 days*?) you don't need to draw a graph at all!  In which case, if you 
turn on the "Instant" switch in Grafana it will only ask Prometheus to 
evaluate the expression for the current instant, which makes the query much 
faster and cheaper.

This is then an ideal query to use in a dashboard, where you just want to 
show a list of servers that have been down for the last 30 days.  You don't 
care, for example, if 2 days ago they were down for the 30 days before that 
point, do you?  Because that's basically what a graph of that expression 
will tell you: at each point in time, whether it was down for the previous 
30 days.
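
The difference between graphing the expression and evaluating it once can be sketched in Python as follows. This is a simplified model with one made-up sample per day; the function name and the window boundary handling are assumptions, not Prometheus internals.

```python
def max_over_window(samples, t, window):
    """Max of samples with timestamp in (t - window, t]; None if none fall in it."""
    values = [v for ts, v in samples if t - window < ts <= t]
    return max(values) if values else None

# Hypothetical daily samples (day number, up): up through day 5, then down.
samples = [(day, 1 if day <= 5 else 0) for day in range(1, 41)]
WINDOW = 30  # days

# "Graph" mode: the same expression is re-evaluated at many instants.
graph = {t: max_over_window(samples, t, WINDOW) == 0 for t in (33, 34, 35, 36, 40)}

# "Instant" mode: one evaluation, at the current time only.
instant = max_over_window(samples, 40, WINDOW) == 0

print(graph)
print(instant)
```

Each graphed point answers the question for its own trailing 30 days, which is why the instant evaluation is all a "down for the last 30 days" list needs.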




Re: [prometheus-users] Re: up query

2022-08-17 Thread BHARATH KUMAR
[image: up.PNG]
this is the query I am using and the above graph is for 30 days and it is 
down from the last day. I want the servers that are down for the whole 30 
days



Re: [prometheus-users] Re: up query

2022-08-17 Thread Brian Candler
Extraordinary claims require extraordinary evidence.

I don't believe there's a bug in prometheus: I believe there's a bug in how 
you are using it.  But unless you show the data, there's no way to 
demonstrate this.




Re: [prometheus-users] Re: up query

2022-08-16 Thread BHARATH KUMAR

Yeah. I only want the servers that have been down for the whole two days. 
The value should always be zero (0) throughout the last 'X' days.

But max_over_time is giving me the info if the servers are down for even 
one minute from the last 'X' days.

Thanks & regards,
Bharath kumar.
On Tuesday, 16 August 2022 at 20:27:30 UTC+5:30 Stuart Clark wrote:

> On 2022-08-16 15:08, BHARATH KUMAR wrote:
> > hello,
> > 
> > max_over_time(up[2d]) == 0 is giving me the info like ...for the last
> > two days if the server goes down for 1 minute also it was displaying
> > in the graph which I don't want. I want the information that for the
> > last "X" days it should be completely in an unreachable state.
> > 
>
> So you are only wanting it if every single scrape failed over the past 2 
> days?
>
> Try sum() instead of max_over_time().
>
> -- 
> Stuart Clark
>



[prometheus-users] Re: up query

2022-08-16 Thread Brian Candler
> Try sum() instead of max_over_time().

Do you mean sum_over_time() ?  But it will amount to the same thing, surely?
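
A quick way to convince yourself, sketched in Python with made-up sample sequences: for a series that only ever takes the values 0 and 1, the sum over the window is zero exactly when the maximum is zero.

```python
def max_over_time(values):
    return max(values)

def sum_over_time(values):
    return sum(values)

# Hypothetical "up" samples for three instances over the same window.
cases = {
    "always_down": [0, 0, 0, 0],
    "brief_outage": [1, 1, 0, 1],
    "always_up": [1, 1, 1, 1],
}

flagged_by_max = [name for name, v in cases.items() if max_over_time(v) == 0]
flagged_by_sum = [name for name, v in cases.items() if sum_over_time(v) == 0]

print(flagged_by_max)
print(flagged_by_sum)
```

Both filters pick out the same instances, so switching function would not change which servers are listed.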




[prometheus-users] Re: up query

2022-08-16 Thread Brian Candler
What you are saying doesn't make sense, so you need to provide some 
evidence: actual queries, actual data.  Pick a specific instance, which 
I'll say is "foo".

up{instance="foo"}                       # show graph over 1 week
up{instance="foo"}[2d]                   # show console (range vectors can't be graphed)
max_over_time(up{instance="foo"}[2d])    # show graph over 1 week

I assert that if up{instance="foo"} is a mixture of 0s and 1s for a given 
48 hour period, then the value of max_over_time(up{instance="foo"}[2d]) at the 
end of that 48 hour period will be 1.

On Tuesday, 16 August 2022 at 15:08:15 UTC+1 chembakay...@gmail.com wrote:

> hello,
>
> max_over_time(up[2d]) == 0 is giving me the info like ...for the last two 
> days if the server goes down for 1 minute also it was displaying in the 
> graph which I don't want. I want the information that for the last "X" days 
> it should be completely in an unreachable state.
>
> Thanks & regards,
> Bharath Kumar.
>
> On Tuesday, 16 August 2022 at 12:47:03 UTC+5:30 Brian Candler wrote:
>
>> If the metric is 0, 1, 0, 1, 1, 0 ...  then max_over_time will be 1, if 
>> the time period in question covers those values.
>> If the metric is 0, 0, 0, 0, 0, 0 ... then max_over_time will be 0.
>>
>> If you enter an expression like
>>
>> max_over_time(up{instance=~"some_instance_name"}[2d])
>>
>> and *draw a graph of it*, then you need to understand what that graph 
>> represents.  On the X axis is time; this is the time the expression was 
>> evaluated at.  The expression itself looks at the 2 days of data *up to and 
>> including that time*: that is, the range vector up[2d] reads all data in 
>> the database between T and T-2d.
>>
>> For example, if there's a point on the graph where the X axis is 15 Aug 
>> 12:00, and the Y axis is 1, it means that the max_over_time between 13 Aug 
>> 12:00 and 15 Aug 12:00 was 1.  This in turn implies that there was at least 
>> one 1 value in that 2d period.  It will only show 0 if *all* the values in 
>> that period were 0.
>>
>> If that doesn't do what you want, then you'll have to describe exactly 
>> what you see more clearly, with actual concrete queries and responses, and 
>> explain why it is different to what you expect.  Otherwise, only you can 
>> see the data in front of you, so it's up to you to understand why your 
>> query isn't doing what you expect.
>>
>> > But I want only unreachable state servers over a period of time?
>>
>> That will be those where max_over_time(...) is zero, and you can filter 
>> down to just those servers with an expression like this:
>>
>> max_over_time(up[2d]) == 0
>>
>> If you graph this expression, then all the data points will be zeros, but 
>> the points will appear and disappear over time.  They will be present at 
>> time T only if all the values in the period T-2d to T were 0.  If that's 
>> not the case, then the point will not be displayed.
>>
>



Re: [prometheus-users] Re: up query

2022-08-16 Thread Stuart Clark

On 2022-08-16 15:08, BHARATH KUMAR wrote:

hello,

max_over_time(up[2d]) == 0 is showing a server in the graph even if it
went down for only 1 minute during the last two days, which I don't
want. I want only the servers that have been completely unreachable for
the last "X" days.



So you are only wanting it if every single scrape failed over the past 2 
days?


Try sum_over_time() instead of max_over_time().

--
Stuart Clark



[prometheus-users] Re: up query

2022-08-16 Thread BHARATH KUMAR
hello,

max_over_time(up[2d]) == 0 is showing a server in the graph even if it went 
down for only 1 minute during the last two days, which I don't want. I want 
only the servers that have been completely unreachable for the last "X" 
days.

Thanks & regards,
Bharath Kumar.

On Tuesday, 16 August 2022 at 12:47:03 UTC+5:30 Brian Candler wrote:

> If the metric is 0, 1, 0, 1, 1, 0 ...  then max_over_time will be 1, if 
> the time period in question covers those values.
> If the metric is 0, 0, 0, 0, 0, 0 ... then max_over_time will be 0.
>
> If you enter an expression like
>
> max_over_time(up{instance=~"some_instance_name"}[2d])
>
> and *draw a graph of it*, then you need to understand what that graph 
> represents.  On the X axis is time; this is the time the expression was 
> evaluated at.  The expression itself looks at the 2 days of data *up to and 
> including that time*: that is, the range vector up[2d] reads all data in 
> the database between T and T-2d.
>
> For example, if there's a point on the graph where the X axis is 15 Aug 
> 12:00, and the Y axis is 1, it means that the max_over_time between 13 Aug 
> 12:00 and 15 Aug 12:00 was 1.  This in turn implies that there was at least 
> one 1 value in that 2d period.  It will only show 0 if *all* the values in 
> that period were 0.
>
> If that doesn't do what you want, then you'll have to describe exactly 
> what you see more clearly, with actual concrete queries and responses, and 
> explain why it is different to what you expect.  Otherwise, only you can 
> see the data in front of you, so it's up to you to understand why your 
> query isn't doing what you expect.
>
> > But I want only unreachable state servers over a period of time?
>
> That will be those where max_over_time(...) is zero, and you can filter 
> down to just those servers with an expression like this:
>
> max_over_time(up[2d]) == 0
>
> If you graph this expression, then all the data points will be zeros, but 
> the points will appear and disappear over time.  They will be present at 
> time T only if all the values in the period T-2d to T were 0.  If that's 
> not the case, then the point will not be displayed.
>



[prometheus-users] Re: up query

2022-08-16 Thread Brian Candler
If the metric is 0, 1, 0, 1, 1, 0 ...  then max_over_time will be 1, if the 
time period in question covers those values.
If the metric is 0, 0, 0, 0, 0, 0 ... then max_over_time will be 0.

If you enter an expression like

max_over_time(up{instance=~"some_instance_name"}[2d])

and *draw a graph of it*, then you need to understand what that graph 
represents.  On the X axis is time; this is the time the expression was 
evaluated at.  The expression itself looks at the 2 days of data *up to and 
including that time*: that is, the range vector up[2d] reads all data in 
the database between T and T-2d.

For example, if there's a point on the graph where the X axis is 15 Aug 
12:00, and the Y axis is 1, it means that the max_over_time between 13 Aug 
12:00 and 15 Aug 12:00 was 1.  This in turn implies that there was at least 
one 1 value in that 2d period.  It will only show 0 if *all* the values in 
that period were 0.

If that doesn't do what you want, then you'll have to describe exactly what 
you see more clearly, with actual concrete queries and responses, and 
explain why it is different to what you expect.  Otherwise, only you can 
see the data in front of you, so it's up to you to understand why your 
query isn't doing what you expect.

> But I want only unreachable state servers over a period of time?

That will be those where max_over_time(...) is zero, and you can filter 
down to just those servers with an expression like this:

max_over_time(up[2d]) == 0

If you graph this expression, then all the data points will be zeros, but 
the points will appear and disappear over time.  They will be present at 
time T only if all the values in the period T-2d to T were 0.  If that's 
not the case, then the point will not be displayed.
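
The "appear and disappear" behaviour can be sketched in a few lines of 
Python.  This is only an emulation of the graph, not Prometheus code: the 
function names, the hourly scrape interval and the sample data (a server 
that is up for the first day and then goes down) are assumptions chosen for 
illustration.  Times are in hours.

```python
WINDOW = 48  # hours, standing in for [2d]


def max_over_time(samples, eval_time, window):
    """Largest value among samples with eval_time - window <= ts <= eval_time."""
    in_window = [v for ts, v in samples if eval_time - window <= ts <= eval_time]
    return max(in_window) if in_window else None


def graph_points(samples, eval_times, window):
    """Emulate graphing `max_over_time(up[2d]) == 0`.

    A point is kept at evaluation time T only when the result for the
    window ending at T is 0; otherwise nothing is drawn at T.
    """
    points = []
    for t in eval_times:
        if max_over_time(samples, t, window) == 0:
            points.append((t, 0))
    return points


# Hypothetical hourly scrapes: up for hours 0..23, down from hour 24 onwards.
samples = [(h, 1 if h < 24 else 0) for h in range(144)]

points = graph_points(samples, [24, 48, 72, 96, 120], WINDOW)
print(points)
```

No point is drawn at hours 24 and 48, because those windows still contain 
successful scrapes from the first day.  The first point appears at hour 72, 
once the whole 48 hour window contains only zeros.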



[prometheus-users] Re: up query

2022-08-15 Thread BHARATH KUMAR
Hello sir,

When I look at the inner query, it shows a mixture of both 1's and 0's.

max_over_time(up{instance=~"instance"}[2d])

> It should show 1 for any instant where the server was up at any time over 
> the previous 48 hours.  Does it not?

It is not showing a value equal to 1. For the last 48 hours, it shows when 
the server was reachable and when it was not.

But I want only unreachable state servers over a period of time?

How can we achieve that?

Thanks & regards,
Bharath Kumar

On Tuesday, 9 August 2022 at 20:45:56 UTC+5:30 Brian Candler wrote:

> Use the PromQL query browser (in the Prometheus web interface) to debug 
> it.  I suggest you first need to look at the inner query:
>
> up{instance=~"instance"}
>
> and graph it, setting the "instance" regexp to match one or more instances 
> of interest. What does it look like? Is it a mixture of 0's and 1's, or all 
> 0's, or all 1's, or is it absent entirely?  If it's absent entirely, then 
> that's a different problem you need to investigate - your scrape job is 
> completely broken.
>
> If it's a mixture of 0's and 1's, then try this query:
>
> max_over_time(up{instance=~"instance"}[2d])
>
> It should show 1 for any instant where the server was up at any time over 
> the previous 48 hours.  Does it not?
>
> If it's all 0's for at least 48 hours, then
>
> max_over_time(up{instance=~"instance"}[2d])
>
> should show 0.
>
> Once you've understood why your query wasn't working as you were 
> expecting, then for partially reachable you can try a query like this:
>
> avg_over_time(up{instance=~"instance"}[2d]) > 0 < 0.9
>
> (setting thresholds as appropriate)
>
> On Tuesday, 9 August 2022 at 13:23:33 UTC+1 chembakay...@gmail.com wrote:
>
>> Hi all,
>>
>> *First Query :*
>> I want to find the servers which have not been reachable at all for the 
>> last X days. I tried the following query, but it didn't work out.
>>
>> Query :  max_over_time(up{instance=~"instance"}[Xd]) == 0
>>
>> The above query returns servers that were unreachable for as little as 
>> 1 minute. But I want only the servers that have been unreachable for the 
>> whole of the last X days.
>>
>> *Second Query :*
>> I want to find the servers which were partially reachable over the last X 
>> days, excluding those that were totally unreachable for the whole X days.
>>
>> Any leads?
>>
>> Thanks & regards,
>> Bharath Kumar.
>>
>>



[prometheus-users] Re: up query

2022-08-09 Thread Brian Candler
Use the PromQL query browser (in the Prometheus web interface) to debug 
it.  I suggest you first need to look at the inner query:

up{instance=~"instance"}

and graph it, setting the "instance" regexp to match one or more instances 
of interest. What does it look like? Is it a mixture of 0's and 1's, or all 
0's, or all 1's, or is it absent entirely?  If it's absent entirely, then 
that's a different problem you need to investigate - your scrape job is 
completely broken.

If it's a mixture of 0's and 1's, then try this query:

max_over_time(up{instance=~"instance"}[2d])

It should show 1 for any instant where the server was up at any time over 
the previous 48 hours.  Does it not?

If it's all 0's for at least 48 hours, then

max_over_time(up{instance=~"instance"}[2d])

should show 0.

Once you've understood why your query wasn't working as you were expecting, 
then for partially reachable you can try a query like this:

avg_over_time(up{instance=~"instance"}[2d]) > 0 < 0.9

(setting thresholds as appropriate)
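
As a rough illustration of what the two chained comparisons select, here is 
a small Python sketch.  The instance names and sample values are made up, 
and the helper functions are hypothetical stand-ins rather than anything 
Prometheus provides; each list holds the 0/1 results of the scrapes that 
fall inside the range.

```python
def avg_over_time(values):
    """Mean of the samples in the window, or None if there are none."""
    return sum(values) / len(values) if values else None


def partially_reachable(series, low=0.0, high=0.9):
    """Keep instances whose average `up` is strictly between low and high.

    This mirrors the `> 0 < 0.9` filter: always-down (average 0) and
    mostly-up (average >= 0.9) instances are both dropped.
    """
    selected = {}
    for instance, values in series.items():
        avg = avg_over_time(values)
        if avg is not None and low < avg < high:
            selected[instance] = avg
    return selected


# Hypothetical scrape results per instance over the window.
series = {
    "always-up": [1] * 10,
    "always-down": [0] * 10,
    "flaky": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

selected = partially_reachable(series)
print(selected)  # {'flaky': 0.5}
```

The lower bound excludes servers that were down for the whole period, and 
the upper bound is the threshold you would tune to decide how much downtime 
counts as "partially reachable".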

On Tuesday, 9 August 2022 at 13:23:33 UTC+1 chembakay...@gmail.com wrote:

> Hi all,
>
> *First Query :*
> I want to find the servers which have not been reachable at all for the 
> last X days. I tried the following query, but it didn't work out.
>
> Query :  max_over_time(up{instance=~"instance"}[Xd]) == 0
>
> The above query returns servers that were unreachable for as little as 
> 1 minute. But I want only the servers that have been unreachable for the 
> whole of the last X days.
>
> *Second Query :*
> I want to find the servers which were partially reachable over the last X 
> days, excluding those that were totally unreachable for the whole X days.
>
> Any leads?
>
> Thanks & regards,
> Bharath Kumar.
>
>
