[prometheus-users] Re: Easy way to parse out rule expression to extract value (remove comparator)

2022-08-12 Thread Brian Candler
On Thursday, 3 June 2021 at 00:04:30 UTC+1 cal...@callumj.com wrote:

> With alerting rules the Prometheus API (and templating language) has 
> access to the "Value" of an alert which appears to be the first metric 
> specified without the comparator.
>

That statement isn't very accurate.

Firstly, PromQL works with values which are a *vector* of zero or more 
values.  For example, the expression

foo

might give the following vector of values (at the current point in time, or 
the specific time which you are querying for):

[
foo{bar="a",baz="x"} 123
foo{bar="a",baz="y"} 456
foo{bar="b",baz="z"} 789
]

An expression like foo{bar="a"} is a filter: it will limit the vector to 
just those values with label bar="a"

[
foo{bar="a",baz="x"} 123
foo{bar="a",baz="y"} 456
]

Similarly, an expression like foo > 200 is a filter: it will limit the 
vector to just those with value over 200, which in this case would be

[
foo{bar="a",baz="y"} 456
foo{bar="b",baz="z"} 789
]

And the expression "foo > 1000" will give an empty vector.

An alerting rule is just an expression like this, and the alert triggers 
whenever the vector is *not empty* - and generates multiple alerts if the 
vector has multiple values.  For each alert, the "Value" is simply the 
value of the element of that vector.  Prometheus is not working out "the 
first metric specified without the comparator" or anything like that.  
*Any* value generates an alert.
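
To make that concrete, here is a minimal alerting rule sketch (the rule name, 
metric and threshold are invented for illustration) showing where that 
per-element value surfaces as {{ $value }} in the templating language:

groups:
- name: example
  rules:
  - alert: FooTooHigh
    expr: foo > 200
    for: 5m
    annotations:
      summary: 'foo on {{ $labels.bar }} is {{ $value }}'

With the sample vector above, this fires two alerts: one with Value 456 
(labels bar="a",baz="y") and one with Value 789 (labels bar="b",baz="z").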

Since "foo > 200" does not return a "boolean" but is a filter which narrows 
down the result set, these filters can be combined, e.g. "foo > 200 < 500".

If you *do* want boolean values (which I find quite rare in practice), then 
there is another form "foo > bool 200", which will give you:

[
foo{bar="a",baz="x"} 0
foo{bar="a",baz="y"} 1
foo{bar="b",baz="z"} 1
]

This is completely useless for an alerting rule, because you will get three 
alerts firing continuously: one alert with value 0 and two alerts with 
value 1.  *Any* value in a vector will generate an alert, even 0.

All the above is considering the case where the LHS is an instant vector 
and the RHS is a scalar.  You can also have expressions where both LHS and 
RHS are vectors: in this case, the values in one vector will be matched to 
those in the other vector which have identical label sets.  For example:

node_filesystem_avail_bytes < node_filesystem_free_bytes
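
As a sketch of how that matching plays out (values invented):

node_filesystem_avail_bytes{instance="n1",mountpoint="/"}  10e9
node_filesystem_free_bytes{instance="n1",mountpoint="/"}   12e9

The element on the left is compared against the element on the right with the 
identical label set, and is kept (with the left-hand value) because 10e9 < 12e9.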

My apologies if you know all this already :-)

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/1eb5f425-2d5d-4b10-80e8-0f3b6a2be822n%40googlegroups.com.


[prometheus-users] Re: Alertmanager slack alerting issues

2022-08-12 Thread Brian Candler
Firstly, I'd drop the "continue: true" lines. They are not required, and 
are just going to cause confusion.

The 'slack' and 'production' receivers are both sending to #prod-channel.  
So you'll hit this if the env is not exactly "dev".  I suggest you look in 
detail at the alerts themselves: maybe they're tagged with "Dev" or "dev " 
(with a hidden space).

If you change the default 'slack' receiver to go to a different channel, or 
use a different title/text template, it will be easier to see if this is 
the problem or not.
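
For example, a sketch of such a change (the '#fallback-channel' name is 
invented; everything else is from your config):

- name: 'slack'
  slack_configs:
  - api_url: 'api url'
    channel: '#fallback-channel'
    send_resolved: true
    title: 'FALLBACK: {{ .CommonLabels.alertname }}'

Anything that lands in that channel did not match either child route, so you 
can inspect its labels to see exactly what "env" value it carried.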


On Friday, 12 August 2022 at 09:36:22 UTC+1 rs wrote:

> Hi everyone! I am configuring alertmanager to send outputs to a prod slack 
> channel and dev slack channel. I have checked with the routing tree editor 
> and everything should be working correctly. 
> However, I am seeing some (not all) alerts that are tagged with 'env: dev' 
> being sent to the prod slack channel. Is there some sort of old 
> configuration caching happening? Is there a way to flush this out?
>
> --- Alertmanager.yml ---
> global:
>   http_config:
>     proxy_url: 'xyz'
> templates:
>   - templates/*.tmpl
> route:
>   group_by: [cluster,alertname]
>   group_wait: 10s
>   group_interval: 30m
>   repeat_interval: 24h
>   receiver: 'slack'
>   routes:
>   - receiver: 'production'
>     match:
>       env: 'prod'
>     continue: true
>   - receiver: 'staging'
>     match:
>       env: 'dev'
>     continue: true
> receivers:
> #Fallback option - Default set to production server
> - name: 'slack'
>   slack_configs:
>   - api_url: 'api url'
>     channel: '#prod-channel'
>     send_resolved: true
>     color: '{{ template "slack.color" . }}'
>     title: '{{ template "slack.title" . }}'
>     text: '{{ template "slack.text" . }}'
>     actions:
>     - type: button
>       text: 'Query :mag:'
>       url: '{{ (index .Alerts 0).GeneratorURL }}'
>     - type: button
>       text: 'Silence :no_bell:'
>       url: '{{ template "__alert_silence_link" . }}'
>     - type: button
>       text: 'Dashboard :grafana:'
>       url: '{{ (index .Alerts 0).Annotations.dashboard }}'
> - name: 'staging'
>   slack_configs:
>   - api_url: 'api url'
>     channel: '#staging-channel'
>     send_resolved: true
>     color: '{{ template "slack.color" . }}'
>     title: '{{ template "slack.title" . }}'
>     text: '{{ template "slack.text" . }}'
>     actions:
>     - type: button
>       text: 'Query :mag:'
>       url: '{{ (index .Alerts 0).GeneratorURL }}'
>     - type: button
>       text: 'Silence :no_bell:'
>       url: '{{ template "__alert_silence_link" . }}'
>     - type: button
>       text: 'Dashboard :grafana:'
>       url: '{{ (index .Alerts 0).Annotations.dashboard }}'
> - name: 'production'
>   slack_configs:
>   - api_url: 'api url'
>     channel: '#prod-channel'
>     send_resolved: true
>     color: '{{ template "slack.color" . }}'
>     title: '{{ template "slack.title" . }}'
>     text: '{{ template "slack.text" . }}'
>     actions:
>     - type: button
>       text: 'Query :mag:'
>       url: '{{ (index .Alerts 0).GeneratorURL }}'
>     - type: button
>       text: 'Silence :no_bell:'
>       url: '{{ template "__alert_silence_link" . }}'
>     - type: button
>       text: 'Dashboard :grafana:'
>       url: '{{ (index .Alerts 0).Annotations.dashboard }}'
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/a6cf306e-2dee-4c07-ba64-ae439ce96182n%40googlegroups.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread Brian Candler
On Friday, 12 August 2022 at 08:50:28 UTC+1 ninag...@gmail.com wrote:

> I'm worried about the blackbox_exporter is expensive or not. 
>
>
You are, or are not, worried about this?
 

> I also have other jobs, other jobs are with global 30s scrape intervals 
> and 30s evaluation intervals, when I set the special job with scrape 
> interval 2m, will it cause any problem?
>
>
Each job defines its own scrape interval.  The global default is used only 
if it doesn't define its own scrape interval.

Similarly, each alerting rule group defines its own evaluation interval.  
The global default is used only if it doesn't define its own evaluation 
interval.
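
For example, a rules file sketch along these lines (group and rule names are 
invented) gives one group its own 1-day interval while the other keeps the 
global default:

groups:
- name: daily-checks
  interval: 1d
  rules:
  - alert: DailyProbeFailed
    expr: probe_success == 0
- name: frequent-checks
  rules:
  - alert: SomethingElseWrong
    expr: some_other_metric > 100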

Scraping and rule evaluation run entirely independently.

 

> "
> If you only want to *check* a value every 1 day for alerting, then it's 
> perfectly fine to have an alerting rule group with a 1 day evaluation 
> interval - even if the underlying metric is collected more frequently. The 
> alerting rule will use the most recent value at evaluation time.
> "
> regarding this point, is it possible to set alerting rule group for a 
> special rule or a job?
>

I don't understand the question.  You can create as many alerting rule 
groups as you like, and each rule group can have as many alerting rules as 
you like. Each alerting expression can access *all* the metrics in the TSDB 
- it is not tied to a specific scrape job.

https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/848984ad-044c-47db-a19a-326c6a536528n%40googlegroups.com.


Re: [prometheus-users] How could I truncate data after pull in Java client.

2022-08-12 Thread Stuart Clark

On 09/08/2022 08:56, Hello Wood wrote:
Hi, I'm using Prometheus to collect Spring Boot service metrics in Java. 
But I found a problem: data still exists after a pull, which makes the 
instant data incorrect.


For example, right now there is one label value like 'app_version{version=a} 
100'; then the metric is updated and a new label value b is added, so the 
metrics become 'app_version{version=a} 50' and 'app_version{version=b} 50'. 
After that, label a is no longer updated, and the metrics become 
'app_version{version=b} 100'.


When I pull metrics from the Spring Boot service, the metrics are 
'app_version{version=a} 50' and 'app_version{version=b} 100', but the 
expected data should be 'app_version{version=b} 100' only.


How could I fix this issue? Thanks.


I think possibly you aren't using labels in the way expected.

Labels are used to "slice & dice" the data, so for example to be able to 
see which specific HTTP response code was returned from a web call, etc.
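
For example (metric name and values are purely illustrative):

http_requests_total{code="200"} 1024
http_requests_total{code="500"}    3

Each label value is a separate time series, and queries can then filter or 
aggregate across them.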


What is the value of the metric app_version supposed to signify?

--
Stuart Clark

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/8ba27ec4-9c10-d6e3-1b8d-4e48b209010f%40Jahingo.com.


Re: [prometheus-users] Prometheus Api to json to use in react.js

2022-08-12 Thread Stuart Clark

On 06/08/2022 22:58, Geet wrote:

Hi ,

What is the best way to collect data from Prometheus metrics in a dev 
environment, convert the metrics to JSON, and use them in react.js for 
making bar charts?



Take a look at the HTTP query API which returns a JSON response for a 
query you send:


https://prometheus.io/docs/prometheus/latest/querying/api/
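
For example (host name invented, response trimmed), a request like

http://prometheus.example:9090/api/v1/query?query=up

returns JSON along these lines:

{"status":"success","data":{"resultType":"vector","result":[
  {"metric":{"__name__":"up","instance":"localhost:9090","job":"prometheus"},
   "value":[1660291200,"1"]}]}}

which is straightforward to fetch from react.js and feed into a charting 
library.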

--
Stuart Clark

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/adb4f598-806d-8563-5625-5aa425e622b7%40Jahingo.com.


[prometheus-users] Re: [prometheus-developers] Return specific value if label regex not match

2022-08-12 Thread Matthias Rampke
Hi, this mailing list is for development of Prometheus and related
projects. Since your question is about usage, I'm moving the thread to the
prometheus-users mailing list.

To answer your question, in general a regular expression can have an
unbounded number of matches, so Prometheus cannot automatically determine
from the matcher alone that name2 should be there.

You can set up recording rules with all the names you expect to be there:

- record: probe_success:expected_name
  expr: 1
  labels:
    name: name1
- record: probe_success:expected_name
  expr: 1
  labels:
    name: name2
- record: probe_success:expected_name
  expr: 1
  labels:
    name: name3

and then use it in your query like

probe_success{name=~"name1|name2|name3"} or -1*probe_success:expected_name

I am using the value 1 for this metric because it is customary to do that
for "metadata metrics" like this – you can multiply it with the desired
value in the query like I did here.

Another thing about your query – you are matching __name__ but that is a
special label representing the metric name. Since your query specifies
probe_success as the metric name, the two are in conflict.

/MR



On Fri, Aug 12, 2022 at 8:35 AM Simon  wrote:

> Hello everyone,
> I have a query: probe_success{__name__=~"name1|name2|name3"}.
> Prometheus does not have label __name__ = name2 and I want it to return -1
> if Prometheus does not have that label value.
> How can I do that?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to prometheus-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-developers/eccbaad3-9bb0-41a0-a626-25403d34a4d9n%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAMV%3D_gYuxKbpx62Ve97tvSm0%2Bb15YJcJaE6mkOxiW45jbZcdaw%40mail.gmail.com.


[prometheus-users] Alertmanager alerts going to wrong channel

2022-08-12 Thread rs
I previously posted my configuration. For whatever reason, I don't see it 
in the forum so I'll reply to this thread with the configurations.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/dd2dc326-41de-4d26-bc7c-1c36a13216b7n%40googlegroups.com.


Re: [prometheus-users] Easy way to parse out rule expression to extract value (remove comparator)

2022-08-12 Thread laixintao
Hi Callum

For what it's worth, I met the same issue, and I wrote a Python lib that 
can split alert rules into multiple expression parts, based on the grammar, 
so it can handle and/or/group_left/on, etc. as well. The split_binary_op 
function takes a string expression and returns the parsed result in JSON 
format.

https://github.com/laixintao/promqlpy/

Also thanks Julius, your information helped a lot. ;D

On Friday, June 4, 2021 at 1:37:29 AM UTC+8 juliu...@promlabs.com wrote:

> Hi Callum,
>
> In general yes, the PromQL Go library lets you parse an expression and 
> then traverse its abstract syntax tree, using the types from the "ast" 
> package in 
> https://github.com/prometheus/prometheus/blob/main/promql/parser/ast.go. 
> You'd first use 
> https://pkg.go.dev/github.com/prometheus/prometheus/promql/parser#ParseExpr 
> to do the parsing, and that gives you back the AST to look at.
>
> For example, this is how PromLens shows the different sub-expressions of a 
> query (for your expression: https://demo.promlens.com/?l=eF6PYANAlbQ).
>
> However: Not every alerting query is of the shape "<expression> <comparator> 
> <threshold>". Comparison / filter operators are normal binary expressions that 
> may or may not occur at any part of a query (even deeply nested in a query 
> tree, not at the end), and you will also have alerting queries which do not 
> contain a filter operator at all. So you would have to apply certain 
> assumptions about the structure of a query to be able to extract just the 
> part you are interested in.
>
> Regards,
> Julius
>
> On Thu, Jun 3, 2021 at 1:04 AM Callum Jones  wrote:
>
>> Hi,
>>
>> With alerting rules the Prometheus API (and templating language) has 
>> access to the "Value" of an alert which appears to be the first metric 
>> specified without the comparator.
>>
>> I was wondering if this is something that is easily accessible in the 
>> Prometheus Go packages such that I could give it the expression string and 
>> it would strip out the comparison part and return just the metric query.
>>
>> For example passing in:
>> > sum by (instance) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50
>> would return
>> > sum by (instance) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024
>>
>> Thanks,
>> Callum
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to prometheus-use...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/a5b35b15-ebae-4030-8a9c-f67adb836eacn%40googlegroups.com
>>  
>> 
>> .
>>
>
>
> -- 
> Julius Volz
> PromLabs - promlabs.com
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/62b48cf0-5acf-4eab-8cb8-e4c702938a44n%40googlegroups.com.


[prometheus-users] Prometheus Api to json to use in react.js

2022-08-12 Thread Geet
Hi ,

What is the best way to collect data from Prometheus metrics which is in 
dev environment and convert the metrics to json and to use in react.js for 
making bar charts?

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/dd2bda38-758e-469a-8a00-ff5a3c30ab39n%40googlegroups.com.


[prometheus-users] How could I truncate data after pull in Java client.

2022-08-12 Thread Hello Wood
Hi, I'm using Prometheus to collect Spring Boot service metrics in Java. But 
I found a problem: data still exists after a pull, which makes the instant 
data incorrect. 

For example, right now there is one label value like 'app_version{version=a} 
100'; then the metric is updated and a new label value b is added, so the 
metrics become 'app_version{version=a} 50' and 'app_version{version=b} 50'. 
After that, label a is no longer updated, and the metrics become 
'app_version{version=b} 100'.

When I pull metrics from the Spring Boot service, the metrics are 
'app_version{version=a} 50' and 'app_version{version=b} 100', but the 
expected data should be 'app_version{version=b} 100' only. 

How could I fix this issue? Thanks.

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/31873740-2030-43ab-b905-d8f98570f647n%40googlegroups.com.


[prometheus-users] err="cannot find oid '1.3.6.1.4.1.24681' to walk"

2022-08-12 Thread Sabari SN
Hi everyone,

Here I'm able to get the MIB values while using snmpwalk,

but not able to generate the snmp.yml file.

snmpwalk  -v 2c -c 'public' 192.168.200.201 .1.3.6.1.4.1.24681
SNMPv2-SMI::enterprises.24681.1.2.1.0 = STRING: "4.60 %"
SNMPv2-SMI::enterprises.24681.1.2.9.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.24681.1.2.9.1.2.1 = STRING: "eth0"
SNMPv2-SMI::enterprises.24681.1.2.9.1.3.1 = Counter32: 15089489
SNMPv2-SMI::enterprises.24681.1.2.9.1.4.1 = Counter32: 22461175
SNMPv2-SMI::enterprises.24681.1.2.9.1.5.1 = Counter32: 0
SNMPv2-SMI::enterprises.24681.1.2.11.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.24681.1.2.11.1.1.2 = INTEGER: 2
SNMPv2-SMI::enterprises.24681.1.2.11.1.2.1 = STRING: "HDD1"
SNMPv2-SMI::enterprises.24681.1.2.11.1.2.2 = STRING: "HDD2"
SNMPv2-SMI::enterprises.24681.1.2.11.1.3.1 = STRING: "42 C/107 F"
SNMPv2-SMI::enterprises.24681.1.2.11.1.3.2 = STRING: "-- C/-- F"
SNMPv2-SMI::enterprises.24681.1.2.11.1.4.1 = INTEGER: 0
SNMPv2-SMI::enterprises.24681.1.2.11.1.4.2 = INTEGER: -5
SNMPv2-SMI::enterprises.24681.1.2.11.1.5.1 = STRING: "MB1000GCWCV
"
SNMPv2-SMI::enterprises.24681.1.2.11.1.5.2 = STRING: "--"
SNMPv2-SMI::enterprises.24681.1.2.11.1.6.1 = STRING: "931.51 GB"
SNMPv2-SMI::enterprises.24681.1.2.11.1.6.2 = STRING: "--"
SNMPv2-SMI::enterprises.24681.1.2.11.1.7.1 = STRING: "GOOD"
SNMPv2-SMI::enterprises.24681.1.2.11.1.7.2 = STRING: "--"
SNMPv2-SMI::enterprises.24681.1.2.17.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.24681.1.2.17.1.2.1 = STRING: "[Single Disk 
Volume: Drive 1]"
SNMPv2-SMI::enterprises.24681.1.2.17.1.3.1 = STRING: "EXT4"
SNMPv2-SMI::enterprises.24681.1.2.17.1.4.1 = STRING: "926.32 GB"
SNMPv2-SMI::enterprises.24681.1.2.17.1.5.1 = STRING: "599.74 GB"
SNMPv2-SMI::enterprises.24681.1.2.17.1.6.1 = STRING: "Ready"
SNMPv2-SMI::enterprises.24681.1.3.9.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.24681.1.3.9.1.2.1 = STRING: "eth0"
SNMPv2-SMI::enterprises.24681.1.3.9.1.3.1 = Counter32: 15089521
SNMPv2-SMI::enterprises.24681.1.3.9.1.4.1 = Counter32: 22461204
SNMPv2-SMI::enterprises.24681.1.3.9.1.5.1 = Counter32: 0
SNMPv2-SMI::enterprises.24681.1.3.11.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.24681.1.3.11.1.1.2 = INTEGER: 2
SNMPv2-SMI::enterprises.24681.1.3.11.1.2.1 = STRING: "HDD1"
SNMPv2-SMI::enterprises.24681.1.3.11.1.2.2 = STRING: "HDD2"
SNMPv2-SMI::enterprises.24681.1.3.11.1.3.1 = INTEGER: 42
SNMPv2-SMI::enterprises.24681.1.3.11.1.3.2 = INTEGER: 0
SNMPv2-SMI::enterprises.24681.1.3.11.1.4.1 = INTEGER: 0
SNMPv2-SMI::enterprises.24681.1.3.11.1.4.2 = INTEGER: -5
SNMPv2-SMI::enterprises.24681.1.3.11.1.5.1 = STRING: "MB1000GCWCV
"
SNMPv2-SMI::enterprises.24681.1.3.11.1.5.2 = STRING: "--"
SNMPv2-SMI::enterprises.24681.1.3.11.1.6.1 = Counter64: 1000204886016
SNMPv2-SMI::enterprises.24681.1.3.11.1.6.2 = Counter64: 0
SNMPv2-SMI::enterprises.24681.1.3.11.1.7.1 = STRING: "GOOD"
SNMPv2-SMI::enterprises.24681.1.3.11.1.7.2 = STRING: "--"
SNMPv2-SMI::enterprises.24681.1.3.17.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.24681.1.3.17.1.2.1 = STRING: "[Single Disk 
Volume: Drive 1]"
SNMPv2-SMI::enterprises.24681.1.3.17.1.3.1 = STRING: "EXT4"
SNMPv2-SMI::enterprises.24681.1.3.17.1.4.1 = Counter64: 971319812
SNMPv2-SMI::enterprises.24681.1.3.17.1.5.1 = Counter64: 628876064
SNMPv2-SMI::enterprises.24681.1.3.17.1.6.1 = STRING: "Ready"
./generator generate
ts=2022-08-09T10:46:14.375Z caller=net_snmp.go:161 level=info 
msg="Loading MIBs" 
from=$HOME/.snmp/mibs:/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf
ts=2022-08-09T10:46:14.792Z caller=main.go:119 level=warn msg="NetSNMP 
reported parse error(s)" errors=359
ts=2022-08-09T10:46:15.160Z caller=main.go:51 level=info 
msg="Generating config for module" module=extreme-switch
ts=2022-08-09T10:46:15.261Z caller=main.go:129 level=error msg="Error 
generating config netsnmp" err="cannot find oid '1.3.6.1.4.1.24681' to walk"
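
For reference, a minimal generator.yml module walking that subtree might look 
like the sketch below (reusing the module name from the log). Note that the 
generator can only walk an OID it can resolve in the loaded MIBs, which is 
what the "cannot find oid" error suggests is missing here - the vendor MIB 
defining 1.3.6.1.4.1.24681 would need to be on the MIB path:

modules:
  extreme-switch:
    walk:
      - 1.3.6.1.4.1.24681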

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/3d6c62fd-9336-4a12-8c10-af726bb9bdc8n%40googlegroups.com.


[prometheus-users] Alertmanager slack alerting issues

2022-08-12 Thread rs
Hi everyone! I am configuring alertmanager to send outputs to a prod slack 
channel and dev slack channel. I have checked with the routing tree editor 
and everything should be working correctly. 
However, I am seeing some (not all) alerts that are tagged with 'env: dev' 
being sent to the prod slack channel. Is there some sort of old 
configuration caching happening? Is there a way to flush this out?

--- Alertmanager.yml ---
global:
  http_config:
    proxy_url: 'xyz'
templates:
  - templates/*.tmpl
route:
  group_by: [cluster,alertname]
  group_wait: 10s
  group_interval: 30m
  repeat_interval: 24h
  receiver: 'slack'
  routes:
  - receiver: 'production'
    match:
      env: 'prod'
    continue: true
  - receiver: 'staging'
    match:
      env: 'dev'
    continue: true
receivers:
#Fallback option - Default set to production server
- name: 'slack'
  slack_configs:
  - api_url: 'api url'
    channel: '#prod-channel'
    send_resolved: true
    color: '{{ template "slack.color" . }}'
    title: '{{ template "slack.title" . }}'
    text: '{{ template "slack.text" . }}'
    actions:
    - type: button
      text: 'Query :mag:'
      url: '{{ (index .Alerts 0).GeneratorURL }}'
    - type: button
      text: 'Silence :no_bell:'
      url: '{{ template "__alert_silence_link" . }}'
    - type: button
      text: 'Dashboard :grafana:'
      url: '{{ (index .Alerts 0).Annotations.dashboard }}'
- name: 'staging'
  slack_configs:
  - api_url: 'api url'
    channel: '#staging-channel'
    send_resolved: true
    color: '{{ template "slack.color" . }}'
    title: '{{ template "slack.title" . }}'
    text: '{{ template "slack.text" . }}'
    actions:
    - type: button
      text: 'Query :mag:'
      url: '{{ (index .Alerts 0).GeneratorURL }}'
    - type: button
      text: 'Silence :no_bell:'
      url: '{{ template "__alert_silence_link" . }}'
    - type: button
      text: 'Dashboard :grafana:'
      url: '{{ (index .Alerts 0).Annotations.dashboard }}'
- name: 'production'
  slack_configs:
  - api_url: 'api url'
    channel: '#prod-channel'
    send_resolved: true
    color: '{{ template "slack.color" . }}'
    title: '{{ template "slack.title" . }}'
    text: '{{ template "slack.text" . }}'
    actions:
    - type: button
      text: 'Query :mag:'
      url: '{{ (index .Alerts 0).GeneratorURL }}'
    - type: button
      text: 'Silence :no_bell:'
      url: '{{ template "__alert_silence_link" . }}'
    - type: button
      text: 'Dashboard :grafana:'
      url: '{{ (index .Alerts 0).Annotations.dashboard }}'

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/af14f1af-2570-4f7f-a7dc-a9bbb30dn%40googlegroups.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread nina guo
Thank you. Brian.

I'm worried about the blackbox_exporter is expensive or not. 

I also have other jobs, other jobs are with global 30s scrape intervals and 
30s evaluation intervals, when I set the special job with scrape interval 
2m, will it cause any problem?

"
If you only want to *check* a value every 1 day for alerting, then it's 
perfectly fine to have an alerting rule group with a 1 day evaluation 
interval - even if the underlying metric is collected more frequently. The 
alerting rule will use the most recent value at evaluation time.
"
regarding this point, is it possible to set alerting rule group for a 
special rule or a job?


On Friday, August 12, 2022 at 3:40:23 PM UTC+8 Brian Candler wrote:

> Perhaps the most important question to ask is: *why* do you only want to 
> scrape once per day?
>
> If you're worrying about storage space consumption: don't.  Prometheus 
> uses differential encoding.  If you record the same metric repeatedly with 
> the same value, i.e. the difference between the adjacent values is zero, 
> then the amount of storage space used is minuscule.  Even if it's not the 
> same value, it's extremely highly compressed.  And disk space is cheap.
>
> If you only want to *check* a value every 1 day for alerting, then it's 
> perfectly fine to have an alerting rule group with a 1 day evaluation 
> interval - even if the underlying metric is collected more frequently. The 
> alerting rule will use the most recent value at evaluation time.
>
> If you're worried that the exporter itself is expensive to run on each 
> scrape, then you should do one of the other things that Stuart suggested 
> (i.e. using textfile collector or push gateway).  On the target node use a 
> cron job to update the metric at whatever interval makes sense - but let 
> prometheus scrape it as often as it likes.  But you should still scrape the 
> cached value every 1-2 minutes.
>
> If it's blackbox_exporter that you're talking about, those scrapes are 
> cheap.  So you may as well scrape every 2 minutes.
>
> If you're worried about transient errors causing spurious alarms, then 
> adjust your alerting rules (e.g. in the simplest case, add "for: 30m" which 
> means that you'll only get a notification if the alert has been active for 
> 30m *continuously*).
>
> On Friday, 12 August 2022 at 08:31:09 UTC+1 Stuart Clark wrote:
>
>> On 12/08/2022 08:29, nina guo wrote: 
>> > So if with push gateway or textfile collector, we need to also to 
>> > customize our metrics, am I right? 
>>
>> What do you mean? 
>>
>> -- 
>> Stuart Clark 
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/0a952d82-c1d7-4ad1-9978-20b84985b698n%40googlegroups.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread Brian Candler
Perhaps the most important question to ask is: *why* do you only want to 
scrape once per day?

If you're worrying about storage space consumption: don't.  Prometheus uses 
differential encoding.  If you record the same metric repeatedly with the 
same value, i.e. the difference between the adjacent values is zero, then 
the amount of storage space used is minuscule.  Even if it's not the same 
value, it's extremely highly compressed.  And disk space is cheap.

If you only want to *check* a value every 1 day for alerting, then it's 
perfectly fine to have an alerting rule group with a 1 day evaluation 
interval - even if the underlying metric is collected more frequently. The 
alerting rule will use the most recent value at evaluation time.

If you're worried that the exporter itself is expensive to run on each 
scrape, then you should do one of the other things that Stuart suggested 
(i.e. using textfile collector or push gateway).  On the target node use a 
cron job to update the metric at whatever interval makes sense - but let 
prometheus scrape it as often as it likes.  But you should still scrape the 
cached value every 1-2 minutes.

If it's blackbox_exporter that you're talking about, those scrapes are 
cheap.  So you may as well scrape every 2 minutes.

If you're worried about transient errors causing spurious alarms, then 
adjust your alerting rules (e.g. in the simplest case, add "for: 30m" which 
means that you'll only get a notification if the alert has been active for 
30m *continuously*).
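
A minimal sketch of that last point (rule name and expression are 
illustrative):

- alert: ProbeFailing
  expr: probe_success == 0
  for: 30m

The alert only fires once the expression has been returning results 
continuously for 30 minutes, so a single failed probe or scrape doesn't 
notify anyone.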

On Friday, 12 August 2022 at 08:31:09 UTC+1 Stuart Clark wrote:

> On 12/08/2022 08:29, nina guo wrote:
> > So if with push gateway or textfile collector, we need to also to 
> > customize our metrics, am I right?
>
> What do you mean?
>
> -- 
> Stuart Clark
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/635f5d4d-ff92-45b0-b7ef-291930df5b3cn%40googlegroups.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread nina guo


Now I'm using metrics exposed by the blackbox exporter.

If I use the pushgateway or textfile collector, I need to write a script first 
to generate the metrics and write them in the Prometheus metrics format to the 
textfile collector folder. 
On Friday, August 12, 2022 at 3:31:09 PM UTC+8 Stuart Clark wrote:

> On 12/08/2022 08:29, nina guo wrote:
> > So if with push gateway or textfile collector, we need to also to 
> > customize our metrics, am I right?
>
> What do you mean?
>
> -- 
> Stuart Clark
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d4fc5e45-a324-47f0-855e-4e482cdfbac0n%40googlegroups.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread nina guo

Now I'm using metrics exposed by the blackbox exporter.

If I use the pushgateway or textfile collector, I need to write a script first 
to generate the metrics and write them in the Prometheus metrics format to the 
textfile collector folder. In that case I cannot use the metrics exposed by 
the blackbox exporter.
On Friday, August 12, 2022 at 3:31:09 PM UTC+8 Stuart Clark wrote:

> On 12/08/2022 08:29, nina guo wrote:
> > So if with push gateway or textfile collector, we need to also to 
> > customize our metrics, am I right?
>
> What do you mean?
>
> -- 
> Stuart Clark
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/866c2386-10fc-4e15-9632-716d9865391bn%40googlegroups.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread Stuart Clark

On 12/08/2022 08:29, nina guo wrote:
So if we go with the push gateway or textfile collector, we also need to 
customize our metrics, am I right?


What do you mean?

--
Stuart Clark

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/999dfca4-1bdc-a1dc-6135-12a5d532a0fe%40Jahingo.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread nina guo
So if we go with the push gateway or textfile collector, we also need to 
customize our metrics, am I right?

On Friday, August 12, 2022 at 3:17:09 PM UTC+8 Stuart Clark wrote:

> On 12/08/2022 08:09, nina guo wrote:
> > OK. So if I want to scrape the metrics for 1 day interval, which way 
> > is better to implement?
>
> Some options would be:
>
> - Scrape it every 2 minutes instead of daily
> - Use the textfile collector of the node exporter, with a scheduled job 
> to update the file daily
> - Use the push gateway with a scheduled job that updates the API daily
>
> For the second two options you will lose the ability to use the "up" 
> metric (as that will now refer to the node exporter/push gateway 
instead), but both add their own additional metrics containing timestamps 
> of the last time the metric was updated.
>
> -- 
> Stuart Clark
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/ca5f7551-2dd3-46de-9331-c12e2f12cd29n%40googlegroups.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread Stuart Clark

On 12/08/2022 08:09, nina guo wrote:
OK. So if I want to scrape the metrics for 1 day interval, which way 
is better to implement?


Some options would be:

- Scrape it every 2 minutes instead of daily
- Use the textfile collector of the node exporter, with a scheduled job 
to update the file daily

- Use the push gateway with a scheduled job that updates the API daily

For the second two options you will lose the ability to use the "up" 
metric (as that will now refer to the node exporter/push gateway 
instead), but both add their own additional metrics containing timestamps 
of the last time the metric was updated.
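
As a sketch of the textfile-collector approach (the path and metric names 
here are invented), a daily cron job could write a file like:

# /var/lib/node_exporter/textfile/daily_check.prom, rewritten once a day by cron
daily_check_value 42
daily_check_last_run_timestamp_seconds 1.6602624e+09

The node exporter then serves whatever is in that file on every scrape, so 
Prometheus can keep scraping every 30s-2m even though the underlying value 
only changes once a day.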


--
Stuart Clark

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/ca768714-190e-93da-84ff-66ce4878d2d4%40Jahingo.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread nina guo
OK. So if I want to scrape the metrics at a 1 day interval, which way is 
better to implement it?

On Friday, August 12, 2022 at 2:56:29 PM UTC+8 Stuart Clark wrote:

> On 12/08/2022 06:46, nina guo wrote:
>
> What I want to implement is to scrape and evaluate one kind of metric less 
> frequently; I want to adjust the interval to 1d or 2d, something like 
> that.
>
> On Friday, August 12, 2022 at 11:06:15 AM UTC+8 nina guo wrote:
>
>> Hi, I received following error. 
>>
>> - job_name: TLS Connection
>>   scrape_interval: 1d
>>   evaluation_interval: 1d
>>   metrics_path: /probe
>>   params:
>>     module: [smtp_starttls]
>>   file_sd_configs:
>>   - files:xxx
>>
>>  kubectl logs prometheus -c prometheus -n monitoring
>> level=error ts=2022-08-12T03:03:50.120Z caller=main.go:347 msg="Error 
>> loading config (--config.file=/etc/prometheus/prometheus.yml)" err="parsing 
>> YAML file /etc/prometheus/prometheus.yml: yaml: unmarshal errors:\n  line 
>> 54: field evaluation_interval not found in type config.ScrapeConfig"
>>
> Two things here:
>
> 1. There is no entry called "evaluation_interval" within a scrape config, 
> so that needs removing to clear the unmarshal error.
>
> 2. The maximum sensible scrape interval is around 2-3 minutes, so 1 day is 
> far too long. With a longer interval you will end up with stale time series 
> and "gaps" in all your graphs.
>
> -- 
> Stuart Clark
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/592becb6-6ed5-4320-aa0e-be25c783c7ecn%40googlegroups.com.


Re: [prometheus-users] Re: evaluation interval cannot be recognized in seperate job

2022-08-12 Thread Stuart Clark

On 12/08/2022 06:46, nina guo wrote:
What I want to implement is to scrape and evaluate one kind of metric 
less frequently; I want to adjust the interval to 1d or 2d, 
something like that.


On Friday, August 12, 2022 at 11:06:15 AM UTC+8 nina guo wrote:

Hi, I received following error.

- job_name: TLS Connection
  scrape_interval: 1d
  evaluation_interval: 1d
  metrics_path: /probe
  params:
    module: [smtp_starttls]
  file_sd_configs:
  - files:xxx

 kubectl logs prometheus -c prometheus -n monitoring
level=error ts=2022-08-12T03:03:50.120Z caller=main.go:347
msg="Error loading config
(--config.file=/etc/prometheus/prometheus.yml)" err="parsing YAML
file /etc/prometheus/prometheus.yml: yaml: unmarshal errors:\n
 line 54: field evaluation_interval not found in type
config.ScrapeConfig"


Two things here:

1. There is no entry called "evaluation_interval" within a scrape 
config, so that needs removing to clear the unmarshal error.


2. The maximum sensible scrape interval is around 2-3 minutes, so 1 day 
is far too long. With a longer interval you will end up with stale time 
series and "gaps" in all your graphs.


--
Stuart Clark

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/903f4c47-1a79-fc56-6f71-039caf0aa400%40Jahingo.com.