> but I was expecting an additional host=server2 tag on the ticket.

You won't get that, because CommonLabels is exactly what it sounds like: the
labels which are common to all the alerts in the group.  If one alert has
instance=server1 and the other has instance=server2, but they're in the
same alert group, then no 'instance' label will appear in CommonLabels.
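
For example (hypothetical label sets, based on your rule), if one group
contains these two firing alerts:

  alert 1: alertname="Local Disk usage has reached 50%", instance="server1:9100", team="support", ...
  alert 2: alertname="Local Disk usage has reached 50%", instance="server2:9100", team="support", ...

then CommonLabels contains only alertname, team, etc.  'instance' is dropped
because its value differs between the two alerts, which is why your
{{ else if eq $k "instance" }} branch stops emitting a host=... tag as soon
as a second host joins the group.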

The documentation is here:
https://prometheus.io/docs/alerting/latest/notifications/

It looks like you could iterate over Alerts.Firing, and then over the Labels
within each alert.
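
For example, something along these lines in the tags field (an untested
sketch based on your existing template; Alerts.Firing, Labels.instance and
reReplaceAll are standard notification-template constructs, and it emits one
host=... entry per firing alert, with no de-duplication):

  tags: '{{ range $k, $v := .CommonLabels }}{{ if or (eq $k "criteria") (eq $k "severity") (eq $k "team") }}{{ $k }}={{ $v }},{{ end }}{{ end }}{{ range .Alerts.Firing }}{{ with .Labels.instance }}{{ reReplaceAll "(.+):(.+)" "host=$1" . }},{{ end }}{{ end }}infra,monitor'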

Alternatively, you could disable grouping in Alertmanager and let Opsgenie do
the grouping (I don't know Opsgenie, so I can't say how good a job it would
do of that).
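
If you go that way, I think you could override grouping for just the Opsgenie
branch by adding group_by to that child route, e.g. (untested sketch, reusing
the match labels and receiver name from your config below):

    - match:
        criteria: overuse
        team: support
        severity: critical
      receiver: opsgeniesupport
      group_by: ["..."]

With group_by: ["..."] every distinct label set becomes its own group, so each
alert is sent as its own notification and .CommonLabels then includes that
alert's instance label.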


On Wednesday 3 April 2024 at 09:11:24 UTC+1 mohan garden wrote:

> *Correction:*
> *Scenario 2:* While the server1 trigger is active, a second server's (say
> server2's) local disk usage reaches 50%.
>
> I see that the already-open Opsgenie ticket's details get updated as:
>
> ticket header name: local disk usage reached 50%
> ticket description: space on /var file system at server1:9100 server = 82%.
>                     space on /var file system at server2:9100 server = 80%.
> ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
>
> [image: photo003.png]
>
>
>
> On Wednesday, April 3, 2024 at 1:37:12 PM UTC+5:30 mohan garden wrote:
>
>> Hi Brian, 
>> Thank you for the response. Here are some more details that should help
>> clarify the configuration and the method I am using to generate tags:
>>
>>
>> 1. We collect data from the node exporter and have created some alerting
>> rules around the collected data. Here is one example:
>>     - alert: "Local Disk usage has reached 50%"
>>       expr: (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"} / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"} * 100, 0.1) >= y) and (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"} / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"} * 100, 0.1) <= z)
>>       for: 5m
>>       labels:
>>         criteria: overuse
>>         severity: critical
>>         team: support
>>       annotations:
>>         summary: "{{ $labels.instance }}'s ({{ $labels.device }}) has low space."
>>         description: "space on {{ $labels.mountpoint }} file system at {{ $labels.instance }} server = {{ $value }}%."
>>
>> 2. At the Alertmanager, we have created notification rules to notify us
>> when the aforementioned condition occurs:
>>
>>   smtp_from: 'ser...@example.com'
>>   smtp_require_tls: false
>>   smtp_smarthost: 'ser...@example.com:25'
>>
>> templates:
>>   - /home/ALERTMANAGER/conf/template/*.tmpl
>>
>> route:
>>   group_wait: 5m
>>   group_interval: 2h
>>   repeat_interval: 5h
>>   receiver: admin
>>   routes:
>>   - match_re:
>>       alertname: ".*Local Disk usage has reached .*%"
>>     receiver: admin
>>     routes:
>>     - match:
>>         criteria: overuse
>>         severity: critical
>>         team: support
>>       receiver: mailsupport
>>       continue: true
>>     - match:
>>         criteria: overuse
>>         team: support
>>         severity: critical
>>       receiver: opsgeniesupport
>>
>> receivers:
>>   - name: opsgeniesupport
>>     opsgenie_configs:
>>     - api_key: XYZ
>>       api_url: https://api.opsgenie.com
>>       message: '{{ .CommonLabels.alertname }}'
>>       description: "{{ range .Alerts }}{{ .Annotations.description }}\n\r{{ end }}"
>>       tags: '{{ range $k, $v := .CommonLabels }}{{ if or (eq $k "criteria") (eq $k "severity") (eq $k "team") }}{{ $k }}={{ $v }},{{ else if eq $k "instance" }}{{ reReplaceAll "(.+):(.+)" "host=$1" $v }},{{ end }}{{ end }},infra,monitor'
>>       priority: 'P1'
>>       update_alerts: true
>>       send_resolved: true
>> ...
>> So you can see that I derive a tag host=<hostname> from the instance
>> label.
>>
>>
>> *Scenario 1:* When server1's local disk usage reaches 50%, I see that an
>> Opsgenie ticket is created with the following metadata:
>> ticket header name: local disk usage reached 50%
>> ticket description: space on /var file system at server1:9100 server = 82%.
>> ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
>>
>> So everything works as expected; there are no issues with Scenario 1.
>>
>>
>> *Scenario 2:* While the server1 trigger is active, a second server's (say
>> server2's) local disk usage reaches 50%.
>>
>> I see that the Opsgenie ticket gets updated as:
>> ticket header name: local disk usage reached 50%
>> ticket description: space on /var file system at server1:9100 server = 82%.
>>                     space on /var file system at server2:9100 server = 80%.
>> ticket tags: criteria: overuse, team: support, severity: critical, infra, monitor, host=server1
>>
>>
>> but I was expecting an additional host=server2 tag on the ticket.
>> In summary: I see the updated description, but I am unable to see updated tags.
>>
>> In the tags section of the Alertmanager/Opsgenie integration configuration,
>> I had tried iterating over Alerts and over CommonLabels, but I was unable
>> to add the additional host=server2 tag:
>> {{ range $idx, $alert := .Alerts }}{{ range $k, $v := $alert.Labels }}{{ $k }}={{ $v }},{{ end }}{{ end }},test=test
>> {{ range $k, $v := .CommonLabels }}....{{ end }}
>>
>>
>> At the moment, I am not sure what is preventing the tags on the Opsgenie
>> tickets from being updated.
>> If I can get some clarity on whether my Alertmanager configuration is good
>> enough, then I can look at the Opsgenie configuration.
>>
>>
>> Please advise.
>>
>>
>> Regards
>> CP
>>
>>
>> On Tuesday, April 2, 2024 at 10:46:36 PM UTC+5:30 Brian Candler wrote:
>>
>>> FYI, those images are unreadable - copy-pasted text would be much better.
>>>
>>> My guess, though, is that you probably don't want to group alerts before 
>>> sending them to opsgenie. You haven't shown your full alertmanager config, 
>>> but if you have a line like
>>>
>>>    group_by: ['alertname']
>>>
>>> then try
>>>
>>>    group_by: ["..."]
>>>
>>> (literally, exactly that: a single string containing three dots, inside 
>>> square brackets)
>>>
>>> On Tuesday 2 April 2024 at 17:15:39 UTC+1 mohan garden wrote:
>>>
>>>> Dear Prometheus Community,
>>>> I am reaching out regarding an issue I have encountered with Prometheus
>>>> alert tagging, specifically while creating tickets in Opsgenie.
>>>>
>>>>
>>>> I have configured Alertmanager to send alerts to Opsgenie with the
>>>> following configuration:
>>>> [image: photo001.png]
>>>> A ticket is generated with the expected description and tags:
>>>> [image: photo002.png]
>>>>
>>>> Now, by default the alerts are grouped by the alert name (default
>>>> behavior). So when a similar event happens on a different server, I see
>>>> that the description is updated as:
>>>> [image: photo003.png]
>>>> but the tags on the ticket remain the same.
>>>> Expected behavior: criteria=..., host=108, host=114, infra.....support
>>>>
>>>> I have set the update_alerts and send_resolved settings to true.
>>>> I am not sure whether I need additional configuration at Opsgenie or at
>>>> the Alertmanager to make this work as expected.
>>>>
>>>> I would appreciate any insight or guidance on how to resolve this issue
>>>> and ensure that alerts for different servers are correctly tagged in
>>>> Opsgenie.
>>>>
>>>> Thank you in advance.
>>>> Regards,
>>>> CP
>>>>
>>>
