Hi Brian,

Thank you for the response. Here are some more details; I hope they give you a clearer picture of the configuration and the method I am using to generate tags:
1. We collect data from the node exporter and have created some alerting rules around the collected data. Here is one example (y and z here stand for the actual thresholds):

- alert: "Local Disk usage has reached 50%"
  expr: >-
    (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
    / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"} * 100, 0.1) >= y)
    and
    (round(node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
    / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"} * 100, 0.1) <= z)
  for: 5m
  labels:
    criteria: overuse
    severity: critical
    team: support
  annotations:
    summary: "{{ $labels.instance }}'s ({{ $labels.device }}) has low space."
    description: "space on {{ $labels.mountpoint }} file system at {{ $labels.instance }} server = {{ $value }}%."

2. At the Alertmanager, we have created notification rules to notify when the aforementioned condition occurs:

global:
  smtp_from: 'serv...@example.com'
  smtp_require_tls: false
  smtp_smarthost: 'serv...@example.com:25'

templates:
  - /home/ALERTMANAGER/conf/template/*.tmpl

route:
  group_wait: 5m
  group_interval: 2h
  repeat_interval: 5h
  receiver: admin
  routes:
    - match_re:
        alertname: ".*Local Disk usage has reached .*%"
      receiver: admin
      routes:
        - match:
            criteria: overuse
            severity: critical
            team: support
          receiver: mailsupport
          continue: true
        - match:
            criteria: overuse
            team: support
            severity: critical
          receiver: opsgeniesupport

receivers:
  - name: opsgeniesupport
    opsgenie_configs:
      - api_key: XYZ
        api_url: https://api.opsgenie.com
        message: '{{ .CommonLabels.alertname }}'
        description: "{{ range .Alerts }}{{ .Annotations.description }}\n\r{{ end }}"
        tags: '{{ range $k, $v := .CommonLabels }}{{ if or (eq $k "criteria") (eq $k "severity") (eq $k "team") }}{{ $k }}={{ $v }},{{ else if eq $k "instance" }}{{ reReplaceAll "(.+):(.+)" "host=$1" $v }},{{ end }}{{ end }},infra,monitor'
        priority: 'P1'
        update_alerts: true
        send_resolved: true

So you can see that I derive a tag host=<hostname> from the instance label.

Scenario 1: When server1's local disk usage reaches 50%, an Opsgenie ticket is created with:

ticket header name: local disk usage reached 50%
ticket description: space on /var file system at server1:9100 server = 82%.
ticket tags: criteria=overuse, team=support, severity=critical, infra, monitor, host=server1

Everything works as expected; there are no issues with Scenario 1.

Scenario 2: While the server1 alert is still firing, a second server's (say server2's) local disk usage reaches 50%. I see that the Opsgenie ticket is updated as:

ticket header name: local disk usage reached 50%
ticket description: space on /var file system at server1:9100 server = 82%.
                    space on /var file system at server2:9100 server = 80%.
ticket tags: criteria=overuse, team=support, severity=critical, infra, monitor, host=server1

but I was expecting an additional host=server2 tag on the ticket. In summary: I see an updated description, but not updated tags.

In the tags section of the Alertmanager-Opsgenie integration configuration, I had also tried iterating over .Alerts and .CommonLabels, but I was still unable to get the additional host=server2 tag:

{{ range $idx, $alert := .Alerts }}{{ range $k, $v := $alert.Labels }}{{ $k }}={{ $v }},{{ end }}{{ end }}test=test
{{ range $k, $v := .CommonLabels }}....{{ end }}

At the moment, I am not sure what is preventing the update of tags on the Opsgenie tickets.
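For reference, this is the direction I was trying to go with the .Alerts iteration: one host=<hostname> tag per firing alert, plus the static tags from .CommonLabels. This is a sketch only, not a tested config; I am assuming reReplaceAll can be applied to each alert's own instance label inside the range the same way it applies to the CommonLabels value:

        tags: '{{ range .Alerts }}{{ reReplaceAll "(.+):(.+)" "host=$1" .Labels.instance }},{{ end }}criteria={{ .CommonLabels.criteria }},severity={{ .CommonLabels.severity }},team={{ .CommonLabels.team }},infra,monitor'

Since both firing alerts are grouped under the same alertname (and hence the same Opsgenie alias), I would expect this to render host=server1,host=server2,... on the next notification; whether the updated tag list then actually reaches the existing Opsgenie alert is exactly the part I cannot confirm.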
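Alternatively, if separate tickets per server were acceptable, I understand the grouping itself could be changed at the top-level route. Again a sketch, not my running config, assuming my route is effectively grouping by alertname today:

route:
  group_by: ['alertname', 'instance']   # one group, and hence one Opsgenie alert, per server
  group_wait: 5m
  group_interval: 2h
  repeat_interval: 5h
  receiver: admin

With this, each server's disk alert would create its own Opsgenie ticket carrying a single host=<hostname> tag, at the cost of more tickets.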
If I can get some clarity on whether my Alertmanager configuration is good enough, then I can look at the Opsgenie configuration. Please advise.

Regards,
CP

On Tuesday, April 2, 2024 at 10:46:36 PM UTC+5:30 Brian Candler wrote:

> FYI, those images are unreadable - copy-pasted text would be much better.
>
> My guess, though, is that you probably don't want to group alerts before
> sending them to opsgenie. You haven't shown your full alertmanager config,
> but if you have a line like
>
>     group_by: ['alertname']
>
> then try
>
>     group_by: ["..."]
>
> (literally, exactly that: a single string containing three dots, inside
> square brackets)
>
> On Tuesday 2 April 2024 at 17:15:39 UTC+1 mohan garden wrote:
>
>> Dear Prometheus Community,
>> I am reaching out regarding an issue I have encountered with Prometheus
>> alert tagging, specifically while creating tickets in Opsgenie.
>>
>> I have configured Alertmanager to send alerts to Opsgenie with the
>> following configuration:
>> [image: photo001.png]
>> A ticket is generated with the expected description and tags:
>> [image: photo002.png]
>>
>> By default the alerts are grouped by the alert name (default behavior),
>> so when a similar event happens on a different server, I see that the
>> description is updated:
>> [image: photo003.png]
>> but the tags on the ticket remain the same.
>> Expected behavior: criteria=..., host=108, host=114, infra.....support
>>
>> I have set update_alerts and send_resolved to true.
>> I am not sure whether, to make this work as expected, I need additional
>> configuration at Opsgenie or at the Alertmanager.
>>
>> I would appreciate any insight or guidance on how to resolve this issue
>> and ensure that alerts for different servers are correctly tagged in
>> Opsgenie.
>>
>> Thank you in advance.
>> Regards,
>> CP