Hi all,
I want the same alert (from the same alert rule) to fire again only after 5 minutes. Currently I receive the same alert every minute, even though '{{ $value }}' has not changed.
If the threshold is crossed and the value changes, it is fine for the same rule to fire multiple alerts. But when '{{ $value }}' stays the same, the same rule should not fire again for the next 5 minutes. How can I achieve this?
Also, even when the application is not down, I get an alert every minute. How can I debug this? I am using this rule:
alert: "Instance Down"
expr: up == 0
What exactly do for, keep_firing_for and evaluation_interval do?
prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - ip:port
rule_files:
  - "alerts_rules.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["ip:port"]
alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_wait: 5s
  group_interval: 5m
  repeat_interval: 15m
  receiver: webhook_receiver
receivers:
  - name: webhook_receiver
    webhook_configs:
      - url: 'http://ip:port'
        send_resolved: false
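For comparison, here is a sketch of the same route with each timer annotated. In Alertmanager an alert is identified by its labels, so a changing '{{ $value }}' in the annotations does not by itself create a new alert; re-sending a notification for an alert that is still firing is governed by repeat_interval. The 5m repeat_interval reflects the 5-minute target above, and the group_by labels are a hypothetical addition that is not part of the config above.

```yaml
route:
  receiver: webhook_receiver
  # Hypothetical: alerts sharing these label values are batched
  # into the same notification group.
  group_by: ['alertname', 'instance']
  # How long to wait before sending the first notification for a new group.
  group_wait: 5s
  # How long to wait before notifying about new alerts added to a group
  # that has already been notified.
  group_interval: 5m
  # How long to wait before re-sending a notification for alerts that
  # were already notified and are still firing (assumed 5m target).
  repeat_interval: 5m
```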
alerts_rules.yml
groups:
  - name: instance_alerts
    rules:
      - alert: "Instance Down"
        expr: up == 0
        # for: 30s
        # keep_firing_for: 30s
        labels:
          severity: "Critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
  - name: rabbitmq_alerts
    rules:
      - alert: "Consumer down for last 1 min"
        expr: rabbitmq_queue_consumers == 0
        # for: 1m
        # keep_firing_for: 30s
        labels:
          severity: Critical
        annotations:
          summary: "shortify | '{{ $labels.queue }}' has no consumers"
          description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."
      - alert: "Total Messages > 10k in last 1 min"
        expr: rabbitmq_queue_messages > 10000
        # for: 1m
        # keep_firing_for: 30s
        labels:
          severity: Critical
        annotations:
          summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
          description: |
            Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
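For reference, a sketch of the "Instance Down" rule with for and keep_firing_for enabled, with comments describing what each field does. The durations (15s, 30s, 5m) are illustrative assumptions; interval is an optional per-group override of the global evaluation_interval, and keep_firing_for is only accepted by Prometheus 2.42 and later.

```yaml
groups:
  - name: instance_alerts
    # Optional: how often rules in this group are evaluated.
    # Defaults to global.evaluation_interval when omitted.
    interval: 15s
    rules:
      - alert: "Instance Down"
        expr: up == 0
        # The expression must keep returning the series for this long
        # before the alert moves from pending to firing.
        for: 30s
        # Once firing, the alert keeps firing for this long after the
        # expression stops returning the series (illustrative value).
        keep_firing_for: 5m
        labels:
          severity: "Critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
```

With an evaluation interval of 15s, for: 30s means the expression must return the series on consecutive evaluations spanning at least 30 seconds before the alert fires.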
Thank you in advance.
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion visit
https://groups.google.com/d/msgid/prometheus-users/6a32aaa6-ae70-4eac-b3ff-f104f84180aen%40googlegroups.com.