Thank you for the reply.
Answers to the above points:
1. I checked: the expression "up == 0" fires only rarely, and all my targets
are being scraped.
2. To avoid getting alerts every minute, I have now set *evaluation_interval:
5m*.
3. I have removed keep_firing_for, as it is not suitable for my use case.
Updated:
I am using Prometheus alerting for RabbitMQ. Below is the configuration I
am using.
*prometheus.yml file*

global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 5m  # Evaluate rules every 5 minutes. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - ip:port

rule_files:
  - "alerts_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["ip:port"]
*alerts_rules.yml file*

groups:
  - name: instance_alerts
    rules:
      - alert: "Instance Down"
        expr: up == 0
        for: 30s
        # keep_firing_for: 30s
        labels:
          severity: "Critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."

  - name: rabbitmq_alerts
    rules:
      - alert: "Consumer down for last 1 min"
        expr: rabbitmq_queue_consumers == 0
        for: 30s
        # keep_firing_for: 30s
        labels:
          severity: Critical
        annotations:
          summary: "shortify | '{{ $labels.queue }}' has no consumers"
          description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."

      - alert: "Total Messages > 10k in last 1 min"
        expr: rabbitmq_queue_messages > 10000
        for: 30s
        # keep_firing_for: 30s
        labels:
          severity: Critical
        annotations:
          summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
          description: |
            Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
Even if there is no data in the queue, it still sends me alerts. I have set
*evaluation_interval: 5m* (Prometheus evaluates alert rules every 5 minutes)
and *for: 30s* (which should ensure the alert fires only if the condition
persists for 30s). I guess *for* is not working for me.
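To illustrate my understanding of how these two settings interact (a sketch
with illustrative comments; the timeline assumes the condition first becomes
true right at an evaluation):

# With evaluation_interval: 5m, the rule is only checked at t=0m, 5m, 10m, ...
#
#   t=0m  expr true -> alert becomes "pending"
#   t=5m  expr true -> 5m elapsed >= "for: 30s" -> alert fires
#
# So any "for" value below 5m behaves the same here: it delays firing by one
# evaluation cycle, and cannot filter out blips shorter than 5m.
- alert: "Total Messages > 10k in last 1 min"
  expr: rabbitmq_queue_messages > 10000
  for: 30s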
By the way, *I am not using Alertmanager*
(https://github.com/prometheus/alertmanager/releases/latest/download/alertmanager-0.28.0.linux-amd64.tar.gz);
I am just using *Prometheus*
(https://github.com/prometheus/prometheus/releases/download/v3.1.0/prometheus-3.1.0.linux-amd64.tar.gz)
from https://prometheus.io/download/.
How can I solve this? Thank you in advance.
On Saturday, February 15, 2025 at 12:13:01 AM UTC+5:30 Brian Candler wrote:
> > Even if the application is not down, it sends alerts every 1 min. How do
> > I debug this? I am using the below expression: alert: "Instance Down"
> > expr: up == 0
>
> You need to show the actual alerts, from the Prometheus web interface
> and/or the notifications, and then describe how these are different from
> what you expect.
>
> I very much doubt that the expression "up == 0" is firing unless there is
> at least one target which is not being scraped, and therefore the "up"
> metric has a value of 0 for a particular timeseries (metric with a given
> set of labels).
>
> > If the threshold is crossed and the value changes, it fires multiple
> > alerts for the same alert rule; that's fine. But with the same
> > '{{ $value }}' it should fire again only after 5 min; the same alert rule
> > with the same value should not fire for the next 5 min. How do I get this?
>
> I cannot work out what problem you are trying to describe. As long as you
> only use '{{ $value }}' in annotations, not labels, then the same alert
> will just continue firing.
>
> Whether you get repeated *notifications* about that ongoing alert is a
> different matter. With "repeat_interval: 15m" you should get them every 15
> minutes at least. You may get additional notifications if a new alert is
> added into the same alert group, or one is resolved from the alert group.
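>
> For illustration, annotating the route section of the alertmanager.yml
> quoted below (the comments are my reading of those settings):
>
>   route:
>     group_wait: 5s        # wait 5s before the first notification for a new alert group
>     group_interval: 5m    # wait 5m before notifying about changes (alerts added or resolved) in that group
>     repeat_interval: 15m  # resend a still-firing, unchanged group every 15m
>     receiver: webhook_receiver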
>
> > What are for, keep_firing_for and evaluation_interval?
>
> keep_firing_for is debouncing: once the alert condition has gone away, it
> will continue firing for this period of time. This is so that if the alert
> condition vanishes briefly but reappears, it doesn't cause the alert to be
> resolved and then retriggered.
>
> evaluation_interval is how often the alerting expression is evaluated.
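>
> A minimal sketch combining both, with illustrative values (the 1m/3m
> numbers are made up for the example):
>
>   - alert: "Instance Down"
>     expr: up == 0
>     for: 1m               # expression must stay true for 1m before the alert fires
>     keep_firing_for: 3m   # after the expression stops being true, keep firing for 3m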
>
>
> On Friday, 14 February 2025 at 15:53:24 UTC Amol Nagotkar wrote:
>
>> Hi all,
>> I want the same alert (alert rule) to fire again only after 5 min;
>> currently I am getting the same alert (alert rule) every minute for the
>> same '{{ $value }}'.
>> If the threshold is crossed and the value changes, it fires multiple
>> alerts for the same alert rule; that's fine. But with the same
>> '{{ $value }}' it should fire again only after 5 min; the same alert rule
>> with the same value should not fire for the next 5 min. How do I get this?
>> Even if the application is not down, it sends alerts every 1 min. How do
>> I debug this? I am using the below expression: alert: "Instance Down"
>> expr: up == 0
>> What are for, keep_firing_for and evaluation_interval?
>> prometheus.yml
>>
>> global:
>>   scrape_interval: 15s      # Set the scrape interval to every 15 seconds. Default is every 1 minute.
>>   evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
>>
>> alerting:
>>   alertmanagers:
>>     - static_configs:
>>         - targets:
>>             - ip:port
>>
>> rule_files:
>>   - "alerts_rules.yml"
>>
>> scrape_configs:
>>   - job_name: "prometheus"
>>     static_configs:
>>       - targets: ["ip:port"]
>>
>> alertmanager.yml
>>
>> global:
>>   resolve_timeout: 5m
>>
>> route:
>>   group_wait: 5s
>>   group_interval: 5m
>>   repeat_interval: 15m
>>   receiver: webhook_receiver
>>
>> receivers:
>>   - name: webhook_receiver
>>     webhook_configs:
>>       - url: 'http://ip:port'
>>         send_resolved: false
>>
>> alerts_rules.yml
>>
>> groups:
>>   - name: instance_alerts
>>     rules:
>>       - alert: "Instance Down"
>>         expr: up == 0
>>         # for: 30s
>>         # keep_firing_for: 30s
>>         labels:
>>           severity: "Critical"
>>         annotations:
>>           summary: "Endpoint {{ $labels.instance }} down"
>>           description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
>>
>>   - name: rabbitmq_alerts
>>     rules:
>>       - alert: "Consumer down for last 1 min"
>>         expr: rabbitmq_queue_consumers == 0
>>         # for: 1m
>>         # keep_firing_for: 30s
>>         labels:
>>           severity: Critical
>>         annotations:
>>           summary: "shortify | '{{ $labels.queue }}' has no consumers"
>>           description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."
>>
>>       - alert: "Total Messages > 10k in last 1 min"
>>         expr: rabbitmq_queue_messages > 10000
>>         # for: 1m
>>         # keep_firing_for: 30s
>>         labels:
>>           severity: Critical
>>         annotations:
>>           summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
>>           description: |
>>             Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
>>
>>
>> Thank you in advance.
>>
>