Thank you for the quick reply.
So, as I told you, I am not using Alertmanager. I am getting alerts based on this
config in my prometheus.yml file:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - IP_ADDRESS_OF_EMAIL_APPLICATION:PORT

Below is the alert payload (an array of objects) I am receiving from Prometheus:
[
  {
    annotations: {
      description: 'Queue QUEUE_NAME in RabbitMQ has total 1.110738e+06 messages\n' +
        'for more than 1 minutes.\n',
      summary: "RabbitMQ Queue 'QUEUE_NAME' has more than 10L messages"
    },
    endsAt: '2025-02-03T06:33:31.893Z',
    startsAt: '2025-02-03T06:28:31.893Z',
    generatorURL: 'http://helo-container-pr:9091/graph?g0.expr=rabbitmq_queue_messages+%3E+1e%2B06&g0.tab=1',
    labels: {
      alertname: 'Total Messages > 10L in last 1 min',
      instance: 'IP_ADDRESS:15692',
      job: 'rabbitmq-rcs',
      queue: 'QUEUE_NAME',
      severity: 'critical',
      vhost: 'webhook'
    }
  }
]
*If I keep evaluation_interval: 15s, it starts triggering every minute.*
*I want alerts to fire only after 5 min, and only if the condition is true.*
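
A minimal sketch of the behaviour I am after (same rule as in my alerts_rules.yml, with the threshold taken from the generatorURL above): keep a short evaluation interval and put the 5-minute requirement in the rule's for: clause.

global:
  scrape_interval: 15s
  evaluation_interval: 15s

# alerts_rules.yml
- alert: "Total Messages > 10L in last 1 min"
  expr: rabbitmq_queue_messages > 1e+06
  for: 5m   # fire only once the expression has been true continuously for 5 minutes
  labels:
    severity: critical
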
On Wednesday, March 5, 2025 at 2:18:34 PM UTC+5:30 Brian Candler wrote:
> You still haven't shown an example of the actual alert you're concerned
> about (for example, the E-mail containing all the labels and the
> annotations)
>
> alertmanager cannot generate any alert unless Prometheus triggers it.
> Please go into the PromQL web interface, switch to the "Graph" tab with the
> default 1 hour time window (or less), and enter the following queries:
>
> up == 0
> rabbitmq_queue_consumers == 0
> rabbitmq_queue_messages > 10000
>
> Show the graphs. If they are not blank, then alerts will be generated.
>
> "*for: 30s" *has no effect when you have "*evaluation_interval: 5m".* I
> suggest you use *evaluation_internal: 15s* (to match your scrape
> internal), and then "for: 30s" will have some benefit; it will only send an
> alert if the alerting condition has been true for two successive cycles.
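>
> For example (a sketch using your existing "Instance Down" rule, everything else unchanged):
>
> global:
>   scrape_interval: 15s
>   evaluation_interval: 15s
>
> - alert: "Instance Down"
>   expr: up == 0
>   for: 30s   # fires only after the condition has been true for two successive 15s evaluations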
>
> On Wednesday, 5 March 2025 at 07:50:23 UTC Amol Nagotkar wrote:
>
>> Thank you for the reply.
>>
>>
>> Answers to the above points:
>>
>> 1. I checked: the expression "up == 0" fires only rarely, and all my targets are
>> being scraped.
>>
>> 2. To avoid getting alerts every minute, I have now set *evaluation_interval
>> to 5m*.
>>
>> 3. I have removed keep_firing_for as it is not suitable for my use case.
>>
>>
>> Updated:
>>
>> I am using Prometheus alerting for RabbitMQ. Below is the configuration I am using.
>>
>>
>> *prometheus.yml file*
>>
>> global:
>>   scrape_interval: 15s      # Set the scrape interval to every 15 seconds. Default is every 1 minute.
>>   evaluation_interval: 5m   # Evaluate rules every 5 minutes. The default is every 1 minute.
>>   # scrape_timeout is set to the global default (10s).
>>
>> alerting:
>>   alertmanagers:
>>     - static_configs:
>>         - targets:
>>             - ip:port
>>
>> rule_files:
>>   - "alerts_rules.yml"
>>
>> scrape_configs:
>>   - job_name: "prometheus"
>>     static_configs:
>>       - targets: ["ip:port"]
>>
>>
>> *alerts_rules.yml file*
>>
>> groups:
>>   - name: instance_alerts
>>     rules:
>>       - alert: "Instance Down"
>>         expr: up == 0
>>         for: 30s
>>         # keep_firing_for: 30s
>>         labels:
>>           severity: "Critical"
>>         annotations:
>>           summary: "Endpoint {{ $labels.instance }} down"
>>           description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
>>
>>   - name: rabbitmq_alerts
>>     rules:
>>       - alert: "Consumer down for last 1 min"
>>         expr: rabbitmq_queue_consumers == 0
>>         for: 30s
>>         # keep_firing_for: 30s
>>         labels:
>>           severity: Critical
>>         annotations:
>>           summary: "shortify | '{{ $labels.queue }}' has no consumers"
>>           description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."
>>
>>       - alert: "Total Messages > 10k in last 1 min"
>>         expr: rabbitmq_queue_messages > 10000
>>         for: 30s
>>         # keep_firing_for: 30s
>>         labels:
>>           severity: Critical
>>         annotations:
>>           summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
>>           description: |
>>             Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
>>
>>
>> Even if there is no data in the queue, it sends me alerts. I have kept
>> *evaluation_interval: 5m* (Prometheus evaluates alert rules every 5 minutes) and
>> *for: 30s* (which should ensure the alert fires only if the condition persists for 30s).
>>
>> I guess *for* is not working for me.
>>
>> By the way, *I am not using Alertmanager* (
>> https://github.com/prometheus/alertmanager/releases/latest/download/alertmanager-0.28.0.linux-amd64.tar.gz
>> ).
>>
>> I am just using *Prometheus* (
>> https://github.com/prometheus/prometheus/releases/download/v3.1.0/prometheus-3.1.0.linux-amd64.tar.gz
>> ).
>>
>> https://prometheus.io/download/
>>
>> How can I solve this? Thank you in advance.
>>
>> On Saturday, February 15, 2025 at 12:13:01 AM UTC+5:30 Brian Candler
>> wrote:
>>
>>> > Even if the application is not down, it sends alerts every 1 min. How do I
>>> > debug this? I am using the below expr: alert: "Instance Down" expr: up == 0
>>>
>>> You need to show the actual alerts, from the Prometheus web interface
>>> and/or the notifications, and then describe how these are different from
>>> what you expect.
>>>
>>> I very much doubt that the expression "up == 0" is firing unless there
>>> is at least one target which is not being scraped, and therefore the "up"
>>> metric has a value of 0 for a particular timeseries (metric with a given
>>> set of labels).
>>>
>>> > If the threshold is crossed and the value changes, it fires multiple alerts
>>> > with the same alert rule; that's fine. But with the same '{{ $value }}' it should
>>> > only fire alerts after 5 min. The same alert rule with the same value should not
>>> > fire again for the next 5 min. How do I get this?
>>>
>>> I cannot work out what problem you are trying to describe. As long as
>>> you only use '{{ $value }}' in annotations, not labels, then the same alert
>>> will just continue firing.
>>>
>>> Whether you get repeated *notifications* about that ongoing alert is a
>>> different matter. With "repeat_interval: 15m" you should get them every 15
>>> minutes at least. You may get additional notifications if a new alert is
>>> added into the same alert group, or one is resolved from the alert group.
>>>
>>> > whats is for, keep_firing_for and evaluation_interval ?
>>>
>>> keep_firing_for is debouncing: once the alert condition has gone away,
>>> it will continue firing for this period of time. This is so that if the
>>> alert condition vanishes briefly but reappears, it doesn't cause the alert
>>> to be resolved and then retriggered.
>>>
>>> evaluation_interval is how often the alerting expression is evaluated.
>>>
>>>
>>> On Friday, 14 February 2025 at 15:53:24 UTC Amol Nagotkar wrote:
>>>
>>>> Hi all,
>>>> I want the same alert (alert rule) to fire again only after 5 min; currently I am
>>>> getting the same alert (alert rule) every minute for the same '{{ $value }}'.
>>>> If the threshold is crossed and the value changes, it fires multiple alerts
>>>> with the same alert rule; that's fine. But with the same '{{ $value }}' it should
>>>> only fire alerts after 5 min. The same alert rule with the same value should not
>>>> fire again for the next 5 min. How do I get this?
>>>> Even if the application is not down, it sends alerts every 1 min. How do I
>>>> debug this? I am using the below expr: alert: "Instance Down" expr: up == 0
>>>> What are for, keep_firing_for and evaluation_interval?
>>>> prometheus.yml
>>>>
>>>> global:
>>>>   scrape_interval: 15s      # Set the scrape interval to every 15 seconds. Default is every 1 minute.
>>>>   evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
>>>>
>>>> alerting:
>>>>   alertmanagers:
>>>>     - static_configs:
>>>>         - targets:
>>>>             - ip:port
>>>>
>>>> rule_files:
>>>>   - "alerts_rules.yml"
>>>>
>>>> scrape_configs:
>>>>   - job_name: "prometheus"
>>>>     static_configs:
>>>>       - targets: ["ip:port"]
>>>>
>>>> alertmanager.yml
>>>> global:
>>>>   resolve_timeout: 5m
>>>>
>>>> route:
>>>>   group_wait: 5s
>>>>   group_interval: 5m
>>>>   repeat_interval: 15m
>>>>   receiver: webhook_receiver
>>>>
>>>> receivers:
>>>>   - name: webhook_receiver
>>>>     webhook_configs:
>>>>       - url: 'http://ip:port'
>>>>         send_resolved: false
>>>>
>>>> alerts_rules.yml
>>>>
>>>>
>>>> groups:
>>>>   - name: instance_alerts
>>>>     rules:
>>>>       - alert: "Instance Down"
>>>>         expr: up == 0
>>>>         # for: 30s
>>>>         # keep_firing_for: 30s
>>>>         labels:
>>>>           severity: "Critical"
>>>>         annotations:
>>>>           summary: "Endpoint {{ $labels.instance }} down"
>>>>           description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
>>>>
>>>>   - name: rabbitmq_alerts
>>>>     rules:
>>>>       - alert: "Consumer down for last 1 min"
>>>>         expr: rabbitmq_queue_consumers == 0
>>>>         # for: 1m
>>>>         # keep_firing_for: 30s
>>>>         labels:
>>>>           severity: Critical
>>>>         annotations:
>>>>           summary: "shortify | '{{ $labels.queue }}' has no consumers"
>>>>           description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."
>>>>
>>>>       - alert: "Total Messages > 10k in last 1 min"
>>>>         expr: rabbitmq_queue_messages > 10000
>>>>         # for: 1m
>>>>         # keep_firing_for: 30s
>>>>         labels:
>>>>           severity: Critical
>>>>         annotations:
>>>>           summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
>>>>           description: |
>>>>             Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
>>>>
>>>>
>>>> Thank you in advance.
>>>>
>>>