Thanks for the reply.
1. When I keep evaluation_interval: 5m and for: 30s, I get alerts every 5
min. (Those alerts get stored in Prometheus and trigger every 5 min; I mean
even when the condition is no longer matching, I still get alerts every
5 min.)
Now I am changing the config to the below (see the sketch after this point):
evaluation_interval: 15s  # on the rule group, or globally
for: 5m                   # on the individual alerting rule(s)
I will update you about this soon.
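For reference, here is a minimal sketch of how I understand the two settings
would fit together in my rule file (this assumes the per-group "interval"
field is used instead of the global evaluation_interval; the rule shown is
just my existing RabbitMQ rule, shortened):

# alerts_rules.yml -- sketch only, not deployed yet
groups:
  - name: rabbitmq_alerts
    interval: 15s    # evaluate this group every 15 seconds
    rules:
      - alert: "Total Messages > 10k"
        expr: rabbitmq_queue_messages > 10000
        for: 5m      # must stay true for 20 consecutive 15s evaluations before firing
        labels:
          severity: Critical
        annotations:
          summary: "'{{ $labels.queue }}' has {{ $value }} messages for more than 5 min."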
2. "If you want a more readable string in the annotation, you can use
{{ $value | humanize }}, but it will lose some precision."
This is a serious concern for us. How do we solve this? (One idea is
sketched below.)
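One option I am thinking of trying (a sketch only, not tested yet): format
the value with printf instead of humanize. Since queue message counts are
integers, printf "%.0f" should avoid the scientific notation without
rounding, and the raw {{ $value }} can still be kept alongside:

annotations:
  summary: '{{ $labels.queue }} has {{ printf "%.0f" $value }} messages'
  # printf "%.0f" renders 1110738 rather than 1.110738e+06
  description: 'Exact value: {{ $value }} (humanized: {{ $value | humanize }})'

Does that sound like a reasonable approach, or is there a better way?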
On Wednesday, March 5, 2025 at 11:43:02 PM UTC+5:30 Brian Candler wrote:
> I notice that your "up == 0" graph shows lots of green which are values
> where up == 0. These are legitimately generating alerts, in my opinion. If
> you have set evaluation_interval to 5m, and "for:" to be less than 5m, then
> a single instance of up == 0 will send an alert, because that's what you
> asked for.
>
> *> I want alerts to be triggered after 5 min and only if the condition is true.*
>
> Then you want:
>
> evaluation_interval: 15s # on the rule group, or globally
> for: 5m # on the individual alerting rule(s)
>
> Then an alert will only be sent if the alert condition has been present
> consecutively for the whole 5 minutes (i.e. 20 cycles).
>
> Finally: you may find it helpful to include {{ $value }} in an annotation
> on each alerting rule, so you can tell the value which triggered the alert.
> I can see you've done this already in one of your alerts:
>
> - alert: "Total Messages > 10k in last 1 min"
> expr: rabbitmq_queue_messages > 10000
> ...
>
> annotations:
> summary: "'{{ $labels.queue }}' has total '*{{ $value }}*'
> messages for more than 1 min."
>
> And this is reflected in the alert:
>
> description: 'Queue QUEUE_NAME in RabbitMQ has total *1.110738e+06*
> messages\n' +
>
> 'for more than 1 minutes.\n',
>
> summary: "RabbitMQ Queue 'QUEUE_NAME' has more than 10L messages"
>
> rabbitmq_queue_messages is a vector containing zero or more instances of
> that metric.
>
> rabbitmq_queue_messages > 10000 is a reduced vector, containing only those
> instances of the metric with a value greater than 10000.
>
> You can see that the $value at the time the alert was generated
> was 1.110738e+06, which is 1,110,738, and that's clearly a lot more than
> 10,000. Hence you get an alert. It's what you asked for.
>
> If you want a more readable string in the annotation, you can use {{
> $value | humanize }}, but it will lose some precision.
>
> On Wednesday, 5 March 2025 at 10:28:15 UTC Amol Nagotkar wrote:
>
>> As you can see in the below images,
>> the last trigger was at 15:31:29,
>> and I receive emails after that time also, for example at 15:35,
>> 15:37, etc.
>> [image: IMG-20250305-WA0061.jpg]
>>
>> [image: IMG-20250305-WA0060.jpg]
>> On Wednesday, March 5, 2025 at 3:28:20 PM UTC+5:30 Amol Nagotkar wrote:
>>
>>>
>>> Thank you for the quick reply.
>>>
>>> So, as I told you, I am not using Alertmanager. I am getting alerts
>>> based on the config below:
>>>
>>> alerting:
>>>   alertmanagers:
>>>     - static_configs:
>>>         - targets:
>>>             - IP_ADDRESS_OF_EMAIL_APPLICATION:PORT
>>>
>>>
>>> written in the prometheus.yml file. Below is the alert response (an array
>>> of objects) I am receiving from Prometheus.
>>>
>>>
>>> [
>>>   {
>>>     annotations: {
>>>       description: 'Queue QUEUE_NAME in RabbitMQ has total 1.110738e+06 messages\n' +
>>>         'for more than 1 minutes.\n',
>>>       summary: "RabbitMQ Queue 'QUEUE_NAME' has more than 10L messages"
>>>     },
>>>     endsAt: '2025-02-03T06:33:31.893Z',
>>>     startsAt: '2025-02-03T06:28:31.893Z',
>>>     generatorURL: 'http://helo-container-pr:9091/graph?g0.expr=rabbitmq_queue_messages+%3E+1e%2B06&g0.tab=1',
>>>     labels: {
>>>       alertname: 'Total Messages > 10L in last 1 min',
>>>       instance: 'IP_ADDRESS:15692',
>>>       job: 'rabbitmq-rcs',
>>>       queue: 'QUEUE_NAME',
>>>       severity: 'critical',
>>>       vhost: 'webhook'
>>>     }
>>>   }
>>> ]
>>>
>>>
>>>
>>> *If I keep evaluation_interval: 15s, it starts triggering every minute.*
>>>
>>> *I want alerts to be triggered after 5 min and only if the condition is true.*
>>> On Wednesday, March 5, 2025 at 2:18:34 PM UTC+5:30 Brian Candler wrote:
>>>
>>>> You still haven't shown an example of the actual alert you're concerned
>>>> about (for example, the E-mail containing all the labels and the
>>>> annotations).
>>>>
>>>> alertmanager cannot generate any alert unless Prometheus triggers it.
>>>> Please go into the PromQL web interface, switch to the "Graph" tab with
>>>> the
>>>> default 1 hour time window (or less), and enter the following queries:
>>>>
>>>> up == 0
>>>> rabbitmq_queue_consumers == 0
>>>> rabbitmq_queue_messages > 10000
>>>>
>>>> Show the graphs. If they are not blank, then alerts will be generated.
>>>>
>>>> "*for: 30s" *has no effect when you have "*evaluation_interval: 5m".* I
>>>> suggest you use *evaluation_internal: 15s* (to match your scrape
>>>> internal), and then "for: 30s" will have some benefit; it will only send
>>>> an
>>>> alert if the alerting condition has been true for two successive cycles.
>>>>
>>>> On Wednesday, 5 March 2025 at 07:50:23 UTC Amol Nagotkar wrote:
>>>>
>>>>> Thank you for the reply.
>>>>>
>>>>>
>>>>> answers for above points-
>>>>>
>>>>> 1. I checked: the expression "up == 0" fires rarely, and all my targets
>>>>> are being scraped.
>>>>>
>>>>> 2. To avoid getting alerts every minute, I have now kept *evaluation_interval
>>>>> as 5m*.
>>>>>
>>>>> 3. I have removed keep_firing_for, as it is not suitable for my use
>>>>> case.
>>>>>
>>>>>
>>>>> Updated:
>>>>>
>>>>> I am using Prometheus alerting for RabbitMQ. Below is the
>>>>> configuration I am using.
>>>>>
>>>>>
>>>>> *prometheus.yml file*
>>>>>
>>>>> global:
>>>>>   scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
>>>>>   evaluation_interval: 5m  # Evaluate rules every 5 minutes. The default is every 1 minute.
>>>>>   # scrape_timeout is set to the global default (10s).
>>>>>
>>>>> alerting:
>>>>>   alertmanagers:
>>>>>     - static_configs:
>>>>>         - targets:
>>>>>             - ip:port
>>>>>
>>>>> rule_files:
>>>>>   - "alerts_rules.yml"
>>>>>
>>>>> scrape_configs:
>>>>>   - job_name: "prometheus"
>>>>>     static_configs:
>>>>>       - targets: ["ip:port"]
>>>>>
>>>>>
>>>>> *alerts_rules.yml file*
>>>>>
>>>>> groups:
>>>>>   - name: instance_alerts
>>>>>     rules:
>>>>>       - alert: "Instance Down"
>>>>>         expr: up == 0
>>>>>         for: 30s
>>>>>         # keep_firing_for: 30s
>>>>>         labels:
>>>>>           severity: "Critical"
>>>>>         annotations:
>>>>>           summary: "Endpoint {{ $labels.instance }} down"
>>>>>           description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
>>>>>
>>>>>   - name: rabbitmq_alerts
>>>>>     rules:
>>>>>       - alert: "Consumer down for last 1 min"
>>>>>         expr: rabbitmq_queue_consumers == 0
>>>>>         for: 30s
>>>>>         # keep_firing_for: 30s
>>>>>         labels:
>>>>>           severity: Critical
>>>>>         annotations:
>>>>>           summary: "shortify | '{{ $labels.queue }}' has no consumers"
>>>>>           description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."
>>>>>
>>>>>       - alert: "Total Messages > 10k in last 1 min"
>>>>>         expr: rabbitmq_queue_messages > 10000
>>>>>         for: 30s
>>>>>         # keep_firing_for: 30s
>>>>>         labels:
>>>>>           severity: Critical
>>>>>         annotations:
>>>>>           summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
>>>>>           description: |
>>>>>             Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
>>>>>
>>>>>
>>>>> Even if there is no data in the queue, it sends me alerts. I have kept
>>>>> *evaluation_interval: 5m* (Prometheus evaluates alert rules every 5
>>>>> minutes) and *for: 30s* (ensures the alert fires only if the condition
>>>>> persists for 30s).
>>>>>
>>>>> I guess *for* is not working for me.
>>>>>
>>>>> By the way, *I am not using Alertmanager* (
>>>>> https://github.com/prometheus/alertmanager/releases/latest/download/alertmanager-0.28.0.linux-amd64.tar.gz
>>>>> ).
>>>>>
>>>>> I am just using *Prometheus* (
>>>>> https://github.com/prometheus/prometheus/releases/download/v3.1.0/prometheus-3.1.0.linux-amd64.tar.gz
>>>>> ).
>>>>>
>>>>> https://prometheus.io/download/
>>>>>
>>>>> How can I solve this? Thank you in advance.
>>>>>
>>>>> On Saturday, February 15, 2025 at 12:13:01 AM UTC+5:30 Brian Candler
>>>>> wrote:
>>>>>
>>>>>> > Even if the application is not down, it sends alerts every 1 min. How
>>>>>> do I debug this? I am using the below expression: alert: "Instance Down" expr: up == 0
>>>>>>
>>>>>> You need to show the actual alerts, from the Prometheus web interface
>>>>>> and/or the notifications, and then describe how these are different from
>>>>>> what you expect.
>>>>>>
>>>>>> I very much doubt that the expression "up == 0" is firing unless
>>>>>> there is at least one target which is not being scraped, and therefore
>>>>>> the
>>>>>> "up" metric has a value of 0 for a particular timeseries (metric with a
>>>>>> given set of labels).
>>>>>>
>>>>>> > If the threshold is crossed and the value changes, it fires multiple
>>>>>> alerts having the same alert rule; that's fine. But with the same
>>>>>> '{{ $value }}' it should fire alerts only after 5 min. The same alert
>>>>>> rule with the same value should not fire for the next 5 min. How do I
>>>>>> get this?
>>>>>>
>>>>>> I cannot work out what problem you are trying to describe. As long as
>>>>>> you only use '{{ $value }}' in annotations, not labels, then the same
>>>>>> alert
>>>>>> will just continue firing.
>>>>>>
>>>>>> Whether you get repeated *notifications* about that ongoing alert is
>>>>>> a different matter. With "repeat_interval: 15m" you should get them
>>>>>> every
>>>>>> 15 minutes at least. You may get additional notifications if a new alert
>>>>>> is
>>>>>> added into the same alert group, or one is resolved from the alert group.
>>>>>>
>>>>>> > What are for, keep_firing_for and evaluation_interval?
>>>>>>
>>>>>> keep_firing_for is debouncing: once the alert condition has gone
>>>>>> away, it will continue firing for this period of time. This is so that
>>>>>> if
>>>>>> the alert condition vanishes briefly but reappears, it doesn't cause the
>>>>>> alert to be resolved and then retriggered.
>>>>>>
>>>>>> evaluation_interval is how often the alerting expression is evaluated.
>>>>>>
>>>>>>
>>>>>> On Friday, 14 February 2025 at 15:53:24 UTC Amol Nagotkar wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> I want the same alert (alert rule) to fire only after 5 min; currently
>>>>>>> I am getting the same alert (alert rule) every one minute for the same
>>>>>>> '{{ $value }}'.
>>>>>>> If the threshold is crossed and the value changes, it fires multiple
>>>>>>> alerts having the same alert rule; that's fine. But with the same
>>>>>>> '{{ $value }}' it should fire alerts only after 5 min. The same alert
>>>>>>> rule with the same value should not fire for the next 5 min. How do I
>>>>>>> get this?
>>>>>>> Even if the application is not down, it sends alerts every 1 min. How
>>>>>>> do I debug this? I am using the below expression: alert: "Instance Down" expr: up == 0
>>>>>>> What are for, keep_firing_for and evaluation_interval?
>>>>>>> prometheus.yml
>>>>>>>
>>>>>>> global:
>>>>>>>   scrape_interval: 15s      # Set the scrape interval to every 15 seconds. Default is every 1 minute.
>>>>>>>   evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
>>>>>>>
>>>>>>> alerting:
>>>>>>>   alertmanagers:
>>>>>>>     - static_configs:
>>>>>>>         - targets:
>>>>>>>             - ip:port
>>>>>>>
>>>>>>> rule_files:
>>>>>>>   - "alerts_rules.yml"
>>>>>>>
>>>>>>> scrape_configs:
>>>>>>>   - job_name: "prometheus"
>>>>>>>     static_configs:
>>>>>>>       - targets: ["ip:port"]
>>>>>>>
>>>>>>> alertmanager.yml
>>>>>>> global:
>>>>>>>   resolve_timeout: 5m
>>>>>>> route:
>>>>>>>   group_wait: 5s
>>>>>>>   group_interval: 5m
>>>>>>>   repeat_interval: 15m
>>>>>>>   receiver: webhook_receiver
>>>>>>> receivers:
>>>>>>>   - name: webhook_receiver
>>>>>>>     webhook_configs:
>>>>>>>       - url: 'http://ip:port'
>>>>>>>         send_resolved: false
>>>>>>>
>>>>>>> alerts_rules.yml
>>>>>>>
>>>>>>>
>>>>>>> groups:
>>>>>>>   - name: instance_alerts
>>>>>>>     rules:
>>>>>>>       - alert: "Instance Down"
>>>>>>>         expr: up == 0
>>>>>>>         # for: 30s
>>>>>>>         # keep_firing_for: 30s
>>>>>>>         labels:
>>>>>>>           severity: "Critical"
>>>>>>>         annotations:
>>>>>>>           summary: "Endpoint {{ $labels.instance }} down"
>>>>>>>           description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
>>>>>>>
>>>>>>>   - name: rabbitmq_alerts
>>>>>>>     rules:
>>>>>>>       - alert: "Consumer down for last 1 min"
>>>>>>>         expr: rabbitmq_queue_consumers == 0
>>>>>>>         # for: 1m
>>>>>>>         # keep_firing_for: 30s
>>>>>>>         labels:
>>>>>>>           severity: Critical
>>>>>>>         annotations:
>>>>>>>           summary: "shortify | '{{ $labels.queue }}' has no consumers"
>>>>>>>           description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."
>>>>>>>
>>>>>>>       - alert: "Total Messages > 10k in last 1 min"
>>>>>>>         expr: rabbitmq_queue_messages > 10000
>>>>>>>         # for: 1m
>>>>>>>         # keep_firing_for: 30s
>>>>>>>         labels:
>>>>>>>           severity: Critical
>>>>>>>         annotations:
>>>>>>>           summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
>>>>>>>           description: |
>>>>>>>             Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
>>>>>>>
>>>>>>>
>>>>>>> Thank you in advance.
>>>>>>>
>>>>>>