One more important thing:
why do I receive alerts with the *same {{ $value }}* again and again? In RabbitMQ
it is possible to get different values, but it is unlikely that the value is exactly
the same *every* time. Yet I receive alerts carrying the same value many times.
On Thursday, March 6, 2025 at 10:53:02 AM UTC+5:30 Amol Nagotkar wrote:
> Thanks for the reply.
>
> 1. When I keep evaluation_interval: 5m and for: 30s, I get alerts every
> 5 min. (Those alerts get stored in Prometheus and trigger every 5 min; I
> mean even when the condition no longer matches, I still get alerts every
> 5 min.)
>
>
> Now I am changing the config to the following:
>
> evaluation_interval: 15s *# on the rule group, or globally*
>
> for: 5m *# on the individual alerting rule(s)*
>
> I will update you about this soon.
>
>
> 2. If you want a more readable string in the annotation, you can use {{
> $value | humanize }}, but it will lose some precision.
>
> This is a serious concern for us. How do we solve this?
>
> On Wednesday, March 5, 2025 at 11:43:02 PM UTC+5:30 Brian Candler wrote:
>
>> I notice that your "up == 0" graph shows lots of green which are values
>> where up == 0. These are legitimately generating alerts, in my opinion. If
>> you have set evaluation_interval to 5m, and "for:" to be less than 5m, then
>> a single instance of up == 0 will send an alert, because that's what you
>> asked for.
>>
>> *> I want alerts to be triggered after 5 min and only if the condition is true.*
>>
>> Then you want:
>>
>> evaluation_interval: 15s # on the rule group, or globally
>> for: 5m # on the individual alerting rule(s)
>>
>> Then an alert will only be sent if the alert condition has been present
>> consecutively for the whole 5 minutes (i.e. 20 cycles).
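>>
>> As a minimal sketch of where those two settings live (the group and alert
>> names below are just placeholders): the global setting in prometheus.yml is
>> spelled evaluation_interval, while the per-group override in the rule file
>> is spelled interval.
>>
>> # prometheus.yml
>> global:
>>   scrape_interval: 15s
>>   evaluation_interval: 15s
>>
>> # alerts_rules.yml
>> groups:
>>   - name: rabbitmq_alerts
>>     interval: 15s            # optional per-group override of evaluation_interval
>>     rules:
>>       - alert: "Total Messages > 10k"
>>         expr: rabbitmq_queue_messages > 10000
>>         for: 5m              # must stay true for 20 consecutive 15s evaluations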
>>
>> Finally: you may find it helpful to include {{ $value }} in an annotation
>> on each alerting rule, so you can tell the value which triggered the alert.
>> I can see you've done this already in one of your alerts:
>>
>> - alert: "Total Messages > 10k in last 1 min"
>> expr: rabbitmq_queue_messages > 10000
>> ...
>>
>> annotations:
>> summary: "'{{ $labels.queue }}' has total '*{{ $value }}*'
>> messages for more than 1 min."
>>
>> And this is reflected in the alert:
>>
>>     description: 'Queue QUEUE_NAME in RabbitMQ has total *1.110738e+06* messages\n' +
>>       'for more than 1 minutes.\n',
>>     summary: "RabbitMQ Queue 'QUEUE_NAME' has more than 10L messages"
>>
>> rabbitmq_queue_messages is a vector containing zero or more instances of
>> that metric.
>>
>> rabbitmq_queue_messages > 10000 is a reduced vector, containing only
>> those instances of the metric with a value greater than 10000.
>>
>> You can see that the $value at the time the alert was generated
>> was 1.110738e+06, which is 1,110,738, and that's clearly a lot more than
>> 10,000. Hence you get an alert. It's what you asked for.
>>
>> If you want a more readable string in the annotation, you can use {{
>> $value | humanize }}, but it will lose some precision.
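>>
>> As a sketch only (the queue label is from your rule; the wording is just an
>> example), the annotation could keep both forms, since humanize would render
>> 1110738 as approximately 1.111M:
>>
>> annotations:
>>   summary: >-
>>     '{{ $labels.queue }}' has {{ $value | humanize }} messages
>>     ({{ $value }} exactly) for more than 1 min.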
>>
>> On Wednesday, 5 March 2025 at 10:28:15 UTC Amol Nagotkar wrote:
>>
>>> As you can see in the images below,
>>> the last trigger was at 15:31:29,
>>> and I receive emails after that time as well, for example at 15:35,
>>> 15:37, etc.
>>> [image: IMG-20250305-WA0061.jpg]
>>>
>>> [image: IMG-20250305-WA0060.jpg]
>>> On Wednesday, March 5, 2025 at 3:28:20 PM UTC+5:30 Amol Nagotkar wrote:
>>>
>>>>
>>>> Thank you for the quick reply.
>>>>
>>>> So, as I told you, I am not using Alertmanager. I am getting alerts
>>>> based on the config below:
>>>>
>>>> alerting:
>>>>   alertmanagers:
>>>>     - static_configs:
>>>>         - targets:
>>>>             - IP_ADDRESS_OF_EMAIL_APPLICATION:PORT
>>>>
>>>>
>>>> written in the prometheus.yml file. Below is the alert response (an array
>>>> of objects) I am receiving from Prometheus.
>>>>
>>>>
>>>> [
>>>>   {
>>>>     annotations: {
>>>>       description: 'Queue QUEUE_NAME in RabbitMQ has total 1.110738e+06 messages\n' +
>>>>         'for more than 1 minutes.\n',
>>>>       summary: "RabbitMQ Queue 'QUEUE_NAME' has more than 10L messages"
>>>>     },
>>>>     endsAt: '2025-02-03T06:33:31.893Z',
>>>>     startsAt: '2025-02-03T06:28:31.893Z',
>>>>     generatorURL: 'http://helo-container-pr:9091/graph?g0.expr=rabbitmq_queue_messages+%3E+1e%2B06&g0.tab=1',
>>>>     labels: {
>>>>       alertname: 'Total Messages > 10L in last 1 min',
>>>>       instance: 'IP_ADDRESS:15692',
>>>>       job: 'rabbitmq-rcs',
>>>>       queue: 'QUEUE_NAME',
>>>>       severity: 'critical',
>>>>       vhost: 'webhook'
>>>>     }
>>>>   }
>>>> ]
>>>>
>>>>
>>>>
>>>> *If I keep evaluation_interval: 15s, it starts triggering every minute.*
>>>>
>>>> *I want alerts to be triggered after 5 min and only if the condition is true.*
>>>> On Wednesday, March 5, 2025 at 2:18:34 PM UTC+5:30 Brian Candler wrote:
>>>>
>>>>> You still haven't shown an example of the actual alert you're
>>>>> concerned about (for example, the E-mail containing all the labels and
>>>>> the annotations)
>>>>>
>>>>> alertmanager cannot generate any alert unless Prometheus triggers it.
>>>>> Please go into the PromQL web interface, switch to the "Graph" tab with
>>>>> the default 1 hour time window (or less), and enter the following queries:
>>>>>
>>>>> up == 0
>>>>> rabbitmq_queue_consumers == 0
>>>>> rabbitmq_queue_messages > 10000
>>>>>
>>>>> Show the graphs. If they are not blank, then alerts will be
>>>>> generated.
>>>>>
>>>>> "*for: 30s" *has no effect when you have "*evaluation_interval: 5m".* I
>>>>> suggest you use *evaluation_internal: 15s* (to match your scrape
>>>>> internal), and then "for: 30s" will have some benefit; it will only send
>>>>> an
>>>>> alert if the alerting condition has been true for two successive cycles.
>>>>>
>>>>> On Wednesday, 5 March 2025 at 07:50:23 UTC Amol Nagotkar wrote:
>>>>>
>>>>>> Thank you for the reply.
>>>>>>
>>>>>>
>>>>>> Answers for the above points:
>>>>>>
>>>>>> 1. I checked: the expression "up == 0" fires rarely and all my targets
>>>>>> are being scraped.
>>>>>>
>>>>>> 2. To avoid getting alerts every minute, I have now set *evaluation_interval
>>>>>> to 5m*.
>>>>>>
>>>>>> 3. I have removed keep_firing_for as it is not suitable for my use
>>>>>> case.
>>>>>>
>>>>>>
>>>>>> Updated:
>>>>>>
>>>>>> I am using Prometheus alerting for RabbitMQ. Below is the
>>>>>> configuration I am using.
>>>>>>
>>>>>>
>>>>>> *prometheus.yml file*
>>>>>>
>>>>>> global:
>>>>>>   scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
>>>>>>   evaluation_interval: 5m  # Evaluate rules every 5 minutes. The default is every 1 minute.
>>>>>>   # scrape_timeout is set to the global default (10s).
>>>>>>
>>>>>> alerting:
>>>>>>   alertmanagers:
>>>>>>     - static_configs:
>>>>>>         - targets:
>>>>>>             - ip:port
>>>>>>
>>>>>> rule_files:
>>>>>>   - "alerts_rules.yml"
>>>>>>
>>>>>> scrape_configs:
>>>>>>   - job_name: "prometheus"
>>>>>>     static_configs:
>>>>>>       - targets: ["ip:port"]
>>>>>>
>>>>>> *alerts_rules.yml file*
>>>>>>
>>>>>> groups:
>>>>>>   - name: instance_alerts
>>>>>>     rules:
>>>>>>       - alert: "Instance Down"
>>>>>>         expr: up == 0
>>>>>>         for: 30s
>>>>>>         # keep_firing_for: 30s
>>>>>>         labels:
>>>>>>           severity: "Critical"
>>>>>>         annotations:
>>>>>>           summary: "Endpoint {{ $labels.instance }} down"
>>>>>>           description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
>>>>>>
>>>>>>   - name: rabbitmq_alerts
>>>>>>     rules:
>>>>>>       - alert: "Consumer down for last 1 min"
>>>>>>         expr: rabbitmq_queue_consumers == 0
>>>>>>         for: 30s
>>>>>>         # keep_firing_for: 30s
>>>>>>         labels:
>>>>>>           severity: Critical
>>>>>>         annotations:
>>>>>>           summary: "shortify | '{{ $labels.queue }}' has no consumers"
>>>>>>           description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."
>>>>>>
>>>>>>       - alert: "Total Messages > 10k in last 1 min"
>>>>>>         expr: rabbitmq_queue_messages > 10000
>>>>>>         for: 30s
>>>>>>         # keep_firing_for: 30s
>>>>>>         labels:
>>>>>>           severity: Critical
>>>>>>         annotations:
>>>>>>           summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
>>>>>>           description: |
>>>>>>             Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
>>>>>>
>>>>>>
>>>>>> Even if there is no data in the queue, it sends me alerts. I have kept
>>>>>> *evaluation_interval: 5m* (Prometheus evaluates alert rules every 5 minutes)
>>>>>> and *for: 30s* (so the alert fires only if the condition persists for 30s).
>>>>>>
>>>>>> I guess *for* is not working for me.
>>>>>>
>>>>>> By the way, *I am not using Alertmanager* (
>>>>>> https://github.com/prometheus/alertmanager/releases/latest/download/alertmanager-0.28.0.linux-amd64.tar.gz
>>>>>> ).
>>>>>>
>>>>>> I am just using *Prometheus* (
>>>>>> https://github.com/prometheus/prometheus/releases/download/v3.1.0/prometheus-3.1.0.linux-amd64.tar.gz
>>>>>> ).
>>>>>>
>>>>>> https://prometheus.io/download/
>>>>>>
>>>>>> How can I solve this? Thank you in advance.
>>>>>>
>>>>>> On Saturday, February 15, 2025 at 12:13:01 AM UTC+5:30 Brian Candler
>>>>>> wrote:
>>>>>>
>>>>>>> > even if the application is not down, it sends alerts every 1 min. How do I
>>>>>>> debug this? I am using the below expression: alert: "Instance Down" expr: up == 0
>>>>>>>
>>>>>>> You need to show the actual alerts, from the Prometheus web
>>>>>>> interface and/or the notifications, and then describe how these are
>>>>>>> different from what you expect.
>>>>>>>
>>>>>>> I very much doubt that the expression "up == 0" is firing unless
>>>>>>> there is at least one target which is not being scraped, and therefore
>>>>>>> the "up" metric has a value of 0 for a particular timeseries (metric
>>>>>>> with a given set of labels).
>>>>>>>
>>>>>>> > if the threshold is crossed and the value changes, it fires multiple
>>>>>>> alerts for the same alert rule; that's fine. But with the same '{{ $value }}'
>>>>>>> it should fire alerts only after 5 min. The same alert rule with the same
>>>>>>> value should not fire for the next 5 min. How do I get this?
>>>>>>>
>>>>>>> I cannot work out what problem you are trying to describe. As long
>>>>>>> as you only use '{{ $value }}' in annotations, not labels, then the
>>>>>>> same alert will just continue firing.
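>>>>>>>
>>>>>>> A minimal sketch of the difference (the rule fragment and the commented-out
>>>>>>> "value" label are only illustrative):
>>>>>>>
>>>>>>>       - alert: "Total Messages > 10k in last 1 min"
>>>>>>>         expr: rabbitmq_queue_messages > 10000
>>>>>>>         labels:
>>>>>>>           severity: Critical
>>>>>>>           # value: "{{ $value }}"   # avoid: each change of value produces a
>>>>>>>           #                         # new label set, i.e. a brand-new alert
>>>>>>>         annotations:
>>>>>>>           summary: "{{ $labels.queue }} has {{ $value }} messages"   # safe here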
>>>>>>>
>>>>>>> Whether you get repeated *notifications* about that ongoing alert is
>>>>>>> a different matter. With "repeat_interval: 15m" you should get them
>>>>>>> every 15 minutes at least. You may get additional notifications if a
>>>>>>> new alert is added into the same alert group, or one is resolved from
>>>>>>> the alert group.
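>>>>>>>
>>>>>>> If you do route through Alertmanager, those notification knobs sit on the
>>>>>>> route; a sketch based on the alertmanager.yml you posted below (durations
>>>>>>> are only examples):
>>>>>>>
>>>>>>> route:
>>>>>>>   receiver: webhook_receiver
>>>>>>>   group_wait: 30s        # delay before the first notification for a new group
>>>>>>>   group_interval: 5m     # delay before notifying about changes within a group
>>>>>>>   repeat_interval: 15m   # minimum gap before re-sending a still-firing group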
>>>>>>>
>>>>>>> > what are for, keep_firing_for and evaluation_interval?
>>>>>>>
>>>>>>> keep_firing_for is debouncing: once the alert condition has gone
>>>>>>> away, it will continue firing for this period of time. This is so that
>>>>>>> if the alert condition vanishes briefly but reappears, it doesn't
>>>>>>> cause the alert to be resolved and then retriggered.
>>>>>>>
>>>>>>> evaluation_interval is how often the alerting expression is
>>>>>>> evaluated.
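>>>>>>>
>>>>>>> A minimal sketch of how "for" and "keep_firing_for" combine on a single rule
>>>>>>> (the durations are only examples):
>>>>>>>
>>>>>>>       - alert: "Consumer down"
>>>>>>>         expr: rabbitmq_queue_consumers == 0
>>>>>>>         for: 1m              # condition must be continuously true for 1m before firing
>>>>>>>         keep_firing_for: 5m  # once firing, stays firing for 5m after the condition clears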
>>>>>>>
>>>>>>>
>>>>>>> On Friday, 14 February 2025 at 15:53:24 UTC Amol Nagotkar wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>> I want the same alert (alert rule) to fire again only after 5 min; currently
>>>>>>>> I am getting the same alert (alert rule) every minute with the same
>>>>>>>> '{{ $value }}'.
>>>>>>>> If the threshold is crossed and the value changes, it fires multiple alerts
>>>>>>>> for the same alert rule; that's fine. But with the same '{{ $value }}' it
>>>>>>>> should fire alerts only after 5 min. The same alert rule with the same value
>>>>>>>> should not fire for the next 5 min. How do I get this?
>>>>>>>> Even if the application is not down, it sends alerts every 1 min. How do I
>>>>>>>> debug this? I am using the below expression: alert: "Instance Down" expr: up == 0
>>>>>>>> What are for, keep_firing_for and evaluation_interval?
>>>>>>>> prometheus.yml
>>>>>>>>
>>>>>>>> global:
>>>>>>>>   scrape_interval: 15s      # Set the scrape interval to every 15 seconds. Default is every 1 minute.
>>>>>>>>   evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
>>>>>>>>
>>>>>>>> alerting:
>>>>>>>>   alertmanagers:
>>>>>>>>     - static_configs:
>>>>>>>>         - targets:
>>>>>>>>             - ip:port
>>>>>>>>
>>>>>>>> rule_files:
>>>>>>>>   - "alerts_rules.yml"
>>>>>>>>
>>>>>>>> scrape_configs:
>>>>>>>>   - job_name: "prometheus"
>>>>>>>>     static_configs:
>>>>>>>>       - targets: ["ip:port"]
>>>>>>>>
>>>>>>>> alertmanager.yml
>>>>>>>>
>>>>>>>> global:
>>>>>>>>   resolve_timeout: 5m
>>>>>>>> route:
>>>>>>>>   group_wait: 5s
>>>>>>>>   group_interval: 5m
>>>>>>>>   repeat_interval: 15m
>>>>>>>>   receiver: webhook_receiver
>>>>>>>> receivers:
>>>>>>>>   - name: webhook_receiver
>>>>>>>>     webhook_configs:
>>>>>>>>       - url: 'http://ip:port'
>>>>>>>>         send_resolved: false
>>>>>>>>
>>>>>>>> alerts_rules.yml
>>>>>>>>
>>>>>>>>
>>>>>>>> groups:
>>>>>>>>   - name: instance_alerts
>>>>>>>>     rules:
>>>>>>>>       - alert: "Instance Down"
>>>>>>>>         expr: up == 0
>>>>>>>>         # for: 30s
>>>>>>>>         # keep_firing_for: 30s
>>>>>>>>         labels:
>>>>>>>>           severity: "Critical"
>>>>>>>>         annotations:
>>>>>>>>           summary: "Endpoint {{ $labels.instance }} down"
>>>>>>>>           description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."
>>>>>>>>
>>>>>>>>   - name: rabbitmq_alerts
>>>>>>>>     rules:
>>>>>>>>       - alert: "Consumer down for last 1 min"
>>>>>>>>         expr: rabbitmq_queue_consumers == 0
>>>>>>>>         # for: 1m
>>>>>>>>         # keep_firing_for: 30s
>>>>>>>>         labels:
>>>>>>>>           severity: Critical
>>>>>>>>         annotations:
>>>>>>>>           summary: "shortify | '{{ $labels.queue }}' has no consumers"
>>>>>>>>           description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."
>>>>>>>>
>>>>>>>>       - alert: "Total Messages > 10k in last 1 min"
>>>>>>>>         expr: rabbitmq_queue_messages > 10000
>>>>>>>>         # for: 1m
>>>>>>>>         # keep_firing_for: 30s
>>>>>>>>         labels:
>>>>>>>>           severity: Critical
>>>>>>>>         annotations:
>>>>>>>>           summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
>>>>>>>>           description: |
>>>>>>>>             Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you in advance.
>>>>>>>>
>>>>>>>