I can see from your ALERTS graph that your alerts are all different (each
one has a different combination of labels), which in turn comes from here:
    labels:
      metric: rail_temp
      severity: warning
      threshold: 0
      threshold_type: global
      value: '{{ $value }}'   <<< HERE
Just remove that label, and you should be good. You can use $value in
annotations, but you should not use it in labels, for this very reason.
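
For example, a minimal sketch of the corrected rule (same rule as yours,
with the value label removed and the current value shown via the
description annotation instead; the exact wording is just a suggestion):

    - alert: rail_temp_Warning
      expr: rail_temp > 0
      for: 10s
      labels:
        metric: rail_temp
        severity: warning
        threshold: 0
        threshold_type: global
      annotations:
        summary: rail_temp exceeded warning threshold
        description: 'rail_temp is above the warning threshold, current value: {{ $value }}'
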
What's happening is that $value changes on every evaluation, so the old
alert (with value="old") resolves and a new alert (with value="new") starts
pending from scratch, which is why the for: 10s timer never completes.
On Monday, 20 January 2025 at 10:02:12 UTC Alexander Diyakov wrote:
> Hello Prometheus Users,
>
> I'm facing an issue with my alert rules where the alerts are resetting on
> every evaluation cycle. I have simplified the setup as much as possible,
> but the problem persists. Here's the context:
>
> 1. *Metric:*
>
> The rail_temp metric is continuously increasing or decreasing and is
> always greater than 0.
>
> The metric is exposed via an HTTP server using
> the start_http_server function from prometheus_client. It updates every
> second.
>
> 2. *Alert Rule:*
>
> groups:
>   - name: rail_temp_alerts
>     rules:
>       - alert: rail_temp_Warning
>         annotations:
>           description: rail_temp is above the warning threshold (rail_temp_th_W_G)
>           summary: rail_temp exceeded warning threshold
>         expr: rail_temp > 0
>         for: 10s
>         labels:
>           metric: rail_temp
>           severity: warning
>           threshold: 0
>           threshold_type: global
>           value: '{{ $value }}'
>
> 3. *Prometheus Global Configuration*
>
> global:
>   scrape_interval: 7s
>   evaluation_interval: 4s
>   # scrape_timeout is set to the global default (10s).
>
> rule_files:
>   - "alert_rules.yml"
>
> scrape_configs:
>   - job_name: "pushgateway"
>     scrape_interval: 1s
>     static_configs:
>       - targets: ["localhost:9091"]  # Pushgateway URL
>
> 4. *Observations:*
>
> The rail_temp metric has no gaps and updates correctly, as seen in the
> screenshot
>
> However, the alert constantly resets on each evaluation cycle
> (evaluation_interval: 4s), even though the for duration is set to 10
> seconds. The alert never reaches Firing unless for is set to 0.
>
> There are two screenshots: the ALERTS internal Prometheus metric and the
> Alerts tab.
>
>
> 5. *What I've Tried:*
>
> - Verified that the metric updates correctly without any gaps.
> - Used both push_to_gateway and start_http_server to expose metrics, but
>   the behavior remains the same.
> - Increased the for duration and adjusted the scrape_interval and
>   evaluation_interval, but it didn't help.
>
>
> 6. *Expected Behavior:*
>
> The alert should transition to firing after the for duration is met
> without resetting on each evaluation cycle.
>
> 7. *Current Behavior:*
>
> The alert resets to pending every 4 seconds (matching
> the evaluation_interval) instead of transitioning to firing.
>
> I believe this could be a bug or misconfiguration, but I'm not sure how to
> further debug this. Any insights or suggestions on resolving this would be
> greatly appreciated.
>
> Thank you in advance!
>
> Best regards,
>
> Alexander
>
> [image: Screenshot 2025-01-20 122907.png]
>