I can see from your ALERTS graph that your alerts are all different (each
one has a different combination of labels), which in turn comes from here:
    labels:
      metric: rail_temp
      severity: warning
      threshold: 0
      threshold_type: global
      value: '{{ $value }}'   <<< HERE
Just remove that label, and you should be good. You can use $value in
annotations, but you should not use it in labels, for this very reason.
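
For example, a minimal sketch of the corrected rule (same rule as yours,
with the value label removed and the current value shown via the
description annotation instead; the exact wording is just a suggestion):

    - alert: rail_temp_Warning
      expr: rail_temp > 0
      for: 10s
      labels:
        metric: rail_temp
        severity: warning
        threshold: 0
        threshold_type: global
      annotations:
        summary: rail_temp exceeded warning threshold
        description: 'rail_temp is above the warning threshold, current value: {{ $value }}'
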
What's happening is that $value changes on every evaluation, so the old
alert (with value="old") resolves and a new alert (with value="new") starts
pending from scratch, which is why the for: 10s timer never completes.
On Monday, 20 January 2025 at 10:02:12 UTC Alexander Diyakov wrote:
> Hello Prometheus Users,
>
> I'm facing an issue with my alert rules where the alerts are resetting on
> every evaluation cycle. I have simplified the setup as much as possible,
> but the problem persists. Here's the context:
>
> 1. *Metric:*
>
> The rail_temp metric is continuously increasing or decreasing and is
> always greater than 0.
>
> The metric is exposed via an HTTP server using
> the start_http_server function from prometheus_client. It updates every
> second.
>
> 2. *Alert Rule:*
>
> groups:
>   - name: rail_temp_alerts
>     rules:
>       - alert: rail_temp_Warning
>         annotations:
>           description: rail_temp is above the warning threshold (rail_temp_th_W_G)
>           summary: rail_temp exceeded warning threshold
>         expr: rail_temp > 0
>         for: 10s
>         labels:
>           metric: rail_temp
>           severity: warning
>           threshold: 0
>           threshold_type: global
>           value: '{{ $value }}'
>
> 3. *Prometheus Global Configuration*
>
> global:
>   scrape_interval: 7s
>   evaluation_interval: 4s
>   # scrape_timeout is set to the global default (10s).
>
> rule_files:
>   - "alert_rules.yml"
>
> scrape_configs:
>   - job_name: "pushgateway"
>     scrape_interval: 1s
>     static_configs:
>       - targets: ["localhost:9091"]  # Pushgateway URL
>
> 4. *Observations:*
>
> The rail_temp metric has no gaps and updates correctly, as seen in the
> screenshot
>
> However, the alert constantly resets on each evaluation cycle
> (evaluation_interval: 4s), even though the for duration is set to 10
> seconds. The alert never reaches Firing unless for is set to 0.
>
> There are two screenshots: the ALERTS internal Prometheus metric and the
> Alerts tab.
>
>
> 5. *What I've Tried:*
>
> - Verified that the metric updates correctly without any gaps.
> - Used both push_to_gateway and start_http_server to expose metrics, but
>   the behavior remains the same.
> - Increased the for duration and adjusted the scrape_interval and
>   evaluation_interval, but it didn't help.
>
>
> 6. *Expected Behavior:*
>
> The alert should transition to firing after the for duration is met
> without resetting on each evaluation cycle.
>
> 7. *Current Behavior:*
>
> The alert resets to pending every 4 seconds (matching
> the evaluation_interval) instead of transitioning to firing.
>
> I believe this could be a bug or misconfiguration, but I'm not sure how to
> further debug this. Any insights or suggestions on resolving this would be
> greatly appreciated.
>
> Thank you in advance!
>
> Best regards,
>
> Alexander
>
> [image: Screenshot 2025-01-20 122907.png]
>