> Not sure if I'm right, but I think if one places both rules in the same group (and I think even the order shouldn't matter?), then the original:
> > expr: min_over_time(up[5m]) == 0 unless max_over_time(up[5m]) == 0
> > for: 5m
> with 5m being the "for:"-time of the long-alert should be guaranteed to work... in the sense that if the above doesn't fire... the long-alert does.

It depends on the exact semantics of "for". For example, take the simple case of a 1 minute rule evaluation interval. If you apply "for: 1m", then I guess that means the alert condition must hold for two successive evaluations (otherwise "for: 1m" would have no effect). If so, then "for: 5m" means it must hold for six successive evaluations.

But up[5m] only looks at samples wholly contained within a 5 minute window, and therefore (assuming a 1 minute scrape interval as well) will normally only look at 5 samples. If there is jitter in the sampling time, then occasionally it might look at 4 or 6 samples.

If what I've written above is correct (and it may well not be!), then

    expr: up == 0
    for: 5m

will fire if "up" is zero for 6 cycles, whereas "... unless max_over_time(up[5m])" will suppress an alert if "up" is zero for (usually) 5 cycles.

If you want to get to the bottom of this with certainty, you can write unit tests <https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/> that try out these scenarios.
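
As a starting point, a unit test for the simple "up == 0, for: 5m" case might look something like the sketch below. The file names (alerts.yml, alerts_test.yml), the alert name TargetDownLong and the job/instance labels are all made up for illustration. The expectations encode the "six successive evaluations" guess from above, so a failing test would mean that guess is wrong.

```yaml
# alerts.yml (hypothetical file name and alert name)
groups:
  - name: example
    rules:
      - alert: TargetDownLong
        expr: up == 0
        for: 5m
```

```yaml
# alerts_test.yml (hypothetical file name)
rule_files:
  - alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Samples at 0m, 1m, 2m, ...: "up" drops to 0 at 3m and stays there.
      - series: 'up{job="demo", instance="host:9100"}'
        values: '1 1 1 0 0 0 0 0 0 0'
    alert_rule_test:
      # 7m is the fifth evaluation that sees 0 (3m..7m).
      # No exp_alerts listed, i.e. nothing is expected to be firing yet.
      - eval_time: 7m
        alertname: TargetDownLong
      # 8m is the sixth evaluation that sees 0 (3m..8m): expected to fire.
      - eval_time: 8m
        alertname: TargetDownLong
        exp_alerts:
          - exp_labels:
              job: demo
              instance: "host:9100"
```

Run it with "promtool test rules alerts_test.yml". The same pattern can be extended with the min_over_time/max_over_time rule. Bear in mind that the samples in a unit test line up exactly with the evaluation times, so the number of samples up[5m] sees there may differ from what you get with real, jittered scrapes.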