On 11/01/2023 19:58, Eulogio Apelin wrote:
I'm looking for information, primarily examples, of various ways to configure alert rules.

Specifically, scenarios like:

In a single rule group:
Regular expression that determined a tls cert expires in 60 days, send 1 alert
Regular expression that determined a tls cert expires in 40 days, send 1 alert
Regular expression that determined a tls cert expires in 30 days, send 1 alert
Regular expression that determined a tls cert expires in 20 days, send 1 alert
Regular expression that determined a tls cert expires in 10 days, send 1 alert
Regular expression that determined a tls cert expires in 5 days, send 1 alert
Regular expression that determined a tls cert expires in 0 days, send 1 alert

Another scenario is to:
send an alert once a day to an email address.
if it's the 3rd day in a row, send the alert to another set of addresses and stop alerting.

Can Alertmanager send alerts to Teams like it does to Slack?

And any other general examples of Alertmanager rules.

I think it is best not to think of alerts as moment-in-time events but as time periods during which a certain condition is true. Separate from the alert itself firing are the rules (in Alertmanager) for how to route it (e.g. to Slack, email, etc.), what to send (e.g. the email body template) and how often to remind people that the alert is still firing.
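
To illustrate that split, here is a minimal sketch of an Alertmanager configuration that only deals with routing and reminders, not with the alert condition itself. The host names, addresses, webhook URL, channel and receiver names are all placeholders I have made up, and SMTP authentication is left out, so treat it as a starting point rather than a working setup:

```yaml
# alertmanager.yml (sketch; all names and addresses are placeholders)
global:
  smtp_smarthost: "smtp.example.org:587"
  smtp_from: "alertmanager@example.org"
  slack_api_url: "https://hooks.slack.com/services/REPLACE/ME"

route:
  # Default receiver for anything not matched by a child route.
  receiver: team-email
  group_by: ["alertname"]
  group_wait: 30s
  group_interval: 5m
  # How often to re-send a notification while an alert keeps firing.
  repeat_interval: 4h
  routes:
    # Alerts labelled severity="critical" go to Slack instead of email.
    - matchers:
        - 'severity = "critical"'
      receiver: team-slack

receivers:
  - name: team-email
    email_configs:
      - to: "ops@example.org"
  - name: team-slack
    slack_configs:
      - channel: "#alerts"
```

The alert condition lives in a Prometheus rule file; everything above only decides who is told and how often.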

So with your TLS expiry example you might have an alert which starts firing once a certificate is within 60 days of expiry. It would keep firing until either the certificate is renewed (i.e. it has more than 60 days left again) or the certificate stops existing (because you've reconfigured Prometheus to no longer monitor it). Then within Alertmanager you can set rules to send you a message every 10 days while that alert is firing, meaning you'd get a message at roughly 60, 50, 40, etc. days until expiry.
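
As a rough sketch of the TLS case, the Prometheus side could be a single rule like the one below. It assumes the certificates are probed with the blackbox exporter, which exposes `probe_ssl_earliest_cert_expiry` as a Unix timestamp; the group name, alert name and `severity` label are my own choices and not anything from your setup:

```yaml
# Prometheus rule file (sketch; assumes blackbox exporter metrics)
groups:
  - name: tls-expiry
    rules:
      - alert: TLSCertExpiringSoon
        # Fires while the earliest certificate expiry is under 60 days away.
        expr: probe_ssl_earliest_cert_expiry - time() < 60 * 24 * 3600
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"
```

The "every 10 days" reminder would then be a child route in Alertmanager. The receiver name here is hypothetical and has to match one defined under `receivers:` in your own configuration:

```yaml
# Fragment of alertmanager.yml (sketch)
route:
  receiver: team-email
  routes:
    - matchers:
        - 'alertname = "TLSCertExpiringSoon"'
      receiver: team-email
      # Re-send every 10 days while the alert keeps firing.
      repeat_interval: 240h
```

Note that the reminders are counted from when the alert first fired, so they will only line up approximately with 60, 50, 40 days and so on.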

More complex alert routing decisions are generally out of scope for Alertmanager and would be expected to be handled by a more capable system (such as PagerDuty, OpsGenie, Grafana On-Call, etc.). This would cover your example of wanting to escalate an alert after a period of time, but would also cover things like on-call rotas, where different people are contacted depending on a rota calendar.

--
Stuart Clark

