On 11/01/2023 19:58, Eulogio Apelin wrote:
I'm looking for information, primarily examples, of various ways to
configure alert rules.
Specifically, scenarios like:
In a single rule group:
Regular expression that determines a TLS cert expires in 60 days: send 1 alert
Regular expression that determines a TLS cert expires in 40 days: send 1 alert
Regular expression that determines a TLS cert expires in 30 days: send 1 alert
Regular expression that determines a TLS cert expires in 20 days: send 1 alert
Regular expression that determines a TLS cert expires in 10 days: send 1 alert
Regular expression that determines a TLS cert expires in 5 days: send 1 alert
Regular expression that determines a TLS cert expires in 0 days: send 1 alert
Another scenario is to
send an alert once a day to an email address.
If it's the 3rd day in a row, send the alert to another
set of addresses, and stop alerting.
Can Alertmanager send alerts to Teams like it does to Slack?
And any other general examples of Alertmanager rules.
I think it is best not to think of alerts as moment-in-time events but
as time periods during which a certain condition is true. Separate from
the alert firing itself are the rules (in Alertmanager) for how to route
it (e.g. to Slack, email, etc.), what to send (email body template) and
how often to remind people that the alert is still firing.
So for example with your TLS expiry example you might have an alert
which starts firing once a certificate is within 60 days of expiry. It
would continue to fire continuously until either the certificate is
renewed (i.e. it is over 60 days again) or stops existing (because
you've reconfigured Prometheus to no longer monitor that certificate).
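As a rough sketch of such a rule, assuming the certificate expiry time is
exposed by the blackbox exporter as probe_ssl_earliest_cert_expiry (the
question does not say which exporter is in use, and the group name, alert
name, "for" duration and labels below are made up for illustration):

```yaml
groups:
  - name: tls-expiry                   # hypothetical group name
    rules:
      - alert: TLSCertExpiringSoon     # hypothetical alert name
        # Seconds left until expiry, kept only when below 60 days.
        # Assumes the blackbox exporter metric; adjust for your exporter.
        expr: probe_ssl_earliest_cert_expiry - time() < 60 * 24 * 3600
        for: 1h                        # assumption: ride out short probe glitches
        labels:
          severity: warning            # hypothetical label
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"
```

The alert stays active for as long as the expression returns a result, so
there is no need for separate rules at 60, 40, 30, ... days unless you want
different labels (for example a higher severity) at each threshold.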
Then within Alertmanager you can set rules to send you a message every
10 days while that alert is firing, meaning you'd get a message at 60, 50,
40, etc. days until expiry.
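A minimal sketch of the Alertmanager side, assuming an alert named
TLSCertExpiringSoon and email receivers (the alert name, receiver names and
addresses are placeholders, and the global SMTP settings that email needs
are left out):

```yaml
route:
  receiver: default-email              # hypothetical fallback receiver
  routes:
    - matchers:
        - alertname = "TLSCertExpiringSoon"   # hypothetical alert name
      receiver: tls-email
      group_by: [alertname, instance]
      repeat_interval: 240h            # re-send every 10 days while firing

receivers:
  - name: default-email
    email_configs:
      - to: ops@example.com            # placeholder address
  - name: tls-email
    email_configs:
      - to: certs@example.com          # placeholder address
```

One thing to check: as far as I know, repeat_interval is implicitly capped by
Alertmanager's --data.retention flag, so for a 10 day interval the retention
would probably need to be raised above its default as well.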
More complex alert routing decisions are generally out of scope for
Alertmanager and would be expected to be managed by a more complex
system (such as PagerDuty, OpsGenie, Grafana On-Call, etc.). This would
cover your example of wanting to escalate an alert after a period of
time, but would also cover things like having on-call rotas where
different people would be contacted by looking at a rota calendar.
--
Stuart Clark