Re: [prometheus-users] Discuss Prometheus alerts suppressing

Stuart Clark Fri, 19 Feb 2021 13:48:27 -0800

On 19/02/2021 19:43, Badreddin Aboubakr wrote:

Hello,
We use Prometheus to monitor our infrastructure (hypervisors, gateways, storage servers, etc.). Scrape targets are sourced from a Postgres database, which also records whether each target is “in production”. In the beginning we had a metadata metric that exposed the state of each server as an `enum` metric.
By joining the state metric in each alerting rule and then dropping the alerts for targets in particular states, we were able to suppress unneeded alerts.
As the number of alerting rules and states grew, joining on these metrics in every alerting rule became so expensive that we wrote recording rules which evaluate the enum metric and produce an enum metric with lower cardinality: production (alerts are passed to their receivers) and everything else (alerts are dropped at the Alertmanager step).
So again we join on these metrics and drop alerts for non-production targets.
This won't scale either; it was only meant as a temporary solution while our alerting rules keep growing.
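The thread doesn't show the actual rule expressions, so the following Go sketch assumes one common way to do this join: each alert expression is multiplied by a recording-rule series such as `host_lifecycle` (hypothetical, always value 1) using `* on(instance) group_left(lifecycle)`, which copies a `lifecycle` label onto the alert, and an Alertmanager route sends `lifecycle="non-production"` to a receiver with no notification configs. The code models that data flow in plain Go rather than evaluating PromQL.

```go
package main

import "fmt"

// stateClass models the low-cardinality recording rule output:
// every fine-grained state collapses to "production" or "non-production".
func stateClass(state string) string {
	if state == "production" {
		return "production"
	}
	return "non-production"
}

// Alert is a minimal stand-in for an alert's label set.
type Alert map[string]string

// joinAndRoute models a (hypothetical) alert expression of the form
//   <expr> * on(instance) group_left(lifecycle) host_lifecycle
// followed by an Alertmanager route that sends lifecycle="non-production"
// to a receiver with no notifiers. It returns the alerts that would notify.
func joinAndRoute(alerts []Alert, states map[string]string) []Alert {
	var notify []Alert
	for _, a := range alerts {
		state, ok := states[a["instance"]]
		if !ok {
			// As with PromQL vector matching: no matching state series, no output series.
			continue
		}
		enriched := Alert{}
		for k, v := range a {
			enriched[k] = v
		}
		enriched["lifecycle"] = stateClass(state)
		if enriched["lifecycle"] == "production" {
			notify = append(notify, enriched)
		}
	}
	return notify
}

func main() {
	states := map[string]string{"hv01:9100": "production", "gw01:9100": "maintenance"}
	alerts := []Alert{
		{"alertname": "HostDown", "instance": "hv01:9100"},
		{"alertname": "HostDown", "instance": "gw01:9100"},
		{"alertname": "HostDown", "instance": "st01:9100"}, // no state series at all
	}
	for _, a := range joinAndRoute(alerts, states) {
		fmt.Println(a["alertname"], a["instance"], a["lifecycle"])
	}
}
```

Running it prints only `HostDown hv01:9100 production`. Note the third alert: with this kind of match, an instance that has no state series at all produces no alert, which is easy to overlook with the join approach.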
So we discussed some solutions:
* Set silences and remove them on state changes using the Alertmanager API:
This approach is too dynamic, however (I don’t know whether the Alertmanager API was designed for this purpose; maybe it is). Will it scale with the number of silences and hosts?
* Develop a kind of proxy, deployed between Prometheus and Alertmanager, that drops alerts for hosts in a non-production state:
This approach is dangerous: if the proxy fails, no alerts will reach Alertmanager.
* Put the proxy on the notification path instead: this makes it a bit more complicated, as the proxy has to understand receivers, etc.
PS: We still want to scrape and monitor the servers which are not in a production state.
We would be really grateful for any suggestions or ideas.

Couldn't you run two sets of Prometheus servers, monitoring the production infrastructure separately from the non-production infrastructure? Then just don't have alerting rules on, or connect Alertmanagers to, the non-production servers.
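This suggestion fits well with targets already coming from Postgres: whatever generates the target lists can write one file per set of Prometheus servers. Here is a minimal Go sketch that splits hosts by state into two `file_sd_configs` JSON documents; the `Host` struct is a hypothetical stand-in for rows of the inventory table, and the file paths and database query are left out as site-specific.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Host is a hypothetical stand-in for one row of the Postgres inventory table.
type Host struct {
	Address string
	State   string
}

// TargetGroup is one entry of a Prometheus file_sd_configs JSON file.
type TargetGroup struct {
	Targets []string `json:"targets"`
}

// splitTargets renders two file_sd documents: one for the Prometheus servers
// that have alerting rules and Alertmanagers, one for those that only scrape.
func splitTargets(hosts []Host) (prod, nonProd []byte, err error) {
	p, np := []string{}, []string{} // non-nil so empty lists marshal as []
	for _, h := range hosts {
		if h.State == "production" {
			p = append(p, h.Address)
		} else {
			np = append(np, h.Address)
		}
	}
	if prod, err = json.Marshal([]TargetGroup{{Targets: p}}); err != nil {
		return nil, nil, err
	}
	if nonProd, err = json.Marshal([]TargetGroup{{Targets: np}}); err != nil {
		return nil, nil, err
	}
	return prod, nonProd, nil
}

func main() {
	prod, nonProd, err := splitTargets([]Host{
		{"hv01:9100", "production"},
		{"gw01:9100", "maintenance"},
		{"st01:9100", "production"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(prod))    // [{"targets":["hv01:9100","st01:9100"]}]
	fmt.Println(string(nonProd)) // [{"targets":["gw01:9100"]}]
}
```

Each set of Prometheus servers would point its `file_sd_configs` at its own file, and only the production set would get `rule_files` and an `alerting` section. One trade-off: when a host changes state, its series move to the other set of servers, so its history is split between them.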


--
Stuart Clark

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/07f4078b-997d-2aa0-f7a5-c88afcef2d70%40Jahingo.com.
