Monitoring for a metric vanishing is not a very good way to do alerting. 
Metrics hang around for the "staleness" interval, which by default is 5 
minutes. Ideally, you should monitor all the things you care about 
explicitly, expose a success metric like "up" (1 = working, 0 = not 
working), and then alert on "up == 0" or equivalent. This is much more 
flexible and timely.
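
For example, a rough sketch of such a rule (the group name, "for" 
duration, and annotation wording are placeholders to adapt to your setup):

    - name: targets
      rules:
      - alert: target_down
        # "up" is 1 while Prometheus can scrape the target, 0 when it can't
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} down"
          description: "{{ $labels.instance }} (job {{ $labels.job }}) has been unreachable for more than 1 minute."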

Having said that, there's a quick and dirty hack that might be good enough 
for you:

    expr: container_memory_usage_bytes offset 10m unless container_memory_usage_bytes

This will give you an alert if a container_memory_usage_bytes time series 
existed 10 minutes ago but does not exist now. The alert will resolve 
itself after a further 10 minutes, once the offset sample has aged out of 
the window too.

The result of this expression is a vector, so it can alert on multiple 
containers at once; each element of the vector carries the container name 
in its "name" label.
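
Plugged into the rule format from your example, that might look like this 
(the alert and group names are only illustrative):

    - name: containers
      rules:
      - alert: container_down
        # fires once per series: present 10 minutes ago, absent now
        expr: container_memory_usage_bytes offset 10m unless container_memory_usage_bytes
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} down"
          description: "Container {{ $labels.name }} is down for more than 30 seconds."

Note the {{ $labels.name }} in the annotations: that's what gets the 
container name into the notification.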

On Saturday 18 May 2024 at 19:50:48 UTC+1 Sleep Man wrote:

> I have a large number of containers. I learned that the following 
> configuration can monitor a single container down. How to configure it to 
> monitor all containers and send the container name once a container is down.
>
>
> - name: containers
>   rules:
>   - alert: jenkins_down
>     expr: absent(container_memory_usage_bytes{name="jenkins"})
>     for: 30s
>     labels:
>       severity: critical
>     annotations:
>       summary: "Jenkins down"
>       description: "Jenkins container is down for more than 30 seconds."
>
