I have a counter metric called response_total. It has labels source,
status, and service, plus a few more, but those are the important ones for
this question.
response_total{status="200", source="foo", service="bar"} is the counter
for successful requests from a service or job called "foo" to a service
called "bar".
response_total{status!="200", source="foo", service="bar"} is the counter
for failed requests from a service or job called "foo" to a service called
"bar".
I'm trying to define an alert that will trigger if there's a sudden
increase of non-200 requests from a specific source to a specific service
relative to the increase of 200 requests for the same (source, service). E.g.,
if the increase of non-200 requests over the last 10 minutes is 10x greater
than the increase of 200 requests, trigger an alert.
I'm a bit stuck on how to define this as an expression. So far I've
converged on something along these lines:
  increase(response_total{status!="200"}[10m]) /
  increase(response_total{status="200"}[10m]) > 10
This doesn't seem to work, and it's not particularly surprising. I'm not
sure how Prometheus should "know" that it should be comparing
response_total{status!="200", source="foo", service="bar"} to
response_total{status="200", source="foo", service="bar"}.
I could define the service up-front, but the sources come from our cluster
manager, so I can't enumerate them all in advance.
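The closest I've come is a guess that aggregating away status (and the
extra labels) might make the two sides of the division comparable,
something along these lines, though I haven't been able to confirm this is
the right approach:

  # ratio of non-200 to 200 increases per (source, service) over 10m
  sum by (source, service) (increase(response_total{status!="200"}[10m]))
    /
  sum by (source, service) (increase(response_total{status="200"}[10m]))
    > 10

My (possibly wrong) hope is that sum by (source, service) would also avoid
having to enumerate the sources, since the matching would happen per
(source, service) pair automatically.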
I appreciate any help!
Thanks,
Alex