I have a counter metric called response_total. It has labels source, 
status, and service, plus a few more, but those are the important ones for 
this question.

response_total{status="200", source="foo", service="bar"} is the counter 
for successful requests from a service or job called "foo" to a service 
called "bar". 
response_total{status!="200", source="foo", service="bar"} matches the 
counters for failed requests from a service or job called "foo" to a 
service called "bar" (one series per non-200 status). 

I'm trying to define an alert that will trigger if there's a sudden 
increase of non-200 requests from a specific source to a specific service 
relative to the increase of 200 requests for the same (source, service) 
pair. E.g., if the increase of non-200 requests over the last 10 minutes 
is more than 10x the increase of 200 requests, trigger an alert.

I'm a bit stuck on how to define this as an expression. So far I've 
converged on something along these lines: 

*increase(response_total{status!="200"}[10m]) / 
increase(response_total{status="200"}[10m]) > 10*

This doesn't seem to work, which isn't particularly surprising. I'm not 
sure how Prometheus would "know" that it should compare 
response_total{status!="200", source="foo", service="bar"} against 
response_total{status="200", source="foo", service="bar"}.
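
One thing I wondered about (purely a guess on my part, I haven't been 
able to confirm it works) is whether summing away the status label would 
leave both sides of the division with matching (source, service) label 
sets, along these lines: 

*sum by (source, service) (increase(response_total{status!="200"}[10m])) / 
sum by (source, service) (increase(response_total{status="200"}[10m])) > 10*

But I'm not sure whether that's the right way to get the two sides to 
match up, or whether there's a better idiom for this.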

I could hard-code the service up-front, but the sources are assigned by 
our cluster manager, so I can't enumerate them all in advance.

I appreciate any help!

Thanks,
Alex


-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d5f90eb5-f7f5-4049-bbd6-50d530edc545n%40googlegroups.com.