Well, maybe "care" is the wrong word. The status label isn't relevant when you're doing math that explicitly compares different statuses, so you use aggregation operators to factor it out.
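One way to handle the missing-denominator case from the follow-up below: use `or` to fall back to a zero-valued copy of the numerator's series, so a pod that has failures but no successes divides by 0 and yields +Inf, which still satisfies > 10. A sketch only, untested against your setup — it also aggregates away grpc_status so both sides end up with identical label sets, and uses increase() over the alert window:

```
(
  sum without (dst_pod, grpc_status) (
    increase(route_response_total{
      direction="outbound", grpc_status!="0", grpc_status!="",
      rt_route!="", dst="bar"}[10m]))
)
/
(
  sum without (dst_pod, grpc_status) (
    increase(route_response_total{
      direction="outbound", grpc_status="0",
      rt_route!="", dst="bar"}[10m]))
  or
  sum without (dst_pod, grpc_status) (
    increase(route_response_total{
      direction="outbound", grpc_status!="0", grpc_status!="",
      rt_route!="", dst="bar"}[10m])) * 0
)
> 10
```

In PromQL, x / 0 evaluates to +Inf for x > 0, so the comparison still fires when there are failures and no successes; 0 / 0 is NaN, but in that case the numerator is absent anyway and nothing alerts.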
On Wed, Dec 30, 2020 at 4:08 PM Alex K <[email protected]> wrote:

> Hmm. I do care about the status. Maybe when I simplified the question by
> relabeling, I oversimplified the problem too much.
> I got it to pretty much work by doing this:
>
> (sum without (dst_pod) (
>     route_response_total{
>         direction="outbound", grpc_status!="0", grpc_status!="",
>         rt_route!="", dst="bar"}))
> / on (rt_route, pod, workload_ns)
> (sum without (dst_pod) (
>     route_response_total{
>         direction="outbound", grpc_status="0", rt_route!="", dst="bar"}))
> > 10
>
> dst_pod denotes a specific Kubernetes pod in the "bar" service. dst
> denotes the service name. direction="outbound" denotes a counter for
> requests sent from a pod.
>
> This gives the correct answer, but only if the denominator is present
> (i.e. not "absent"). So if the pod has made at least one successful
> request, this works. But for a pod that has never made a successful
> request, the denominator is missing. Then the Prometheus console returns
> "no data", and no alert can be triggered in my rule group. Dividing by
> zero seems like a separate problem, but I still appreciate any input
> there.
>
> On Wednesday, December 30, 2020 at 3:55:32 AM UTC-5 [email protected] wrote:
>
>> Since you don't care about the status, the typical thing to do is use a
>> sum() aggregator to remove the label.
>>
>> sum without (status) (increase(response_total{status!="200"}[10m]))
>> / sum without (status) (increase(response_total{status="200"}[10m]))
>>
>> On Tue, Dec 29, 2020 at 11:36 PM Alex K <[email protected]> wrote:
>>
>>> I have a counter metric called response_total. It has labels source,
>>> status, and service, plus a few more, but those are the important ones
>>> for this question.
>>>
>>> response_total{status="200", source="foo", service="bar"} is the
>>> counter for successful requests from a service or job called "foo" to
>>> a service called "bar".
>>> response_total{status!="200", source="foo", service="bar"} is the
>>> counter for failed requests from a service or job called "foo" to a
>>> service called "bar".
>>>
>>> I'm trying to define an alert that will trigger if there's a sudden
>>> increase of non-200 requests from a specific source to a specific
>>> service relative to the increase of 200 requests for the same (source,
>>> service). E.g., if the increase of non-200 requests over the last 10
>>> minutes is 10x greater than the increase of 200 requests, trigger an
>>> alert.
>>>
>>> I'm a bit stuck on how to define this as an expression. So far I've
>>> converged on something along these lines:
>>>
>>> increase(response_total{status!="200"}[10m]) /
>>> increase(response_total{status="200"}[10m]) > 10
>>>
>>> This doesn't seem to work, and it's not particularly surprising. I'm
>>> not sure how Prometheus should "know" that it should be comparing
>>> response_total{status!="200", source="foo", service="bar"} to
>>> response_total{status="200", source="foo", service="bar"}.
>>>
>>> I could define the service up-front, but the sources are defined by
>>> our cluster manager, so I can't enumerate them all up-front.
>>>
>>> I appreciate any help!
>>>
>>> Thanks,
>>> Alex
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/prometheus-users/d5f90eb5-f7f5-4049-bbd6-50d530edc545n%40googlegroups.com.

