You can use the expression browser in the Prometheus web UI to debug this,
since it lets you evaluate an expression at any previous point in time.
Try the two halves separately:

    absent(our_metric{environment="pro",service="bar",stack="foo"})

    up{service="bar",source="app"} == 1

Then try the whole expression at that point in time. Either view the
graph, or run it as an instant query with the evaluation time set to when
the problem occurred.
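If you would rather pin the evaluation time in the query text itself, the
"@" modifier can do that. This is only a sketch: the timestamp is a
placeholder, and it assumes your Prometheus is new enough to have the
modifier enabled (as far as I recall it is on by default from v2.33, and
needs --enable-feature=promql-at-modifier on earlier versions).

```promql
# 1646290800 is a placeholder Unix timestamp (seconds); substitute the
# moment the node went missing. Run each query separately.
absent(our_metric{environment="pro",service="bar",stack="foo"} @ 1646290800)

up{service="bar",source="app"} @ 1646290800 == 1
```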
> As the node went missing the second operand of the binary operator could
not be evaluated, simply because it was neither `1`, nor `0`
The expression:

    up{service="bar",source="app"} == 1

can only ever have the value 1 or be missing. "metric == constant" is a
filter, not a boolean: for each series it returns the value of the LHS, or
no value at all if the filter condition is not met.
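If what you actually want is a 0/1 result rather than a filter, PromQL has
the "bool" modifier for comparison operators. A small sketch using the same
selector as above:

```promql
# Returns 1 for each target that is up and 0 for each target that is down.
# Still returns nothing if no "up" series exists for the selector.
up{service="bar",source="app"} == bool 1
```

Note that feeding this into "and" would not help: a 0 on the right-hand side
is still a present value, so the left-hand side would pass through.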
Possibly you want to remove the "== 1" entirely:

    absent(our_metric{environment="pro",service="bar",stack="foo"})
      and on(stack, environment) up{service="bar",source="app"}

That matches whenever an "up" series exists for the target, whether its
value is 1 or 0.

"and" expressions behave in a corresponding way:

    foo and bar

This is *not* boolean. Rather, it takes the vector of timeseries "foo" and
matches them up with the vector of timeseries "bar". All those elements of
foo whose label sets exactly match an element of bar are passed through
unchanged. Anything else is dropped. (With "on(stack, environment)", only
those two labels are compared instead of the full label set.)

So it's just a filter: "give me all values of foo, where there is also a
value present for bar". It does not have true/false values either as its
input or its output.
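To make that concrete for the rule in question, here is roughly what each
side holds. The job and instance labels on the "up" series are made up for
illustration, and I'm assuming the target carries stack and environment
labels, which the on() clause implies.

```promql
# Left side, when our_metric is missing: absent() returns a single element
# with value 1, labelled from the equality matchers:
#   {environment="pro", service="bar", stack="foo"}  1
#
# Right side, when the target is scraped successfully (hypothetical labels):
#   up{environment="pro", stack="foo", service="bar", source="app",
#      job="app", instance="host:9100"}  1
#
# Both sides agree on stack and environment, so the left element passes
# through. If the right side is empty (target down and filtered out by
# "== 1", or no "up" series at all), nothing passes through.
absent(our_metric{environment="pro",service="bar",stack="foo"})
  and on(stack, environment) (up{service="bar",source="app"} == 1)
```

The parentheses around the right-hand side are not required, since "=="
binds more tightly than "and"; they are there only to make the grouping
obvious.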
> Or, in other words, the following was holding true:
>
> absent(up{service="bar",source="app"}) = 1
How do you know? The "up" metric is always present for a target, whether
or not scraping is successful: it would only be absent if the target had
been removed from the scrape job. This could be the case if you are using
some dynamic service discovery, and the service went away. But then your
real problem is how to stop services vanishing from service discovery.

Anyway, you can tell for sure by looking at historical values of these
queries:

    up{service="bar",source="app"}

    absent(up{service="bar",source="app"})
On Thursday, 3 March 2022 at 11:12:11 UTC Federico Buti wrote:
> Hi list,
>
> For a monitored system we setup a rule as follows:
>
> absent(our_metric{environment="pro",service="bar",stack="foo"}) and
> on(stack, environment) up{service="bar",source="app"} == 1
>
> This is one of the few absence rules we have in our ruleset. This is also
> a bit special because the exporter uses the absence of the metric to
> indicate a problem - something that is discouraged from guidelines. But
> that goes beyond my question anyway.
>
> Using a binary AND operator seems to work fine, cutting out the cases in
> which the node is not scrapable. However this morning the node went
> missing. We had probably a misconfiguration in our provisioning which we
> are currently investigating.
>
> As the node went missing the second operand of the binary operator could
> not be evaluated, simply because it was neither `1`, nor `0`. Or, in other
> words, the following was holding true:
>
> absent(up{service="bar",source="app"}) = 1
>
> I understand an alert can resolve if the related metric goes stale but I'm
> not sure how the logic should translate in this case. On the surface I
> would not expect the AND expression to fire as we are not able to say the
> "up" metric is really 1.
>
> But maybe I'm missing the point here?
>
> Thanks in advance,
> F.
>