> Assuming the second metric goes missing how is the binary expression 
evaluated exactly?

The same as it always is.  Remember that the left-hand side and the 
right-hand side are both vectors, containing zero or more values, each 
value having a distinct set of labels. The documentation 
<https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators>
says:

*vector1 and vector2 results in a vector consisting of the elements of 
vector1 for which there are elements in vector2 with exactly matching label 
sets. Other elements are dropped. The metric name and values are carried 
over from the left-hand side vector.*

Therefore, if the RHS of "and" is an empty vector, then the result of the 
entire "and" expression is an empty vector - since there is nothing in 
vector2 for vector1 to match.
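
To make the filtering concrete, here is a minimal sketch in Python. It is a toy model, not Prometheus code: a vector is represented as a list of (labels, value) pairs, and the names label_key, vector_and, foo and bar are made up for the illustration. It assumes plain "and" with no on(...) or ignoring(...) modifier.

```python
# Toy model of PromQL's "and" (an illustration, not Prometheus code).
# A vector is a list of (labels, value) pairs, where labels is a dict.

def label_key(labels):
    # Elements are matched on every label except the metric name.
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")

def vector_and(lhs, rhs):
    # Keep each LHS element whose label set also appears on the RHS.
    # Values and metric names come from the LHS; RHS values are never used.
    rhs_keys = {label_key(labels) for labels, _ in rhs}
    return [(labels, value) for labels, value in lhs
            if label_key(labels) in rhs_keys]

foo = [({"__name__": "foo", "job": "a"}, 5.0),
       ({"__name__": "foo", "job": "b"}, 7.0)]
bar = [({"__name__": "bar", "job": "a"}, 1.0)]

print(vector_and(foo, bar))  # only the job="a" element of foo survives
print(vector_and(foo, []))   # empty RHS: nothing can match, so the result is []
```

With an empty right-hand side the set of keys to match against is empty, so every left-hand element is dropped regardless of its labels or value.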

> In the "normal" case, i.e. "foo and bar" we would not have points but in 
the case of "absent(foo) and bar", from my tests, it seems to me the "bar" 
filtering is simply ignored.

I don't understand what you mean by that. Can you give examples of the LHS 
and the RHS vectors, and the combined expression, which don't behave how 
you expect?

Note that "foo and bar" and "absent(foo) and bar" will both be empty if bar 
is empty, as just described.

"absent(foo)" is an unusual function:
- if the input vector has one or more values, i.e. any non-empty vector, 
its output is an empty vector (no values)
- if the input vector is empty, its output is a one-element vector with a 
single value "1". The label set of that value depends on the exact form of 
the expression inside the parentheses; it tries to do "the right thing" but 
at worst you could have value 1 with empty label set {}

In your case,

    absent(our_metric{environment="pro",service="bar",stack="foo"})

will return
    {environment="pro",service="bar",stack="foo"} 1

i.e. a single-element vector with empty metric name, those labels, and the 
value 1.
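
As a rough illustration of those two cases, here is a toy Python model of absent(). It is not Prometheus code: the function signature and the equality_matchers argument are invented for this sketch, and it assumes the argument is a plain selector whose equality matchers are the only source of labels for the result.

```python
# Toy model of absent() (an illustration, not Prometheus code).
# A vector is a list of (labels, value) pairs, where labels is a dict.

def absent(input_vector, equality_matchers):
    # Non-empty input: nothing is absent, so the output is an empty vector.
    if input_vector:
        return []
    # Empty input: exactly one element with value 1 and no metric name.
    # Assumption: its labels are copied from the selector's equality matchers.
    return [(dict(equality_matchers), 1.0)]

matchers = {"environment": "pro", "service": "bar", "stack": "foo"}
present = [({"__name__": "our_metric", **matchers}, 42.0)]

print(absent(present, matchers))  # metric exists: empty vector
print(absent([], matchers))       # metric missing: one element, value 1.0
print(absent([], {}))             # no usable matchers: value 1.0, labels {}
```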

Going back to the whole original expression:

    absent(our_metric{environment="pro",service="bar",stack="foo"})
      and on(stack, environment) up{service="bar",source="app"} == 1

ISTM that is saying you want to generate an alert if 
our_metric{environment="pro",service="bar",stack="foo"} is missing, but 
only if metric up{service="bar",source="app"} exists *and* has value 1. 
That means the alert is suppressed if either:
(a) up{service="bar",source="app"} exists but its value is not 1
(b) up{service="bar",source="app"} does not exist - i.e. that expression 
returns an empty vector. ("up" is a special metric in Prometheus; if it 
doesn't exist, it means there is no configured scrape target with those 
labels)

If that's not what you want, then think about what you actually want, and 
then how to express that.  For example, if you want to suppress the alert 
in case (a) but not in case (b), then you can do this:

    absent(our_metric{environment="pro",service="bar",stack="foo"})
      unless on(stack, environment) up{service="bar",source="app"} != 1
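
The difference between the "and ... == 1" form and the "unless ... != 1" form can be checked with a small Python sketch. This is a toy model under stated assumptions, not Prometheus code: the helper names (key_on, vector_and, vector_unless, keep, up) are invented, our_metric is assumed to be missing so that absent(...) yields its single element, and the "up" series is assumed to carry the same stack and environment labels.

```python
# Toy comparison of "and ... == 1" versus "unless ... != 1"
# (an illustration, not Prometheus code).

def key_on(labels, on):
    # on(...) matching: compare only the listed labels; a missing label is "".
    return tuple(labels.get(name, "") for name in on)

def vector_and(lhs, rhs, on):
    rhs_keys = {key_on(labels, on) for labels, _ in rhs}
    return [(l, v) for l, v in lhs if key_on(l, on) in rhs_keys]

def vector_unless(lhs, rhs, on):
    rhs_keys = {key_on(labels, on) for labels, _ in rhs}
    return [(l, v) for l, v in lhs if key_on(l, on) not in rhs_keys]

def keep(vector, predicate):
    # "metric == constant" is a filter: elements pass through or are dropped.
    return [(l, v) for l, v in vector if predicate(v)]

ON = ("stack", "environment")
# Assumption: our_metric is missing, so absent(...) returned this one element.
absent_result = [({"environment": "pro", "service": "bar", "stack": "foo"}, 1.0)]

def up(value):
    labels = {"__name__": "up", "environment": "pro", "service": "bar",
              "stack": "foo", "source": "app"}
    return [(labels, value)]

cases = {"up is 1": up(1.0), "up is 0": up(0.0), "up missing": []}
results = {}
for name, up_vector in cases.items():
    fires_and = bool(vector_and(absent_result, keep(up_vector, lambda v: v == 1), ON))
    fires_unless = bool(vector_unless(absent_result, keep(up_vector, lambda v: v != 1), ON))
    results[name] = (fires_and, fires_unless)
    print(f"{name}: and-form fires={fires_and}, unless-form fires={fires_unless}")
```

The two forms agree while "up" exists and differ only when it is missing: the "and" form stays silent because there is nothing to match, while the "unless" form fires because there is nothing to exclude.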

------
If you don't mind, I will make an observation about the use of "and 
on(...)".  Since the LHS and RHS are vectors, an expression needs to 
identify corresponding values in the LHS vector and the RHS vector, to 
generate a vector of results. The on(...) part is for when the LHS and RHS 
vectors don't have exactly the same label sets, and you need to ignore some 
labels when matching them up. I think you know all this already.

I find your expression rather confusing, because:
- we know that any values in the LHS vector must have labels 
{environment="pro",service="bar",stack="foo"}
- we know that any values in the RHS vector must have labels 
{service="bar",source="app"}
- "on(stack,environment)" says to pair up LHS and RHS values where the 
"stack" and "environment" labels match
- therefore, the RHS vector must also have stack="foo" and environment="pro"
- as this is a one-to-one vector match, it will fail if a particular pair 
of (stack,environment) labels returns multiple values for the LHS and one 
or more for the RHS, or vice versa. Therefore we know (stack,environment) 
must be a unique match for a given service (*)

Therefore, implicitly I think all of (environment, service, stack) must 
match, i.e. this expression is the same as:

    absent(our_metric{environment="pro",service="bar",stack="foo"})
      and on(environment, service, stack)
      up{environment="pro",service="bar",stack="foo",source="app"} == 1

And this can be simplified to:

    absent(our_metric{environment="pro",service="bar",stack="foo"})
      and on(environment, service, stack) up{source="app"} == 1

I find the second version easier to read and reason about, because the 
environment/service/stack matching is all in one place, but you may 
disagree :-)

(*) This does provide another reason why an alert could fail to trigger.  
If the "and" expression returns multiple values for the same 
(stack,environment) pair on either the LHS or the RHS, with at least one 
match on the other side, then the whole expression will generate an error.

However, I think it's unlikely in this particular case. We know the LHS can 
only possibly return a single-element vector, so this error condition could 
only occur if up{service="bar",source="app"} == 1 returns multiple values 
with the same pair of (stack,environment) labels. That is, it would only be 
a problem if you had something like this:

    up{environment="pro",service="bar",stack="foo",source="app",xxx="yyy"} 1
    up{environment="pro",service="bar",stack="foo",source="app",xxx="zzz"} 1

On Friday, 4 March 2022 at 07:23:16 UTC baca...@gmail.com wrote:

> Hi Brian,
>
> thanks a lot for your reply.
>
> I re-read my original mail and I recognize I should probably have 
> delivered less information and gone straight to the point. That probably 
> created a bit of confusion. E.g. I never intended the up metric - or any 
> other metric - to be considered a boolean. My bad. I'll try to get straight 
> to the point this time.
>
> > This is *not* boolean.  Rather, it takes the vector of timeseries "foo" 
> > and matches them up with the vector of timeseries "bar".  All those 
> > elements of foo which have exactly matching label sets with bar, are 
> > passed through unchanged.  Anything else is dropped.
>
> Right, and my question is the following. Mostly to understand the 
> underlying behaviour, not because I have any particular problem to resolve.
> Assuming the second metric goes missing how is the binary expression 
> evaluated exactly? In the "normal" case, i.e. "foo and bar" we would not 
> have points but in the case of "absent(foo) and bar", from my tests, it 
> seems to me the "bar" filtering is simply ignored.
>
> I can guess that is because "absent" is not really a metric per se and 
> thus we are comparing two empty sets of labels - effectively reducing 
> "absent(foo) and bar" to "absent(foo)".
> I'd say, it would make sort of sense, right?
>
> Cheers,
> F.
>
> On Thursday, 3 March 2022 at 17:01:29 UTC+1 Brian Candler wrote:
>
>> You can use the PromQL browser in the prometheus web UI to debug this, 
>> since you can view the value of an expression at any previous point in time.
>>
>> Try the two halves separately:
>>
>> absent(our_metric{environment="pro",service="bar",stack="foo"}) 
>>
>> up{service="bar",source="app"} == 1
>>
>> Then try the whole expression at that point in time.  Either view the 
>> graph, or view the instant query and set the instant time to when there was 
>> a problem.
>>
>> > As the node went missing the second operand of the binary operator 
>> > could not be evaluated, simply because it was neither `1`, nor `0`
>>
>> The expression:
>>     up{service="bar",source="app"} == 1
>> can only ever have the value 1 or be missing.  metric == constant is a 
>> filter, not a boolean.  The value it returns is the value of the LHS, or no 
>> value if the filter condition is not met.
>>
>> Possibly you want to remove the "== 1" entirely:
>>
>> absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>> on(stack, environment) up{service="bar",source="app"}
>>
>> "and" expressions behave in a corresponding way:
>>
>>     foo and bar
>>
>> This is *not* boolean.  Rather, it takes the vector of timeseries "foo" 
>> and matches them up with the vector of timeseries "bar".  All those 
>> elements of foo which have exactly matching label sets with bar, are passed 
>> through unchanged.  Anything else is dropped.
>>
>> So it's just a filter: "give me all values of foo, where there is also a 
>> value present for bar".  It does not have true/false values either as its 
>> input or its output.
>>
>> > Or, in other words, the following was holding true:
>> > 
>> > absent(up{service="bar",source="app"}) = 1
>>
>> How do you know?  The "up" metric is always present for a target, whether 
>> or not scraping is successful: it would only not be present if you removed 
>> the target from the scrape job.  This could be the case if you are using 
>> some dynamic service discovery, and the service went away.  But then your 
>> real problem is how to stop services vanishing from service discovery.
>>
>> Anyway, you can tell for sure by looking at historical values of these 
>> queries:
>>
>> up{service="bar",source="app"}
>> absent(up{service="bar",source="app"})
>>
>>
>> On Thursday, 3 March 2022 at 11:12:11 UTC Federico Buti wrote:
>>
>>> Hi list,
>>>
>>> For a monitored system we setup a rule as follows:
>>>
>>> absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>>> on(stack, environment) up{service="bar",source="app"} == 1
>>>
>>> This is one of the few absence rules we have in our ruleset. This is 
>>> also a bit special because the exporter uses the absence of the metric to 
>>> indicate a problem - something that is discouraged from guidelines. But 
>>> that goes beyond my question anyway.
>>>
>>> Using a binary AND operator seems to work fine, cutting out the cases in 
>>> which the node is not scrapable. However this morning the node went 
>>> missing. We had probably a misconfiguration in our provisioning which we 
>>> are currently investigating.
>>>
>>> As the node went missing the second operand of the binary operator could 
>>> not be evaluated, simply because it was neither `1`, nor `0`. Or, in other 
>>> words, the following was holding true:
>>>
>>> absent(up{service="bar",source="app"}) = 1
>>>
>>> I understand an alert can resolve if the related metric goes stale but 
>>> I'm not sure how the logic should translate in this case. On the surface I 
>>> would not expect the AND expression to fire as we are not able to say the 
>>> "up" metric is really 1.
>>>
>>> But maybe I'm missing the point here?
>>>
>>> Thanks in advance,
>>> F.
>>>
>>

