Glad it makes sense now. It was definitely a bump in the learning curve for 
me :-)

Regards, Brian.

On Friday, 4 March 2022 at 10:00:12 UTC Federico Buti wrote:

> Hi Brian.
>
> Thanks for the super-deep dive into the topic! This is simply awesome. And 
> sorry for the mails mismatch...too many mail accounts! :-D
>
> On Fri, 4 Mar 2022 at 09:46, Brian Candler <[email protected]> wrote:
>
>> > Assuming the second metric goes missing how is the binary expression 
>> evaluated exactly?
>>
>> The same as it always is.  Remember that the left-hand side and the 
>> right-hand side are both vectors, containing zero or more values, each 
>> value having a distinct set of labels. Noting the documentation here 
>> <https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators>
>> :
>>
>> *vector1 and vector2 results in a vector consisting of the elements of 
>> vector1 for which there are elements in vector2 with exactly matching 
>> label sets. Other elements are dropped. The metric name and values are 
>> carried over from the left-hand side vector.*
>>
>> Therefore, if the RHS of "and" is an empty vector, then the result of the 
>> entire "and" expression is an empty vector - since there is nothing in 
>> vector2 for vector1 to match.
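
As a rough illustration of the matching rule quoted above, here is a toy Python model of "and". It is only a sketch: the list-of-pairs representation and the names signature and vector_and are made up for this example and are not Prometheus internals.

```python
# Toy model of a PromQL instant vector: a list of (labels, value) pairs.
# Illustration only; this is not how Prometheus is implemented.

def signature(labels, on=None):
    """Label set used for matching two samples."""
    if on is None:
        # Without on(...), every label except the metric name takes part.
        return frozenset((k, v) for k, v in labels.items() if k != "__name__")
    # With on(...), only the listed labels take part.
    return frozenset((k, v) for k, v in labels.items() if k in on)

def vector_and(lhs, rhs, on=None):
    """Keep LHS samples whose signature also appears somewhere in RHS."""
    rhs_signatures = {signature(labels, on) for labels, _ in rhs}
    return [(labels, value) for labels, value in lhs
            if signature(labels, on) in rhs_signatures]

foo = [({"__name__": "foo", "job": "a"}, 5.0),
       ({"__name__": "foo", "job": "b"}, 7.0)]
bar = [({"__name__": "bar", "job": "a"}, 1.0)]

print(vector_and(foo, bar))  # only the job="a" sample of foo survives
print(vector_and(foo, []))   # empty RHS, so the result is empty
```

The second call is the case in question: with nothing on the RHS there is no signature to match against, so every LHS sample is dropped.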
>>
>> > In the "normal" case, i.e. "foo and bar" we would not have points but 
>> in the case of "absent(foo) and bar", from my tests, it seems to me the 
>> "bar" filtering is simply ignored.
>>
>> I don't understand what you mean by that. Can you give examples of the LHS 
>> and the RHS vectors, and the combined expression, which don't behave how 
>> you expect?
>>
>
> I was referring to "absent(foo) and bar", which was the source of my 
> original question. On the surface it seemed to me that the LHS was firing 
> even though the RHS was empty. But your detailed explanation below forced me to 
> double-check again in the expression browser and now I see the RHS wasn't 
> really empty as I first (erroneously) reported. Which matches the 
> documentation you mentioned and makes everything click perfectly in my 
> head. Was dumb of me, but I guess stuff happens. Thanks a lot.
>
>
>
>> Note that "foo and bar" and "absent(foo) and bar" will both be empty if 
>> bar is empty, as just described.
>>
>> "absent(foo)" is an unusual function:
>> - if the input vector has one or more values, i.e. any non-empty vector, 
>> its output is an empty vector (no values)
>> - if the input vector is empty, its output is one-element vector with a 
>> single value "1". The label set of that value depends on the exact form of 
>> the expression inside the parentheses; it tries to do "the right thing" but 
>> at worst you could have value 1 with empty label set {}
>>
>> In your case,
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"})
>>
>> will return
>>     {environment="pro",service="bar",stack="foo"} 1
>>
>> i.e. a single-element vector with empty metric name, those labels, and 
>> the value 1.
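
The behaviour of absent() described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: select and absent are hypothetical helpers written for this example, and the keyword arguments stand in for the equality matchers inside the braces.

```python
# Toy model: an instant vector is a list of (labels, value) pairs.

def select(series, name, **equal):
    """Toy selector: samples of metric `name` whose labels equal `equal`."""
    return [(labels, value) for labels, value in series
            if labels.get("__name__") == name
            and all(labels.get(k) == v for k, v in equal.items())]

def absent(vector, **equal):
    """Empty if the input has any sample; otherwise one sample with value 1.

    `equal` stands in for the equality matchers of the selector, which is
    where the output labels come from in this sketch.
    """
    return [] if vector else [(dict(equal), 1.0)]

want = {"environment": "pro", "service": "bar", "stack": "foo"}
stored = [({"__name__": "our_metric", **want}, 42.0)]

# Metric present: absent() returns an empty vector.
print(absent(select(stored, "our_metric", **want), **want))
# Metric missing: one sample, no metric name, the selector's labels, value 1.
print(absent(select([], "our_metric", **want), **want))
```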
>>
>> Going back to the whole original expression:
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"}) and on(stack, environment) up{service="bar",source="app"} == 1
>>
>> ISTM that is saying you want to generate an alert if 
>> our_metric{environment="pro",service="bar",stack="foo"} is missing, but 
>> only if metric up{service="bar",source="app"} exists *and* has value 1. 
>> That means the alert is suppressed if either:
>> (a) up{service="bar",source="app"} exists but its value is not 1
>> (b) up{service="bar",source="app"} does not exist - i.e. that expression 
>> returns an empty vector. ("up" is a special metric in prometheus; if it 
>> doesn't exist, it means there is no configured scrape job with those labels)
>>
>
> Yes, I was interested in having (a). Then yesterday we experienced (b) 
> because of a provision problem and I wrote to the list to understand that 
> case better. Just to improve my knowledge. We do NOT want disappearance of 
> targets which would lead to (b) ofc, but that is an investigation we are 
> doing on our side to avoid the problem in the future.
>  
>
>
>> If that's not what you want, then think about what you actually want, and 
>> then how to express that.  For example, if you want to suppress the alert 
>> in case (a) but not in case (b), then you can do this:
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"}) unless on(stack, environment) up{service="bar",source="app"} != 1
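
To compare the "and ... == 1" form with the "unless ... != 1" form, here is a small Python sketch that runs both through the healthy case, case (a) and case (b). All names (vector_and, vector_unless, alert_and, alert_unless, the up helper) are invented for this illustration, and it assumes the absent() half has already fired.

```python
# Toy model: an instant vector is a list of (labels, value) pairs.

def signature(labels, on):
    """Only the labels listed in on(...) take part in matching."""
    return frozenset((k, v) for k, v in labels.items() if k in on)

def vector_and(lhs, rhs, on):
    seen = {signature(labels, on) for labels, _ in rhs}
    return [(l, v) for l, v in lhs if signature(l, on) in seen]

def vector_unless(lhs, rhs, on):
    seen = {signature(labels, on) for labels, _ in rhs}
    return [(l, v) for l, v in lhs if signature(l, on) not in seen]

ON = ("stack", "environment")

# Assumption: our_metric is missing, so absent(...) returned this sample.
missing = [({"environment": "pro", "service": "bar", "stack": "foo"}, 1.0)]

def up(value):
    """A single 'up' sample for the target, with the given value."""
    return [({"__name__": "up", "environment": "pro", "service": "bar",
              "stack": "foo", "source": "app"}, value)]

def alert_and(up_vec):
    # absent(...) and on(stack, environment) up{...} == 1
    return vector_and(missing, [s for s in up_vec if s[1] == 1], ON)

def alert_unless(up_vec):
    # absent(...) unless on(stack, environment) up{...} != 1
    return vector_unless(missing, [s for s in up_vec if s[1] != 1], ON)

cases = {"up is 1": up(1.0), "(a) up is 0": up(0.0), "(b) up missing": []}
for name, up_vec in cases.items():
    print(name, "| and fires:", bool(alert_and(up_vec)),
          "| unless fires:", bool(alert_unless(up_vec)))
```

In this model both forms fire when up is 1, neither fires in case (a), and only the "unless" form fires in case (b).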
>>
>> ------
>>
>
> Cool! I've always struggled a bit with "unless" but I can totally give it 
> a go for this case. As I should have mentioned I want to move away from the 
> absent altogether but that is something that is not going to happen soon due to 
> the way the exporter is written atm, unfortunately.
>  
>
>
>> If you don't mind, I will make an observation about the use of "and 
>> on(...)".  Since the LHS and RHS are vectors, an expression needs to 
>> identify corresponding values in the LHS vector and the RHS vector, to 
>> generate a vector of results. The on(...) part is when the LHS and RHS 
>> vectors don't have exactly the same label sets, and you need to ignore some 
>> when matching them up. I think you know all this already.
>>
>> I find your expression rather confusing, because:
>> - we know that any values in the LHS vector must have labels 
>> {environment="pro",service="bar",stack="foo"}
>> - we know that any values in the RHS vector must have labels 
>> {service="bar",source="app"}
>> - "on(stack,environment)" says to pair up LHS and RHS values where the 
>> "stack" and "environment" labels match
>> - therefore, the RHS vector must also have stack="foo" and 
>> environment="pro"
>> - as this is a one-to-one vector match: it will fail if a particular pair of 
>> (stack,environment) labels returns multiple values for the LHS and one or 
>> more for the RHS, or vice versa. Therefore we know (stack,environment) must 
>> be a unique match for a given service (*)
>>
>> Therefore, implicitly I think all of (environment, service, stack) must 
>> match, i.e. this expression is the same as:
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"}) and on(environment, service, stack) up{environment="pro",service="bar",stack="foo",source="app"} == 1
>>
>> And this can be simplified to:
>>
>>     absent(our_metric{environment="pro",service="bar",stack="foo"}) and on(environment, service, stack) up{source="app"} == 1
>>
>> I find the second version easier to read and reason about, because the 
>> environment/service/stack matching is all in one place, but you may 
>> disagree :-)
>>
>
> Not really sure why I should disagree here! :-D
> This is a great insight and a source of reflection for us to improve our 
> rule set. We have a few binary expressions using "and" for which the 
> reasoning applied here could be taken into account. If anything it 
> simplifies/shortens the expression a lot, which is always a plus, imo.
>
> Thanks a lot for your huge help!
> F.
>
>
>
>
>> (*) This does provide another reason why an alert could fail to trigger.  
>> If the "and" expression returns multiple values for the same 
>> (stack,environment) pair on either the LHS or the RHS, with at least one 
>> match on the other side, then the whole expression will generate an error.
>>
>> However, I think it's unlikely in this particular case. We know the LHS 
>> can only possibly return a single-element vector, so this error condition 
>> could only occur if up{service="bar",source="app"} == 1 returns multiple 
>> values with the same pair of (stack,environment) labels. That is, it would 
>> only be a problem if you had something like this:
>> up{environment="pro",service="bar",stack="foo",source="app",xxx="yyy"} 1
>> up{environment="pro",service="bar",stack="foo",source="app",xxx="zzz"} 1
>>
>> On Friday, 4 March 2022 at 07:23:16 UTC [email protected] wrote:
>>
>>> Hi Brian,
>>>
>>> thanks a lot for your reply.
>>>
>>> I re-read my original mail and I recognize I should have probably 
>>> delivered less information and gone straight to the point. That probably 
>>> created a bit of confusion. E.g. I never intended the up metric - or any 
>>> other metric - to be considered a boolean. My bad. I'll try to get straight 
>>> to the point this time.
>>>
>>> >This is *not* boolean.  Rather, it takes the vector of timeseries "foo" 
>>> and matches them up with the vector of timeseries "bar".  All those 
>>> elements of foo which have exactly matching label sets with bar, are 
>>> passed through unchanged.  Anything else is dropped.
>>>
>>> Right, and my question is the following. Mostly to understand the 
>>> underlying behaviour, not because I have any particular problem to resolve.
>>> Assuming the second metric goes missing how is the binary expression 
>>> evaluated exactly? In the "normal" case, i.e. "foo and bar" we would not 
>>> have points but in the case of "absent(foo) and bar", from my tests, it 
>>> seems to me the "bar" filtering is simply ignored.
>>>
>>> I can guess that is because "absent" is not really a metric per se and 
>>> thus we are comparing two empty sets of labels - effectively reducing 
>>> "absent(foo) and bar" to "absent(foo)".
>>> I'd say, it would make sort of sense, right?
>>>
>>> Cheers,
>>> F.
>>>
>>> On Thursday, 3 March 2022 at 17:01:29 UTC+1 Brian Candler wrote:
>>>
>>>> You can use the PromQL browser in the prometheus web UI to debug this, 
>>>> since you can view the value of an expression at any previous point in 
>>>> time.
>>>>
>>>> Try the two halves separately:
>>>>
>>>> absent(our_metric{environment="pro",service="bar",stack="foo"}) 
>>>>
>>>> up{service="bar",source="app"} == 1
>>>>
>>>> Then try the whole expression at that point in time.  Either view the 
>>>> graph, or view the instant query and set the instant time to when there 
>>>> was a problem.
>>>>
>>>> > As the node went missing the second operand of the binary operator 
>>>> could not be evaluated, simply because it was neither `1`, nor `0`
>>>>
>>>> The expression:
>>>>     up{service="bar",source="app"} == 1
>>>> can only ever have the value 1 or be missing.  metric == constant is a 
>>>> filter, not a boolean.  The value it returns is the value of the LHS, or 
>>>> no value if the filter condition is not met.
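
A minimal Python sketch of "metric == constant" acting as a filter, as described above. The function name filter_eq and the sample data are assumptions made for this illustration; it models the comparison without the "bool" modifier.

```python
# Toy model: an instant vector is a list of (labels, value) pairs.

def filter_eq(vector, constant):
    """Model of `vector == constant`: keep matching samples, drop the rest."""
    return [(labels, value) for labels, value in vector if value == constant]

up = [({"__name__": "up", "instance": "a"}, 1.0),
      ({"__name__": "up", "instance": "b"}, 0.0)]

# Only instance="a" is returned, and its value is still the LHS value (1.0),
# not a true/false result.
print(filter_eq(up, 1))

# Nothing has the value 2, so the result is an empty vector, not "false".
print(filter_eq(up, 2))
```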
>>>>
>>>> Possibly you want to remove the "== 1" entirely:
>>>>
>>>> absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>>>> on(stack, environment) up{service="bar",source="app"}
>>>>
>>>> "and" expressions behave in a corresponding way:
>>>>
>>>>     foo and bar
>>>>
>>>> This is *not* boolean.  Rather, it takes the vector of timeseries "foo" 
>>>> and matches them up with the vector of timeseries "bar".  All those 
>>>> elements of foo which have exactly matching label sets with bar, are 
>>>> passed through unchanged.  Anything else is dropped.
>>>>
>>>> So it's just a filter: "give me all values of foo, where there is also 
>>>> a value present for bar".  It does not have true/false values either as 
>>>> its input or its output.
>>>>
>>>> > Or, in other words, the following was holding true:
>>>> > 
>>>> > absent(up{service="bar",source="app"}) = 1
>>>>
>>>> How do you know?  The "up" metric is always present for a target, 
>>>> whether or not scraping is successful: it would only not be present if you 
>>>> removed the target from the scrape job.  This could be the case if you are 
>>>> using some dynamic service discovery, and the service went away.  But then 
>>>> your real problem is how to stop services vanishing from service discovery.
>>>>
>>>> Anyway, you can tell for sure by looking at historical values of these 
>>>> queries:
>>>>
>>>> up{service="bar",source="app"}
>>>> absent(up{service="bar",source="app"})
>>>>
>>>>
>>>> On Thursday, 3 March 2022 at 11:12:11 UTC Federico Buti wrote:
>>>>
>>>>> Hi list,
>>>>>
>>>>> For a monitored system we set up a rule as follows:
>>>>>
>>>>> absent(our_metric{environment="pro",service="bar",stack="foo"}) and 
>>>>> on(stack, environment) up{service="bar",source="app"} == 1
>>>>>
>>>>> This is one of the few absence rules we have in our ruleset. This is 
>>>>> also a bit special because the exporter uses the absence of the metric to 
>>>>> indicate a problem - something that is discouraged by the guidelines. But 
>>>>> that goes beyond my question anyway.
>>>>>
>>>>> Using a binary AND operator seems to work fine, cutting out the cases 
>>>>> in which the node is not scrapable. However this morning the node went 
>>>>> missing. We probably had a misconfiguration in our provisioning which we 
>>>>> are currently investigating.
>>>>>
>>>>> As the node went missing the second operand of the binary operator 
>>>>> could not be evaluated, simply because it was neither `1`, nor `0`. Or, 
>>>>> in other words, the following was holding true:
>>>>>
>>>>> absent(up{service="bar",source="app"}) = 1
>>>>>
>>>>> I understand an alert can resolve if the related metric goes stale but 
>>>>> I'm not sure how the logic should translate in this case. On the surface 
>>>>> I would not expect the AND expression to fire as we are not able to say the 
>>>>> "up" metric is really 1.
>>>>>
>>>>> But maybe I'm missing the point here?
>>>>>
>>>>> Thanks in advance,
>>>>> F.
>>>>>
>>>> -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "Prometheus Users" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/prometheus-users/pyTVLNKp3XM/unsubscribe
>> .
>> To unsubscribe from this group and all its topics, send an email to 
>> [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/prometheus-users/f24239ac-aa22-4b1e-bcd9-92861bfa2976n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/prometheus-users/f24239ac-aa22-4b1e-bcd9-92861bfa2976n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

