Thanks, Julius, for your response.
I don't think pre-initializing will work in our case. Let me provide some 
more details.
We are running Python services in multiprocess mode and dumping all the 
metrics to PROMETHEUS_MULTIPROC_DIR (using the Prometheus Python client 
library).
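For reference, the setup looks roughly like this (a simplified sketch, not 
the actual service code):

    # Simplified sketch of our multiprocess setup.
    from prometheus_client import CollectorRegistry, Counter, multiprocess

    # Each worker process writes its samples to files under
    # PROMETHEUS_MULTIPROC_DIR; MultiProcessCollector merges them on scrape.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)

    # Exposed as api_exit_status_total (the client appends _total).
    api_exit_status_total = Counter(
        "api_exit_status",
        "API calls by exit status",
        ["api", "method", "exit_status"],
    )
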
These metrics are being scraped, and there is a cleanup happening (due to 
some limitations) which leads to the non-existent-metrics scenario, so our 
alerts are not getting fired.
Now, I might have 1k to 10k active/distinct time series for the same metric 
(due to different label combinations).
To make this work, would I have to pre-initialize every combination with 
zero? And even if I did, cleaning up the metrics under 
PROMETHEUS_MULTIPROC_DIR would still lead to the same situation.
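Just to illustrate what pre-initializing would mean on our side (the 
label-value lists below are hypothetical):

    # Hypothetical pre-initialization of every label combination at zero.
    # Calling .labels() creates and exposes the child series without
    # incrementing it.
    for api in KNOWN_APIS:                  # placeholder list of API names
        for method in KNOWN_METHODS:        # placeholder list of methods
            for exit_status in ("success", "failure"):
                api_exit_status_total.labels(
                    api=api, method=method, exit_status=exit_status
                )
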
Please let me know if you have any suggestions for these cases, possibly by 
tweaking the alert expressions.

Thanks,
Amar

On Thursday, October 6, 2022 at 12:05:03 PM UTC+5:30 juliu...@promlabs.com 
wrote:

> On Thu, Oct 6, 2022 at 8:07 AM amar <amarmu...@gmail.com> wrote:
>
>> Hello All,
>>
>> I believe many of you will have come across the issue below with the 
>> Prometheus increase() function. Let me explain with an example:
>>
>> *Prometheus Metric:*
>> "api_exit_status_total" is a Counter with distinct label set.
>>
>> api_exit_status_total{api="", method="",exit_status="",.....}
>>
>> *Alert Expression:*
>> increase(api_exit_status_total{api="",exit_status="failure"}[1h]) > 0
>> You can also assume we are doing a sum by (region, api), etc.
>>
>> *Issue being observed:*
>> For some reason, or due to a counter reset, we don't have any time 
>> series data for the above metric for some 10 minutes or 1 hour.
>> Now, when the counter gets incremented by some transactions, the 
>> increase() function returns 0 even though the counter is at some value x, 
>> because we don't have the time series data in the past.
>> There is a workaround of appending "or on() vector(0)" to the alert 
>> expression, but this doesn't work for metrics with distinct label sets.
>>
>
> While:
>
>     foo_metric or on() vector(0)
>
> ...does always give you an output time series, that default fill-in series 
> will indeed not have any labels at all.
>
> The next best thing you can do is join in a metric that always exists for 
> the same job (like the "up" metric), so that you at least get the target 
> labels in the output. Like:
>
>     foo_metric{job="myjob"} or on() up{job="myjob"}
>
> That still does not give you the "api", "method", "exit_status", etc. 
> labels that would be present if you had data on the left side. That's fine 
> if the left side is an aggregation over the whole job/instance anyway 
> (since then the output would lose those labels), but if you want a 
> specific label set to always be present in the output, you could:
>
> * Always pre-initialize all possible counter series combinations in the 
> instrumented application, so that they are exposed immediately after a 
> counter reset as well (without having received increments yet)
> * Force the right-hand side to have a specific label set using the 
> "label_replace()" function, as sketched below
>
> All of that is, by the way, not specific to the increase() function; it's 
> just a general question of how to deal with non-existent time series. 
> There's also a good blog post about it: 
> https://www.robustperception.io/existential-issues-with-metrics/
>  
>
>> Could you please let me know if you have come across such an issue and 
>> know of a possible workaround?
>>
>> Thanks,
>> Amar
>>
>
>
> -- 
> Julius Volz
> PromLabs - promlabs.com
>
