[prometheus-users] Prometheus alerting rules test for counters requiring multiple day span

Debashish Ghosh Tue, 10 Mar 2020 08:31:13 -0700

Hi,
I have an SLA metric that needs to be 99.95% or above. I am using the
expression
100-(((30*24*60*60) -
increase(process_uptime_seconds{job="Interop-InboundApi"}[30d]))/(30*24*60*60))*100
that runs for 15 minutes. In other words, the difference between the total
number of seconds in 30 days and the number of seconds the server was up
in the last 30 days should be less than 0.05% of the 30-day window.

I am having difficulty writing a test for this, since I see that the alert
rules test doesn't allow '1d' as the interval. Should I use something like
1m as the interval with values: '0+60x43200', so that the number of steps
equals the number of minutes in 30 days? Also, what eval_time should I use
in this case? I am using 15m, but that doesn't yield the required result.
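
As a rough sanity check of the arithmetic behind the uptime expression and the proposed test series, here is a minimal Python sketch. It is not Prometheus code: the function names are hypothetical, and it only reproduces the formula and expands the 'start+stepxrepeats' notation. It does not model how increase() extrapolates over a range.

```python
WINDOW_SECONDS = 30 * 24 * 60 * 60  # 30 days, as in the expression above


def uptime_sla_percent(uptime_increase_seconds, window_seconds=WINDOW_SECONDS):
    """Mirror of 100 - ((window - increase) / window) * 100."""
    missing = window_seconds - uptime_increase_seconds
    return 100 - (missing / window_seconds) * 100


def allowed_downtime_seconds(target_percent, window_seconds=WINDOW_SECONDS):
    """Seconds of missing uptime that still meet the target percentage."""
    return window_seconds * (100 - target_percent) / 100


def expand_series(start, step, repeats):
    """Expand the 'start+stepxrepeats' test notation into sample values."""
    # The initial value is followed by `repeats` increments,
    # so the result holds repeats + 1 samples.
    return [start + step * i for i in range(repeats + 1)]


samples = expand_series(0, 60, 43200)
print(len(samples), samples[-1])
print(round(allowed_downtime_seconds(99.95), 3))
print(uptime_sla_percent(WINDOW_SECONDS))
```

With a 1m interval, '0+60x43200' expands to 43201 samples, and the last one has the value 2592000 (the number of seconds in 30 days), 43200 minutes after the first. A 99.95% target over 30 days leaves about 1296 seconds (21.6 minutes) of missing uptime. This suggests that an eval_time of 15m sees only the first few samples of the series, although that is an inference from the arithmetic and not something the post confirms.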


I have a similar problem for the latency SLA. I am using a histogram for
that and am trying to get the percentage of messages that fall within the
1-second bucket. I am using the expression below:
sum(rate(http_server_requests_seconds_bucket{le="1.0",uri="/inboundapi/message/v2"}[30d])) by (job)
/ sum(rate(http_server_requests_seconds_count{uri="/inboundapi/message/v2"}[30d])) by (job)
* 100
To test this, I need an approach similar to the one for the uptime case.
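
To make the intent of the latency expression concrete: both rate() calls use the same [30d] window, so the per-second division cancels and the result is the ratio of the two counter increases, times 100. The Python sketch below shows only that arithmetic. The function name and the sample numbers are hypothetical, and returning NaN when there is no traffic is an assumption made for this illustration.

```python
import math


def percent_within_bucket(bucket_increase, count_increase):
    """Share of requests counted in the le="1.0" bucket, as a percentage."""
    if count_increase == 0:
        # Assumption: with no requests in the window there is no meaningful ratio.
        return math.nan
    return bucket_increase / count_increase * 100


# Hypothetical numbers: 9990 of 10000 requests finished within 1 second.
print(percent_within_bucket(9990, 10000))
```

For a rule test, this means the input series for the le="1.0" bucket and for the _count metric only need to grow in the proportion you want the expression to report.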

Thanks
Debashish
