Hi,
  I am currently using Spring's Actuator/Micrometer to expose metrics 
that are scraped by Prometheus.
The framework generates a metric called *process_uptime_seconds*, which is 
the number of seconds my app has been running in a VM. My app runs on 
*2 VMs* to provide high availability of 99.95%.

I am using the following formula to calculate the SLA:

    100 - (((30*24*60*60) - increase(process_uptime_seconds{job="Interop-InboundApi"}[30d])) / (30*24*60*60)) * 100

30*24*60*60 is the number of seconds in 30 days, and subtracting the 
increase in process_uptime_seconds from it gives the number of seconds the 
app was down on a VM.
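
As a rough illustration of the arithmetic (a sketch in plain Python, not PromQL; the function names and the 5-minute restart are assumptions made up for the example):

```python
SECONDS_IN_30_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds


def sla_percent(uptime_increase_seconds, window_seconds=SECONDS_IN_30_DAYS):
    """Same arithmetic as the formula: 100 - (downtime / window) * 100."""
    downtime_seconds = window_seconds - uptime_increase_seconds
    return 100 - (downtime_seconds / window_seconds) * 100


def allowed_downtime_seconds(target_percent, window_seconds=SECONDS_IN_30_DAYS):
    """Downtime budget implied by an availability target over the window."""
    return window_seconds * (100 - target_percent) / 100


# Hypothetical case: one VM was restarted once and was down for 5 minutes.
print(sla_percent(SECONDS_IN_30_DAYS - 300))
# Downtime budget for the 99.95 % target over 30 days.
print(allowed_downtime_seconds(99.95))
```

Over a 30-day window a 99.95 % target leaves a budget of roughly 1296 seconds (about 21.6 minutes), so every restart counted against a single VM uses up a noticeable part of it.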

But the problem with this approach is that we periodically have to 
*restart* the service to apply patches, and we do so one VM at a time so 
that there is no downtime.

But since the above formula produces one time series per VM instance, the 
SLA goes down, because each server is restarted one after the other.

Is there a way to take this into account and calculate the SLA based only 
on the time *when both servers were down together*?
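
To make the question concrete, here is a toy sketch in plain Python of the two ways of counting. The sample lists and function names are made up for illustration (1 = up, 0 = down, one sample per scrape); they are not real metrics or any Prometheus API.

```python
def availability_percent(samples):
    """Share of samples in which the service was up, as a percentage."""
    return 100 * sum(samples) / len(samples)


def combined_availability_percent(vm_a, vm_b):
    """Count a sample as down only when both VMs were down at the same time."""
    either_up = [max(a, b) for a, b in zip(vm_a, vm_b)]
    return availability_percent(either_up)


# Hypothetical rolling restart: VM A is down at sample 2, VM B at sample 5.
vm_a = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
vm_b = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1]

print(availability_percent(vm_a))                 # per-VM view of A
print(availability_percent(vm_b))                 # per-VM view of B
print(combined_availability_percent(vm_a, vm_b))  # view of the service as a whole
```

In this made-up data each VM on its own shows a dip, while the combined view does not, because the two outages never overlap. The combined number is what I would like the SLA to reflect.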

Thanks
Debashish
  
