Hi, I am currently using Spring's Actuator/Micrometer to expose metrics that are scraped by Prometheus. The framework generates a metric called *process_uptime_seconds*, which is the number of seconds my app has been running in a VM. I have *2 VMs* running my app to provide high availability of 99.95%.
I am using the following formula to calculate the SLA:

100 - (((30*24*60*60) - increase(process_uptime_seconds{job="Interop-InboundApi"}[30d])) / (30*24*60*60)) * 100

Here 30*24*60*60 is the number of seconds in 30 days, and subtracting the increase in process_uptime_seconds from it gives the number of seconds the app was down in a VM. The problem with this approach is that we periodically have to *restart* the service to apply patches, and we do it one VM at a time so that there is no downtime. But since the above formula produces one time series per VM instance, the SLA goes down because both servers are restarted one after the other. Is there a way to take this into account and calculate the SLA based only on the time *when both servers were down together*?

Thanks,
Debashish
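P.S. To make the problem concrete, here is a small Python sketch of the arithmetic. It is not a Prometheus query. The outage intervals and the 30-minute restart duration are made-up numbers, and the function names are hypothetical. It applies the formula above to each VM separately, then computes the figure I actually want, which counts only the seconds when both VMs were down at the same time.

```python
WINDOW_SECONDS = 30 * 24 * 60 * 60  # 30 days, as in the formula above


def per_instance_sla(uptime_increase_seconds):
    """Same arithmetic as the PromQL formula, applied to one VM."""
    downtime = WINDOW_SECONDS - uptime_increase_seconds
    return 100 - (downtime / WINDOW_SECONDS) * 100


def overlap_seconds(outages_a, outages_b):
    """Total seconds during which both VMs were down at the same time.

    Each argument is a list of (start, end) outage intervals in seconds.
    Intervals within one list are assumed not to overlap each other.
    """
    total = 0
    for start_a, end_a in outages_a:
        for start_b, end_b in outages_b:
            total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


def service_sla(outages_a, outages_b):
    """SLA that counts only the time when both VMs were down together."""
    both_down = overlap_seconds(outages_a, outages_b)
    return 100 - (both_down / WINDOW_SECONDS) * 100


# Hypothetical rolling restart: VM A is down for 30 minutes, then VM B.
vm_a_outages = [(1000, 2800)]
vm_b_outages = [(2800, 4600)]

sla_a = per_instance_sla(WINDOW_SECONDS - 1800)
sla_b = per_instance_sla(WINDOW_SECONDS - 1800)
sla_service = service_sla(vm_a_outages, vm_b_outages)
print(f"VM A: {sla_a:.4f}%  VM B: {sla_b:.4f}%  service: {sla_service:.4f}%")
```

With these numbers each VM on its own falls below the 99.95% target, while the service-level figure stays at 100% because the two outages never overlap. That second number is what I would like to get out of Prometheus.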