On Monday, 19 September 2022 at 04:14:14 UTC+1 [email protected] wrote:
> The subquery you mentioned above:
> *max_over_time((time() * up)[24h:5m])* -----> In this query we are
> getting values for both up and down, but for the servers which were down I
> am getting the wrong time (i.e. the year was 1970), and for the servers which
> were up, the timestamp was correct.

This will give zero if the servers have been down for more than 24 hours, i.e. the value of the "up" metric is zero at every point during the last 24 hours. A timestamp of zero is displayed as 1 January 1970, which is why you see that year. Essentially, a result of zero here means "server has been down for more than 24 hours".

You can increase the size of the time window from 24h to whatever is needed, but the query will become more expensive as it has to look over a wider range of data.

As far as I know, there is no existing metric you can query which says "the time a scrape was last successful". Given that the Prometheus server could be restarted at any time, this would involve maintaining state when the server is shut down and restarted.

You could perhaps synthesise one using recording rules. That is: I believe it's possible to write a recording rule which depends on its own previous value. Without careful rule writing, it might still fail if the Prometheus server itself is shut down for more than 5 minutes.

> Or
> *max_over_time((timestamp(up == 1))[24h:5m])* ------> With this query we are
> getting output for the servers which were up; I am not getting a value for
> those which were down.

Again for the same reason: the servers which have been down for more than 24 hours have an 'up' value of 0, so the filter "up == 1" excludes them at all points in the 24 hour period that you are querying.
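
For the recording-rule idea mentioned above, here is a minimal, untested sketch of a rule that carries its own previous value forward. The group name, the rule name `instance:up:last_success_timestamp_seconds`, the 1m interval and the 1h carry-over window are all my own assumptions, and it assumes a Prometheus version that provides `last_over_time`:

```yaml
groups:
  - name: last_successful_scrape   # hypothetical group name
    interval: 1m                   # assumed evaluation interval
    rules:
      # Hypothetical rule name. For targets that are up right now, record a
      # fresh timestamp; for all others, copy forward the most recent value
      # this same rule recorded within the last hour.
      - record: instance:up:last_success_timestamp_seconds
        expr: |
          timestamp(up == 1)
          or
          last_over_time(instance:up:last_success_timestamp_seconds[1h])
```

If this behaves as intended, `time() - instance:up:last_success_timestamp_seconds` would give the number of seconds since each target was last seen up. Using `last_over_time` with a 1h range rather than the bare metric is meant to survive a Prometheus outage longer than the 5 minute staleness window, up to the chosen range. Targets that were already down when the rule was first loaded will have no value until they come up once.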

