On 06.03.21 11:45, Ben Kochie wrote:
Yes, this is what the `up` metric provides. There's also `scrape_duration_seconds`, which reports how long the scrape took. This makes it easier to see timeouts.
Hi
a few additions from https://www.omerlh.info/2019/03/04/keeping-prometheus-in-shape/
- Use `scrape_duration_seconds` for monitoring
- Use `sample_limit` to drop problematic targets
- Use `scrape_samples_scraped` to monitor the size of the metrics exposed by a specific target
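The limit for dropping problematic targets is set per scrape job; in the Prometheus configuration the field is called `sample_limit`. A minimal sketch, assuming a hypothetical job name and target address (neither comes from the article), with the limit value chosen only to match the 1000-sample threshold used in the alert:

```yaml
scrape_configs:
  - job_name: "awesome-services"      # hypothetical job name
    # If a target exposes more samples than this after metric relabeling,
    # the whole scrape is treated as failed and `up` becomes 0 for it.
    sample_limit: 1000                # assumed value; 0 means no limit
    static_configs:
      - targets: ["awesome-service.example:8080"]   # hypothetical target
```

Since a scrape that exceeds the limit fails outright, it may be worth alerting on `scrape_samples_scraped` at a threshold below the limit, so the growth is noticed before the target is dropped.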
    - alert: ScrapeDuration
      expr: max(scrape_duration_seconds) > 15
      for: 5m
      labels:
        severity: high
      annotations:
        summary: "Prometheus Scrape Duration is getting near the limit"

    - alert: TeamAwesomeScrapeSampleSize
      expr: max(scrape_samples_scraped{kubernetes_namespace="awesome"}) > 1000
      for: 5m
      labels:
        severity: high
      annotations:
        summary: "Oh No! One of our services is exposing too many metrics!"

kind regards
Evelyn

