On 06.03.21 11:45, Ben Kochie wrote:
Yes, this is what the `up` metric provides. There's also `scrape_duration_seconds`, which provides the time it took to perform the scrape. This makes it easier to see timeouts.
Hi

A few additions from https://www.omerlh.info/2019/03/04/keeping-prometheus-in-shape/:

- Use scrape_duration_seconds to monitor how long scrapes take
- Use sample_limit to drop problematic targets
- Use scrape_samples_scraped to monitor the number of samples exposed by a specific target
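
For the second point, the setting in the scrape config is called `sample_limit`. A rough sketch, where the job name, target, and the limit of 1000 are placeholders I made up for illustration. Note that when a target goes over the limit the whole scrape is treated as failed, so it shows up as `up == 0` rather than being silently truncated:

```yaml
scrape_configs:
  - job_name: awesome-services                  # hypothetical job name
    scrape_interval: 30s                        # assumed interval
    sample_limit: 1000                          # assumed value; scrapes exceeding it fail
    static_configs:
      - targets: ['awesome.example.org:8080']   # placeholder target
```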

alert: ScrapeDuration
expr: max(scrape_duration_seconds) > 15
for: 5m
labels:
  severity: high
annotations:
  summary: "Prometheus Scrape Duration is getting near the limit"


alert: TeamAwesomeScrapeSampleSize
expr: max(scrape_samples_scraped{kubernetes_namespace="awesome"}) > 1000
for: 5m
labels:
  severity: high
annotations:
  summary: "Oh No! One of our services is exposing too many metrics!"

kind regards
Evelyn
