On Tuesday, 28 November 2023 at 04:15:41 UTC Chris Siebenmann wrote:

> The Blackbox exporter is a bit tricky to understand in relation to up{}, because unlike many exporters you create multiple scrape targets against (or through) the same exporter. This generally means you want to ignore the up{} metric for any particular blackbox probe and instead scrape Blackbox's metric endpoint and pay attention to its up{} (for alerts, for example).
I think that's worded in a misleading way. Blackbox exporter does have a /metrics endpoint, but it only exposes metrics internal to the operation of blackbox_exporter itself (e.g. memory stats, software version). You don't need to scrape it, but it gives you a little extra information about how your exporter is performing.

Blackbox exporter's main interface is the /probe endpoint, where you tell it to run individual tests:

    /probe?target=xxx&module=yyy

The 'up' metric is generated by Prometheus itself, and only tells you that Prometheus was able to communicate successfully with the exporter and get some results (without a 4xx/5xx error, for example). So it's correct to say that you're not interested in the 'up' metric for scrapes to /probe, since it will always be 1 unless blackbox_exporter itself is down or badly broken. You're interested in probe_success instead.

This is pretty easy to arrange in alerting rules. Here's a starting point:

groups:
- name: UpDown
  rules:
  - alert: UpDown
    expr: up == 0
    for: 3m
    keep_firing_for: 3m
    labels:
      severity: critical
    annotations:
      summary: 'Scrape failed: host is down or scrape endpoint down/unreachable'
- name: BlackboxRules
  rules:
  - alert: ProbeFail
    expr: probe_success == 0
    for: 3m
    keep_firing_for: 3m
    labels:
      severity: critical
    annotations:
      description: |
        {{ $labels.instance }} ({{ $labels.module }}) probe is failing
      summary: Probed service is down

For Grafana I'd probably just make two dashboards, but if you really want a grand summary of all "problems" then you can simply use a PromQL expression like this:

    up == 0 or probe_success == 0

The "or" operator <https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators> in PromQL is not a boolean: it's more like a set union operator.
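To make the set-union behaviour concrete, here is a toy Python model of "up == 0 or probe_success == 0". It is only a sketch, not Prometheus's implementation: it assumes default vector matching (all labels except the metric name, with no on()/ignoring() modifiers) and ignores timestamps, staleness and duplicate-series errors. The helper names (filter_eq, match_key, promql_or) and the jobs/targets in the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy instant vector: each sample is (labels, value), where the labels
# include "__name__" just as Prometheus series do.
Labels = Dict[str, str]
Sample = Tuple[Labels, float]


def filter_eq(vector: List[Sample], scalar: float) -> List[Sample]:
    """Model `vector == scalar` without `bool`: keep matching samples, name kept."""
    return [(labels, value) for labels, value in vector if value == scalar]


def match_key(labels: Labels) -> frozenset:
    """Default vector matching compares every label except the metric name."""
    return frozenset((k, v) for k, v in labels.items() if k != "__name__")


def promql_or(lhs: List[Sample], rhs: List[Sample]) -> List[Sample]:
    """Model `lhs or rhs`: all of lhs, plus rhs samples whose labels match nothing in lhs."""
    lhs_keys = {match_key(labels) for labels, _ in lhs}
    return list(lhs) + [
        (labels, value) for labels, value in rhs if match_key(labels) not in lhs_keys
    ]


# Hypothetical scrape results; job names and targets are made up.
up = [
    ({"__name__": "up", "job": "node", "instance": "web1:9100"}, 0.0),
    ({"__name__": "up", "job": "node", "instance": "web2:9100"}, 1.0),
    ({"__name__": "up", "job": "blackbox", "instance": "https://example.com"}, 1.0),
]
probe_success = [
    ({"__name__": "probe_success", "job": "blackbox", "instance": "https://example.com"}, 0.0),
]

if __name__ == "__main__":
    # Equivalent of: up == 0 or probe_success == 0
    for labels, value in promql_or(filter_eq(up, 0), filter_eq(probe_success, 0)):
        print(labels["__name__"], labels["instance"], value)
```

Note that the blackbox target's own up sample is 1 even though its probe failed, which is why probe_success is the series to watch for probes. The result is a mixture of up and probe_success samples, and an RHS sample is dropped only if its labels, ignoring the metric name, exactly match a sample already on the left-hand side.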
The expression will give you all the values of the "up" vector where the value is 0, along with all values of the "probe_success" vector where the value is 0 (except for values of probe_success == 0 which have *exactly* the same labels as up == 0, apart from the metric name, but those are unlikely anyway). The consumer of this query is going to see a mixture of up{...} and probe_success{...} metrics, all with value 0.

> (... there are other multi-target indirect exporters like Blackbox. I believe that the SNMP exporter is another one where you often have one exporter separately scraping a lot of targets, and each target will have its own up{} metric that you probably want to ignore.)

The first part of that is correct: SNMP exporter uses /snmp?target=xxx&module=yyy&auth=zzz. But the second part is wrong: if SNMP exporter fails to talk to the target, it returns an empty scrape with a 4xx/5xx error code, which Prometheus turns into up==0. So you definitely *do* want to alert on up==0 in this case, as that's how you detect a device which is failing to respond to SNMP.

> In our environment, it's useful for us to have a granular view of what has failed. That a device has stopped pinging is a different issue than its node_exporter not being up, so our dashboards (and alerts) reflect that.

I agree with that. Different metrics inherently have different meanings, and although 'up' and 'probe_success' have similar 0/1 semantics, there's other information you can get from blackbox_exporter when probe_success==0 which can tell you more about the nature of the problem (e.g. failure to connect, failure to resolve the DNS name, TLS certificate validation failure, etc.).

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.