On Tuesday, 28 November 2023 at 04:15:41 UTC Chris Siebenmann wrote:

The Blackbox exporter is a bit tricky to understand in relation to up{}, 
because unlike many exporters you create multiple scrape targets against 
(or through) the same exporter. This generally means you want to ignore 
the up{} metric for any particular blackbox probe and instead scrape 
Blackbox's metric endpoint and pay attention to its up{} (for alerts, 
for example).


I think that's worded in a misleading way.

Blackbox exporter does have a /metrics endpoint, but this is only for 
metrics internal to the operation of blackbox_exporter itself (e.g. memory 
stats, software version). You don't need to scrape this, but it gives you a 
little bit of extra info about how your exporter is performing.

Blackbox exporter's main interface is the /probe endpoint, where you tell 
it to run individual tests: /probe?target=xxx&module=yyy
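
For example, the scrape job for /probe usually looks something like this
(a sketch only: it assumes blackbox_exporter is listening on
127.0.0.1:9115 and that you have an 'http_2xx' module defined in
blackbox.yml):

scrape_configs:
- job_name: blackbox_http
  metrics_path: /probe
  params:
    module: [http_2xx]           # module name defined in blackbox.yml
  static_configs:
  - targets:
    - https://example.com        # the thing you actually want to probe
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target # becomes ?target=... on the /probe request
  - source_labels: [__param_target]
    target_label: instance       # keep the probed target as the instance label
  - target_label: __address__
    replacement: 127.0.0.1:9115  # the address Prometheus actually scrapes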

The 'up' metric is generated by Prometheus itself, and only tells you that 
Prometheus was able to communicate with the exporter and get some results 
(without a 4xx/5xx error, for example).  So it's correct to say that you're 
not interested in the 'up' metric for scrapes to /probe, since it will 
always be 1 unless blackbox_exporter itself is badly broken, and you're 
interested in probe_success instead.
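
To make the distinction concrete (the job name here is just a placeholder):

    # 1 as long as Prometheus could reach blackbox_exporter and get a scrape back
    up{job="blackbox_http"}

    # 1 only if the probe of the real target behind /probe?target=... succeeded
    probe_success{job="blackbox_http"}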

This is pretty easy to arrange in alerting rules. Here's a starting point:

groups:
- name: UpDown
  rules:
  - alert: UpDown
    expr: up == 0
    for: 3m
    keep_firing_for: 3m
    labels:
      severity: critical
    annotations:
      summary: 'Scrape failed: host is down or scrape endpoint down/unreachable'
- name: BlackboxRules
  rules:
  - alert: ProbeFail
    expr: probe_success == 0
    for: 3m
    keep_firing_for: 3m
    labels:
      severity: critical
    annotations:
      description: |
        {{ $labels.instance }} ({{ $labels.module }}) probe is failing
      summary: Probed service is down

For Grafana I'd probably just make two dashboards, but if you really want a 
grand summary of all "problems" then you can simply use a PromQL expression 
like this:

    up == 0 or probe_success == 0

The "or" operator 
<https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators>
 
in PromQL is not a boolean: it's more like a set union operator.  It will 
give you all the values of the "up" vector where the value is 0, along with 
all values of the "probe_success" vector where the value is 0 (except for 
values of probe_success == 0 which have *exactly* the same labels as up == 
0, but those are unlikely anyway)

The consumer of this query is going to see a mixture of up{...} and 
probe_success{...} metrics, all with value 0.
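
For example (label values invented purely for illustration), the result 
vector could contain entries like:

    up{job="node", instance="host1:9100"}  0
    probe_success{job="blackbox_http", instance="https://example.com"}  0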

(There are other multi-target 
indirect exporters like Blackbox. I believe that the SNMP exporter is 
another one where you often have one exporter separately scraping a lot 
of targets, and each target will have its own up{} metric that you 
probably want to ignore.)


The first part of that is correct: SNMP exporter uses 
/snmp?target=xxx&module=yyy&auth=zzz.

But the second part is wrong: if SNMP exporter fails to talk to the target 
then it returns an empty scrape with a 4xx/5xx error code, which Prometheus 
turns into up==0.  So you definitely *do* want to alert on up==0 in this 
case, as that's how you detect a device which is failing to respond to SNMP.
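
The scrape configuration follows the same indirect pattern as blackbox. 
As a sketch (the module/auth names and the 127.0.0.1:9116 address are 
assumptions that depend on your snmp.yml):

scrape_configs:
- job_name: snmp
  metrics_path: /snmp
  params:
    module: [if_mib]             # assumed module name from your snmp.yml
    auth: [public_v2]            # assumed auth name (newer snmp_exporter releases)
  static_configs:
  - targets:
    - 192.0.2.1                  # the device to poll over SNMP
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: 127.0.0.1:9116  # snmp_exporter address

With that layout, the UpDown rule above already covers unreachable SNMP 
devices, because a failed walk comes back as up{job="snmp"} == 0.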

In our environment, it's useful for us to have a granular view of what 
has failed. That a device has stopped pinging is a different issue than 
its node_exporter not being up, so our dashboards (and alerts) reflect 
that.


I agree with that. Different metrics inherently have different meanings, 
and although 'up' and 'probe_success' have similar 0/1 semantics, there's 
other information you can get from blackbox_exporter when probe_success==0 
which can tell you more about the nature of the problem (e.g. failure to 
connect, failure to resolve a DNS name, TLS certificate validation failure, 
etc.).
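
For example (metric names as exposed by blackbox_exporter; which ones 
appear depends on the module you probe with):

    probe_http_status_code          # last HTTP status code returned
    probe_dns_lookup_time_seconds   # time spent resolving the target's name
    probe_ssl_earliest_cert_expiry  # Unix timestamp of the earliest cert expiry in the chain
    probe_failed_due_to_regex       # 1 if a configured body regexp check failed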
