*blackbox exporter config:*
icmp:
        prober: icmp
        icmp:
          preferred_ip_protocol: "ip4"
tcp:
        prober: tcp
        timeout: 5s
        tcp:
          preferred_ip_protocol: "ip4"
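
One thing that may be worth checking: the SSH scrape job further down requests `module: [ssh_banner]`, while the excerpt above only defines `icmp` and `tcp`. The `ssh_banner` module may simply be defined in a part of the file not pasted here. The small Python sketch below hard-codes the module names from this message (the helper name `undefined_modules` is made up for illustration) and lists any module a job asks for that the excerpt does not define.

```python
# Module names copied by hand from the configs in this message.
defined_modules = {"icmp", "tcp"}                          # blackbox exporter excerpt
requested_modules = {"PING": "icmp", "SSH": "ssh_banner"}  # scrape jobs

def undefined_modules(defined, requested):
    """Return {job_name: module} for jobs whose module is not defined."""
    return {job: mod for job, mod in requested.items() if mod not in defined}

missing = undefined_modules(defined_modules, requested_modules)
for job, mod in sorted(missing.items()):
    print(f"job {job}: module {mod!r} is not in the blackbox exporter excerpt")
```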

*Prometheus scrape config:*
global:
      scrape_interval: 60s
      evaluation_interval: 60s
scrape_configs:
      - job_name: PING
        metrics_path: /probe
        params:
          module: [icmp]
        file_sd_configs:
        - files:
          - '/etc/prometheus/targets/'
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
            regex: '([^:]+)(:[0-9]+)?'
            replacement: '${1}'
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prometheus-blackbox-exporter:9115
      - job_name: SSH
        metrics_path: /probe
        params:
          module: [ssh_banner]
        file_sd_configs:
        - files:
          - '/etc/prometheus/targets/'
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
            regex: '([^:]+)(:[0-9]+)?'
            replacement: '${1}:22'
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prometheus-blackbox-exporter:9115
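
To make the relabelling above easier to follow, here is a minimal Python sketch of what the first `replace` rule in each job does to `__address__`. It assumes the usual Prometheus behaviour that relabel regexes are anchored at both ends; the function name `probe_target` and the sample addresses are hypothetical, not taken from the real target files.

```python
import re

# Same pattern as in the relabel_configs above. Prometheus anchors relabel
# regexes at both ends, which re.fullmatch mimics here.
ADDRESS_RE = re.compile(r"([^:]+)(:[0-9]+)?")

def probe_target(address, replacement):
    """Emulate the 'replace' relabel step that fills __param_target.

    `replacement` uses the Prometheus ${1} syntax, which is translated to
    Python's group-reference form. Returns None when the regex does not
    match, in which case Prometheus leaves the target label unchanged.
    """
    match = ADDRESS_RE.fullmatch(address)
    if match is None:
        return None
    return match.expand(replacement.replace("${1}", r"\g<1>"))

# Hypothetical addresses, as they might appear in a file_sd target file.
for address in ["10.0.0.5", "10.0.0.5:9100", "host.example.com:22"]:
    ping = probe_target(address, "${1}")     # PING job: port stripped
    ssh = probe_target(address, "${1}:22")   # SSH job: port forced to 22
    print(address, "->", ping, "|", ssh)
```

In other words, whatever port the target file lists is dropped: the PING job probes the bare host and the SSH job always probes port 22.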

*Alert rules:*
        - alert: TargetDown
          expr: probe_success == 0
          for: 5s
          labels:
            severity: critical
          annotations:
            description: Service {{ $labels.instance }} is unreachable.
            value: DOWN ({{ $value }})
            summary: "Target {{ $labels.instance }} is down."

*Alert manager config:*
config.yml: |-
    global:
      resolve_timeout: 5m
      smtp_smarthost: mail
      smtp_from: alertmanager
      smtp_require_tls: false
    route:
      receiver: email-me
      group_by: [instance, alertname, job]
      group_wait: 45s
      group_interval: 5m
      repeat_interval: 24h
    receivers:
    - name: email-me
      email_configs:
      - to: alert
        send_resolved: true
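
To put rough numbers on how long a notification can take with the settings above, here is a back-of-the-envelope sketch in Python. It assumes the usual pipeline: a change in the probe result is seen at the next scrape, the alert state changes at a following rule evaluation, and Alertmanager sends updates for an already-notified group (including resolved alerts) on its `group_interval` timer. The results are worst-case upper bounds under those assumptions, not measurements, and the function names are illustrative.

```python
# Values copied from the configs in this message, in seconds.
SCRAPE_INTERVAL = 60
EVALUATION_INTERVAL = 60
FOR_DURATION = 5
GROUP_WAIT = 45
GROUP_INTERVAL = 5 * 60

def worst_case_firing_delay():
    """Upper bound from the target going down to the first notification."""
    # Wait for the next scrape, then the next rule evaluation (alert pending).
    # A non-zero 'for' can only be satisfied on a later evaluation, so it is
    # rounded up to a whole number of evaluation intervals.
    evaluations_for = -(-FOR_DURATION // EVALUATION_INTERVAL)  # ceiling division
    return (SCRAPE_INTERVAL + EVALUATION_INTERVAL
            + evaluations_for * EVALUATION_INTERVAL + GROUP_WAIT)

def worst_case_resolved_delay():
    """Upper bound from the target recovering to the 'resolved' notification."""
    return SCRAPE_INTERVAL + EVALUATION_INTERVAL + GROUP_INTERVAL

print("firing:  ", worst_case_firing_delay(), "s")
print("resolved:", worst_case_resolved_delay(), "s")
```

Under these assumptions the resolved e-mail can lag the actual recovery by up to about 7 minutes, which is in the same range as the roughly 5 minutes described in the original question.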

On Wednesday, April 20, 2022 at 8:29:10 PM UTC+8 Brian Candler wrote:

> blackbox_exporter monitoring TCP ports (e.g. for SSH) and ICMP (ping) 
> works fine.
>
> "but black box exporter detect the recover behavior after about 5mins"
>
> Black box exporter only performs a single test when you scrape it.  It 
> does not by itself do any recovery detection.  The problem is therefore 
> most likely with your prometheus scrape config or your alertmanager config.
>
> If you're having a problem, you'll need to be more specific:
> * show your blackbox_exporter config, your prometheus scrape config which 
> scrapes it, your alerting rules, and your alertmanager config (if using 
> alertmanager)
> * describe more clearly the behaviour you're seeing, and what you expected 
> to see.  (For example, are you waiting for a "recovery" E-mail from 
> alertmanager?)
>
> "And after the IP table is recovered, the alert for Ping can be cleared 
> after about 20mins, but SSH is still there."
>
> Either SSH is working and reachable, or it is not.  You can check the 
> results of blackbox_exporter tests by hand using curl, and also get 
> additional debugging information, like this:
>
> curl -g 'http://127.0.0.1:9115/probe?module=xxx&target=yyyy&debug=true'
>
> Here is an example:
>
> # curl -g 'http://localhost:9115/probe?module=icmp&target=1.2.3.4&debug=true'
> Logs for the probe:
> ts=2022-04-20T12:25:11.587855449Z caller=main.go:320 module=icmp target=1.2.3.4 level=info msg="Beginning probe" probe=icmp timeout_seconds=3
> ts=2022-04-20T12:25:11.588014456Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolving target address" ip_protocol=ip6
> ts=2022-04-20T12:25:11.588065658Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolving target address" ip_protocol=ip4
> ts=2022-04-20T12:25:11.588098688Z caller=icmp.go:91 module=icmp target=1.2.3.4 level=info msg="Resolved target address" ip=1.2.3.4
> ts=2022-04-20T12:25:11.588133368Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Creating socket"
> ts=2022-04-20T12:25:11.588188673Z caller=main.go:130 module=icmp target=1.2.3.4 level=debug msg="Unable to do unprivileged listen on socket, will attempt privileged" err="socket: permission denied"
> ts=2022-04-20T12:25:11.58829848Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Creating ICMP packet" seq=24581 id=190
> ts=2022-04-20T12:25:11.588348917Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Writing out packet"
> ts=2022-04-20T12:25:11.588470176Z caller=main.go:130 module=icmp target=1.2.3.4 level=info msg="Waiting for reply packets"
> ts=2022-04-20T12:25:14.588761946Z caller=main.go:130 module=icmp target=1.2.3.4 level=debug msg="Cannot get TTL from the received packet. 'probe_icmp_reply_hop_limit' will be missing."
> ts=2022-04-20T12:25:14.588979317Z caller=main.go:130 module=icmp target=1.2.3.4 level=warn msg="Timeout reading from socket" err="read ip 0.0.0.0: raw-read ip4 0.0.0.0: i/o timeout"
> ts=2022-04-20T12:25:14.589247538Z caller=main.go:320 module=icmp target=1.2.3.4 level=error msg="Probe failed" duration_seconds=3.001307309
>
>
>
> Metrics that would have been returned:
> # HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
> # TYPE probe_dns_lookup_time_seconds gauge
> probe_dns_lookup_time_seconds 0.000116077
> # HELP probe_duration_seconds Returns how long the probe took to complete in seconds
> # TYPE probe_duration_seconds gauge
> probe_duration_seconds 3.001307309
> # HELP probe_icmp_duration_seconds Duration of icmp request by phase
> # TYPE probe_icmp_duration_seconds gauge
> probe_icmp_duration_seconds{phase="resolve"} 0.000116077
> probe_icmp_duration_seconds{phase="rtt"} 0
> probe_icmp_duration_seconds{phase="setup"} 0.000212886
> # HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
> # TYPE probe_ip_addr_hash gauge
> probe_ip_addr_hash 3.268949123e+09
> # HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
> # TYPE probe_ip_protocol gauge
> probe_ip_protocol 4
> # HELP probe_success Displays whether or not the probe was a success
> # TYPE probe_success gauge
> probe_success 0
>
>
>
> Module configuration:
> prober: icmp
> timeout: 3s
> http:
>     ip_protocol_fallback: true
>     follow_redirects: true
> tcp:
>     ip_protocol_fallback: true
> icmp:
>     ip_protocol_fallback: true
> dns:
>     ip_protocol_fallback: true
>
>
> Look at "probe_success" for the overall result.
>
> You can also use the PromQL browser in the Prometheus web interface: enter 
> "probe_success" as the query and look at the graph tab. You'll see the 
> history of your blackbox exporter probes.
>
> On Wednesday, 20 April 2022 at 12:37:17 UTC+1 ninag...@gmail.com wrote:
>
>> Hi guys,
>>
>> We are using black box exporter to monitor ssh and ping.
>>
>> For ssh, (we monitor the port 22) if we stop sshd service, actually the 
>> service will be auto-recovered, but black box exporter detect the recover 
>> behavior after about 5mins.
>>
>> For ping, we use icmp module to monitor system ping, we deleted the IP 
>> tables, then Prometheus triggered 2 alerts, one is SSH is failed, the other 
>> is Ping is failed. And after the IP table is recovered, the alert for Ping 
>> can be cleared after about 20mins, but SSH is still there.
>>
>> So is it a good approach to use blackbox exporter to monitor SSH and PING?
>>
>
