Hello Julius
* The rule is something like this:
  - name: ServerDown
    rules:
      - alert: Server-InstanceDown
        expr: probe_success{job="blackbox_icmp-server"} == 0
        for: 1m
* When alerting stops working, the hosts stay down for hours without an alert
until I restart Prometheus and the Blackbox exporters. After the restart,
everything is normal again.
* The underlying metric (probe_success) goes to 0 when a host is down, but the
alert never moves to Pending or Firing.
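* To rule out the case you describe (the exporter itself not being scraped, so
probe_success is absent), I could add companion rules roughly like the sketch
below. The alert names and the 5m durations are placeholders I made up; only
the job label comes from our existing rule.

```yaml
groups:
  - name: ServerDown
    rules:
      # Hypothetical rule: fires when Prometheus cannot scrape the
      # Blackbox exporter for a target, i.e. "up" is 0 for that target.
      - alert: BlackboxScrapeFailing
        expr: up{job="blackbox_icmp-server"} == 0
        for: 5m
      # Hypothetical rule: fires when no probe_success series exist
      # for this job at all, which the "== 0" rule can never detect.
      - alert: BlackboxProbeMetricAbsent
        expr: absent(probe_success{job="blackbox_icmp-server"})
        for: 5m
```

With only a job matcher, absent() returns a result only when every
probe_success series for the job is missing, so the up-based rule is the one
that would catch individual targets.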
Thanks
Paras.
On Mon, Sep 19, 2022 at 2:35 AM Julius Volz <[email protected]>
wrote:
> Hi Paras,
>
> Could you share more information about your setup:
>
> * What's the alerting rule that isn't working as intended?
> * For how long were the hosts down without getting alerted on?
> * What did the underlying metrics (e.g. "up" for the exporter's own scrape
> health and "probe_success" for the backend probe health) collected by the
> Blackbox Exporter look like at the time when the alert should have been
> firing, but didn't?
>
> One possibility is that your Blackbox exporter itself couldn't be scraped
> anymore, in which case its "up" metric would be 0 and the "probe_success"
> metric would be absent (and thus any alerts based on that metric would
> never fire).
>
> Regards,
> Julius
>
> On Thu, Sep 15, 2022 at 6:33 PM Paras pradhan <[email protected]>
> wrote:
>
>> Hello,
>>
>> We use Prometheus, Alertmanager and blackbox-exporter to check whether
>> hosts respond to ICMP. The host count is 1K+. We noticed that sometimes,
>> seemingly at random, alerts are not generated (Prometheus dashboard -->
>> Alerts) when the hosts/targets are actually down. Restarting Prometheus,
>> Alertmanager and the blackbox-exporters fixes the issue. Nothing stands
>> out in the logs. How do I troubleshoot this, and is there anything like
>> cached data in Prometheus that needs to be cleared?
>>
>> Thanks
>> Paras.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Prometheus Users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/prometheus-users/6bfb92dc-2a18-44d9-8fda-d6f84efba0e7n%40googlegroups.com
>>
>
>
> --
> Julius Volz
> PromLabs - promlabs.com
>