Prometheus version? Alertmanager version?

What do you get if you enter the query
    probe_success{job="blackbox_icmp-server"} == 0
in the Prometheus web interface (PromQL browser) while the problem is 
happening?  Does it return any results?
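
If that query comes back empty even though hosts are down, the case Julius 
described (the exporter scrape failing, so probe_success is absent) is worth 
ruling out. A rough sketch of two extra rules that would make that situation 
visible is below. The group name, alert names and "for" durations are 
placeholders I made up; only the job label is taken from your rule.

```yaml
# Sketch only: group name, alert names and durations are hypothetical.
groups:
  - name: ServerDownDiagnostics
    rules:
      # Fires when the scrape of the exporter's probe endpoint fails,
      # i.e. Prometheus could not get a response from the Blackbox exporter.
      - alert: BlackboxScrapeFailed
        expr: up{job="blackbox_icmp-server"} == 0
        for: 5m
      # Fires when the job has no probe_success series at all, which a
      # "probe_success == 0" rule cannot detect on its own.
      - alert: ProbeSuccessAbsent
        expr: absent(probe_success{job="blackbox_icmp-server"})
        for: 5m
```

Note that absent() only fires when every probe_success series for the job is 
missing, so the "up" rule is the one that would catch individual targets.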

On Monday, 19 September 2022 at 19:21:29 UTC+1 [email protected] wrote:

> Hello Julius
>
> * The rule is something like this:
>
> - name: ServerDown
>   rules:
>   - alert: Server-InstanceDown
>     expr: probe_success{job="blackbox_icmp-server"} == 0
>     for: 1m
>
> * When alerting is not working, the hosts are down for hours until I restart 
> Prometheus and the Blackbox exporters. After restarting, everything is normal.
>
> * The underlying metric (probe_success) goes to 0 when a host is down, but the 
> alert doesn't change to Pending/Firing.
>
> Thanks
> Paras.
>
> On Mon, Sep 19, 2022 at 2:35 AM Julius Volz <[email protected]> wrote:
>
>> Hi Paras,
>>
>> Could you share more information about your setup:
>>
>> * What's the alerting rule that isn't working as intended?
>> * For how long were the hosts down without getting alerted on?
>> * What did the underlying metrics (e.g. "up" for the exporter's own 
>> scrape health and "probe_success" for the backend probe health) collected 
>> by the Blackbox Exporter look like at the time when the alert should have 
>> been firing, but didn't?
>>
>> One possibility is that your Blackbox exporter itself couldn't be scraped 
>> anymore, in which case its "up" metric would be 0 and the "probe_success" 
>> metric would be absent (and thus any alerts based on that metric would 
>> never fire).
>>
>> Regards,
>> Julius
>>
>> On Thu, Sep 15, 2022 at 6:33 PM Paras pradhan <[email protected]> 
>> wrote:
>>
>>> Hello,
>>>
>>> We use Prometheus, Alertmanager and blackbox-exporter to check whether hosts 
>>> respond to ICMP. There are 1K+ hosts. We noticed that sometimes, at random, 
>>> alerts are not generated (Prometheus dashboard --> Alerts) 
>>> when the hosts/targets are actually down. Restarting Prometheus, 
>>> Alertmanager and blackbox-exporter fixes the issue. I don't see anything that 
>>> stands out in the logs. How do I troubleshoot this, and is there anything like 
>>> cached data in Prometheus that needs to be cleared?
>>>
>>> Thanks
>>> Paras.
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/prometheus-users/6bfb92dc-2a18-44d9-8fda-d6f84efba0e7n%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/prometheus-users/6bfb92dc-2a18-44d9-8fda-d6f84efba0e7n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
>>
>> -- 
>> Julius Volz
>> PromLabs - promlabs.com
>>
>

