Correct. Restarting Prometheus does fix it.
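
Since restarting Prometheus alone clears the problem, it may help to have Prometheus alert on its own rule-evaluation health the next time it happens. This is a minimal sketch that assumes a rule file loaded by this same server. The group name `PrometheusSelfMonitoring`, the alert names and the 5m windows are hypothetical choices, not taken from this thread:

```yaml
groups:
  - name: PrometheusSelfMonitoring  # hypothetical group name
    rules:
      # At least one rule evaluation failed in the last 5 minutes.
      - alert: RuleEvaluationFailures
        expr: increase(prometheus_rule_evaluation_failures_total[5m]) > 0
      # A rule group skipped evaluation cycles in the last 5 minutes,
      # e.g. because a single evaluation took longer than its interval.
      - alert: RuleGroupIterationsMissed
        expr: increase(prometheus_rule_group_iterations_missed_total[5m]) > 0
```

These rules run inside the same Prometheus, so they cannot catch a complete stall of rule evaluation. Catching that would need a second Prometheus scraping this server's own /metrics endpoint.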

On Mon, Sep 19, 2022 at 3:44 PM Brian Candler <[email protected]> wrote:

> "Restarting prometheus, alertmanager and blackbox-exporter fixes the issue"
>
> Which one of these fixes the issue?  From what you've said, I am guessing
> that restarting only prometheus would do it - since you're saying you see
> no alerts in the Prometheus UI, not even in "pending" state.
>
> On Monday, 19 September 2022 at 21:39:11 UTC+1 [email protected] wrote:
>
>> Prometheus: 2.38.0
>> Alertmanager: 0.24.0
>> Blackbox exporter: 0.22.0
>>
>> probe_success{job="blackbox_icmp-server"} returns 0. I see it.
>>
>> Thanks
>> Paras.
>>
>> On Mon, Sep 19, 2022 at 3:32 PM Brian Candler <[email protected]> wrote:
>>
>>> Prometheus version? Alertmanager version?
>>>
>>> What if you enter the query
>>>     probe_success{job="blackbox_icmp-server"} == 0
>>> in the prometheus web interface (PromQL browser) while the problem is
>>> happening?  Does it show any results?
>>>
>>> On Monday, 19 September 2022 at 19:21:29 UTC+1 [email protected]
>>> wrote:
>>>
>>>> Hello Julius
>>>>
>>>> * The rule is something like this:
>>>>
>>>> groups:
>>>>   - name: ServerDown
>>>>     rules:
>>>>       - alert: Server-InstanceDown
>>>>         expr: probe_success{job="blackbox_icmp-server"} == 0
>>>>         for: 1m
>>>>
>>>> * When alerting is not working, the hosts stay down for hours without an
>>>> alert until I restart Prometheus and the blackbox exporters. After
>>>> restarting, everything is normal.
>>>>
>>>> * The underlying metric (probe_success) drops to 0 when a host is down,
>>>> but the alert doesn't change to Pending/Firing.
>>>>
>>>> Thanks
>>>> Paras.
>>>>
>>>> On Mon, Sep 19, 2022 at 2:35 AM Julius Volz <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Paras,
>>>>>
>>>>> Could you share more information about your setup:
>>>>>
>>>>> * What's the alerting rule that isn't working as intended?
>>>>> * For how long were the hosts down without getting alerted on?
>>>>> * What did the underlying metrics (e.g. "up" for the exporter's own
>>>>> scrape health and "probe_success" for the backend probe health) collected
>>>>> by the Blackbox Exporter look like at the time when the alert should have
>>>>> been firing, but didn't?
>>>>>
>>>>> One possibility is that your Blackbox exporter itself couldn't be
>>>>> scraped anymore, in which case its "up" metric would be 0 and the
>>>>> "probe_success" metric would be absent (and thus any alerts based on that
>>>>> metric would never fire).
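
Here is a minimal sketch of alerts covering that case. It assumes the job label `blackbox_icmp-server` from Paras's rule, and the group and alert names are hypothetical. `up == 0` catches a failed scrape for an individual target. `absent()` only returns a result when no `probe_success` series for the job exists at all:

```yaml
groups:
  - name: BlackboxScrapeHealth  # hypothetical group name
    rules:
      # Prometheus could not scrape the exporter for this target, so
      # probe_success for it goes missing instead of reporting 0.
      - alert: BlackboxScrapeFailed
        expr: up{job="blackbox_icmp-server"} == 0
        for: 1m
      # No probe_success series at all for the job, e.g. every scrape
      # is failing or the targets vanished from service discovery.
      - alert: BlackboxProbeSuccessAbsent
        expr: absent(probe_success{job="blackbox_icmp-server"})
        for: 5m
```

With 1K+ targets, the `absent()` rule only helps when the whole job disappears. The per-target `up` rule is what catches a single target whose scrape is failing.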
>>>>>
>>>>> Regards,
>>>>> Julius
>>>>>
>>>>> On Thu, Sep 15, 2022 at 6:33 PM Paras pradhan <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We use Prometheus, Alertmanager and blackbox-exporter to check whether
>>>>>> hosts respond to ICMP. There are 1K+ hosts. We noticed that sometimes,
>>>>>> seemingly at random, alerts are not generated (Prometheus dashboard -->
>>>>>> Alerts) even though the hosts/targets are actually down. Restarting
>>>>>> Prometheus, Alertmanager and the blackbox exporter fixes the issue. I
>>>>>> don't see anything that stands out in the logs. How do I troubleshoot
>>>>>> this, and is there anything like cached data in Prometheus that needs
>>>>>> to be cleared?
>>>>>>
>>>>>> Thanks
>>>>>> Paras.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Prometheus Users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/prometheus-users/6bfb92dc-2a18-44d9-8fda-d6f84efba0e7n%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/prometheus-users/6bfb92dc-2a18-44d9-8fda-d6f84efba0e7n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Julius Volz
>>>>> PromLabs - promlabs.com
>>>>>