Correct. Restarting Prometheus does fix it.

On Mon, Sep 19, 2022 at 3:44 PM Brian Candler <[email protected]> wrote:
> "Restarting prometheus, alertmanager and blackbox-exporter fixes the issue"
>
> Which one of these fixes the issue? From what you've said, I am guessing
> that restarting only prometheus would do it - since you're saying you see
> no alerts in the Prometheus UI, not even in "pending" state.
>
> On Monday, 19 September 2022 at 21:39:11 UTC+1 [email protected] wrote:
>
>> Prometheus : 2.38.0
>> Alertmanager : 0.24.0
>> Blackbox: 0.22.0
>>
>> probe_success{job="blackbox_icmp-server"} returns 0. I see it.
>>
>> Thanks
>> Paras.
>>
>> On Mon, Sep 19, 2022 at 3:32 PM Brian Candler <[email protected]> wrote:
>>
>>> Prometheus version? Alertmanager version?
>>>
>>> What if you enter the query
>>> probe_success{job="blackbox_icmp-server"} == 0
>>> in the prometheus web interface (PromQL browser) while the problem is
>>> happening? Does it show any results?
>>>
>>> On Monday, 19 September 2022 at 19:21:29 UTC+1 [email protected]
>>> wrote:
>>>
>>>> Hello Julius
>>>>
>>>> * The rule is something like this:
>>>>
>>>> - name: ServerDown
>>>>   rules:
>>>>   - alert: Server-InstanceDown
>>>>     expr: probe_success{job="blackbox_icmp-server"} == 0
>>>>     for: 1m
>>>>
>>>> * When alerting is not working, they are down for hours until I restart
>>>> prometheus and blackbox exporters. After restarting, everything is normal.
>>>>
>>>> * The underlying metrics (probe_success) get 0 when it's down but they
>>>> don't change to Pending/Fired.
>>>>
>>>> Thanks
>>>> Paras.
>>>>
>>>> On Mon, Sep 19, 2022 at 2:35 AM Julius Volz <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Paras,
>>>>>
>>>>> Could you share more information about your setup:
>>>>>
>>>>> * What's the alerting rule that isn't working as intended?
>>>>> * For how long were the hosts down without getting alerted on?
>>>>> * What did the underlying metrics (e.g. "up" for the exporter's own
>>>>> scrape health and "probe_success" for the backend probe health) collected
>>>>> by the Blackbox Exporter look like at the time when the alert should have
>>>>> been firing, but didn't?
>>>>>
>>>>> One possibility is that your Blackbox exporter itself couldn't be
>>>>> scraped anymore, in which case its "up" metric would be 0 and the
>>>>> "probe_success" metric would be absent (and thus any alerts based on that
>>>>> metric would never fire).
>>>>>
>>>>> Regards,
>>>>> Julius
>>>>>
>>>>> On Thu, Sep 15, 2022 at 6:33 PM Paras pradhan <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We use prometheus, alertmanager and blackbox-exporter to check hosts
>>>>>> if they respond to icmp. Host counts are 1K+. We noticed sometimes and
>>>>>> randomly the alerts are not generated (prometheus dashboard --> alerts)
>>>>>> when the hosts/targets are actually down. Restarting prometheus,
>>>>>> alertmanager and blackbox-exporter fixes the issue. Don't see anything
>>>>>> that stands out in the logs. How do I troubleshoot and is there anything
>>>>>> like cache data in prometheus that needs to be cleared?
>>>>>>
>>>>>> Thanks
>>>>>> Paras.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Prometheus Users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/prometheus-users/6bfb92dc-2a18-44d9-8fda-d6f84efba0e7n%40googlegroups.com
>>>>>> .
>>>>>>
>>>>>
>>>>> --
>>>>> Julius Volz
>>>>> PromLabs - promlabs.com
>>>>>

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CADyt5g%3D1Lfr4pxiA86cT8MHAVEoLOjdbos446i4emK21F-yHrg%40mail.gmail.com.
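
The failure mode Julius describes above (the Blackbox exporter itself cannot be scraped, so "up" is 0 and "probe_success" is absent) would not be caught by a rule on probe_success alone. A minimal sketch of companion rules follows, assuming the same job label as in the thread; the group name, the alert names and the 5m durations are illustrative and do not come from the thread:

```yaml
groups:
  - name: BlackboxScrapeHealth          # hypothetical group name
    rules:
      # Fires when the scrape of the probe endpoint itself fails,
      # i.e. the case where probe_success would not be recorded at all.
      - alert: BlackboxScrapeFailed     # hypothetical alert name
        expr: up{job="blackbox_icmp-server"} == 0
        for: 5m                         # illustrative duration

      # Fires when no probe_success series exists for the job at all.
      - alert: BlackboxProbeMetricAbsent  # hypothetical alert name
        expr: absent(probe_success{job="blackbox_icmp-server"})
        for: 5m                         # illustrative duration
```

Note that absent() only returns a result when there is no matching series whatsoever, so the second rule does not detect a single missing target among many. In this thread probe_success was still present with value 0, so these rules address the possibility Julius raised rather than the behaviour Paras eventually reported.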

