Hi Brian,

I thought this issue would not recur, but we saw the alert again on Saturday at 09:12:18 UTC, so I am requesting your guidance.
The alert configuration is as below:

admin@orchestrator[nd2bwa6drm01v]# show running-config alert rule PROCESS_STATE
alert rule PROCESS_STATE
 expression "docker_service_up==1 or docker_service_up==3"
 event-host-label container_name
 message "{{ $labels.service_name }} instance {{ $labels.module_instance }} of module {{ $labels.module }} is in Aborted state !"
 snmp-facility application
 snmp-severity critical
 snmp-clear-message "{{ $labels.service_name }} instance {{ $labels.module_instance }} of module {{ $labels.module }} is moved from Aborted state !"
!
admin@orchestrator[nd2bwa6drm01v]#

Recent alert details:

NAME           EVENT HOST           STATUS    MESSAGE                                                                             CREATE TIME                    RESOLVE TIME                   UPDATE TIME
PROCESS_STATE  haproxy-common-s101  resolved  haproxy-common instance 101 of module haproxy-common is moved from Aborted state !  2020-05-30T09:12:18.643+00:00  2020-05-30T09:12:33.617+00:00  2020-05-30T09:27:38.659+00:00
PROCESS_STATE  haproxy-common-s103  resolved  haproxy-common instance 103 of module haproxy-common is moved from Aborted state !  2020-05-30T09:12:18.644+00:00  2020-05-30T09:12:33.619+00:00  2020-05-30T09:27:38.66+00:00

Per your last suggestion, I also pulled the data with the query below, but it does not show the 'docker_service_up' metric set to either 1 or 3 (the values for which the alert is configured):

curl 'http://localhost:9090/api/v1/query_range?query=docker_service_up&start=2020-05-24T07:20:00.000Z&end=2020-05-24T08:10:00.000Z&step=1s' > docker_service_up.log

Please let me know if you have any comments or opinions. I have also added, below the quoted text, a sketch of the follow-up queries I plan to run around the time of the latest occurrence.

On Thursday, May 28, 2020 at 11:22:21 PM UTC-7, Brian Candler wrote:
>
> Was this an alert generated by prometheus' alertmanager, or an alert
> generated by grafana's alerting system?
>
> You said alert was resolved "in 5 seconds" which sounds dubious. Maybe
> you have some extremely low interval configured for your alerting rules in
> prometheus?
>
> Nonetheless, the history is all in prometheus (at least for the TSDB
> retention period - default 15 days). You need to work out what expression
> generated the alert, and use PromQL to explore the data in prometheus.
>
> That's all we can say, unless you show the content of the alert itself
> *and* the rule which you believe generated the alert *and* the data which
> backs up your assertion that there was no triggering data in that period.
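For completeness, here is a sketch of the follow-up queries I plan to run next. This time it targets the window around the latest occurrence (2020-05-30 09:12 UTC); the 15s step, the output file names, and the use of the built-in ALERTS series are my own choices, and the ALERTS query is only meaningful if the rule is evaluated by Prometheus itself rather than by an external alerting system:

# Raw values of docker_service_up around the alert time
curl -G 'http://localhost:9090/api/v1/query_range' \
     --data-urlencode 'query=docker_service_up' \
     --data-urlencode 'start=2020-05-30T09:00:00.000Z' \
     --data-urlencode 'end=2020-05-30T09:30:00.000Z' \
     --data-urlencode 'step=15s' > docker_service_up_20200530.log

# Only the samples that satisfy the configured rule expression
curl -G 'http://localhost:9090/api/v1/query_range' \
     --data-urlencode 'query=docker_service_up==1 or docker_service_up==3' \
     --data-urlencode 'start=2020-05-30T09:00:00.000Z' \
     --data-urlencode 'end=2020-05-30T09:30:00.000Z' \
     --data-urlencode 'step=15s' > docker_service_up_match_20200530.log

# Prometheus' own record of the alert going pending/firing, if the rule runs in Prometheus
curl -G 'http://localhost:9090/api/v1/query_range' \
     --data-urlencode 'query=ALERTS{alertname="PROCESS_STATE"}' \
     --data-urlencode 'start=2020-05-30T09:00:00.000Z' \
     --data-urlencode 'end=2020-05-30T09:30:00.000Z' \
     --data-urlencode 'step=15s' > alerts_process_state_20200530.log

Using curl's -G together with --data-urlencode sends the parameters as a GET query string, which keeps the PromQL expressions readable without hand-escaping the spaces and quotes in the URL.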