Hi Brian,

I thought this issue would not recur, but we saw the alert again on Saturday at 09:12:18 UTC, so I am requesting your guidance.
The alert configuration is as below:

admin@orchestrator[nd2bwa6drm01v]# show running-config alert rule PROCESS_STATE
alert rule PROCESS_STATE
 expression "docker_service_up==1 or docker_service_up==3"
 event-host-label container_name
 message "{{ $labels.service_name }} instance {{ $labels.module_instance }} of module {{ $labels.module }} is in Aborted state !"
 snmp-facility application
 snmp-severity critical
 snmp-clear-message "{{ $labels.service_name }} instance {{ $labels.module_instance }} of module {{ $labels.module }} is moved from Aborted state !"
!
admin@orchestrator[nd2bwa6drm01v]#

Recent alert details:

NAME           EVENT HOST           STATUS    MESSAGE                                                                             CREATE TIME                    RESOLVE TIME                   UPDATE TIME
PROCESS_STATE  haproxy-common-s101  resolved  haproxy-common instance 101 of module haproxy-common is moved from Aborted state !  2020-05-30T09:12:18.643+00:00  2020-05-30T09:12:33.617+00:00  2020-05-30T09:27:38.659+00:00
PROCESS_STATE  haproxy-common-s103  resolved  haproxy-common instance 103 of module haproxy-common is moved from Aborted state !  2020-05-30T09:12:18.644+00:00  2020-05-30T09:12:33.619+00:00  2020-05-30T09:27:38.66+00:00

Per your last suggestion, I also pulled the data with the query below, but it does not show the 'docker_service_up' metric set to either 1 or 3 (the values for which the alert is configured):

curl 'http://localhost:9090/api/v1/query_range?query=docker_service_up&start=2020-05-24T07:20:00.000Z&end=2020-05-24T08:10:00.000Z&step=1s' > docker_service_up.log

Please let me know if you have any comments or opinions. I have also added, below the quoted text, a sketch of the follow-up queries I plan to run around the time of the latest occurrence.

On Thursday, May 28, 2020 at 11:22:21 PM UTC-7, Brian Candler wrote:
>
> Was this an alert generated by prometheus' alertmanager, or an alert
> generated by grafana's alerting system?
>
> You said alert was resolved "in 5 seconds" which sounds dubious. Maybe
> you have some extremely low interval configured for your alerting rules in
> prometheus?
>
> Nonetheless, the history is all in prometheus (at least for the TSDB
> retention period - default 15 days). You need to work out what expression
> generated the alert, and use PromQL to explore the data in prometheus.
>
> That's all we can say, unless you show the content of the alert itself
> *and* the rule which you believe generated the alert *and* the data which
> backs up your assertion that there was no triggering data in that period.
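For completeness, here is a sketch of the follow-up queries I plan to run next. This time it targets the window around the latest occurrence (2020-05-30 09:12 UTC); the 15s step, the output file names, and the use of the built-in ALERTS series are my own choices, and the ALERTS query is only meaningful if the rule is evaluated by Prometheus itself rather than by an external alerting system:

# Raw values of docker_service_up around the alert time
curl -G 'http://localhost:9090/api/v1/query_range' \
     --data-urlencode 'query=docker_service_up' \
     --data-urlencode 'start=2020-05-30T09:00:00.000Z' \
     --data-urlencode 'end=2020-05-30T09:30:00.000Z' \
     --data-urlencode 'step=15s' > docker_service_up_20200530.log

# Only the samples that satisfy the configured rule expression
curl -G 'http://localhost:9090/api/v1/query_range' \
     --data-urlencode 'query=docker_service_up==1 or docker_service_up==3' \
     --data-urlencode 'start=2020-05-30T09:00:00.000Z' \
     --data-urlencode 'end=2020-05-30T09:30:00.000Z' \
     --data-urlencode 'step=15s' > docker_service_up_match_20200530.log

# Prometheus' own record of the alert going pending/firing, if the rule runs in Prometheus
curl -G 'http://localhost:9090/api/v1/query_range' \
     --data-urlencode 'query=ALERTS{alertname="PROCESS_STATE"}' \
     --data-urlencode 'start=2020-05-30T09:00:00.000Z' \
     --data-urlencode 'end=2020-05-30T09:30:00.000Z' \
     --data-urlencode 'step=15s' > alerts_process_state_20200530.log

Using curl's -G together with --data-urlencode sends the parameters as a GET query string, which keeps the PromQL expressions readable without hand-escaping the spaces and quotes in the URL.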