Hi,
Recently I've been debugging an issue where an alert resolves in
Alertmanager even though Prometheus still shows it as firing,
so the cycle is firing -> resolved -> firing.
After going through some documents and blogs, I found that Alertmanager
will resolve an alert if Prometheus doesn't re-send it within
"*resolve_timeout*".
However, if Prometheus sends the *endsAt* field to Alertmanager with a
short deadline, that overrides the *resolve_timeout* setting in
Alertmanager and produces the firing -> resolved -> firing behaviour
whenever Prometheus does not re-send the alert before that deadline.
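To make sure I'm stating that clearly, here is my understanding as a small
Python sketch (the 5m default for *resolve_timeout* and the precedence of
*endsAt* over it are assumptions on my side, please correct me if wrong):

```python
from datetime import datetime, timedelta

def resolved_at(received_at, ends_at=None, resolve_timeout=timedelta(minutes=5)):
    """When Alertmanager would flip an alert to resolved if Prometheus
    never re-sends it (my reading of the docs, not authoritative)."""
    if ends_at is not None:
        # Prometheus filled in endsAt, so (I believe) resolve_timeout is ignored.
        return ends_at
    # endsAt was zero/unset: fall back to resolve_timeout after last receipt.
    return received_at + resolve_timeout

recv = datetime(2021, 8, 29, 12, 36, 40)
print(resolved_at(recv, ends_at=recv + timedelta(minutes=4)))  # 2021-08-29 12:40:40
print(resolved_at(recv))                                       # 2021-08-29 12:41:40
```

If that model is right, an alert that isn't re-sent before its *endsAt*
flips to resolved even though Prometheus still considers it firing.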
Is that understanding correct?
*My questions:*
1) How is the *endsAt* time calculated? Is it derived from *resend_delay*?
2) Say an alert has been sent to Alertmanager and *resend_delay* is 100h.
If the alert resolves at the next evaluation interval (1m), will
Prometheus notify Alertmanager that it is resolved right away, or will it
wait for the 100h *resend_delay*?
3) Are the *msg="Received alert"* lines logged when Prometheus sends
alerts to Alertmanager? And when do the *msg=flushing* lines get logged?
(Logs below.)
4) With *evaluation_interval: 1m* and *scrape_interval: 1m*, why is there
a 2m gap between the alerts received at 12:34 and 12:36?
Also, when I GET the alerts from Alertmanager, *endsAt* is +4 minutes
after the last received alert. Why is that so? *Is my resend_delay 4m?
Because I didn't set the resend_delay value.*
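For questions 1 and 4, here is my current guess in Python. The 4x
multiplier and the max() over *resend_delay* and the evaluation interval
are pure assumptions on my part (pieced together from blog posts, not
confirmed), but they would explain the +4m I'm seeing with both values at
their apparent 1m defaults:

```python
from datetime import datetime, timedelta

def guessed_ends_at(sent_at, resend_delay=timedelta(minutes=1),
                    eval_interval=timedelta(minutes=1)):
    # Assumption: Prometheus stamps endsAt far enough in the future to
    # survive a few missed re-sends, i.e. 4 * max(resend_delay, interval).
    return sent_at + 4 * max(resend_delay, eval_interval)

last_received = datetime(2021, 8, 29, 12, 36, 40)
print(guessed_ends_at(last_received))  # 2021-08-29 12:40:40
```

That 12:40:40 matches the *endsAt* I get back from the Alertmanager API
below, but I'd like someone to confirm whether the formula is actually
what Prometheus does.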
*Below are the logs from Alertmanager:*
level=debug ts=2021-08-29T12:34:40.342Z caller=dispatch.go:138
component=dispatcher msg="Received alert"
alert=disk_utilization[6356c43][active]
level=debug ts=2021-08-29T12:34:40.342Z caller=dispatch.go:138
component=dispatcher msg="Received alert"
alert=disk_utilization[1db5352][active]
level=debug ts=2021-08-29T12:34:40.381Z caller=dispatch.go:473
component=dispatcher
aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}"
msg=flushing alerts="[disk_utilization[6356c43][active]
disk_utilization[1db5352][active]]"
level=debug ts=2021-08-29T12:35:10.381Z caller=dispatch.go:473
component=dispatcher
aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}"
msg=flushing alerts="[disk_utilization[6356c43][active]
disk_utilization[1db5352][active]]"
level=debug ts=2021-08-29T12:35:40.382Z caller=dispatch.go:473
component=dispatcher
aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}"
msg=flushing alerts="[disk_utilization[6356c43][active]
disk_utilization[1db5352][active]]"
level=debug ts=2021-08-29T12:36:10.382Z caller=dispatch.go:473
component=dispatcher
aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}"
msg=flushing alerts="[disk_utilization[6356c43][active]
disk_utilization[1db5352][active]]"
level=debug ts=2021-08-29T12:36:40.345Z caller=dispatch.go:138
component=dispatcher msg="Received alert"
alert=disk_utilization[6356c43][active]
level=debug ts=2021-08-29T12:36:40.345Z caller=dispatch.go:138
component=dispatcher msg="Received alert"
alert=disk_utilization[1db5352][active]
GET request to Alertmanager:
curl http://10.233.49.116:9092/api/v1/alerts
{"status":"success","data":[
  {"labels":{"alertname":"disk_utilization","device":"xx.xx.xx.xx:/media/test","fstype":"nfs4","instance":"xx.xx.xx.xx","job":"test-1","mountpoint":"/media/test","node_name":"test-1","severity":"critical"},
   "annotations":{"summary":"Disk utilization has crossed x%. Current Disk utilization = 86.823044624783"},
   "startsAt":"2021-08-29T11:28:40.339802555Z",
   "endsAt":"2021-08-29T12:40:40.339802555Z",
   "generatorURL":"x",
   "status":{"state":"active","silencedBy":[],"inhibitedBy":[]},
   "receivers":["test-1"],
   "fingerprint":"1db535212ea6dcf6"},
  {"labels":{"alertname":"disk_utilization","device":"test","fstype":"ext4","instance":"xx.xx.xx.xx","job":"Node_test-1","mountpoint":"/","node_name":"test-1","severity":"critical"},
   "annotations":{"summary":"Disk utilization has crossed x%. Current Disk utilization = 94.59612027578963"},
   "startsAt":"2021-08-29T11:28:40.339802555Z",
   "endsAt":"2021-08-29T12:40:40.339802555Z",
   "generatorURL":"x",
   "status":{"state":"active","silencedBy":[],"inhibitedBy":[]},
   "receivers":["test-1"],
   "fingerprint":"6356c43dc3589622"}
]}
thanks,
Akshay
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/CAOrgXN%2BTixLKn_9saFzZzLoZc6tNSP0NsNwkLmEtd1r8GTTv1w%40mail.gmail.com.