Hi Matthias,

Thanks for responding to my questions.
It is a service where I added an API endpoint that Alertmanager posts alert information (firing/resolved) to whenever alerts are triggered.

*These warnings appear in the Alertmanager pod logs:*

level=warn ts=2022-01-06T20:27:41.726Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4097 limit=4096
level=warn ts=2022-01-06T20:42:41.726Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4121 limit=4096
level=warn ts=2022-01-06T21:27:41.726Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4097 limit=4096
level=warn ts=2022-01-06T21:42:41.726Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4098 limit=4096
level=warn ts=2022-01-06T21:57:41.727Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4098 limit=4096
level=warn ts=2022-01-06T22:42:41.727Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4123 limit=4096
level=warn ts=2022-01-06T22:57:41.727Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4155 limit=4096
level=warn ts=2022-01-06T23:12:41.727Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4100 limit=4096
level=warn ts=2022-01-06T23:27:41.728Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4097 limit=4096
level=warn ts=2022-01-06T23:42:41.728Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4099 limit=4096
level=warn ts=2022-01-06T23:57:41.728Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4097 limit=4096
level=warn ts=2022-01-07T00:27:41.728Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4124 limit=4096
level=warn ts=2022-01-07T00:42:41.729Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4124 limit=4096
level=warn ts=2022-01-07T00:57:41.729Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4097 limit=4096
level=warn ts=2022-01-07T01:42:41.729Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4099 limit=4096
level=warn ts=2022-01-07T01:57:41.730Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4098 limit=4096
level=warn ts=2022-01-07T02:42:41.730Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4098 limit=4096
level=warn ts=2022-01-07T02:57:41.730Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4155 limit=4096
level=warn ts=2022-01-07T03:12:41.730Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4098 limit=4096
level=warn ts=2022-01-07T03:27:41.731Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4098 limit=4096
level=warn ts=2022-01-07T03:42:41.731Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4099 limit=4096
level=warn ts=2022-01-07T03:57:41.731Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4098 limit=4096
level=warn ts=2022-01-07T04:42:41.732Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4098 limit=4096
level=warn ts=2022-01-07T04:57:41.732Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4097 limit=4096
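For context on the receiver side: it is essentially just an HTTP endpoint that accepts the standard Alertmanager webhook payload and returns 200. A minimal sketch of that shape (illustrative only, not the actual service; the /alerts path and the port are made up):

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Subset of the Alertmanager webhook payload that matters here.
type webhookPayload struct {
	Status string `json:"status"` // "firing" or "resolved" for the whole group
	Alerts []struct {
		Status      string            `json:"status"`
		Labels      map[string]string `json:"labels"`
		Annotations map[string]string `json:"annotations"`
	} `json:"alerts"`
}

func alertsHandler(w http.ResponseWriter, r *http.Request) {
	var p webhookPayload
	if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
		http.Error(w, "bad payload", http.StatusBadRequest)
		return
	}
	// Hand off the real work asynchronously so the handler answers quickly.
	go func() {
		for _, a := range p.Alerts {
			log.Printf("received %s alert: %v", a.Status, a.Labels)
		}
	}()
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/alerts", alertsHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

If the handler were slow or returned a non-2xx status, Alertmanager would treat the notification as failed and retry it, which sounds like the situation Matthias described below.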
*These errors appear in the Prometheus server pod logs:*

level=error ts=2021-09-06T10:11:22.754Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://127.0.0.1:9093/api/v1/alerts: context deadline exceeded"
level=error ts=2021-09-07T23:36:27.753Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://127.0.0.1:9093/api/v1/alerts: context deadline exceeded"
level=error ts=2021-09-07T23:36:52.755Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://10.64.87.17:9093/api/v1/alerts: dial tcp 127.0.0.1:9093: i/o timeout"
level=error ts=2021-09-07T23:37:02.756Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=64 msg="Error sending alert" err="Post http://127.0.0.1:9093/api/v1/alerts: context deadline exceeded"
level=error ts=2021-09-07T23:37:12.757Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=11 msg="Error sending alert" err="Post http://127.0.0.1:9093/api/v1/alerts: context deadline exceeded"
level=error ts=2021-09-07T23:37:27.755Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://127.0.0.1:9093/api/v1/alerts: context deadline exceeded"
level=error ts=2021-09-07T23:37:42.754Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://127.0.0.1:9093/api/v1/alerts: context deadline exceeded"
level=error ts=2021-09-07T23:37:56.967Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=2 msg="Error sending alert" err="Post http://127.0.0.1:9093/api/v1/alerts: context deadline exceeded"
level=error ts=2021-09-07T23:38:06.968Z caller=notifier.go:528 component=notifier alertmanager=http://127.0.0.1:9093/api/v1/alerts count=18 msg="Error sending alert" err="Post http://127.0.0.1:9093/api/v1/alerts: context deadline exceeded"

*May I know what could be the cause?*
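To rule out basic reachability, I was thinking of posting a single test alert to the Alertmanager v1 API from inside the Prometheus pod with a short client timeout, along these lines (a rough sketch only; the label values and the timeout are made up):

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// One dummy alert in the shape the v1 /api/v1/alerts endpoint expects.
	alerts := []map[string]interface{}{{
		"labels":      map[string]string{"alertname": "ConnectivityTest", "severity": "none"},
		"annotations": map[string]string{"summary": "manual test alert"},
		"startsAt":    time.Now().Format(time.RFC3339),
		"endsAt":      time.Now().Add(time.Minute).Format(time.RFC3339),
	}}
	body, _ := json.Marshal(alerts)

	// Short timeout, so a slow Alertmanager shows up as an error here too.
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post("http://127.0.0.1:9093/api/v1/alerts", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("post failed:", err) // e.g. a timeout, like the notifier errors above
		return
	}
	defer resp.Body.Close()
	fmt.Println("Alertmanager answered with", resp.Status)
}

If this also times out while the errors above keep appearing, the problem would seem to be between Prometheus and Alertmanager rather than in the alert content itself.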
Thanks,
Shiva

On Fri, Jan 7, 2022 at 2:45 AM Matthias Rampke <matth...@prometheus.io> wrote:

> What is your webhook receiver? Are any of the resolve messages getting
> through? Are the requests succeeding?
>
> I think Alertmanager will retry failed webhooks, not sure for how long.
> This would keep them in the queue, leading to what you observe in
> Alertmanager.
>
> /MR
>
> On Thu, Jan 6, 2022, 07:14 shivakumar sajjan <shivusajjan...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a single-instance cluster for Alertmanager, and I see the warning below in the Alertmanager container:
>>
>> level=warn ts=2021-11-03T08:50:44.528Z caller=delegate.go:272 component=cluster msg="dropping messages because too many are queued" current=4125 limit=4096
>>
>> *Alertmanager version information:*
>> Branch: HEAD
>> BuildDate: 20190708-14:31:49
>> BuildUser: root@868685ed3ed0
>> GoVersion: go1.12.6
>> Revision: 1ace0f76b7101cccc149d7298022df36039858ca
>> Version: 0.18.0
>>
>> *Alertmanager metrics:*
>> # HELP alertmanager_cluster_members Number indicating current number of members in cluster.
>> # TYPE alertmanager_cluster_members gauge
>> alertmanager_cluster_members 1
>> # HELP alertmanager_cluster_messages_pruned_total Total number of cluster messages pruned.
>> # TYPE alertmanager_cluster_messages_pruned_total counter
>> alertmanager_cluster_messages_pruned_total 23020
>> # HELP alertmanager_cluster_messages_queued Number of cluster messages which are queued.
>> # TYPE alertmanager_cluster_messages_queued gauge
>> alertmanager_cluster_messages_queued 4125
>>
>> I am new to alerting. Could you please answer the questions below?
>>
>> - Why are messages queueing up, such that Alertmanager is not sending alert resolve information to the webhook instance?
>> - What is the solution for the above issue?
>> - How do we see those queued messages in Alertmanager?
>> - Do we lose alerts when messages are dropped because too many are queued?
>> - Why are messages queued even though there is logic to prune messages at a regular interval, i.e. every 15 minutes?
>> - Do we lose alerts when Alertmanager prunes messages at that regular interval?
>>
>> Thanks,
>>
>> Shiva