> From a look at the Alertmanager UI, no silence was created, and I got a 
resolved notification 5 minutes after the firing notification.
...
> I wonder why the silence couldn't be created? (It's not the first time 
this has happened.)
> Maybe it's some kind of race condition? We can't silence alerts which 
are not in the firing state, right?

That's not true - you can certainly create silences which don't match any 
active alerts.  This allows you, for example, to create silences before 
maintenance starts, to suppress the alerts you expect.
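
For example, here is a rough sketch of pre-creating a silence for a 
maintenance window that hasn't started yet. It uses the upstream go-openapi 
v2 client from the alertmanager repo directly rather than your amclient 
wrapper, and the host, times and matcher value are just placeholders - 
Alertmanager accepts the silence even though no matching alert is firing:

package main

import (
	"log"
	"time"

	"github.com/go-openapi/strfmt"
	amclient "github.com/prometheus/alertmanager/api/v2/client"
	"github.com/prometheus/alertmanager/api/v2/client/silence"
	"github.com/prometheus/alertmanager/api/v2/models"
)

func main() {
	// Placeholder Alertmanager address; the default base path is /api/v2.
	api := amclient.NewHTTPClientWithConfig(nil,
		amclient.DefaultTransportConfig().WithHost("alertmanager:9093"))

	name := "alertname"
	value := "Orchestrator GRPC Failures for ExternalProcessor Service"
	isRegex := false

	// The silence starts an hour from now and covers a two-hour window,
	// even though no matching alert is firing at creation time.
	startsAt := strfmt.DateTime(time.Now().Add(1 * time.Hour))
	endsAt := strfmt.DateTime(time.Now().Add(3 * time.Hour))
	createdBy := "maintenance-job"
	comment := "planned maintenance"

	params := silence.NewPostSilencesParams().WithSilence(&models.PostableSilence{
		Silence: models.Silence{
			Matchers:  models.Matchers{{Name: &name, Value: &value, IsRegex: &isRegex}},
			StartsAt:  &startsAt,
			EndsAt:    &endsAt,
			CreatedBy: &createdBy,
			Comment:   &comment,
		},
	})

	ok, err := api.Silence.PostSilences(params)
	if err != nil {
		log.Fatalf("post silence: %v", err)
	}
	// On success Alertmanager returns the new silence's ID; if you never
	// see an ID, the request didn't actually succeed on the server.
	log.Printf("silence created, ID: %s", ok.Payload.SilenceID)
}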

If the silences aren't being created (i.e. not visible in the GUI), then 
you need to look deeper into the code which creates them, and perhaps 
tcpdump the API calls to Alertmanager to see whether you're passing valid 
parameters.
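
As a lighter-weight alternative to tcpdump, you could also have the Job list 
the silences back from the API right after posting, to confirm the silence 
really exists on the Alertmanager instance it's talking to. A rough sketch 
with the same (assumed) upstream v2 client and a placeholder address:

package main

import (
	"log"

	amclient "github.com/prometheus/alertmanager/api/v2/client"
	"github.com/prometheus/alertmanager/api/v2/client/silence"
)

func main() {
	// Placeholder address: point this at the Alertmanager the Job talks to.
	api := amclient.NewHTTPClientWithConfig(nil,
		amclient.DefaultTransportConfig().WithHost("alertmanager:9093"))

	// Filter with the same matcher the Job used when posting the silence.
	filter := []string{`alertname="Orchestrator GRPC Failures for ExternalProcessor Service"`}
	resp, err := api.Silence.GetSilences(silence.NewGetSilencesParams().WithFilter(filter))
	if err != nil {
		log.Fatalf("get silences: %v", err)
	}
	if len(resp.Payload) == 0 {
		log.Print("no matching silences found on this Alertmanager")
	}
	for _, s := range resp.Payload {
		log.Printf("silence %s state=%s", *s.ID, *s.Status.State)
	}
}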

On Monday, 13 January 2025 at 15:22:54 UTC Saar Zur wrote:

> Hi,
>
> I am using the amtool client in a Job inside my cluster.
>
> An alert fired and we got a notification in our Slack channel. I used 
> the CLI (in code that runs inside a Docker image from the Job) to create a 
> silence with an `alertname` matcher, and there was no failure.
>
> From a look at the Alertmanager UI, no silence was created, and I got a 
> resolved notification 5 minutes after the firing notification.
>
> After ~10 minutes the alert fired and resolved again (5 minutes apart).
>
> I wonder why the silence couldn't be created? (It's not the first time 
> this has happened.)
> Maybe it's some kind of race condition? We can't silence alerts which 
> are not in the firing state, right? (Although the alert was in the firing 
> state while I tried to create the silence.)
>
> The alert rule:
> name: Orchestrator GRPC Failures for ExternalProcessor Service
> expr: sum(increase(grpc_server_handled_total{grpc_code!~"OK|Canceled",grpc_service="envoy.service.ext_proc.v3.ExternalProcessor"}[5m])) > 0
> for: 5m
> labels:
>   severity: WARNING
> annotations:
>   dashboard_url: p-R7Hw1Iz
>   runbook_url: extension-orchestrator-dashboard
>   summary: Failed gRPC calls detected in the Envoy External Processor within 
>   the last 5 minutes. <!subteam^S06E0CPPC5S>
>
> The code for creating the silence:
> func postSilence(amCli amclient.Client, matchers []*models.Matcher) error {
> 	startsAt := strfmt.DateTime(silenceStart)
> 	endsAt := strfmt.DateTime(silenceStart.Add(silenceDuration))
> 	createdBy := creatorType
> 	comment := silenceComment
>
> 	silenceParams := silence.NewPostSilencesParams().WithSilence(
> 		&models.PostableSilence{
> 			Silence: models.Silence{
> 				Matchers:  matchers,
> 				StartsAt:  &startsAt,
> 				EndsAt:    &endsAt,
> 				CreatedBy: &createdBy,
> 				Comment:   &comment,
> 			},
> 		},
> 	)
>
> 	err := amCli.PostSilence(silenceParams)
> 	if err != nil {
> 		return fmt.Errorf("failed on post silence: %w", err)
> 	}
> 	log.Print("Silence posted successfully")
>
> 	return nil
> }
>
> Thanks in advance,
> Saar Zur, SAP Labs
>
