> From a look in the Alertmanager UI no silence was created, and I got a resolved notification 5 minutes after the fired notification.
> ...
> I wonder why the silence wasn't created (it's not the first time this happens).
> Maybe it's some kind of race condition? We can't silence alerts which are not in the fired state, right?
That's not true - you can certainly create silences which don't match any active alerts. This allows you, for example, to create silences before maintenance starts, to suppress the alerts you expect. If the silences aren't being created (i.e. not visible in the GUI), then you need to look deeper into the code which creates them, and perhaps tcpdump the API calls to Alertmanager to see whether you're passing valid parameters. A rough sketch of what that verification could look like is at the bottom of this message.

On Monday, 13 January 2025 at 15:22:54 UTC Saar Zur wrote:
> Hi,
>
> I am using the amtool client in a Job inside my cluster.
>
> An alert fired and we got a notification in our Slack channel. I used the CLI (in code that runs inside a Docker image from the Job) to create a silence with an `alertname` matcher, and there was no failure.
>
> From a look in the Alertmanager UI no silence was created, and I got a resolved notification 5 minutes after the fired notification.
>
> After ~10 minutes the alert fired and resolved again (5 minutes apart).
>
> I wonder why the silence wasn't created (it's not the first time this happens).
> Maybe it's some kind of race condition? We can't silence alerts which are not in the fired state, right? (Although the alert was in the fired state while I tried to create the silence.)
>
> The alert rule:
>
> name: Orchestrator GRPC Failures for ExternalProcessor Service
> expr: sum(increase(grpc_server_handled_total{grpc_code!~"OK|Canceled",grpc_service="envoy.service.ext_proc.v3.ExternalProcessor"}[5m])) > 0
> for: 5m
> labels:
>   severity: WARNING
> annotations:
>   dashboard_url: p-R7Hw1Iz
>   runbook_url: extension-orchestrator-dashboard
>   summary: Failed gRPC calls detected in the Envoy External Processor within the last 5 minutes. <!subteam^S06E0CPPC5S>
>
> The code for creating the silence:
>
> func postSilence(amCli amclient.Client, matchers []*models.Matcher) error {
>     startsAt := strfmt.DateTime(silenceStart)
>     endsAt := strfmt.DateTime(silenceStart.Add(silenceDuration))
>     createdBy := creatorType
>     comment := silenceComment
>     silenceParams := silence.NewPostSilencesParams().WithSilence(
>         &models.PostableSilence{
>             Silence: models.Silence{
>                 Matchers:  matchers,
>                 StartsAt:  &startsAt,
>                 EndsAt:    &endsAt,
>                 CreatedBy: &createdBy,
>                 Comment:   &comment,
>             },
>         },
>     )
>
>     err := amCli.PostSilence(silenceParams)
>     if err != nil {
>         return fmt.Errorf("failed on post silence: %w", err)
>     }
>     log.Print("Silence posted successfully")
>
>     return nil
> }
>
> Thanks in advance,
> Saar Zur, SAP Labs
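For the verification step mentioned above, here is a rough, self-contained sketch using the generated Alertmanager v2 Go client (github.com/prometheus/alertmanager/api/v2/client). The host, matcher values and durations are placeholders for your environment. The key points are that POST /api/v2/silences returns the ID of the new silence, and that reading that ID straight back tells you whether the silence actually landed on the same Alertmanager the UI is showing - your wrapper currently discards the response, so logging the returned ID is the cheapest first check.

package main

import (
	"log"
	"time"

	httptransport "github.com/go-openapi/runtime/client"
	"github.com/go-openapi/strfmt"
	amclient "github.com/prometheus/alertmanager/api/v2/client"
	"github.com/prometheus/alertmanager/api/v2/client/silence"
	"github.com/prometheus/alertmanager/api/v2/models"
)

func main() {
	// Placeholder address: point this at the SAME Alertmanager the UI talks to.
	transport := httptransport.New("alertmanager.monitoring.svc:9093", "/api/v2", []string{"http"})
	cli := amclient.New(transport, strfmt.Default)

	// Matcher and silence metadata - all example values.
	name, value, isRegex := "alertname", "Orchestrator GRPC Failures for ExternalProcessor Service", false
	createdBy, comment := "silence-job", "suppress during maintenance"
	startsAt := strfmt.DateTime(time.Now())
	endsAt := strfmt.DateTime(time.Now().Add(2 * time.Hour))

	postParams := silence.NewPostSilencesParams().WithSilence(&models.PostableSilence{
		Silence: models.Silence{
			Matchers:  models.Matchers{&models.Matcher{Name: &name, Value: &value, IsRegex: &isRegex}},
			StartsAt:  &startsAt,
			EndsAt:    &endsAt,
			CreatedBy: &createdBy,
			Comment:   &comment,
		},
	})

	// The POST response carries the ID of the silence that was created;
	// log it rather than discarding the response.
	created, err := cli.Silence.PostSilences(postParams)
	if err != nil {
		log.Fatalf("post silence: %v", err)
	}
	id := created.Payload.SilenceID
	log.Printf("created silence %s", id)

	// Read the silence straight back from the same endpoint. If this succeeds
	// but the UI shows nothing, the Job and the UI are most likely talking to
	// different Alertmanager instances.
	got, err := cli.Silence.GetSilence(silence.NewGetSilenceParams().WithSilenceID(strfmt.UUID(id)))
	if err != nil {
		log.Fatalf("get silence %s back: %v", id, err)
	}
	log.Printf("silence %s state: %s", *got.Payload.ID, *got.Payload.Status.State)
}

This is only a sketch under those assumptions, not a drop-in replacement for your postSilence helper, but running something like it from inside the Job's container should tell you quickly whether the request is reaching the Alertmanager you expect.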

