Thanks for your response. Another question: is there a way to delete expired silences via the API?
On Monday, 13 January 2025 at 18:16:36 UTC+2, Brian Candler wrote:

> > From a look at the Alertmanager UI, no silence was created, and I got a
> > resolved notification 5 minutes after the fired notification.
> > ...
> > I wonder why the silence couldn't be created? (Not the first time it
> > happens.) Maybe it's some kind of a race condition? We can't silence
> > alerts which are not in the fired state, right?
>
> That's not true - you can certainly create silences which don't match any
> active alerts. This allows you, for example, to create silences before
> maintenance starts, to suppress the alerts you expect.
>
> If the silences aren't being created (i.e. not visible in the GUI), then
> you need to look deeper into the code which creates them, and perhaps
> tcpdump the API to Alertmanager to see if you're passing valid parameters.
>
> On Monday, 13 January 2025 at 15:22:54 UTC Saar Zur wrote:
>
>> Hi,
>>
>> I am using the amtool client in a Job inside my cluster.
>>
>> An alert fired and we got a notification in our Slack channel. I used
>> the CLI (in code that runs inside a Docker image from the Job) to create
>> a silence with an `alertname` matcher, and there was no failure.
>>
>> From a look at the Alertmanager UI, no silence was created, and I got a
>> resolved notification 5 minutes after the fired notification.
>>
>> After ~10 minutes the alert fired and resolved again (5 minutes apart).
>>
>> I wonder why the silence couldn't be created? (This is not the first
>> time it has happened.) Maybe it's some kind of a race condition? We
>> can't silence alerts which are not in the fired state, right? (Although
>> the alert was in the fired state while I tried to create the silence.)
>>
>> The alert rule:
>>
>>   name: Orchestrator GRPC Failures for ExternalProcessor Service
>>   expr: sum(increase(grpc_server_handled_total{grpc_code!~"OK|Canceled",grpc_service="envoy.service.ext_proc.v3.ExternalProcessor"}[5m])) > 0
>>   for: 5m
>>   labels:
>>     severity: WARNING
>>   annotations:
>>     dashboard_url: p-R7Hw1Iz
>>     runbook_url: extension-orchestrator-dashboard
>>     summary: Failed gRPC calls detected in the Envoy External Processor within the last 5 minutes. <!subteam^S06E0CPPC5S>
>>
>> The code for creating the silence:
>>
>>   func postSilence(amCli amclient.Client, matchers []*models.Matcher) error {
>>       startsAt := strfmt.DateTime(silenceStart)
>>       endsAt := strfmt.DateTime(silenceStart.Add(silenceDuration))
>>       createdBy := creatorType
>>       comment := silenceComment
>>       silenceParams := silence.NewPostSilencesParams().WithSilence(
>>           &models.PostableSilence{
>>               Silence: models.Silence{
>>                   Matchers:  matchers,
>>                   StartsAt:  &startsAt,
>>                   EndsAt:    &endsAt,
>>                   CreatedBy: &createdBy,
>>                   Comment:   &comment,
>>               },
>>           },
>>       )
>>
>>       err := amCli.PostSilence(silenceParams)
>>       if err != nil {
>>           return fmt.Errorf("failed on post silence: %w", err)
>>       }
>>       log.Print("Silence posted successfully")
>>
>>       return nil
>>   }
>>
>> Thanks in advance,
>> Saar Zur, SAP Labs

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/prometheus-users/8648cb55-e90e-47b6-8fb2-fdfa0def0f95n%40googlegroups.com.

