Hi,

I am using the amtool client in a Job inside my cluster.

An alert fired and we got a notification in our Slack channel. I used the 
client (in code that runs inside a Docker image from the Job) to create a 
silence with an `alertname` matcher, and no error was returned.

Looking at the Alertmanager UI, no silence was created, and I got a 
resolved notification 5 minutes after the firing notification.

After ~10 minutes the alert fired and resolved again (5 minutes 
apart).

I wonder why the silence was not created (this is not the first time it 
has happened). 
Could it be some kind of race condition? We can't silence alerts that are 
not in the firing state, right? (Although the alert was firing when I 
tried to create the silence.)

The alert rule:
name: Orchestrator GRPC Failures for ExternalProcessor Service
expr: sum(increase(grpc_server_handled_total{grpc_code!~"OK|Canceled",grpc_service="envoy.service.ext_proc.v3.ExternalProcessor"}[5m])) > 0
for: 5m
labels:
  severity: WARNING
annotations:
  dashboard_url: p-R7Hw1Iz
  runbook_url: extension-orchestrator-dashboard
  summary: Failed gRPC calls detected in the Envoy External Processor within the last 5 minutes. <!subteam^S06E0CPPC5S>

The code for creating the silence:
func postSilence(amCli amclient.Client, matchers []*models.Matcher) error {
	startsAt := strfmt.DateTime(silenceStart)
	endsAt := strfmt.DateTime(silenceStart.Add(silenceDuration))
	createdBy := creatorType
	comment := silenceComment
	silenceParams := silence.NewPostSilencesParams().WithSilence(
		&models.PostableSilence{
			Silence: models.Silence{
				Matchers:  matchers,
				StartsAt:  &startsAt,
				EndsAt:    &endsAt,
				CreatedBy: &createdBy,
				Comment:   &comment,
			},
		},
	)

	err := amCli.PostSilence(silenceParams)
	if err != nil {
		return fmt.Errorf("failed on post silence: %w", err)
	}
	log.Print("Silence posted successfully")

	return nil
}

Thanks in advance,
Saar Zur SAP Labs
