Hi Stuart.

On Wed, 25 Nov, 2020, 6:56 pm Stuart Clark, <stuart.cl...@jahingo.com> wrote:

> On 25/11/2020 11:46, yagyans...@gmail.com wrote:
> > The alert formation doesn't seem to be a problem here, because it
> > happens for different alerts randomly. Below is the alert for Exporter
> > being down for which it has happened thrice today.
> >
> > - alert: ExporterDown
> >   expr: up == 0
> >   for: 10m
> >   labels:
> >     severity: "CRITICAL"
> >   annotations:
> >     summary: "Exporter down on *{{ $labels.instance }}*"
> >     description: "Not able to fetch application metrics from *{{ $labels.instance }}*"
> >
> > - the ALERTS metric shows what is pending or firing over time
> > >> But the problem is that one of my ExporterDown alerts is active
> > since the past 10 days, there is no genuine reason for the alert to go
> > to a resolved state.
> >
> What do you have evaluation_interval set to in Prometheus, and
> resolve_timeout in Alertmanager?
>
>> My evaluation interval is 1m, whereas my scrape timeout and scrape
>> interval are both 25s. Resolve timeout in Alertmanager is 5m.
>
> Is the alert definitely being resolved, as in you are getting a resolved
> email/notification, or could it just be an email/notification for a long
> running alert? - you should get another email/notification every now and
> then based on repeat_interval.
>
>> Yes, I suspected that too in the beginning, but I am logging each and
>> every alert notification and found that I am indeed getting a resolved
>> notification for that alert, and then a firing notification again the
>> very next second.
>

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/CAFGi5vBO-T%3DxnZH5FSJBAKTLJp-%2BMDm4fWoHyc_HbwPh4UU3-g%40mail.gmail.com.