Any updates on this? We're having the same problem. Restart Kapacitor or re-define the task, and we get spammed with alerts saying everything is OK (even from hosts that never entered a non-OK state).
Our TICK is pretty simple (and very similar to the OP's):

stream
    |from()
        .database('telegraf')
        .measurement('disk')
        .groupBy('host', 'device')
    |alert()
        .warn(lambda: "used_percent" >= 80)
        .warnReset(lambda: "used_percent" < 80)
        .crit(lambda: "used_percent" >= 90)
        .critReset(lambda: "used_percent" < 90)
        .stateChangesOnly()

We're going to try the .noRecoveries() workaround suggested below; there's a sketch at the bottom of this message.

On Wednesday, February 22, 2017 at 7:14:23 PM UTC-5, Archie Archbold wrote:
> Interestingly enough, when I add the .noRecoveries() property to the alert
> node, I only get one DOWN alert even though there are 7 servers that are
> within the alert range.
>
> On Wednesday, February 22, 2017 at 11:10:09 AM UTC-8, nath...@influxdb.com wrote:
> If you want to ignore the OK alerts, use the `.noRecoveries` property of the
> alert node. This will suppress the OK alerts.
>
> On Friday, February 17, 2017 at 3:33:16 PM UTC-7, Archie Archbold wrote:
> Hey all. I'm pretty new to TICK, but I have a problem that I can't wrap my
> head around.
>
> I am monitoring multiple servers, all sending data to one InfluxDB database,
> and using the 'host' tag to separate the servers in the DB.
>
> My 'disk' measurement is taking in multiple disk paths from the servers
> (HOSTS), each of which has a respective 'PATH' tag.
>
> So basically each server is assigned a HOST tag, and each HOST has multiple
> PATH tags.
>
> EXPECTED FUNCTIONALITY: Kapacitor should alert upon a state change of a
> HOST's PATH if that path is within the alerting lambda.
> PROBLEM: When I start the Kapacitor service, it looks like it senses a state
> change any time it sees another host/path with an opposite status.
>
> This is a simplified example of the alerts I am getting:
>
> Host: host1  Path: /path1  Status: UP
> Host: host1  Path: /path2  Status: DOWN
> Host: host1  Path: /path3  Status: UP
> Host: host2  Path: /path1  Status: DOWN
> Host: host2  Path: /path2  Status: UP
>
> These alerts happen once for each host/path combination, and then the
> service performs as expected, alerting properly when the lambda condition
> is met.
>
> The result of this is that I receive a slew of up/down alerts every time I
> restart the Kapacitor service.
>
> Here is my current tick:
>
> var data = stream
>     |from()
>         .measurement('disk')
>         .groupBy('host', 'path')
>     |alert()
>         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
>         .warn(lambda: "used_percent" >= 80)
>         .id('DISK SPACE WARNING')
>         .email($DISK_WARN_GRP)
>
> And the corresponding DOT:
>
> ID: disk_alert_warn
> Error:
> Template:
> Type: stream
> Status: enabled
> Executing: true
> Created: 17 Feb 17 22:27 UTC
> Modified: 17 Feb 17 22:27 UTC
> LastEnabled: 17 Feb 17 22:27 UTC
> Databases Retention Policies: ["main"."autogen"]
> TICKscript:
> var data = stream
>     |from()
>         .measurement('disk')
>         .groupBy('host', 'path')
>     |alert()
>         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
>         .warn(lambda: "used_percent" >= 80)
>         .id('DISK SPACE WARNING')
>         .email()
>
> DOT:
> digraph disk_alert_warn {
> graph [throughput="38.00 points/s"];
>
> stream0 [avg_exec_time_ns="0s" ];
> stream0 -> from1 [processed="284"];
>
> from1 [avg_exec_time_ns="3.9µs" ];
> from1 -> alert2 [processed="284"];
>
> alert2 [alerts_triggered="14" avg_exec_time_ns="72.33µs" crits_triggered="0" infos_triggered="0" oks_triggered="7" warns_triggered="7" ];
> }
>
> As you can see, on startup I get 7 oks triggered (for the host/path groups
> that are not in the alert range) and 7 warns triggered (for the 7 host/path
> groups that are within the alert range). Then it behaves as normal.
>
> I understand that it should be alerting for the 7 host/path groups that are
> over 80, but why follow that with an alert about the OK groups?
>
> MORE INFO: When I raise the lambda to 90% (out of range for all host/paths),
> I get no alerts at all (which is expected).
>
> Thanks to anyone who can help me understand this.
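For anyone who finds this thread later, here is the sketch mentioned above: our existing task with .noRecoveries() added to the alert node, per the suggestion earlier in the thread. This is only a sketch, and it assumes that losing the OK/recovery notifications is acceptable, since the startup spam is exactly those OK alerts.

// Sketch only: our task from above with .noRecoveries() added, as suggested
// in this thread. Assumes we can live without OK/recovery notifications.
stream
    |from()
        .database('telegraf')
        .measurement('disk')
        .groupBy('host', 'device')
    |alert()
        .warn(lambda: "used_percent" >= 80)
        .warnReset(lambda: "used_percent" < 80)
        .crit(lambda: "used_percent" >= 90)
        .critReset(lambda: "used_percent" < 90)
        .stateChangesOnly()
        // Suppress OK (recovery) alerts, so restarting Kapacitor or
        // re-defining the task no longer spams "everything is OK" messages.
        .noRecoveries()

We'd still like to know whether each group's alert level is supposed to survive a restart or a re-define of the task. It looks like every group starts over from scratch, which would explain the burst of state-change alerts we both see, so if anyone from InfluxData can confirm that, please chime in.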