Thanks so much for the reply. I do want the recovery alerts, but the problem is that when I start Kapacitor, the task treats *any* server/path in an UP status as a recovery of a *different* server/path's DOWN status. So if 7 server/paths are in a DOWN status at start-up, I get 7 down alerts (expected), but they are immediately followed by 7 recovery messages from different server/paths. Please let me know if I am not being clear enough.
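One detail worth checking (my own guess, not a confirmed fix for this behavior): the script below in the thread uses a fixed string for `.id()`, so every host/path group ends up sharing the identical alert ID "DISK SPACE WARNING", which makes a recovery from one group indistinguishable from any other group's alert in the notifications. The `.id()` property accepts the same templates as `.message()`, so a sketch of a per-group ID (template names taken from the tags already used in the script) would look roughly like:

```
var data = stream
    |from()
        .measurement('disk')
        .groupBy('host', 'path')
    |alert()
        // Make the alert ID unique per host/path group so a recovery
        // message clearly identifies which group recovered.
        .id('DISK SPACE WARNING {{ index .Tags "host" }}:{{ index .Tags "path" }}')
        .message('{{ .ID }} USED PERCENT: {{ index .Fields "used_percent" }}')
        .warn(lambda: "used_percent" >= 80)
        // Optionally only alert on genuine state transitions per group.
        .stateChangesOnly()
        .email($DISK_WARN_GRP)
```

This may not stop the startup burst itself, but it should at least make each OK message carry the tags of the group that actually recovered.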
On Wednesday, February 22, 2017 at 11:10:09 AM UTC-8, [email protected] wrote:
>
> If you want to ignore the OK alerts use the `.noRecoveries` property of the alert node. This will suppress the OK alerts.
>
> On Friday, February 17, 2017 at 3:33:16 PM UTC-7, Archie Archbold wrote:
>>
>> Hey all. Pretty new to TICK, but I have a problem that I can't wrap my head around.
>>
>> I am monitoring multiple servers, all sending data to one InfluxDB database, and using the 'host' tag to separate the servers in the DB.
>>
>> My 'disk' measurement is taking in multiple disk paths from the servers (HOSTS), each of which has a respective 'PATH' tag.
>>
>> So basically each server is assigned a HOST tag and each HOST has multiple PATH tags.
>>
>> EXPECTED FUNCTIONALITY: Kapacitor should alert upon a state change of a HOST's PATH if that path is within the alerting lambda.
>> PROBLEM: When I start the Kapacitor service, it looks like it's sensing a state change any time it sees another host/path with an opposite status.
>>
>> This is a simplified example of the alerts I am getting:
>>
>> Host: host1 Path: /path1 Status: UP
>> Host: host1 Path: /path2 Status: DOWN
>> Host: host1 Path: /path3 Status: UP
>> Host: host2 Path: /path1 Status: DOWN
>> Host: host2 Path: /path2 Status: UP
>>
>> These alerts happen once for each host/path combination, and then the service performs as expected, alerting properly when the lambda is achieved.
>>
>> The result of this is that I receive a slew of up/down alerts every time I restart the kapacitor service.
>>
>> Here is my current tick:
>>
>> var data = stream
>>     |from()
>>         .measurement('disk')
>>         .groupBy('host', 'path')
>>     |alert()
>>         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
>>         .warn(lambda: "used_percent" >= 80)
>>         .id('DISK SPACE WARNING')
>>         .email($DISK_WARN_GRP)
>>
>> And the corresponding DOT:
>>
>> ID: disk_alert_warn
>> Error:
>> Template:
>> Type: stream
>> Status: enabled
>> Executing: true
>> Created: 17 Feb 17 22:27 UTC
>> Modified: 17 Feb 17 22:27 UTC
>> LastEnabled: 17 Feb 17 22:27 UTC
>> Databases Retention Policies: ["main"."autogen"]
>>
>> TICKscript:
>>
>> var data = stream
>>     |from()
>>         .measurement('disk')
>>         .groupBy('host', 'path')
>>     |alert()
>>         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
>>         .warn(lambda: "used_percent" >= 80)
>>         .id('DISK SPACE WARNING')
>>         .email()
>>
>> DOT:
>>
>> digraph disk_alert_warn {
>> graph [throughput="38.00 points/s"];
>>
>> stream0 [avg_exec_time_ns="0s" ];
>> stream0 -> from1 [processed="284"];
>>
>> from1 [avg_exec_time_ns="3.9µs" ];
>> from1 -> alert2 [processed="284"];
>>
>> alert2 [alerts_triggered="14" avg_exec_time_ns="72.33µs" crits_triggered="0" infos_triggered="0" oks_triggered="7" warns_triggered="7" ];
>> }
>>
>> As you can see, I get 7 OKs triggered (for the host/path groups that are not in alert range) and 7 warns triggered (for the 7 host/path groups that are within the alert range) upon start-up. Then it behaves as normal.
>>
>> I understand that it should be alerting for the 7 host/path groups that are over 80, but why follow it with an alert about the OK groups?
>>
>> MORE INFO: When I raise the lambda to 90% (out of range for all host/paths), I get no alerts at all (which is expected).
>>
>> Thanks to anyone who can help me understand this.

To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/c7b9a16b-f6b8-4bb8-a0c6-3de0172ce217%40googlegroups.com.
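For reference, applying the `.noRecoveries` suggestion from the reply above to the script in this thread would look roughly like the sketch below. Note the trade-off: it suppresses the OK/recovery messages entirely, which addresses the startup noise but also drops the genuine recovery alerts the original poster says they want.

```
var data = stream
    |from()
        .measurement('disk')
        .groupBy('host', 'path')
    |alert()
        .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
        .warn(lambda: "used_percent" >= 80)
        .id('DISK SPACE WARNING')
        // Suppress all OK (recovery) events from this alert node.
        .noRecoveries()
        .email($DISK_WARN_GRP)
```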
