> From some limited testing, it seems the problem is this: if a particular 
> host (say, 'host1') has a WARN/CRITICAL for one (host1, device1) grouping 
> as well as an OK for a (host1, device2) grouping, then alerts are generated 
> for both device1 and device2, even though only device1 is in an alert state.

I've tested this hypothesis on a host that has no groupings in an alert state, 
and on one with just a single grouping in an alert state. The host with no 
groupings in an alert state receives no alerts.

Can anyone make sense of this?

I'm using Kapacitor 1.3.1, BTW.

On Monday, June 12, 2017 at 12:17:25 PM UTC-4, jcm...@gmail.com wrote:
> Any updates on this?
> 
> We're having this same problem. Restart Kapacitor or re-define the task, and 
> we get spammed alerts saying everything is OK (even from hosts which never 
> entered a non-OK state).
> 
> Our TICKscript is pretty simple (and very similar to the OP's):
> 
> stream
>     |from()
>         .database('telegraf')
>         .measurement('disk')
>         .groupBy('host', 'device')
>     |alert()
>         .warn(lambda: "used_percent" >= 80)
>         .warnReset(lambda: "used_percent" < 80)
>         .crit(lambda: "used_percent" >= 90)
>         .critReset(lambda: "used_percent" < 90)
>         .stateChangesOnly()
> 
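> (Presumably the `.noRecoveries()` property suggested further down the thread 
> would help here as well. A sketch only, the same script with that single 
> property added; it should suppress the OK/recovery alerts, at the cost of 
> never being notified when a disk drops back under the thresholds:)
> 
> stream
>     |from()
>         .database('telegraf')
>         .measurement('disk')
>         .groupBy('host', 'device')
>     |alert()
>         .warn(lambda: "used_percent" >= 80)
>         .warnReset(lambda: "used_percent" < 80)
>         .crit(lambda: "used_percent" >= 90)
>         .critReset(lambda: "used_percent" < 90)
>         .stateChangesOnly()
>         .noRecoveries()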
> 
> 
> On Wednesday, February 22, 2017 at 7:14:23 PM UTC-5, Archie Archbold wrote:
> > Interestingly enough, when I add the .noRecoveries() property to the alert 
> > node I only get one DOWN alert, even though there are 7 servers within the 
> > alert range.
> > 
> > On Wednesday, February 22, 2017 at 11:10:09 AM UTC-8, nath...@influxdb.com 
> > wrote:
> > If you want to ignore the OK alerts use the `.noRecoveries` property of the 
> > alert node. This will suppress the OK alerts.
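> > 
> > For example (a sketch only, chained onto the alert node from the script 
> > quoted below):
> > 
> >     |alert()
> >         .warn(lambda: "used_percent" >= 80)
> >         .id('DISK SPACE WARNING')
> >         .email()
> >         .noRecoveries()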
> > 
> > On Friday, February 17, 2017 at 3:33:16 PM UTC-7, Archie Archbold wrote:
> > Hey all. Pretty new to TICK but I have a problem that I can't wrap my head 
> > around.
> > 
> > 
> > I am monitoring multiple servers, all sending data to one InfluxDB database, 
> > and using the 'host' tag to separate the servers in the DB.
> > 
> > 
> > My 'disk' measurement is taking in multiple disk paths from the servers 
> > (HOSTS), and each path has a respective 'PATH' tag.
> > 
> > 
> > So basically each server is assigned a HOST tag and each HOST has multiple 
> > PATH tags.
> > 
> > 
> > EXPECTED FUNCTIONALITY: Kapacitor should alert upon a state change of a 
> > HOST's PATH if that path is within the alerting lambda. 
> > PROBLEM: When I start the Kapacitor service, it looks like it's sensing a 
> > state change any time it sees another host/path with an opposite status.
> > 
> > 
> > This is a simplified example of the alerts I am getting:
> > 
> > 
> > Host: host1  Path: /path1  Status: UP
> > Host: host1  Path: /path2  Status: DOWN
> > Host: host1  Path: /path3  Status: UP
> > Host: host2  Path: /path1  Status: DOWN
> > Host: host2  Path: /path2  Status: UP
> > 
> > 
> > 
> > These alerts happen once for each host/path combination, and then the 
> > service performs as expected, alerting properly when the lambda threshold is crossed.
> > 
> > 
> > The result of this is that I receive a slew of up/down alerts every time I 
> > restart the Kapacitor service.
> > 
> > 
> > Here is my current TICKscript:
> > 
> > 
> > 
> > var data = stream
> >     |from()
> >         .measurement('disk')
> >         .groupBy('host', 'path')
> >     |alert()
> >         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
> >         .warn(lambda: "used_percent" >= 80)
> >         .id('DISK SPACE WARNING')
> >         .email($DISK_WARN_GRP)
> > And the corresponding DOT
> > 
> > ID: disk_alert_warn
> > Error: 
> > Template: 
> > Type: stream
> > Status: enabled
> > Executing: true
> > Created: 17 Feb 17 22:27 UTC
> > Modified: 17 Feb 17 22:27 UTC
> > LastEnabled: 17 Feb 17 22:27 UTC
> > Databases Retention Policies: ["main"."autogen"]
> > TICKscript:
> > var data = stream
> >     |from()
> >         .measurement('disk')
> >         .groupBy('host', 'path')
> >     |alert()
> >         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
> >         .warn(lambda: "used_percent" >= 80)
> >         .id('DISK SPACE WARNING')
> >         .email()
> > 
> > DOT:
> > digraph disk_alert_warn {
> > graph [throughput="38.00 points/s"];
> > 
> > stream0 [avg_exec_time_ns="0s" ];
> > stream0 -> from1 [processed="284"];
> > 
> > from1 [avg_exec_time_ns="3.9µs" ];
> > from1 -> alert2 [processed="284"];
> > 
> > alert2 [alerts_triggered="14" avg_exec_time_ns="72.33µs" crits_triggered="0" infos_triggered="0" oks_triggered="7" warns_triggered="7" ];
> > }
> > 
> > As you can see, I get 7 oks triggered (for the host/path groups that are not 
> > in alert range) and 7 warns triggered (for the 7 host/path groups that are 
> > within the alert range) upon startup. 
> > Then it behaves normally.
> > 
> > 
> > I understand that it should alert for the 7 host/path groups that are over 
> > 80%, but why follow that with alerts for the OK groups?
> > 
> > 
> > MORE INFO: When I raise the lambda to 90% (out of range for all host/paths), 
> > I get no alerts at all (which is expected).
> > 
> > 
> > Thanks to anyone who can help me understand this.
