Any updates on this? We're having the same problem. Restart Kapacitor or re-define the task, and we get spammed with alerts saying everything is OK (even from hosts that never entered a non-OK state).
Our TICK is pretty simple (and very similar to the OP's):

stream
    |from()
        .database('telegraf')
        .measurement('disk')
        .groupBy('host', 'device')
    |alert()
        .warn(lambda: "used_percent" >= 80)
        .warnReset(lambda: "used_percent" < 80)
        .crit(lambda: "used_percent" >= 90)
        .critReset(lambda: "used_percent" < 90)
        .stateChangesOnly()

We're going to try the .noRecoveries() workaround suggested below; there's a sketch at the bottom of this message.

On Wednesday, February 22, 2017 at 7:14:23 PM UTC-5, Archie Archbold wrote:
> Interestingly enough, when I add the .noRecoveries() property to the alert
> node, I only get one DOWN alert even though there are 7 servers that are
> within the alert range.
>
> On Wednesday, February 22, 2017 at 11:10:09 AM UTC-8, nath...@influxdb.com wrote:
> If you want to ignore the OK alerts, use the `.noRecoveries` property of the
> alert node. This will suppress the OK alerts.
>
> On Friday, February 17, 2017 at 3:33:16 PM UTC-7, Archie Archbold wrote:
> Hey all. I'm pretty new to TICK, but I have a problem that I can't wrap my
> head around.
>
> I am monitoring multiple servers, all sending data to one InfluxDB database,
> and using the 'host' tag to separate the servers in the DB.
>
> My 'disk' measurement is taking in multiple disk paths from the servers
> (HOSTS), each of which has a respective 'PATH' tag.
>
> So basically each server is assigned a HOST tag, and each HOST has multiple
> PATH tags.
>
> EXPECTED FUNCTIONALITY: Kapacitor should alert upon a state change of a
> HOST's PATH if that path is within the alerting lambda.
> PROBLEM: When I start the Kapacitor service, it looks like it senses a state
> change any time it sees another host/path with an opposite status.
>
> This is a simplified example of the alerts I am getting:
>
> Host: host1  Path: /path1  Status: UP
> Host: host1  Path: /path2  Status: DOWN
> Host: host1  Path: /path3  Status: UP
> Host: host2  Path: /path1  Status: DOWN
> Host: host2  Path: /path2  Status: UP
>
> These alerts happen once for each host/path combination, and then the
> service performs as expected, alerting properly when the lambda condition
> is met.
>
> The result of this is that I receive a slew of up/down alerts every time I
> restart the Kapacitor service.
>
> Here is my current tick:
>
> var data = stream
>     |from()
>         .measurement('disk')
>         .groupBy('host', 'path')
>     |alert()
>         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
>         .warn(lambda: "used_percent" >= 80)
>         .id('DISK SPACE WARNING')
>         .email($DISK_WARN_GRP)
>
> And the corresponding DOT:
>
> ID: disk_alert_warn
> Error:
> Template:
> Type: stream
> Status: enabled
> Executing: true
> Created: 17 Feb 17 22:27 UTC
> Modified: 17 Feb 17 22:27 UTC
> LastEnabled: 17 Feb 17 22:27 UTC
> Databases Retention Policies: ["main"."autogen"]
> TICKscript:
> var data = stream
>     |from()
>         .measurement('disk')
>         .groupBy('host', 'path')
>     |alert()
>         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
>         .warn(lambda: "used_percent" >= 80)
>         .id('DISK SPACE WARNING')
>         .email()
>
> DOT:
> digraph disk_alert_warn {
> graph [throughput="38.00 points/s"];
>
> stream0 [avg_exec_time_ns="0s" ];
> stream0 -> from1 [processed="284"];
>
> from1 [avg_exec_time_ns="3.9µs" ];
> from1 -> alert2 [processed="284"];
>
> alert2 [alerts_triggered="14" avg_exec_time_ns="72.33µs" crits_triggered="0" infos_triggered="0" oks_triggered="7" warns_triggered="7" ];
> }
>
> As you can see, on startup I get 7 oks triggered (for the host/path groups
> that are not in the alert range) and 7 warns triggered (for the 7 host/path
> groups that are within the alert range). Then it behaves as normal.
>
> I understand that it should be alerting for the 7 host/path groups that are
> over 80, but why follow that with an alert about the OK groups?
>
> MORE INFO: When I raise the lambda to 90% (out of range for all host/paths),
> I get no alerts at all (which is expected).
>
> Thanks to anyone who can help me understand this.
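For anyone who finds this thread later, here is the sketch mentioned above: our existing task with .noRecoveries() added to the alert node, per the suggestion earlier in the thread. This is only a sketch, and it assumes that losing the OK/recovery notifications is acceptable, since the startup spam is exactly those OK alerts.

// Sketch only: our task from above with .noRecoveries() added, as suggested
// in this thread. Assumes we can live without OK/recovery notifications.
stream
    |from()
        .database('telegraf')
        .measurement('disk')
        .groupBy('host', 'device')
    |alert()
        .warn(lambda: "used_percent" >= 80)
        .warnReset(lambda: "used_percent" < 80)
        .crit(lambda: "used_percent" >= 90)
        .critReset(lambda: "used_percent" < 90)
        .stateChangesOnly()
        // Suppress OK (recovery) alerts, so restarting Kapacitor or
        // re-defining the task no longer spams "everything is OK" messages.
        .noRecoveries()

We'd still like to know whether each group's alert level is supposed to survive a restart or a re-define of the task. It looks like every group starts over from scratch, which would explain the burst of state-change alerts we both see, so if anyone from InfluxData can confirm that, please chime in.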