Hmm, I see. Could you put a log node before the alert node and share those 
logs, along with the logs for the triggered alerts, after startup?
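
If it helps, the log node can be chained in just before the alert; this is a 
minimal sketch based on the script you posted (the prefix and level values 
are arbitrary):

var data = stream
    |from()
        .measurement('disk')
        .groupBy('host', 'path')
    // Log every point flowing into the alert node so we can see what
    // Kapacitor observes for each host/path group right after startup.
    |log()
        .prefix('DISK-ALERT-DEBUG')
        .level('INFO')
    |alert()
        // ... rest of the alert node unchanged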

Also, this may be a bug in the most recent alerting system. Do you get the 
same behavior if you configure Slack directly in the TICKscript instead of 
via the topic handler?
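
For example, with the Slack handler attached directly to the alert node (a 
sketch only; the channel is a placeholder, and your message and threshold 
stay as they are):

    |alert()
        .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }}')
        .warn(lambda: "used_percent" >= 80)
        .id('DISK SPACE WARNING')
        // Send straight to Slack from the alert node rather than publishing
        // to a topic and attaching a handler to that topic.
        .slack()
            .channel('#alerts')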

On Wednesday, February 22, 2017 at 12:23:53 PM UTC-7, Archie Archbold wrote:
>
> Thanks so much for the reply. I do want the recovery alerts, but the 
> problem is that when I start Kapacitor, the task treats *any* server/path 
> in an up status as a recovery of a *different* server/path's down status. 
> So if 7 server/paths are in a down status at start-up, I get 7 down alerts 
> (expected), but they are immediately followed by 7 recovery messages from 
> different server/paths. Please let me know if I am not being clear enough.
>
> On Wednesday, February 22, 2017 at 11:10:09 AM UTC-8, nath...@influxdb.com 
> wrote:
>>
>> If you want to ignore the OK alerts, use the `.noRecoveries` property of 
>> the alert node; it suppresses the recovery (OK) alerts entirely.
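>>
>> For example, a minimal sketch against the alert node in the script below:
>>
>>     |alert()
>>         .warn(lambda: "used_percent" >= 80)
>>         .id('DISK SPACE WARNING')
>>         // Suppress the OK / recovery alerts entirely.
>>         .noRecoveries()
>>         .email()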
>>
>> On Friday, February 17, 2017 at 3:33:16 PM UTC-7, Archie Archbold wrote:
>>>
>>> Hey all. Pretty new to TICK but I have a problem that I can't wrap my 
>>> head around.
>>>
>>> I am monitoring multiple servers, all sending data to one InfluxDB 
>>> database, and using the 'host' tag to separate the servers in the DB.
>>>
>>> My 'disk' measurement takes in multiple disk paths from the servers 
>>> (HOSTS), each of which has a respective 'PATH' tag.
>>>
>>> So basically each server is assigned a HOST tag and each HOST has 
>>> multiple PATH tags.
>>>
>>> EXPECTED FUNCTIONALITY: Kapacitor should alert upon a state change of a 
>>> HOST's PATH if that path matches the alerting lambda.
>>> PROBLEM: When I start the Kapacitor service, it looks like it senses a 
>>> state change any time it sees another host/path with an opposite status.
>>>
>>> This is a simplified example of the alerts I am getting:
>>>
>>> Host: host1  Path: /path1  Status: UP
>>> Host: host1  Path: /path2  Status: DOWN
>>> Host: host1  Path: /path3  Status: UP
>>> Host: host2  Path: /path1  Status: DOWN
>>> Host: host2  Path: /path2  Status: UP
>>>
>>> These alerts happen once for each host/path combination, and then the 
>>> service performs as expected, alerting properly when the lambda 
>>> condition is met.
>>>
>>> The result is that I receive a slew of up/down alerts every time I 
>>> restart the Kapacitor service.
>>>
>>> Here is my current TICKscript:
>>> var data = stream
>>>     |from()
>>>         .measurement('disk')
>>>         .groupBy('host', 'path')
>>>     |alert()
>>>         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
>>>         .warn(lambda: "used_percent" >= 80)
>>>         .id('DISK SPACE WARNING')
>>>         .email($DISK_WARN_GRP)
>>>
>>> And the corresponding task info and DOT graph:
>>>
>>> ID: disk_alert_warn
>>> Error: 
>>> Template: 
>>> Type: stream
>>> Status: enabled
>>> Executing: true
>>> Created: 17 Feb 17 22:27 UTC
>>> Modified: 17 Feb 17 22:27 UTC
>>> LastEnabled: 17 Feb 17 22:27 UTC
>>> Databases Retention Policies: ["main"."autogen"]
>>>
>>> TICKscript:
>>>
>>> var data = stream
>>>     |from()
>>>         .measurement('disk')
>>>         .groupBy('host', 'path')
>>>     |alert()
>>>         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
>>>         .warn(lambda: "used_percent" >= 80)
>>>         .id('DISK SPACE WARNING')
>>>         .email()
>>>
>>> DOT:
>>>
>>> digraph disk_alert_warn {
>>> graph [throughput="38.00 points/s"];
>>>
>>> stream0 [avg_exec_time_ns="0s" ];
>>> stream0 -> from1 [processed="284"];
>>>
>>> from1 [avg_exec_time_ns="3.9µs" ];
>>> from1 -> alert2 [processed="284"];
>>>
>>> alert2 [alerts_triggered="14" avg_exec_time_ns="72.33µs" crits_triggered="0" infos_triggered="0" oks_triggered="7" warns_triggered="7" ];
>>> }
>>>
>>> As you can see, I get 7 oks triggered (for the host/path groups that are 
>>> not in the alert range) and 7 warns triggered (for the 7 host/path groups 
>>> that are within the alert range) upon start-up. After that it behaves 
>>> normally.
>>>
>>> I understand that it should alert for the 7 host/path groups that are 
>>> over 80%, but why follow those with alerts about the OK groups?
>>>
>>> MORE INFO: When I raise the lambda threshold to 90% (out of range for 
>>> all host/paths), I get no alerts at all, which is expected.
>>>
>>> Thanks to anyone who can help me understand this.
>>>
>>
