Hi everyone,
I have a server that handles HTTP requests and sends a metric to InfluxDB
for each request; one of the data fields is the response code.  I wanted to
set up a Kapacitor script to alert whenever a 5xx response is generated, but
I am seeing strange behavior.

Here's my TICKscript:

var data = stream
    |from()
        .database('production')
        .retentionPolicy('default')
        .measurement('controller.action.count')
        .where(lambda: "status_code" =~ /^5\d\d/)
        .groupBy('component', 'controller', 'action', 'status_code')
    |window()
        .period(1m)
        .every(1m)
    |sum('value')
        .as('stat')

var alert = data
    |alert()
        .id('{{ index .Tags "component" }}::{{ index .Tags "controller" }}#{{ index .Tags "action" }} Error')
        .message('{{ .ID }}: {{ index .Fields "stat" }} {{ index .Tags "status_code" }} error(s) has occurred')
        .warn(lambda: "stat" > 0)
        .topic('controller_errors')


The very first time a [component, controller, action] serves up a 500 error 
I will get an alert on the topic (it outputs to slack), but never again for 
that combination.

The handler listening to the topic does not have "stateChangesOnly" 
specified.

Is this the right way to go about this?  Would it be better to just stream 
every entry in that measurement and have the alert's warn level be based on 
a lambda that checks if the 'status_code' is 5xx instead?
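
To make that alternative concrete, here is a rough, untested sketch of what
I have in mind.  It reuses the names from my script above and assumes
"status_code" is a tag (as the groupBy above implies); the message wording
is only illustrative.

```
// Untested sketch: no window/sum, every point reaches the alert node.
// Assumes "status_code" is a tag on the measurement.
stream
    |from()
        .database('production')
        .retentionPolicy('default')
        .measurement('controller.action.count')
        .groupBy('component', 'controller', 'action', 'status_code')
    |alert()
        .id('{{ index .Tags "component" }}::{{ index .Tags "controller" }}#{{ index .Tags "action" }} Error')
        .message('{{ .ID }}: {{ index .Tags "status_code" }} error occurred')
        // Warn only when the point's status code is 5xx.
        .warn(lambda: "status_code" =~ /^5\d\d/)
        .topic('controller_errors')
```

With this version I would expect one alert evaluation per request rather
than one per one-minute window, so it may be noisier than the summed
version.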

Btw, here's the output of `kapacitor show error_check`:

digraph error_check {
graph [throughput="9.00 points/s"];

stream0 [avg_exec_time_ns="0s" ];
stream0 -> from1 [processed="4431449"];

from1 [avg_exec_time_ns="58.058µs" ];
from1 -> window2 [processed="177"];

window2 [avg_exec_time_ns="22.448µs" ];
window2 -> sum3 [processed="15"];

sum3 [avg_exec_time_ns="0s" ];
sum3 -> alert4 [processed="15"];

alert4 [alerts_triggered="15" avg_exec_time_ns="55.86451ms" crits_triggered="0" infos_triggered="0" oks_triggered="0" warns_triggered="15" ];

I know from our dashboards that we've had way more than 15 5xx errors since 
that check started running.

Any advice?
