> The test case is so basic I really don't see what I could be doing wrong...
Agreed. What version of Kapacitor are you using? I just manually tested the deadman with the latest release and it's working fine. Could you try this TICKscript to help us get to the bottom of what is going on?

    var data = stream
        |from()
            .measurement('invite_delay')
            .where(lambda: "host" == 'router' AND "app_name" == 'phone_tester')

    data
        // |deadman(1.0, 10s) is equivalent to the code below, with the
        // exception of the |log statements
        |stats(10s)
            .align()
        |log()
            .prefix('DEADMAN RAW STATS')
        |derivative('emitted')
            .unit(10s)
            .nonNegative()
        |log()
            .prefix('DEADMAN STATS')
        |alert()
            .id('{{ .TaskName }}/{{ .Name }}')
            .crit(lambda: "emitted" <= 1.0)
            .stateChangesOnly()
            .log('/tmp/dead.log')

    data
        |log()
            .prefix('RAW DATA')

With the added log statements we should be able to determine where the breakdown is. After running this script, can you share the relevant logs?

Thanks

On Monday, November 7, 2016 at 10:18:33 AM UTC-7, Julien Ammous wrote:
>
> I just did another test with 10s instead of 3min to make it easier, with
> the same result. Here is what I do:
>
> - I insert a point and wait 10s; the alert is correctly raised.
> - I insert four points and wait 10s; nothing happens.
>
> The Kapacitor alert endpoint confirms what I see:
>
>     "alert5": {
>         "alerts_triggered": 1,
>         "avg_exec_time_ns": 30372,
>         "collected": 29,
>         "crits_triggered": 1,
>         "emitted": 1,
>         "infos_triggered": 0,
>         "oks_triggered": 0,
>         "warns_triggered": 0
>     },
>
> One critical alert was raised and no OK.
>
> The test case is so basic I really don't see what I could be doing wrong...
>
> On 7 November 2016 at 17:28, <nath...@influxdb.com> wrote:
>
>> To answer your questions:
>>
>> Yes, the deadman should fire an OK alert, and it should do so within the
>> deadman interval of the point arriving. In your case, since you are checking
>> on 3m intervals, if a new point arrives it should fire an OK alert within
>> 3m of that point's arrival.
>>
>> As for the sources, they are a bit hidden since the deadman function is
>> really just syntactic sugar for a combination of nodes. Primarily, deadman
>> uses the stats node under the hood. See
>> https://github.com/influxdata/kapacitor/blob/master/stats.go
>>
>> As for what might be going on in your case, I have one idea. The deadman
>> comparison is less than or equal to the threshold. So, since you have a
>> threshold of 1, you have to send at least 2 points in 3m for the OK to
>> be sent. Can you verify that at least 2 points arrived within 3m and you
>> still didn't get an OK alert?
>>
>> On Monday, November 7, 2016 at 2:28:44 AM UTC-7, Julien Ammous wrote:
>>>
>>> Hi,
>>> I want to have an alert raised when no data have been received in the
>>> last 3min, but I also want the alert to be cleared as soon as new data
>>> arrive again. I have been playing with deadman, but I can't figure out
>>> how to make it emit an OK state when data arrive again. Here is the
>>> script:
>>>
>>>     stream
>>>         |from()
>>>             .measurement('invite_delay')
>>>             .where(lambda: "host" == 'router' AND "app_name" == 'phone_tester')
>>>         |deadman(1.0, 3m)
>>>             .id('{{ .TaskName }}/{{ .Name }}')
>>>             .stateChangesOnly()
>>>             .levelField('level')
>>>             .idField('id')
>>>             .durationField('duration')
>>>         |influxDBOut()
>>>             .database('metrics')
>>>             .measurement('alerts')
>>>             .retentionPolicy('raw')
>>>
>>> I get a CRITICAL alert when data have been missing for 3min; this works.
>>> But if data start flowing again, I get nothing. I kept it running while
>>> doing something else and never got any OK for this alert :(
>>>
>>> I tried to find the source for the deadman logic, but I couldn't find
>>> it. I have a few questions:
>>> - When data are received again, is the deadman alert supposed to send
>>> an OK state?
>>> - If it is, when will it send it? Will it be as soon as a point arrives,
>>> or will there be a delay? (Let's pretend influxDBOut writes the alert
>>> immediately for this question.)
>>>
>>> Where is the deadman logic defined in the sources? I am not too familiar
>>> with Go, but I searched for "Deadman" and what came up were just what
>>> looked like structures and their accessors, which was not that useful.
>>>
>> --
>> Remember to include the version number!
>> ---
>> You received this message because you are subscribed to a topic in the
>> Google Groups "InfluxData" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/influxdb/rUm82LQd9UI/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> influxdb+u...@googlegroups.com.
>> To post to this group, send email to infl...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/influxdb.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/influxdb/83cc9a04-962e-4eba-9680-8a029c3e111c%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
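To make the threshold semantics discussed in this thread concrete: the deadman fires CRITICAL when the number of points emitted in an interval is less than or equal to the threshold, so with a threshold of 1.0 at least two points must arrive inside one interval before an OK can fire. Here is a small Python sketch of that comparison. This is an illustration only, not Kapacitor's actual source; the function name, the window-counting loop, and the timestamps are all assumptions made for the example (the real deadman counts points via |stats and |derivative('emitted')).

```python
def deadman_levels(timestamps, interval=10, horizon=40, threshold=1.0):
    """Return the alert level for each interval window of a point stream.

    timestamps: arrival times of points, in seconds.
    Mimics |stats(interval) counting points per window, followed by
    |alert() .crit(lambda: "emitted" <= threshold).
    """
    levels = []
    for start in range(0, horizon, interval):
        # Count points that arrived in this window (what |stats emits
        # and |derivative('emitted') turns into a per-interval rate).
        emitted = sum(1 for t in timestamps if start <= t < start + interval)
        # Note the <=: with threshold 1.0, a window holding exactly one
        # point is still CRITICAL; two or more points yield an OK.
        levels.append("CRITICAL" if emitted <= threshold else "OK")
    return levels

# One point in the first 10s window, nothing in the second, four points
# in the third, nothing in the fourth:
print(deadman_levels([3, 21, 23, 25, 27]))
# -> ['CRITICAL', 'CRITICAL', 'OK', 'CRITICAL']
```

This matches the behavior reported earlier in the thread: a single point per interval never clears the alert, while an interval containing several points does.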