Glad you figured it out. Yes, deadman() is literally the same thing as that code; you can think of deadman() kind of like a macro. If you ever need to look up that code again, it can be found in the docs here: https://docs.influxdata.com/kapacitor/v1.1/nodes/from_node/#deadman
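For reference, a compact sketch of that equivalence, using the measurement window (10s) and threshold (1.0) from the scripts later in this thread; treat it as an illustration of the idea rather than the exact expansion, and assume `data` is a stream |from() pipeline like the ones below:

    // One-line form:
    data
        |deadman(1.0, 10s)

    // Roughly the same pipeline written out by hand (the debugging script
    // further down in the thread adds |log() nodes on top of this):
    data
        |stats(10s)
            .align()
        |derivative('emitted')
            .unit(10s)
            .nonNegative()
        |alert()
            .crit(lambda: "emitted" <= 1.0)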
On Tuesday, November 8, 2016 at 5:10:54 AM UTC-7, Julien Ammous wrote:

Thanks for the full-blown deadman code, it makes me realize what I was doing wrong... I was using multiple scripts to inject data into my InfluxDB database and I was using the wrong one, so the fact that a critical alert was raised and never resolved was correct; the metric name was close enough for me not to realize it. I really feel stupid now xD

Thanks for your help. Is the script above really equivalent to the deadman() call? Because if that's the case I think I will keep this one, since I can actually understand what it is doing.

On Monday, 7 November 2016 18:54:03 UTC+1, nath...@influxdb.com wrote:

> The test case is so basic I really don't see what I could be doing wrong...

Agreed, what version of Kapacitor are you using? I just manually tested the deadman with the latest release and it's working fine.

Could you try this TICKscript to help us get to the bottom of what is going on?

    var data = stream
        |from()
            .measurement('invite_delay')
            .where(lambda: "host" == 'router' AND "app_name" == 'phone_tester')

    data
        // |deadman(1.0, 10s) is equivalent to the code below, with the exception of the |log statements
        |stats(10s)
            .align()
        |log()
            .prefix('DEADMAN RAW STATS')
        |derivative('emitted')
            .unit(10s)
            .nonNegative()
        |log()
            .prefix('DEADMAN STATS')
        |alert()
            .id('{{ .TaskName }}/{{ .Name }}')
            .crit(lambda: "emitted" <= 1.0)
            .stateChangesOnly()
            .log('/tmp/dead.log')

    data
        |log()
            .prefix('RAW DATA')

With the added log statements we should be able to determine where the breakdown is. After running this script can you share the relevant logs?

Thanks

On Monday, November 7, 2016 at 10:18:33 AM UTC-7, Julien Ammous wrote:

I just did another test with 10s instead of 3min to make it easier, with the same result. Here is what I do:

- I insert a point and wait 10s: the alert is correctly raised
- I insert four points and wait 10s: nothing happens

The Kapacitor alert endpoint confirms what I see:

    "alert5": {
        "alerts_triggered": 1,
        "avg_exec_time_ns": 30372,
        "collected": 29,
        "crits_triggered": 1,
        "emitted": 1,
        "infos_triggered": 0,
        "oks_triggered": 0,
        "warns_triggered": 0
    },

One critical alert was raised and no OK.

The test case is so basic I really don't see what I could be doing wrong...

On 7 November 2016 at 17:28, <nath...@influxdb.com> wrote:

To answer your questions:

Yes, the deadman should fire an OK alert, and it should do so within the deadman interval of the point arriving. In your case, since you are checking on 3m intervals, if a new point arrives it should fire an OK alert within 3m of that point's arrival.

As for the sources, they are a bit hidden since the deadman function is really just syntactic sugar for a combination of nodes. Primarily deadman uses the stats node under the hood. See https://github.com/influxdata/kapacitor/blob/master/stats.go

As for what might be going on in your case, I have one idea. The deadman comparison is less than or equal to the threshold. So since you have a threshold of 1, you have to send at least 2 points in 3m for the OK to be sent.
Can you verify that at least 2 points arrived within 3m and you still didn't get an OK alert?

On Monday, November 7, 2016 at 2:28:44 AM UTC-7, Julien Ammous wrote:

Hi,
I want to have an alert raised when no data were received in the last 3min, but I also want the alert to be stopped as soon as new data arrive again. I have been playing with deadman but I can't figure out how to make it save an OK state when data arrive again. Here is the script:

    stream
        |from()
            .measurement('invite_delay')
            .where(lambda: "host" == 'router' AND "app_name" == 'phone_tester')
        |deadman(1.0, 3m)
            .id('{{ .TaskName }}/{{ .Name }}')
            .stateChangesOnly()
            .levelField('level')
            .idField('id')
            .durationField('duration')
        |influxDBOut()
            .database('metrics')
            .measurement('alerts')
            .retentionPolicy('raw')

I get a CRITICAL alert when data have been missing for 3min, and this works, but if data start flowing again I get nothing. I kept it running while doing something else and never got any OK for this alert :(

I tried to find the source for the deadman logic but I couldn't find it. I have a few questions:

- when data are received again, is the deadman alert supposed to send an OK state?
- if it is, when will it send it? Will it be as soon as a point arrives, or will there be a delay? (let's pretend influxDBOut writes the alert immediately for this question)

Where is the deadman logic defined in the sources? I am not too familiar with Go, but I searched for "Deadman" and what came up were just what looked like structures and their accessors, not that useful.
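Tying the answer above back to the original goal (an OK as soon as data resume): since the deadman comparison is "emitted" <= threshold, a threshold of 0.0 should make CRIT fire only when no points at all arrive in a window, and the OK state change should come as soon as a window contains any point. A rough sketch along those lines, untested and based only on the behaviour described in this thread:

    // Sketch, not an official example: deadman(0.0, 3m) goes CRIT only when
    // a 3m window emits zero points, and (with stateChangesOnly) writes an OK
    // state change once a window sees at least one point again.
    stream
        |from()
            .measurement('invite_delay')
            .where(lambda: "host" == 'router' AND "app_name" == 'phone_tester')
        |deadman(0.0, 3m)
            .id('{{ .TaskName }}/{{ .Name }}')
            .stateChangesOnly()
        |influxDBOut()
            .database('metrics')
            .measurement('alerts')
            .retentionPolicy('raw')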