Glad you figured it out. Yes, deadman() is literally the same thing as that code; you can think of deadman() kind of like a macro. If you ever need to look up that code again, it can be found in the docs here: https://docs.influxdata.com/kapacitor/v1.1/nodes/from_node/#deadman
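For reference, a compact sketch of that equivalence, using the measurement window (10s) and threshold (1.0) from the scripts later in this thread; treat it as an illustration of the idea rather than the exact expansion, and assume `data` is a stream |from() pipeline like the ones below:

    // One-line form:
    data
        |deadman(1.0, 10s)

    // Roughly the same pipeline written out by hand (the debugging script
    // further down in the thread adds |log() nodes on top of this):
    data
        |stats(10s)
            .align()
        |derivative('emitted')
            .unit(10s)
            .nonNegative()
        |alert()
            .crit(lambda: "emitted" <= 1.0)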
On Tuesday, November 8, 2016 at 5:10:54 AM UTC-7, Julien Ammous wrote:

Thanks for the full-blown deadman code, it makes me realize what I was doing wrong... I was using multiple scripts to inject data into my InfluxDB database and I was using the wrong one, so the fact that a critical alert was raised and never resolved was correct; the metric name was close enough for me not to realize it. I really feel stupid now xD

Thanks for your help. Is the script above really equivalent to the deadman() call? Because if that's the case I think I will keep this one, since I can actually understand what it is doing.

On Monday, 7 November 2016 18:54:03 UTC+1, nath...@influxdb.com wrote:

> The test case is so basic I really don't see what I could be doing wrong...

Agreed, what version of Kapacitor are you using? I just manually tested the deadman with the latest release and it's working fine.

Could you try this TICKscript to help us get to the bottom of what is going on?

    var data = stream
        |from()
            .measurement('invite_delay')
            .where(lambda: "host" == 'router' AND "app_name" == 'phone_tester')

    data
        // |deadman(1.0, 10s) is equivalent to the code below, with the exception of the |log statements
        |stats(10s)
            .align()
        |log()
            .prefix('DEADMAN RAW STATS')
        |derivative('emitted')
            .unit(10s)
            .nonNegative()
        |log()
            .prefix('DEADMAN STATS')
        |alert()
            .id('{{ .TaskName }}/{{ .Name }}')
            .crit(lambda: "emitted" <= 1.0)
            .stateChangesOnly()
            .log('/tmp/dead.log')

    data
        |log()
            .prefix('RAW DATA')

With the added log statements we should be able to determine where the breakdown is. After running this script can you share the relevant logs?

Thanks

On Monday, November 7, 2016 at 10:18:33 AM UTC-7, Julien Ammous wrote:

I just did another test with 10s instead of 3min to make it easier, with the same result. Here is what I do:

- I insert a point and wait 10s: the alert is correctly raised
- I insert four points and wait 10s: nothing happens

The Kapacitor alert endpoint confirms what I see:

    "alert5": {
        "alerts_triggered": 1,
        "avg_exec_time_ns": 30372,
        "collected": 29,
        "crits_triggered": 1,
        "emitted": 1,
        "infos_triggered": 0,
        "oks_triggered": 0,
        "warns_triggered": 0
    },

One critical alert was raised and no OK.

The test case is so basic I really don't see what I could be doing wrong...

On 7 November 2016 at 17:28, <nath...@influxdb.com> wrote:

To answer your questions:

Yes, the deadman should fire an OK alert, and it should do so within the deadman interval of the point arriving. In your case, since you are checking on 3m intervals, if a new point arrives it should fire an OK alert within 3m of that point's arrival.

As for the sources, they are a bit hidden since the deadman function is really just syntactic sugar for a combination of nodes. Primarily deadman uses the stats node under the hood. See https://github.com/influxdata/kapacitor/blob/master/stats.go

As for what might be going on in your case, I have one idea. The deadman comparison is less than or equal to the threshold. So since you have a threshold of 1, you have to send at least 2 points in 3m for the OK to be sent.
Can you verify that at least 2 points arrived within 3m and you still didn't get an OK alert?

On Monday, November 7, 2016 at 2:28:44 AM UTC-7, Julien Ammous wrote:

Hi,
I want to have an alert raised when no data were received in the last 3min, but I also want the alert to be stopped as soon as new data arrive again. I have been playing with deadman but I can't figure out how to make it save an OK state when data arrive again. Here is the script:

    stream
        |from()
            .measurement('invite_delay')
            .where(lambda: "host" == 'router' AND "app_name" == 'phone_tester')
        |deadman(1.0, 3m)
            .id('{{ .TaskName }}/{{ .Name }}')
            .stateChangesOnly()
            .levelField('level')
            .idField('id')
            .durationField('duration')
        |influxDBOut()
            .database('metrics')
            .measurement('alerts')
            .retentionPolicy('raw')

I get a CRITICAL alert when data have been missing for 3min, and this works, but if data start flowing again I get nothing. I kept it running while doing something else and never got any OK for this alert :(

I tried to find the source for the deadman logic but I couldn't find it. I have a few questions:

- when data are received again, is the deadman alert supposed to send an OK state?
- if it is, when will it send it? Will it be as soon as a point arrives, or will there be a delay? (let's pretend influxDBOut writes the alert immediately for this question)

Where is the deadman logic defined in the sources? I am not too familiar with Go, but I searched for "Deadman" and what came up were just what looked like structures and their accessors, not that useful.
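Tying the answer above back to the original goal (an OK as soon as data resume): since the deadman comparison is "emitted" <= threshold, a threshold of 0.0 should make CRIT fire only when no points at all arrive in a window, and the OK state change should come as soon as a window contains any point. A rough sketch along those lines, untested and based only on the behaviour described in this thread:

    // Sketch, not an official example: deadman(0.0, 3m) goes CRIT only when
    // a 3m window emits zero points, and (with stateChangesOnly) writes an OK
    // state change once a window sees at least one point again.
    stream
        |from()
            .measurement('invite_delay')
            .where(lambda: "host" == 'router' AND "app_name" == 'phone_tester')
        |deadman(0.0, 3m)
            .id('{{ .TaskName }}/{{ .Name }}')
            .stateChangesOnly()
        |influxDBOut()
            .database('metrics')
            .measurement('alerts')
            .retentionPolicy('raw')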