[influxdb] Re: Kapacitor: trouble creating alert when field in certain state for too long

nathaniel Fri, 11 Nov 2016 12:59:35 -0800

Below is a script that gets closer. It fixes the first issue you mentioned 
but still suffers from the second and third.


I'll file a GitHub issue to see if we can change elapsed to return 0 when 
it receives only a single point.
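
To make the second issue concrete, here is a small Python model of the behavior (not Kapacitor's code, just a sketch of the semantics as I understand them). The `zero_for_single` flag is hypothetical and stands in for the proposed change; it is not an existing elapsed option.

```python
def elapsed(times, unit=1, zero_for_single=False):
    """Return the gaps between consecutive timestamps, in multiples of `unit`.

    `times` is a list of integer timestamps in ascending order.
    `zero_for_single` is a hypothetical switch modelling the proposed change.
    """
    if len(times) == 1 and zero_for_single:
        # Proposed behavior: a lone point yields an elapsed value of 0.
        return [0]
    # Current behavior as modelled here: n points yield n - 1 gaps,
    # so a single point yields nothing at all.
    return [(later - earlier) // unit for earlier, later in zip(times, times[1:])]


two_points = elapsed([100, 110])
single_now = elapsed([100])
single_proposed = elapsed([100], zero_for_single=True)
```

With a single point the current model emits nothing, which is why a downstream join never sees a value for that group.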
The third issue, the buffering, is going to stay. The buffering comes from 
the union node: it has to emit points ordered by time, so it holds points 
until it knows that no more points can arrive ahead of them. (It did seem 
to be buffering too many points when I tested it, so I'll double-check 
whether it can be optimized.)
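
As a rough illustration of why a time-ordered union has to buffer, here is a toy Python model. It is a sketch under my own assumptions, not the actual union node implementation: the class name `OrderedUnion` and the rule "emit everything at or before the oldest 'latest time' across inputs" are made up for the example.

```python
import heapq


class OrderedUnion:
    """Toy model of a union that emits points in time order."""

    def __init__(self, num_inputs):
        # Latest timestamp seen on each input; None until that input sends a point.
        self.latest = [None] * num_inputs
        self.buffer = []  # min-heap of (time, sequence, value)
        self.sequence = 0  # tie-breaker so the heap never compares values

    def push(self, input_index, time, value):
        """Accept one point and return the points that are now safe to emit."""
        self.latest[input_index] = time
        heapq.heappush(self.buffer, (time, self.sequence, value))
        self.sequence += 1
        if any(seen is None for seen in self.latest):
            # Some input has not reported yet, so nothing is safe to emit.
            return []
        # No input can still deliver a point older than this time
        # (assuming each input delivers its points in time order).
        watermark = min(self.latest)
        ready = []
        while self.buffer and self.buffer[0][0] <= watermark:
            point_time, _, point_value = heapq.heappop(self.buffer)
            ready.append((point_time, point_value))
        return ready


union = OrderedUnion(2)
first = union.push(0, 10, 'a')   # held: input 1 has sent nothing yet
second = union.push(0, 20, 'b')  # still held
third = union.push(1, 15, 'c')   # releases everything up to time 15
fourth = union.push(1, 30, 'd')  # releases 'b'; 'd' waits for input 0
```

In this model a point is released only once every other input has advanced past it, which matches the symptom of points not showing up until the next batch arrives.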

You have me thinking. I'll play around with this some more to make sure I 
haven't missed something.

var data = batch
    |query('select active from telegraf.autogen.postgresql_replication_slots')
        .period(4h)
        .every(10s)
        .groupBy('host', 'slot_name')

var data_last = data
    |last('active')
        .as('active')
        .usePointTimes()

var data_last_active = data
    |where(lambda: "active" == TRUE)
    |last('active')
        .as('active')
        .usePointTimes()

var data_union = data_last
    |union(data_last_active)
    |log()
        .prefix('UNION STREAM')
    |window()
        .period(10s)
        .every(10s)
        .align()
    |log()
        .prefix('UNION BATCH')

var data_elapsed = data_union
    |elapsed('active', 1s)
        .as('elapsed')
    |last('elapsed')
        .as('elapsed')
    |log()
        .prefix('ELAPSED')

var data_count = data_union
    |count('active')
        .as('count')
    |log()
        .prefix('COUNT')

data_elapsed
    |join(data_count)
        .as('elapsed', 'count')
        .fill('none')
    |log()
        .prefix('JOIN')


On Friday, November 11, 2016 at 10:34:06 AM UTC-7, patrick...@gmail.com 
wrote:
>
> I'm trying to create an alert when a specific field has been in a certain 
> state for too long.
> Currently my tick script looks like this:
>
>     var data = batch | query('select active from telegraf.autogen.postgresql_replication_slots')
>       .period(4h)
>       .every(10s)
>       .groupBy('host','slot_name')
>     
>     var data_last = data|last('active').as('active')
>     var data_last_active = data|where(lambda: "active" == TRUE)|last('active').as('active')
>     
>     var data_union = data_last|union(data_last_active)
>     var data_elapsed = data_union|elapsed('active',1s)|log()
>     var data_count = data_union|count('active').as('count')|log()
>     
>     data_elapsed|join(data_count).as('elapsed','count').tolerance(10s).fill('none')|log()
>
> The idea is that `data_last` will be the last data point. 
> `data_last_active` will be the last data point where `active == true`. We 
> would then calculate the time difference between these 2 points. If the 
> difference is greater than X, generate an alert.
> But we also want to handle the case where `data_last_active` is empty (no 
> match within time period), so we get the count of points, which in this 
> case would be 1.
>
> However there are numerous problems with this:
> 1. `elapsed()` is including the data points from the previous batch 
> period, instead of just within the batch. So if batch.period is 60s, then 
> one of the elapsed values is going to be 60s.
> 2. `elapsed()` won't emit anything at all if there is no previous data 
> point, thus breaking the case where `data_last_active` is empty.
> 3. `count()` is buffering, and doesn't release the data points until the 
> next batch comes in.
>
>
> Here's an example of what the above generates:
>     [test:log10] 2016/11/11 12:31:09 I! {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gbar01stg","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gbar01stg"},"Fields":{"count":2},"Time":"2016-11-11T12:30:59.785563762-05:00"}
>     [test:log10] 2016/11/11 12:31:09 I! {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gdbs01qa","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gdbs01qa"},"Fields":{"count":1},"Time":"2016-11-11T12:30:59.785563762-05:00"}
>     [test:log8] 2016/11/11 12:31:09 I! {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gbar01stg","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gbar01stg"},"Fields":{"elapsed":10},"Time":"2016-11-11T17:31:09.785572403Z"}
>     [test:log8] 2016/11/11 12:31:09 I! {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gbar01stg","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gbar01stg"},"Fields":{"elapsed":0},"Time":"2016-11-11T17:31:09.785572403Z"}
>     [test:log8] 2016/11/11 12:31:09 I! {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gdbs01qa","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gdbs01qa"},"Fields":{"elapsed":10},"Time":"2016-11-11T17:31:09.785572403Z"}
>
>
> Any suggestions to get this working?
>
> -Patrick
>
>

