[influxdb] Re: Understanding Morgoth

nathaniel Tue, 08 Nov 2016 08:04:11 -0800

The actual comparison is <=, which is why you received the alert. But if 
your tolerances are tight enough that <= versus < matters, then they are 
probably too tight.
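
To make the arithmetic concrete, here is a minimal Python sketch of the relationship described in the quoted thread (`anomalyScore = 1 - support`, `support = count / total`). It is not Morgoth's code, and the function names are made up for illustration. It assumes a single fingerprinter, and it assumes each alerting window was counted as a new event seen exactly once; under that assumption the scores 0.95 through 0.9642857142857143 in your table line up with supports of 1/20 through 1/28.

```python
from fractions import Fraction

# minSupport from the TICKscript in this thread (0.05), kept exact.
MIN_SUPPORT = Fraction(5, 100)


def support(count, total):
    """Fraction of all windows seen so far that matched this event."""
    return Fraction(count, total)


def anomaly_score(sup):
    """With a single fingerprinter the score is 1 - support."""
    return 1 - sup


def is_anomalous(sup, min_support=MIN_SUPPORT):
    """Inclusive comparison (<=), as described in this reply."""
    return sup <= min_support


# Assumption: each alerting window was a new event, seen once (count = 1).
for total in range(20, 29):
    sup = support(1, total)
    print(total, float(anomaly_score(sup)), is_anomalous(sup))
```

The first case (1/20) sits exactly on the threshold: a strict < would not flag it, but <= does. Exact fractions are used because `1 - 0.95` in floating point is not exactly `0.05`.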


I would first recommend that you tweak the sigmas value, maybe increasing it to 
3.5 or 4. To iterate quickly on these tests, I recommend that you create 
a recording of the data set, then tweak the value, replay the recording, check 
the results, and repeat until you have something you like. If you share 
your recording with me I would be willing to take a quick look as well. As 
it is, it's a little hard to give good advice based on a handful of data 
points.
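
As a rough illustration of why raising sigmas loosens the match, here is a simplified Python sketch of a sigma-style comparison. It is a toy under stated assumptions, not Morgoth's implementation: it treats a new window as the "same event" when its mean lies within `sigmas` standard deviations of a reference window's mean. The function name `same_event` and the sample numbers are hypothetical.

```python
import statistics


def same_event(reference, window, sigmas):
    """True if the window's mean is within `sigmas` stddevs of the reference mean."""
    mean = statistics.fmean(reference)
    std = statistics.pstdev(reference)
    return abs(statistics.fmean(window) - mean) <= sigmas * std


# Hypothetical data, chosen only to show the effect of the threshold.
reference = [10, 12, 11, 13, 9]
window = [16, 17]

for sigmas in (3.0, 3.5, 4.0):
    print(sigmas, same_event(reference, window, sigmas))
```

With these made-up numbers the window is rejected at 3.0 and 3.5 but accepted at 4.0, which is the kind of change to look for when replaying a recording with different sigmas values.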

On Tuesday, November 8, 2016 at 7:47:39 AM UTC-7, amith...@gmail.com wrote:
>
> On Thursday, 27 October 2016 21:46:08 UTC+5:30, nath...@influxdb.com 
>  wrote: 
> > Clarification from Amith: 
> > 
> > 
> > 
> > 
> > 
> > 
> > Hi Nathaniel, 
> > 
> > 
> > Thanks a lot for your quick reply. What is confusing me here is how 
> > Morgoth calculated the anomalyScore field, whose value turned out to be 
> > 0.9897172236503856, and how it is used to detect an anomaly. 
> > How does this particular node function?
> > 
> > 
> > 
> > … 
> > 
> >   @morgoth() 
> >      .field(field) 
> >      .scoreField(scoreField) 
> >      .minSupport(minSupport) 
> >      .errorTolerance(errorTolerance) 
> >      .consensus(consensus) 
> >      // Configure a single Sigma fingerprinter 
> >      .sigma(sigmas) 
> > 
> > 
> > You can choose some arbitrary data to help me understand this. :) 
> > Thanks, 
> > Amith 
> > 
> > 
> > My response: 
> > 
> > 
> > The `anomalyScore` is `1 - averageSupport`, where averageSupport is the 
> > average of the support values returned from each of the fingerprinters. In 
> > your case you only have one fingerprinter, `sigma`, so using the anomalyScore 
> > of ~ `0.99` we can determine that the sigma fingerprinter returned a 
> > support of ~ `0.01`. Support is defined as `count / total`, where count is 
> > the number of times a specific event has been seen and total is the total 
> > number of events seen. The support can be interpreted as a frequency 
> > percentage, i.e. the most recent window has only been seen 1% of the time. 
> > Since 0.01 is < 0.05 (the min support defined), an anomaly was triggered. 
> > Taking this back to the anomaly score, it can be interpreted as: 99% of the 
> > time we do not see an event like this one. 
> > 
> > 
> > Remember that Morgoth distinguishes different windows as different events 
> > using the fingerprinters. In your case the sigma function is computing the 
> > standard deviation and mean of the windows it receives. If a window arrives that 
> > is more than 3 standard deviations away from the mean, then it is not considered the 
> > same event and is a unique event. 
> > 
> > 
> > Taking all of that and putting it together, receiving an anomaly score of 
> > 99% from Morgoth for your setup can be interpreted as: you have sent 
> > several 1m windows to Morgoth. The window that triggered the anomaly event 
> > is only similar to ~1% of those windows, where similar is defined as being 
> > within 3 standard deviations. 
> > 
> > 
> > 
> > 
> > On Thursday, October 27, 2016 at 9:30:13 AM UTC-6, nath...@influxdb.com 
> wrote: 
> > 
> > 
> > 
> > In short there are two parts to Morgoth. 
> > 
> > 
> > 1. A system that counts the frequency of different kinds of events. This 
> is the lossy counting part 
> > 2. A system that determines if a window of data is the same as an 
> existing event being tracked or something new. This is the fingerprinting 
> part. 
> > 
> > 
> > 
> > Here is a quick read through for those concepts 
> http://docs.morgoth.io/docs/detection_framework/ 
> > 
> > 
> > 
> > It's a little hard to tell if Morgoth has done anything unexpected 
> > without more detail. Can you share some of the data that led to this 
> > alert, so I can speak to the specifics of what is going on? Or maybe you 
> > could ask a more specific question about which part is confusing? 
> > 
> > 
> > 
> > 
> > On Thursday, October 27, 2016 at 6:47:02 AM UTC-6, amith...@gmail.com 
> > wrote: 
> > Hi All, 
> > I am trying to run Morgoth as a child process of Kapacitor, but I am 
> > failing to understand how Morgoth functions. Below is the sample TICKscript I 
> > tried from the Morgoth docs. It is generating some alerts, but I am 
> > unable to figure out whether they are supposed to be triggered the way they 
> > have been. I am pasting a snippet of an alert as well. 
> > I basically want to understand the functioning of Morgoth through this 
> > example. 
> > Alert 
> > =================================================================== 
> > { 
> > "id":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal,", 
> > "message":"cpu:cpu=cpu-total,host=ip-10-121-48-24.ec2.internal, is 
> CRITICAL", 
> > "details":"", 
> > "time":"2016-10-27T11:33:00Z", 
> > "duration":21780000000000, 
> > "level":"CRITICAL", 
> > "data":{ 
> > "series":[ 
> > { 
> > "name":"cpu", 
> > "tags":{ 
> > "cpu":"cpu-total", 
> > "host":"ip-10-121-48-24.ec2.internal" 
> > }, 
> > "columns":[ 
> > "time", 
> > "anomalyScore", 
> > "usage_guest", 
> > "usage_guest_nice", 
> > "usage_idle", 
> > "usage_iowait", 
> > "usage_irq", 
> > "usage_nice", 
> > "usage_softirq", 
> > "usage_steal", 
> > "usage_system", 
> > "usage_user" 
> > ], 
> > "values":[ 
> > [ 
> > "2016-10-27T11:33:00Z", 
> > 0.9897172236503856, 
> > 0, 
> > 0, 
> > 99.49748743708487, 
> > 0, 
> > 0, 
> > 0, 
> > 0, 
> > 0, 
> > 0.5025125628122904, 
> > 0 
> > ] 
> > =================================================================== 
> > // The measurement to analyze 
> > var measurement = 'cpu' 
> > // Optional group by dimensions 
> > var groups = [*] 
> > // Optional where filter 
> > var whereFilter = lambda: TRUE 
> > // The amount of data to window at once 
> > var window = 1m 
> > // The field to process 
> > var field = 'usage_idle' 
> > // The name for the anomaly score field 
> > var scoreField = 'anomalyScore' 
> > // The minimum support 
> > var minSupport = 0.05 
> > // The error tolerance 
> > var errorTolerance = 0.01 
> > // The consensus 
> > var consensus = 0.5 
> > // Number of sigmas allowed for normal window deviation 
> > var sigmas = 3.0 
> > stream 
> >   // Select the data we want 
> >   |from() 
> >       .measurement(measurement) 
> >       .groupBy(groups) 
> >       .where(whereFilter) 
> >   // Window the data for a certain amount of time 
> >   |window() 
> >      .period(window) 
> >      .every(window) 
> >      .align() 
> >   // Send each window to Morgoth 
> >   @morgoth() 
> >      .field(field) 
> >      .scoreField(scoreField) 
> >      .minSupport(minSupport) 
> >      .errorTolerance(errorTolerance) 
> >      .consensus(consensus) 
> >      // Configure a single Sigma fingerprinter 
> >      .sigma(sigmas) 
> >   // Morgoth returns any anomalous windows 
> >   |alert() 
> >      .details('') 
> >      .crit(lambda: TRUE) 
> >      .log('/tmp/cpu_alert.log') 
>
> Thanks a lot, Nathaniel, for your explanation of Morgoth. I have come back 
> with a new example and its set of alerts. I will briefly describe what I am 
> trying to achieve here. 
>
> Below is a set of data with the count of errors (eventcount) that occurred for a 
> particular error code in the IIS logs. I want to run Morgoth on the field 
> eventcount to detect whether it is anomalous. 
>
> time                      app          eventcount  status     tech
> 2016-11-07T11:31:28.261Z  "OTSI"       586         "Success"  "IIS"
> 2016-11-07T11:32:03.254Z  "OTSI"       1           "Failure"  "IIS"
> 2016-11-07T11:33:03.243Z  "OTSI"       8           "Success"  "IIS"
> 2016-11-07T11:33:23.259Z  "ANALYTICS"  158         "Success"  "IIS"
> 2016-11-07T11:33:23.26Z   "ANALYTICS"  24          "Failure"  "IIS"
>
>
> My TICKscript: 
> // The measurement to analyze 
> var measurement = 'eventflow_IIS' 
>
> // The amount of data to window at once 
> var window = 1m 
>
> // The field to process 
> var field = 'eventcount' 
>
> // The name for the anomaly score field 
> var scoreField = 'anomalyScore' 
>
> // The minimum support 
> var minSupport = 0.05 
>
> // The error tolerance 
> var errorTolerance = 0.01 
>
> // The consensus 
> var consensus = 0.5 
>
> // Number of sigmas allowed for normal window deviation 
> var sigmas = 3.0 
>
> batch 
>     |query(''' 
>         SELECT * 
>         FROM "statistics"."autogen"."eventflow_IIS" 
>     ''') 
>         .period(1m) 
>         .every(1m) 
>         .groupBy(*) 
>     // |.where(lambda: TRUE) 
>     @morgoth() 
>         .field(field) 
>         .scoreField(scoreField) 
>         .minSupport(minSupport) 
>         .errorTolerance(errorTolerance) 
>         .consensus(consensus) 
>         // Configure a single Sigma fingerprinter 
>         .sigma(sigmas) 
>     // Morgoth returns any anomalous windows 
>     |alert() 
>         .details('Count is anomalous') 
>         .id('kapacitor/{{ .TaskName }}/{{ .Name }}/{{ .Group }}') 
>         .message('{{ .ID }} is at level {{ .Level }} Errorcount is:{{ index .Fields "eventcount" }}') 
>         .crit(lambda: TRUE) 
>         .log('/tmp/morgothbb.log') 
>     |influxDBOut() 
>         .database('anomaly') 
>         .retentionPolicy('autogen') 
>         .flushInterval(1s) 
>         .measurement('Anomaly') 
>         // .tag('eventcount','field') 
>         // .tag('AnomalyScore','scoreField') 
>         // .tag('Time','time') 
>         // .tag('Status','status') 
>         .precision('u') 
>
> Below are the alerts it generated, written into a table. 
>
> time                            anomalyScore        app          eventcount  status     tech
> 2016-11-08T09:34:40.169285533Z  0.95                "OTSI"       296         "Success"  "IIS"
> 2016-11-08T09:35:40.171285533Z  0.9523809523809523  "OTSI"       28          "Success"  "IIS"
> 2016-11-08T09:36:40.170285533Z  0.9545454545454546  "OTSI"       12          "Success"  "IIS"
> 2016-11-08T09:37:40.169285533Z  0.9565217391304348  "OTSI"       20          "Success"  "IIS"
> 2016-11-08T09:38:40.170285533Z  0.9583333333333334  "OTSI"       249         "Success"  "IIS"
> 2016-11-08T09:39:40.167285533Z  0.96                "OTSI"       70          "Success"  "IIS"
> 2016-11-08T09:43:00.167285533Z  0.9615384615384616  "ANALYTICS"  1           "Success"  "IIS"
> 2016-11-08T09:43:40.164285533Z  0.962962962962963   "OTSI"       24          "Success"  "IIS"
> 2016-11-08T09:52:00.160285533Z  0.9642857142857143  "ANALYTICS"  1           "Success"  "IIS"
>
>
> My question is: 
>
> How should I interpret the anomaly scores generated here (~0.95) together 
> with the counts for which Morgoth triggered an anomaly? Going by our earlier 
> discussion, the support here turns out to be ~0.05 (1 - anomaly score), and an 
> anomaly is triggered when support < minSupport. In this case that is 0.05 < 
> 0.05, which should not be true, yet an anomaly is being triggered 
> almost every minute. Could you please help me understand this? 
>
> Also, let me know if e, M, N need to be tweaked for this particular data 
> sample to generate meaningful alerts from it.

