Hi Mark,

Retraining sounds like a really good idea!
I could set up an application for anomaly detection on isolated metrics and mirror the raw metrics, as well as the corresponding anomaly probability scores, to a database. The users would then be responsible for tracing down the cause of any anomalies (via RCA or something similar) and classifying them. If an anomaly is traced to a cause that could impact the monitored system, the raw metric records corresponding to that anomaly can be filtered out, after which the HTM should be retrained on the resulting data file.

I'm hoping that the resulting gap in the raw data records doesn't influence the temporal pooler. If it does, I need to figure out how to compensate for that without compromising anomaly detection quality. A big drawback is that this would probably require at least one redundantly operating HTM to continue anomaly detection while retraining occurs. As long as the machine running the HTM can retrain it in a reasonable time (<= 8 hours?), it should not have an impact on anomaly detection quality. The maximum number of records in the data file can be based on that limit. The minimum, I think, is at least a couple of weeks' worth of "normal" data, assuming the metrics are somehow linked to a typical business environment.

This approach relies heavily on the users' ability to adequately classify anomalies, and I can't predict whether the extra human effort spent classifying anomalies translates into an increase in anomaly detection quality. I do think it beats a setup with disabled HTM learning in an environment that is subject to frequent change and noise.

regards,

Casper Rooker
[email protected]

On Thu, Oct 15, 2015 at 1:01 PM, Marek Otahal <[email protected]> wrote:

> Hi Casper,
>
> On Thu, Oct 15, 2015 at 12:08 PM, Cas <[email protected]> wrote:
>
>> Thanks Alex!
>>
>> I'm Casper, not Mark, by the way :)
>>
>> Thanks for taking the time to make those plots.
>> I certainly see the advantage of notification based on the HTM algorithm
>> over conventional threshold monitoring, in terms of mitigating useless
>> notifications. I'm hopeful that NuPIC can detect anomalies earlier than a
>> person would identify a problem.
>>
>> I'll be looking for a quick and easy way to set up an application similar
>> to Grok in my own environment. I'll certainly look into using NAB to see
>> how well it performs.
>>
>> What keeps bothering me is the problem that HTM is always learning, so
>> the probability score of *recurring* anomalies will decrease every time
>> until it falls below the notification threshold.
>
> I think this is a use-case-specific question; you have to decide how your
> system should react. If you can have a "normal state", even an evolving
> one, human-evaluated at a later point, you could learn, disable learning
> and detect anomalies, review your past week, and, if it was "normal",
> retrain on the normal parts.
>
> A similar approach is running two models at a time, one with disabled
> learning and a second that keeps learning, and detecting anomalies from
> both combined.
>
> The problem is how you want to react to a periodically recurring "anomaly"
> in streaming data: it is an anomaly at first, then its score lowers, and
> if it is periodic it would become part of the predicted state.
>
> Cheers, Mark
>
>> If it concerned an anomaly that is easily recognizable by a person, this
>> would not be a problem, but it mitigates the power of the algorithm to
>> detect anomalies too subtle for people to detect. Classifying a set of
>> desirable and/or undesirable behaviors would counteract this, but is that
>> even possible at this point? In the presentation that Matt linked, I
>> think Subutai mentioned (https://youtu.be/nVCKjZWYavM?t=1190) that you
>> would have to tweak the data stream based on what you want HTM to learn
>> from it; does that relate to this problem?
>>
>> kind regards,
>>
>> Casper Rooker
>> [email protected]
>>
>> On Wed, Oct 14, 2015 at 7:47 PM, Alex Lavin <[email protected]> wrote:
>>
>>> Hi Mark,
>>>
>>> I'd like to point you to NAB [1], our benchmark for anomaly detection
>>> in streaming data. Included in the corpus are 17 data files representing
>>> a variety of server metrics; we specifically selected these files for
>>> NAB because they test detectors for the problems you described.
>>>
>>> I've plotted a few examples you may be interested in [2-4], where the
>>> red dots represent the starting points of true anomalies, and the
>>> diamonds mark detections by the HTM anomaly detection algorithm (green
>>> and red are true and false positives, respectively).
>>>
>>> On your previous questions...
>>>
>>> - We typically say HTM needs 1000 data instances to sufficiently learn
>>> the temporal patterns such that it can start reliably making predictions
>>> (and anomaly detections). You'll notice the anomaly scores are
>>> relatively high at the beginning of a data stream, but settle down after
>>> HTM has learned the sequences well.
>>>
>>> - A very noisy stream will result in false-positive detections, but
>>> this is true of any anomaly detection algorithm. To decrease the number
>>> of false positives, you can increase the threshold on the anomaly
>>> likelihood. That is, fewer data points will be flagged as anomalous, but
>>> this may come at the cost of an increase in false negatives.
>>>
>>> - The temporal memory has a large capacity for storing patterns of
>>> sequences, so this depends on what you mean by "prolonged use". The
>>> anomaly likelihood estimation uses several parameters [5] related to how
>>> much previous data is used to re-estimate the distribution, but tweaking
>>> these generally has little effect on the resulting detections.
>>>
>>> [1] https://github.com/numenta/NAB
>>> [2] https://plot.ly/~alavin/3151/anomaly-detections-for-realawscloudwatchec2-cpu-utilization-5f5533csv/
>>> [3] https://plot.ly/~alavin/3187/anomaly-detections-for-realawscloudwatchelb-request-count-8c0756csv/
>>> [4] https://plot.ly/~alavin/3199/anomaly-detections-for-realawscloudwatchrds-cpu-utilization-e47b3bcsv/
>>> [5] https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/anomaly_likelihood.py#L84-106
>>>
>>> Cheers,
>>> Alex
>>
>
> --
> Marek Otahal :o)
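P.S. A rough sketch of the filtering step I describe above, in plain Python: records whose anomaly a user has traced to a real problem are dropped before writing the retraining file. The column names and the "impact" label are illustrative assumptions for my own setup, not any NuPIC schema.

```python
import csv

def build_retraining_file(src_path, dst_path, flagged_label="impact"):
    """Copy raw metric records to a new training file, dropping rows whose
    anomaly a user classified as caused by a real issue ("impact").

    Assumed (hypothetical) source columns: timestamp, value,
    anomaly_likelihood, label.
    """
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["timestamp", "value"])
        writer.writeheader()
        for row in reader:
            if row.get("label") == flagged_label:
                continue  # skip records tied to a confirmed problem
            writer.writerow({"timestamp": row["timestamp"],
                             "value": row["value"]})
```

The gap this leaves in the sequence is exactly the part I'm unsure about with respect to the temporal pooler.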
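P.P.S. To make the likelihood-threshold trade-off concrete for myself, here is a simplified stand-in for the anomaly-likelihood idea (deliberately not the actual nupic implementation in [5]): model recent anomaly scores as a Gaussian and flag a point only when its likelihood of being anomalous exceeds a threshold. Raising the threshold means fewer flagged points, trading false positives for false negatives.

```python
import math

def tail_probability(score, history):
    """One-sided Gaussian tail estimate of how usual `score` is relative
    to recent scores. A simplified stand-in for the anomaly-likelihood
    idea, not NuPIC's actual implementation."""
    n = len(history)
    mean = sum(history) / n
    var = sum((s - mean) ** 2 for s in history) / n
    std = math.sqrt(var) or 1e-6  # avoid division by zero on flat history
    z = (score - mean) / std
    # complementary error function gives the upper Gaussian tail
    return 0.5 * math.erfc(z / math.sqrt(2))

def is_anomalous(score, history, threshold=0.9999):
    """Flag only scores whose anomaly likelihood exceeds `threshold`;
    a higher threshold yields fewer false positives at the cost of
    more false negatives."""
    return (1.0 - tail_probability(score, history)) > threshold
```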
