Re: Anomaly Detection Files

Marek Otahal Tue, 21 Oct 2014 04:59:10 -0700

Hi Nick,

thanks for the explanations. Some comments below.

On Tue, Oct 21, 2014 at 1:13 PM, Nicholas Mitri <ngmitr...@gmail.com> wrote:

>  This is not traditional spatial anomaly detection where the purpose is to
> decide if a new input pattern falls within the RANGE of previously observed
> patterns.
>

Hmm, I was unaware of this definition of spatial anomaly. If I understand it
correctly:
having experienced {1,2,3,101,102,103}, the value 51 is normal, while 152 is
anomalous? (1000 being an anomaly is fine.)

I somehow don't like this definition (not sure exactly why right now :)). Maybe
"distance from significant clusters in the observed data" would be better?
(Then 152 and 51 have about the same anomaly score, and e.g. 10 has a low score.)
...but if it has its uses, why not.
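
To make the difference concrete, here is a minimal Python sketch of the two
definitions on the toy data above. The function names and the hand-picked
centroids (2 and 102) are my own assumptions for illustration; nothing here
is NuPIC code.

```python
# Toy data from the example above; it forms two obvious clusters.
OBSERVED = [1, 2, 3, 101, 102, 103]
# Hypothetical, hand-picked cluster centroids (not learned from the data).
CENTROIDS = [2, 102]


def range_anomaly(value, observed=OBSERVED):
    """Range-based definition: anomalous only if outside [min, max]."""
    return value < min(observed) or value > max(observed)


def cluster_distance(value, centroids=CENTROIDS):
    """Distance to the nearest centroid; larger means more anomalous."""
    return min(abs(value - c) for c in centroids)


for v in (10, 51, 152, 1000):
    print(v, range_anomaly(v), cluster_distance(v))
```

With the range definition only 152 and 1000 are flagged, while the cluster
distance puts 51 and 152 almost level and keeps 10 low.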

> Here’s a few excerpts from the wiki:
>
> "A non-temporal anomaly is defined as a combination of fields that doesn’t
> *usually* occur, independent of the history of the data.”
>

Maybe we should update the wiki here. IMHO everything in the CLA depends on
the history of the data (just not on its sequential order, in this case).

>
> This formulation will produce high anomaly scores for patterns
> that haven’t been seen before even if they fall inside the cluster of older
> patterns. Essentially, it’s detecting rarity and not spatial distance.
>
True, that is how the CLA anomaly score works now. Maybe you could generate
your training samples from a uniform distribution (instead of just the edge
cases)?
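
A toy sketch of what I mean, in Python. The "rarity detector" below is a
deliberately crude stand-in (it only remembers which values it has seen) and
is not how the CLA computes its score; the names and the 1..103 range are
assumptions taken from the example earlier in this mail.

```python
import random


def rarity_score(value, seen):
    """Toy rarity detector: 1.0 for a never-seen value, else 0.0."""
    return 0.0 if value in seen else 1.0


# Training only on the edge cases leaves the interior of the range unseen.
edge_training = {1, 2, 3, 101, 102, 103}

# Training on samples drawn uniformly over the whole range fills it in.
rng = random.Random(0)  # fixed seed so the run is repeatable
uniform_training = {rng.randint(1, 103) for _ in range(5000)}

print(rarity_score(51, edge_training))      # interior value, never seen
print(rarity_score(51, uniform_training))   # interior value after uniform training
print(rarity_score(152, uniform_training))  # outside the sampled range
```

With uniform training, rarity and range-based detection should mostly agree,
because nothing inside the range stays unseen for long.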

> Scott’s suggestion of using overlap instead is spatial anomaly detection
> in the traditional sense.
> I haven’t started testing out any code but I’d be interested in seeing if
> the SP can be used like a distance based anomaly detector. Specifically, I
> want to find out whether the spatial pattern stability can be used as
> an analog for a cluster centroid and thus compared to novel input to
> calculate anomaly.
>

I see. I think this is what you both said, so: distance-based anomaly
detector = difference between the active columns and the columns with high
weights (the commonly used ones)?
We could turn this around and output the distance anomaly as the ratio of
active columns with low weights.
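
Roughly like this, as a Python sketch. Everything here is hypothetical: I am
assuming a per-column "weight" (e.g. how often the column was active during
training) and a fixed cutoff; this is not an existing SP function.

```python
def low_weight_ratio(active_columns, column_weights, threshold):
    """Fraction of active columns whose weight is below `threshold`.

    0.0 means every active column is a commonly used one;
    1.0 means the input activated only rarely used columns.
    """
    if not active_columns:
        return 0.0
    low = sum(1 for c in active_columns if column_weights[c] < threshold)
    return low / len(active_columns)


# Hypothetical per-column weights, indexed by column number.
weights = [0.9, 0.8, 0.7, 0.1, 0.05, 0.0]

familiar = low_weight_ratio([0, 1, 2], weights, threshold=0.5)
mixed = low_weight_ratio([0, 1, 3, 4], weights, threshold=0.5)
novel = low_weight_ratio([3, 4, 5], weights, threshold=0.5)
print(familiar, mixed, novel)
```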

> My main concern with that approach is that the anomaly detector will
> produce a centroid and a threshold that is used to calculate an anomaly
> score (think of sigmoid function with the threshold as the knee). In the
> SP, the only way to achieve that is to force stability for all training
> patterns and bake in the thresholds accordingly to use for testing patterns.
>
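
If I read the sigmoid idea correctly, it would look something like the Python
sketch below: a distance from the centroid is squashed into (0, 1), with the
curve centred on the baked-in threshold. The function name and the
`steepness` parameter are my own assumptions.

```python
import math


def sigmoid_anomaly(distance, threshold, steepness=1.0):
    """Map a distance to (0, 1); the score is 0.5 exactly at `threshold`."""
    x = steepness * (distance - threshold)
    # Two branches keep math.exp from overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for d in (0, 10, 20):
    print(d, sigmoid_anomaly(d, threshold=10))
```
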
Cheers, Mark

-- 
Marek Otahal :o)
