Re: Customizing K-Means for Anomaly Detection

2021-01-12 Thread Sean Owen
You could fit the k-means pipeline, get the cluster centers, create a Transformer using that info, then create a new PipelineModel including all the original elements and the new Transformer. Does that work? It's not out of the question to expose a new parameter in KMeansModel that lets you also
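
To make the suggestion above concrete, here is a minimal sketch of the per-row computation such a custom Transformer might perform once the cluster centers are known. It is plain Python rather than Spark code, and it rests on assumptions not stated in the thread: that an anomaly means "far from every center", that distance is Euclidean, and that a fixed cutoff is used. The names `nearest_center_distance`, `flag_anomalies`, and `threshold` are hypothetical.

```python
import math


def nearest_center_distance(point, centers):
    """Return (index, distance) of the cluster center closest to `point`."""
    best_idx, best_dist = -1, math.inf
    for idx, center in enumerate(centers):
        dist = math.dist(point, center)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist


def flag_anomalies(points, centers, threshold):
    """Mark each point whose distance to its nearest center exceeds `threshold`.

    `threshold` is a hypothetical, user-chosen cutoff; the thread does not
    say how it would be picked.
    """
    rows = []
    for point in points:
        idx, dist = nearest_center_distance(point, centers)
        rows.append(
            {
                "point": point,
                "cluster": idx,
                "distance": dist,
                "is_anomaly": dist > threshold,
            }
        )
    return rows


if __name__ == "__main__":
    # Made-up centers standing in for those of a fitted k-means model.
    centers = [(0.0, 0.0), (10.0, 10.0)]
    points = [(0.0, 1.0), (10.0, 9.0), (5.0, 5.0)]
    for row in flag_anomalies(points, centers, threshold=3.0):
        print(row)
```

In Spark, the centers would come from the fitted `KMeansModel` (its `clusterCenters`), and logic like this would sit inside the Transformer appended to the new PipelineModel.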

Customizing K-Means for Anomaly Detection

2021-01-12 Thread Artemis User
First, some background:
* We want to use the k-means model for anomaly detection against a multi-dimensional dataset. The current k-means implementation in Spark is designed for clustering, not specifically for anomaly detection. Once a model is trained and pipeline is