[ https://issues.apache.org/jira/browse/SPARK-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301072#comment-16301072 ]
Fangzhou Yang commented on SPARK-22867:
---------------------------------------

Currently, MLlib has no dedicated algorithm for novelty or outlier detection. Isolation Forest is a popular and effective algorithm for outlier detection, and it is already included in scikit-learn. I therefore think it would be good and practical to provide it in Spark MLlib.

> Add Isolation Forest algorithm to MLlib
> ---------------------------------------
>
>                 Key: SPARK-22867
>                 URL: https://issues.apache.org/jira/browse/SPARK-22867
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Fangzhou Yang
>
> Isolation Forest (iForest) is an effective model that focuses on anomaly
> isolation.
> iForest uses a tree structure to model the data; each iTree isolates anomalies
> closer to the root of the tree than normal points.
> An anomaly score is calculated by the iForest model to measure the abnormality
> of each data instance. The lower the score, the more abnormal the instance.
> More details about iForest can be found in the following papers:
> Isolation Forest [1] (https://dl.acm.org/citation.cfm?id=1511387)
> and Isolation-Based Anomaly Detection [2]
> (https://dl.acm.org/citation.cfm?id=2133363).
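
To make the description of iTrees and the anomaly score concrete, here is a minimal pure-Python sketch of the isolation idea. It is only an illustration under stated assumptions, not a proposed MLlib implementation. It assumes the path-length normalization c(n) and the depth limit from the Isolation Forest paper [1], and it negates the paper's score so that lower means more abnormal, matching the convention stated in the issue. All function names (build_tree, path_length, fit, score) and parameter defaults are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Build one iTree by splitting on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # this feature cannot separate the points; stop here
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for the unsplit remainder."""
    if "size" in tree:
        return depth + c(tree["size"])
    child = tree["left"] if x[tree["feature"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


def fit(data, n_trees=100, sample_size=256, seed=0):
    """Build a forest of iTrees on random subsamples (assumes len(data) >= 2)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    forest = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return forest, sample_size


def score(forest, sample_size, x):
    """Negated paper score: values near -1 are abnormal, values near 0 are normal."""
    mean_path = sum(path_length(tree, x) for tree in forest) / len(forest)
    return -(2.0 ** (-mean_path / c(sample_size)))


# Toy data: a Gaussian cluster around the origin plus one far-away point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
data.append((10.0, 10.0))

forest, sample_size = fit(data)
outlier_score = score(forest, sample_size, (10.0, 10.0))
inlier_score = score(forest, sample_size, (0.0, 0.0))
print(outlier_score, inlier_score)
```

The far-away point is separated from the cluster after very few random splits, so its average path length is short and its negated score is lower than that of a point in the middle of the cluster. A Spark implementation would additionally need to distribute tree construction and scoring, which this sketch does not attempt.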