[ 
https://issues.apache.org/jira/browse/SPARK-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301072#comment-16301072
 ] 

Fangzhou Yang commented on SPARK-22867:
---------------------------------------

MLlib currently has no dedicated algorithm for novelty or outlier detection. 
Isolation Forest is a popular and effective outlier-detection algorithm that 
is already included in scikit-learn, so I think it would be practical to 
provide it in Spark MLlib as well. 

> Add Isolation Forest algorithm to MLlib
> ---------------------------------------
>
>                 Key: SPARK-22867
>                 URL: https://issues.apache.org/jira/browse/SPARK-22867
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Fangzhou Yang
>
> Isolation Forest (iForest) is an effective model that focuses on anomaly 
> isolation. 
> iForest models the data with tree structures: each iTree isolates anomalies 
> closer to the root of the tree than normal points. 
> The iForest model calculates an anomaly score to measure the abnormality of 
> each data instance. The lower the score, the more abnormal the instance.
> More details about iForest can be found in the following papers: 
> Isolation Forest [1] (https://dl.acm.org/citation.cfm?id=1511387) 
> and Isolation-Based Anomaly Detection [2] 
> (https://dl.acm.org/citation.cfm?id=2133363).
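
The isolation mechanism and score described in the issue can be sketched in plain Python. This is only an illustration under stated assumptions, not the proposed MLlib implementation: the names `build_tree`, `path_length`, `fit_forest` and `anomaly_score` are hypothetical, the defaults (100 trees, subsample size 256, height limit ceil(log2(subsample size))) follow the iForest paper, and the paper's score is negated here so that lower means more abnormal, as the issue text states.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful BST search over n points
    (the normalisation term used in the iForest paper)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x lands, plus c(size) for an unsplit leaf."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    child = left if x[dim] < split else right
    return path_length(x, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees iTrees, each on a random subsample of the data."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, forest):
    """Negated paper score, in [-1, 0): lower means more abnormal.
    (Sign convention is an assumption chosen to match the issue text.)"""
    trees, psi = forest
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return -(2.0 ** (-mean_path / c(psi)))


# Usage: a Gaussian cluster plus one far-away point.
demo_rng = random.Random(1)
cluster = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(200)]
forest = fit_forest(cluster + [(10.0, 10.0)], n_trees=50, seed=3)
print("outlier:", anomaly_score((10.0, 10.0), forest))
print("inlier: ", anomaly_score((0.0, 0.0), forest))
```

Points that are isolated after only a few random splits have a short average path length, which pushes the score towards -1. Points deep inside a dense region need many splits and score closer to 0.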



