yuhao yang created SPARK-12875:
----------------------------------

             Summary: Add Weight of Evidence and Information value to Spark.ml as a feature transformer
                 Key: SPARK-12875
                 URL: https://issues.apache.org/jira/browse/SPARK-12875
             Project: Spark
          Issue Type: New Feature
          Components: ML
            Reporter: yuhao yang
            Priority: Minor

As a feature transformer, WoE and IV enable one to:

- Consider each variable’s independent contribution to the outcome.
- Detect linear and non-linear relationships.
- Rank variables in terms of "univariate" predictive strength.
- Visualize the correlations between the predictive variables and the binary outcome.

http://multithreaded.stitchfix.com/blog/2015/08/13/weight-of-evidence/ gives a good introduction to WoE and IV.

The Weight of Evidence (WoE) value measures how well a grouping of a feature's values distinguishes between the two classes of a binary response (e.g. "good" versus "bad"). It is widely used for grouping continuous features and for mapping categorical features to continuous values. It is computed from the basic odds ratio:

(Distribution of positive Outcomes) / (Distribution of negative Outcomes)

where Distribution refers to the proportion of positive or negative outcomes in the respective group, relative to the column totals. WoE is the natural logarithm of this ratio.

The WoE recoding of features is particularly well suited for subsequent modeling using Logistic Regression or MLP.

In addition, the Information Value (IV) can be computed from WoE; it is a popular technique for selecting variables in a predictive model.

TODO: Currently the calculation is supported only for categorical features. Add an estimator to estimate the proper grouping for continuous features.
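
To make the calculation described in this issue concrete, here is a minimal sketch in plain Python. It is not the proposed Spark.ml API: `woe_iv` is a hypothetical helper name. It assumes the usual convention in which WoE is the natural log of the ratio of distributions and IV is the sum over groups of (Distr positive - Distr negative) * WoE. Raising an error for groups with no positives or no negatives is also an assumption of this sketch; a real transformer would need its own policy for that case.

```python
import math
from collections import Counter


def woe_iv(categories, labels):
    """Return ({category: WoE}, IV) for one categorical feature.

    `labels` holds the binary outcome, with 1 meaning positive and
    anything else meaning negative. Hypothetical helper, for illustration only.
    """
    pos, neg = Counter(), Counter()
    for category, label in zip(categories, labels):
        if label == 1:
            pos[category] += 1
        else:
            neg[category] += 1

    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    woe = {}
    iv = 0.0
    for category in sorted(set(pos) | set(neg)):
        if pos[category] == 0 or neg[category] == 0:
            # Assumption of this sketch: refuse groups where the log ratio is undefined.
            raise ValueError(f"group {category!r} lacks positives or negatives")
        # Proportion of all positives (negatives) that fall in this group.
        distr_pos = pos[category] / total_pos
        distr_neg = neg[category] / total_neg
        woe[category] = math.log(distr_pos / distr_neg)
        iv += (distr_pos - distr_neg) * woe[category]
    return woe, iv


# Group "a" holds 3 of 4 positives and 1 of 4 negatives,
# so WoE(a) = ln(0.75 / 0.25) = ln 3.
example_woe, example_iv = woe_iv(
    ["a", "a", "a", "a", "b", "b", "b", "b"],
    [1, 1, 1, 0, 1, 0, 0, 0],
)
```

Replacing each category with its WoE value gives the continuous recoding mentioned above, and comparing IV across features gives the univariate ranking.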