[ https://issues.apache.org/jira/browse/SPARK-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-12875:
------------------------------------

    Assignee:     (was: Apache Spark)

> Add Weight of Evidence and Information value to Spark.ml as a feature transformer
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-12875
>                 URL: https://issues.apache.org/jira/browse/SPARK-12875
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: yuhao yang
>            Priority: Minor
>
> As a feature transformer, WoE and IV enable one to:
> - Consider each variable’s independent contribution to the outcome.
> - Detect linear and non-linear relationships.
> - Rank variables in terms of "univariate" predictive strength.
> - Visualize the correlations between the predictive variables and the binary outcome.
>
> http://multithreaded.stitchfix.com/blog/2015/08/13/weight-of-evidence/ gives
> a good introduction to WoE and IV.
>
> The Weight of Evidence (WoE) value measures how well a grouping of a feature
> distinguishes between the two classes of a binary response (e.g. "good"
> versus "bad"). It is widely used for grouping continuous features or for
> mapping categorical features to continuous values. It is the natural
> logarithm of the basic odds ratio:
>
>     WoE = ln( (Distr of positive outcomes) / (Distr of negative outcomes) )
>
> where Distr is the proportion of positive (or negative) outcomes that fall in
> the respective group, relative to the column total of positives (or negatives).
>
> The WoE recoding of features is particularly well suited for subsequent
> modeling with Logistic Regression or an MLP.
>
> In addition, the information value (IV), computed from WoE, is a popular
> criterion for selecting variables in a predictive model.
>
> TODO: Currently only categorical features are supported. Add an estimator
> that finds a suitable grouping for continuous features.
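For illustration only, here is a minimal plain-Python sketch of the WoE and IV computation for one categorical feature against a 0/1 label. It is not Spark.ml code and assumes nothing about how the proposed transformer would be implemented. It follows the ticket's convention of putting the positive-outcome distribution in the numerator, and uses the common definition IV = sum over groups of (Distr_pos - Distr_neg) * WoE. The function name `woe_iv` and its `smoothing` parameter are hypothetical; the additive smoothing is an assumed guard against groups with zero positives or zero negatives, not something the ticket specifies.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def woe_iv(
    categories: Sequence[Hashable],
    labels: Sequence[int],
    smoothing: float = 0.5,
) -> Tuple[Dict[Hashable, float], float]:
    """Return (WoE per category, IV) for a categorical feature and a 0/1 label.

    Hypothetical helper. `smoothing` is an assumed additive correction applied
    to every group's positive and negative counts so that a group with no
    positives or no negatives does not produce log(0) or a division by zero.
    """
    if len(categories) != len(labels):
        raise ValueError("categories and labels must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be binary (0 or 1)")

    # Count positive and negative outcomes per group (category value).
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    groups = sorted(set(categories), key=repr)
    k = len(groups)

    # Column totals, including the smoothing added to each group.
    total_pos = sum(pos.values()) + smoothing * k
    total_neg = sum(neg.values()) + smoothing * k

    woe: Dict[Hashable, float] = {}
    iv = 0.0
    for g in groups:
        # Distr: share of all positives (negatives) that fall in this group.
        dist_pos = (pos[g] + smoothing) / total_pos
        dist_neg = (neg[g] + smoothing) / total_neg
        woe[g] = math.log(dist_pos / dist_neg)
        # IV accumulates the distribution gap weighted by the group's WoE.
        iv += (dist_pos - dist_neg) * woe[g]
    return woe, iv


if __name__ == "__main__":
    feature = ["a", "a", "a", "b", "b", "b", "b"]
    label = [1, 1, 0, 1, 0, 0, 0]
    woe, iv = woe_iv(feature, label, smoothing=0.0)
    print(woe, iv)
```

With `smoothing=0.0` the result is the raw definition: group "a" holds 2/3 of the positives and 1/4 of the negatives, so its WoE is ln(8/3) > 0, while group "b" gets a negative WoE. Each term of the IV sum is non-negative because (Distr_pos - Distr_neg) and its logarithmic ratio always share a sign, which is why IV can be used to rank variables.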
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)