yuhao yang created SPARK-12875:
----------------------------------

             Summary: Add Weight of Evidence and Information value to Spark.ml 
as a feature transformer
                 Key: SPARK-12875
                 URL: https://issues.apache.org/jira/browse/SPARK-12875
             Project: Spark
          Issue Type: New Feature
          Components: ML
            Reporter: yuhao yang
            Priority: Minor


As a feature transformer, WOE and IV enable one to:

- Consider each variable’s independent contribution to the outcome.
- Detect linear and non-linear relationships.
- Rank variables in terms of "univariate" predictive strength.
- Visualize the correlations between the predictive variables and the binary 
  outcome.

http://multithreaded.stitchfix.com/blog/2015/08/13/weight-of-evidence/ gives a 
good introduction to WoE and IV.

The Weight of Evidence (WoE) value measures how well a grouping of a feature's 
values distinguishes between the two classes of a binary response (e.g. "good" 
versus "bad"). It is widely used for grouping continuous features and for 
mapping categorical features to continuous values. It is computed from the 
basic odds ratio:
(Distribution of positive outcomes) / (Distribution of negative outcomes)
where "Distribution" refers to the proportion of positive or negative outcomes 
in the respective group, relative to the column totals. WoE is conventionally 
taken as the natural logarithm of this ratio.
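
As a rough illustration of the ratio above, here is a minimal sketch in plain 
Python (not the proposed Spark.ml API). The function name woe_by_category and 
its arguments are hypothetical, and it assumes the common convention that WoE 
is the natural logarithm of the ratio and that every category contains at 
least one positive and one negative outcome.

```python
import math
from collections import Counter


def woe_by_category(categories, labels):
    """Return {category: WoE} for a categorical feature with 0/1 labels.

    Assumption: WoE = ln(distr_pos / distr_neg), where each distribution is
    the category's share of all positive (or all negative) outcomes.
    """
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    woe = {}
    for c in set(categories):
        if pos[c] == 0 or neg[c] == 0:
            # The log of the ratio is undefined here; real implementations
            # usually smooth or merge such groups.
            raise ValueError(f"category {c!r} lacks positive or negative outcomes")
        distr_pos = pos[c] / total_pos  # share of all positives in this group
        distr_neg = neg[c] / total_neg  # share of all negatives in this group
        woe[c] = math.log(distr_pos / distr_neg)
    return woe


# Toy data: "a" holds 2 of 3 positives and 1 of 3 negatives.
cats = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 0, 1, 0, 0]
print(woe_by_category(cats, labels))
```

A positive WoE means the group holds a larger share of the positive outcomes 
than of the negative ones; a negative WoE means the opposite.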

The WoE recoding of features is particularly well suited for subsequent 
modeling using Logistic Regression or MLP.

In addition, the information value (IV) can be computed from the WoE values; 
it is a popular criterion for selecting variables in a predictive model.
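
For illustration only, the sketch below computes IV with the commonly used 
definition IV = sum over groups of (distr_pos - distr_neg) * WoE, taking WoE as 
the natural logarithm of distr_pos / distr_neg. The function name 
information_value is hypothetical and is not part of the proposed Spark.ml 
API; groups missing either outcome are rejected rather than smoothed.

```python
import math
from collections import Counter


def information_value(categories, labels):
    """Return the information value of a categorical feature with 0/1 labels.

    Assumption: IV = sum((distr_pos - distr_neg) * ln(distr_pos / distr_neg))
    over all categories.
    """
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())

    iv = 0.0
    for c in set(categories):
        if pos[c] == 0 or neg[c] == 0:
            raise ValueError(f"category {c!r} lacks positive or negative outcomes")
        distr_pos = pos[c] / total_pos
        distr_neg = neg[c] / total_neg
        # Each term is non-negative: the difference and the log share a sign.
        iv += (distr_pos - distr_neg) * math.log(distr_pos / distr_neg)
    return iv


cats = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 0, 1, 0, 0]
print(information_value(cats, labels))
```

Features can then be ranked by IV, with higher values indicating stronger 
univariate separation between the two outcome classes.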

TODO: Currently only the calculation for categorical features is supported. 
Add an estimator that determines the proper grouping for continuous features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

