[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347138#comment-15347138
 ] 

Seth Hendrickson commented on SPARK-9478:
-----------------------------------------

[~mengxr] Thanks for your feedback. Originally I did not implement a change to 
the sampling semantics, but after some thought it does not seem entirely 
correct to apply the sample weights only after bagging. I checked 
scikit-learn, and it does not use weighted sampling (it applies the weights 
after taking uniform samples), but I think we should implement weighted 
sampling, assuming it can fit into the current Spark abstractions.

From my understanding, it is reasonable to use the Poisson distribution as an 
approximation to multinomial sampling. Currently, we approximate binomial 
sampling using a Poisson sampler with constant mean. To implement weighted 
sampling with replacement, we can use a Poisson sampler with mean parameter 
proportional to the sample weight - is that correct? We could use the 
{{RandomDataGenerator}} class in StratifiedSamplingUtils, which maintains a 
cache of Poisson sampling functions. I am not an expert in sampling algorithms, 
so I would really appreciate your thoughts on this. 
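
To make the idea concrete, here is a minimal standalone Python sketch of 
weighted sampling with replacement via per-row Poisson counts. It is not Spark 
code and does not use {{RandomDataGenerator}}; the names poisson_draw, 
weighted_bootstrap_counts and subsample_rate are hypothetical, and normalizing 
by the mean weight (so that uniform weights fall back to a constant-mean 
Poisson sampler) is my assumption rather than a settled design.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw one Poisson(mean) variate using Knuth's multiplication method.

    Adequate for the small means typical of bagging; slow for large means.
    """
    if mean <= 0.0:
        return 0
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    # Multiply uniforms until the running product drops to exp(-mean) or below.
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def weighted_bootstrap_counts(weights, subsample_rate=1.0, seed=0):
    """Return how many times each row appears in one bagged sample.

    Row i gets a Poisson count with mean subsample_rate * w_i / mean(w).
    ASSUMPTION: dividing by the mean weight is an illustrative choice that
    makes uniform weights reduce to Poisson(subsample_rate) for every row.
    """
    total = sum(weights)
    if not weights or total <= 0.0:
        raise ValueError("weights must be non-empty with a positive sum")
    rng = random.Random(seed)
    mean_weight = total / len(weights)
    return [poisson_draw(subsample_rate * w / mean_weight, rng) for w in weights]
```

With weights such as [1.0, 1.0, 3.0], the third row has an expected count three 
times that of the others, while the expected total sample size stays at 
subsample_rate times the number of rows.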

> Add class weights to Random Forest
> ----------------------------------
>
>                 Key: SPARK-9478
>                 URL: https://issues.apache.org/jira/browse/SPARK-9478
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 1.4.1
>            Reporter: Patrick Crenshaw
>
> Currently, this implementation of random forest does not support class 
> weights. Class weights are important when there is imbalanced training data 
> or the evaluation metric of a classifier is imbalanced (e.g. true positive 
> rate at some false positive threshold). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
