[ https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347138#comment-15347138 ]
Seth Hendrickson commented on SPARK-9478:
-----------------------------------------

[~mengxr] Thanks for your feedback. Originally I did not change the sampling semantics, but after some thought, applying the sample weights only after bagging does not seem entirely correct. I checked scikit-learn, and it does not use weighted sampling (it applies the weights after taking uniform samples). Still, I think we should implement weighted sampling, assuming it can fit into the current Spark abstractions.

From my understanding, it is reasonable to use the Poisson distribution as an approximation to multinomial sampling. Currently, we approximate binomial sampling using a Poisson sampler with constant mean. To implement weighted sampling with replacement, we can use a Poisson sampler with a mean parameter proportional to the sample weight. Is that correct? We could use the {{RandomDataGenerator}} class in StratifiedSamplingUtils, which maintains a cache of Poisson sampling functions. I am not an expert in sampling algorithms, so I would really appreciate your thoughts on this.

> Add class weights to Random Forest
> ----------------------------------
>
>                 Key: SPARK-9478
>                 URL: https://issues.apache.org/jira/browse/SPARK-9478
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 1.4.1
>            Reporter: Patrick Crenshaw
>
> Currently, this implementation of random forest does not support class weights. Class weights are important when there is imbalanced training data or the evaluation metric of a classifier is imbalanced (e.g. true positive rate at some false positive threshold).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
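The Poisson-based weighted bagging discussed in the comment can be sketched as follows. Drawing n instances with replacement from n gives each instance a Binomial(n, 1/n) count, which approaches Poisson(1) as n grows, so each instance's count can be drawn independently. This minimal Python sketch assumes the weighted variant scales each instance's Poisson mean by its weight divided by the mean weight, so expected counts are proportional to the weights and the expected bag size matches the unweighted case; that normalization is an assumption, not something the comment or Spark specifies. The helpers poisson_sample and weighted_poisson_bag are hypothetical, and the sampler uses Knuth's multiplication method from the Python standard library rather than RandomDataGenerator or any other Spark API.

```python
import math
import random


def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Adequate for the small means used in bagging (around 1); the loop
    runs about lam + 1 times on average.
    """
    if lam <= 0.0:
        return 0
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    # Multiply uniforms until the running product drops to exp(-lam);
    # the number of extra multiplications is Poisson(lam)-distributed.
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def weighted_poisson_bag(weights, subsampling_rate=1.0, seed=42):
    """Hypothetical helper: per-instance bootstrap counts for one tree.

    Assumption: instance i gets count ~ Poisson(subsampling_rate * w_i / mean(w)),
    so expected counts are proportional to the sample weights and the
    expected bag size stays subsampling_rate * n. With uniform weights this
    reduces to a constant-mean Poisson(subsampling_rate) sampler.
    """
    n = len(weights)
    if n == 0:
        return []
    mean_w = sum(weights) / n
    if mean_w <= 0.0:
        raise ValueError("weights must have a positive mean")
    rng = random.Random(seed)
    return [poisson_sample(rng, subsampling_rate * w / mean_w) for w in weights]
```

With uniform weights every mean equals subsampling_rate, which matches the constant-mean Poisson sampler described in the comment; with weights [1.0, 3.0] the means become 0.5 and 1.5, so the heavier instance is expected to appear three times as often in each bag.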