Hi everyone, 
I have two questions regarding the random forest implementation in mllib
1- maxBins: Say the values of a feature lie in [0, 100]. In my dataset there 
are many data points in [0, 10], one data point at 100, and nothing in 
(10, 100). I am wondering how the binning works in this case. I obviously 
don't want all the points in [0, 10] to fall into the same bin while the 
other bins stay empty. Does MLlib do any smart reallocation of bins so that 
each bin gets some data points, rather than one bin getting all of them?
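
To make the concern concrete, here is a toy illustration (plain Python, not MLlib's actual implementation) of the difference between equal-width binning, which would waste bins on the empty (10, 100) range, and quantile-based binning, which places boundaries where the data actually is:

```python
# Hypothetical sketch, NOT MLlib code: contrasting equal-width and
# quantile-based bin boundaries on data shaped like my dataset.

def equal_width_bins(values, num_bins):
    """Boundaries that split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + width * i for i in range(1, num_bins)]

def quantile_bins(values, num_bins):
    """Boundaries taken at quantiles, so each bin holds ~equal counts."""
    s = sorted(values)
    return [s[int(len(s) * i / num_bins)] for i in range(1, num_bins)]

# Dense points in [0, 10), plus a single outlier at 100.
data = [x / 10.0 for x in range(100)] + [100.0]

print(equal_width_bins(data, 4))  # -> [25.0, 50.0, 75.0]: three empty bins
print(quantile_bins(data, 4))     # -> [2.5, 5.0, 7.5]: boundaries in [0, 10)
```

With equal-width bins, every point except the outlier lands in the first bin; with quantile-based boundaries, each bin receives roughly a quarter of the data. My question is whether MLlib's maxBins behaves more like the latter.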
2- Is there any way to do the following in Spark?
http://stats.stackexchange.com/questions/165062/incorporating-the-confidence-in-the-training-data-into-the-ml-model
Thanks a lot,
Mark
