Hi everyone, 
I have two questions regarding the random forest implementation in mllib
1- maxBins: Say the values of a feature lie in [0, 100]. In my dataset there 
are many data points in [0, 10], one data point at 100, and nothing in 
(10, 100). I am wondering how the binning works in this case. I obviously 
don't want all the points in [0, 10] to fall into the same bin while the 
other bins stay empty. Does MLlib do any smart reallocation of bins so that 
each bin gets some data points, rather than one bin getting all of them?
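
To make the concern concrete, here is a toy illustration (plain Python, not MLlib's actual implementation) of the difference between equal-width binning, which would waste bins on the empty (10, 100) range, and quantile-based binning, which places boundaries where the data actually is:

```python
# Hypothetical sketch, NOT MLlib code: contrasting equal-width and
# quantile-based bin boundaries on data shaped like my dataset.

def equal_width_bins(values, num_bins):
    """Boundaries that split [min, max] into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + width * i for i in range(1, num_bins)]

def quantile_bins(values, num_bins):
    """Boundaries taken at quantiles, so each bin holds ~equal counts."""
    s = sorted(values)
    return [s[int(len(s) * i / num_bins)] for i in range(1, num_bins)]

# Dense points in [0, 10), plus a single outlier at 100.
data = [x / 10.0 for x in range(100)] + [100.0]

print(equal_width_bins(data, 4))  # -> [25.0, 50.0, 75.0]: three empty bins
print(quantile_bins(data, 4))     # -> [2.5, 5.0, 7.5]: boundaries in [0, 10)
```

With equal-width bins, every point except the outlier lands in the first bin; with quantile-based boundaries, each bin receives roughly a quarter of the data. My question is whether MLlib's maxBins behaves more like the latter.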
2- Is there any way to do the following in Spark?
http://stats.stackexchange.com/questions/165062/incorporating-the-confidence-in-the-training-data-into-the-ml-model
Thanks a lot,
Mark
