Hi, I've run into some poor RF behavior as well, although not as pronounced as yours. It would be great to get more insight into this one.
Thanks!

On Mon, Aug 3, 2015 at 8:21 AM pkphlam <pkph...@gmail.com> wrote:
> Hi,
>
> This might be a long shot, but has anybody run into very poor predictive
> performance using RandomForest with MLlib? Here is what I'm doing:
>
> - Spark 1.4.1 with PySpark
> - Python 3.4.2
> - ~30,000 Tweets of text
> - 12289 1s and 15956 0s
> - Whitespace tokenization and then hashing trick for feature selection
>   using 10,000 features
> - Run RF with 100 trees and maxDepth of 4 and then predict using the
>   features from all the 1s observations.
>
> So in theory, I should get predictions of close to 12289 1s (especially if
> the model overfits). But I'm getting exactly 0 1s, which sounds ludicrous
> to me and makes me suspect something is wrong with my code or I'm missing
> something. I notice similar behavior (although not as extreme) if I play
> around with the settings. But I'm getting normal behavior with other
> classifiers, so I don't think it's my setup that's the problem.
>
> For example:
>
> >>> lrm = LogisticRegressionWithSGD.train(lp, iterations=10)
> >>> logit_predict = lrm.predict(predict_feat)
> >>> logit_predict.sum()
> 9077
>
> >>> nb = NaiveBayes.train(lp)
> >>> nb_predict = nb.predict(predict_feat)
> >>> nb_predict.sum()
> 10287.0
>
> >>> rf = RandomForest.trainClassifier(lp, numClasses=2,
> ...     categoricalFeaturesInfo={}, numTrees=100, seed=422)
> >>> rf_predict = rf.predict(predict_feat)
> >>> rf_predict.sum()
> 0.0
>
> This code was all run back to back so I didn't change anything in between.
> Does anybody have a possible explanation for this?
>
> Thanks!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Extremely-poor-predictive-performance-with-RF-in-mllib-tp24112.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
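To make the feature step in the quoted setup concrete: with whitespace tokenization plus the hashing trick, each token is mapped to one of 10,000 buckets, so a short tweet only touches a handful of dimensions. Below is a minimal pure-Python sketch of that idea. Assumptions: CRC32 is used only because it is deterministic across processes (MLlib's HashingTF may use a different hash), the sample text is made up, and `tokenize`, `bucket`, and `hashed_tf` are hypothetical helpers, not MLlib APIs.

```python
import zlib
from collections import Counter

NUM_FEATURES = 10000  # bucket count from the quoted post


def tokenize(text):
    """Whitespace tokenization: split on runs of whitespace."""
    return text.split()


def bucket(term, num_features=NUM_FEATURES):
    # Assumption: CRC32 stands in for the real hash function; it is NOT
    # necessarily what MLlib's HashingTF uses.
    return zlib.crc32(term.encode("utf-8")) % num_features


def hashed_tf(text, num_features=NUM_FEATURES):
    """Map a document to a sparse {feature_index: term_count} dict."""
    return dict(Counter(bucket(t, num_features) for t in tokenize(text)))


# Made-up sample tweet: 4 tokens, 3 distinct terms.
vec = hashed_tf("great game tonight great")
print(vec)
print(f"{len(vec)} non-zero entries out of {NUM_FEATURES}")
```

Distinct terms can collide in the same bucket, so the number of non-zero entries is at most the number of distinct tokens, and almost every one of the 10,000 columns is zero for any single tweet.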
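For a sense of scale with the quoted RF settings: MLlib's docs define depth 0 as a single leaf, so a tree with maxDepth 4 has at most 16 leaves and 15 split nodes. When numTrees > 1, the default featureSubsetStrategy "auto" is documented to mean "sqrt" for classification, i.e. about sqrt(10,000) = 100 candidate features per node. The arithmetic below is only a capacity bound under those assumptions, not a diagnosis of the all-0 result; the helper name `tree_capacity` is hypothetical.

```python
import math

# Settings from the quoted post; the "sqrt" subset strategy is an assumption
# based on MLlib's documented "auto" default for forests (numTrees > 1).
NUM_FEATURES = 10000
NUM_TREES = 100
MAX_DEPTH = 4  # MLlib convention: depth 0 = a single leaf


def tree_capacity(max_depth):
    """Upper bounds (leaves, split nodes) for one binary tree of this depth."""
    max_leaves = 2 ** max_depth
    max_split_nodes = max_leaves - 1  # each split node tests one feature
    return max_leaves, max_split_nodes


leaves, splits = tree_capacity(MAX_DEPTH)
features_per_path = MAX_DEPTH                     # features tested on any root-to-leaf path
max_split_nodes_forest = NUM_TREES * splits       # upper bound on distinct features the forest can use
candidates_per_node = math.ceil(math.sqrt(NUM_FEATURES))

print(f"per tree: <= {leaves} leaves, <= {splits} split nodes")
print(f"per prediction path: <= {features_per_path} features tested")
print(f"whole forest: <= {max_split_nodes_forest} of {NUM_FEATURES} features used")
print(f"candidate features per node: ~{candidates_per_node}")
```

So with these settings the forest can consult at most 15% of the hashed columns, and each individual prediction looks at no more than 4 of them.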
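Since every row of predict_feat in the quoted post is a true 1, each reported sum divided by 12289 is the recall on the 1s. The same counts also show that 0 is the majority class (15956 vs. 12289), so a constant always-0 predictor would reproduce RF's 0.0 exactly. A plain-Python sketch using only the counts reported above; the names `predicted_ones` and `recall` are made up for illustration.

```python
# Counts reported in the quoted post.
NUM_POSITIVE = 12289
NUM_NEGATIVE = 15956

# Sum of 0/1 predictions over the positive rows = number predicted as 1.
predicted_ones = {
    "LogisticRegressionWithSGD": 9077,
    "NaiveBayes": 10287,
    "RandomForest": 0,
}


def recall(pred_ones, num_positive):
    """Fraction of the true 1s that the model labelled 1."""
    return pred_ones / num_positive


for name, ones in predicted_ones.items():
    print(f"{name}: recall on the 1s = {recall(ones, NUM_POSITIVE):.3f}")

positive_share = NUM_POSITIVE / (NUM_POSITIVE + NUM_NEGATIVE)
majority_class = 0 if NUM_NEGATIVE > NUM_POSITIVE else 1
# An always-majority baseline run on the positive rows predicts this many 1s.
baseline_ones = NUM_POSITIVE if majority_class == 1 else 0

print(f"share of 1s in the data: {positive_share:.3f}")
print(f"always-{majority_class} baseline predicts {baseline_ones} ones")
```

LR and NB land at roughly 0.74 and 0.84 recall, while RF's output is indistinguishable from the majority-class baseline, even though the classes are only mildly imbalanced (about 43.5% 1s).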
--
-Barak