Hi,
I've run into some poor RF behavior as well, although not as pronounced as yours.
It would be great to get more insight into this one.

Thanks!

On Mon, Aug 3, 2015 at 8:21 AM pkphlam <pkph...@gmail.com> wrote:

> Hi,
>
> This might be a long shot, but has anybody run into very poor predictive
> performance using RandomForest with Mllib? Here is what I'm doing:
>
> - Spark 1.4.1 with PySpark
> - Python 3.4.2
> - ~30,000 Tweets of text
> - 12289 1s and 15956 0s
> - Whitespace tokenization and then hashing trick for feature selection
> using 10,000 features
> - Run RF with 100 trees and maxDepth of 4 and then predict using the
> features from all the 1s observations.
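
For readers unfamiliar with the hashing trick mentioned in the list above, here is a minimal pure-Python sketch. It is an illustration only: `hash_features` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it is deterministic, so the indices will not match whatever MLlib's own hashing produces.

```python
import zlib

NUM_FEATURES = 10000  # feature count quoted in the post


def hash_features(text, num_features=NUM_FEATURES):
    """Map whitespace-separated tokens to a sparse {index: count} dict.

    Hypothetical helper for illustration; crc32 is an assumed stand-in
    for the hash function a real feature hasher would use.
    """
    counts = {}
    for token in text.split():
        # Each token lands in one of num_features buckets.
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts


vec = hash_features("spark mllib random forest spark")
```

A short tweet touches only a handful of the 10,000 buckets, so every row in the training set is extremely sparse.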
>
> So in theory, I should get predictions of close to 12289 1s (especially if
> the model overfits). But I'm getting exactly 0 1s, which sounds ludicrous
> to me and makes me suspect something is wrong with my code or I'm missing
> something. I notice similar behavior (although not as extreme) if I play
> around with the settings. But I'm getting normal behavior with other
> classifiers, so I don't think it's my setup that's the problem.
>
> For example:
>
> >>> lrm = LogisticRegressionWithSGD.train(lp, iterations=10)
> >>> logit_predict = lrm.predict(predict_feat)
> >>> logit_predict.sum()
> 9077
>
> >>> nb = NaiveBayes.train(lp)
> >>> nb_predict = nb.predict(predict_feat)
> >>> nb_predict.sum()
> 10287.0
>
> >>> rf = RandomForest.trainClassifier(lp, numClasses=2,
> ...     categoricalFeaturesInfo={}, numTrees=100, seed=422)
> >>> rf_predict = rf.predict(predict_feat)
> >>> rf_predict.sum()
> 0.0
>
> This code was all run back to back so I didn't change anything in between.
> Does anybody have a possible explanation for this?
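
One hypothesis worth checking, which nothing in this thread confirms: with sparse hashed features, a single split isolates only a small group of rows, and a depth-4 tree has at most 15 splits, so most rows may end up in leaves that still vote for the majority class (0). The sketch below uses the class counts from the post; the token counts (a token present in 300 tweets, 200 of them labeled 1) are made-up numbers for illustration.

```python
def gini(pos, neg):
    """Gini impurity of a node holding pos positives and neg negatives."""
    total = pos + neg
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def majority_label(pos, neg):
    """Label a leaf would predict by majority vote (ties go to 0)."""
    return 1 if pos > neg else 0


# Class counts from the post.
POS, NEG = 12289, 15956

# Hypothetical sparse feature: token present in 300 tweets, 200 labeled 1.
with_pos, with_neg = 200, 100
without_pos, without_neg = POS - with_pos, NEG - with_neg

n = POS + NEG
parent = gini(POS, NEG)
children = (
    (with_pos + with_neg) / n * gini(with_pos, with_neg)
    + (without_pos + without_neg) / n * gini(without_pos, without_neg)
)
gain = parent - children  # impurity reduction from this one split
rest_label = majority_label(without_pos, without_neg)
```

Under these assumed counts the split gains very little, and the large "token absent" side still predicts 0. If that is what is happening, a larger maxDepth should change the picture, which would be a cheap thing to try.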
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Extremely-poor-predictive-performance-with-RF-in-mllib-tp24112.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
*-Barak*
