[Scikit-learn-general] Unpredictability of GradientBoosting

Debanjan Bhattacharyya Tue, 16 Sep 2014 03:09:08 -0700

Hi, I recently participated in the ATLAS Higgs Boson Machine Learning
Challenge.


One of the models I tried was GradientBoostingClassifier. I found it
extremely non-deterministic.
For example, if I use

from sklearn.ensemble import GradientBoostingClassifier

est = GradientBoostingClassifier(n_estimators=100, max_depth=10,
                                 min_samples_leaf=20, max_features=6, verbose=1)

and train several times on the same (full) training set, I end up with
models that differ significantly in size (I mean the pickle output) and
that predict differently on the same instance. The difference is on the
scale of 20 to 30% (I have seen values varying between 0.4x and 0.7x for
the same instance). Even the ordering of the top 20 features (out of 30)
differs quite significantly from model to model.
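
A minimal sketch of how this comparison could be reproduced and narrowed
down. One likely contributor is that max_features=6 is smaller than the 30
available features, so each split considers a random subset of features, and
without a fixed random_state two fits can grow different trees. The data below
is a synthetic stand-in from make_classification (an assumption, not the Higgs
set), and n_estimators and max_depth are reduced only to keep the run short;
fit_and_predict is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the 30-feature training set (assumption).
X, y = make_classification(n_samples=500, n_features=30,
                           n_informative=10, random_state=0)

def fit_and_predict(seed):
    """Fit on the full data with a given seed; return P(class 1) per row."""
    est = GradientBoostingClassifier(n_estimators=20, max_depth=3,
                                     min_samples_leaf=20, max_features=6,
                                     random_state=seed)
    est.fit(X, y)
    return est.predict_proba(X)[:, 1]

p_a = fit_and_predict(seed=0)
p_b = fit_and_predict(seed=0)   # same seed as p_a
p_c = fit_and_predict(seed=1)   # different seed

# Same seed: the two fits should agree on every instance.
same_seed_identical = bool(np.allclose(p_a, p_b))
# Different seed: largest per-instance gap in predicted probability.
max_diff_across_seeds = float(np.abs(p_a - p_c).max())

print("same seed identical:", same_seed_identical)
print("max difference across seeds:", max_diff_across_seeds)
```

If the runs agree once random_state is fixed, the variation between models
comes from the random feature subsampling rather than from the data.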

Can someone explain this variability in a bit more detail?

The training data set can be downloaded from here:
https://www.kaggle.com/c/higgs-boson/data


Thanks

Regards


