Hi, I recently participated in the ATLAS Higgs Boson Machine Learning
Challenge.
One of the models I tried was GradientBoostingClassifier, and I found it
extremely non-deterministic.
If I use
    from sklearn.ensemble import GradientBoostingClassifier
    est = GradientBoostingClassifier(n_estimators=100, max_depth=10,
                                     min_samples_leaf=20, max_features=6,
                                     verbose=1)
and train several times on the same (full) training set, I end up with
models that differ significantly in size (I mean the size of the pickle
output) and that predict differently on the same instance. The difference is
on the order of 20 to 30% (I have seen predicted values vary between 0.4x and
0.7x for the same instance). Even the ordering of the top 20 features (out of
30) differs quite significantly from model to model.
Can someone explain in a bit more detail where this variability comes from?
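A minimal sketch of the comparison I am describing, so the behaviour can be
reproduced without the Higgs data. Assumptions: make_classification is a
synthetic stand-in for the real training set, n_estimators is reduced to keep
the run short, and the random_state argument is something I did not set in my
original runs. With max_features smaller than the number of features, the
candidate features at each split are drawn at random, so the seed is one
plausible source of the variation; I have not confirmed it is the only one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the Higgs training set (30 features, as in the challenge).
X, y = make_classification(n_samples=2000, n_features=30,
                           n_informative=10, random_state=0)

def fit_and_predict(seed):
    """Train with the settings from the post (fewer trees) and a given seed."""
    est = GradientBoostingClassifier(n_estimators=20, max_depth=10,
                                     min_samples_leaf=20, max_features=6,
                                     random_state=seed)
    est.fit(X, y)
    # Return the positive-class probabilities and the feature ranking.
    ranking = np.argsort(est.feature_importances_)[::-1]
    return est.predict_proba(X)[:, 1], ranking

# Same seed twice: the two models are expected to agree.
p_a, rank_a = fit_and_predict(seed=42)
p_b, rank_b = fit_and_predict(seed=42)
same_seed_max_diff = float(np.max(np.abs(p_a - p_b)))

# Different seeds: the feature subsets drawn at each split differ.
p_c, rank_c = fit_and_predict(seed=1)
p_d, rank_d = fit_and_predict(seed=2)
diff_seed_max_diff = float(np.max(np.abs(p_c - p_d)))

print("max |p1 - p2| with the same seed:   ", same_seed_max_diff)
print("max |p1 - p2| with different seeds: ", diff_seed_max_diff)
print("top-20 feature order identical (different seeds):",
      bool(np.array_equal(rank_c[:20], rank_d[:20])))
```

Leaving random_state unset corresponds to the second case, since each fit
then draws from whatever state the global NumPy random generator is in.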
The training data set can be downloaded from here:
https://www.kaggle.com/c/higgs-boson/data
Thanks
Regards