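Hi Deb,

In your case, the randomness comes from the max_features=6 setting: each split only considers 6 randomly drawn input variables, while the original dataset includes about 5x more. Which variables get drawn changes from one execution to another, so the model is not very stable unless random_state is fixed.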
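
To make the effect concrete, here is a minimal sketch. It assumes synthetic data from make_classification in place of the Higgs data set, and a much smaller n_estimators and max_depth than in your call so that it runs quickly. The helper fit_and_predict and the seed values are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the real data (assumption): 30 input variables.
X, y = make_classification(n_samples=500, n_features=30, n_informative=10,
                           random_state=0)


def fit_and_predict(seed):
    """Fit a small boosted model with max_features=6; return P(class 1)."""
    est = GradientBoostingClassifier(n_estimators=20, max_depth=3,
                                     min_samples_leaf=20, max_features=6,
                                     random_state=seed)
    est.fit(X, y)
    return est.predict_proba(X)[:, 1]


p_a = fit_and_predict(seed=0)
p_b = fit_and_predict(seed=0)  # same seed as p_a
p_c = fit_and_predict(seed=1)  # different seed

# Same seed: the feature draws are identical, so the predictions match.
same_seed_equal = np.allclose(p_a, p_b)

# Different seed: other features are drawn at the splits, so predictions move.
max_gap = np.abs(p_a - p_c).max()

# Averaging the predictions of models fitted with several seeds.
p_avg = np.mean([fit_and_predict(seed=s) for s in range(5)], axis=0)

print("same seed, same predictions:", same_seed_equal)
print("largest gap between seed 0 and seed 1:", max_gap)
```

Fixing random_state only makes a single run repeatable. If what you want is a stable estimate, averaging over several seeds, as in the last step, is one option worth trying, though how much it helps will depend on the data.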
Gilles

On 16 September 2014 12:40, Debanjan Bhattacharyya <[email protected]> wrote:
> Thanks Arnaud
>
> random_state is not listed as a parameter on the
> http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
> page, but it is listed as an argument in the constructor. It's probably my
> fault that I did not notice it as a passable parameter. Maybe the
> documentation can be changed.
>
> In hindsight, and as a generic approach, if I am training without
> random_state, why and when would the boosted models vary highly? (I have
> seen data sets where they don't.)
> And what is the right approach to get a stable CV? Not using
> random_state and averaging over several rounds of CV? Or using
> different random_states and averaging over several rounds of CV?
>
> What exactly goes on behind random_state in a Gradient Boosting approach?
>
> Regards
> Deb
>
> On Tue, Sep 16, 2014 at 3:52 PM, Arnaud Joly <[email protected]> wrote:
>>
>> Hi,
>>
>> To get a reproducible model, you have to set the random_state.
>>
>> Best regards,
>> Arnaud
>>
>> On 16 Sep 2014, at 12:08, Debanjan Bhattacharyya <[email protected]>
>> wrote:
>>
>> Hi, I recently participated in the Atlas (Higgs Boson Machine Learning
>> Challenge).
>>
>> One of the models I tried was GradientBoostingClassifier. I found it
>> extremely non-deterministic.
>> So if I use
>>
>> est = GradientBoostingClassifier(n_estimators=100,
>> max_depth=10,min_samples_leaf=20,max_features=6,verbose=1)
>>
>> and train several times on the same (full) training set, I end up with
>> models (significantly different in size, I mean the pickle output) which
>> predict differently on the same instance. The difference is on the scale of
>> 20 to 30% (so I have seen values varying between 0.7x and 0.4x) on the same
>> instance. Even the ordering of the top 20 features (out of 30) differs from
>> model to model quite significantly.
>>
>> Can someone tell me a bit more in detail about this uncertainty?
>>
>> The training data set can be downloaded from here
>> https://www.kaggle.com/c/higgs-boson/data
>>
>> Thanks
>>
>> Regards
>>
>> ------------------------------------------------------------------------------
>> Want excitement?
>> Manually upgrade your production database.
>> When you want reliability, choose Perforce.
>> Perforce version control. Predictably reliable.
>> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
