Hi Deb,

In your case, the randomness comes from the max_features=6 setting: each
split only considers a random subset of 6 input variables, while the
original dataset includes about 5x more. This makes the model not very
stable from one execution to another unless random_state is fixed.
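
To make this concrete, here is a minimal sketch. It assumes a synthetic
30-feature dataset from make_classification as a stand-in for the Higgs
data, and much smaller n_estimators/max_depth than in your run so that it
finishes quickly. The helper name fit_and_predict is only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Assumption: synthetic stand-in for the real data, with 30 input features.
X, y = make_classification(n_samples=500, n_features=30, n_informative=10,
                           random_state=0)


def fit_and_predict(seed):
    """Fit a small boosted model and return P(class 1) for the first 50 rows."""
    est = GradientBoostingClassifier(n_estimators=20, max_depth=3,
                                     max_features=6, random_state=seed)
    est.fit(X, y)
    return est.predict_proba(X[:50])[:, 1]


same_a = fit_and_predict(seed=0)
same_b = fit_and_predict(seed=0)
other = fit_and_predict(seed=1)

# Same seed: the same feature subsets are drawn at every split.
print("same seed identical:", np.array_equal(same_a, same_b))
# Different seed: different subsets are drawn, so predictions usually shift.
print("max abs difference across seeds:", np.abs(same_a - other).max())
```

With a fixed random_state the two fits should agree; how far the
predictions move between seeds will depend on the data and on the other
hyperparameters.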

Gilles

On 16 September 2014 12:40, Debanjan Bhattacharyya <[email protected]> wrote:
> Thanks Arnaud
>
> random_state is not listed as a parameter on the
> http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
> page, but it is listed as an argument in the constructor. It's probably my
> fault that I did not notice it as a passable parameter. Maybe the
> documentation can be changed.
>
> In hindsight, and as a general question: if I am training without
> random_state, why and when would the boosted models vary highly? (I have
> seen data sets where they don't.)
> And what is the right approach to getting a stable CV estimate? Not setting
> random_state, doing several rounds of CV and averaging them? Or using
> different random_states, doing several rounds of CV and averaging them?
>
> What exactly does random_state control in Gradient Boosting?
>
> Regards
> Deb
>
> On Tue, Sep 16, 2014 at 3:52 PM, Arnaud Joly <[email protected]> wrote:
>>
>> Hi,
>>
>>
>> To get a reproducible model, you have to set the random_state.
>>
>> Best regards,
>> Arnaud
>>
>>
>> On 16 Sep 2014, at 12:08, Debanjan Bhattacharyya <[email protected]>
>> wrote:
>>
>> Hi, I recently participated in the Atlas (Higgs Boson Machine Learning
>> Challenge).
>>
>> One of the models I tried was GradientBoostingClassifier. I found it
>> extremely non-deterministic.
>> So if I use
>>
>> est = GradientBoostingClassifier(n_estimators=100, max_depth=10,
>>     min_samples_leaf=20, max_features=6, verbose=1)
>>
>> and train several times on the same (full) training set, I end up with
>> models (significantly different in size, I mean the pickle output) which
>> predict differently on the same instance. The difference is on the scale of
>> 20 to 30% (I have seen values varying between 0.7x and 0.4x) on the same
>> instance. Even the ordering of the top 20 features (out of 30) differs
>> quite significantly from model to model.
>>
>> Can someone tell me a bit more about this uncertainty?
>>
>> The train data set can be downloaded from here
>> https://www.kaggle.com/c/higgs-boson/data
>>
>>
>> Thanks
>>
>> Regards
>>
>>
>> ------------------------------------------------------------------------------
>> Want excitement?
>> Manually upgrade your production database.
>> When you want reliability, choose Perforce.
>> Perforce version control. Predictably reliable.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>

