Re: [Scikit-learn-general] Sparse Gradient Boosting & Fully Corrective Gradient Boosting

Mathieu Blondel Sat, 20 Sep 2014 08:05:28 -0700

I read the RGF paper. Interesting method but definitely too early to
include it in scikit-learn (we focus on mature algorithms). This also looks
more complicated to implement than gradient boosting, since tree induction
and boosting are interleaved.

The paper also clarified what "fully corrective" means, thanks. To
summarize, here are different strategies for setting the weights in
gradient boosting:
1. using the same constant value (`learning_rate`) for all estimators
2. setting the weight of the last estimator by line search (other weights
are kept fixed)
3. setting one separate weight per leaf node of the last estimator, by line
search
4. refitting all estimators' weights (including the past ones)
5. refitting all leaf nodes of all estimators?
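
For squared loss, options 1 to 3 each have a simple closed form. The sketch below is only an illustration under that assumption: the function names are hypothetical and are not part of scikit-learn.

```python
# Hypothetical helpers for squared loss only; not scikit-learn API.

def constant_weight(learning_rate):
    """Option 1: every estimator gets the same weight."""
    return learning_rate


def line_search_weight(residuals, preds):
    """Option 2: weight w of the newest estimator minimizing
    sum((r_i - w * h_i) ** 2), i.e. w = <r, h> / <h, h>."""
    num = sum(r * h for r, h in zip(residuals, preds))
    den = sum(h * h for h in preds)
    return num / den if den > 0 else 0.0


def per_leaf_values(residuals, leaf_ids):
    """Option 3: one value per leaf of the newest tree.

    For squared loss the best value for a leaf is the mean residual
    of the samples that fall into that leaf.
    """
    sums, counts = {}, {}
    for r, leaf in zip(residuals, leaf_ids):
        sums[leaf] = sums.get(leaf, 0.0) + r
        counts[leaf] = counts.get(leaf, 0) + 1
    return {leaf: sums[leaf] / counts[leaf] for leaf in sums}
```

For other losses the line search generally has no closed form and would need a numerical 1-D minimization instead.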

Some authors [1] argued that option 1 is sufficient in practice to get good
performance since our goal is to do well on test data, not training data.
Option 3 is what scikit-learn implements. Option 4 is the fully corrective
case. It basically amounts to least squares for squared loss or to
logistic regression for log loss (using each estimator's predictions as
features). One thing though is that our implementation doesn't store the
estimator weights explicitly (leaves are updated directly). Setting a
penalty (l1 or l2) on the estimator weights (i.e., on the functional)
should prevent overfitting. Option 5 sounds pretty computationally
expensive, although the update doesn't need to be done at every iteration.
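
To make option 4 concrete for squared loss, here is a minimal sketch of refitting all estimator weights by l2-penalized least squares, with each estimator's predictions as one feature column. The function name and the `pred_columns` layout are assumptions made for illustration; this is not how scikit-learn stores its ensemble.

```python
def refit_estimator_weights(pred_columns, y, l2=0.0):
    """Refit all estimator weights by (ridge-penalized) least squares.

    pred_columns[j][i] is the prediction of estimator j on sample i.
    Solves (H^T H + l2 * I) w = H^T y by Gaussian elimination.
    """
    m = len(pred_columns)
    # Normal equations: A = H^T H + l2 * I, rhs = H^T y.
    A = [[sum(u * v for u, v in zip(pred_columns[j], pred_columns[k]))
          + (l2 if j == k else 0.0) for k in range(m)] for j in range(m)]
    rhs = [sum(h * t for h, t in zip(pred_columns[j], y)) for j in range(m)]

    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system; try l2 > 0")
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for k in range(col, m):
                A[row][k] -= factor * A[col][k]
            rhs[row] -= factor * rhs[col]

    # Back substitution.
    w = [0.0] * m
    for row in range(m - 1, -1, -1):
        s = sum(A[row][k] * w[k] for k in range(row + 1, m))
        w[row] = (rhs[row] - s) / A[row][row]
    return w
```

With `l2=0` this is plain least squares on the training data; a positive `l2` shrinks the weights, which is the kind of penalty mentioned above. For log loss the analogous refit would be a logistic regression on the same columns.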

I recently re-implemented gradient boosting [2]. One difference in my
implementation is that it is possible to use any base learner (not just
trees). So my implementation stores estimator weights explicitly and uses
option 2 above. The fully corrective updates (option 4) might be easier to
add to my implementation.
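
As a rough sketch of the general idea (not the code in [2]), a squared-loss boosting loop that accepts any base learner and stores one explicit weight per estimator could look like the following. The `fit_base` interface and the toy `fit_mean` learner are hypothetical and only serve the illustration.

```python
def boost(X, y, fit_base, n_estimators=10):
    """Squared-loss boosting with explicit per-estimator weights (option 2).

    fit_base(X, targets) is a hypothetical interface: it must return a
    function mapping a list of samples to a list of predictions.
    """
    estimators, weights = [], []
    current = [0.0] * len(y)
    for _ in range(n_estimators):
        residuals = [t - c for t, c in zip(y, current)]
        predict = fit_base(X, residuals)
        h = predict(X)
        den = sum(v * v for v in h)
        if den == 0.0:
            break  # the base learner has nothing left to add
        # Line search for the new weight only; earlier weights stay fixed.
        w = sum(r * v for r, v in zip(residuals, h)) / den
        estimators.append(predict)
        weights.append(w)
        current = [c + w * v for c, v in zip(current, h)]
    return estimators, weights


def ensemble_predict(estimators, weights, X):
    """Weighted sum of the stored estimators' predictions."""
    out = [0.0] * len(X)
    for predict, w in zip(estimators, weights):
        out = [o + w * v for o, v in zip(out, predict(X))]
    return out


def fit_mean(X, targets):
    """Toy non-tree base learner: always predicts the mean target."""
    mean = sum(targets) / len(targets)
    return lambda samples: [mean] * len(samples)
```

Because `weights` is kept as a separate list rather than folded into the leaves, refitting all of the weights later only requires the per-estimator predictions.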

BTW, the notion of fully corrective updates is also present in the matching
pursuit (-> orthogonal matching pursuit) and Frank-Wolfe literature.

Mathieu

[1] "Boosting Algorithms: Regularization, Prediction and Model Fitting",
Peter Bühlmann and Torsten Hothorn (thanks to Peter for telling me about
this paper)

[2]
https://github.com/mblondel/ivalice/blob/master/ivalice/impl/gradient_boosting.py

On Wed, Sep 17, 2014 at 4:02 AM, c TAKES <[email protected]> wrote:

> yes - In fact my real goal is to implement RGF ultimately, though I had
> considered building/forking off the current GradientBoostingRegressor
> package as a starting point A) b/c I'm new to developing for scikit-learn
> and B) to maintain as much consistency as possible with the rest of the
> package.
>
> Upon further review though, I don't think there's much point in adding
> fully corrective updates to GBR because without the regularization (the
> rest of RGF) it is probably useless and leads to overfitting, as per
> http://stat.rutgers.edu/home/tzhang/software/rgf/tpami14-rgf.pdf.
>
> So it would likely be more useful to go ahead and create RGF as an
> entirely separate module.  But that could take some time, of course.
>
> Is anyone working on RGF for sklearn?
>
> Arnaud, thanks for directing me to your sparse implementation, I will have
> a look!  It would certainly be excellent to have this work for all decision
> tree algorithms.
>
> Ken
>
> On Tue, Sep 16, 2014 at 11:16 AM, Peter Prettenhofer <
> [email protected]> wrote:
>
>> The only reference I know is the Regularized Greedy Forest paper by
>> Johnson and Zhang [1].
>> I haven't read the primary source (by Zhang as well).
>>
>> [1] http://arxiv.org/abs/1109.0887
>>
>> 2014-09-16 15:15 GMT+02:00 Mathieu Blondel <[email protected]>:
>>
>>> Could you give a reference for gradient boosting with fully corrective
>>> updates?
>>>
>>> Since the philosophy of gradient boosting is to fit each tree against
>>> the residuals (or negative gradient) so far, I am wondering how such a fully
>>> corrective update would work...
>>>
>>> Mathieu
>>>
>>> On Tue, Sep 16, 2014 at 9:16 AM, c TAKES <[email protected]> wrote:
>>>
>>>> Is anyone working on making Gradient Boosting Regressor work with
>>>> sparse matrices?
>>>>
>>>> Or is anyone working on adding an option for fully corrective gradient
>>>> boosting, i.e., all trees in the ensemble are re-weighted at each iteration?
>>>>
>>>> These are things I would like to see and may be able to help with if no
>>>> one is currently working on them.
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Want excitement?
>>>> Manually upgrade your production database.
>>>> When you want reliability, choose Perforce.
>>>> Perforce version control. Predictably reliable.
>>>>
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Peter Prettenhofer
>>
>>
>>
>>
>

