Hi Jacques,
Very exciting -- this has been on my wish list for quite a while.
Maybe we should open a PR up front so that we can discuss things
there -- better than using the mailing list (quite a lot of traffic
already).
The most important part of adding LambdaMART to sklearn is fleshing out an
API for "learning to rank" problems (i.e., we need to group samples by
"query id") -- based on past experience this will take a while ;-) .
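
To make the "group samples by query id" point concrete, here is a minimal
sketch of a per-query NDCG metric in plain Python. Everything in it is an
assumption for discussion, not an existing sklearn API: the function names,
the `query_id` argument, the exponential gain (2^rel - 1), and the
convention that a query with no relevant documents scores 0.0.

```python
import math
from collections import defaultdict


def dcg_at_k(relevance, k):
    """DCG of a relevance list that is already in ranked order."""
    return sum((2.0 ** rel - 1.0) / math.log2(pos + 2)
               for pos, rel in enumerate(relevance[:k]))


def ndcg_at_k(y_true, y_score, k=10):
    """NDCG for the documents of a single query."""
    # Order the true relevance labels by descending predicted score.
    ranked = [rel for _, rel in
              sorted(zip(y_score, y_true), key=lambda pair: -pair[0])]
    best = dcg_at_k(sorted(y_true, reverse=True), k)
    # Assumed convention: no relevant documents -> score 0.0.
    return dcg_at_k(ranked, k) / best if best > 0 else 0.0


def mean_ndcg_by_query(y_true, y_score, query_id, k=10):
    """Hypothetical grouped metric: average NDCG over query groups."""
    groups = defaultdict(lambda: ([], []))
    for rel, score, qid in zip(y_true, y_score, query_id):
        groups[qid][0].append(rel)
        groups[qid][1].append(score)
    per_query = [ndcg_at_k(rels, scores, k)
                 for rels, scores in groups.values()]
    return sum(per_query) / len(per_query)
```

The open API question is where something like `query_id` should enter:
as an extra argument to fit() and to the scorer, or encoded some other way.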
We should sync with Mathieu, Olivier, and Fabian -- if I remember
correctly, we discussed this a while ago.
I've been reading through the gbm code lately to look at their best-first
tree building heuristic (again) -- we can definitely share experience there
-- the source code is sometimes a bit verbose...
We should definitely take a look at RankLib -- seems like it's doing pretty
well here [1]. Otherwise, I too bench against gbm since it's IMHO the
reference implementation of GBRT and a pretty good one as well. IMHO part
of the success of certain ML methods stems from the availability of high
quality implementations -- gbm definitely counts as one, libsvm/liblinear
too.
[1] http://www.kaggle.com/c/expedia-personalized-sort/forums/t/6228/my-approach
best,
Peter
PS: Lucas Eustaquio pointed me to a Python LambdaMART implementation that
uses sklearn.tree.DecisionTreeRegressor:
https://github.com/discobot/LambdaMart/blob/acb8329ab63a45d2bcb43055fa54f14b8c6725c1/mart.py
2013/11/6 Jacques Kvam <[email protected]>
> Hello scikit-learn,
>
> I recently wrote up an implementation of the LambdaMART algorithm on top
> of the existing gradient boosting code (thanks for the great base of code
> to work with btw). It currently only supports NDCG but it would be easy to
> generalize. That's kind of beside the point, however. Before I even think
> about putting together a PR I wanted to compare it against the gbm package.
> I'm aware of Java implementations like jforest and ranklib but gbm's
> interface seems closest to sklearn's so that's what I want to use.
> Unfortunately whenever I try to use ndcg, it segfaults on me or I get an
> error in split.default depending on where I specify the group variable. I
> realize this isn't an R list but I was hoping someone could shed some light
> for me.
>
> I'm using the supervised MQ2007 and MQ2008 datasets from (
> https://research.microsoft.com/en-us/um/beijing/projects/letor//letor4download.aspx)
> and my test code is here (https://gist.github.com/jwkvam/7332448).
>
> I simply use Python to transform the given train.txt file into a CSV so I
> can load it in R. I'm using gbm 2.1 and I've tried R 2.15.3 and 3.0.2.
>
> Alternatively can I easily transform my gbm.fit() call to use the gbm()
> interface? Sorry I'm kind of a newbie when it comes to R.
>
> I saw there's also this standing issue, but it doesn't look like there's
> been a lot of movement on it.
>
>
> https://code.google.com/p/gradientboostedmodels/issues/detail?id=28&q=pairwise
>
> Thanks,
> Jacques
>
>
> ------------------------------------------------------------------------------
> November Webinars for C, C++, Fortran Developers
> Accelerate application performance with scalable programming models.
> Explore
> techniques for threading, error checking, porting, and tuning. Get the most
> from the latest Intel processors and coprocessors. See abstracts and
> register
> http://pubads.g.doubleclick.net/gampad/clk?id=60136231&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
--
Peter Prettenhofer