On Sun, Sep 21, 2014 at 2:04 AM, Olivier Grisel <[email protected]>
wrote:

> 2014-09-20 8:04 GMT-07:00 Mathieu Blondel <[email protected]>:
> >
> > I recently re-implemented gradient boosting [2].
>
> I am interested in your feedback on implementing trees with Numba. Is
> it easy to reach the speed of the Cython-generated code in scikit-learn?
> What are the trade-offs and traps you discovered while working on
> this?
>
> In general, would you recommend Numba over Cython for Python users
> implementing their own estimators with non-vectorizable routines?
>

My implementation of trees is still very much a work in progress, and I
still have to implement a number of algorithmic optimizations before it can
be compared to the implementation in scikit-learn. It is not a port of
scikit-learn: I started from scratch (although I followed most of the
implementation techniques detailed in Gilles' thesis).
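To make the question more concrete, here is a minimal sketch of the kind of
non-vectorizable inner loop a tree builder needs, compiled with Numba:
scanning one pre-sorted feature for the threshold that minimizes squared
error. This is not code from the implementation described above; the
function name `best_split` and its signature are hypothetical. Results are
written into a preallocated `out` array instead of being returned as a
tuple, a common workaround on Numba versions that could not return tuples
in nopython mode (current releases can). If Numba is not installed, the
`try/except` turns `njit` into a no-op so the sketch still runs as plain
Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(*args, **kwargs):
        # Support both the bare @njit and the @njit(...) forms.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit
def best_split(x_sorted, y_sorted, out):
    """Hypothetical helper: best squared-error split on one sorted feature.

    Assumes x_sorted is sorted ascending and y_sorted is permuted to match.
    Writes the threshold to out[0] and the split score to out[1].
    """
    n = x_sorted.shape[0]
    total_sum = 0.0
    for i in range(n):
        total_sum += y_sorted[i]

    best_score = -np.inf
    best_threshold = np.nan  # stays NaN if no valid split exists
    left_sum = 0.0
    for i in range(n - 1):
        left_sum += y_sorted[i]
        # Only split between distinct feature values.
        if x_sorted[i] == x_sorted[i + 1]:
            continue
        n_left = i + 1
        n_right = n - n_left
        right_sum = total_sum - left_sum
        # Maximizing sum_L^2 / n_L + sum_R^2 / n_R is equivalent to
        # minimizing the total within-node squared error.
        score = left_sum * left_sum / n_left + right_sum * right_sum / n_right
        if score > best_score:
            best_score = score
            best_threshold = 0.5 * (x_sorted[i] + x_sorted[i + 1])
    out[0] = best_threshold
    out[1] = best_score


x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
out = np.empty(2)
best_split(x, y, out)
print(out[0])  # 6.5
```

The loop carries running sums from one candidate threshold to the next,
which is exactly the kind of sequential code that is slow in pure Python
and that Numba (or Cython) compiles to a tight loop.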

I really love Numba. Being able to accelerate routines directly in the same
file is great, and type inference makes the code much more readable. I
usually get 20x to 50x speed-ups compared to pure Python. However, there are
still a few limitations. Some language constructs, such as recursion and
returning tuples, are not yet supported, and working around these
limitations sometimes makes the code ugly. Compilation is also slow. Here's
an example:

$ time nosetests ivalice/impl/tests/test_tree.py
........
----------------------------------------------------------------------
Ran 8 tests in 0.416s

OK
nosetests ivalice/impl/tests/test_tree.py  5,50s user 0,20s system 99% cpu 5,696 total

So the tests themselves take 0.4s to run, but compiling the jitted functions
takes about 5s. That's OK for large-scale learning but quite annoying for
small-scale interactive work, so compilation caching would be a nice feature
to have (https://github.com/numba/numba/issues/224).
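One way to see the compilation overhead directly is to time the first call
of a jitted function (which triggers compilation) against a second call
(which reuses the compiled code). The sketch below is illustrative only;
`sum_of_squares` and `timed_call` are hypothetical names, and the fallback
decorator makes it run as plain Python when Numba is absent, in which case
both timings will be similar.

```python
import time

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(*args, **kwargs):
        # Support both the bare @njit and the @njit(...) forms.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit
def sum_of_squares(n):
    # A trivial loop; its cost is dwarfed by JIT compilation on first call.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


def timed_call(func, *args):
    # Return the function's result together with the wall-clock time taken.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


first, t_first = timed_call(sum_of_squares, 1000)    # includes compilation
second, t_second = timed_call(sum_of_squares, 1000)  # compiled code only
print(f"first call: {t_first:.4f}s, second call: {t_second:.6f}s")
```

With Numba installed, the first call is typically much slower than the
second because it includes compilation. Later Numba releases added a
`cache=True` option to `@jit`/`@njit` that stores the compiled code on disk
(in `__pycache__` next to the source file), so subsequent processes can skip
that step for functions defined in a file.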

Lastly, error reporting is a bit confusing at times.

To summarize, in its current state, I'd recommend Numba for small projects
but perhaps not for scikit-learn. This also makes me realize that Cython
has reached a level of maturity which is pretty impressive!


Mathieu


> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
