Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

Felipe Eltermann Fri, 07 Feb 2014 10:59:14 -0800

Arnaud,

I added a issparse attribute to Splitter base class.
Doing so, I think I managed to introduce sparse support without the need of
replicating Splitters' business logic code.
I'm working on this branch [1]


I have two questions:
1- I removed a "with nogil" statement [2]. Is there a way to keep it?
2- Toy sparse input test is failing, and I think it's due to incorrect
values being read from X.data [3] (the method considers the matrix is in
CSC format). Can you see something I'm doing wrong here?


[1] https://github.com/eltermann/scikit-learn/commits/DT-sparse
[2]
https://github.com/eltermann/scikit-learn/commit/8388bedff4e225cda9a1b2b6e3fc250bb7d22276#diff-a2cead4f3702cc4b9f76562bb2777edbL2297
[3]
https://github.com/eltermann/scikit-learn/commit/5ba9c367661446c3eba7e6ea54adc1ff5cdfd39f#diff-a2cead4f3702cc4b9f76562bb2777edbR1281

On Wed, Feb 5, 2014 at 10:34 AM, Arnaud Joly <[email protected]> wrote:

> I think that I would go for the option that minimize the amount of code
> duplication.
> I would probably start with 2. Since we don't pickle anymore the Splitter
> and criterion, the constructor
> arguments could be used to pass the X and the y matrix.
>
> Cheers,
> Arnaud
>
>
>
> On 04 Feb 2014, at 17:38, Felipe Eltermann <[email protected]>
> wrote:
>
> >>  How much code in our current implementation depends on the data
> representation?
> > Not much actually. It now basically boils down to simply write a new
> splitter object. Everything else remains the same. So basically, I would
> say that it amounts to 300~ lines of Cython (out of the 2300 lines in our
> implementation).
>
> Gilles,
>
> *Splitter* base class assumes X to be a dense, 2-dimensional, numpy array.
> Available splitters include: *BestSplitter*, *PresortBestSplitter* and
> *RandomSplitter* (all inheriting Splitter).
>
> That said, I think there are these possible architectures:
>
> 1-
> - make "Splitter" base class support both dense and sparse representation
> - implicitly assume that "BestSplitter", "PresortBestSplitter" and
> "RandomSplitter" are dealing with dense structure
> - create "SparseSplitter" that inherits "DenseSplitter" and implicitly
> assumes the sparse structure
>
> 2-
> - replace "Splitter" by "DenseSplitter"
> - make "BestSplitter", "PresortBestSplitter" and "RandomSplitter" inherit
> from "DenseSplitter"
> - create "SparseSplitter"
> - force "SparseSplitter" usage when the input is a sparse matrix
>
> Which one should we use?
> Also, which splitting approach should be used on SparseSplitter? Why not
> "SparseBestSplitter", "SparsePresortBestSplitter" and
> "SparseRandomSplitter"?
> ------------------------------------------------------------------------------
> Managing the Performance of Cloud-Based Applications
> Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
> Read the Whitepaper.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk_______________________________________________
>
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
>
> ------------------------------------------------------------------------------
> Managing the Performance of Cloud-Based Applications
> Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
> Read the Whitepaper.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>

------------------------------------------------------------------------------
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Re: [Scikit-learn-general] Sparse matrix support for Decision tree implementation

Reply via email to