I think that I would go for the option that minimize the amount of code
duplication.
I would probably start with 2. Since we don’t pickle anymore the Splitter and
criterion, the constructor
arguments could be used to pass the X and the y matrix.
Cheers,
Arnaud
On 04 Feb 2014, at 17:38, Felipe Eltermann <[email protected]> wrote:
> >> How much code in our current implementation depends on the data
> >> representation?
> > Not much actually. It now basically boils down to simply write a new
> > splitter object. Everything else remains the same. So basically, I would
> > say that it amounts to 300~ lines of Cython (out of the 2300 lines in our
> > implementation).
>
> Gilles,
>
> Splitter base class assumes X to be a dense, 2-dimensional, numpy array.
> Available splitters include: BestSplitter, PresortBestSplitter and
> RandomSplitter (all inheriting Splitter).
>
> That said, I think there are these possible architectures:
>
> 1-
> - make "Splitter" base class support both dense and sparse representation
> - implicitly assume that "BestSplitter", "PresortBestSplitter" and
> "RandomSplitter" are dealing with dense structure
> - create "SparseSplitter" that inherits "DenseSplitter" and implicitly
> assumes the sparse structure
>
> 2-
> - replace "Splitter" by "DenseSplitter"
> - make "BestSplitter", "PresortBestSplitter" and "RandomSplitter" inherit
> from "DenseSplitter"
> - create "SparseSplitter"
> - force "SparseSplitter" usage when the input is a sparse matrix
>
> Which one should we use?
> Also, which splitting approach should be used on SparseSplitter? Why not
> "SparseBestSplitter", "SparsePresortBestSplitter" and "SparseRandomSplitter"?
> ------------------------------------------------------------------------------
> Managing the Performance of Cloud-Based Applications
> Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
> Read the Whitepaper.
> http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk_______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
------------------------------------------------------------------------------
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general