Given the license, I don’t think you can read the code to make a new bsd 
implementation.

Arnaud

On 17 Mar 2014, at 15:19, vamsi kaushik <[email protected]> wrote:

> Hi,
> And of the fest implementation i would say we just rewrite the algorithm in 
> cython instead of writing wrappers for it, this dosn't need licence and is 
> readable for other contributors and also easily changeable in future. This 
> boils down to a few 500 lines of code in cython. 
> 
> 
> On Mon, Mar 17, 2014 at 7:26 PM, vamsi kaushik <[email protected]> 
> wrote:
> hi arnaud,
> 
> yeah i have already gone through that implementation. I feel at least we 
> should have separate splitter classes for the node splitting to be more 
> efficient and flexible.
> And as of the choice of matrix, I don't have one in mind but mostly the 
> elements are sorted along the feature columns. so I would say, csr maybe. But 
> there can be
> other efficient formats too. 
> 
> And as of the input matrices, we can follow the same procedure followed in 
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/cd_fast.pyx
> 
> so as to not break the user code. And moreover i have read somewhere on the 
> internet about sparse coding format, is it possible to use a more efficient 
> sparse structure within the tree algorithm. But the input types remain the 
> same, they get converted to another sparse structure once they are passed to 
> the tree, since we are working with non-zero element pointers, this should 
> not be memory/time inefficient. But my question is, is this acceptable?   
> 
> 
> On Mon, Mar 17, 2014 at 6:49 PM, Arnaud Joly <[email protected]> wrote:
> Hi,
> 
> 
> The support for sparse matrices should exploit as much as possible the 
> sparsity structure of the matrix
> without blowing up memory as to obtain optimal performance. Note that the 
> choice of the best sparse matrix 
> representation  is still open (csc, csr, coo, ... matrix?). 
> 
> To my knowldege, there is one implementation of sparse decision tree
> https://github.com/fest/fest. But I don’t think that the license is BSD 
> compatible.
> 
> If input sparsity is added, adaboost, GBRT, bagging and random
> forest should support sparse input data.
> 
> Best,
> Arnaud
> 
> 
> On 16 Mar 2014, at 02:49, vamsi kaushik <[email protected]> wrote:
> 
>> Hi arnaud, Eltermann
>> 
>> Thanks for the reply. Firstly I am a noob here so excuse me for any stupid 
>> questions. Eltermann Just my 2 cents, I felt your approach(in BestSplitter ) 
>> somewhat inefficient when you are calling the get_sparse_item. Firstly even 
>> though we have indices of all the non zero items, it is being called 
>> node_samples times, iterating over nnz times. An easier way of doing this 
>> would be to sort the samples in sparse matrix along features and then 
>> extracting if necessary. because the whole point in using Xf is to sort them 
>> which becomes difficult when they are in an n-d array. like for example
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.nonzero.html#scipy.sparse.csr_matrix.nonzero
>> gives the non zero elements. My solution might not be perfect but my point 
>> is when we are adding support for sparse matrices, why not exploit their 
>> structure as much as we can. Arnaud is this feasible ? Eltermann anything 
>> wrong in my thinking ?
>> 
>> 
>> cheers,
>> kaushik varanasi
>> On Wed, Mar 12, 2014 at 8:29 PM, Arnaud Joly <[email protected]> wrote:
>> For the number of contributions, I would advise you to do as many
>> as possible to get familiar with the codebase and scikit-learn practices.
>> Of course starting to think and to work on how to add sparsity support to 
>> decision tree
>> is important to grasp the challenges and write a convincing proposal.
>> Thus, the right proportion is difficult to suggest.
>> 
>> The goal of this GSOC is to have an efficient and a maintenable 
>> implementation.
>> The overall project plan should be dictated by this goal.
>> 
>> Best,
>> Arnaud
>> 
>> On 12 Mar 2014, at 13:44, Gilles Louppe <[email protected]> wrote:
>> 
>> > On 12 March 2014 13:08, Felipe Eltermann <[email protected]> 
>> > wrote:
>> >> Hello Vamsi,
>> >>
>> >> "Firstly, regarding the implementation of sparse functions. _tree.pxy is 
>> >> the
>> >> back end cython code to handle the operations Splitting, Evaluating
>> >> impurities at nodes and then constructing the tree."
>> >> That's right.
>> >>
>> >> "The basic implementation that i have in mind is that we duplicate the
>> >> splitter classes and The tree architecture itself."
>> >> Duplicating the code doesn't seem to be the best approach. It will be
>> >> simpler if the same splitter class should handle both dense and sparse
>> >> matrices.
>> >> I started a PR that is already in progress:
>> >> https://github.com/scikit-learn/scikit-learn/pull/2848
>> >> If you want to help, you can read up the commits and help on the open
>> >> issues.
>> >> (however, I don't think this apply as a GSoC project though)
>> >
>> > Just my 2 cents, but I am not sure having both dense and sparse
>> > support in the same codebase is the best choice. Both implementations
>> > should in my opinion be separated, with the common code factored out
>> > to avoid code dupplication.
>> >
>> > In particular, having an *efficient* sparse splitter will require
>> > specific changes to the splitter. I am really not sure having this
>> > code intermingled with the dense support is the best option.
>> >
>> >>
>> >> On Tue, Mar 11, 2014 at 7:07 PM, vamsi kaushik 
>> >> <[email protected]>
>> >> wrote:
>> >>>
>> >>> hi Joly,
>> >>>
>> >>> I actually have a few questions:
>> >>>
>> >>> Firstly, regarding the implementation of sparse functions. _tree.pxy is
>> >>> the back end cython code to handle the operations Splitting, Evaluating
>> >>> impurities at nodes and then constructing the tree. The basic 
>> >>> implementation
>> >>> that i have in mind is that we duplicate the splitter classes and The 
>> >>> tree
>> >>> architecture itself. At each leaf node we might have a sparse matrix
>> >>> representing the output data and at every non-leaf node, splitting 
>> >>> should be
>> >>> done on the sparse matrix at that node and produce few other nodes with
>> >>> sparse output. So the project boils down to implementing good access and
>> >>> construction methods for sparse matrices in cython. I just want to know 
>> >>> if
>> >>> this is a possible implementation criterion ?. And i would also like to 
>> >>> know
>> >>> the implementation plan that you have in mind ?
>> >>>
>> >>> Secondly, In my previous mail i have shown you my contributions. I would
>> >>> just like to know if the patches are upto the mark for a selection. If 
>> >>> they
>> >>> are, i will turn my attention to the DT sparse problem, else i will work 
>> >>> on
>> >>> the issues a bit more.
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Mar 11, 2014 at 1:12 PM, Arnaud Joly <[email protected]> wrote:
>> >>>>
>> >>>> Thanks for your contribution.
>> >>>>
>> >>>> Keep up!
>> >>>>
>> >>>> Arnaud
>> >>>>
>> >>>> On 10 Mar 2014, at 23:51, vamsi kaushik <[email protected]>
>> >>>> wrote:
>> >>>>
>> >>>> My name is actually Varanasi Vamsi Kaushik(yeah its pretty long).
>> >>>>
>> >>>> And i have already contributed a little.
>> >>>> Please take a look at :
>> >>>>
>> >>>> https://github.com/scikit-learn/scikit-learn/pull/2905
>> >>>>
>> >>>> and also:
>> >>>> https://github.com/scikit-learn/scikit-learn/pull/2895
>> >>>> https://github.com/scikit-learn/scikit-learn/issues/2727
>> >>>>
>> >>>> cheers,
>> >>>> kaushik
>> >>>>
>> >>>>
>> >>>> On Tue, Mar 11, 2014 at 3:13 AM, vamsi kaushik
>> >>>> <[email protected]> wrote:
>> >>>>>
>> >>>>> hi Arnaud,
>> >>>>>
>> >>>>> Vamsi kaushik is actually me.
>> >>>>> Thanks for your reply, i'll get to the issue soon
>> >>>>>
>> >>>>> cheers,
>> >>>>> vamsi kaushik
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Mar 10, 2014 at 3:16 PM, Arnaud Joly <[email protected]> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> Anything concerning the GSOC should pass by the scikit-learn
>> >>>>>> mailing list.
>> >>>>>>
>> >>>>>> Thanks for your interest in the subject. If you intend to apply for a
>> >>>>>> GSOC, I suggest you to read
>> >>>>>>
>> >>>>>> https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2014
>> >>>>>> and start contributing to scikit-learn.
>> >>>>>>
>> >>>>>> Right now, three people have shown interest for this topic: Maheshakya
>> >>>>>> (who is now
>> >>>>>> applying for a GSOC about LSH),  Vamsi Kaushik  and you. Several
>> >>>>>> candidates may apply for the
>> >>>>>> same subject. However it is likely that if a GSOC is awarded for a
>> >>>>>> given subject, only the best proposal will
>> >>>>>> be selected.
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> Arnaud
>> >>>>>>
>> >>>>>>
>> >>>>>> On 28 Feb 2014, at 22:33, João <[email protected]> wrote:
>> >>>>>>
>> >>>>>> Hi Arnaud,
>> >>>>>>
>> >>>>>> While browsing the current GSoC projects I saw an interesting one for
>> >>>>>> which you are assigned as mentor: "Add scipy.sparse matrix support to 
>> >>>>>> the
>> >>>>>> Decision Tree".
>> >>>>>>
>> >>>>>> I am considering applying for this project as I have already faced the
>> >>>>>> necessity of sparse matrices in sklearn and the outcome was not 
>> >>>>>> totally
>> >>>>>> satisfactory.
>> >>>>>> Right now I am participating in a kaggle contest
>> >>>>>> (http://www.kaggle.com/c/lshtc/) and facing several difficulties even 
>> >>>>>> with
>> >>>>>> algorithms that already support spase matrices (even simple 
>> >>>>>> algorithms such
>> >>>>>> as NB). In general I get some memory error and sometimes segfaults.
>> >>>>>> I would be happy if I could implement the necessary support for DT 
>> >>>>>> (and
>> >>>>>> related algorithms such as random forest and extra trees) and, if the 
>> >>>>>> time
>> >>>>>> constraint allows, improve as much as possible the general support for
>> >>>>>> sparse matrices.
>> >>>>>>
>> >>>>>> With this email, I want to show my interest and ask you if there are
>> >>>>>> already any candidates for this place.
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> --
>> >>>>>> João
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> ------------------------------------------------------------------------------
>> >>>>>> Learn Graph Databases - Download FREE O'Reilly Book
>> >>>>>> "Graph Databases" is the definitive new guide to graph databases and
>> >>>>>> their
>> >>>>>> applications. Written by three acclaimed leaders in the field,
>> >>>>>> this first edition is now available. Download your free book today!
>> >>>>>> http://p.sf.net/sfu/13534_NeoTech
>> >>>>>> _______________________________________________
>> >>>>>> Scikit-learn-general mailing list
>> >>>>>> [email protected]
>> >>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> ------------------------------------------------------------------------------
>> >>>> Learn Graph Databases - Download FREE O'Reilly Book
>> >>>> "Graph Databases" is the definitive new guide to graph databases and
>> >>>> their
>> >>>> applications. Written by three acclaimed leaders in the field,
>> >>>> this first edition is now available. Download your free book today!
>> >>>>
>> >>>> http://p.sf.net/sfu/13534_NeoTech_______________________________________________
>> >>>> Scikit-learn-general mailing list
>> >>>> [email protected]
>> >>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> ------------------------------------------------------------------------------
>> >>>> Learn Graph Databases - Download FREE O'Reilly Book
>> >>>> "Graph Databases" is the definitive new guide to graph databases and
>> >>>> their
>> >>>> applications. Written by three acclaimed leaders in the field,
>> >>>> this first edition is now available. Download your free book today!
>> >>>> http://p.sf.net/sfu/13534_NeoTech
>> >>>> _______________________________________________
>> >>>> Scikit-learn-general mailing list
>> >>>> [email protected]
>> >>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> ------------------------------------------------------------------------------
>> >>> Learn Graph Databases - Download FREE O'Reilly Book
>> >>> "Graph Databases" is the definitive new guide to graph databases and 
>> >>> their
>> >>> applications. Written by three acclaimed leaders in the field,
>> >>> this first edition is now available. Download your free book today!
>> >>> http://p.sf.net/sfu/13534_NeoTech
>> >>> _______________________________________________
>> >>> Scikit-learn-general mailing list
>> >>> [email protected]
>> >>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>
>> >>
>> >>
>> >> ------------------------------------------------------------------------------
>> >> Learn Graph Databases - Download FREE O'Reilly Book
>> >> "Graph Databases" is the definitive new guide to graph databases and their
>> >> applications. Written by three acclaimed leaders in the field,
>> >> this first edition is now available. Download your free book today!
>> >> http://p.sf.net/sfu/13534_NeoTech
>> >> _______________________________________________
>> >> Scikit-learn-general mailing list
>> >> [email protected]
>> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>
>> >
>> > ------------------------------------------------------------------------------
>> > Learn Graph Databases - Download FREE O'Reilly Book
>> > "Graph Databases" is the definitive new guide to graph databases and their
>> > applications. Written by three acclaimed leaders in the field,
>> > this first edition is now available. Download your free book today!
>> > http://p.sf.net/sfu/13534_NeoTech
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> 
>> 
>> ------------------------------------------------------------------------------
>> Learn Graph Databases - Download FREE O'Reilly Book
>> "Graph Databases" is the definitive new guide to graph databases and their
>> applications. Written by three acclaimed leaders in the field,
>> this first edition is now available. Download your free book today!
>> http://p.sf.net/sfu/13534_NeoTech
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> 
>> ------------------------------------------------------------------------------
>> Learn Graph Databases - Download FREE O'Reilly Book
>> "Graph Databases" is the definitive new guide to graph databases and their
>> applications. Written by three acclaimed leaders in the field,
>> this first edition is now available. Download your free book today!
>> http://p.sf.net/sfu/13534_NeoTech_______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> 
> 
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their
> applications. Written by three acclaimed leaders in the field,
> this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/13534_NeoTech
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> 
> 
> 
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their
> applications. Written by three acclaimed leaders in the field,
> this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/13534_NeoTech_______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

------------------------------------------------------------------------------
Put Bad Developers to Shame
Dominate Development with Jenkins Continuous Integration
Continuously Automate Build, Test & Deployment 
Start a new project now. Try Jenkins in the cloud.
http://p.sf.net/sfu/13600_Cloudbees
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to