Hi,
and i suppose we have to change the bagging models as well ?

On Sun, Mar 16, 2014 at 7:19 AM, vamsi kaushik
<[email protected]>wrote:

> Hi arnaud, Eltermann
>
> Thanks for the reply. Firstly I am a noob here so excuse me for any stupid
> questions. Eltermann Just my 2 cents, I felt your approach(in BestSplitter
> ) somewhat inefficient when you are calling the get_sparse_item. Firstly
> even though we have indices of all the non zero items, it is being called
> node_samples times, iterating over nnz times. An easier way of doing this
> would be to sort the samples in sparse matrix along features and then
> extracting if necessary. because the whole point in using Xf is to sort
> them which becomes difficult when they are in an n-d array. like for example
>
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.nonzero.html#scipy.sparse.csr_matrix.nonzero
> gives the non zero elements. My solution might not be perfect but my point
> is when we are adding support for sparse matrices, why not exploit their
> structure as much as we can. Arnaud is this feasible ? Eltermann anything
> wrong in my thinking ?
>
>
> cheers,
> kaushik varanasi
> On Wed, Mar 12, 2014 at 8:29 PM, Arnaud Joly <[email protected]> wrote:
>
>> For the number of contributions, I would advise you to do as many
>> as possible to get familiar with the codebase and scikit-learn practices.
>> Of course starting to think and to work on how to add sparsity support to
>> decision tree
>> is important to grasp the challenges and write a convincing proposal.
>> Thus, the right proportion is difficult to suggest.
>>
>> The goal of this GSOC is to have an efficient and a maintenable
>> implementation.
>> The overall project plan should be dictated by this goal.
>>
>> Best,
>> Arnaud
>>
>> On 12 Mar 2014, at 13:44, Gilles Louppe <[email protected]> wrote:
>>
>> > On 12 March 2014 13:08, Felipe Eltermann <[email protected]>
>> wrote:
>> >> Hello Vamsi,
>> >>
>> >> "Firstly, regarding the implementation of sparse functions. _tree.pxy
>> is the
>> >> back end cython code to handle the operations Splitting, Evaluating
>> >> impurities at nodes and then constructing the tree."
>> >> That's right.
>> >>
>> >> "The basic implementation that i have in mind is that we duplicate the
>> >> splitter classes and The tree architecture itself."
>> >> Duplicating the code doesn't seem to be the best approach. It will be
>> >> simpler if the same splitter class should handle both dense and sparse
>> >> matrices.
>> >> I started a PR that is already in progress:
>> >> https://github.com/scikit-learn/scikit-learn/pull/2848
>> >> If you want to help, you can read up the commits and help on the open
>> >> issues.
>> >> (however, I don't think this apply as a GSoC project though)
>> >
>> > Just my 2 cents, but I am not sure having both dense and sparse
>> > support in the same codebase is the best choice. Both implementations
>> > should in my opinion be separated, with the common code factored out
>> > to avoid code dupplication.
>> >
>> > In particular, having an *efficient* sparse splitter will require
>> > specific changes to the splitter. I am really not sure having this
>> > code intermingled with the dense support is the best option.
>> >
>> >>
>> >> On Tue, Mar 11, 2014 at 7:07 PM, vamsi kaushik <
>> [email protected]>
>> >> wrote:
>> >>>
>> >>> hi Joly,
>> >>>
>> >>> I actually have a few questions:
>> >>>
>> >>> Firstly, regarding the implementation of sparse functions. _tree.pxy
>> is
>> >>> the back end cython code to handle the operations Splitting,
>> Evaluating
>> >>> impurities at nodes and then constructing the tree. The basic
>> implementation
>> >>> that i have in mind is that we duplicate the splitter classes and The
>> tree
>> >>> architecture itself. At each leaf node we might have a sparse matrix
>> >>> representing the output data and at every non-leaf node, splitting
>> should be
>> >>> done on the sparse matrix at that node and produce few other nodes
>> with
>> >>> sparse output. So the project boils down to implementing good access
>> and
>> >>> construction methods for sparse matrices in cython. I just want to
>> know if
>> >>> this is a possible implementation criterion ?. And i would also like
>> to know
>> >>> the implementation plan that you have in mind ?
>> >>>
>> >>> Secondly, In my previous mail i have shown you my contributions. I
>> would
>> >>> just like to know if the patches are upto the mark for a selection.
>> If they
>> >>> are, i will turn my attention to the DT sparse problem, else i will
>> work on
>> >>> the issues a bit more.
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Mar 11, 2014 at 1:12 PM, Arnaud Joly <[email protected]>
>> wrote:
>> >>>>
>> >>>> Thanks for your contribution.
>> >>>>
>> >>>> Keep up!
>> >>>>
>> >>>> Arnaud
>> >>>>
>> >>>> On 10 Mar 2014, at 23:51, vamsi kaushik <[email protected]
>> >
>> >>>> wrote:
>> >>>>
>> >>>> My name is actually Varanasi Vamsi Kaushik(yeah its pretty long).
>> >>>>
>> >>>> And i have already contributed a little.
>> >>>> Please take a look at :
>> >>>>
>> >>>> https://github.com/scikit-learn/scikit-learn/pull/2905
>> >>>>
>> >>>> and also:
>> >>>> https://github.com/scikit-learn/scikit-learn/pull/2895
>> >>>> https://github.com/scikit-learn/scikit-learn/issues/2727
>> >>>>
>> >>>> cheers,
>> >>>> kaushik
>> >>>>
>> >>>>
>> >>>> On Tue, Mar 11, 2014 at 3:13 AM, vamsi kaushik
>> >>>> <[email protected]> wrote:
>> >>>>>
>> >>>>> hi Arnaud,
>> >>>>>
>> >>>>> Vamsi kaushik is actually me.
>> >>>>> Thanks for your reply, i'll get to the issue soon
>> >>>>>
>> >>>>> cheers,
>> >>>>> vamsi kaushik
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Mar 10, 2014 at 3:16 PM, Arnaud Joly <[email protected]>
>> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> Anything concerning the GSOC should pass by the scikit-learn
>> >>>>>> mailing list.
>> >>>>>>
>> >>>>>> Thanks for your interest in the subject. If you intend to apply
>> for a
>> >>>>>> GSOC, I suggest you to read
>> >>>>>>
>> >>>>>>
>> https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2014
>> >>>>>> and start contributing to scikit-learn.
>> >>>>>>
>> >>>>>> Right now, three people have shown interest for this topic:
>> Maheshakya
>> >>>>>> (who is now
>> >>>>>> applying for a GSOC about LSH),  Vamsi Kaushik  and you. Several
>> >>>>>> candidates may apply for the
>> >>>>>> same subject. However it is likely that if a GSOC is awarded for a
>> >>>>>> given subject, only the best proposal will
>> >>>>>> be selected.
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> Arnaud
>> >>>>>>
>> >>>>>>
>> >>>>>> On 28 Feb 2014, at 22:33, João <[email protected]> wrote:
>> >>>>>>
>> >>>>>> Hi Arnaud,
>> >>>>>>
>> >>>>>> While browsing the current GSoC projects I saw an interesting one
>> for
>> >>>>>> which you are assigned as mentor: "Add scipy.sparse matrix support
>> to the
>> >>>>>> Decision Tree".
>> >>>>>>
>> >>>>>> I am considering applying for this project as I have already faced
>> the
>> >>>>>> necessity of sparse matrices in sklearn and the outcome was not
>> totally
>> >>>>>> satisfactory.
>> >>>>>> Right now I am participating in a kaggle contest
>> >>>>>> (http://www.kaggle.com/c/lshtc/) and facing several difficulties
>> even with
>> >>>>>> algorithms that already support spase matrices (even simple
>> algorithms such
>> >>>>>> as NB). In general I get some memory error and sometimes segfaults.
>> >>>>>> I would be happy if I could implement the necessary support for DT
>> (and
>> >>>>>> related algorithms such as random forest and extra trees) and, if
>> the time
>> >>>>>> constraint allows, improve as much as possible the general support
>> for
>> >>>>>> sparse matrices.
>> >>>>>>
>> >>>>>> With this email, I want to show my interest and ask you if there
>> are
>> >>>>>> already any candidates for this place.
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> --
>> >>>>>> João
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> ------------------------------------------------------------------------------
>> >>>>>> Learn Graph Databases - Download FREE O'Reilly Book
>> >>>>>> "Graph Databases" is the definitive new guide to graph databases
>> and
>> >>>>>> their
>> >>>>>> applications. Written by three acclaimed leaders in the field,
>> >>>>>> this first edition is now available. Download your free book today!
>> >>>>>> http://p.sf.net/sfu/13534_NeoTech
>> >>>>>> _______________________________________________
>> >>>>>> Scikit-learn-general mailing list
>> >>>>>> [email protected]
>> >>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> ------------------------------------------------------------------------------
>> >>>> Learn Graph Databases - Download FREE O'Reilly Book
>> >>>> "Graph Databases" is the definitive new guide to graph databases and
>> >>>> their
>> >>>> applications. Written by three acclaimed leaders in the field,
>> >>>> this first edition is now available. Download your free book today!
>> >>>>
>> >>>>
>> http://p.sf.net/sfu/13534_NeoTech_______________________________________________
>> >>>> Scikit-learn-general mailing list
>> >>>> [email protected]
>> >>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> ------------------------------------------------------------------------------
>> >>>> Learn Graph Databases - Download FREE O'Reilly Book
>> >>>> "Graph Databases" is the definitive new guide to graph databases and
>> >>>> their
>> >>>> applications. Written by three acclaimed leaders in the field,
>> >>>> this first edition is now available. Download your free book today!
>> >>>> http://p.sf.net/sfu/13534_NeoTech
>> >>>> _______________________________________________
>> >>>> Scikit-learn-general mailing list
>> >>>> [email protected]
>> >>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> ------------------------------------------------------------------------------
>> >>> Learn Graph Databases - Download FREE O'Reilly Book
>> >>> "Graph Databases" is the definitive new guide to graph databases and
>> their
>> >>> applications. Written by three acclaimed leaders in the field,
>> >>> this first edition is now available. Download your free book today!
>> >>> http://p.sf.net/sfu/13534_NeoTech
>> >>> _______________________________________________
>> >>> Scikit-learn-general mailing list
>> >>> [email protected]
>> >>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>>
>> >>
>> >>
>> >>
>> ------------------------------------------------------------------------------
>> >> Learn Graph Databases - Download FREE O'Reilly Book
>> >> "Graph Databases" is the definitive new guide to graph databases and
>> their
>> >> applications. Written by three acclaimed leaders in the field,
>> >> this first edition is now available. Download your free book today!
>> >> http://p.sf.net/sfu/13534_NeoTech
>> >> _______________________________________________
>> >> Scikit-learn-general mailing list
>> >> [email protected]
>> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >>
>> >
>> >
>> ------------------------------------------------------------------------------
>> > Learn Graph Databases - Download FREE O'Reilly Book
>> > "Graph Databases" is the definitive new guide to graph databases and
>> their
>> > applications. Written by three acclaimed leaders in the field,
>> > this first edition is now available. Download your free book today!
>> > http://p.sf.net/sfu/13534_NeoTech
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Learn Graph Databases - Download FREE O'Reilly Book
>> "Graph Databases" is the definitive new guide to graph databases and their
>> applications. Written by three acclaimed leaders in the field,
>> this first edition is now available. Download your free book today!
>> http://p.sf.net/sfu/13534_NeoTech
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to