I am already working on that. Thank you Robert and Daniel.

On Thu, Mar 13, 2014 at 2:52 PM, Robert Layton <[email protected]> wrote:

> Great. Make sure you sign up (Gael has posted the link elsewhere on this
> mailing list).
>
> Maheshakya -- can you start to put together a proposal? For now, I would
> include LSH forests.
>
>
> On 13 March 2014 19:12, Daniel Vainsencher
> <[email protected]> wrote:
>
>> Hi Robert,
>>
>> I would like to help with the subject matter aspects; integration (in
>> more senses than one) requires more familiarity with the scikit-learn
>> project. If you want to co-mentor it together, that would be cool.
>>
>> Other data structures worth evaluating (without depending on them) are
>> B-trees (complicated, but they do everything reasonably efficiently and
>> are very well known) and COLA [1]: efficient at random insertion and range
>> queries, still based on sorted arrays and binary search.
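A minimal sketch of the "sorted arrays and binary search" idea in plain numpy, assuming hashes are stored as fixed-width integers and that a prefix match means sharing the top bits. The function name `prefix_range` and the toy 4-bit hashes are hypothetical, chosen only for illustration; this is not the COLA structure from [1].

```python
import numpy as np

def prefix_range(sorted_hashes, query_hash, prefix_bits, total_bits):
    """Return [lo, hi) of entries sharing the top `prefix_bits` bits with the query."""
    shift = total_bits - prefix_bits
    # Smallest and largest hash values that share the query's prefix.
    low = (query_hash >> shift) << shift
    high = low | ((1 << shift) - 1)
    # Two binary searches on the sorted array bound the matching run.
    lo = np.searchsorted(sorted_hashes, low, side="left")
    hi = np.searchsorted(sorted_hashes, high, side="right")
    return lo, hi

# Toy 4-bit hashes (hypothetical data); keep the sort order to map back to rows.
hashes = np.array([0b1010, 0b0011, 0b1000, 0b1011, 0b0111], dtype=np.int64)
order = np.argsort(hashes)
sorted_hashes = hashes[order]

lo, hi = prefix_range(sorted_hashes, 0b1001, prefix_bits=2, total_bits=4)
candidates = order[lo:hi]  # original row indices whose hash starts with "10"
```

Shortening `prefix_bits` widens the range, which is how a query could relax its match until enough candidates are found.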
>>
>> Maheshakya:
>> > In this case, I think the entire structure can be implemented in
>> > numpy arrays, as they support every operation required for this
>> > LSH forest implementation. But I'm not familiar with this
>> > cost/benefit analysis. I need help from those who have more experience.
>> > Robert, can you give me some assistance here?
>>
>> I understand "cost/benefit analysis" and "risk analysis" as meaning:
>> - Is the likely product useful? How hard is it to maintain?
>> - Are there parts of the proposal that are unclear or complicated, thus
>> risking delay?
>>
>> We've discussed some of those. You should address them all in your
>> proposal (see [2]), which you should probably write as soon as possible;
>> the deadline is pretty close.
>>
>> How should an LSH project be evaluated? One proxy measurement is the
>> improvement in kNN speed vs. accuracy.
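One way the speed vs. accuracy proxy could be measured is recall against brute-force kNN, alongside wall-clock time. The sketch below is only an assumption about how such an evaluation might look: `evaluate`, `recall_at_k` and the `subsample_knn` stand-in (which searches a random subset instead of a real LSH index) are hypothetical names, and the data is random.

```python
import time
import numpy as np

def exact_knn(X, q, k):
    """Brute-force k nearest neighbours of q among the rows of X."""
    dists = np.linalg.norm(X - q, axis=1)
    return np.argsort(dists)[:k]

def recall_at_k(approx_idx, exact_idx):
    """Fraction of the true neighbours that the approximate search found."""
    return len(np.intersect1d(approx_idx, exact_idx)) / len(exact_idx)

def evaluate(X, queries, k, approx_knn):
    """Return (mean recall, exact seconds, approximate seconds)."""
    recalls, t_exact, t_approx = [], 0.0, 0.0
    for q in queries:
        t0 = time.perf_counter()
        exact = exact_knn(X, q, k)
        t1 = time.perf_counter()
        approx = approx_knn(q, k)
        t2 = time.perf_counter()
        recalls.append(recall_at_k(approx, exact))
        t_exact += t1 - t0
        t_approx += t2 - t1
    return float(np.mean(recalls)), t_exact, t_approx

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
queries = rng.normal(size=(20, 16))
subset = rng.choice(len(X), size=200, replace=False)

def subsample_knn(q, k):
    # Stand-in for an approximate index: search only a random subset of X.
    dists = np.linalg.norm(X[subset] - q, axis=1)
    return subset[np.argsort(dists)[:k]]

mean_recall, t_exact, t_approx = evaluate(X, queries, 5, subsample_knn)
```

Plotting mean recall against the two timings for several index settings would give the speed vs. accuracy curve.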
>>
>> Daniel
>>
>> [1] supertech.csail.mit.edu/papers/sbtree.pdf
>> [2] https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2014
>>
>> On 03/13/2014 01:00 AM, Robert Layton wrote:
>> > I apologise if I haven't been as active as I would have liked -- I had
>> > my wisdom teeth removed on Tuesday (ouch!).
>> >
>> > Daniel -- are you interested in becoming a mentor for this project?
>> >
>> > On the LSH forest, I don't have experience with it, but I think Daniel's
>> > point probably allows it to be included. It is an improvement on a
>> > standard algorithm.
>> > (That said, I've always had the opinion of erring on the side of
>> > inclusion, but I understand the push-back on this).
>> >
>> > On the data structure, it is my strong preference to make it work in
>> > "native" numpy, if that is possible (i.e. without doing ugly hacks).
>> > This is due mainly to long-term maintainability -- i.e. if we need to
>> > fix a bug and you aren't around, will someone with numpy/scipy
>> > experience be able to work out what is going on?
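To illustrate what "native" numpy could look like for the hashing step, here is a small sketch. It assumes sign-of-random-projection hashing, which is only one possible hash family and is not specified anywhere in this thread; `hash_points` and the array shapes are hypothetical.

```python
import numpy as np

def hash_points(X, hyperplanes):
    """Pack the sign pattern of X against each hyperplane into one integer per row."""
    bits = (X @ hyperplanes.T) > 0  # shape (n_samples, n_bits), boolean
    n_bits = hyperplanes.shape[0]
    # Most significant bit first, so integer order matches bit-string prefix order.
    weights = 1 << np.arange(n_bits - 1, -1, -1, dtype=np.int64)
    return bits.astype(np.int64) @ weights

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
hyperplanes = rng.normal(size=(8, 4))  # 8 random hyperplanes give 8-bit hashes
hashes = hash_points(X, hyperplanes)
sorted_hashes = np.sort(hashes)        # one sorted array per tree in a forest
```

Everything here is a plain array operation, so a reader with numpy experience should be able to follow and debug it.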
>> >
>> > /That said/, it would be good to allocate some extra time for the
>> > analysis of other data structures in the GSoC project scope. Something
>> > like "Weeks 8-12: Implement other data structure and test in comparison
>> > to existing method". I'm not sure if GSoC accepts projects with that
>> > sort of risk; I'll find out.
>> >
>> > Thanks,
>> >
>> > Robert
>> >
>> > On 13 March 2014 07:00, Maheshakya Wijewardena <[email protected]>
>> > wrote:
>> >
>> >     In this case, I think the entire structure can be implemented in
>> >     numpy arrays, as they support every operation required for this
>> >     LSH forest implementation. But I'm not familiar with this
>> >     cost/benefit analysis. I need help from those who have more
>> >     experience.
>> >     Robert, can you give me some assistance here?
>> >
>> >
>> >     On Wed, Mar 12, 2014 at 10:33 PM, Gael Varoquaux
>> >     <[email protected]> wrote:
>> >
>> >          > Gaël, What is the status of the evaluation of the cost to
>> >          > benefit ratio for this?
>> >
>> >         I don't know. I just cannot be an expert of everything
>> >         machine-learning related. Somebody needs to do a cost-benefit
>> >         analysis. Ideally different people should comment on it. And
>> >         yes, it's challenging.
>> >
>> >          > I would appreciate it if we can decide on storing structure
>> >          > for the LSH implementation quickly as I have to make a sound
>> >          > plan for this project.
>> >
>> >         Well, anything that uses structure more complex than numpy
>> >         arrays is much harder to develop and maintain. You need to
>> >         factor this in the cost/benefit analysis. Once again, I stress:
>> >         much harder!
>> >
>> >         G
>> >
>> >
>> >         ------------------------------------------------------------------------------
>> >         Learn Graph Databases - Download FREE O'Reilly Book
>> >         "Graph Databases" is the definitive new guide to graph databases
>> >         and their applications. Written by three acclaimed leaders in the
>> >         field, this first edition is now available. Download your free
>> >         book today!
>> >         http://p.sf.net/sfu/13534_NeoTech
>> >         _______________________________________________
>> >         Scikit-learn-general mailing list
>> >         [email protected]
>> >         https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> >
>> >
>> >
>> >
>> >     --
>> >     Undergraduate,
>> >     Department of Computer Science and Engineering,
>> >     Faculty of Engineering.
>> >     University of Moratuwa,
>> >     Sri Lanka
>> >
>> >


-- 
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka