Thanks for sending this.
The grammar needs updating, but I'll let you worry about proofreading.
Comments below:
- State some *evidence* for your experience with python, i.e. projects you
worked on or links to big commits (this is covered late in the application,
move some information forward).
- Don't choose a project because "it was on the list" -- why is the problem
important and what can *you* do to help?
- Be more confident! "This is the most vital part of this project as I
think." Ensure the whole project is planned and evaluated.
- Can you talk a bit more about which hashing algorithms will be
implemented? This is a major part of the work, but isn't appropriately
covered.
- In your abstract, I would make certain your plan -- you want to implement
prototypes, evaluate them and then finish the implementations. Make this
absolutely clear (currently, the reader has to piece this together for
themselves)
- References need to be done more thoroughly.
I think it's a good project, now you need to sell it!
On 18 March 2014 06:55, Maheshakya Wijewardena <[email protected]>wrote:
> Hi,
>
> I have written a proposal for this project and it is in the wiki,
>
> https://github.com/scikit-learn/scikit-learn/wiki/GSOC-2014--Project-Proposal-:-Locality-sensitive-Hashing-for-approximate-neighbor-search
>
> It is not complete yet. I have not written the timeline yet. But I have
> described my plan for this project. I think the prototyping part of this
> project should be started during the community bonding period, while fixing
> other issues because that needs to be done as soon as possible. I will
> update this with my timeline soon.
>
> Can you review this and give me your feedback.
>
> Thank you,
> Maheshakya.
>
>
> On Mon, Mar 17, 2014 at 1:02 PM, Daniel Vainsencher <
> [email protected]> wrote:
>
>> On 03/16/2014 08:59 PM, Olivier Grisel wrote:
>> > 2014-03-14 19:40 GMT+01:00 Daniel Vainsencher <
>> [email protected]>:
>> >> Hi Olivier,
>> >>
>> >> Have you looked at the LSH-Forest (and to a lesser extent, Multiprobe)
>> >> paper? I've used it in practice, and it makes a significant improvement
>> >> vs. basic LSH. I've proposed in recent emails to the list how to
>> >> implement it based on sorting and binary search (in order to avoid the
>> >> dependency/maintenance burden of range-queries).
>> >
>> > No I did not read those papers (yet).
>> I don't know the culture here, but since you seemed to summarize rather
>> than wait for someone to read up on LSH:
>> - Where LSH uses hash-maps keyed on the a random hash of length exactly
>> k, LSHF uses an ordered map (binary tree or BTree in the paper). This
>> enables queries of form:
>> "all keys whose prefix of length h<=k matches this hash". This trivially
>> enables a per tree query of type "give me the best q hashes (and maybe a
>> few more)" by doing queries with h=k, h=k-1,... until satisfied (this
>> can be optimized in different ways).
>> - Then queries across multiple trees can be combined in different ways,
>> the simplest being "give me at least q/L candidates from each tree, take
>> the union". More complex variants don't seem to make much difference.
>> - So, the parameter k becomes essentially per-query, L can be determined
>> on its own, and we can determine the number of expensive "real distance"
>> comparisons we want to pay per query.
>>
>> I offer the simple observation that sorted arrays support very fast
>> range queries, so that complex/expensive trees are not necessary for an
>> LSHF implementation.
>>
>> As Maheshakya said it might very
>> > well be the method implemented in Annoy.
>> It seems similar but distinct, see a recent mail.
>>
>> > Do you have sample python
>> > code around?
>> No, I implemented it in C#, and source is not available.
>>
>> >> I think evaluating and tweaking vanilla LSH is a waste of time, when an
>> >> obvious improvement that fixes a significant real world flaw (control
>> of
>> >> number of candidates seen) in the method is available.
>> > If the obvious improvement does not come with a significant increment
>> > in code complexity (and maintenance burden) then I agree, yes.
>> So, was the above was sufficient to answer your prerequisite?
>>
>> >Getting rid of hyperparameters to select manually of via grid-search
>> > is a big usability improvement and can make the method much more
>> > practical to use.
>> In this case, performance becomes better than mere tuning would allow,
>> since it adapts to database density around the query point.
>>
>> Daniel
>>
>>
>> ------------------------------------------------------------------------------
>> Learn Graph Databases - Download FREE O'Reilly Book
>> "Graph Databases" is the definitive new guide to graph databases and their
>> applications. Written by three acclaimed leaders in the field,
>> this first edition is now available. Download your free book today!
>> http://p.sf.net/sfu/13534_NeoTech
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>
>
> --
> Undergraduate,
> Department of Computer Science and Engineering,
> Faculty of Engineering.
> University of Moratuwa,
> Sri Lanka
>
>
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their
> applications. Written by three acclaimed leaders in the field,
> this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/13534_NeoTech
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general