Re: [Scikit-learn-general] Implementing Locality Sensitivity Hashing to approximate neighbor search

Maheshakya Wijewardena Fri, 14 Mar 2014 16:07:32 -0700

Isn't the algorithm used in Annoy is somewhat similar to LSH-forest. In
Annoy, a binary tree is built using the result of a random projection on a
training data points at each node. (Random projection itself is method to
do LSH) The main difference I see is the data driven nature of Random
projection in Annoy.(Only when metric for AnnoyIndex='euclidean')


Maheshakya


On Sat, Mar 15, 2014 at 12:10 AM, Daniel Vainsencher <
[email protected]> wrote:

> Hi Olivier,
>
> Have you looked at the LSH-Forest (and to a lesser extent, Multiprobe)
> paper? I've used it in practice, and it makes a significant improvement
> vs. basic LSH. I've proposed in recent emails to the list how to
> implement it based on sorting and binary search (in order to avoid the
> dependency/maintenance burden of range-queries).
>
> I think evaluating and tweaking vanilla LSH is a waste of time, when an
> obvious improvement that fixes a significant real world flaw (control of
> number of candidates seen) in the method is available.
>
> Daniel
>
> On 03/14/2014 11:38 AM, Olivier Grisel wrote:
> > 2014-03-13 23:15 GMT+01:00 Robert Layton <[email protected]>:
> >> Thanks Gael. My thinking was to implement "Basic LSH with basic data
> >> structures" and then spend some of the time working on seeing if
> moderate
> >> improvements (i.e. a more complex data structure) can deliver benefits.
> This
> >> way, we get the key deliverable, and spend some time trying to see if
> we can
> >> do better.
> >>
> >> I'd also like to see scalability added to the evaluation criteria!
> >
> > The problem is that by having had a look at the ANN literature in the
> > past, my gut feeling is that basic LSH is mostly worthless for
> > practical applications. I would rather not have a method is
> > scikit-learn that is not performing good enough (in terms of speed up
> > vs exact brute force queries) to be useful on real data.
> >
> > To decide I think the best way to proceed would be to have a
> > evaluation / prototyping stage in the project that would first hack an
> > implementation in of the basic methods (at least random projections
> > and maybe others) in a gist outside of the scikit-learn codebase, not
> > to worry about documentation and compliance with the API and benchmark
> > it / profile it on representative datasets with various statistical
> > structures and compare the outcome to alternative methods such as the
> > methods implemented in FLANN and Annoy.
> >
> > Actually there exists several Python implementation of vanilla LSH:
> >
> > https://pypi.python.org/pypi/lshash
> > http://nearpy.io/
> > https://github.com/yahoo/Optimal-LSH/blob/master/lsh.py
> >
> > It would be interesting to benchmark them all on the same datasets
> > (with various dimensionality, ranks and sparsity patterns).
> >
> > Note, that Radim already had a look at the yahoo LSH implementation
> > and found it impractical to use:
> >
> >
> http://radimrehurek.com/2013/12/performance-shootout-of-nearest-neighbours-contestants/
> >
>
>
>
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their
> applications. Written by three acclaimed leaders in the field,
> this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/13534_NeoTech
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka

------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Re: [Scikit-learn-general] Implementing Locality Sensitivity Hashing to approximate neighbor search

Reply via email to