On Fri, Mar 14, 2014 at 01:52:35PM +0100, Olivier Grisel wrote:
> > Indeed, but they use random projections rather than LSH.

> It is my understanding that Annoy implements ANN hashing by using
> data-driven forests of Random Projections rather than (data
> independent) uniformly distributed RPs to bucket the samples.

> Both are hashing schemes, but Annoy focuses on high-density
> regions of the dataset, hence is probably more efficient.

> >> Another use case would be to implement k-Nearest Neighbors
> >> classification on datasets with a dimension high enough to render
> >> exact methods such as KD-tree and ball-tree inefficient (I would say
> >> 500+ features).

> > We do this quite often in the lab, and we simply use a randomized PCA on
> > the train set. It works very well.

> > For simple KNN on numerical features, what is the evidence that LSH works
> > better than random projections? Forgive me for asking this question, I
> > may be unaware of the literature.

> Hashing the samples into buckets, as vanilla LSH and the Annoy variant
> do:

> - is more memory efficient: you don't store the results of the
> projections in memory, just the integer index of the buckets,
> - has a faster query time, as you don't need to compute pairwise
> distances with the whole (projected or not) dataset: only with the
> samples in the same bucket.

OK. This is interesting. I think that the technical part of the proposal
should focus on these implementation considerations, as they are
important to make the proposal viable.
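
To make the two points above concrete, here is a rough sketch of
bucketing with data-independent sign random projections. It is only an
illustration under my own assumptions, not Annoy's data-driven scheme
and not a proposed scikit-learn API: the names build_buckets and query,
and the parameters n_bits and seed, are made up for the example.

```python
import numpy as np
from collections import defaultdict


def build_buckets(X, n_bits=8, seed=0):
    """Hash each row of X to an integer bucket index (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    # One random hyperplane per bit; the sign of each projection gives one bit.
    planes = rng.randn(X.shape[1], n_bits)
    bits = (X @ planes > 0).astype(np.int64)
    # Pack the bits into one integer per sample: only this index is kept,
    # the projected coordinates themselves are discarded.
    keys = bits @ (1 << np.arange(n_bits, dtype=np.int64))
    buckets = defaultdict(list)
    for i, key in enumerate(keys):
        buckets[int(key)].append(i)
    return planes, buckets


def query(x, X, planes, buckets, k=5):
    """Return up to k neighbor indices of x, searching only x's own bucket."""
    n_bits = planes.shape[1]
    bits = (x @ planes > 0).astype(np.int64)
    key = int(bits @ (1 << np.arange(n_bits, dtype=np.int64)))
    candidates = np.asarray(buckets.get(key, []), dtype=int)
    if candidates.size == 0:
        return candidates
    # Exact distances are computed against the bucket members only,
    # not against the whole dataset.
    dists = np.linalg.norm(X[candidates] - x, axis=1)
    return candidates[np.argsort(dists)[:k]]
```

Note that this sketch still needs the original X to rank the candidates
inside a bucket, and that a single hash table will miss neighbors that
fall just across a hyperplane; practical schemes use several tables or
trees to recover them.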

Gaël

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
