LSH Forest is somewhat similar to Annoy (multiple trees, and essentially 
the only thing to tune is how many of them), but there is at least one 
significant difference.

IIUC, Annoy has large leaves, with many points in each. LSH Forest is 
best used with a very high k (say, 64), so that each point (other than 
exact duplicates) is very likely to have its own unique hash key.

Thus, Annoy probably has a lower bound on the number of candidates it 
must consider (points whose raw distance to the query gets computed), 
whereas LSH Forest can tune that down to around 2L (two per tree, for L 
trees). Each Annoy tree can be cheaper than a tree in a binary-tree-based 
LSHF implementation (because it has fewer nodes); this is probably part 
of the reason they use so many trees (50 in their example; I've used 10, 
and even as few as 3, with LSH Forest).
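
To make the "around 2L candidates" point concrete, here is a minimal 
pure-Python sketch. It is not the prototype discussed in this thread, and 
it rests on assumptions of mine: random-hyperplane sign bits as the hash 
family, and a sorted list of keys with binary search standing in for the 
prefix tree. The names TinyLSHForest, hash_key, sq_dist and n_trees are 
made up for illustration.

```python
import bisect
import random


def hash_key(planes, x):
    """Return a k-bit key for x: one sign bit per random hyperplane."""
    bits = []
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, x))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


class TinyLSHForest:
    """Illustrative only: each 'tree' is a sorted list of (key, index)."""

    def __init__(self, n_trees=3, k=64, seed=0):
        self.n_trees = n_trees  # L in the text above
        self.k = k              # key length; high k makes keys nearly unique
        self.rng = random.Random(seed)

    def fit(self, points):
        self.points = points
        dim = len(points[0])
        self.planes = []
        self.trees = []
        for _ in range(self.n_trees):
            planes = [[self.rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(self.k)]
            entries = sorted((hash_key(planes, p), i)
                             for i, p in enumerate(points))
            self.planes.append(planes)
            self.trees.append(entries)
        return self

    def candidates(self, q):
        """Collect at most two candidates per tree (so at most 2L overall)."""
        found = set()
        for planes, entries in zip(self.planes, self.trees):
            key = hash_key(planes, q)
            # In sorted order, the key sharing the longest prefix with the
            # query key sits directly before or at the insertion position.
            pos = bisect.bisect_left(entries, (key, -1))
            for j in (pos - 1, pos):
                if 0 <= j < len(entries):
                    found.add(entries[j][1])
        return found

    def query(self, q):
        """Compare raw distances only for the collected candidates."""
        return min(self.candidates(q),
                   key=lambda i: sq_dist(self.points[i], q))


data_rng = random.Random(42)
points = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
forest = TinyLSHForest(n_trees=3, k=64).fit(points)
candidate_ids = forest.candidates(points[0])
nearest = forest.query(points[0])
```

With 3 trees the query above computes at most 6 raw distances, however 
many points are indexed. Taking more neighbours per tree (or adding 
trees) would trade speed for accuracy.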

Anyway, I would guess(!!) that we can match or beat Annoy's performance 
without writing any non-Python code.

Daniel

On 03/17/2014 06:32 AM, Maheshakya Wijewardena wrote:
> OK, I failed to notice that in the Pareto-efficiency sense. But it still
> makes sense to try the approach used in Annoy when the maintainability
> of the code is considered.
> Apart from that, as I mentioned earlier, the only LSH-based ANN method
> in FLANN is usable only for matching binary features using Hamming
> distances. Using that won't be of much use for scikit-learn.
>
> So in my opinion, it is better to give high priority to LSH Forest based
> ANN when prototyping.
>
>
> On Mon, Mar 17, 2014 at 12:55 AM, Olivier Grisel
> <[email protected] <mailto:[email protected]>> wrote:
>
>     2014-03-16 20:19 GMT+01:00 Maheshakya Wijewardena
>     <[email protected] <mailto:[email protected]>>:
>      > Yes, I was considering the accuracy as well when speaking of the
>      > performance. (Actually put more weight to that)
>
>     Even so, on most plots the FLANN points dominate the Annoy points
>     (in a Pareto-optimal sense).
>
>     But the speed difference is not large enough to justify the added
>     complexity of FLANN IMHO.
>
>     --
>     Olivier
>
>     
> ------------------------------------------------------------------------------
>     Learn Graph Databases - Download FREE O'Reilly Book
>     "Graph Databases" is the definitive new guide to graph databases and
>     their
>     applications. Written by three acclaimed leaders in the field,
>     this first edition is now available. Download your free book today!
>     http://p.sf.net/sfu/13534_NeoTech
>     _______________________________________________
>     Scikit-learn-general mailing list
>     [email protected]
>     <mailto:[email protected]>
>     https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
>
> --
> Undergraduate,
> Department of Computer Science and Engineering,
> Faculty of Engineering.
> University of Moratuwa,
> Sri Lanka
>

