Hi scikit-learn, Maheshakya,
In my experience implementing and using LSH, the vanilla
algorithms are very impractical. Two extensions that help a lot
(and combine nicely) are:
- Multi-probe LSH [1], where hash values neighboring the target's
hash are also used.
- LSH Forest [2], where fixed length hashes are replaced by length
bounded hashes; the hash used is just long enough to get
sufficiently few candidates for the target's nearest neighbor.
Together they make it feasible to limit the number of independent
copies of the index to a reasonable number. Incidentally, the LSH
Forest approach may be implemented using binary trees.
Daniel Vainsencher
[1] www.cs.princeton.edu/cass/papers/mplsh_vldb07.pdf
[2] www2005.org/docs/p651.pdf
On 03/04/2014 06:09 PM, Maheshakya Wijewardena wrote:
Hi,
Currently, unsupervised neighbor search is implemented in
sklearn.neighbors. There are three algorithms used to perform
neighbor search.
- kd tree
- ball tree
- brute force method
kd tree and ball tree are implemented separately as cython
implementations. Both of them use binary
tree(binary_tree.pxi). In the NeighborBass class, above
implementations are used.
As Locality Sensitivity Hashing will also be used in
approximating nearest neighbors, its' implementation should
also be a part of sklearn.neighbors. But Locality sensitivity
hashing algorithms do not involve binary tree implementation.
Therefore it has to be implemented separately and be used in
sklearn.neighbors as another algorithm.Is this approach
correct?
I'm trying to implement a base LSH class and the hashing
algorithms as cython implementations. To approximate neighbor
search using those algorithms, multiple hash tables will be
created with those hash functions so, for that an efficient
way to store is required.
These are some of the issues I found disturbing while I was
trying to implement LSH to approximate neighbor search. Is
this an appropriate approach to implement this or are there
any other better ways?
Regards,
Maheshakya,
--
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka
------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works.
Faster operations. Version large binaries. Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
|
------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works.
Faster operations. Version large binaries. Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general