Thank you Robert.
With this method, it is possible to use binary tree implementation in*
sklearn.neighbors* module to store transformed data points. As majority of
LSH algorithms produce binary values after hashing, concatenated k number
of hash values can be stored in binary trees. Height of the binary tree
will be k and there will be L number of binary trees.(as L number of Hash
tables in plain LSH based ANN algorithm)

While searching for a first mentor this project, I will keep contributing
as I can.

Best Regards,
Maheshakya.


On Sun, Mar 9, 2014 at 3:16 PM, Robert Layton <[email protected]>wrote:

> I think I'd be better as a second mentor -- this would be my first GSoC of
> any type.
> That said, I am here to help!
>
> *That said*, you'll need to make the case to the community -- the Forest
> LSH is a variation of a key algorithm. LSH by itself is definitely worthy
> of inclusion in scikit-learn, and it looks like Forest LSH would be
> sufficiently well-used (Google tells me 138 citations), but I think we need
> the community's decision.
>
>
> On 9 March 2014 17:51, Maheshakya Wijewardena <[email protected]>wrote:
>
>> Yes, I think Robert Layton is a possible mentor for this project. I have
>> already discussed some facts about this with him in the issues in Github.
>> I was thinking of getting the opinion of other contributors as well as
>> they may have something to say about this or other suggestions for better
>> ways of implementing.
>>
>>
>>
>> On Sun, Mar 9, 2014 at 12:12 PM, Ronnie Ghose <[email protected]>wrote:
>>
>>> rather than approval of the community you should ask for a mentor as I
>>> assumed you read the GSOC idea list - a mentor is a heavy job relatively
>>>
>>>
>>> On Sun, Mar 9, 2014 at 1:36 AM, Maheshakya Wijewardena <
>>> [email protected]> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> Thank you for pointing this out. I also have noticed that storing hash
>>>> tables of training data and dealing with binary vectors might be
>>>> problematic when using plain LSH based ANN.
>>>> However, before implementing LSH forests, I need to get the approval of
>>>> the community. I'm looking forward to have feedback on this.
>>>>
>>>> Thank you,
>>>> Maheshakya.
>>>>
>>>>
>>>> On Thu, Mar 6, 2014 at 2:34 PM, Daniel Vainsencher <
>>>> [email protected]> wrote:
>>>>
>>>>>  Hi Maheshakya,
>>>>>
>>>>> In regular LSH, a particular setting of the number of hash functions
>>>>> per index (k) and the number of indexes (L) essentially determines the 
>>>>> size
>>>>> of the region in space from which candidates will be chosen in response to
>>>>> a query. If queries q1 and q2 are in areas where the density of data 
>>>>> points
>>>>> is very different, then one will receive many more candidates than the
>>>>> other, thus be slower and more accurate. Thus tuning k, L involves a
>>>>> tradeoff, and with varied density, you cannot win.
>>>>>
>>>>> Even worse: as you add data, your k, L stop being optimal. So tuning
>>>>> on a  significantly smaller sample is not effective, which makes tuning
>>>>> even more expensive.
>>>>>
>>>>> Going back to LSH Forest, while an implementation using binary trees
>>>>> is possible and ok for testing, but not very space or cache efficient.
>>>>> Other data structures (b-trees and variants, see [2] for some discussion)
>>>>> are more appropriate. The key operations are range queries and depending 
>>>>> on
>>>>> data change rate, insertion.
>>>>>
>>>>> Space efficiency of indexes is also affected by the hash type: if you
>>>>> use a binary hash functions, choose k in {32, 64} and use the appropriate
>>>>> integer types to avoid numpy's awkwardness around bit vectors. LSH-Forest
>>>>> has no difficulty with such high k in the index (as opposed to plain LSH).
>>>>>
>>>>> Daniel Vainsencher
>>>>>
>>>>>
>>>>> On 03/05/2014 07:18 AM, Maheshakya Wijewardena wrote:
>>>>>
>>>>> Hi Daniel,
>>>>> These alternative strategies of LSH are a good way to start using the
>>>>> existing implementations of Scikit-learn. But why do you say the standard
>>>>> algorithms are impractical?
>>>>>
>>>>>  Maheshakya
>>>>>
>>>>>
>>>>>  On Wed, Mar 5, 2014 at 1:31 AM, Daniel Vainsencher <
>>>>> [email protected]> wrote:
>>>>>
>>>>>>  Hi scikit-learn, Maheshakya,
>>>>>>
>>>>>> In my experience implementing and using LSH, the vanilla algorithms
>>>>>> are very impractical. Two extensions that help a lot (and combine nicely)
>>>>>> are:
>>>>>> - Multi-probe LSH [1], where hash values neighboring the target's
>>>>>> hash are also used.
>>>>>> - LSH Forest [2], where fixed length hashes are replaced by length
>>>>>> bounded hashes; the hash used is just long enough to get sufficiently few
>>>>>> candidates for the target's nearest neighbor.
>>>>>>
>>>>>> Together they make it feasible to limit the number of independent
>>>>>> copies of the index to a reasonable number. Incidentally, the LSH Forest
>>>>>> approach may be implemented using binary trees.
>>>>>>
>>>>>> Daniel Vainsencher
>>>>>>
>>>>>> [1] www.cs.princeton.edu/cass/papers/mp*lsh*_vldb07.pdf
>>>>>> [2] www2005.org/docs/p651.pdf
>>>>>>
>>>>>>
>>>>>> On 03/04/2014 06:09 PM, Maheshakya Wijewardena wrote:
>>>>>>
>>>>>>  Hi,
>>>>>>
>>>>>> Currently, unsupervised neighbor search is implemented in
>>>>>> sklearn.neighbors. There are three algorithms used to perform neighbor
>>>>>> search.
>>>>>>
>>>>>>    - kd tree
>>>>>>    - ball tree
>>>>>>    - brute force method
>>>>>>
>>>>>> kd tree and ball tree are implemented separately as cython
>>>>>> implementations. Both of them use binary tree(binary_tree.pxi). In the
>>>>>> NeighborBass class, above implementations are used.
>>>>>>
>>>>>>  As Locality Sensitivity Hashing will also be used in approximating
>>>>>> nearest neighbors, its' implementation should also be a part of
>>>>>> sklearn.neighbors. But Locality sensitivity hashing algorithms do not
>>>>>> involve binary tree implementation. Therefore it has to be implemented
>>>>>> separately and be used in sklearn.neighbors as another algorithm.Is this
>>>>>> approach correct?
>>>>>>
>>>>>>  I'm trying to implement a base LSH class and the hashing algorithms
>>>>>> as cython implementations. To approximate neighbor search using those
>>>>>> algorithms, multiple hash tables will be created with those hash 
>>>>>> functions
>>>>>> so, for that an efficient way to store is required.
>>>>>>
>>>>>>  These are some of the issues I found disturbing while I was trying
>>>>>> to implement LSH to approximate neighbor search. Is this an appropriate
>>>>>> approach to implement this or are there any other better ways?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Maheshakya,
>>>>>> --
>>>>>> Undergraduate,
>>>>>> Department of Computer Science and Engineering,
>>>>>> Faculty of Engineering.
>>>>>> University of Moratuwa,
>>>>>> Sri Lanka
>>>>>>
>>>>>>
>>>>>>  
>>>>>> ------------------------------------------------------------------------------
>>>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to 
>>>>>> Perforce.
>>>>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>>>>> Faster operations. Version large binaries.  Built-in WAN optimization 
>>>>>> and the
>>>>>> freedom to use Git, Perforce or both. Make the move to 
>>>>>> Perforce.http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Scikit-learn-general mailing 
>>>>>> [email protected]https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>>>>>> Perforce.
>>>>>> With Perforce, you get hassle-free workflows. Merge that actually
>>>>>> works.
>>>>>> Faster operations. Version large binaries.  Built-in WAN optimization
>>>>>> and the
>>>>>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>>>>>
>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>>>> _______________________________________________
>>>>>> Scikit-learn-general mailing list
>>>>>> [email protected]
>>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Undergraduate,
>>>>> Department of Computer Science and Engineering,
>>>>> Faculty of Engineering.
>>>>> University of Moratuwa,
>>>>> Sri Lanka
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to 
>>>>> Perforce.
>>>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>>>> Faster operations. Version large binaries.  Built-in WAN optimization and 
>>>>> the
>>>>> freedom to use Git, Perforce or both. Make the move to 
>>>>> Perforce.http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing 
>>>>> [email protected]https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>>>>> Perforce.
>>>>> With Perforce, you get hassle-free workflows. Merge that actually
>>>>> works.
>>>>> Faster operations. Version large binaries.  Built-in WAN optimization
>>>>> and the
>>>>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>>>>
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Undergraduate,
>>>> Department of Computer Science and Engineering,
>>>> Faculty of Engineering.
>>>> University of Moratuwa,
>>>> Sri Lanka
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>>>> Perforce.
>>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>>> Faster operations. Version large binaries.  Built-in WAN optimization
>>>> and the
>>>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>>>
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>>> Perforce.
>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>> Faster operations. Version large binaries.  Built-in WAN optimization
>>> and the
>>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>> --
>> Undergraduate,
>> Department of Computer Science and Engineering,
>> Faculty of Engineering.
>> University of Moratuwa,
>> Sri Lanka
>>
>>
>> ------------------------------------------------------------------------------
>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>> Perforce.
>> With Perforce, you get hassle-free workflows. Merge that actually works.
>> Faster operations. Version large binaries.  Built-in WAN optimization and
>> the
>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
> ------------------------------------------------------------------------------
> Subversion Kills Productivity. Get off Subversion & Make the Move to
> Perforce.
> With Perforce, you get hassle-free workflows. Merge that actually works.
> Faster operations. Version large binaries.  Built-in WAN optimization and
> the
> freedom to use Git, Perforce or both. Make the move to Perforce.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>


-- 
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka
------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works. 
Faster operations. Version large binaries.  Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to