Yes, I think Robert Layton is a possible mentor for this project. I have
already discussed some facts about this with him in the issues in Github.
I was thinking of getting the opinion of other contributors as well as they
may have something to say about this or other suggestions for better ways
of implementing.



On Sun, Mar 9, 2014 at 12:12 PM, Ronnie Ghose <[email protected]>wrote:

> rather than approval of the community you should ask for a mentor as I
> assumed you read the GSOC idea list - a mentor is a heavy job relatively
>
>
> On Sun, Mar 9, 2014 at 1:36 AM, Maheshakya Wijewardena <
> [email protected]> wrote:
>
>> Hi Daniel,
>>
>> Thank you for pointing this out. I also have noticed that storing hash
>> tables of training data and dealing with binary vectors might be
>> problematic when using plain LSH based ANN.
>> However, before implementing LSH forests, I need to get the approval of
>> the community. I'm looking forward to have feedback on this.
>>
>> Thank you,
>> Maheshakya.
>>
>>
>> On Thu, Mar 6, 2014 at 2:34 PM, Daniel Vainsencher <
>> [email protected]> wrote:
>>
>>>  Hi Maheshakya,
>>>
>>> In regular LSH, a particular setting of the number of hash functions per
>>> index (k) and the number of indexes (L) essentially determines the size of
>>> the region in space from which candidates will be chosen in response to a
>>> query. If queries q1 and q2 are in areas where the density of data points
>>> is very different, then one will receive many more candidates than the
>>> other, thus be slower and more accurate. Thus tuning k, L involves a
>>> tradeoff, and with varied density, you cannot win.
>>>
>>> Even worse: as you add data, your k, L stop being optimal. So tuning on
>>> a  significantly smaller sample is not effective, which makes tuning even
>>> more expensive.
>>>
>>> Going back to LSH Forest, while an implementation using binary trees is
>>> possible and ok for testing, but not very space or cache efficient. Other
>>> data structures (b-trees and variants, see [2] for some discussion) are
>>> more appropriate. The key operations are range queries and depending on
>>> data change rate, insertion.
>>>
>>> Space efficiency of indexes is also affected by the hash type: if you
>>> use a binary hash functions, choose k in {32, 64} and use the appropriate
>>> integer types to avoid numpy's awkwardness around bit vectors. LSH-Forest
>>> has no difficulty with such high k in the index (as opposed to plain LSH).
>>>
>>> Daniel Vainsencher
>>>
>>>
>>> On 03/05/2014 07:18 AM, Maheshakya Wijewardena wrote:
>>>
>>> Hi Daniel,
>>> These alternative strategies of LSH are a good way to start using the
>>> existing implementations of Scikit-learn. But why do you say the standard
>>> algorithms are impractical?
>>>
>>>  Maheshakya
>>>
>>>
>>>  On Wed, Mar 5, 2014 at 1:31 AM, Daniel Vainsencher <
>>> [email protected]> wrote:
>>>
>>>>  Hi scikit-learn, Maheshakya,
>>>>
>>>> In my experience implementing and using LSH, the vanilla algorithms are
>>>> very impractical. Two extensions that help a lot (and combine nicely) are:
>>>> - Multi-probe LSH [1], where hash values neighboring the target's hash
>>>> are also used.
>>>> - LSH Forest [2], where fixed length hashes are replaced by length
>>>> bounded hashes; the hash used is just long enough to get sufficiently few
>>>> candidates for the target's nearest neighbor.
>>>>
>>>> Together they make it feasible to limit the number of independent
>>>> copies of the index to a reasonable number. Incidentally, the LSH Forest
>>>> approach may be implemented using binary trees.
>>>>
>>>> Daniel Vainsencher
>>>>
>>>> [1] www.cs.princeton.edu/cass/papers/mp*lsh*_vldb07.pdf
>>>> [2] www2005.org/docs/p651.pdf
>>>>
>>>>
>>>> On 03/04/2014 06:09 PM, Maheshakya Wijewardena wrote:
>>>>
>>>>  Hi,
>>>>
>>>> Currently, unsupervised neighbor search is implemented in
>>>> sklearn.neighbors. There are three algorithms used to perform neighbor
>>>> search.
>>>>
>>>>    - kd tree
>>>>    - ball tree
>>>>    - brute force method
>>>>
>>>> kd tree and ball tree are implemented separately as cython
>>>> implementations. Both of them use binary tree(binary_tree.pxi). In the
>>>> NeighborBass class, above implementations are used.
>>>>
>>>>  As Locality Sensitivity Hashing will also be used in approximating
>>>> nearest neighbors, its' implementation should also be a part of
>>>> sklearn.neighbors. But Locality sensitivity hashing algorithms do not
>>>> involve binary tree implementation. Therefore it has to be implemented
>>>> separately and be used in sklearn.neighbors as another algorithm.Is this
>>>> approach correct?
>>>>
>>>>  I'm trying to implement a base LSH class and the hashing algorithms
>>>> as cython implementations. To approximate neighbor search using those
>>>> algorithms, multiple hash tables will be created with those hash functions
>>>> so, for that an efficient way to store is required.
>>>>
>>>>  These are some of the issues I found disturbing while I was trying to
>>>> implement LSH to approximate neighbor search. Is this an appropriate
>>>> approach to implement this or are there any other better ways?
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Maheshakya,
>>>> --
>>>> Undergraduate,
>>>> Department of Computer Science and Engineering,
>>>> Faculty of Engineering.
>>>> University of Moratuwa,
>>>> Sri Lanka
>>>>
>>>>
>>>>  
>>>> ------------------------------------------------------------------------------
>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to 
>>>> Perforce.
>>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>>> Faster operations. Version large binaries.  Built-in WAN optimization and 
>>>> the
>>>> freedom to use Git, Perforce or both. Make the move to 
>>>> Perforce.http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Scikit-learn-general mailing 
>>>> [email protected]https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>>>> Perforce.
>>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>>> Faster operations. Version large binaries.  Built-in WAN optimization
>>>> and the
>>>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>>>
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>>  --
>>> Undergraduate,
>>> Department of Computer Science and Engineering,
>>> Faculty of Engineering.
>>> University of Moratuwa,
>>> Sri Lanka
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Subversion Kills Productivity. Get off Subversion & Make the Move to 
>>> Perforce.
>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>> Faster operations. Version large binaries.  Built-in WAN optimization and 
>>> the
>>> freedom to use Git, Perforce or both. Make the move to 
>>> Perforce.http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>
>>>
>>>
>>> _______________________________________________
>>> Scikit-learn-general mailing 
>>> [email protected]https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>>> Perforce.
>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>> Faster operations. Version large binaries.  Built-in WAN optimization
>>> and the
>>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>> --
>> Undergraduate,
>> Department of Computer Science and Engineering,
>> Faculty of Engineering.
>> University of Moratuwa,
>> Sri Lanka
>>
>>
>> ------------------------------------------------------------------------------
>> Subversion Kills Productivity. Get off Subversion & Make the Move to
>> Perforce.
>> With Perforce, you get hassle-free workflows. Merge that actually works.
>> Faster operations. Version large binaries.  Built-in WAN optimization and
>> the
>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
> ------------------------------------------------------------------------------
> Subversion Kills Productivity. Get off Subversion & Make the Move to
> Perforce.
> With Perforce, you get hassle-free workflows. Merge that actually works.
> Faster operations. Version large binaries.  Built-in WAN optimization and
> the
> freedom to use Git, Perforce or both. Make the move to Perforce.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>


-- 
Undergraduate,
Department of Computer Science and Engineering,
Faculty of Engineering.
University of Moratuwa,
Sri Lanka
------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works. 
Faster operations. Version large binaries.  Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to