[ 
https://issues.apache.org/jira/browse/DATAFU-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985142#comment-13985142
 ] 

Matthew Hayes commented on DATAFU-37:
-------------------------------------

Something else I was wondering about when going through the code and reading 
the paper is how to determine the parameters.

For CosineDistanceHash the important parameter is:
* sRepeat: Number of internal repetitions
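
A minimal sketch of one reading of sRepeat, assuming each repetition contributes one sign bit from a random hyperplane and the bits are concatenated into a single hash value. The class name, constructor, and seed argument are hypothetical and are not taken from the DATAFU-37 patch:

```java
import java.util.Random;

/** Hypothetical illustration only; not the code in DATAFU-37.patch. */
final class CosineHashSketch {
    // One random hyperplane normal per internal repetition.
    private final double[][] planes;

    CosineHashSketch(int dim, int sRepeat, long seed) {
        if (sRepeat < 1 || sRepeat > 64) {
            throw new IllegalArgumentException("sRepeat must be in [1, 64]");
        }
        Random rng = new Random(seed);
        planes = new double[sRepeat][dim];
        for (double[] plane : planes) {
            for (int j = 0; j < dim; j++) {
                plane[j] = rng.nextGaussian();
            }
        }
    }

    /** Packs one sign bit per repetition into a long (bit i comes from plane i). */
    long hash(double[] v) {
        long bits = 0L;
        for (int i = 0; i < planes.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += planes[i][j] * v[j];
            }
            if (dot >= 0) {
                bits |= (1L << i);
            }
        }
        return bits;
    }
}
```

Under this reading, a larger sRepeat makes each bucket narrower: two vectors collide only if they fall on the same side of every one of the sRepeat hyperplanes.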

For L1PStableHash and L2PStableHash the important parameters are:
* sW: A double representing the quantization parameter (also known as the 
projection width)
* sRepeat: Number of internal repetitions (generally this should be 1, since 
each p-stable hash already has a range larger than one bit)
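
To make the role of sW concrete, here is a minimal sketch of the standard p-stable hash for L2, h(v) = floor((a . v + b) / w), with a drawn from a Gaussian and b drawn uniformly from [0, w). It assumes sW plays the role of w. The class name and seed argument are hypothetical and are not taken from the patch. An L1 variant would draw a from a Cauchy distribution instead:

```java
import java.util.Random;

/** Hypothetical illustration only; not the code in DATAFU-37.patch. */
final class L2StableHashSketch {
    private final double[] a; // random projection direction, entries ~ N(0, 1)
    private final double b;   // random offset in [0, w)
    private final double w;   // quantization parameter (projection width)

    L2StableHashSketch(int dim, double sW, long seed) {
        if (sW <= 0) {
            throw new IllegalArgumentException("sW must be positive");
        }
        Random rng = new Random(seed);
        a = new double[dim];
        for (int j = 0; j < dim; j++) {
            a[j] = rng.nextGaussian();
        }
        w = sW;
        b = rng.nextDouble() * sW;
    }

    /** Projects v onto a, shifts by b, and returns the index of the width-w bucket. */
    long hash(double[] v) {
        double projected = b;
        for (int j = 0; j < v.length; j++) {
            projected += a[j] * v[j];
        }
        return (long) Math.floor(projected / w);
    }
}
```

A larger sW gives wider buckets, so more distant points collide; a smaller sW gives finer buckets and fewer collisions, which is the trade-off that has to be tuned against the data.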

You mention that the parameters should be determined empirically.  I also came 
across a presentation you gave (file:///Users/mhayes/Downloads/presentation.pdf), 
in which you mention a tool that can assist in choosing the parameters.  Do you 
think we could estimate the parameters using a data sample and these UDFs, or 
would we need additional UDFs to do that?

> Add Locality Sensitive Hashing UDFs
> -----------------------------------
>
>                 Key: DATAFU-37
>                 URL: https://issues.apache.org/jira/browse/DATAFU-37
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>         Attachments: DATAFU-37.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Create a set of UDFs to implement [Locality Sensitive 
> Hashing|http://en.wikipedia.org/wiki/Locality-sensitive_hashing] in support 
> of finding k-nearest neighbors.  Initially, hashes associated with L1, L2, and 
> Cosine similarity should be supported.



--
This message was sent by Atlassian JIRA
(v6.2#6252)