[
https://issues.apache.org/jira/browse/DATAFU-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985142#comment-13985142
]
Matthew Hayes edited comment on DATAFU-37 at 4/30/14 4:07 AM:
--------------------------------------------------------------
Something else I was wondering about when going through the code and reading
the paper is how to determine the parameters.
For CosineDistanceHash the important parameter is:
* sRepeat: Number of internal repetitions
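To make the role of sRepeat concrete, here is a minimal sketch of the generic random-hyperplane scheme for cosine similarity, where each repetition contributes one sign bit. This is only an illustration under that assumption, not necessarily what CosineDistanceHash in the patch does; make_cosine_hash and its arguments are hypothetical names.

```python
import random

def make_cosine_hash(dim, s_repeat, seed=0):
    """Return a function mapping a vector to an s_repeat-bit integer signature."""
    rng = random.Random(seed)
    # One random Gaussian hyperplane normal per repetition (one output bit each).
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(s_repeat)]

    def hash_vector(v):
        signature = 0
        for plane in planes:
            dot = sum(p * x for p, x in zip(plane, v))
            # Append one bit: which side of the hyperplane the vector falls on.
            signature = (signature << 1) | (1 if dot >= 0 else 0)
        return signature

    return hash_vector

h = make_cosine_hash(dim=3, s_repeat=8, seed=1)
# Same direction, different length: the signatures are identical.
print(h([1.0, 2.0, 3.0]), h([2.0, 4.0, 6.0]))
```

With more repetitions the signature gets longer, so two vectors have to agree on more bits before they land in the same bucket.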
For L1PStableHash and L2PStableHash the important parameters are:
* sW: A double representing the quantization parameter (also known as the
projection width)
* sRepeat: Number of internal repetitions (generally this should be 1, as each
p-stable hash already has a larger range than one bit)
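For reference, the textbook form of a p-stable hash is floor((a . v + b) / w), with a drawn from a Gaussian (L2) or Cauchy (L1) distribution and b uniform in [0, w). The sketch below follows that form to show where a width like sW enters; it is an assumption about the general technique rather than a copy of L1PStableHash or L2PStableHash, and make_pstable_hash is a hypothetical name.

```python
import math
import random

def make_pstable_hash(dim, w, p=2, seed=0):
    """Return a function mapping a vector to an integer bucket index."""
    rng = random.Random(seed)
    if p == 2:
        # Gaussian entries are 2-stable (L2 distance).
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    elif p == 1:
        # Cauchy entries are 1-stable (L1 distance), sampled by inverse CDF.
        a = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]
    else:
        raise ValueError("only p=1 and p=2 are sketched here")
    b = rng.uniform(0.0, w)  # random offset in [0, w)

    def hash_vector(v):
        projection = sum(ai * xi for ai, xi in zip(a, v))
        # w is the quantization width: larger w means coarser buckets.
        return math.floor((projection + b) / w)

    return hash_vector

h = make_pstable_hash(dim=4, w=4.0, p=2, seed=3)
print(h([1.0, 0.5, -2.0, 3.0]))
```

Because the result is an integer bucket rather than a single bit, one repetition already separates points into many buckets, which is consistent with the suggestion of using 1 for sRepeat.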
You mention that the parameters should be determined empirically. I also came
across a presentation of yours in which you mention a tool that can assist in
choosing the parameters. Do you think we could estimate the parameters using a
data sample and these UDFs, or do we need additional UDFs to do that?
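As a rough idea of what estimating from a sample could look like, the sketch below sweeps candidate widths for an L2 p-stable hash and reports how often "near" and "far" pairs from the sample collide. It is not the tool from the presentation; the function names, the radius, and the far_factor threshold are all assumptions made for illustration.

```python
import math
import random

def l2_distance(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def collision_rate(pairs, dim, w, trials=200, seed=0):
    """Fraction of (pair, random hash) combinations where both vectors share a bucket."""
    if not pairs:
        raise ValueError("need at least one pair")
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw a fresh L2 p-stable hash: floor((a . v + b) / w).
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = rng.uniform(0.0, w)
        for u, v in pairs:
            hu = math.floor((sum(ai * x for ai, x in zip(a, u)) + b) / w)
            hv = math.floor((sum(ai * x for ai, x in zip(a, v)) + b) / w)
            hits += (hu == hv)
    return hits / (trials * len(pairs))

def sweep_w(sample, radius, far_factor, candidates):
    """Map each candidate width to (near collision rate, far collision rate)."""
    dim = len(sample[0])
    near, far = [], []
    for i in range(len(sample)):
        for j in range(i + 1, len(sample)):
            d = l2_distance(sample[i], sample[j])
            if d <= radius:
                near.append((sample[i], sample[j]))
            elif d >= far_factor * radius:
                far.append((sample[i], sample[j]))
    return {w: (collision_rate(near, dim, w), collision_rate(far, dim, w))
            for w in candidates}

sample = [[0.0, 0.0], [0.1, 0.0], [20.0, 0.0], [20.0, 0.1]]
for w, (near_rate, far_rate) in sweep_w(sample, 1.0, 5.0, [1.0, 4.0, 16.0]).items():
    print(w, near_rate, far_rate)
```

A width that keeps the near rate high while keeping the far rate low would be a reasonable starting point; whether this can be done with the existing UDFs alone or needs a dedicated helper is the open question.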
> Add Locality Sensitive Hashing UDFs
> -----------------------------------
>
> Key: DATAFU-37
> URL: https://issues.apache.org/jira/browse/DATAFU-37
> Project: DataFu
> Issue Type: New Feature
> Reporter: Casey Stella
> Assignee: Casey Stella
> Attachments: DATAFU-37.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Create a set of UDFs to implement [Locality Sensitive
> Hashing|http://en.wikipedia.org/wiki/Locality-sensitive_hashing] in support
> of finding k-near neighbors. Initially, hashes associated with L1, L2 and
> Cosine similarity should be supported.