[ https://issues.apache.org/jira/browse/DATAFU-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986317#comment-13986317 ]
Casey Stella commented on DATAFU-37:
------------------------------------
So, I was contemplating including a stand-alone CLI application that takes the
LSH algorithm to use, a distance threshold, and a sample, then walks through
parameter combinations to find the best parameters. I don't think you could do
this with the UDFs submitted here, as they're only parameterized via the
constructor. We could do what I was planning for the CLI as a UDF instead, but
that would require multiple passes through the data, of course (one pass per
set of parameters). Anyway, do you think that should be part of this issue or
a follow-on item?
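
To make the idea concrete, here is a rough sketch of the kind of search I mean,
written in plain Python rather than against the UDFs in the patch. Everything
in it is an assumption for illustration: the function names, the choice of a
Gaussian-projection hash for L2, the (k, number of tables, bucket width)
parameterization, and the selection rule (fewest candidate pairs among the
parameter sets that reach a target recall on the sample). Note that each
parameter set costs one full pass over the sample.

```python
import itertools
import math
import random


def l2_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def make_l2_hash(dim, k, width, rng):
    """Build one table's hash: k concatenated Gaussian projections (assumed scheme)."""
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(k)
    ]

    def hash_vector(v):
        return tuple(
            math.floor((sum(a * x for a, x in zip(direction, v)) + offset) / width)
            for direction, offset in projections
        )

    return hash_vector


def candidate_pairs(sample, k, num_tables, width, seed=0):
    """One pass over the sample for a single parameter set.

    Returns index pairs (i, j), i < j, that share a bucket in at least one table.
    """
    rng = random.Random(seed)
    dim = len(sample[0])
    pairs = set()
    for _ in range(num_tables):
        hash_vector = make_l2_hash(dim, k, width, rng)
        buckets = {}
        for idx, v in enumerate(sample):
            buckets.setdefault(hash_vector(v), []).append(idx)
        for members in buckets.values():
            pairs.update(itertools.combinations(members, 2))
    return pairs


def tune(sample, threshold, grid, min_recall=0.9):
    """Walk the grid; keep the cheapest parameter set that meets min_recall."""
    # Brute-force ground truth: pairs within the distance threshold.
    true_pairs = {
        (i, j)
        for i, j in itertools.combinations(range(len(sample)), 2)
        if l2_distance(sample[i], sample[j]) <= threshold
    }
    best = None
    for k, num_tables, width in grid:
        found = candidate_pairs(sample, k, num_tables, width)
        recall = len(found & true_pairs) / len(true_pairs) if true_pairs else 1.0
        if recall >= min_recall and (best is None or len(found) < best["candidates"]):
            best = {"k": k, "tables": num_tables, "width": width,
                    "recall": recall, "candidates": len(found)}
    return best
```

A grid could be built with something like
itertools.product([1, 2, 4], [1, 2, 4], [1.0, 4.0]) and passed to tune along
with the sample and the threshold. The brute-force ground truth is quadratic in
the sample size, which is why this only makes sense on a sample and not on the
full data set.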
> Add Locality Sensitive Hashing UDFs
> -----------------------------------
>
> Key: DATAFU-37
> URL: https://issues.apache.org/jira/browse/DATAFU-37
> Project: DataFu
> Issue Type: New Feature
> Reporter: Casey Stella
> Assignee: Casey Stella
> Attachments: DATAFU-37.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Create a set of UDFs to implement [Locality Sensitive
> Hashing|http://en.wikipedia.org/wiki/Locality-sensitive_hashing] in support
> of finding k-nearest neighbors. Initially, hashes associated with L1
> distance, L2 distance, and cosine similarity should be supported.
--
This message was sent by Atlassian JIRA
(v6.2#6252)