[ https://issues.apache.org/jira/browse/DATAFU-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986317#comment-13986317 ]

Casey Stella commented on DATAFU-37:
------------------------------------

So, I was contemplating including a stand-alone CLI application that takes the 
LSH algorithm to use, a distance threshold and a sample, then walks through 
parameter combinations to find the best parameters.  I don't think you'd do 
that search with the UDFs submitted here, since they're only parameterized via 
the constructor.  We could implement what I had planned for the CLI as a UDF 
instead, but you'd need multiple passes through the data, of course (one pass 
per set of parameters).  Anyway, do you think that should be part of this 
issue or a follow-on item?

> Add Locality Sensitive Hashing UDFs
> -----------------------------------
>
>                 Key: DATAFU-37
>                 URL: https://issues.apache.org/jira/browse/DATAFU-37
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>         Attachments: DATAFU-37.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Create a set of UDFs to implement 
> [Locality Sensitive Hashing|http://en.wikipedia.org/wiki/Locality-sensitive_hashing] 
> in support of finding k-nearest neighbors.  Initially, hashes associated with 
> L1, L2 and cosine similarity should be supported.
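For reference, a sketch of one hash function from each family the issue names, using the textbook constructions: a p-stable projection h(v) = floor((a·v + b) / w) with Cauchy-distributed entries for L1 and Gaussian entries for L2, and the sign of a random hyperplane for cosine similarity. The class names and the bucket width `w` are illustrative assumptions; this is not necessarily how DATAFU-37.patch implements them.

```java
import java.util.Random;

/** Hypothetical sketch of single hash functions for the L1, L2 and cosine LSH families. */
final class LshFamilies {

  /** p-stable hash h(v) = floor((a.v + b) / w), with a drawn from a p-stable distribution. */
  static final class PStableHash {
    private final double[] a;
    private final double b;
    private final double w;

    /** cauchy = true gives the L1 (1-stable) family; false gives the L2 (Gaussian) family. */
    PStableHash(int dim, double w, boolean cauchy, Random rng) {
      this.a = new double[dim];
      for (int i = 0; i < dim; i++) {
        // tan(pi * (U - 1/2)) with U uniform is a standard Cauchy sample
        a[i] = cauchy ? Math.tan(Math.PI * (rng.nextDouble() - 0.5)) : rng.nextGaussian();
      }
      this.b = rng.nextDouble() * w;  // uniform offset in [0, w)
      this.w = w;
    }

    long apply(double[] v) {
      double dot = 0;
      for (int i = 0; i < v.length; i++) dot += a[i] * v[i];
      return (long) Math.floor((dot + b) / w);  // bucket index along the projection
    }
  }

  /** Cosine (random hyperplane) hash: which side of a random hyperplane v falls on. */
  static final class HyperplaneHash {
    private final double[] normal;

    HyperplaneHash(int dim, Random rng) {
      normal = new double[dim];
      for (int i = 0; i < dim; i++) normal[i] = rng.nextGaussian();
    }

    int apply(double[] v) {
      double dot = 0;
      for (int i = 0; i < v.length; i++) dot += normal[i] * v[i];
      return dot >= 0 ? 1 : 0;
    }
  }
}
```

In practice several such hashes would be concatenated into one bucket key per table and several tables queried, which is where the parameters searched over in the comment above come from.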



--
This message was sent by Atlassian JIRA
(v6.2#6252)
