[ https://issues.apache.org/jira/browse/DATAFU-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986317#comment-13986317 ]
Casey Stella commented on DATAFU-37:
------------------------------------

So, I was contemplating including a stand-alone CLI application that takes the LSH algorithm to use, a distance threshold, and a sample, then walks through parameter combinations to find the best parameters. I don't think you'd do it with the UDFs submitted here, as they're only parameterized via the constructor. We could do what I was going to do in the CLI as a UDF instead, but you'd need to make multiple passes through the data, of course (one pass per set of parameters).

Anyway, do you think that should be part of this or a follow-on item?

> Add Locality Sensitive Hashing UDFs
> -----------------------------------
>
>                 Key: DATAFU-37
>                 URL: https://issues.apache.org/jira/browse/DATAFU-37
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>         Attachments: DATAFU-37.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Create a set of UDFs to implement [Locality Sensitive
> Hashing|http://en.wikipedia.org/wiki/Locality-sensitive_hashing] in support
> of finding k-nearest neighbors. Initially, hashes associated with L1, L2 and
> Cosine similarity should be supported.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
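
For illustration only, the parameter walk described in the comment could look roughly like the sketch below. It is not the DataFu patch or its API. Every name in it (`evaluate`, `sweep`, `min_recall`, and so on) is hypothetical, and several choices are assumptions the comment does not specify: it uses sign-random-projection hashing for cosine distance as the example LSH family, treats "parameters" as the number of bits per hash and the number of hash tables, and defines "best" as the combination producing the fewest candidate pairs while still recovering a minimum fraction of the truly near pairs in the sample. Input vectors are assumed to be non-zero.

```python
import itertools
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def make_hasher(dim, bits, rng):
    """Build one sign-random-projection hash made of `bits` random hyperplanes."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

    def h(v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(
            1 if sum(p * x for p, x in zip(plane, v)) >= 0 else 0
            for plane in planes
        )

    return h


def evaluate(sample, threshold, bits, tables, seed=0):
    """Return (recall, candidate_pairs) for one parameter combination."""
    rng = random.Random(seed)
    dim = len(sample[0])
    hashers = [make_hasher(dim, bits, rng) for _ in range(tables)]
    signatures = [[h(v) for h in hashers] for v in sample]

    near = found = candidates = 0
    for i, j in itertools.combinations(range(len(sample)), 2):
        # A pair is a candidate if it shares a bucket in at least one table.
        collide = any(a == b for a, b in zip(signatures[i], signatures[j]))
        is_near = cosine_distance(sample[i], sample[j]) <= threshold
        candidates += collide
        near += is_near
        found += collide and is_near

    recall = found / near if near else 1.0
    return recall, candidates


def sweep(sample, threshold, bits_options, table_options, min_recall=0.9):
    """Pick the cheapest (bits, tables) that meets the recall floor, or None."""
    best = None
    for bits, tables in itertools.product(bits_options, table_options):
        recall, candidates = evaluate(sample, threshold, bits, tables)
        if recall >= min_recall and (best is None or candidates < best[3]):
            best = (bits, tables, recall, candidates)
    return best
```

Each call to `evaluate` is one pass over the sample, which mirrors the "one pass per set of parameters" cost mentioned above; the brute-force pair comparison is only reasonable because it runs on a sample rather than the full data set.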