[jira] [Commented] (DATAFU-87) Edit distance
[ https://issues.apache.org/jira/browse/DATAFU-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199201#comment-16199201 ]

Matthew Hayes commented on DATAFU-87:
-------------------------------------

Makes sense to me. I'll close.

> Edit distance
> -------------
>
>                 Key: DATAFU-87
>                 URL: https://issues.apache.org/jira/browse/DATAFU-87
>             Project: DataFu
>          Issue Type: New Feature
>   Affects Versions: 1.3.0
>            Reporter: Joydeep Banerjee
>         Attachments: DATAFU-87.patch
>
>
> [This is work-in-progress]
> Given 2 strings, provide a measure of dis-similarity (Levenshtein distance)
> between them.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (DATAFU-87) Edit distance
[ https://issues.apache.org/jira/browse/DATAFU-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197120#comment-16197120 ]

Eyal Allweil commented on DATAFU-87:

On second thought, since this UDF is now available in Hive, and since Levenshtein distance is a purely local computation, I'm guessing there's no need for a specific DataFu implementation. Shall we close this issue?

Here are some links to the Hive UDF.
https://issues.apache.org/jira/browse/HIVE-9556
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions
[jira] [Commented] (DATAFU-87) Edit distance
[ https://issues.apache.org/jira/browse/DATAFU-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606106#comment-15606106 ]

Eyal Allweil commented on DATAFU-87:

Hi Joydeep,

I want to begin by apologizing for the time it's taken us to get to your contribution. Did you ever continue with it?

Have you compared your implementation with [the one in Apache Commons Text|https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/similarity/LevenshteinDistance.java] or [Commons Lang|https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L7731]? (I think they follow the same algorithm, from _Algorithms on Strings, Trees and Sequences_ by Dan Gusfield and Chas Emerick)
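
For readers following the thread, the measure requested in the issue can be sketched in a few lines of Java. This is a minimal two-row dynamic-programming sketch of Levenshtein distance, assuming unit cost for insertion, deletion, and substitution. It is not the attached DATAFU-87.patch, the Commons Text or Commons Lang code, or the Hive UDF, and the class and method names (`EditDistanceSketch`, `levenshtein`) are hypothetical, chosen only for illustration.

```java
public final class EditDistanceSketch {
    private EditDistanceSketch() {}

    /**
     * Levenshtein distance between a and b, assuming unit cost for
     * insertion, deletion, and substitution (hypothetical helper).
     */
    public static int levenshtein(CharSequence a, CharSequence b) {
        if (a == null || b == null) {
            throw new IllegalArgumentException("inputs must not be null");
        }
        int n = a.length();
        int m = b.length();

        // prev[j] holds the distance between the first i-1 chars of a
        // and the first j chars of b; curr[j] is the row being filled.
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= n; i++) {
            // Turning the first i chars of a into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two arrays instead of allocating a new row each time.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev holds the last completed row.
        return prev[m];
    }
}
```

Because each result depends only on the two input strings, a function like this can be evaluated row by row with no grouping or shuffling, which fits the "purely local computation" point made in the thread.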