[jira] [Commented] (DATAFU-87) Edit distance

2017-10-10 Thread Matthew Hayes (JIRA)

[https://issues.apache.org/jira/browse/DATAFU-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199201#comment-16199201]

Matthew Hayes commented on DATAFU-87:
-------------------------------------

Makes sense to me.  I'll close.

> Edit distance
> -------------
>
> Key: DATAFU-87
> URL: https://issues.apache.org/jira/browse/DATAFU-87
> Project: DataFu
>  Issue Type: New Feature
>Affects Versions: 1.3.0
>Reporter: Joydeep Banerjee
> Attachments: DATAFU-87.patch
>
>
> [This is a work in progress]
> Given two strings, provide a measure of their dissimilarity (the Levenshtein
> distance).
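For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. Below is a minimal Java sketch of the textbook dynamic-programming formulation. It is not the attached DATAFU-87.patch, and the names EditDistance and levenshtein are hypothetical. It assumes non-null inputs and compares UTF-16 char values, so a character outside the Basic Multilingual Plane counts as two.

```java
/**
 * Hypothetical sketch of the classic dynamic-programming Levenshtein distance,
 * where insertion, deletion and substitution each cost 1.
 * Only two rows of the DP table are kept, so extra memory is O(min(m, n)).
 */
final class EditDistance {

    static int levenshtein(String a, String b) {
        // Make b the shorter string so the rows are as small as possible;
        // the distance is symmetric, so swapping does not change the result.
        if (a.length() < b.length()) {
            String tmp = a;
            a = b;
            b = tmp;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,  // insertion
                                 prev[j] + 1),     // deletion
                        prev[j - 1] + cost);       // substitution or match
            }
            // The current row becomes the previous row for the next iteration.
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```

For example, levenshtein("kitten", "sitting") returns 3: two substitutions (k to s, e to i) and one insertion (g).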



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DATAFU-87) Edit distance

2017-10-09 Thread Eyal Allweil (JIRA)

[https://issues.apache.org/jira/browse/DATAFU-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197120#comment-16197120]

Eyal Allweil commented on DATAFU-87:


On second thought, since Hive now provides this as a built-in UDF (levenshtein),
and since Levenshtein distance is a purely local computation (each pair of
strings is handled independently), I'm guessing there's no need for a specific
DataFu implementation. Shall we close this issue?

Here are some links to the Hive UDF.

https://issues.apache.org/jira/browse/HIVE-9556

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions





[jira] [Commented] (DATAFU-87) Edit distance

2016-10-25 Thread Eyal Allweil (JIRA)

[https://issues.apache.org/jira/browse/DATAFU-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606106#comment-15606106]

Eyal Allweil commented on DATAFU-87:


Hi Joydeep,

I want to begin by apologizing for the time it's taken us to get to your
contribution. Did you ever continue with it? Have you compared your
implementation with [the one in Apache Commons Text|https://github.com/apache/commons-text/blob/master/src/main/java/org/apache/commons/text/similarity/LevenshteinDistance.java]
or [Commons Lang|https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L7731]?
(I think they follow the same algorithm, based on _Algorithms on Strings, Trees
and Sequences_ by Dan Gusfield and on Chas Emerick's implementation.)
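One useful point of comparison with the Commons libraries is that both also offer a threshold-limited form. Commons Text's LevenshteinDistance can be constructed with a threshold, and Commons Lang's StringUtils.getLevenshteinDistance has an overload that takes one. Each returns -1 when the distance exceeds the limit. The Java sketch below only illustrates that contract and is not their algorithm. As I understand it, they compute just a diagonal stripe of the table. This version instead uses a simpler early exit: the minimum of each DP row never decreases from one row to the next, so once every cell in a row exceeds the threshold, the final distance must exceed it too. The names BoundedEditDistance and levenshtein(a, b, threshold) are hypothetical.

```java
/**
 * Hypothetical sketch of a threshold-bounded Levenshtein distance.
 * Returns the exact distance if it is <= threshold, otherwise -1.
 * Uses an early exit on the row minimum, not the diagonal-stripe
 * optimisation used by the Commons libraries.
 */
final class BoundedEditDistance {

    static int levenshtein(String a, String b, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        // The length difference is a lower bound on the distance.
        if (Math.abs(a.length() - b.length()) > threshold) {
            return -1;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so no later cell can get back
            // under the threshold once this whole row is above it.
            if (rowMin > threshold) {
                return -1;
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        int distance = prev[b.length()];
        return distance <= threshold ? distance : -1;
    }
}
```

With this sketch, levenshtein("kitten", "sitting", 3) returns 3, while levenshtein("kitten", "sitting", 2) returns -1.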
