[ 
https://issues.apache.org/jira/browse/IMPALA-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980609#comment-16980609
 ] 

ASF subversion and git services commented on IMPALA-8709:
---------------------------------------------------------

Commit a862282811e76767c6c5d7874db2a310586f2421 in impala's branch 
refs/heads/master from norbert.luksa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a862282 ]

IMPALA-8709: Add Damerau-Levenshtein edit distance built-in function

This patch adds new built-in functions to calculate the restricted
Damerau-Levenshtein edit distance (optimal string alignment),
implemented as dle_dst() and damerau_levenshtein(). If either argument
is NULL, the functions return NULL. This differs from Netezza's
dle_dst(), which returns the length of the non-NULL value, or 0 if both
values are NULL. The NULL behavior matches the existing levenshtein()
function.
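
For readers unfamiliar with the restricted variant, the following is a
minimal Python sketch of optimal string alignment distance with the
NULL handling described above. It is an illustration only, not the
code from this commit: the name osa_distance is hypothetical, and
Python's None stands in for SQL NULL.

```python
from typing import Optional


def osa_distance(s: Optional[str], t: Optional[str]) -> Optional[int]:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Hypothetical helper for illustration; None models SQL NULL.
    """
    # NULL in, NULL out: if either argument is missing, so is the result.
    if s is None or t is None:
        return None

    m, n = len(s), len(t)
    # d[i][j] holds the distance between the prefixes s[:i] and t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters. In the restricted
            # variant a transposed pair is never edited again afterwards.
            if (i > 1 and j > 1
                    and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)

    return d[m][n]
```

In this sketch a single adjacent swap such as "ab" versus "ba" costs 1.
The restriction shows up on "ca" versus "abc", where the sketch gives 3,
while the unrestricted Damerau-Levenshtein distance is 2.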

Also cleans up levenshtein tests.

Testing:
- Added unit tests to expr-test.cc
- Manually tested over 1400 string pairs from
  http://marvin.cs.uidaho.edu/misspell.html; the results match Netezza

Change-Id: Ib759817ec15e7075bf49d51e494e45c8af4db94d
Reviewed-on: http://gerrit.cloudera.org:8080/13794
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Add Damerau-Levenshtein edit distance built-in function
> -------------------------------------------------------
>
>                 Key: IMPALA-8709
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8709
>             Project: IMPALA
>          Issue Type: New Feature
>            Reporter: Greg Rahn
>            Assignee: Greg Rahn
>            Priority: Major
>              Labels: built-in-function
>
> Algorithm (restricted Damerau-Levenshtein / optimal string alignment):
>  [https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance]
> References:
>  [https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_functions_expressions_fuzzy_funcs.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
