[ https://issues.apache.org/jira/browse/HIVE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Pivovarov updated HIVE-9556:
--------------------------------------
Description:
Levenshtein distance is a string metric for measuring the difference between
two sequences. Informally, the Levenshtein distance between two words is the
minimum number of single-character edits (i.e. insertions, deletions or
substitutions) required to change one word into the other. It is named after
Vladimir Levenshtein, who considered this distance in 1965.
Example:
The Levenshtein distance between "kitten" and "sitting" is 3:
1. kitten → sitten (substitution of "s" for "k")
2. sitten → sittin (substitution of "i" for "e")
3. sittin → sitting (insertion of "g" at the end)
{code}
select levenshtein('kitten', 'sitting');
3
{code}
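The definition above can be sketched with the standard two-row dynamic-programming formulation. This is only an illustration of the metric, not the code in the attached patches; the class name LevenshteinSketch and the method distance are hypothetical, and the sketch compares UTF-16 code units rather than full Unicode code points.

```java
// Hypothetical illustration of the metric; not the HIVE-9556 patch code.
final class LevenshteinSketch {
    private LevenshteinSketch() {}

    // Returns the minimum number of single-character insertions,
    // deletions, or substitutions needed to turn a into b.
    static int distance(String a, String b) {
        // prev[j] holds the distance between the first i-1 chars of a
        // and the first j chars of b; curr[j] is the same for i chars.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev holds the last computed row.
        return prev[b.length()];
    }
}
```

Under these assumptions, LevenshteinSketch.distance("kitten", "sitting") evaluates to 3, matching the example above. Keeping only two rows makes memory use proportional to the length of the second string rather than to the product of both lengths.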
was:
algorithm description http://en.wikipedia.org/wiki/Levenshtein_distance
{code}
--one edit operation, greatest str len = 12
str_sim_levenshtein('Test String1', 'Test String2') = 1 - 1 / 12 = 0.91666667
{code}
> create UDF to calculate the Levenshtein distance between two strings
> --------------------------------------------------------------------
>
> Key: HIVE-9556
> URL: https://issues.apache.org/jira/browse/HIVE-9556
> Project: Hive
> Issue Type: Improvement
> Components: UDF
> Reporter: Alexander Pivovarov
> Assignee: Alexander Pivovarov
> Attachments: HIVE-9556.1.patch, HIVE-9556.2.patch
>