Charles Givre created DRILL-6519:
------------------------------------

             Summary: Add String Distance and Phonetic Functions
                 Key: DRILL-6519
                 URL: https://issues.apache.org/jira/browse/DRILL-6519
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.14.0
            Reporter: Charles Givre
            Assignee: Charles Givre


From a recent project, this collection of functions makes it possible to do 
fuzzy string matching as well as phonetic matching on strings. 

 

The following functions are all phonetic functions: each maps text to a code 
(a number or string) based on how the word sounds.  For instance, "Jayme" and 
"Jaime" have the same soundex value, so these functions can be used to match 
similar-sounding words.
 * caverphone1(<string>)
 * caverphone2(<string>)
 * cologne_phonetic(<string>)
 * dm_soundex(<string>)
 * double_metaphone(<string>)
 * match_rating_encoder(<string>)
 * metaphone(<string>)
 * nysiis(<string>)
 * refined_soundex(<string>)
 * soundex(<string>)
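
To illustrate the kind of encoding these functions perform, here is a minimal Python sketch of the classic American Soundex rules. It is an illustration only, not the code behind Drill's soundex(): the table name _CODES and the handling of non-letter characters are assumptions made for this example.

```python
# Illustrative sketch of classic American Soundex (not Drill's implementation).
# Consonant groups share a digit; vowels and Y carry no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return a 4-character Soundex code, or "" if word has no ASCII letters."""
    # Assumption for this sketch: anything other than A-Z is ignored.
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                    # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")      # its digit still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal digits
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code                        # a vowel resets prev to ""
    return result.ljust(4, "0")            # pad short codes with zeros


if __name__ == "__main__":
    print(soundex("Jayme"), soundex("Jaime"))
```

In this sketch "Jayme" and "Jaime" both encode to J500, which is why comparing the codes matches the two spellings.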

Additionally, there is the
{code:java}
sounds_like(<string1>,<string2>){code}
function, which can be used to find strings that sound similar.  For instance:

 
{code:java}
SELECT * 
FROM <data>
WHERE sounds_like( last_name, 'Gretsky' )
{code}
h2. String Distance Functions

In addition to the phonetic functions, there is a series of distance functions 
that measure the difference between two strings.  The functions include:
 * cosine_distance(<string1>,<string2>)
 * fuzzy_score(<string1>,<string2>)
 * hamming_distance(<string1>,<string2>)
 * jaccard_distance(<string1>,<string2>)
 * jaro_distance(<string1>,<string2>)
 * levenshtein_distance(<string1>,<string2>)
 * longest_common_substring_distance(<string1>,<string2>)
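
As a rough illustration of what two of these measures compute, the following Python sketch implements the textbook Levenshtein and Hamming distances. The function names mirror the list above for readability only. The behavior shown, including raising an error for unequal lengths in the Hamming case, is an assumption of this sketch rather than a description of the Drill functions.

```python
# Illustrative sketches of two string distances (not Drill's implementation).

def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b."""
    # previous[j] = distance between the prefix of a seen so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + cost,        # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption: unequal lengths are treated as an error in this sketch.
        raise ValueError("hamming_distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(levenshtein_distance("Gretsky", "Gretzky"))
    print(hamming_distance("karolin", "kathrin"))
```

A small distance suggests a likely misspelling or variant, so a fuzzy match is typically expressed as a threshold on the distance rather than as an equality test.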

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
