Charles Givre created DRILL-6519:
------------------------------------

             Summary: Add String Distance and Phonetic Functions
                 Key: DRILL-6519
                 URL: https://issues.apache.org/jira/browse/DRILL-6519
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.14.0
            Reporter: Charles Givre
            Assignee: Charles Givre

This collection of functions, developed for a recent project, makes it possible to do fuzzy string matching as well as phonetic matching on strings.

The following functions are all phonetic functions: each maps text to a number or string based on how the word sounds. For instance, "Jayme" and "Jaime" have the same Soundex value, so these functions can be used to match similar-sounding words.
 * caverphone1(<string>)
 * caverphone2(<string>)
 * cologne_phonetic(<string>)
 * dm_soundex(<string>)
 * double_metaphone(<string>)
 * match_rating_encoder(<string>)
 * metaphone(<string>)
 * nysiis(<string>)
 * refined_soundex(<string>)
 * soundex(<string>)

Additionally, there is the
{code:java}
sounds_like(<string1>,<string2>){code}
function, which can be used to find strings that sound similar. For instance:
{code:java}
SELECT * 
FROM <data> 
WHERE sounds_like( last_name, 'Gretsky' )
{code}
h2. String Distance Functions

In addition to the phonetic functions, there is a series of distance functions that measure the difference between two strings. The functions include:
 * cosine_distance(<string1>,<string2>)
 * fuzzy_score(<string1>,<string2>)
 * hamming_distance(<string1>,<string2>)
 * jaccard_distance(<string1>,<string2>)
 * jaro_distance(<string1>,<string2>)
 * levenshtein_distance(<string1>,<string2>)
 * longest_common_substring_distance(<string1>,<string2>)
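
To make the two families of functions concrete, here is a minimal Python sketch of one function from each: a classic American Soundex encoder and a Levenshtein edit distance. It illustrates the general techniques only and is not the Drill implementation, whose results may differ in edge cases. The Python {{sounds_like}} helper is hypothetical: it assumes that sounding alike means having equal Soundex codes, which this issue does not state.

```python
# Illustrative sketch only; not the Drill implementation.

# Classic American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels and Y map to ''
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]  # pad with zeros, truncate to 4 characters


def sounds_like(a, b):
    """Hypothetical helper: assumes 'sounds alike' means equal Soundex codes."""
    return soundex(a) == soundex(b)


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from '' to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

In this sketch, "Jayme" and "Jaime" both encode to J500, and "Gretsky" and "Gretzky" both encode to G632 while sitting at an edit distance of 1. A phonetic match treats such a pair as equal, whereas a distance function reports how far apart the spellings are.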