[ https://issues.apache.org/jira/browse/DRILL-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983103#comment-14983103 ]

ASF GitHub Bot commented on DRILL-3747:
---------------------------------------

GitHub user k255 opened a pull request:

    https://github.com/apache/drill/pull/224

    DRILL-3747: basic similarity search with simmetric

    Helps handle, e.g., typos in search queries using popular algorithms such as
    Levenshtein.
    Sample query:
    ```
    select levenshtein('foo', 'boo') from (VALUES(1)); -- gives 0.67
    ```
    and
    ```
    select levenshtein('foo', 'bar') from (VALUES(1)); -- not similar, gives 0
    ```
    More:
    https://github.com/k255/drill-fuzzy-search
    https://en.wikipedia.org/wiki/Levenshtein_distance

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/k255/drill drill-fuzzysearch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #224
    
----
commit 51248358adf7ee71a744cccb7a22b45850f192a8
Author: potocki <k...@gmx.com>
Date:   2015-10-30T18:54:41Z

    basic similarity search with simmetric

----
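
The scores in the sample queries (0.67 for 'foo' vs 'boo', 0 for 'foo' vs
'bar') are consistent with a Levenshtein edit distance normalized into a
similarity in [0, 1] as 1 - distance / max(length). The following is a minimal
Python sketch of that arithmetic, not the code from the pull request. The
function names and the normalization formula are assumptions made for
illustration; the UDF in the PR may compute its score differently.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(round(levenshtein_similarity("foo", "boo"), 2))  # one substitution out of three characters
print(round(levenshtein_similarity("foo", "bar"), 2))  # all three characters differ
```

Under this assumed normalization, one substitution in a three-character string
gives 1 - 1/3, which rounds to the 0.67 shown in the first query, and three
substitutions give 1 - 3/3 = 0 as in the second.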


> UDF for "fuzzy" string and similarity matching
> ----------------------------------------------
>
>                 Key: DRILL-3747
>                 URL: https://issues.apache.org/jira/browse/DRILL-3747
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Functions - Drill
>    Affects Versions: Future
>            Reporter: Edmon Begoli
>            Priority: Minor
>              Labels: features
>             Fix For: Future
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> I propose implementing string distance and similarity matching functions
> similar to those found in most other databases: soundex, metaphone,
> levenshtein (and more advanced variants such as Damerau-Levenshtein,
> Jaro-Winkler, etc.).
> See fuzzystrmatch 
> http://www.postgresql.org/docs/9.5/static/fuzzystrmatch.html, 
> and pg_similarity http://pgsimilarity.projects.pgfoundry.org/
> for inspiration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
