Max Gekk created SPARK-43493:
--------------------------------

             Summary: Add a max distance argument to the levenshtein() function.
                 Key: SPARK-43493
                 URL: https://issues.apache.org/jira/browse/SPARK-43493
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Max Gekk


Currently, Spark's levenshtein(str1, str2) function can be very inefficient for 
long strings, because it always computes the full edit distance, which takes time 
proportional to the product of the two string lengths. Many other databases that 
provide this built-in function also accept a third argument specifying a maximum 
distance, beyond which the algorithm can terminate early.
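To make the cost concrete, here is a minimal Python sketch of the textbook Wagner-Fischer dynamic program for edit distance. It illustrates the general technique only; it is not Spark's actual implementation, and the helper name `levenshtein` is used here purely for illustration. Every cell of the len(s) x len(t) table is filled, even when the caller only needs to know whether two strings are "close".

{code:python}
def levenshtein(s: str, t: str) -> int:
    """Classic Wagner-Fischer edit distance, keeping only two rows of the table."""
    if len(s) < len(t):
        s, t = t, s  # iterate over the longer string so each row stays short
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution (or match)
        prev = curr
    # All len(s) * len(t) cells have been computed by this point.
    return prev[-1]
{code}

For example, levenshtein("kitten", "sitting") returns 3.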

For example, something like:

{code:sql}
levenshtein(str1, str2[, max_distance])
{code}

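The function would stop computing the distance once the maximum distance is reached.
See PostgreSQL for an example of a three-argument variant: its fuzzystrmatch module 
provides levenshtein_less_equal(source, target, max_d) alongside 
[levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].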
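One way such a cutoff could work is sketched below in Python. This is an illustration under stated assumptions, not an agreed design: the name `levenshtein_bounded` is hypothetical, and returning -1 when the distance exceeds `max_distance` is an assumed convention. The sketch uses a simple row-minimum cutoff; more refined implementations also restrict each row to a diagonal band around the main diagonal, which is omitted here.

{code:python}
def levenshtein_bounded(s: str, t: str, max_distance: int) -> int:
    """Edit distance with early termination.

    Returns the exact distance if it is <= max_distance, otherwise -1
    (the -1 sentinel is an assumption of this sketch).
    """
    if max_distance < 0:
        raise ValueError("max_distance must be non-negative")
    # The distance is at least the difference in lengths, so long/short
    # pairs can be rejected before filling any cells.
    if abs(len(s) - len(t)) > max_distance:
        return -1
    if len(s) < len(t):
        s, t = t, s
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        # Every edit path to the final cell passes through this row, and
        # values never decrease along a path, so the row minimum is a lower
        # bound on the final distance: once it exceeds the limit, stop.
        if min(curr) > max_distance:
            return -1
        prev = curr
    return prev[-1] if prev[-1] <= max_distance else -1
{code}

With two 1000-character strings that share no characters and max_distance = 5, this loop stops after six rows instead of filling all 1000.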



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
