[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk resolved SPARK-43493.
------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 41169
[https://github.com/apache/spark/pull/41169]

> Add a max distance argument to the levenshtein() function
> ---------------------------------------------------------
>
>                 Key: SPARK-43493
>                 URL: https://issues.apache.org/jira/browse/SPARK-43493
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Max Gekk
>            Assignee: BingKun Pan
>            Priority: Major
>             Fix For: 3.5.0
>
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient
> for long strings. Many other databases which support this type of built-in
> function also take a third argument which signifies a maximum distance after
> which it is okay to terminate the algorithm.
> For example, something like
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> The function stops computing the distance once the maximum value is reached.
> See PostgreSQL for an example of a 3 argument
> [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].
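
To illustrate the early-termination idea described in the issue, here is a minimal Python sketch of a Levenshtein distance with a maximum-distance bound. It is not the code from the pull request. The function name `bounded_levenshtein` and the choice to return -1 once the bound is exceeded are assumptions made for this example only. The sketch stops as soon as every cell in the current row of the dynamic-programming table exceeds the bound, since the final distance can never be smaller than the minimum of any row.

```python
def bounded_levenshtein(a: str, b: str, max_distance: int) -> int:
    """Return the edit distance between a and b, or -1 if it exceeds max_distance.

    Illustrative sketch only: the name and the -1 sentinel are assumptions,
    not taken from the issue or the pull request.
    """
    if max_distance < 0:
        raise ValueError("max_distance must be non-negative")

    # The length difference is a lower bound on the edit distance,
    # so very different lengths can be rejected without any table work.
    if abs(len(a) - len(b)) > max_distance:
        return -1

    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        # Row minima never decrease from one row to the next, so once the
        # whole row is above the bound the final distance must be too.
        if min(current) > max_distance:
            return -1
        previous = current

    distance = previous[len(b)]
    return distance if distance <= max_distance else -1
```

For example, `bounded_levenshtein("kitten", "sitting", 3)` computes the full distance of 3, while a bound of 2 lets the function give up with -1. A production implementation would typically also restrict each row to a diagonal band of width `2 * max_distance + 1`, which this sketch omits for brevity.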