[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736863#comment-17736863 ]

Nikita Awasthi commented on SPARK-43493:

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41724

> Add a max distance argument to the levenshtein() function
> ---------------------------------------------------------
>
> Key: SPARK-43493
> URL: https://issues.apache.org/jira/browse/SPARK-43493
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: BingKun Pan
> Priority: Major
> Fix For: 3.5.0
>
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient
> for long strings. Many other databases that support this type of built-in
> function also take a third argument that specifies a maximum distance, after
> which it is okay to terminate the algorithm.
> For example, something like
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> where the function stops computing the distance once the max value is reached.
> See PostgreSQL for an example of a 3-argument
> [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
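To illustrate the early termination that the issue description asks for, here is a minimal Python sketch of a distance computation that gives up once the bound can no longer be met. It is not the code from the pull request. The function name `bounded_levenshtein` and the choice to return -1 when the bound is exceeded are assumptions made for this example only. The sketch uses a simple row-minimum cutoff on the standard dynamic-programming table rather than a banded algorithm, so it only shows the idea of stopping early.

```python
def bounded_levenshtein(s, t, max_distance):
    """Edit distance between s and t, or -1 if it exceeds max_distance.

    Hypothetical helper for illustration; the -1 sentinel is an assumption.
    """
    if max_distance < 0:
        raise ValueError("max_distance must be non-negative")

    # The difference in lengths is a lower bound on the edit distance,
    # so very unequal inputs can be rejected without any table work.
    if abs(len(s) - len(t)) > max_distance:
        return -1

    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        # The final distance is at least the smallest value in any row,
        # so once a whole row exceeds the bound the answer must as well.
        if min(curr) > max_distance:
            return -1
        prev = curr

    return prev[-1] if prev[-1] <= max_distance else -1
```

With two long strings that differ almost everywhere, the loop exits after a handful of rows instead of filling the whole table, which is the saving the third argument is meant to provide.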
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722720#comment-17722720 ]

BingKun Pan commented on SPARK-43493:

Let me implement it in `sql` first; I will implement it in `connect` later.
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722713#comment-17722713 ]

ASF GitHub Bot commented on SPARK-43493:

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41169
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722453#comment-17722453 ]

BingKun Pan commented on SPARK-43493:

OK
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722448#comment-17722448 ]

Max Gekk commented on SPARK-43493:

[~panbingkun] Sure, go ahead.
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722426#comment-17722426 ]

BingKun Pan commented on SPARK-43493:

[~maxgekk] Can I try to do it?