[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function

2023-06-25 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17736863#comment-17736863
 ] 

Nikita Awasthi commented on SPARK-43493:


User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41724

> Add a max distance argument to the levenshtein() function
> -
>
> Key: SPARK-43493
> URL: https://issues.apache.org/jira/browse/SPARK-43493
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: BingKun Pan
> Priority: Major
> Fix For: 3.5.0
>
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient
> for long strings. Many other databases that support this kind of built-in
> function also take a third argument, a maximum distance beyond which the
> algorithm may terminate early.
> For example, something like
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> where the function stops computing the distance once the maximum value is reached.
> See PostgreSQL for an example of a three-argument
> [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].
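
To make the early-termination idea in the description concrete, here is a minimal Python sketch. It is not Spark's implementation. The function name mirrors the proposed SQL signature, and returning -1 when the distance exceeds max_distance is an assumption made for this sketch; the issue text does not say what the function should return in that case.

```python
from typing import Optional


def levenshtein(str1: str, str2: str, max_distance: Optional[int] = None) -> int:
    """Edit distance between str1 and str2.

    If max_distance is given and the true distance exceeds it, return -1
    (an assumed convention for this sketch, not taken from the issue).
    """
    bounded = max_distance is not None

    # The length difference is a lower bound on the edit distance,
    # so some pairs can be rejected without any computation.
    if bounded and abs(len(str1) - len(str2)) > max_distance:
        return -1

    # previous[j] = distance between the first i-1 chars of str1
    # and the first j chars of str2.
    previous = list(range(len(str2) + 1))
    for i, ch1 in enumerate(str1, start=1):
        current = [i]
        for j, ch2 in enumerate(str2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        # The final distance can never be smaller than the minimum of a row,
        # so once every cell exceeds the bound the loop can stop early.
        if bounded and min(current) > max_distance:
            return -1
        previous = current

    distance = previous[-1]
    if bounded and distance > max_distance:
        return -1
    return distance
```

With this sketch, levenshtein("kitten", "sitting") evaluates to 3, while levenshtein("kitten", "sitting", 2) gives up early and returns -1. A production version would typically also restrict each row to a band of width 2 * max_distance + 1 around the diagonal, which this sketch omits for brevity.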



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function

2023-05-15 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722720#comment-17722720
 ] 

BingKun Pan commented on SPARK-43493:

Let me implement it in `sql` first; I will implement it in `connect` later.




[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function

2023-05-15 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722713#comment-17722713
 ] 

ASF GitHub Bot commented on SPARK-43493:


User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41169




[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function

2023-05-14 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722453#comment-17722453
 ] 

BingKun Pan commented on SPARK-43493:

OK




[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function

2023-05-14 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722448#comment-17722448
 ] 

Max Gekk commented on SPARK-43493:

[~panbingkun] Sure, go ahead.




[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function

2023-05-13 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722426#comment-17722426
 ] 

BingKun Pan commented on SPARK-43493:

[~maxgekk] Can I try to do it?
