[jira] [Commented] (SPARK-43489) Remove protobuf 2.5.0
[ https://issues.apache.org/jira/browse/SPARK-43489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722348#comment-17722348 ]

ASF GitHub Bot commented on SPARK-43489:
----------------------------------------

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/41153

> Remove protobuf 2.5.0
> ---------------------
>
>              Key: SPARK-43489
>              URL: https://issues.apache.org/jira/browse/SPARK-43489
>          Project: Spark
>       Issue Type: Improvement
>       Components: Build
> Affects Versions: 3.5.0
>         Reporter: Cheng Pan
>         Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722349#comment-17722349 ]

ASF GitHub Bot commented on SPARK-43491:
----------------------------------------

User 'liukuijian8040' has created a pull request for this issue:
https://github.com/apache/spark/pull/41162

> In expression not compatible with EqualTo Expression
> ----------------------------------------------------
>
>              Key: SPARK-43491
>              URL: https://issues.apache.org/jira/browse/SPARK-43491
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.3.1
>         Reporter: KuijianLiu
>         Priority: Minor
>      Attachments: image-2023-05-13-13-14-55-853.png, image-2023-05-13-13-15-50-685.png
>
> The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which differs from the {{=}} operator, while Hive evaluates it as true. Hive 3.1.0 applies coercion to the {{in}} keyword, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> +----+
> |test|
> +----+
> |   1|
> +----+
>
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> +----+
> |test|
> +----+
> +----+{code}
> !image-2023-05-13-13-15-50-685.png!
> !image-2023-05-13-13-14-55-853.png!
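The discrepancy above comes down to implicit type coercion: for the {{=}} comparison both operands are cast to a common numeric type, while the reported {{In}} behavior compares the raw values. A minimal Python sketch of the two behaviors (illustrative only; the function names are invented and this is not Spark's actual implementation):

```python
def coerced_equals(left, right):
    # BinaryComparison-style: cast both sides to a common numeric type
    # before comparing, so 0 = '00' holds.
    return float(left) == float(right)

def uncoerced_in(value, candidates):
    # In-style without coercion: raw equality, so 0 in ('00') fails
    # because the integer 0 never equals the string '00'.
    return any(value == c for c in candidates)

print(coerced_equals(0, '00'))   # True  (both sides become 0.0)
print(uncoerced_in(0, ('00',)))  # False (int vs. str, no cast)
```

The proposed fix is to make {{In}} use the same coercion path as {{EqualTo}} when the element types allow it.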
[jira] [Commented] (SPARK-43485) Confused errors from the DATEADD function
[ https://issues.apache.org/jira/browse/SPARK-43485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722351#comment-17722351 ]

BingKun Pan commented on SPARK-43485:
-------------------------------------

Can I do it? [~maxgekk]

> Confused errors from the DATEADD function
> -----------------------------------------
>
>              Key: SPARK-43485
>              URL: https://issues.apache.org/jira/browse/SPARK-43485
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Assignee: Max Gekk
>         Priority: Major
>
> The following code example demonstrates the issue:
> {code:sql}
> spark-sql (default)> select dateadd('MONTH', 1, date'2023-05-11');
> [WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `dateadd` requires 2 parameters but the actual number is 3. Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix.; line 1 pos 7
> {code}
> The error complains about the number of arguments passed to DATEADD, but the actual problem is the type of the first argument.
[jira] [Commented] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
[ https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722350#comment-17722350 ]

ASF GitHub Bot commented on SPARK-43487:
----------------------------------------

User 'johanl-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41155

> Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
> -------------------------------------------------------------------------
>
>              Key: SPARK-43487
>              URL: https://issues.apache.org/jira/browse/SPARK-43487
>          Project: Spark
>       Issue Type: Improvement
>       Components: Spark Core
> Affects Versions: 3.4.0
>         Reporter: Johan Lasperas
>         Priority: Minor
>
> The batch of errors migrated to error classes as part of SPARK-40540 contains an error that got mixed up with the wrong error message: [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] uses the same error message as the following commandUnsupportedInV2TableError:
> {code:java}
> WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2;
> AnalysisException: t is not supported for v2 tables
> {code}
> The error should be:
> {code:java}
> AnalysisException: Name t is ambiguous in nested CTE.
> Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.{code}
[jira] (SPARK-43485) Confused errors from the DATEADD function
[ https://issues.apache.org/jira/browse/SPARK-43485 ]

BingKun Pan deleted comment on SPARK-43485:
-------------------------------------------

was (Author: panbingkun): Can I do it? [~maxgekk]

> Confused errors from the DATEADD function
> -----------------------------------------
>
>              Key: SPARK-43485
>              URL: https://issues.apache.org/jira/browse/SPARK-43485
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Assignee: Max Gekk
>         Priority: Major
>
> The following code example demonstrates the issue:
> {code:sql}
> spark-sql (default)> select dateadd('MONTH', 1, date'2023-05-11');
> [WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `dateadd` requires 2 parameters but the actual number is 3. Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix.; line 1 pos 7
> {code}
> The error complains about the number of arguments passed to DATEADD, but the actual problem is the type of the first argument.
[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KuijianLiu updated SPARK-43491:
-------------------------------
    Issue Type: Bug  (was: Improvement)

> In expression not compatible with EqualTo Expression
> ----------------------------------------------------
>
>              Key: SPARK-43491
>              URL: https://issues.apache.org/jira/browse/SPARK-43491
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.3.1
>         Reporter: KuijianLiu
>         Priority: Minor
>      Attachments: image-2023-05-13-13-14-55-853.png, image-2023-05-13-13-15-50-685.png
>
> The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which differs from the {{=}} operator, while Hive evaluates it as true. Hive 3.1.0 applies coercion to the {{in}} keyword, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> +----+
> |test|
> +----+
> |   1|
> +----+
>
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> +----+
> |test|
> +----+
> +----+{code}
> !image-2023-05-13-13-15-50-685.png!
> !image-2023-05-13-13-14-55-853.png!
[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KuijianLiu updated SPARK-43491:
-------------------------------
    Priority: Major  (was: Minor)

> In expression not compatible with EqualTo Expression
> ----------------------------------------------------
>
>              Key: SPARK-43491
>              URL: https://issues.apache.org/jira/browse/SPARK-43491
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.3.1
>         Reporter: KuijianLiu
>         Priority: Major
>      Attachments: image-2023-05-13-13-14-55-853.png, image-2023-05-13-13-15-50-685.png
>
> The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which differs from the {{=}} operator, while Hive evaluates it as true. Hive 3.1.0 applies coercion to the {{in}} keyword, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> +----+
> |test|
> +----+
> |   1|
> +----+
>
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> +----+
> |test|
> +----+
> +----+{code}
> !image-2023-05-13-13-15-50-685.png!
> !image-2023-05-13-13-14-55-853.png!
[jira] [Created] (SPARK-43492) Define the DATE_ADD and DATE_DIFF functions with 3-args
Max Gekk created SPARK-43492:
--------------------------------

         Summary: Define the DATE_ADD and DATE_DIFF functions with 3-args
             Key: SPARK-43492
             URL: https://issues.apache.org/jira/browse/SPARK-43492
         Project: Spark
      Issue Type: Improvement
      Components: SQL
Affects Versions: 3.4.0
        Reporter: Max Gekk
        Assignee: Max Gekk

Spark supports the DATE_ADD and DATE_DIFF functions with 2 arguments, but when a user calls the same functions with 3 arguments, Spark SQL outputs a confusing error:
{code:sql}
spark-sql (default)> select date_add(MONTH, 1, date'2023-05-13');
[UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `MONTH` cannot be resolved. ; line 1 pos 16;
'Project [unresolvedalias('date_add('MONTH, 1, 2023-05-13), None)]
+- OneRowRelation
{code}
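The request above amounts to resolving date_add by argument count: the existing two-argument form adds days, while a three-argument form would take a unit keyword first. A rough Python sketch of that dispatch (assumed semantics for illustration, supporting only DAY and MONTH and not clamping end-of-month dates; this is not Spark's resolver):

```python
from datetime import date

def date_add(*args):
    """2-arg: date_add(start, days). 3-arg: date_add(unit, amount, start).
    Hypothetical dispatch mirroring the proposed SQL overloads."""
    if len(args) == 2:
        start, days = args
        return date.fromordinal(start.toordinal() + days)
    if len(args) == 3:
        unit, amount, start = args
        if unit.upper() == "DAY":
            return date.fromordinal(start.toordinal() + amount)
        if unit.upper() == "MONTH":
            # Shift the month index, carrying into the year as needed.
            m = start.month - 1 + amount
            return start.replace(year=start.year + m // 12, month=m % 12 + 1)
        raise ValueError(f"unsupported unit: {unit}")
    raise TypeError(f"date_add requires 2 or 3 arguments, got {len(args)}")

print(date_add("MONTH", 1, date(2023, 5, 13)))  # 2023-06-13
```

With argument-count dispatch in place, a bad first argument in the three-argument form can be reported as a type error rather than an unresolved-column error.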
[jira] [Created] (SPARK-43493) Add a max distance argument to the levenshtein() function.
Max Gekk created SPARK-43493:
--------------------------------

         Summary: Add a max distance argument to the levenshtein() function.
             Key: SPARK-43493
             URL: https://issues.apache.org/jira/browse/SPARK-43493
         Project: Spark
      Issue Type: New Feature
      Components: SQL
Affects Versions: 3.4.0
        Reporter: Max Gekk

Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that support this type of built-in function also take a third argument: a maximum distance after which it is okay to terminate the algorithm. For example:
{code:sql}
levenshtein(str1, str2[, max_distance])
{code}
The function stops computing the distance once the maximum value is reached.
See PostgreSQL for an example of a 3-argument [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].
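The early-termination idea can be sketched in plain Python (illustrative only, not Spark's implementation): the standard dynamic-programming loop aborts as soon as every entry in the current row exceeds max_distance, since the distance can only grow from there. This is what makes the three-argument form cheap on long, dissimilar strings.

```python
def levenshtein(s1, s2, max_distance=None):
    """Edit distance between s1 and s2; returns max_distance + 1 early
    once the distance provably exceeds max_distance (assumed semantics
    of the proposed 3-arg form)."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (c1 != c2)))   # substitution
        if max_distance is not None and min(curr) > max_distance:
            return max_distance + 1  # every path already too expensive
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))     # 3
print(levenshtein("kitten", "sitting", 1))  # 2 (capped: distance > 1)
```

PostgreSQL's levenshtein_less_equal follows the same principle; the cap lets the engine skip most of the O(len1 * len2) table for clearly distant strings.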
[jira] [Updated] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43493:
-----------------------------
    Summary: Add a max distance argument to the levenshtein() function  (was: Add a max distance argument to the levenshtein() function.)

> Add a max distance argument to the levenshtein() function
> ---------------------------------------------------------
>
>              Key: SPARK-43493
>              URL: https://issues.apache.org/jira/browse/SPARK-43493
>          Project: Spark
>       Issue Type: New Feature
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Priority: Major
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that support this type of built-in function also take a third argument: a maximum distance after which it is okay to terminate the algorithm. For example:
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> The function stops computing the distance once the maximum value is reached.
> See PostgreSQL for an example of a 3-argument [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722426#comment-17722426 ]

BingKun Pan commented on SPARK-43493:
-------------------------------------

[~maxgekk] Can I try to do it?

> Add a max distance argument to the levenshtein() function
> ---------------------------------------------------------
>
>              Key: SPARK-43493
>              URL: https://issues.apache.org/jira/browse/SPARK-43493
>          Project: Spark
>       Issue Type: New Feature
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Priority: Major
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that support this type of built-in function also take a third argument: a maximum distance after which it is okay to terminate the algorithm. For example:
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> The function stops computing the distance once the maximum value is reached.
> See PostgreSQL for an example of a 3-argument [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].
[jira] [Updated] (SPARK-41401) spark3 stagedir can't be change
[ https://issues.apache.org/jira/browse/SPARK-41401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Fan updated SPARK-41401:
----------------------------
    Summary: spark3 stagedir can't be change  (was: spark2 stagedir can't be change)

> spark3 stagedir can't be change
> -------------------------------
>
>              Key: SPARK-41401
>              URL: https://issues.apache.org/jira/browse/SPARK-41401
>          Project: Spark
>       Issue Type: Bug
>       Components: Spark Core
> Affects Versions: 3.2.2, 3.2.3
>         Reporter: sinlang
>         Priority: Major
>
> I want to use a different staging dir when writing temporary data, but Spark 3 seems to only write under the table path. The spark.yarn.stagingDir parameter only works with Spark 2.
>
> In the org.apache.spark.internal.io.FileCommitProtocol file:
> {code:java}
> def getStagingDir(path: String, jobId: String): Path = {
>   new Path(path, ".spark-staging-" + jobId)
> }
> {code}
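The snippet quoted in the report shows why: getStagingDir unconditionally builds the staging path under the table path, ignoring any configured staging directory. A hypothetical variant of the logic, sketched in Python (the parameter names are invented for illustration; this is not the actual Spark fix):

```python
def get_staging_dir(table_path, job_id, configured_staging_dir=None):
    """Hypothetical behavior: honor a configured staging directory
    (e.g. spark.yarn.stagingDir) and fall back to the table path only
    when none is set, instead of always using the table path."""
    base = configured_staging_dir if configured_staging_dir else table_path
    return f"{base}/.spark-staging-{job_id}"

print(get_staging_dir("/warehouse/t1", "job-1"))
print(get_staging_dir("/warehouse/t1", "job-1", "/tmp/stage"))
```

Under the current code, only the first form is possible, which matches the reporter's observation that the setting has no effect in Spark 3.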
[jira] [Commented] (SPARK-43492) Define the DATE_ADD and DATE_DIFF functions with 3-args
[ https://issues.apache.org/jira/browse/SPARK-43492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722439#comment-17722439 ]

Jia Fan commented on SPARK-43492:
---------------------------------

I can fix this.

> Define the DATE_ADD and DATE_DIFF functions with 3-args
> -------------------------------------------------------
>
>              Key: SPARK-43492
>              URL: https://issues.apache.org/jira/browse/SPARK-43492
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Assignee: Max Gekk
>         Priority: Major
>
> Spark supports the DATE_ADD and DATE_DIFF functions with 2 arguments, but when a user calls the same functions with 3 arguments, Spark SQL outputs a confusing error:
> {code:sql}
> spark-sql (default)> select date_add(MONTH, 1, date'2023-05-13');
> [UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `MONTH` cannot be resolved. ; line 1 pos 16;
> 'Project [unresolvedalias('date_add('MONTH, 1, 2023-05-13), None)]
> +- OneRowRelation
> {code}
[jira] (SPARK-43492) Define the DATE_ADD and DATE_DIFF functions with 3-args
[ https://issues.apache.org/jira/browse/SPARK-43492 ]

Jia Fan deleted comment on SPARK-43492:
---------------------------------------

was (Author: fanjia): I can fix this.

> Define the DATE_ADD and DATE_DIFF functions with 3-args
> -------------------------------------------------------
>
>              Key: SPARK-43492
>              URL: https://issues.apache.org/jira/browse/SPARK-43492
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Assignee: Max Gekk
>         Priority: Major
>
> Spark supports the DATE_ADD and DATE_DIFF functions with 2 arguments, but when a user calls the same functions with 3 arguments, Spark SQL outputs a confusing error:
> {code:sql}
> spark-sql (default)> select date_add(MONTH, 1, date'2023-05-13');
> [UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `MONTH` cannot be resolved. ; line 1 pos 16;
> 'Project [unresolvedalias('date_add('MONTH, 1, 2023-05-13), None)]
> +- OneRowRelation
> {code}
[jira] [Commented] (SPARK-40189) Support json_array_get/json_array_length function
[ https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722443#comment-17722443 ]

Jia Fan commented on SPARK-40189:
---------------------------------

The json_array_length function already exists; it was added in SPARK-31008.

> Support json_array_get/json_array_length function
> -------------------------------------------------
>
>              Key: SPARK-40189
>              URL: https://issues.apache.org/jira/browse/SPARK-40189
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: melin
>         Priority: Major
>
> Presto provides these two frequently used functions:
> https://prestodb.io/docs/current/functions/json.html#json-functions
[jira] [Updated] (SPARK-40189) Support json_array_get function
[ https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Fan updated SPARK-40189:
----------------------------
    Summary: Support json_array_get function  (was: Support json_array_get/json_array_length function)

> Support json_array_get function
> -------------------------------
>
>              Key: SPARK-40189
>              URL: https://issues.apache.org/jira/browse/SPARK-40189
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: melin
>         Priority: Major
>
> Presto provides these two frequently used functions:
> https://prestodb.io/docs/current/functions/json.html#json-functions
[jira] [Updated] (SPARK-40189) Support json_array_get function
[ https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Fan updated SPARK-40189:
----------------------------
    Description: Presto provides the frequently used json_array_get function: [https://prestodb.io/docs/current/functions/json.html#json-functions]  (was: Presto provides these two frequently used functions: https://prestodb.io/docs/current/functions/json.html#json-functions)

> Support json_array_get function
> -------------------------------
>
>              Key: SPARK-40189
>              URL: https://issues.apache.org/jira/browse/SPARK-40189
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: melin
>         Priority: Major
>
> Presto provides the frequently used json_array_get function:
> [https://prestodb.io/docs/current/functions/json.html#json-functions]
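For reference, the Presto function being requested returns the element at a (possibly negative) index of a JSON array, or NULL when the input is not an array or the index is out of range. A stdlib Python sketch of those assumed semantics (illustrative only, not an implementation proposal):

```python
import json

def json_array_get(json_text, index):
    """Presto-style json_array_get: element at index, where a negative
    index counts from the end; None for malformed input, non-arrays,
    or an out-of-range index."""
    try:
        arr = json.loads(json_text)
    except (ValueError, TypeError):
        return None
    if not isinstance(arr, list):
        return None
    if -len(arr) <= index < len(arr):
        return arr[index]
    return None

print(json_array_get('["a", "b", "c"]', 1))   # b
print(json_array_get('["a", "b", "c"]', -1))  # c
print(json_array_get('{"k": 1}', 0))          # None
```

In Presto the returned element is itself a JSON value (so nested objects come back as JSON text), a detail any Spark implementation would also need to pin down.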
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722448#comment-17722448 ]

Max Gekk commented on SPARK-43493:
----------------------------------

[~panbingkun] Sure, go ahead.

> Add a max distance argument to the levenshtein() function
> ---------------------------------------------------------
>
>              Key: SPARK-43493
>              URL: https://issues.apache.org/jira/browse/SPARK-43493
>          Project: Spark
>       Issue Type: New Feature
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Priority: Major
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that support this type of built-in function also take a third argument: a maximum distance after which it is okay to terminate the algorithm. For example:
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> The function stops computing the distance once the maximum value is reached.
> See PostgreSQL for an example of a 3-argument [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].