[jira] [Commented] (SPARK-43489) Remove protobuf 2.5.0
[ https://issues.apache.org/jira/browse/SPARK-43489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722348#comment-17722348 ]

ASF GitHub Bot commented on SPARK-43489:
----------------------------------------

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/41153

> Remove protobuf 2.5.0
> ---------------------
>
>              Key: SPARK-43489
>              URL: https://issues.apache.org/jira/browse/SPARK-43489
>          Project: Spark
>       Issue Type: Improvement
>       Components: Build
> Affects Versions: 3.5.0
>         Reporter: Cheng Pan
>         Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722349#comment-17722349 ]

ASF GitHub Bot commented on SPARK-43491:
----------------------------------------

User 'liukuijian8040' has created a pull request for this issue:
https://github.com/apache/spark/pull/41162

> In expression not compatible with EqualTo Expression
> ----------------------------------------------------
>
>              Key: SPARK-43491
>              URL: https://issues.apache.org/jira/browse/SPARK-43491
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.3.1
>         Reporter: KuijianLiu
>         Priority: Minor
>      Attachments: image-2023-05-13-13-14-55-853.png, image-2023-05-13-13-15-50-685.png
>
> The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which differs from the {{=}} operator, while Hive evaluates it as true. Hive 3.1.0 applies coercion to the {{in}} keyword, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> +----+
> |test|
> +----+
> |   1|
> +----+
>
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> +----+
> |test|
> +----+
> +----+{code}
> !image-2023-05-13-13-15-50-685.png!
> !image-2023-05-13-13-14-55-853.png!
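The discrepancy above comes down to implicit type coercion: for the {{=}} comparison both operands are cast to a common numeric type, while the reported {{In}} behavior compares the raw values. A minimal Python sketch of the two behaviors (illustrative only; the function names are invented and this is not Spark's actual implementation):

```python
def coerced_equals(left, right):
    # BinaryComparison-style: cast both sides to a common numeric type
    # before comparing, so 0 = '00' holds.
    return float(left) == float(right)

def uncoerced_in(value, candidates):
    # In-style without coercion: raw equality, so 0 in ('00') fails
    # because the integer 0 never equals the string '00'.
    return any(value == c for c in candidates)

print(coerced_equals(0, '00'))   # True  (both sides become 0.0)
print(uncoerced_in(0, ('00',)))  # False (int vs. str, no cast)
```

The proposed fix is to make {{In}} use the same coercion path as {{EqualTo}} when the element types allow it.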
[jira] [Commented] (SPARK-43485) Confused errors from the DATEADD function
[ https://issues.apache.org/jira/browse/SPARK-43485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722351#comment-17722351 ]

BingKun Pan commented on SPARK-43485:
-------------------------------------

Can I do it? [~maxgekk]

> Confused errors from the DATEADD function
> -----------------------------------------
>
>              Key: SPARK-43485
>              URL: https://issues.apache.org/jira/browse/SPARK-43485
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Assignee: Max Gekk
>         Priority: Major
>
> The following code example demonstrates the issue:
> {code:sql}
> spark-sql (default)> select dateadd('MONTH', 1, date'2023-05-11');
> [WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `dateadd` requires 2 parameters but the actual number is 3. Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix.; line 1 pos 7
> {code}
> The error complains about the number of arguments passed to DATEADD, but the actual problem is the type of the first argument.
[jira] [Commented] (SPARK-43487) Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
[ https://issues.apache.org/jira/browse/SPARK-43487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722350#comment-17722350 ]

ASF GitHub Bot commented on SPARK-43487:
----------------------------------------

User 'johanl-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41155

> Wrong error message used for `ambiguousRelationAliasNameInNestedCTEError`
> -------------------------------------------------------------------------
>
>              Key: SPARK-43487
>              URL: https://issues.apache.org/jira/browse/SPARK-43487
>          Project: Spark
>       Issue Type: Improvement
>       Components: Spark Core
> Affects Versions: 3.4.0
>         Reporter: Johan Lasperas
>         Priority: Minor
>
> The batch of errors migrated to error classes as part of SPARK-40540 contains an error that got mixed up with the wrong error message: [ambiguousRelationAliasNameInNestedCTEError|https://github.com/apache/spark/commit/43a6b932759865c45ccf36f3e9cf6898c1b762da#diff-744ac13f6fe074fddeab09b407404bffa2386f54abc83c501e6e1fe618f6db56R1983] uses the same error message as the following commandUnsupportedInV2TableError:
> {code:java}
> WITH t AS (SELECT 1), t2 AS ( WITH t AS (SELECT 2) SELECT * FROM t) SELECT * FROM t2;
> AnalysisException: t is not supported for v2 tables
> {code}
> The error should be:
> {code:java}
> AnalysisException: Name t is ambiguous in nested CTE.
> Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.{code}
[jira] (SPARK-43485) Confused errors from the DATEADD function
[ https://issues.apache.org/jira/browse/SPARK-43485 ]

BingKun Pan deleted comment on SPARK-43485:
-------------------------------------------

was (Author: panbingkun): Can I do it? [~maxgekk]

> Confused errors from the DATEADD function
> -----------------------------------------
>
>              Key: SPARK-43485
>              URL: https://issues.apache.org/jira/browse/SPARK-43485
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Assignee: Max Gekk
>         Priority: Major
>
> The following code example demonstrates the issue:
> {code:sql}
> spark-sql (default)> select dateadd('MONTH', 1, date'2023-05-11');
> [WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `dateadd` requires 2 parameters but the actual number is 3. Please, refer to 'https://spark.apache.org/docs/latest/sql-ref-functions.html' for a fix.; line 1 pos 7
> {code}
> The error complains about the number of arguments passed to DATEADD, but the actual problem is the type of the first argument.
[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KuijianLiu updated SPARK-43491:
-------------------------------
    Issue Type: Bug  (was: Improvement)

> In expression not compatible with EqualTo Expression
> ----------------------------------------------------
>
>              Key: SPARK-43491
>              URL: https://issues.apache.org/jira/browse/SPARK-43491
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.3.1
>         Reporter: KuijianLiu
>         Priority: Minor
>      Attachments: image-2023-05-13-13-14-55-853.png, image-2023-05-13-13-15-50-685.png
>
> The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which differs from the {{=}} operator, while Hive evaluates it as true. Hive 3.1.0 applies coercion to the {{in}} keyword, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> +----+
> |test|
> +----+
> |   1|
> +----+
>
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> +----+
> |test|
> +----+
> +----+{code}
> !image-2023-05-13-13-15-50-685.png!
> !image-2023-05-13-13-14-55-853.png!
[jira] [Updated] (SPARK-43491) In expression not compatible with EqualTo Expression
[ https://issues.apache.org/jira/browse/SPARK-43491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KuijianLiu updated SPARK-43491:
-------------------------------
    Priority: Major  (was: Minor)

> In expression not compatible with EqualTo Expression
> ----------------------------------------------------
>
>              Key: SPARK-43491
>              URL: https://issues.apache.org/jira/browse/SPARK-43491
>          Project: Spark
>       Issue Type: Bug
>       Components: SQL
> Affects Versions: 3.3.1
>         Reporter: KuijianLiu
>         Priority: Major
>      Attachments: image-2023-05-13-13-14-55-853.png, image-2023-05-13-13-15-50-685.png
>
> The query results of Spark SQL 3.1.1 and Hive SQL 3.1.0 are inconsistent for the same SQL. Spark SQL evaluates {{0 in ('00')}} as false, which differs from the {{=}} operator, while Hive evaluates it as true. Hive 3.1.0 applies coercion to the {{in}} keyword, but Spark SQL does not.
> When the data types of the elements in an {{In}} expression are the same, it should behave the same as a BinaryComparison such as {{EqualTo}}.
> Test SQL:
> {code:java}
> scala> spark.sql("select 1 as test where 0 = '00'").show
> +----+
> |test|
> +----+
> |   1|
> +----+
>
> scala> spark.sql("select 1 as test where 0 in ('00')").show
> +----+
> |test|
> +----+
> +----+{code}
> !image-2023-05-13-13-15-50-685.png!
> !image-2023-05-13-13-14-55-853.png!
[jira] [Created] (SPARK-43492) Define the DATE_ADD and DATE_DIFF functions with 3-args
Max Gekk created SPARK-43492:
--------------------------------

         Summary: Define the DATE_ADD and DATE_DIFF functions with 3-args
             Key: SPARK-43492
             URL: https://issues.apache.org/jira/browse/SPARK-43492
         Project: Spark
      Issue Type: Improvement
      Components: SQL
Affects Versions: 3.4.0
        Reporter: Max Gekk
        Assignee: Max Gekk

Spark supports the DATE_ADD and DATE_DIFF functions with 2 arguments, but when a user calls the same functions with 3 arguments, Spark SQL outputs a confusing error:
{code:sql}
spark-sql (default)> select date_add(MONTH, 1, date'2023-05-13');
[UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `MONTH` cannot be resolved. ; line 1 pos 16;
'Project [unresolvedalias('date_add('MONTH, 1, 2023-05-13), None)]
+- OneRowRelation
{code}
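The request above amounts to resolving date_add by argument count: the existing two-argument form adds days, while a three-argument form would take a unit keyword first. A rough Python sketch of that dispatch (assumed semantics for illustration, supporting only DAY and MONTH and not clamping end-of-month dates; this is not Spark's resolver):

```python
from datetime import date

def date_add(*args):
    """2-arg: date_add(start, days). 3-arg: date_add(unit, amount, start).
    Hypothetical dispatch mirroring the proposed SQL overloads."""
    if len(args) == 2:
        start, days = args
        return date.fromordinal(start.toordinal() + days)
    if len(args) == 3:
        unit, amount, start = args
        if unit.upper() == "DAY":
            return date.fromordinal(start.toordinal() + amount)
        if unit.upper() == "MONTH":
            # Shift the month index, carrying into the year as needed.
            m = start.month - 1 + amount
            return start.replace(year=start.year + m // 12, month=m % 12 + 1)
        raise ValueError(f"unsupported unit: {unit}")
    raise TypeError(f"date_add requires 2 or 3 arguments, got {len(args)}")

print(date_add("MONTH", 1, date(2023, 5, 13)))  # 2023-06-13
```

With argument-count dispatch in place, a bad first argument in the three-argument form can be reported as a type error rather than an unresolved-column error.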
[jira] [Created] (SPARK-43493) Add a max distance argument to the levenshtein() function.
Max Gekk created SPARK-43493:
--------------------------------

         Summary: Add a max distance argument to the levenshtein() function.
             Key: SPARK-43493
             URL: https://issues.apache.org/jira/browse/SPARK-43493
         Project: Spark
      Issue Type: New Feature
      Components: SQL
Affects Versions: 3.4.0
        Reporter: Max Gekk

Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that support this type of built-in function also take a third argument: a maximum distance after which it is okay to terminate the algorithm. For example:
{code:sql}
levenshtein(str1, str2[, max_distance])
{code}
The function stops computing the distance once the maximum value is reached.
See PostgreSQL for an example of a 3-argument [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].
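The early-termination idea can be sketched in plain Python (illustrative only, not Spark's implementation): the standard dynamic-programming loop aborts as soon as every entry in the current row exceeds max_distance, since the distance can only grow from there. This is what makes the three-argument form cheap on long, dissimilar strings.

```python
def levenshtein(s1, s2, max_distance=None):
    """Edit distance between s1 and s2; returns max_distance + 1 early
    once the distance provably exceeds max_distance (assumed semantics
    of the proposed 3-arg form)."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (c1 != c2)))   # substitution
        if max_distance is not None and min(curr) > max_distance:
            return max_distance + 1  # every path already too expensive
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))     # 3
print(levenshtein("kitten", "sitting", 1))  # 2 (capped: distance > 1)
```

PostgreSQL's levenshtein_less_equal follows the same principle; the cap lets the engine skip most of the O(len1 * len2) table for clearly distant strings.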
[jira] [Updated] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-43493:
-----------------------------
    Summary: Add a max distance argument to the levenshtein() function  (was: Add a max distance argument to the levenshtein() function.)

> Add a max distance argument to the levenshtein() function
> ---------------------------------------------------------
>
>              Key: SPARK-43493
>              URL: https://issues.apache.org/jira/browse/SPARK-43493
>          Project: Spark
>       Issue Type: New Feature
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Priority: Major
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that support this type of built-in function also take a third argument: a maximum distance after which it is okay to terminate the algorithm. For example:
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> The function stops computing the distance once the maximum value is reached.
> See PostgreSQL for an example of a 3-argument [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722426#comment-17722426 ]

BingKun Pan commented on SPARK-43493:
-------------------------------------

[~maxgekk] Can I try to do it?

> Add a max distance argument to the levenshtein() function
> ---------------------------------------------------------
>
>              Key: SPARK-43493
>              URL: https://issues.apache.org/jira/browse/SPARK-43493
>          Project: Spark
>       Issue Type: New Feature
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Priority: Major
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that support this type of built-in function also take a third argument: a maximum distance after which it is okay to terminate the algorithm. For example:
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> The function stops computing the distance once the maximum value is reached.
> See PostgreSQL for an example of a 3-argument [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].
[jira] [Updated] (SPARK-41401) spark3 stagedir can't be change
[ https://issues.apache.org/jira/browse/SPARK-41401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Fan updated SPARK-41401:
----------------------------
    Summary: spark3 stagedir can't be change  (was: spark2 stagedir can't be change)

> spark3 stagedir can't be change
> -------------------------------
>
>              Key: SPARK-41401
>              URL: https://issues.apache.org/jira/browse/SPARK-41401
>          Project: Spark
>       Issue Type: Bug
>       Components: Spark Core
> Affects Versions: 3.2.2, 3.2.3
>         Reporter: sinlang
>         Priority: Major
>
> I want to use a different staging dir when writing temporary data, but Spark 3 seems to only write under the table path. The spark.yarn.stagingDir parameter only works with Spark 2.
>
> In the org.apache.spark.internal.io.FileCommitProtocol file:
> {code:java}
> def getStagingDir(path: String, jobId: String): Path = {
>   new Path(path, ".spark-staging-" + jobId)
> }
> {code}
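The snippet quoted in the report shows why: getStagingDir unconditionally builds the staging path under the table path, ignoring any configured staging directory. A hypothetical variant of the logic, sketched in Python (the parameter names are invented for illustration; this is not the actual Spark fix):

```python
def get_staging_dir(table_path, job_id, configured_staging_dir=None):
    """Hypothetical behavior: honor a configured staging directory
    (e.g. spark.yarn.stagingDir) and fall back to the table path only
    when none is set, instead of always using the table path."""
    base = configured_staging_dir if configured_staging_dir else table_path
    return f"{base}/.spark-staging-{job_id}"

print(get_staging_dir("/warehouse/t1", "job-1"))
print(get_staging_dir("/warehouse/t1", "job-1", "/tmp/stage"))
```

Under the current code, only the first form is possible, which matches the reporter's observation that the setting has no effect in Spark 3.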
[jira] [Commented] (SPARK-43492) Define the DATE_ADD and DATE_DIFF functions with 3-args
[ https://issues.apache.org/jira/browse/SPARK-43492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722439#comment-17722439 ]

Jia Fan commented on SPARK-43492:
---------------------------------

I can fix this.

> Define the DATE_ADD and DATE_DIFF functions with 3-args
> -------------------------------------------------------
>
>              Key: SPARK-43492
>              URL: https://issues.apache.org/jira/browse/SPARK-43492
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Assignee: Max Gekk
>         Priority: Major
>
> Spark supports the DATE_ADD and DATE_DIFF functions with 2 arguments, but when a user calls the same functions with 3 arguments, Spark SQL outputs a confusing error:
> {code:sql}
> spark-sql (default)> select date_add(MONTH, 1, date'2023-05-13');
> [UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `MONTH` cannot be resolved. ; line 1 pos 16;
> 'Project [unresolvedalias('date_add('MONTH, 1, 2023-05-13), None)]
> +- OneRowRelation
> {code}
[jira] (SPARK-43492) Define the DATE_ADD and DATE_DIFF functions with 3-args
[ https://issues.apache.org/jira/browse/SPARK-43492 ]

Jia Fan deleted comment on SPARK-43492:
---------------------------------------

was (Author: fanjia): I can fix this.

> Define the DATE_ADD and DATE_DIFF functions with 3-args
> -------------------------------------------------------
>
>              Key: SPARK-43492
>              URL: https://issues.apache.org/jira/browse/SPARK-43492
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Assignee: Max Gekk
>         Priority: Major
>
> Spark supports the DATE_ADD and DATE_DIFF functions with 2 arguments, but when a user calls the same functions with 3 arguments, Spark SQL outputs a confusing error:
> {code:sql}
> spark-sql (default)> select date_add(MONTH, 1, date'2023-05-13');
> [UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column or function parameter with name `MONTH` cannot be resolved. ; line 1 pos 16;
> 'Project [unresolvedalias('date_add('MONTH, 1, 2023-05-13), None)]
> +- OneRowRelation
> {code}
[jira] [Commented] (SPARK-40189) Support json_array_get/json_array_length function
[ https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722443#comment-17722443 ]

Jia Fan commented on SPARK-40189:
---------------------------------

The json_array_length function already exists; it was added in SPARK-31008.

> Support json_array_get/json_array_length function
> -------------------------------------------------
>
>              Key: SPARK-40189
>              URL: https://issues.apache.org/jira/browse/SPARK-40189
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: melin
>         Priority: Major
>
> Presto provides these two frequently used functions:
> https://prestodb.io/docs/current/functions/json.html#json-functions
[jira] [Updated] (SPARK-40189) Support json_array_get function
[ https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Fan updated SPARK-40189:
----------------------------
    Summary: Support json_array_get function  (was: Support json_array_get/json_array_length function)

> Support json_array_get function
> -------------------------------
>
>              Key: SPARK-40189
>              URL: https://issues.apache.org/jira/browse/SPARK-40189
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: melin
>         Priority: Major
>
> Presto provides these two frequently used functions:
> https://prestodb.io/docs/current/functions/json.html#json-functions
[jira] [Updated] (SPARK-40189) Support json_array_get function
[ https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Fan updated SPARK-40189:
----------------------------
    Description: Presto provides the frequently used json_array_get function: [https://prestodb.io/docs/current/functions/json.html#json-functions]  (was: Presto provides these two frequently used functions: https://prestodb.io/docs/current/functions/json.html#json-functions)

> Support json_array_get function
> -------------------------------
>
>              Key: SPARK-40189
>              URL: https://issues.apache.org/jira/browse/SPARK-40189
>          Project: Spark
>       Issue Type: Improvement
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: melin
>         Priority: Major
>
> Presto provides the frequently used json_array_get function:
> [https://prestodb.io/docs/current/functions/json.html#json-functions]
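For reference, the Presto function being requested returns the element at a (possibly negative) index of a JSON array, or NULL when the input is not an array or the index is out of range. A stdlib Python sketch of those assumed semantics (illustrative only, not an implementation proposal):

```python
import json

def json_array_get(json_text, index):
    """Presto-style json_array_get: element at index, where a negative
    index counts from the end; None for malformed input, non-arrays,
    or an out-of-range index."""
    try:
        arr = json.loads(json_text)
    except (ValueError, TypeError):
        return None
    if not isinstance(arr, list):
        return None
    if -len(arr) <= index < len(arr):
        return arr[index]
    return None

print(json_array_get('["a", "b", "c"]', 1))   # b
print(json_array_get('["a", "b", "c"]', -1))  # c
print(json_array_get('{"k": 1}', 0))          # None
```

In Presto the returned element is itself a JSON value (so nested objects come back as JSON text), a detail any Spark implementation would also need to pin down.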
[jira] [Commented] (SPARK-43493) Add a max distance argument to the levenshtein() function
[ https://issues.apache.org/jira/browse/SPARK-43493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722448#comment-17722448 ]

Max Gekk commented on SPARK-43493:
----------------------------------

[~panbingkun] Sure, go ahead.

> Add a max distance argument to the levenshtein() function
> ---------------------------------------------------------
>
>              Key: SPARK-43493
>              URL: https://issues.apache.org/jira/browse/SPARK-43493
>          Project: Spark
>       Issue Type: New Feature
>       Components: SQL
> Affects Versions: 3.4.0
>         Reporter: Max Gekk
>         Priority: Major
>
> Currently, Spark's levenshtein(str1, str2) function can be very inefficient for long strings. Many other databases that support this type of built-in function also take a third argument: a maximum distance after which it is okay to terminate the algorithm. For example:
> {code:sql}
> levenshtein(str1, str2[, max_distance])
> {code}
> The function stops computing the distance once the maximum value is reached.
> See PostgreSQL for an example of a 3-argument [levenshtein|https://www.postgresql.org/docs/current/fuzzystrmatch.html#id-1.11.7.26.7].