[jira] [Commented] (SPARK-40173) Make pyspark.taskcontext examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583363#comment-17583363 ] Apache Spark commented on SPARK-40173: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/37623 > Make pyspark.taskcontext examples self-contained > > > Key: SPARK-40173 > URL: https://issues.apache.org/jira/browse/SPARK-40173 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark, Spark Core >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40173) Make pyspark.taskcontext examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40173: Assignee: (was: Apache Spark) > Make pyspark.taskcontext examples self-contained > > > Key: SPARK-40173 > URL: https://issues.apache.org/jira/browse/SPARK-40173 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark, Spark Core >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40173) Make pyspark.taskcontext examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583361#comment-17583361 ] Apache Spark commented on SPARK-40173: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/37623 > Make pyspark.taskcontext examples self-contained > > > Key: SPARK-40173 > URL: https://issues.apache.org/jira/browse/SPARK-40173 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark, Spark Core >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40173) Make pyspark.taskcontext examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40173: Assignee: Apache Spark > Make pyspark.taskcontext examples self-contained > > > Key: SPARK-40173 > URL: https://issues.apache.org/jira/browse/SPARK-40173 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark, Spark Core >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40187) Add doc for using Apache YuniKorn as a customized scheduler
[ https://issues.apache.org/jira/browse/SPARK-40187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583352#comment-17583352 ] Apache Spark commented on SPARK-40187: -- User 'yangwwei' has created a pull request for this issue: https://github.com/apache/spark/pull/37622 > Add doc for using Apache YuniKorn as a customized scheduler > --- > > Key: SPARK-40187 > URL: https://issues.apache.org/jira/browse/SPARK-40187 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Weiwei Yang >Priority: Major > > Add a section under > https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes > to explain how to run Spark with Apache YuniKorn. This is based on [this PR > review > comment|https://github.com/apache/spark/pull/35663#issuecomment-1220474012]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40187) Add doc for using Apache YuniKorn as a customized scheduler
[ https://issues.apache.org/jira/browse/SPARK-40187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40187: Assignee: (was: Apache Spark) > Add doc for using Apache YuniKorn as a customized scheduler > --- > > Key: SPARK-40187 > URL: https://issues.apache.org/jira/browse/SPARK-40187 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Weiwei Yang >Priority: Major > > Add a section under > https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes > to explain how to run Spark with Apache YuniKorn. This is based on [this PR > review > comment|https://github.com/apache/spark/pull/35663#issuecomment-1220474012]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40187) Add doc for using Apache YuniKorn as a customized scheduler
[ https://issues.apache.org/jira/browse/SPARK-40187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40187: Assignee: Apache Spark > Add doc for using Apache YuniKorn as a customized scheduler > --- > > Key: SPARK-40187 > URL: https://issues.apache.org/jira/browse/SPARK-40187 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Weiwei Yang >Assignee: Apache Spark >Priority: Major > > Add a section under > https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes > to explain how to run Spark with Apache YuniKorn. This is based on [this PR > review > comment|https://github.com/apache/spark/pull/35663#issuecomment-1220474012]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40187) Add doc for using Apache YuniKorn as a customized scheduler
Weiwei Yang created SPARK-40187: --- Summary: Add doc for using Apache YuniKorn as a customized scheduler Key: SPARK-40187 URL: https://issues.apache.org/jira/browse/SPARK-40187 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.3.0 Reporter: Weiwei Yang Add a section under https://spark.apache.org/docs/latest/running-on-kubernetes.html#customized-kubernetes-schedulers-for-spark-on-kubernetes to explain how to run Spark with Apache YuniKorn. This is based on [this PR review comment|https://github.com/apache/spark/pull/35663#issuecomment-1220474012]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
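For readers following along, a minimal sketch of what pointing Spark at a custom Kubernetes scheduler looks like, assuming the spark.kubernetes.scheduler.name conf that the linked customized-schedulers section describes; a real YuniKorn setup also needs YuniKorn deployed on the cluster and, typically, queue-related pod annotations:
{code:scala}
import org.apache.spark.SparkConf

// Route driver and executor pods to a custom scheduler (here: YuniKorn).
// Equivalent to passing --conf spark.kubernetes.scheduler.name=yunikorn to spark-submit.
val conf = new SparkConf()
  .set("spark.kubernetes.scheduler.name", "yunikorn")
{code}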
[jira] [Created] (SPARK-40186) mergedShuffleCleaner should have been shutdown before db closed
Yang Jie created SPARK-40186: Summary: mergedShuffleCleaner should have been shutdown before db closed Key: SPARK-40186 URL: https://issues.apache.org/jira/browse/SPARK-40186 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: Yang Jie We should ensure that `RemoteBlockPushResolver#mergedShuffleCleaner` has been shut down before `RemoteBlockPushResolver#db` is closed; otherwise, `RemoteBlockPushResolver#applicationRemoved` may perform delete operations on a closed db. https://github.com/apache/spark/pull/37610#discussion_r951185256 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
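A minimal sketch of the intended shutdown ordering; the real code is Java inside RemoteBlockPushResolver, and the helper name and timeout here are illustrative:
{code:scala}
import java.util.concurrent.{ExecutorService, TimeUnit}

// Stop the cleaner and wait for in-flight cleanup tasks before closing the db,
// so that no task can issue deletes against a closed handle.
def shutdownThenClose(mergedShuffleCleaner: ExecutorService, db: AutoCloseable): Unit = {
  mergedShuffleCleaner.shutdown()
  if (!mergedShuffleCleaner.awaitTermination(10, TimeUnit.SECONDS)) {
    mergedShuffleCleaner.shutdownNow() // give up on stragglers after the grace period
  }
  db.close() // safe now: no cleanup task can still touch the db
}
{code}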
[jira] [Commented] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583315#comment-17583315 ] Vivek Garg commented on SPARK-22588: We offer comprehensive [Splunk online training|https://www.igmguru.com/big-data/splunk-training/] that also covers a variety of administrative and support options.
> SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
> -
>
> Key: SPARK-22588
> URL: https://issues.apache.org/jira/browse/SPARK-22588
> Project: Spark
> Issue Type: Question
> Components: Deploy
> Affects Versions: 2.1.1
> Reporter: Saanvi Sharma
> Priority: Minor
> Labels: dynamodb, spark
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I am using Spark 2.1 on EMR and I have a dataframe like this:
> ClientNum | Value_1 | Value_2 | Value_3 | Value_4
> 14        | A       | B       | C       | null
> 19        | X       | Y       | null    | null
> 21        | R       | null    | null    | null
> I want to load the data into a DynamoDB table with ClientNum as the key, following the examples "Analyze Your Data on Amazon DynamoDB with Apache Spark" and "Using Spark SQL for ETL".
> Here is my code that I tried:
> var jobConf = new JobConf(sc.hadoopConfiguration)
> jobConf.set("dynamodb.servicename", "dynamodb")
> jobConf.set("dynamodb.input.tableName", "table_name")
> jobConf.set("dynamodb.output.tableName", "table_name")
> jobConf.set("dynamodb.endpoint", "dynamodb.eu-west-1.amazonaws.com")
> jobConf.set("dynamodb.regionid", "eu-west-1")
> jobConf.set("dynamodb.throughput.read", "1")
> jobConf.set("dynamodb.throughput.read.percent", "1")
> jobConf.set("dynamodb.throughput.write", "1")
> jobConf.set("dynamodb.throughput.write.percent", "1")
> jobConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
> jobConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
> #Import Data
> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(path)
> I performed a transformation to have an RDD that matches the types that the DynamoDB custom output format knows how to write. The custom output format expects a tuple containing the Text and DynamoDBItemWritable types.
> Create a new RDD with those types in it, in the following map call:
> #Convert the dataframe to rdd
> val df_rdd = df.rdd
> df_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[10] at rdd at <console>:41
> #Print first rdd
> df_rdd.take(1)
> res12: Array[org.apache.spark.sql.Row] = Array([14,A,B,C,null])
> var ddbInsertFormattedRDD = df_rdd.map(a => {
>   var ddbMap = new HashMap[String, AttributeValue]()
>   var ClientNum = new AttributeValue()
>   ClientNum.setN(a.get(0).toString)
>   ddbMap.put("ClientNum", ClientNum)
>   var Value_1 = new AttributeValue()
>   Value_1.setS(a.get(1).toString)
>   ddbMap.put("Value_1", Value_1)
>   var Value_2 = new AttributeValue()
>   Value_2.setS(a.get(2).toString)
>   ddbMap.put("Value_2", Value_2)
>   var Value_3 = new AttributeValue()
>   Value_3.setS(a.get(3).toString)
>   ddbMap.put("Value_3", Value_3)
>   var Value_4 = new AttributeValue()
>   Value_4.setS(a.get(4).toString)
>   ddbMap.put("Value_4", Value_4)
>   var item = new DynamoDBItemWritable()
>   item.setItem(ddbMap)
>   (new Text(""), item)
> })
> This last call uses the job configuration that defines the EMR-DDB connector to write out the new RDD you created in the expected format:
> ddbInsertFormattedRDD.saveAsHadoopDataset(jobConf)
> It fails with the following error:
> Caused by: java.lang.NullPointerException
> The null values caused the error; if I try with only ClientNum and Value_1 it works and the data is correctly inserted into the DynamoDB table.
> Thanks for your help !!
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
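The NullPointerException above comes from calling .toString on a null column before wrapping it in an AttributeValue. A sketch of a null-safe variant of the same map, assuming the EMR-DDB connector classes already used in the snippet; since DynamoDB simply omits missing attributes, the usual fix is to skip null columns:
{code:scala}
import java.util.HashMap
import org.apache.hadoop.io.Text
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import com.amazonaws.services.dynamodbv2.model.AttributeValue

val columns = df.columns // capture plain strings, not the DataFrame, in the closure
val ddbInsertFormattedRDD = df.rdd.map { row =>
  val ddbMap = new HashMap[String, AttributeValue]()
  val clientNum = new AttributeValue()
  clientNum.setN(row.get(0).toString)
  ddbMap.put("ClientNum", clientNum)
  columns.zipWithIndex.drop(1).foreach { case (name, i) =>
    if (!row.isNullAt(i)) { // skip null columns instead of calling toString on them
      val value = new AttributeValue()
      value.setS(row.get(i).toString)
      ddbMap.put(name, value)
    }
  }
  val item = new DynamoDBItemWritable()
  item.setItem(ddbMap)
  (new Text(""), item)
}
{code}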
[jira] [Assigned] (SPARK-40185) Remove column suggestion when the candidate list is empty for unresolved column/attribute/map key
[ https://issues.apache.org/jira/browse/SPARK-40185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40185: Assignee: (was: Apache Spark) > Remove column suggestion when the candidate list is empty for unresolved > column/attribute/map key > - > > Key: SPARK-40185 > URL: https://issues.apache.org/jira/browse/SPARK-40185 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vitalii Li >Priority: Major > > For an unresolved column, attribute or map key, the error message may contain > suggestions from a candidate list. However, when the list is empty the error message > looks incomplete: > `[UNRESOLVED_COLUMN] A column or function parameter with name 'YrMo' cannot > be resolved. Did you mean one of the following? []` > This issue is to show the final suggestion only if the suggestion list is > non-empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40185) Remove column suggestion when the candidate list is empty for unresolved column/attribute/map key
[ https://issues.apache.org/jira/browse/SPARK-40185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583299#comment-17583299 ] Apache Spark commented on SPARK-40185: -- User 'vitaliili-db' has created a pull request for this issue: https://github.com/apache/spark/pull/37621 > Remove column suggestion when the candidate list is empty for unresolved > column/attribute/map key > - > > Key: SPARK-40185 > URL: https://issues.apache.org/jira/browse/SPARK-40185 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vitalii Li >Priority: Major > > For an unresolved column, attribute or map key, the error message may contain > suggestions from a candidate list. However, when the list is empty the error message > looks incomplete: > `[UNRESOLVED_COLUMN] A column or function parameter with name 'YrMo' cannot > be resolved. Did you mean one of the following? []` > This issue is to show the final suggestion only if the suggestion list is > non-empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40185) Remove column suggestion when the candidate list is empty for unresolved column/attribute/map key
[ https://issues.apache.org/jira/browse/SPARK-40185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583298#comment-17583298 ] Apache Spark commented on SPARK-40185: -- User 'vitaliili-db' has created a pull request for this issue: https://github.com/apache/spark/pull/37621 > Remove column suggestion when the candidate list is empty for unresolved > column/attribute/map key > - > > Key: SPARK-40185 > URL: https://issues.apache.org/jira/browse/SPARK-40185 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vitalii Li >Priority: Major > > For an unresolved column, attribute or map key, the error message may contain > suggestions from a candidate list. However, when the list is empty the error message > looks incomplete: > `[UNRESOLVED_COLUMN] A column or function parameter with name 'YrMo' cannot > be resolved. Did you mean one of the following? []` > This issue is to show the final suggestion only if the suggestion list is > non-empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40185) Remove column suggestion when the candidate list is empty for unresolved column/attribute/map key
[ https://issues.apache.org/jira/browse/SPARK-40185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40185: Assignee: Apache Spark > Remove column suggestion when the candidate list is empty for unresolved > column/attribute/map key > - > > Key: SPARK-40185 > URL: https://issues.apache.org/jira/browse/SPARK-40185 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vitalii Li >Assignee: Apache Spark >Priority: Major > > For an unresolved column, attribute or map key, the error message may contain > suggestions from a candidate list. However, when the list is empty the error message > looks incomplete: > `[UNRESOLVED_COLUMN] A column or function parameter with name 'YrMo' cannot > be resolved. Did you mean one of the following? []` > This issue is to show the final suggestion only if the suggestion list is > non-empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40185) Remove column suggestion when the candidate list is empty for unresolved column/attribute/map key
Vitalii Li created SPARK-40185: -- Summary: Remove column suggestion when the candidate list is empty for unresolved column/attribute/map key Key: SPARK-40185 URL: https://issues.apache.org/jira/browse/SPARK-40185 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Vitalii Li For an unresolved column, attribute or map key, the error message may contain suggestions from a candidate list. However, when the list is empty the error message looks incomplete: `[UNRESOLVED_COLUMN] A column or function parameter with name 'YrMo' cannot be resolved. Did you mean one of the following? []` This issue is to show the final suggestion only if the suggestion list is non-empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
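A sketch of the requested behavior with a hypothetical helper (not the actual Spark implementation): the suggestion clause is appended only when candidates exist:
{code:scala}
def unresolvedColumnError(name: String, candidates: Seq[String]): String = {
  val base = s"[UNRESOLVED_COLUMN] A column or function parameter with name '$name' cannot be resolved."
  if (candidates.isEmpty) base // no dangling "Did you mean one of the following? []"
  else base + candidates.mkString(" Did you mean one of the following? [", ", ", "]")
}
{code}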
[jira] [Commented] (SPARK-40160) Make pyspark.broadcast examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583291#comment-17583291 ] Qian Sun commented on SPARK-40160: -- working on it :) > Make pyspark.broadcast examples self-contained > -- > > Key: SPARK-40160 > URL: https://issues.apache.org/jira/browse/SPARK-40160 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40184) Support modify the comment of a partitioned column
melin created SPARK-40184: - Summary: Support modify the comment of a partitioned column Key: SPARK-40184 URL: https://issues.apache.org/jira/browse/SPARK-40184 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: melin A comment is not added to the partition field when the table is created. We should support modifying the partition field comment afterwards. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
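To illustrate the ask, a hypothetical sketch; ALTER TABLE ... ALTER COLUMN ... COMMENT is existing Spark SQL syntax for regular columns, and whether it (or an equivalent) works for partition columns is exactly what this ticket requests:
{code:scala}
spark.sql("CREATE TABLE t (c INT, p STRING) USING parquet PARTITIONED BY (p)")
// Desired: attach or modify a comment on the partition column after creation.
spark.sql("ALTER TABLE t ALTER COLUMN p COMMENT 'partition key: ingestion date'")
{code}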
[jira] [Assigned] (SPARK-40183) Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion
[ https://issues.apache.org/jira/browse/SPARK-40183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40183: Assignee: Apache Spark (was: Gengliang Wang) > Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion > - > > Key: SPARK-40183 > URL: https://issues.apache.org/jira/browse/SPARK-40183 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal > conversion, instead of the confusing error class > `CANNOT_CHANGE_DECIMAL_PRECISION`. > Also, use `decimal.toPlainString` instead of `decimal.toDebugString` in the > error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40183) Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion
[ https://issues.apache.org/jira/browse/SPARK-40183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40183: Assignee: Gengliang Wang (was: Apache Spark) > Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion > - > > Key: SPARK-40183 > URL: https://issues.apache.org/jira/browse/SPARK-40183 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal > conversion, instead of the confusing error class > `CANNOT_CHANGE_DECIMAL_PRECISION`. > Also, use `decimal.toPlainString` instead of `decimal.toDebugString` in the > error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40183) Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion
[ https://issues.apache.org/jira/browse/SPARK-40183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583280#comment-17583280 ] Apache Spark commented on SPARK-40183: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/37620 > Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion > - > > Key: SPARK-40183 > URL: https://issues.apache.org/jira/browse/SPARK-40183 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal > conversion, instead of the confusing error class > `CANNOT_CHANGE_DECIMAL_PRECISION`. > Also, use `decimal.toPlainString` instead of `decimal.toDebugString` in the > error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40183) Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion
Gengliang Wang created SPARK-40183: -- Summary: Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion Key: SPARK-40183 URL: https://issues.apache.org/jira/browse/SPARK-40183 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion, instead of the confusing error class `CANNOT_CHANGE_DECIMAL_PRECISION`. Also, use `decimal.toPlainString` instead of `decimal.toDebugString` in the error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
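A small illustration of why the description prefers toPlainString, using java.math.BigDecimal; Spark's Decimal.toDebugString prints internal state (something like Decimal(expanded, ..., precision, scale)), which reads poorly in a user-facing error:
{code:scala}
val d = new java.math.BigDecimal("1E+10")
println(d.toString)      // 1E+10       -- scientific notation
println(d.toPlainString) // 10000000000 -- the plain rendering a user expects in an error
{code}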
[jira] [Updated] (SPARK-39917) Use different error classes for numeric/interval arithmetic overflow
[ https://issues.apache.org/jira/browse/SPARK-39917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-39917: --- Parent: SPARK-40182 Issue Type: Sub-task (was: Task) > Use different error classes for numeric/interval arithmetic overflow > > > Key: SPARK-39917 > URL: https://issues.apache.org/jira/browse/SPARK-39917 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > > Currently, when arithmetic overflow errors happen under ANSI mode, the error > messages are like > [ARITHMETIC_OVERFLOW] long overflow. Use 'try_multiply' to tolerate overflow > and return NULL instead. If necessary set spark.sql.ansi.enabled to "false" > > The "(except for ANSI interval type)" part is confusing. We should remove it > for the numeric arithmetic operations and have a new error class for the > interval division error: INTERVAL_ARITHMETIC_OVERFLOW -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
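For reference, how the numeric side of this surfaces; a sketch against a Spark 3.3+ session, using only what the quoted message mentions:
{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "true")
// Overflows the BIGINT range and raises the ARITHMETIC_OVERFLOW error class:
// spark.sql("SELECT 9223372036854775807 * 2").show()
// The alternative the message suggests tolerates overflow and yields NULL:
spark.sql("SELECT try_multiply(9223372036854775807, 2)").show()
{code}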
[jira] [Updated] (SPARK-39865) Show proper error messages on the overflow errors of table insert
[ https://issues.apache.org/jira/browse/SPARK-39865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-39865: --- Description: In Spark 3.3, the error message of ANSI CAST is improved. However, the table insertion is using the same CAST expression:
{code:java}
> create table tiny(i tinyint);
> insert into tiny values (1000);
org.apache.spark.SparkArithmeticException[CAST_OVERFLOW]: The value 1000 of the type "INT" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` to tolerate overflow and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
{code}
Showing the hint of `If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error` doesn't help at all. This PR is to fix the error message. After changes, the error message of this example will become:
{code:java}
org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW_IN_TABLE_INSERT] Fail to insert a value of "INT" type into the "TINYINT" type column `i` due to an overflow. Use `try_cast` on the input value to tolerate overflow and return NULL instead.{code}
> Show proper error messages on the overflow errors of table insert
> -
>
> Key: SPARK-39865
> URL: https://issues.apache.org/jira/browse/SPARK-39865
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.4.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Fix For: 3.3.1
>
> In Spark 3.3, the error message of ANSI CAST is improved. However, the table insertion is using the same CAST expression:
> {code:java}
> > create table tiny(i tinyint);
> > insert into tiny values (1000);
> org.apache.spark.SparkArithmeticException[CAST_OVERFLOW]: The value 1000 of the type "INT" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` to tolerate overflow and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
> {code}
> Showing the hint of `If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error` doesn't help at all. This PR is to fix the error message. After changes, the error message of this example will become:
> {code:java}
> org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW_IN_TABLE_INSERT] Fail to insert a value of "INT" type into the "TINYINT" type column `i` due to an overflow. Use `try_cast` on the input value to tolerate overflow and return NULL instead.{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
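A quick sketch of the workaround the new message points to (try_cast is existing Spark SQL):
{code:scala}
// Returns NULL instead of raising an overflow error, since 1000 > 127:
spark.sql("SELECT try_cast(1000 AS TINYINT)").show()
{code}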
[jira] [Updated] (SPARK-39865) Show proper error messages on the overflow errors of table insert
[ https://issues.apache.org/jira/browse/SPARK-39865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-39865: --- Parent: SPARK-40182 Issue Type: Sub-task (was: Bug)
> Show proper error messages on the overflow errors of table insert
> -
>
> Key: SPARK-39865
> URL: https://issues.apache.org/jira/browse/SPARK-39865
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.3.0, 3.4.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Fix For: 3.3.1
>
> In Spark 3.3, the error message of ANSI CAST is improved. However, the table insertion is using the same CAST expression:
> {code:java}
> > create table tiny(i tinyint);
> > insert into tiny values (1000);
> org.apache.spark.SparkArithmeticException[CAST_OVERFLOW]: The value 1000 of the type "INT" cannot be cast to "TINYINT" due to an overflow. Use `try_cast` to tolerate overflow and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
> {code}
> Showing the hint of `If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error` doesn't help at all. This PR is to fix the error message. After changes, the error message of this example will become:
> {code:java}
> org.apache.spark.SparkArithmeticException: [CAST_OVERFLOW_IN_TABLE_INSERT] Fail to insert a value of "INT" type into the "TINYINT" type column `i` due to an overflow. Use `try_cast` on the input value to tolerate overflow and return NULL instead.{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39889) Use different error classes for numeric/interval divided by 0
[ https://issues.apache.org/jira/browse/SPARK-39889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-39889: --- Parent: SPARK-40182 Issue Type: Sub-task (was: Task) > Use different error classes for numeric/interval divided by 0 > - > > Key: SPARK-39889 > URL: https://issues.apache.org/jira/browse/SPARK-39889 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > > Currently, when numbers are divided by 0 under ANSI mode, the error message > is like > {quote}[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate > divisor being 0 and return NULL instead. If necessary set "ansi_mode" to > "false" (except for ANSI interval type) to bypass this error.{quote} > The "(except for ANSI interval type)" part is confusing. We should remove it > and have a new error class "INTERVAL_DIVIDED_BY_ZERO" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
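A sketch of the two behaviors the quoted message contrasts, in a Spark session:
{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "true")
// Raises DIVIDE_BY_ZERO under ANSI mode:
// spark.sql("SELECT 6 / 0").show()
// Tolerates a zero divisor and returns NULL:
spark.sql("SELECT try_divide(6, 0)").show()
// After this ticket, an interval divided by zero (e.g. INTERVAL '2' YEAR / 0)
// would report the new INTERVAL_DIVIDED_BY_ZERO error class instead.
{code}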
[jira] [Created] (SPARK-40182) Improve ANSI runtime error messages
Gengliang Wang created SPARK-40182: -- Summary: Improve ANSI runtime error messages Key: SPARK-40182 URL: https://issues.apache.org/jira/browse/SPARK-40182 Project: Spark Issue Type: Umbrella Components: SQL Affects Versions: 3.4.0 Reporter: Gengliang Wang Improve the runtime error messages related to the ANSI SQL mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583274#comment-17583274 ] BingKun Pan commented on SPARK-40165: - I will investigate the root cause for the failure carefully. > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Trivial > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40081) Add Document Parameters for pyspark.sql.streaming.query
[ https://issues.apache.org/jira/browse/SPARK-40081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40081: Assignee: Qian Sun > Add Document Parameters for pyspark.sql.streaming.query > --- > > Key: SPARK-40081 > URL: https://issues.apache.org/jira/browse/SPARK-40081 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40081) Add Document Parameters for pyspark.sql.streaming.query
[ https://issues.apache.org/jira/browse/SPARK-40081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40081. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37587 [https://github.com/apache/spark/pull/37587] > Add Document Parameters for pyspark.sql.streaming.query > --- > > Key: SPARK-40081 > URL: https://issues.apache.org/jira/browse/SPARK-40081 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Qian Sun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40142: Assignee: Hyukjin Kwon > Make pyspark.sql.functions examples self-contained > -- > > Key: SPARK-40142 > URL: https://issues.apache.org/jira/browse/SPARK-40142 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40142) Make pyspark.sql.functions examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40142. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37592 [https://github.com/apache/spark/pull/37592] > Make pyspark.sql.functions examples self-contained > -- > > Key: SPARK-40142 > URL: https://issues.apache.org/jira/browse/SPARK-40142 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-40088. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37619 [https://github.com/apache/spark/pull/37619] > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.4.0 > > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
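A sketch of the usual pattern for such a suite, assuming the withSQLConf helper available to suites that extend Spark's shared SQL test traits:
{code:scala}
import org.apache.spark.sql.internal.SQLConf

test("SPARK-40088: SparkPlan behavior with AQE enabled") {
  withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
    // re-run the SparkPlanSuite assertions with adaptive execution turned on
  }
}
{code}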
[jira] [Assigned] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-40088: --- Assignee: Kazuyuki Tanimura > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40181) DataFrame.intersect and .intersectAll are inconsistently dropping rows
Luke created SPARK-40181: Summary: DataFrame.intersect and .intersectAll are inconsistently dropping rows Key: SPARK-40181 URL: https://issues.apache.org/jira/browse/SPARK-40181 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.0.1 Reporter: Luke
I don't have a minimal reproducible example for this, but the place where it shows up in our workflow is very simple. The data in "COLUMN" are a few hundred million distinct strings (gets deduplicated in the plan also) and it is being compared against itself using intersect. The code that is failing is essentially:
{quote}
values = [...]  # python list containing many unique strings, none of which are None
df = spark.createDataFrame(
    spark.sparkContext.parallelize(
        [(value,) for value in values], numSlices=2 + len(values) // 1
    ),
    schema=StructType([StructField("COLUMN", StringType())]),
)
df = df.distinct()
assert df.count() == df.intersect(df).count()
assert df.count() == df.intersectAll(df).count()
{quote}
The issue is that both of the above asserts sometimes pass, and sometimes fail (technically we haven't seen intersectAll pass yet, but we have only tried a few times). One thing which is striking is that if you call df.intersect(df).count() multiple times, the returned count is not always the same. Sometimes it is exactly df.count(), sometimes it is ~1% lower, but how much lower exactly seems random. In particular, we have called df.intersect(df).count() twice in a row, and got two different counts, which is very surprising given that df should be deterministic, and suggests maybe there is some kind of concurrency/inconsistent hashing issue? One other thing which is possibly noteworthy is that using df.join(df, df.columns, how="inner") does seem to reliably have the desired behavior (not dropping any rows).
Here is the resulting plan from df.intersect(df)
{quote}
== Parsed Logical Plan ==
'Intersect false
:- Deduplicate [COLUMN#144487]
:  +- LogicalRDD [COLUMN#144487], false
+- Deduplicate [COLUMN#144487]
   +- LogicalRDD [COLUMN#144487], false

== Analyzed Logical Plan ==
COLUMN: string
Intersect false
:- Deduplicate [COLUMN#144487]
:  +- LogicalRDD [COLUMN#144487], false
+- Deduplicate [COLUMN#144523]
   +- LogicalRDD [COLUMN#144523], false

== Optimized Logical Plan ==
Aggregate [COLUMN#144487], [COLUMN#144487]
+- Join LeftSemi, (COLUMN#144487 <=> COLUMN#144523)
   :- LogicalRDD [COLUMN#144487], false
   +- Aggregate [COLUMN#144523], [COLUMN#144523]
      +- LogicalRDD [COLUMN#144523], false

== Physical Plan ==
*(7) HashAggregate(keys=[COLUMN#144487], functions=[], output=[COLUMN#144487])
+- Exchange hashpartitioning(COLUMN#144487, 200), true, [id=#22790]
   +- *(6) HashAggregate(keys=[COLUMN#144487], functions=[], output=[COLUMN#144487])
      +- *(6) SortMergeJoin [coalesce(COLUMN#144487, ), isnull(COLUMN#144487)], [coalesce(COLUMN#144523, ), isnull(COLUMN#144523)], LeftSemi
         :- *(2) Sort [coalesce(COLUMN#144487, ) ASC NULLS FIRST, isnull(COLUMN#144487) ASC NULLS FIRST], false, 0
         :  +- Exchange hashpartitioning(coalesce(COLUMN#144487, ), isnull(COLUMN#144487), 200), true, [id=#22772]
         :     +- *(1) Scan ExistingRDD[COLUMN#144487]
         +- *(5) Sort [coalesce(COLUMN#144523, ) ASC NULLS FIRST, isnull(COLUMN#144523) ASC NULLS FIRST], false, 0
            +- Exchange hashpartitioning(coalesce(COLUMN#144523, ), isnull(COLUMN#144523), 200), true, [id=#22782]
               +- *(4) HashAggregate(keys=[COLUMN#144523], functions=[], output=[COLUMN#144523])
                  +- Exchange hashpartitioning(COLUMN#144523, 200), true, [id=#22778]
                     +- *(3) HashAggregate(keys=[COLUMN#144523], functions=[], output=[COLUMN#144523])
                        +- *(3) Scan ExistingRDD[COLUMN#144523]
{quote}
and for df.intersectAll(df)
{quote}
== Parsed Logical Plan ==
'IntersectAll true
:- Deduplicate [COLUMN#144487]
:  +- LogicalRDD [COLUMN#144487], false
+- Deduplicate [COLUMN#144487]
   +- LogicalRDD [COLUMN#144487], false

== Analyzed Logical Plan ==
COLUMN: string
IntersectAll true
:- Deduplicate [COLUMN#144487]
:  +- LogicalRDD [COLUMN#144487], false
+- Deduplicate [COLUMN#144533]
   +- LogicalRDD [COLUMN#144533], false

== Optimized Logical Plan ==
Project [COLUMN#144487]
+- Generate replicaterows(min_count#144566L, COLUMN#144487), [1], false, [COLUMN#144487]
   +- Project [COLUMN#144487, if ((vcol1_count#144563L > vcol2_count#144565L)) vcol2_count#144565L else vcol1_count#144563L AS min_count#144566L]
      +- Filter ((vcol1_count#144563L >= 1) AND (vcol2_count#144565L >= 1))
         +- Aggregate [COLUMN#144487], [count(vcol1#144558) AS vcol1_count#144563L, count(vcol2#144561) AS vcol2_count#144565L, COLUMN#144487]
            +- Union
               :- Aggregate [COLUMN#144487], [true AS vcol1#144558, null AS
[jira] [Assigned] (SPARK-40180) Format error messages by spark-sql
[ https://issues.apache.org/jira/browse/SPARK-40180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40180: Assignee: Max Gekk (was: Apache Spark) > Format error messages by spark-sql > -- > > Key: SPARK-40180 > URL: https://issues.apache.org/jira/browse/SPARK-40180 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Respect the SQL config spark.sql.error.messageFormat in the implementation of > the SQL CLI: spark-sql. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40180) Format error messages by spark-sql
[ https://issues.apache.org/jira/browse/SPARK-40180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583143#comment-17583143 ] Apache Spark commented on SPARK-40180: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/37590 > Format error messages by spark-sql > -- > > Key: SPARK-40180 > URL: https://issues.apache.org/jira/browse/SPARK-40180 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Respect the SQL config spark.sql.error.messageFormat in the implementation of > the SQL CLI: spark-sql. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40180) Format error messages by spark-sql
[ https://issues.apache.org/jira/browse/SPARK-40180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40180: Assignee: Apache Spark (was: Max Gekk) > Format error messages by spark-sql > -- > > Key: SPARK-40180 > URL: https://issues.apache.org/jira/browse/SPARK-40180 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Respect the SQL config spark.sql.error.messageFormat in the implementation of > the SQL CLI: spark-sql. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40180) Format error messages by spark-sql
[ https://issues.apache.org/jira/browse/SPARK-40180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-40180: - Description: Respect the SQL config spark.sql.error.messageFormat in the implementation of the SQL CLI: spark-sql. (was: # Introduce a config to control the format of error messages: plain text and JSON # Modify the Thrift Server to output errors from Spark SQL according to the config) > Format error messages by spark-sql > -- > > Key: SPARK-40180 > URL: https://issues.apache.org/jira/browse/SPARK-40180 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Respect the SQL config spark.sql.error.messageFormat in the implementation of > the SQL CLI: spark-sql. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40180) Format error messages by spark-sql
[ https://issues.apache.org/jira/browse/SPARK-40180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-40180: - Fix Version/s: (was: 3.4.0) > Format error messages by spark-sql > -- > > Key: SPARK-40180 > URL: https://issues.apache.org/jira/browse/SPARK-40180 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > # Introduce a config to control the format of error messages: plain text and > JSON > # Modify the Thrift Server to output errors from Spark SQL according to the > config -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40180) Format error messages by spark-sql
Max Gekk created SPARK-40180: Summary: Format error messages by spark-sql Key: SPARK-40180 URL: https://issues.apache.org/jira/browse/SPARK-40180 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Assignee: Max Gekk Fix For: 3.4.0 # Introduce a config to control the format of error messages: plain text and JSON # Modify the Thrift Server to output errors from Spark SQL according to the config -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40166) Add array_sort(column, comparator) to PySpark
[ https://issues.apache.org/jira/browse/SPARK-40166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-40166. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37600 [https://github.com/apache/spark/pull/37600] > Add array_sort(column, comparator) to PySpark > - > > Key: SPARK-40166 > URL: https://issues.apache.org/jira/browse/SPARK-40166 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > Fix For: 3.4.0 > > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in Python as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
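For reference, the JVM overload that SPARK-39925 exposed, exercised from Scala (this ticket tracks the PySpark binding); a sketch in spark-shell, sorting by string length:
{code:scala}
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(Seq("bb", "a", "ccc")).toDF("xs")
// The comparator returns a negative/zero/positive Column, like a classic compare():
df.select(array_sort(col("xs"), (l, r) =>
  when(length(l) < length(r), -1).when(length(l) > length(r), 1).otherwise(0)
)).show() // [a, bb, ccc]
{code}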
[jira] [Assigned] (SPARK-40166) Add array_sort(column, comparator) to PySpark
[ https://issues.apache.org/jira/browse/SPARK-40166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-40166: -- Assignee: Maciej Szymkiewicz > Add array_sort(column, comparator) to PySpark > - > > Key: SPARK-40166 > URL: https://issues.apache.org/jira/browse/SPARK-40166 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in Python as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40167) Add array_sort(column, comparator) to SparkR
[ https://issues.apache.org/jira/browse/SPARK-40167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-40167. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37600 [https://github.com/apache/spark/pull/37600] > Add array_sort(column, comparator) to SparkR > > > Key: SPARK-40167 > URL: https://issues.apache.org/jira/browse/SPARK-40167 > Project: Spark > Issue Type: Improvement > Components: R, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > Fix For: 3.4.0 > > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in R as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40167) Add array_sort(column, comparator) to SparkR
[ https://issues.apache.org/jira/browse/SPARK-40167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-40167: -- Assignee: Maciej Szymkiewicz > Add array_sort(column, comparator) to SparkR > > > Key: SPARK-40167 > URL: https://issues.apache.org/jira/browse/SPARK-40167 > Project: Spark > Issue Type: Improvement > Components: R, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Minor > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in R as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583112#comment-17583112 ] Apache Spark commented on SPARK-40088: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/37619 > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40088: Assignee: Apache Spark > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Apache Spark >Priority: Minor > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583111#comment-17583111 ] Apache Spark commented on SPARK-40088: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/37619 > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40088) Add SparkPlanWIthAQESuite
[ https://issues.apache.org/jira/browse/SPARK-40088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40088: Assignee: (was: Apache Spark) > Add SparkPlanWIthAQESuite > - > > Key: SPARK-40088 > URL: https://issues.apache.org/jira/browse/SPARK-40088 > Project: Spark > Issue Type: Test > Components: SQL, Tests >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Currently `SparkPlanSuite` assumes that AQE is always turned off. We should > also test with AQE turned on -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40165: -- Fix Version/s: (was: 3.4.0) > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Trivial > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583079#comment-17583079 ] Dongjoon Hyun commented on SPARK-40165: --- This is reverted via https://github.com/apache/spark/commit/b6192126351ea2ae658e2f0cfd8c57baf3f1d900 > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Trivial > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40165: Assignee: (was: Apache Spark) > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Trivial > Fix For: 3.4.0 > > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reopened SPARK-40165: --- Assignee: (was: BingKun Pan) > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Trivial > Fix For: 3.4.0 > > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40165: Assignee: Apache Spark > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Trivial > Fix For: 3.4.0 > > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583024#comment-17583024 ] Apache Spark commented on SPARK-40165: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/37618 > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Fix For: 3.4.0 > > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39755) Improve LocalDirsFeatureStep to randomize local directories
[ https://issues.apache.org/jira/browse/SPARK-39755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-39755: -- Affects Version/s: 3.4.0 (was: 3.3.0) > Improve LocalDirsFeatureStep to randomize local directories > --- > > Key: SPARK-39755 > URL: https://issues.apache.org/jira/browse/SPARK-39755 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: pralabhkumar >Assignee: pralabhkumar >Priority: Minor > Fix For: 3.4.0 > > > In org.apache.spark.util.Utils#getConfiguredLocalDirs > > {code:java} > if (isRunningInYarnContainer(conf)) { > // If we are in yarn mode, systems can have different disk layouts so we > must set it > // to what Yarn on this system said was available. Note this assumes that > Yarn has > // created the directories already, and that they are secured so that only > the > // user has access to them. > randomizeInPlace(getYarnLocalDirs(conf).split(",")) > } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) { > conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator) > } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) { > conf.getenv("SPARK_LOCAL_DIRS").split(",") > }{code} > randomizeInPlace is not called on conf.getenv("SPARK_LOCAL_DIRS").split(","). > This is the branch used in the case of K8s, so the shuffle locations are not > randomized. > IMHO, this should be randomized so that all the directories have an equal > chance of receiving the data, as is done on the YARN side > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39755) Improve LocalDirsFeatureStep to randomize local directories
[ https://issues.apache.org/jira/browse/SPARK-39755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-39755: -- Component/s: (was: Spark Core) > Improve LocalDirsFeatureStep to randomize local directories > --- > > Key: SPARK-39755 > URL: https://issues.apache.org/jira/browse/SPARK-39755 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: pralabhkumar >Assignee: pralabhkumar >Priority: Minor > Fix For: 3.4.0 > > > In org.apache.spark.util.Utils#getConfiguredLocalDirs > > {code:java} > if (isRunningInYarnContainer(conf)) { > // If we are in yarn mode, systems can have different disk layouts so we > must set it > // to what Yarn on this system said was available. Note this assumes that > Yarn has > // created the directories already, and that they are secured so that only > the > // user has access to them. > randomizeInPlace(getYarnLocalDirs(conf).split(",")) > } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) { > conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator) > } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) { > conf.getenv("SPARK_LOCAL_DIRS").split(",") > }{code} > randomizeInPlace is not called on conf.getenv("SPARK_LOCAL_DIRS").split(","). > This is the branch used in the case of K8s, so the shuffle locations are not > randomized. > IMHO, this should be randomized so that all the directories have an equal > chance of receiving the data, as is done on the YARN side > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39755) Improve LocalDirsFeatureStep to randomize local directories
[ https://issues.apache.org/jira/browse/SPARK-39755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-39755: -- Summary: Improve LocalDirsFeatureStep to randomize local directories (was: SPARK_LOCAL_DIRS locations are not randomized in K8s) > Improve LocalDirsFeatureStep to randomize local directories > --- > > Key: SPARK-39755 > URL: https://issues.apache.org/jira/browse/SPARK-39755 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.3.0 >Reporter: pralabhkumar >Assignee: pralabhkumar >Priority: Minor > Fix For: 3.4.0 > > > In org.apache.spark.util.Utils#getConfiguredLocalDirs > > {code:java} > if (isRunningInYarnContainer(conf)) { > // If we are in yarn mode, systems can have different disk layouts so we > must set it > // to what Yarn on this system said was available. Note this assumes that > Yarn has > // created the directories already, and that they are secured so that only > the > // user has access to them. > randomizeInPlace(getYarnLocalDirs(conf).split(",")) > } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) { > conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator) > } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) { > conf.getenv("SPARK_LOCAL_DIRS").split(",") > }{code} > randomizeInPlace is not called on conf.getenv("SPARK_LOCAL_DIRS").split(","). > This is the branch used in the case of K8s, so the shuffle locations are not > randomized. > IMHO, this should be randomized so that all the directories have an equal > chance of receiving the data, as is done on the YARN side > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39755) SPARK_LOCAL_DIRS locations are not randomized in K8s
[ https://issues.apache.org/jira/browse/SPARK-39755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-39755: -- Issue Type: Improvement (was: Bug) > SPARK_LOCAL_DIRS locations are not randomized in K8s > > > Key: SPARK-39755 > URL: https://issues.apache.org/jira/browse/SPARK-39755 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.3.0 >Reporter: pralabhkumar >Assignee: pralabhkumar >Priority: Minor > Fix For: 3.4.0 > > > In org.apache.spark.util.Utils#getConfiguredLocalDirs > > {code:java} > if (isRunningInYarnContainer(conf)) { > // If we are in yarn mode, systems can have different disk layouts so we > must set it > // to what Yarn on this system said was available. Note this assumes that > Yarn has > // created the directories already, and that they are secured so that only > the > // user has access to them. > randomizeInPlace(getYarnLocalDirs(conf).split(",")) > } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) { > conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator) > } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) { > conf.getenv("SPARK_LOCAL_DIRS").split(",") > }{code} > randomizeInPlace is not called on conf.getenv("SPARK_LOCAL_DIRS").split(","). > This is the branch used in the case of K8s, so the shuffle locations are not > randomized. > IMHO, this should be randomized so that all the directories have an equal > chance of receiving the data, as is done on the YARN side > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39755) SPARK_LOCAL_DIRS locations are not randomized in K8s
[ https://issues.apache.org/jira/browse/SPARK-39755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-39755. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37203 [https://github.com/apache/spark/pull/37203] > SPARK_LOCAL_DIRS locations are not randomized in K8s > > > Key: SPARK-39755 > URL: https://issues.apache.org/jira/browse/SPARK-39755 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 3.3.0 >Reporter: pralabhkumar >Assignee: pralabhkumar >Priority: Minor > Fix For: 3.4.0 > > > In org.apache.spark.util.Utils#getConfiguredLocalDirs > > {code:java} > if (isRunningInYarnContainer(conf)) { > // If we are in yarn mode, systems can have different disk layouts so we > must set it > // to what Yarn on this system said was available. Note this assumes that > Yarn has > // created the directories already, and that they are secured so that only > the > // user has access to them. > randomizeInPlace(getYarnLocalDirs(conf).split(",")) > } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) { > conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator) > } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) { > conf.getenv("SPARK_LOCAL_DIRS").split(",") > }{code} > randomizeInPlace is not called on conf.getenv("SPARK_LOCAL_DIRS").split(","). > This is the branch used in the case of K8s, so the shuffle locations are not > randomized. > IMHO, this should be randomized so that all the directories have an equal > chance of receiving the data, as is done on the YARN side > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39755) SPARK_LOCAL_DIRS locations are not randomized in K8s
[ https://issues.apache.org/jira/browse/SPARK-39755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-39755: - Assignee: pralabhkumar > SPARK_LOCAL_DIRS locations are not randomized in K8s > > > Key: SPARK-39755 > URL: https://issues.apache.org/jira/browse/SPARK-39755 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 3.3.0 >Reporter: pralabhkumar >Assignee: pralabhkumar >Priority: Minor > > In org.apache.spark.util.Utils#getConfiguredLocalDirs > > {code:java} > if (isRunningInYarnContainer(conf)) { > // If we are in yarn mode, systems can have different disk layouts so we > must set it > // to what Yarn on this system said was available. Note this assumes that > Yarn has > // created the directories already, and that they are secured so that only > the > // user has access to them. > randomizeInPlace(getYarnLocalDirs(conf).split(",")) > } else if (conf.getenv("SPARK_EXECUTOR_DIRS") != null) { > conf.getenv("SPARK_EXECUTOR_DIRS").split(File.pathSeparator) > } else if (conf.getenv("SPARK_LOCAL_DIRS") != null) { > conf.getenv("SPARK_LOCAL_DIRS").split(",") > }{code} > randomizeInPlace is not called on conf.getenv("SPARK_LOCAL_DIRS").split(","). > This is the branch used in the case of K8s, so the shuffle locations are not > randomized. > IMHO, this should be randomized so that all the directories have an equal > chance of receiving the data, as is done on the YARN side > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
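The proposed fix amounts to wrapping the SPARK_LOCAL_DIRS branch in the same shuffle the YARN branch already gets. As a standalone illustration (plain Scala, no Spark dependency), here is the Fisher-Yates shuffle that Spark's Utils.randomizeInPlace performs, applied to a SPARK_LOCAL_DIRS-style comma-separated list; the helper below is a re-implementation for the example, not the Spark method itself, and the paths are made up.
{code:scala}
import scala.util.Random

object RandomizeLocalDirsExample {
  // Fisher-Yates shuffle: the same idea as Spark's Utils.randomizeInPlace.
  def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i + 1)
      val tmp = arr(j)
      arr(j) = arr(i)
      arr(i) = tmp
    }
    arr
  }

  def main(args: Array[String]): Unit = {
    // A SPARK_LOCAL_DIRS-style value, split exactly as getConfiguredLocalDirs does.
    val localDirs = "/data1/spark,/data2/spark,/data3/spark".split(",")
    println(randomizeInPlace(localDirs).mkString(","))
  }
}
{code}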
[jira] [Assigned] (SPARK-40179) Run / Scala 2.13 build with SBT GA failed
[ https://issues.apache.org/jira/browse/SPARK-40179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40179: Assignee: Apache Spark > Run / Scala 2.13 build with SBT GA failed > - > > Key: SPARK-40179 > URL: https://issues.apache.org/jira/browse/SPARK-40179 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > > {code:java} > [error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:36:1: > error: package org.apache.http.protocol does not exist > 1011[error] import org.apache.http.protocol.BasicHttpContext; > 1012[error]^ > 1013[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:156:1: > error: cannot find symbol > 1014[error] private final HttpContext httpContext; > 1015[error] ^ symbol: class HttpContext > 1016[error] location: class HttpKerberosClientAction > 1017[error] 3 errors {code} > > * [https://github.com/apache/spark/runs/7947684467?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7947300886?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946453241?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946444061?check_suite_focus=true] > > But a local run > > {code:java} > ./dev/change-scala-version.sh 2.13 > ./build/sbt -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver > -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests > -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile > Test/compile > {code} > passes. Maybe a cache file is corrupt? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40179) Run / Scala 2.13 build with SBT GA failed
[ https://issues.apache.org/jira/browse/SPARK-40179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40179: Assignee: (was: Apache Spark) > Run / Scala 2.13 build with SBT GA failed > - > > Key: SPARK-40179 > URL: https://issues.apache.org/jira/browse/SPARK-40179 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > > {code:java} > [error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:36:1: > error: package org.apache.http.protocol does not exist > 1011[error] import org.apache.http.protocol.BasicHttpContext; > 1012[error]^ > 1013[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:156:1: > error: cannot find symbol > 1014[error] private final HttpContext httpContext; > 1015[error] ^ symbol: class HttpContext > 1016[error] location: class HttpKerberosClientAction > 1017[error] 3 errors {code} > > * [https://github.com/apache/spark/runs/7947684467?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7947300886?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946453241?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946444061?check_suite_focus=true] > > But a local run > > {code:java} > ./dev/change-scala-version.sh 2.13 > ./build/sbt -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver > -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests > -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile > Test/compile > {code} > passes. Maybe a cache file is corrupt? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40179) Run / Scala 2.13 build with SBT GA failed
[ https://issues.apache.org/jira/browse/SPARK-40179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40179: Assignee: Apache Spark > Run / Scala 2.13 build with SBT GA failed > - > > Key: SPARK-40179 > URL: https://issues.apache.org/jira/browse/SPARK-40179 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > > {code:java} > [error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:36:1: > error: package org.apache.http.protocol does not exist > 1011[error] import org.apache.http.protocol.BasicHttpContext; > 1012[error]^ > 1013[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:156:1: > error: cannot find symbol > 1014[error] private final HttpContext httpContext; > 1015[error] ^ symbol: class HttpContext > 1016[error] location: class HttpKerberosClientAction > 1017[error] 3 errors {code} > > * [https://github.com/apache/spark/runs/7947684467?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7947300886?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946453241?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946444061?check_suite_focus=true] > > But a local run > > {code:java} > ./dev/change-scala-version.sh 2.13 > ./build/sbt -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver > -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests > -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile > Test/compile > {code} > passes. Maybe a cache file is corrupt? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40179) Run / Scala 2.13 build with SBT GA failed
[ https://issues.apache.org/jira/browse/SPARK-40179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582951#comment-17582951 ] Apache Spark commented on SPARK-40179: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/37617 > Run / Scala 2.13 build with SBT GA failed > - > > Key: SPARK-40179 > URL: https://issues.apache.org/jira/browse/SPARK-40179 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > > {code:java} > [error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:36:1: > error: package org.apache.http.protocol does not exist > 1011[error] import org.apache.http.protocol.BasicHttpContext; > 1012[error]^ > 1013[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:156:1: > error: cannot find symbol > 1014[error] private final HttpContext httpContext; > 1015[error] ^ symbol: class HttpContext > 1016[error] location: class HttpKerberosClientAction > 1017[error] 3 errors {code} > > * [https://github.com/apache/spark/runs/7947684467?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7947300886?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946453241?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946444061?check_suite_focus=true] > > But a local run > > {code:java} > ./dev/change-scala-version.sh 2.13 > ./build/sbt -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver > -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests > -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile > Test/compile > {code} > passes. Maybe a cache file is corrupt? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40170) StringCoding UTF8 decode slowly
[ https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582935#comment-17582935 ] caican edited comment on SPARK-40170 at 8/22/22 12:13 PM: -- [~kabhwan] My program code is very simple, as shown below. ``` val rdd = spark.sql("select triggerId,adMetadata,userData from iceberg_my_cloud.mydb.myTable where date = 20220801").rdd println(rdd.count()) ``` In addition to the string decode, the conversion of Tuple2 to Map is slow, and I have submitted a patch (https://github.com/apache/spark/pull/37609) to optimize it, but right now I don't have a good way to optimize the string decode. was (Author: JIRAUSER280464): My program code is very simple,As shown below. ``` val rdd = spark.sql("select triggerId,adMetadata,userData from iceberg_my_cloud.mydb.myTable where date = 20220801").rdd println(rdd.count()) ``` > StringCoding UTF8 decode slowly > --- > > Key: SPARK-40170 > URL: https://issues.apache.org/jira/browse/SPARK-40170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-10-56-54-768.png, > image-2022-08-22-10-57-11-744.png > > > When `UnsafeRow` is converted to `Row` at > `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`, > UTF8String decoding and the copyMemory process are very slow. > Does anyone have any ideas for optimization? > !image-2022-08-22-10-56-54-768.png! > > !image-2022-08-22-10-57-11-744.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40170) StringCoding UTF8 decode slowly
[ https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582935#comment-17582935 ] caican commented on SPARK-40170: My program code is very simple, as shown below. ``` val rdd = spark.sql("select triggerId,adMetadata,userData from iceberg_my_cloud.mydb.myTable where date = 20220801").rdd println(rdd.count()) ``` > StringCoding UTF8 decode slowly > --- > > Key: SPARK-40170 > URL: https://issues.apache.org/jira/browse/SPARK-40170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-10-56-54-768.png, > image-2022-08-22-10-57-11-744.png > > > When `UnsafeRow` is converted to `Row` at > `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow`, > UTF8String decoding and the copyMemory process are very slow. > Does anyone have any ideas for optimization? > !image-2022-08-22-10-56-54-768.png! > > !image-2022-08-22-10-57-11-744.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
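One observation on the snippet above: the expensive UTF8String-to-java.lang.String decode only happens because .rdd deserializes every row through SpecificSafeProjection.createExternalRow. For a plain count, staying inside the DataFrame API avoids it entirely. A sketch, assuming a running SparkSession named spark and the reporter's table:
{code:scala}
val df = spark.sql(
  "select triggerId, adMetadata, userData " +
  "from iceberg_my_cloud.mydb.myTable where date = 20220801")

println(df.count())     // stays in Spark's internal row format: no string decode
println(df.rdd.count()) // forces createExternalRow, decoding every UTF8String
{code}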
[jira] [Commented] (SPARK-40179) Run / Scala 2.13 build with SBT GA failed
[ https://issues.apache.org/jira/browse/SPARK-40179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582933#comment-17582933 ] Yang Jie commented on SPARK-40179: -- ping [~hyukjin.kwon], maybe the file cache is corrupt? Can we clean the local repository cache manually? > Run / Scala 2.13 build with SBT GA failed > - > > Key: SPARK-40179 > URL: https://issues.apache.org/jira/browse/SPARK-40179 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > > {code:java} > [error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:36:1: > error: package org.apache.http.protocol does not exist > 1011[error] import org.apache.http.protocol.BasicHttpContext; > 1012[error]^ > 1013[error] > /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:156:1: > error: cannot find symbol > 1014[error] private final HttpContext httpContext; > 1015[error] ^ symbol: class HttpContext > 1016[error] location: class HttpKerberosClientAction > 1017[error] 3 errors {code} > > * [https://github.com/apache/spark/runs/7947684467?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7947300886?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946453241?check_suite_focus=true] > * [https://github.com/apache/spark/runs/7946444061?check_suite_focus=true] > > But a local run > > {code:java} > ./dev/change-scala-version.sh 2.13 > ./build/sbt -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver > -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests > -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile > Test/compile > {code} > passes. Maybe a cache file is corrupt? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40178) Rebalance/Repartition Hints Not Working in PySpark
[ https://issues.apache.org/jira/browse/SPARK-40178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40178: Assignee: Apache Spark > Rebalance/Repartition Hints Not Working in PySpark > -- > > Key: SPARK-40178 > URL: https://issues.apache.org/jira/browse/SPARK-40178 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2 > Environment: Mac OSX 11.4 Big Sur > Python 3.9.7 > Spark version >= 3.2.0 (perhaps before as well). >Reporter: Maxwell Conradt >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1 > > Original Estimate: 168h > Remaining Estimate: 168h > > Partitioning hints in PySpark do not work because the column parameters are > not converted to Catalyst `Expression` instances before being passed to the > hint resolver. > The behavior of the hints is documented > [here|https://spark.apache.org/docs/3.3.0/sql-ref-syntax-qry-select-hints.html#partitioning-hints-types]. > Example: > > {code:java} > >>> df = spark.range(1024) > >>> > >>> df > DataFrame[id: bigint] > >>> df.hint("rebalance", "id") > Traceback (most recent call last): > File "", line 1, in > File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line > 980, in hint > jdf = self._jdf.hint(name, self._jseq(parameters)) > File > "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, > in deco > raise converted from None > pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include > columns, but id found > >>> df.hint("repartition", "id") > Traceback (most recent call last): > File "", line 1, in > File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line > 980, in hint > jdf = self._jdf.hint(name, self._jseq(parameters)) > File > "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, > in deco > raise converted from None > pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should > include columns, but id found {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40178) Rebalance/Repartition Hints Not Working in PySpark
[ https://issues.apache.org/jira/browse/SPARK-40178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582932#comment-17582932 ] Apache Spark commented on SPARK-40178: -- User 'mhconradt' has created a pull request for this issue: https://github.com/apache/spark/pull/37616 > Rebalance/Repartition Hints Not Working in PySpark > -- > > Key: SPARK-40178 > URL: https://issues.apache.org/jira/browse/SPARK-40178 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2 > Environment: Mac OSX 11.4 Big Sur > Python 3.9.7 > Spark version >= 3.2.0 (perhaps before as well). >Reporter: Maxwell Conradt >Priority: Major > Fix For: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1 > > Original Estimate: 168h > Remaining Estimate: 168h > > Partitioning hints in PySpark do not work because the column parameters are > not converted to Catalyst `Expression` instances before being passed to the > hint resolver. > The behavior of the hints is documented > [here|https://spark.apache.org/docs/3.3.0/sql-ref-syntax-qry-select-hints.html#partitioning-hints-types]. > Example: > > {code:java} > >>> df = spark.range(1024) > >>> > >>> df > DataFrame[id: bigint] > >>> df.hint("rebalance", "id") > Traceback (most recent call last): > File "", line 1, in > File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line > 980, in hint > jdf = self._jdf.hint(name, self._jseq(parameters)) > File > "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, > in deco > raise converted from None > pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include > columns, but id found > >>> df.hint("repartition", "id") > Traceback (most recent call last): > File "", line 1, in > File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line > 980, in hint > jdf = self._jdf.hint(name, self._jseq(parameters)) > File > "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, > in deco > raise converted from None > pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should > include columns, but id found {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40178) Rebalance/Repartition Hints Not Working in PySpark
[ https://issues.apache.org/jira/browse/SPARK-40178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40178: Assignee: (was: Apache Spark) > Rebalance/Repartition Hints Not Working in PySpark > -- > > Key: SPARK-40178 > URL: https://issues.apache.org/jira/browse/SPARK-40178 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2 > Environment: Mac OSX 11.4 Big Sur > Python 3.9.7 > Spark version >= 3.2.0 (perhaps before as well). >Reporter: Maxwell Conradt >Priority: Major > Fix For: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.4.0, 3.3.1 > > Original Estimate: 168h > Remaining Estimate: 168h > > Partitioning hints in PySpark do not work because the column parameters are > not converted to Catalyst `Expression` instances before being passed to the > hint resolver. > The behavior of the hints is documented > [here|https://spark.apache.org/docs/3.3.0/sql-ref-syntax-qry-select-hints.html#partitioning-hints-types]. > Example: > > {code:java} > >>> df = spark.range(1024) > >>> > >>> df > DataFrame[id: bigint] > >>> df.hint("rebalance", "id") > Traceback (most recent call last): > File "", line 1, in > File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line > 980, in hint > jdf = self._jdf.hint(name, self._jseq(parameters)) > File > "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, > in deco > raise converted from None > pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include > columns, but id found > >>> df.hint("repartition", "id") > Traceback (most recent call last): > File "", line 1, in > File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line > 980, in hint > jdf = self._jdf.hint(name, self._jseq(parameters)) > File > "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", > line 1322, in __call__ > File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, > in deco > raise converted from None > pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should > include columns, but id found {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
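Until the Python parameter path is fixed, the SQL hint form documented at the link in the description does not go through the PySpark conversion and resolves its column arguments correctly. A sketch, assuming a SparkSession named spark (shown in Scala to match the other examples in this thread):
{code:scala}
spark.range(1024).createOrReplaceTempView("t")

// Partitioning hints expressed in SQL resolve their column arguments directly.
spark.sql("SELECT /*+ REBALANCE(id) */ id FROM t").explain()
spark.sql("SELECT /*+ REPARTITION(3, id) */ id FROM t").explain()
{code}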
[jira] [Created] (SPARK-40179) Run / Scala 2.13 build with SBT GA failed
Yang Jie created SPARK-40179: Summary: Run / Scala 2.13 build with SBT GA failed Key: SPARK-40179 URL: https://issues.apache.org/jira/browse/SPARK-40179 Project: Spark Issue Type: Bug Components: Build, Tests Affects Versions: 3.4.0 Reporter: Yang Jie {code:java} [error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:36:1: error: package org.apache.http.protocol does not exist 1011[error] import org.apache.http.protocol.BasicHttpContext; 1012[error]^ 1013[error] /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java:156:1: error: cannot find symbol 1014[error] private final HttpContext httpContext; 1015[error] ^ symbol: class HttpContext 1016[error] location: class HttpKerberosClientAction 1017[error] 3 errors {code} * [https://github.com/apache/spark/runs/7947684467?check_suite_focus=true] * [https://github.com/apache/spark/runs/7947300886?check_suite_focus=true] * [https://github.com/apache/spark/runs/7946453241?check_suite_focus=true] * [https://github.com/apache/spark/runs/7946444061?check_suite_focus=true] But a local run {code:java} ./dev/change-scala-version.sh 2.13 ./build/sbt -Pyarn -Pmesos -Pkubernetes -Pvolcano -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile Test/compile {code} passes. Maybe a cache file is corrupt? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests
[ https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582929#comment-17582929 ] Apache Spark commented on SPARK-40124: -- User 'mskapilks' has created a pull request for this issue: https://github.com/apache/spark/pull/37615 > Update TPCDS v1.4 q32 for Plan Stability tests > -- > > Key: SPARK-40124 > URL: https://issues.apache.org/jira/browse/SPARK-40124 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Kapil Singh >Assignee: Kapil Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40178) Rebalance/Repartition Hints Not Working in PySpark
Maxwell Conradt created SPARK-40178: --- Summary: Rebalance/Repartition Hints Not Working in PySpark Key: SPARK-40178 URL: https://issues.apache.org/jira/browse/SPARK-40178 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.2.2, 3.3.0, 3.2.1, 3.2.0 Environment: Mac OSX 11.4 Big Sur Python 3.9.7 Spark version >= 3.2.0 (perhaps before as well). Reporter: Maxwell Conradt Fix For: 3.4.0, 3.3.1, 3.2.2, 3.3.0, 3.2.1, 3.2.0 Partitioning hints in PySpark do not work because the column parameters are not converted to Catalyst `Expression` instances before being passed to the hint resolver. The behavior of the hints is documented [here|https://spark.apache.org/docs/3.3.0/sql-ref-syntax-qry-select-hints.html#partitioning-hints-types]. Example: {code:java} >>> df = spark.range(1024) >>> >>> df DataFrame[id: bigint] >>> df.hint("rebalance", "id") Traceback (most recent call last): File "", line 1, in File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint jdf = self._jdf.hint(name, self._jseq(parameters)) File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco raise converted from None pyspark.sql.utils.AnalysisException: REBALANCE Hint parameter should include columns, but id found >>> df.hint("repartition", "id") Traceback (most recent call last): File "", line 1, in File "/Users/maxwellconradt/spark/python/pyspark/sql/dataframe.py", line 980, in hint jdf = self._jdf.hint(name, self._jseq(parameters)) File "/Users/maxwellconradt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__ File "/Users/maxwellconradt/spark/python/pyspark/sql/utils.py", line 196, in deco raise converted from None pyspark.sql.utils.AnalysisException: REPARTITION Hint parameter should include columns, but id found {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38992) Avoid using bash -c in ShellBasedGroupsMappingProvider
[ https://issues.apache.org/jira/browse/SPARK-38992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582924#comment-17582924 ] Apache Spark commented on SPARK-38992: -- User 'leoluan2009' has created a pull request for this issue: https://github.com/apache/spark/pull/37614 > Avoid using bash -c in ShellBasedGroupsMappingProvider > -- > > Key: SPARK-38992 > URL: https://issues.apache.org/jira/browse/SPARK-38992 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.3, 3.0.4, 3.3.0, 3.2.2 > > > Using bash -c can allow arbitrary shell execution from the end user. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
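The general remedy for this class of bug is to exec the program with an argv list instead of handing a single string to bash -c, so no shell ever parses user-controlled input. A minimal sketch of that pattern (not the exact change in the linked PR), assuming a Unix id binary and a local user named spark:
{code:scala}
import scala.sys.process._

// argv-style exec: each element is passed as-is and nothing is shell-parsed,
// so metacharacters in userName cannot inject extra commands.
val userName = "spark" // assumed to exist; `id` exits non-zero otherwise
val groups = Seq("id", "-Gn", userName).!!.trim.split(" ").toSeq
println(groups.mkString(", "))
{code}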
[jira] [Commented] (SPARK-37944) Use error classes in the execution errors of casting
[ https://issues.apache.org/jira/browse/SPARK-37944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582884#comment-17582884 ] Apache Spark commented on SPARK-37944: -- User 'goutam-git' has created a pull request for this issue: https://github.com/apache/spark/pull/37613 > Use error classes in the execution errors of casting > > > Key: SPARK-37944 > URL: https://issues.apache.org/jira/browse/SPARK-37944 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryExecutionErrors: > * failedToCastValueToDataTypeForPartitionColumnError > * invalidInputSyntaxForNumericError > * cannotCastToDateTimeError > * invalidInputSyntaxForBooleanError > * nullLiteralsCannotBeCastedError > to use error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37944) Use error classes in the execution errors of casting
[ https://issues.apache.org/jira/browse/SPARK-37944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37944: Assignee: (was: Apache Spark) > Use error classes in the execution errors of casting > > > Key: SPARK-37944 > URL: https://issues.apache.org/jira/browse/SPARK-37944 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryExecutionErrors: > * failedToCastValueToDataTypeForPartitionColumnError > * invalidInputSyntaxForNumericError > * cannotCastToDateTimeError > * invalidInputSyntaxForBooleanError > * nullLiteralsCannotBeCastedError > to use error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37944) Use error classes in the execution errors of casting
[ https://issues.apache.org/jira/browse/SPARK-37944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37944: Assignee: Apache Spark > Use error classes in the execution errors of casting > > > Key: SPARK-37944 > URL: https://issues.apache.org/jira/browse/SPARK-37944 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Migrate the following errors in QueryExecutionErrors: > * failedToCastValueToDataTypeForPartitionColumnError > * invalidInputSyntaxForNumericError > * cannotCastToDateTimeError > * invalidInputSyntaxForBooleanError > * nullLiteralsCannotBeCastedError > to use error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
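For readers picking up these migration tickets, the test shape being asked for looks roughly like the sketch below, written as it would sit inside QueryExecutionErrorsSuite (so sql, withSQLConf and ScalaTest's intercept are in scope). The triggering SQL and the asserted error class name are illustrative assumptions, not the final tests.
{code:scala}
import org.apache.spark.SparkThrowable
import org.apache.spark.sql.internal.SQLConf

// Trigger a cast failure under ANSI mode and assert on the error class.
withSQLConf(SQLConf.ANSI_ENABLED.key -> "true") {
  val e = intercept[SparkThrowable] {
    sql("SELECT CAST('not-a-number' AS INT)").collect()
  }
  // Assumed error class name, for illustration only.
  assert(e.getErrorClass === "CAST_INVALID_INPUT")
}
{code}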
[jira] [Assigned] (SPARK-39915) Dataset.repartition(N) may not create N partitions
[ https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39915: Assignee: Apache Spark > Dataset.repartition(N) may not create N partitions > -- > > Key: SPARK-39915 > URL: https://issues.apache.org/jira/browse/SPARK-39915 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Shixiong Zhu >Assignee: Apache Spark >Priority: Major > > Looks like there is a behavior change in Dataset.repartition in 3.3.0. For > example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 > in Spark 3.2.0, but 0 in Spark 3.3.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39915) Dataset.repartition(N) may not create N partitions
[ https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39915: Assignee: (was: Apache Spark) > Dataset.repartition(N) may not create N partitions > -- > > Key: SPARK-39915 > URL: https://issues.apache.org/jira/browse/SPARK-39915 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Shixiong Zhu >Priority: Major > > Looks like there is a behavior change in Dataset.repartition in 3.3.0. For > example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 > in Spark 3.2.0, but 0 in Spark 3.3.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions
[ https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582864#comment-17582864 ] Apache Spark commented on SPARK-39915: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/37612 > Dataset.repartition(N) may not create N partitions > -- > > Key: SPARK-39915 > URL: https://issues.apache.org/jira/browse/SPARK-39915 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Shixiong Zhu >Priority: Major > > Looks like there is a behavior change in Dataset.repartition in 3.3.0. For > example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 > in Spark 3.2.0, but 0 in Spark 3.3.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
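A quick repro sketch of the description, assuming a SparkSession named spark: range(10, 0) is empty because start > end, and on 3.3.0 the user-requested partition count is dropped for the empty relation (the linked PR addresses how AQE handles user-specified repartitions of empty plans).
{code:scala}
val empty = spark.range(10, 0) // start > end: zero rows

println(empty.rdd.isEmpty())                       // true
println(empty.repartition(5).rdd.getNumPartitions) // 5 on 3.2.0, 0 on 3.3.0
{code}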
[jira] [Commented] (SPARK-38752) Test the error class: UNSUPPORTED_DATATYPE
[ https://issues.apache.org/jira/browse/SPARK-38752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582851#comment-17582851 ] lvshaokang commented on SPARK-38752: [~maxgekk] I am working on this. > Test the error class: UNSUPPORTED_DATATYPE > -- > > Key: SPARK-38752 > URL: https://issues.apache.org/jira/browse/SPARK-38752 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add a test for the error class *UNSUPPORTED_DATATYPE* to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def dataTypeUnsupportedError(dataType: String, failure: String): Throwable > = { > new SparkIllegalArgumentException(errorClass = "UNSUPPORTED_DATATYPE", > messageParameters = Array(dataType + failure)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38749) Test the error class: RENAME_SRC_PATH_NOT_FOUND
[ https://issues.apache.org/jira/browse/SPARK-38749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38749: Assignee: Apache Spark > Test the error class: RENAME_SRC_PATH_NOT_FOUND > --- > > Key: SPARK-38749 > URL: https://issues.apache.org/jira/browse/SPARK-38749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: starter > > Add a test for the error class *RENAME_SRC_PATH_NOT_FOUND* to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def renameSrcPathNotFoundError(srcPath: Path): Throwable = { > new SparkFileNotFoundException(errorClass = "RENAME_SRC_PATH_NOT_FOUND", > Array(srcPath.toString)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38749) Test the error class: RENAME_SRC_PATH_NOT_FOUND
[ https://issues.apache.org/jira/browse/SPARK-38749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38749: Assignee: (was: Apache Spark) > Test the error class: RENAME_SRC_PATH_NOT_FOUND > --- > > Key: SPARK-38749 > URL: https://issues.apache.org/jira/browse/SPARK-38749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add a test for the error class *RENAME_SRC_PATH_NOT_FOUND* to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def renameSrcPathNotFoundError(srcPath: Path): Throwable = { > new SparkFileNotFoundException(errorClass = "RENAME_SRC_PATH_NOT_FOUND", > Array(srcPath.toString)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38749) Test the error class: RENAME_SRC_PATH_NOT_FOUND
[ https://issues.apache.org/jira/browse/SPARK-38749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38749: Assignee: Apache Spark > Test the error class: RENAME_SRC_PATH_NOT_FOUND > --- > > Key: SPARK-38749 > URL: https://issues.apache.org/jira/browse/SPARK-38749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: starter > > Add a test for the error class *RENAME_SRC_PATH_NOT_FOUND* to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def renameSrcPathNotFoundError(srcPath: Path): Throwable = { > new SparkFileNotFoundException(errorClass = "RENAME_SRC_PATH_NOT_FOUND", > Array(srcPath.toString)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38749) Test the error class: RENAME_SRC_PATH_NOT_FOUND
[ https://issues.apache.org/jira/browse/SPARK-38749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582846#comment-17582846 ] Apache Spark commented on SPARK-38749: -- User 'lvshaokang' has created a pull request for this issue: https://github.com/apache/spark/pull/37611 > Test the error class: RENAME_SRC_PATH_NOT_FOUND > --- > > Key: SPARK-38749 > URL: https://issues.apache.org/jira/browse/SPARK-38749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add a test for the error class *RENAME_SRC_PATH_NOT_FOUND* to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def renameSrcPathNotFoundError(srcPath: Path): Throwable = { > new SparkFileNotFoundException(errorClass = "RENAME_SRC_PATH_NOT_FOUND", > Array(srcPath.toString)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
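Both of these starter tickets follow the same pattern; the sketch below uses UNSUPPORTED_DATATYPE as the example and numbers the three required checks. One labeled assumption: that parsing an unknown type name through DataType.fromJson reaches dataTypeUnsupportedError; if it does not, any code path that hits the method can be substituted.
{code:scala}
import org.apache.spark.SparkIllegalArgumentException
import org.apache.spark.sql.types.DataType

val e = intercept[SparkIllegalArgumentException] {
  DataType.fromJson("\"no-such-type\"") // assumed trigger, see note above
}
assert(e.getMessage.contains("no-such-type"))      // 1. the error message
// 2. sqlState: assert only if error-classes.json defines one for this class
assert(e.getErrorClass === "UNSUPPORTED_DATATYPE") // 3. the error class
{code}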
[jira] [Resolved] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-40089. - Fix Version/s: 3.3.1 3.1.4 3.2.3 3.4.0 Resolution: Fixed Issue resolved by pull request 37540 [https://github.com/apache/spark/pull/37540] > Sorting of at least Decimal(20, 2) fails for some values near the max. > -- > > Key: SPARK-40089 > URL: https://issues.apache.org/jira/browse/SPARK-40089 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0, 3.4.0 >Reporter: Robert Joseph Evans >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.1, 3.1.4, 3.2.3, 3.4.0 > > Attachments: input.parquet > > > I have been doing some testing with Decimal values for the RAPIDS Accelerator > for Apache Spark. I have been trying to add in new corner cases and when I > tried to enable the maximum supported value for a sort I started to get > failures. On closer inspection it looks like the CPU is sorting things > incorrectly. Specifically anything that is "99.50" or above > is placed as a chunk in the wrong location in the outputs. > In local mode with 12 tasks. > {code:java} > spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println) > {code} > > Here you will notice that the last entry printed is > {{[99.49]}}, and {{[99.99]}} is near the top > near {{[-99.99]}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
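For readers without the attachment, values "near the max" of Decimal(20, 2) are easy to construct: the type holds 18 digits before the point, so its maximum is 999999999999999999.99. A repro sketch in that spirit, assuming a SparkSession named spark (the literals are illustrative, not the contents of input.parquet):
{code:scala}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

val schema = StructType(Seq(StructField("a", DecimalType(20, 2))))
val rows = Seq("-999999999999999999.99", "999999999999999999.49",
  "999999999999999999.50", "999999999999999999.99")
  .map(s => Row(new java.math.BigDecimal(s)))

val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.orderBy(col("a")).collect().foreach(println) // must print in ascending order
{code}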
[jira] [Commented] (SPARK-40177) Simplify join condition of form (a==b) || (a==null && b==null) to a<=>b
[ https://issues.apache.org/jira/browse/SPARK-40177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582799#comment-17582799 ] Ayushi Agarwal commented on SPARK-40177: Working on the PR > Simplify join condition of form (a==b) || (a==null && b==null) to a<=>b > - > > Key: SPARK-40177 > URL: https://issues.apache.org/jira/browse/SPARK-40177 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Ayushi Agarwal >Priority: Major > Fix For: 3.3.1 > > If the join condition is like key1==key2 || (key1==null && key2==null), the join > is executed as a Broadcast Nested Loop Join because this condition doesn't satisfy the > equi-join condition. BNLJ takes more time compared to a sort-merge or > broadcast join. This condition can be converted to key1<=>key2 to make the > join execute as a broadcast or sort-merge join. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40177) Simplify join condition of form (a==b) || (a==null && b==null) to a<=>b
Ayushi Agarwal created SPARK-40177: -- Summary: Simplify join condition of form (a==b) || (a==null && b==null) to a<=>b Key: SPARK-40177 URL: https://issues.apache.org/jira/browse/SPARK-40177 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0, 3.2.0 Reporter: Ayushi Agarwal Fix For: 3.3.1 If the join condition is of the form key1==key2 || (key1==null && key2==null), the join is executed as a Broadcast Nested Loop Join because this condition does not satisfy the equi-join requirement. BNLJ takes more time than a sort-merge or broadcast-hash join. The condition can be rewritten as key1<=>key2 so that the join executes as a broadcast or sort-merge join. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40151) Fix return type for new median(interval) function
[ https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40151. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37595 [https://github.com/apache/spark/pull/37595] > Fix return type for new median(interval) function > -- > > Key: SPARK-40151 > URL: https://issues.apache.org/jira/browse/SPARK-40151 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Max Gekk >Priority: Critical > Fix For: 3.4.0 > > > median() right now returns an interval of the same type as the input. > We should instead match mean() and avg(): > The result type is computed as follows from the argument type: > - year-month interval: The result is an `INTERVAL YEAR TO MONTH`. > - day-time interval: The result is an `INTERVAL DAY TO SECOND`. > - In all other cases the result is a DOUBLE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
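A sketch of the intended behavior after the fix, runnable in spark-shell against a build with the change (the trailing comment states the expectation, not captured output):

{code:scala}
// Year-month interval input: the result should be INTERVAL YEAR TO MONTH
spark.sql(
  """SELECT median(col)
    |FROM VALUES (INTERVAL '1-0' YEAR TO MONTH),
    |            (INTERVAL '2-0' YEAR TO MONTH) AS t(col)
    |""".stripMargin).printSchema()
// expected: the median column reports type "interval year to month"
{code}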
[jira] [Assigned] (SPARK-40151) Fix return type for new median(interval) function
[ https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-40151: Assignee: Max Gekk > Fix return type for new median(interval) function > -- > > Key: SPARK-40151 > URL: https://issues.apache.org/jira/browse/SPARK-40151 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Max Gekk >Priority: Critical > > median() right now returns an interval of the same type as the input. > We should instead match mean() and avg(): > The result type is computed as follows from the argument type: > - year-month interval: The result is an `INTERVAL YEAR TO MONTH`. > - day-time interval: The result is an `INTERVAL DAY TO SECOND`. > - In all other cases the result is a DOUBLE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40176) Enhance collapse window optimization to work in case partition or order by keys are expressions
[ https://issues.apache.org/jira/browse/SPARK-40176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582797#comment-17582797 ] Ayushi Agarwal commented on SPARK-40176: Working on this ticket > Enhance collapse window optimization to work in case partition or order by > keys are expressions > --- > > Key: SPARK-40176 > URL: https://issues.apache.org/jira/browse/SPARK-40176 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0 >Reporter: Ayushi Agarwal >Priority: Major > Fix For: 3.3.1 > > > In a window operator with multiple window functions, if any expression is > present in the partition-by or sort-order columns, the windows are not collapsed even > if the partition and order-by expressions are the same for all those window functions. > E.g. query: > val w = > Window.partitionBy("key").orderBy(lower(col("value"))) > df.select(lead("key", 1).over(w), lead("value", 1).over(w)) > Current plan: > -Window(lead(value,1), key, _w1) -- W1 > - Sort(key, _w1) > -Project(lower("value") as _w1) -- P1 > -Window(lead(key,1), key, _w0) -- W2 > -Sort(key, _w0) > -Exchange(key) > -Project(lower("value") as _w0) -- P2 > -Scan > > W1 and W2 can be merged into a single window -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
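With the Jira markup residue stripped, the repro is just two window functions sharing one window spec whose order-by key is an expression; a sketch for spark-shell (df is a hypothetical DataFrame with string columns key and value):

{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lead, lower}

// One logical window spec, but because the order-by key is an expression,
// the optimizer materializes it twice (_w0, _w1) and keeps two Window nodes
val w = Window.partitionBy("key").orderBy(lower(col("value")))
df.select(lead("key", 1).over(w), lead("value", 1).over(w)).explain()
{code}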
[jira] [Assigned] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`
[ https://issues.apache.org/jira/browse/SPARK-38888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38888: Assignee: (was: Apache Spark) > Add `RocksDBProvider` similar to `LevelDBProvider` > -- > > Key: SPARK-38888 > URL: https://issues.apache.org/jira/browse/SPARK-38888 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and > `YarnShuffleService`; a corresponding `RocksDB` implementation should be added. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`
[ https://issues.apache.org/jira/browse/SPARK-38888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38888: Assignee: Apache Spark > Add `RocksDBProvider` similar to `LevelDBProvider` > -- > > Key: SPARK-38888 > URL: https://issues.apache.org/jira/browse/SPARK-38888 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and > `YarnShuffleService`; a corresponding `RocksDB` implementation should be added. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38888) Add `RocksDBProvider` similar to `LevelDBProvider`
[ https://issues.apache.org/jira/browse/SPARK-38888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582796#comment-17582796 ] Apache Spark commented on SPARK-38888: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/37610 > Add `RocksDBProvider` similar to `LevelDBProvider` > -- > > Key: SPARK-38888 > URL: https://issues.apache.org/jira/browse/SPARK-38888 > Project: Spark > Issue Type: Improvement > Components: Spark Core, YARN >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > > `LevelDBProvider` is used by `ExternalShuffleBlockResolver` and > `YarnShuffleService`; a corresponding `RocksDB` implementation should be added. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
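For orientation, a minimal sketch of the RocksDB analogue using the RocksJava API; the object and method names are illustrative and not taken from the pull request, and a real provider would also need the version-file checks and recovery paths that LevelDBProvider implements:

{code:scala}
import java.io.File
import org.rocksdb.{Options, RocksDB}

object RocksDBProviderSketch {
  RocksDB.loadLibrary() // load the native library once per JVM

  // Open (or create) a RocksDB instance backing the shuffle state store
  def initRocksDB(file: File): RocksDB = {
    val options = new Options().setCreateIfMissing(true)
    RocksDB.open(options, file.getAbsolutePath)
  }
}
{code}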
[jira] [Created] (SPARK-40176) Enhance collapse window optimization to work in case partition or order by keys are expressions
Ayushi Agarwal created SPARK-40176: -- Summary: Enhance collapse window optimization to work in case partition or order by keys are expressions Key: SPARK-40176 URL: https://issues.apache.org/jira/browse/SPARK-40176 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0, 3.2.1, 3.2.0 Reporter: Ayushi Agarwal Fix For: 3.3.1 In window operator with multiple window functions, if any expression is present in partition by or sort order columns, windows are not collapsed even if partition and order by expression is same for all those window functions. E.g. query: val w = Window.{_}partitionBy{_}("key").orderBy({_}lower{_}({_}col{_}("value"))) df.select({_}lead{_}("key", 1).over(w), {_}lead{_}("value", 1).over(w)) Current Plan: -Window(lead(value,1), key, _w1) -- W1 - Sort (key, _w1) -Project (lower(“value”) as _w1) - P1 -Window(lead(key,1), key, _w0) W2 -Sort(key, _w0) -Exchange(key) -Project (lower(“value”) as _w0) P2 -Scan W1 and W2 can be merged in single window -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40175) Converting Tuple2 to Scala Map via `.toMap` is slow
[ https://issues.apache.org/jira/browse/SPARK-40175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582787#comment-17582787 ] Apache Spark commented on SPARK-40175: -- User 'caican00' has created a pull request for this issue: https://github.com/apache/spark/pull/37609 > Converting Tuple2 to Scala Map via `.toMap` is slow > --- > > Key: SPARK-40175 > URL: https://issues.apache.org/jira/browse/SPARK-40175 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.1.3, 3.3.0, 3.2.2, 3.3.1 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-14-58-26-491.png, > image-2022-08-22-14-58-53-046.png > > > Converting Tuple2 to Scala Map via `.toMap` is slow > !image-2022-08-22-14-58-53-046.png! > !image-2022-08-22-14-58-26-491.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
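The attached screenshots do not survive in this digest, but the underlying point can be sketched: `.toMap` rebuilds an immutable Map one entry at a time, which is measurably slower than filling a single mutable map when the pair collection is large. The data below is illustrative and timings are machine-dependent:

{code:scala}
import scala.collection.mutable

val pairs: Array[(String, Int)] = Array.tabulate(1000000)(i => (s"k$i", i))

// Path from the report: immutable Map built pair by pair
val viaToMap: Map[String, Int] = pairs.toMap

// Typical faster alternative: one mutable map filled in place
val viaMutable = new mutable.HashMap[String, Int]
pairs.foreach { case (k, v) => viaMutable.put(k, v) }
{code}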
[jira] [Assigned] (SPARK-40175) Converting Tuple2 to Scala Map via `.toMap` is slow
[ https://issues.apache.org/jira/browse/SPARK-40175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40175: Assignee: (was: Apache Spark) > Converting Tuple2 to Scala Map via `.toMap` is slow > --- > > Key: SPARK-40175 > URL: https://issues.apache.org/jira/browse/SPARK-40175 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.1.3, 3.3.0, 3.2.2, 3.3.1 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-14-58-26-491.png, > image-2022-08-22-14-58-53-046.png > > > Converting Tuple2 to Scala Map via `.toMap` is slow > !image-2022-08-22-14-58-53-046.png! > !image-2022-08-22-14-58-26-491.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40175) Converting Tuple2 to Scala Map via `.toMap` is slow
[ https://issues.apache.org/jira/browse/SPARK-40175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582781#comment-17582781 ] Apache Spark commented on SPARK-40175: -- User 'caican00' has created a pull request for this issue: https://github.com/apache/spark/pull/37608 > Converting Tuple2 to Scala Map via `.toMap` is slow > --- > > Key: SPARK-40175 > URL: https://issues.apache.org/jira/browse/SPARK-40175 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.1.3, 3.3.0, 3.2.2, 3.3.1 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-14-58-26-491.png, > image-2022-08-22-14-58-53-046.png > > > Converting Tuple2 to Scala Map via `.toMap` is slow > !image-2022-08-22-14-58-53-046.png! > !image-2022-08-22-14-58-26-491.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40175) Converting Tuple2 to Scala Map via `.toMap` is slow
[ https://issues.apache.org/jira/browse/SPARK-40175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40175: Assignee: Apache Spark > Converting Tuple2 to Scala Map via `.toMap` is slow > --- > > Key: SPARK-40175 > URL: https://issues.apache.org/jira/browse/SPARK-40175 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.1.3, 3.3.0, 3.2.2, 3.3.1 >Reporter: caican >Assignee: Apache Spark >Priority: Major > Attachments: image-2022-08-22-14-58-26-491.png, > image-2022-08-22-14-58-53-046.png > > > Converting Tuple2 to Scala Map via `.toMap` is slow > !image-2022-08-22-14-58-53-046.png! > !image-2022-08-22-14-58-26-491.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40172. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37605 [https://github.com/apache/spark/pull/37605] > Temporarily disable flaky test cases in ImageFileFormatSuite > > > Key: SPARK-40172 > URL: https://issues.apache.org/jira/browse/SPARK-40172 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.4.0 > > > 3 test cases in ImageFileFormatSuite have become flaky in the GitHub Actions tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I > suggest disabling them in OSS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
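In a ScalaTest suite such as ImageFileFormatSuite, the usual way to disable a case temporarily is to switch `test` to `ignore`, which keeps the body compiled and reports the case as ignored rather than deleting it; a generic sketch (suite and test names are illustrative):

{code:scala}
import org.scalatest.funsuite.AnyFunSuite

class ExampleSuite extends AnyFunSuite {
  // Temporarily disabled pending SPARK-40171; ignore(...) keeps the test
  // in place so it can be re-enabled by switching back to test(...)
  ignore("flaky image read check") {
    // unchanged test body; it simply does not run while ignored
  }
}
{code}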