[jira] [Updated] (SPARK-39198) Cannot refer to nested CTE within a nested CTE in a subquery.
     [ https://issues.apache.org/jira/browse/SPARK-39198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jarno Rajala updated SPARK-39198:
---------------------------------
    Affects Version/s: 3.3.0

> Cannot refer to nested CTE within a nested CTE in a subquery.
> -------------------------------------------------------------
>
>                 Key: SPARK-39198
>                 URL: https://issues.apache.org/jira/browse/SPARK-39198
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1, 3.3.0
>         Environment: Tested on
> * Databricks runtime 10.4
> * Spark 3.2.1 from [https://spark.apache.org/downloads.html]
> * GitHub apache/spark 'master' commit 17b85ff9
>            Reporter: Jarno Rajala
>            Priority: Major
>
> The following query fails with "Table or view not found: cte1;"
> {code:java}
> set spark.sql.legacy.ctePrecedencePolicy=CORRECTED;
> with
>   cte1 as (select 1)
> select * from (
>   with
>     cte2 as (select * from cte1)
>   select * from cte2
> ); {code}
> On Spark 3.1.1 it returns 1 as expected.
> This is related to SPARK-38404, but different, since the query fails with
> Spark built from 'master' (commit 17b85ff9). The [PR
> #36146|https://github.com/apache/spark/pull/36146] therefore does not fix
> this issue.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
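
For comparison, the scoping the reporter expects can be checked on an engine other than Spark. The sketch below is only an illustration under that assumption: it runs the query from the report (without the Spark-specific `set` statement) against SQLite through Python's standard `sqlite3` module, where the inner `cte2` can see the outer `cte1`. It says nothing about Spark's analyzer; the constant and function names are made up for this example.

```python
import sqlite3

# The reporter's query without the Spark-only `set` statement.
# SQLite is a stand-in engine here, not Spark.
NESTED_CTE_QUERY = """
with
  cte1 as (select 1)
select * from (
  with
    cte2 as (select * from cte1)
  select * from cte2
)
"""


def run_nested_cte():
    """Run the nested-CTE query on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(NESTED_CTE_QUERY).fetchall()
    finally:
        conn.close()


rows = run_nested_cte()
print(rows)
```

A single row holding the value 1 is the result the report describes for Spark 3.1.1.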
[jira] [Assigned] (SPARK-40186) mergedShuffleCleaner should have been shutdown before db closed
     [ https://issues.apache.org/jira/browse/SPARK-40186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40186:
------------------------------------
    Assignee: Apache Spark

> mergedShuffleCleaner should have been shutdown before db closed
> ---------------------------------------------------------------
>
>                 Key: SPARK-40186
>                 URL: https://issues.apache.org/jira/browse/SPARK-40186
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Assignee: Apache Spark
>            Priority: Major
>
> `RemoteBlockPushResolver#mergedShuffleCleaner` should be shut down before
> `RemoteBlockPushResolver#db` is closed; otherwise
> `RemoteBlockPushResolver#applicationRemoved` may perform delete operations on
> a closed db.
>
> https://github.com/apache/spark/pull/37610#discussion_r951185256
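
The ordering constraint in this issue can be illustrated outside Spark. The following is a minimal Python sketch, not the `RemoteBlockPushResolver` code: `FakeDb` and `Resolver` are hypothetical stand-ins, and a `ThreadPoolExecutor` plays the role of the cleaner. The point it shows is that the cleaner is drained and stopped before the db is closed, so no queued delete can reach a closed db.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeDb:
    """Hypothetical stand-in for the resolver's db; rejects use after close."""

    def __init__(self):
        self.closed = False
        self.keys = {"app-1", "app-2"}

    def delete(self, key):
        if self.closed:
            raise RuntimeError("delete on closed db")
        self.keys.discard(key)

    def close(self):
        self.closed = True


class Resolver:
    """Hypothetical resolver with a single-threaded cleaner."""

    def __init__(self):
        self.db = FakeDb()
        self.cleaner = ThreadPoolExecutor(max_workers=1)

    def application_removed(self, app_id):
        # Cleanup runs asynchronously on the cleaner thread.
        return self.cleaner.submit(self.db.delete, app_id)

    def close(self):
        # Drain and stop the cleaner first, so that no queued delete
        # can run after the db has been closed.
        self.cleaner.shutdown(wait=True)
        self.db.close()


resolver = Resolver()
pending = [resolver.application_removed(a) for a in ("app-1", "app-2")]
resolver.close()
errors = [f.exception() for f in pending]
print(errors, resolver.db.keys)
```

Swapping the two statements in `Resolver.close` would let a queued delete race with `db.close()`, which is the failure mode the issue describes.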
[jira] [Commented] (SPARK-40186) mergedShuffleCleaner should have been shutdown before db closed
     [ https://issues.apache.org/jira/browse/SPARK-40186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583435#comment-17583435 ]

Apache Spark commented on SPARK-40186:
--------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37624

> mergedShuffleCleaner should have been shutdown before db closed
> ---------------------------------------------------------------
>
>                 Key: SPARK-40186
>                 URL: https://issues.apache.org/jira/browse/SPARK-40186
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>
> `RemoteBlockPushResolver#mergedShuffleCleaner` should be shut down before
> `RemoteBlockPushResolver#db` is closed; otherwise
> `RemoteBlockPushResolver#applicationRemoved` may perform delete operations on
> a closed db.
>
> https://github.com/apache/spark/pull/37610#discussion_r951185256
[jira] [Assigned] (SPARK-40186) mergedShuffleCleaner should have been shutdown before db closed
     [ https://issues.apache.org/jira/browse/SPARK-40186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40186:
------------------------------------
    Assignee:     (was: Apache Spark)

> mergedShuffleCleaner should have been shutdown before db closed
> ---------------------------------------------------------------
>
>                 Key: SPARK-40186
>                 URL: https://issues.apache.org/jira/browse/SPARK-40186
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Yang Jie
>            Priority: Major
>
> `RemoteBlockPushResolver#mergedShuffleCleaner` should be shut down before
> `RemoteBlockPushResolver#db` is closed; otherwise
> `RemoteBlockPushResolver#applicationRemoved` may perform delete operations on
> a closed db.
>
> https://github.com/apache/spark/pull/37610#discussion_r951185256
[jira] [Created] (SPARK-40188) Spark Direct Streaming: Read messages of a certain bytes or count in batches from Kafka is not working.
Madhav Madhu created SPARK-40188:
------------------------------------

             Summary: Spark Direct Streaming: Read messages of a certain bytes or count in batches from Kafka is not working.
                 Key: SPARK-40188
                 URL: https://issues.apache.org/jira/browse/SPARK-40188
             Project: Spark
          Issue Type: Bug
          Components: DStreams
    Affects Versions: 3.2.1
         Environment: Spark Version: 3.2.1
Kafka version: 3.2.0
            Reporter: Madhav Madhu


The Spark Kafka consumer is unable to read messages of a certain size or count in batches. I have tried a few approaches mentioned in the Kafka docs, but with no success. Here is a link to Stack Overflow, where I asked the same question and got no response; I think this is a possible bug. The same configuration works fine when the consumer is written in Java.

https://stackoverflow.com/questions/73398533/spark-streaming-context-kafka-consumer-read-messages-of-a-certain-byte-size-in

Here is the consumer code which fetches data from Kafka:
{code:scala}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

// `sparkSession` and `columns` are defined elsewhere in the application.
import sparkSession.implicits._

val streamingContext = new StreamingContext(sparkSession.sparkContext, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "test",
  "fetch.max.bytes" -> "65536",
  "max.partition.fetch.bytes" -> "8192",
  "max.poll.records" -> "100",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean),
  "sasl.jaas.config" -> "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"admin\" password=\"admin\";",
  "sasl.mechanism" -> "PLAIN",
  "security.protocol" -> "SASL_PLAINTEXT"
)

val topics = Array("test.topic")
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // The outer println prints the Unit returned by foreach, hence the "()" in the output.
  println(offsetRanges.foreach(a => println(a.topic + ":" + a.partition + ":" + a.fromOffset + ":" + a.untilOffset + ":" + a.count())))

  val df = rdd.map(a => a.value().split(",")).toDF()
  val selectCols = columns.indices.map(i => $"value"(i))
  var newDF = df.select(selectCols: _*).toDF(columns: _*)

  // Some business operations here, then write back to Kafka.
  newDF.write
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "topic.ouput")
    .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"admin\" password=\"admin\";")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.security.protocol", "SASL_PLAINTEXT")
    .save()

  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  sparkSession.catalog.clearCache()
}

streamingContext.start()
streamingContext.awaitTermination()
{code}
Output:
{code:java}
test.topic:6:1345075:4163058:2817983
test.topic:0:1339456:4144190:2804734
test.topic:3:1354266:4189336:2835070
test.topic:7:1353542:4186148:2832606
test.topic:5:1355140:4189071:2833931
test.topic:2:1351162:4173375:2822213
test.topic:1:1352801:4184073:2831272
test.topic:4:1348558:4166749:2818191
()
test.topic:6:4163058:4163058:0
test.topic:0:4144190:4144190:0
test.topic:3:4189336:4189336:0
test.topic:7:4186148:4186148:0
test.topic:5:4189071:4189071:0
test.topic:2:4173375:4173375:0
test.topic:1:4184073:4184073:0
test.topic:4:4166749:4166749:0
{code}
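
The output lines follow the `topic:partition:fromOffset:untilOffset:count` pattern printed by the consumer code. As a small sketch in plain Python rather than Spark (the `OffsetRange` tuple and `parse_offset_line` helper are hypothetical names for this example), the snippet below parses three of the reported lines and checks that each count equals `untilOffset - fromOffset`. That makes the reported behaviour easy to read off: the first batch spans millions of offsets per partition, and the next batch is empty.

```python
from typing import NamedTuple


class OffsetRange(NamedTuple):
    topic: str
    partition: int
    from_offset: int
    until_offset: int
    count: int


def parse_offset_line(line: str) -> OffsetRange:
    # Split from the right so a topic name containing ':' would still parse.
    topic, partition, start, end, count = line.rsplit(":", 4)
    return OffsetRange(topic, int(partition), int(start), int(end), int(count))


# Three lines copied from the reported output.
reported = [
    "test.topic:6:1345075:4163058:2817983",
    "test.topic:0:1339456:4144190:2804734",
    "test.topic:6:4163058:4163058:0",
]
ranges = [parse_offset_line(line) for line in reported]
for r in ranges:
    # Each printed count should equal untilOffset - fromOffset.
    print(r.partition, r.until_offset - r.from_offset == r.count)
```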
[jira] [Updated] (SPARK-40188) Spark Direct Streaming: Read messages of a certain bytes or count in batches from Kafka is not working.
     [ https://issues.apache.org/jira/browse/SPARK-40188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Madhav Madhu updated SPARK-40188:
---------------------------------
    Description: 
The Spark Kafka consumer is unable to read messages of a certain size or count in batches. I have tried a few approaches mentioned in the Kafka docs, but with no success. Here is a link to Stack Overflow, where I asked the same question and got no response; I think this is a possible bug. The same configuration works fine when the consumer is written in Java.

https://stackoverflow.com/questions/73398533/spark-streaming-context-kafka-consumer-read-messages-of-a-certain-byte-size-in

Here is the consumer code which fetches data from Kafka:
{code:scala}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

// `sparkSession` and `columns` are defined elsewhere in the application.
import sparkSession.implicits._

val streamingContext = new StreamingContext(sparkSession.sparkContext, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "test",
  "fetch.max.bytes" -> "65536",
  "max.partition.fetch.bytes" -> "8192",
  "max.poll.records" -> "100",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean),
  "sasl.jaas.config" -> "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"admin\" password=\"admin\";",
  "sasl.mechanism" -> "PLAIN",
  "security.protocol" -> "SASL_PLAINTEXT"
)

val topics = Array("test.topic")
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // The outer println prints the Unit returned by foreach, hence the "()" in the output.
  println(offsetRanges.foreach(a => println(a.topic + ":" + a.partition + ":" + a.fromOffset + ":" + a.untilOffset + ":" + a.count())))

  val df = rdd.map(a => a.value().split(",")).toDF()
  val selectCols = columns.indices.map(i => $"value"(i))
  var newDF = df.select(selectCols: _*).toDF(columns: _*)

  // Some business operations here, then write back to Kafka.
  newDF.write
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "topic.ouput")
    .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"admin\" password=\"admin\";")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.security.protocol", "SASL_PLAINTEXT")
    .save()

  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
  sparkSession.catalog.clearCache()
}

streamingContext.start()
streamingContext.awaitTermination()
{code}
Output:
{code:java}
test.topic:6:1345075:4163058:2817983
test.topic:0:1339456:4144190:2804734
test.topic:3:1354266:4189336:2835070
test.topic:7:1353542:4186148:2832606
test.topic:5:1355140:4189071:2833931
test.topic:2:1351162:4173375:2822213
test.topic:1:1352801:4184073:2831272
test.topic:4:1348558:4166749:2818191
()
test.topic:6:4163058:4163058:0
test.topic:0:4144190:4144190:0
test.topic:3:4189336:4189336:0
test.topic:7:4186148:4186148:0
test.topic:5:4189071:4189071:0
test.topic:2:4173375:4173375:0
test.topic:1:4184073:4184073:0
test.topic:4:4166749:4166749:0
{code}

I tried different options, as follows.

Option 1:
Topic partitions: 8
Streaming context: 1 sec
"fetch.max.bytes" -> "65536", // 64 KB
"max.partition.fetch.bytes" -> "8192" // 8 KB
"max.poll.records" -> "100"
DataFrame count read from Kafka in the very first batch: 120

Option 2:
Topic partitions: 1
Streaming context: 1 sec
"fetch.max.bytes" -> "65536",
"max.partition.fetch.bytes" -> "8192"
"max.poll.records" -> "100"
Kafka lag: 126360469
DataFrame count read from Kafka in the very first batch: 126360469
[jira] [Assigned] (SPARK-40173) Make pyspark.taskcontext examples self-contained
     [ https://issues.apache.org/jira/browse/SPARK-40173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-40173:
------------------------------------
    Assignee: Hyukjin Kwon

> Make pyspark.taskcontext examples self-contained
> ------------------------------------------------
>
>                 Key: SPARK-40173
>                 URL: https://issues.apache.org/jira/browse/SPARK-40173
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
[jira] [Resolved] (SPARK-40173) Make pyspark.taskcontext examples self-contained
     [ https://issues.apache.org/jira/browse/SPARK-40173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-40173.
----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 37623
[https://github.com/apache/spark/pull/37623]

> Make pyspark.taskcontext examples self-contained
> ------------------------------------------------
>
>                 Key: SPARK-40173
>                 URL: https://issues.apache.org/jira/browse/SPARK-40173
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.4.0
[jira] [Assigned] (SPARK-40177) Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b
     [ https://issues.apache.org/jira/browse/SPARK-40177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40177:
------------------------------------
    Assignee:     (was: Apache Spark)

> Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b
> -----------------------------------------------------------------------
>
>                 Key: SPARK-40177
>                 URL: https://issues.apache.org/jira/browse/SPARK-40177
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Ayushi Agarwal
>            Priority: Major
>             Fix For: 3.3.1
>
>
> If the join condition has the form key1==key2 || (key1==null && key2==null),
> the join is executed as a Broadcast Nested Loop Join (BNLJ), because the
> condition does not qualify as an equi-join condition. BNLJ takes more time
> than a sort merge or broadcast join. The condition can be converted to
> key1<=>key2 so that the join executes as a broadcast or sort merge join.
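
The equivalence behind this rewrite can be checked with a small model of SQL three-valued logic. The sketch below is plain Python with `None` standing in for SQL NULL, and every function name in it is hypothetical. It also assumes that `key1==null` in the description means an `IS NULL` test. Under those assumptions it verifies that, as a join predicate (where only TRUE keeps a row pair), `(a = b) OR (a IS NULL AND b IS NULL)` selects exactly the same pairs as `a <=> b`.

```python
from itertools import product


def sql_eq(a, b):
    """SQL `a = b`: NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_and(x, y):
    """Three-valued AND."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def sql_or(x, y):
    """Three-valued OR."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def original_condition(a, b):
    # (a = b) OR (a IS NULL AND b IS NULL)
    return sql_or(sql_eq(a, b), sql_and(a is None, b is None))


def null_safe_eq(a, b):
    """SQL `a <=> b`: never NULL; two NULLs compare equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def matches(predicate_result):
    # A join keeps a row pair only when the predicate is TRUE; NULL drops it.
    return predicate_result is True


values = [None, 1, 2]
agree = all(
    matches(original_condition(a, b)) == matches(null_safe_eq(a, b))
    for a, b in product(values, repeat=2)
)
print(agree)
```

The two forms differ only in that the original can evaluate to NULL where `<=>` gives FALSE, which makes no difference to which rows a join keeps.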
[jira] [Assigned] (SPARK-40177) Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b
     [ https://issues.apache.org/jira/browse/SPARK-40177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40177:
------------------------------------
    Assignee: Apache Spark

> Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b
> -----------------------------------------------------------------------
>
>                 Key: SPARK-40177
>                 URL: https://issues.apache.org/jira/browse/SPARK-40177
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Ayushi Agarwal
>            Assignee: Apache Spark
>            Priority: Major
>             Fix For: 3.3.1
>
>
> If the join condition has the form key1==key2 || (key1==null && key2==null),
> the join is executed as a Broadcast Nested Loop Join (BNLJ), because the
> condition does not qualify as an equi-join condition. BNLJ takes more time
> than a sort merge or broadcast join. The condition can be converted to
> key1<=>key2 so that the join executes as a broadcast or sort merge join.
[jira] [Commented] (SPARK-40177) Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b
     [ https://issues.apache.org/jira/browse/SPARK-40177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583450#comment-17583450 ]

Apache Spark commented on SPARK-40177:
--------------------------------------

User 'ayushi-agarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/37625

> Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b
> -----------------------------------------------------------------------
>
>                 Key: SPARK-40177
>                 URL: https://issues.apache.org/jira/browse/SPARK-40177
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Ayushi Agarwal
>            Priority: Major
>             Fix For: 3.3.1
>
>
> If the join condition has the form key1==key2 || (key1==null && key2==null),
> the join is executed as a Broadcast Nested Loop Join (BNLJ), because the
> condition does not qualify as an equi-join condition. BNLJ takes more time
> than a sort merge or broadcast join. The condition can be converted to
> key1<=>key2 so that the join executes as a broadcast or sort merge join.
[jira] [Commented] (SPARK-40177) Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b
     [ https://issues.apache.org/jira/browse/SPARK-40177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583454#comment-17583454 ]

Apache Spark commented on SPARK-40177:
--------------------------------------

User 'ayushi-agarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/37625

> Simplify join condition of form (a==b) || (a==null&&b==null) to a<=>b
> -----------------------------------------------------------------------
>
>                 Key: SPARK-40177
>                 URL: https://issues.apache.org/jira/browse/SPARK-40177
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Ayushi Agarwal
>            Priority: Major
>             Fix For: 3.3.1
>
>
> If the join condition has the form key1==key2 || (key1==null && key2==null),
> the join is executed as a Broadcast Nested Loop Join (BNLJ), because the
> condition does not qualify as an equi-join condition. BNLJ takes more time
> than a sort merge or broadcast join. The condition can be converted to
> key1<=>key2 so that the join executes as a broadcast or sort merge join.
[jira] [Created] (SPARK-40189) Support json_array_get/json_array_length function
melin created SPARK-40189:
-----------------------------

             Summary: Support json_array_get/json_array_length function
                 Key: SPARK-40189
                 URL: https://issues.apache.org/jira/browse/SPARK-40189
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: melin


Presto provides these two functions, which are frequently used:

https://prestodb.io/docs/current/functions/json.html#json-functions
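
To make the request concrete, here is a rough Python sketch of what the two functions do, based on my reading of the linked Presto documentation: `json_array_length` returns the number of elements of a JSON array, and `json_array_get` returns the JSON-encoded element at a zero-based index, with negative indexes counting from the end and a null result when the index is out of range. The handling of malformed or non-array input below is an assumption of this sketch and may not match Presto or any eventual Spark implementation.

```python
import json


def json_array_length(text):
    """Length of a JSON array given as a string; None if it is not an array."""
    try:
        value = json.loads(text)
    except (TypeError, ValueError):
        return None
    return len(value) if isinstance(value, list) else None


def json_array_get(text, index):
    """JSON-encoded element at a zero-based index (negative counts from the end).

    Returns None when the input is not a JSON array or the index is out of range.
    """
    try:
        value = json.loads(text)
    except (TypeError, ValueError):
        return None
    if not isinstance(value, list) or not -len(value) <= index < len(value):
        return None
    return json.dumps(value[index])


print(json_array_length("[1, 2, 3]"))
print(json_array_get('["a", [3, 9], "c"]', -1))
```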
[jira] [Created] (SPARK-40190) Support json_array_get and json_array_length function
melin created SPARK-40190:
-----------------------------

             Summary: Support json_array_get and json_array_length function
                 Key: SPARK-40190
                 URL: https://issues.apache.org/jira/browse/SPARK-40190
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: melin


Presto provides these two functions, which are often used:

https://prestodb.io/docs/current/functions/json.html#json-functions
[jira] [Resolved] (SPARK-40190) Support json_array_get and json_array_length function
     [ https://issues.apache.org/jira/browse/SPARK-40190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin resolved SPARK-40190.
---------------------------
    Resolution: Duplicate

> Support json_array_get and json_array_length function
> -----------------------------------------------------
>
>                 Key: SPARK-40190
>                 URL: https://issues.apache.org/jira/browse/SPARK-40190
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: melin
>            Priority: Major
>
> Presto provides these two functions, which are often used:
> https://prestodb.io/docs/current/functions/json.html#json-functions
[jira] [Commented] (SPARK-40189) Support json_array_get/json_array_length function
     [ https://issues.apache.org/jira/browse/SPARK-40189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583473#comment-17583473 ]

melin commented on SPARK-40189:
-------------------------------

[~maxgekk]

> Support json_array_get/json_array_length function
> -------------------------------------------------
>
>                 Key: SPARK-40189
>                 URL: https://issues.apache.org/jira/browse/SPARK-40189
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: melin
>            Priority: Major
>
> Presto provides these two functions, which are frequently used:
> https://prestodb.io/docs/current/functions/json.html#json-functions
[jira] [Commented] (SPARK-40152) Codegen compilation error when using split_part
     [ https://issues.apache.org/jira/browse/SPARK-40152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583498#comment-17583498 ]

Apache Spark commented on SPARK-40152:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/37626

> Codegen compilation error when using split_part
> -----------------------------------------------
>
>                 Key: SPARK-40152
>                 URL: https://issues.apache.org/jira/browse/SPARK-40152
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Bruce Robbins
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1
>
>
> The following query throws an error:
> {noformat}
> create or replace temp view v1 as
> select * from values
> ('11.12.13', '.', 3)
> as v1(col1, col2, col3);
> cache table v1;
> SELECT split_part(col1, col2, col3)
> from v1;
> {noformat}
> The error is:
> {noformat}
> 22/08/19 14:25:14 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 42, Column 1: Expression "project_isNull_0 = false" is not a type
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 42, Column 1: Expression "project_isNull_0 = false" is not a type
>       at org.codehaus.janino.Java$Atom.toTypeOrCompileException(Java.java:3934)
>       at org.codehaus.janino.Parser.parseBlockStatement(Parser.java:1887)
>       at org.codehaus.janino.Parser.parseBlockStatements(Parser.java:1811)
>       at org.codehaus.janino.Parser.parseBlock(Parser.java:1792)
>       at
> {noformat}
> In the end, {{split_part}} does successfully execute, although in interpreted
> mode.
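
For readers unfamiliar with `split_part`, the value the query should produce can be sketched in plain Python. This is not Spark's implementation; it follows my reading of the documented semantics (1-based part numbers, negative numbers counted from the end, an empty string when the part is out of range, an error for part number 0, NULL in gives NULL out, and no splitting on an empty delimiter). Treat those details as assumptions of the sketch.

```python
def split_part(s, delimiter, part_num):
    """Sketch of split_part semantics; None models SQL NULL."""
    if s is None or delimiter is None or part_num is None:
        return None
    if part_num == 0:
        raise ValueError("part_num must not be 0")
    # An empty delimiter leaves the string unsplit.
    parts = [s] if delimiter == "" else s.split(delimiter)
    # Positive part numbers are 1-based; negative ones count from the end.
    index = part_num - 1 if part_num > 0 else len(parts) + part_num
    if 0 <= index < len(parts):
        return parts[index]
    return ""  # out-of-range part numbers give an empty string


# Same arguments as the row in the report: ('11.12.13', '.', 3)
print(split_part("11.12.13", ".", 3))
```

The failing case in the report passes the delimiter and part number as columns rather than literals; the sketch ignores that distinction, which matters only to code generation.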
[jira] [Commented] (SPARK-40152) Codegen compilation error when using split_part
     [ https://issues.apache.org/jira/browse/SPARK-40152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583500#comment-17583500 ]

Apache Spark commented on SPARK-40152:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/37626

> Codegen compilation error when using split_part
> -----------------------------------------------
>
>                 Key: SPARK-40152
>                 URL: https://issues.apache.org/jira/browse/SPARK-40152
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Bruce Robbins
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 3.4.0, 3.3.1
>
>
> The following query throws an error:
> {noformat}
> create or replace temp view v1 as
> select * from values
> ('11.12.13', '.', 3)
> as v1(col1, col2, col3);
> cache table v1;
> SELECT split_part(col1, col2, col3)
> from v1;
> {noformat}
> The error is:
> {noformat}
> 22/08/19 14:25:14 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 42, Column 1: Expression "project_isNull_0 = false" is not a type
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 42, Column 1: Expression "project_isNull_0 = false" is not a type
>       at org.codehaus.janino.Java$Atom.toTypeOrCompileException(Java.java:3934)
>       at org.codehaus.janino.Parser.parseBlockStatement(Parser.java:1887)
>       at org.codehaus.janino.Parser.parseBlockStatements(Parser.java:1811)
>       at org.codehaus.janino.Parser.parseBlock(Parser.java:1792)
>       at
> {noformat}
> In the end, {{split_part}} does successfully execute, although in interpreted
> mode.
[jira] [Created] (SPARK-40191) Make pyspark.resource examples self-contained
Hyukjin Kwon created SPARK-40191:
------------------------------------

             Summary: Make pyspark.resource examples self-contained
                 Key: SPARK-40191
                 URL: https://issues.apache.org/jira/browse/SPARK-40191
             Project: Spark
          Issue Type: Sub-task
          Components: Documentation, PySpark, Spark Core
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Assigned] (SPARK-40191) Make pyspark.resource examples self-contained
     [ https://issues.apache.org/jira/browse/SPARK-40191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40191:
------------------------------------
    Assignee:     (was: Apache Spark)

> Make pyspark.resource examples self-contained
> ---------------------------------------------
>
>                 Key: SPARK-40191
>                 URL: https://issues.apache.org/jira/browse/SPARK-40191
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
[jira] [Commented] (SPARK-40191) Make pyspark.resource examples self-contained
     [ https://issues.apache.org/jira/browse/SPARK-40191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583510#comment-17583510 ]

Apache Spark commented on SPARK-40191:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/37627

> Make pyspark.resource examples self-contained
> ---------------------------------------------
>
>                 Key: SPARK-40191
>                 URL: https://issues.apache.org/jira/browse/SPARK-40191
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
[jira] [Assigned] (SPARK-40191) Make pyspark.resource examples self-contained
     [ https://issues.apache.org/jira/browse/SPARK-40191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40191:
------------------------------------
    Assignee: Apache Spark

> Make pyspark.resource examples self-contained
> ---------------------------------------------
>
>                 Key: SPARK-40191
>                 URL: https://issues.apache.org/jira/browse/SPARK-40191
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Documentation, PySpark, Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Major
[jira] [Created] (SPARK-40192) Remove redundant groupby
deshanxiao created SPARK-40192:
----------------------------------

             Summary: Remove redundant groupby
                 Key: SPARK-40192
                 URL: https://issues.apache.org/jira/browse/SPARK-40192
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: deshanxiao
[jira] [Commented] (SPARK-40192) Remove redundant groupby
     [ https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583532#comment-17583532 ]

Apache Spark commented on SPARK-40192:
--------------------------------------

User 'deshanxiao' has created a pull request for this issue:
https://github.com/apache/spark/pull/37628

> Remove redundant groupby
> ------------------------
>
>                 Key: SPARK-40192
>                 URL: https://issues.apache.org/jira/browse/SPARK-40192
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: deshanxiao
>            Priority: Minor
[jira] [Assigned] (SPARK-40192) Remove redundant groupby
     [ https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40192:
------------------------------------
    Assignee:     (was: Apache Spark)

> Remove redundant groupby
> ------------------------
>
>                 Key: SPARK-40192
>                 URL: https://issues.apache.org/jira/browse/SPARK-40192
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: deshanxiao
>            Priority: Minor
[jira] [Assigned] (SPARK-40192) Remove redundant groupby
[ https://issues.apache.org/jira/browse/SPARK-40192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40192: Assignee: Apache Spark > Remove redundant groupby > > > Key: SPARK-40192 > URL: https://issues.apache.org/jira/browse/SPARK-40192 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: deshanxiao >Assignee: Apache Spark >Priority: Minor
[jira] [Commented] (SPARK-40160) Make pyspark.broadcast examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583556#comment-17583556 ] Apache Spark commented on SPARK-40160: -- User 'dcoliversun' has created a pull request for this issue: https://github.com/apache/spark/pull/37629 > Make pyspark.broadcast examples self-contained > -- > > Key: SPARK-40160 > URL: https://issues.apache.org/jira/browse/SPARK-40160 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major
[jira] [Assigned] (SPARK-40160) Make pyspark.broadcast examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40160: Assignee: (was: Apache Spark) > Make pyspark.broadcast examples self-contained > -- > > Key: SPARK-40160 > URL: https://issues.apache.org/jira/browse/SPARK-40160 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Priority: Major
[jira] [Assigned] (SPARK-40160) Make pyspark.broadcast examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40160: Assignee: Apache Spark > Make pyspark.broadcast examples self-contained > -- > > Key: SPARK-40160 > URL: https://issues.apache.org/jira/browse/SPARK-40160 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Qian Sun >Assignee: Apache Spark >Priority: Major
[jira] [Created] (SPARK-40193) Merge different filters when merging subquery plans
Peter Toth created SPARK-40193: -- Summary: Merge different filters when merging subquery plans Key: SPARK-40193 URL: https://issues.apache.org/jira/browse/SPARK-40193 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Peter Toth We could improve SPARK-34079 to be able to merge different filters.
[jira] [Updated] (SPARK-40193) Merge subquery plans with different filters
[ https://issues.apache.org/jira/browse/SPARK-40193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Toth updated SPARK-40193: --- Summary: Merge subquery plans with different filters (was: Merge different filters when merging subquery plans) > Merge subquery plans with different filters > --- > > Key: SPARK-40193 > URL: https://issues.apache.org/jira/browse/SPARK-40193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Priority: Major > > We could improve SPARK-34079 to be able to merge different filters.
[jira] [Commented] (SPARK-40193) Merge subquery plans with different filters
[ https://issues.apache.org/jira/browse/SPARK-40193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583658#comment-17583658 ] Apache Spark commented on SPARK-40193: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/37630 > Merge subquery plans with different filters > --- > > Key: SPARK-40193 > URL: https://issues.apache.org/jira/browse/SPARK-40193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Priority: Major > > We could improve SPARK-34079 to be able to merge different filters.
[jira] [Assigned] (SPARK-40193) Merge subquery plans with different filters
[ https://issues.apache.org/jira/browse/SPARK-40193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40193: Assignee: (was: Apache Spark) > Merge subquery plans with different filters > --- > > Key: SPARK-40193 > URL: https://issues.apache.org/jira/browse/SPARK-40193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Priority: Major > > We could improve SPARK-34079 to be able to merge different filters.
[jira] [Assigned] (SPARK-40193) Merge subquery plans with different filters
[ https://issues.apache.org/jira/browse/SPARK-40193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40193: Assignee: Apache Spark > Merge subquery plans with different filters > --- > > Key: SPARK-40193 > URL: https://issues.apache.org/jira/browse/SPARK-40193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Major > > We could improve SPARK-34079 to be able to merge different filters.
[jira] [Resolved] (SPARK-40183) Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion
[ https://issues.apache.org/jira/browse/SPARK-40183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-40183. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37620 [https://github.com/apache/spark/pull/37620] > Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion > - > > Key: SPARK-40183 > URL: https://issues.apache.org/jira/browse/SPARK-40183 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > > Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal > conversion, instead of the confusing error class > `CANNOT_CHANGE_DECIMAL_PRECISION`. > Also, use `decimal.toPlainString` instead of `decimal.toDebugString` in the > error message.
[jira] [Assigned] (SPARK-40089) Sorting of at least Decimal(20, 2) fails for some values near the max.
[ https://issues.apache.org/jira/browse/SPARK-40089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40089: - Assignee: Robert Joseph Evans (was: Apache Spark) > Sorting of at least Decimal(20, 2) fails for some values near the max. > -- > > Key: SPARK-40089 > URL: https://issues.apache.org/jira/browse/SPARK-40089 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0, 3.4.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Major > Fix For: 3.1.4, 3.4.0, 3.3.1, 3.2.3 > > Attachments: input.parquet > > > I have been doing some testing with Decimal values for the RAPIDS Accelerator > for Apache Spark. I have been trying to add in new corner cases and when I > tried to enable the maximum supported value for a sort I started to get > failures. On closer inspection it looks like the CPU is sorting things > incorrectly. Specifically anything that is "99.50" or above > is placed as a chunk in the wrong location in the outputs. > In local mode with 12 tasks. > {code:java} > spark.read.parquet("input.parquet").orderBy(col("a")).collect.foreach(System.err.println) > {code} > > Here you will notice that the last entry printed is > {{[99.49]}}, and {{[99.99]}} is near the top > near {{[-99.99]}}
[jira] [Updated] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40172: -- Fix Version/s: 3.3.1 3.2.3 > Temporarily disable flaky test cases in ImageFileFormatSuite > > > Key: SPARK-40172 > URL: https://issues.apache.org/jira/browse/SPARK-40172 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.4.0, 3.3.1, 3.2.3 > > > 3 test cases in ImageFileFormatSuite become flaky in the GitHub action tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > Before they are fixed(https://issues.apache.org/jira/browse/SPARK-40171), I > suggest disabling them in OSS.
[jira] [Updated] (SPARK-40171) Fix flaky tests in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40171: -- Affects Version/s: 3.3.1 3.2.3 > Fix flaky tests in ImageFileFormatSuite > --- > > Key: SPARK-40171 > URL: https://issues.apache.org/jira/browse/SPARK-40171 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.4.0, 3.3.1, 3.2.3 >Reporter: Gengliang Wang >Priority: Major > > There are 3 test cases that become flaky in the GitHub action tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > We should fix them.
[jira] [Updated] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40172: -- Affects Version/s: 3.2.2 3.3.0 > Temporarily disable flaky test cases in ImageFileFormatSuite > > > Key: SPARK-40172 > URL: https://issues.apache.org/jira/browse/SPARK-40172 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 3.3.0, 3.2.2, 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.4.0, 3.3.1, 3.2.3 > > > 3 test cases in ImageFileFormatSuite become flaky in the GitHub action tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > Before they are fixed(https://issues.apache.org/jira/browse/SPARK-40171), I > suggest disabling them in OSS.
[jira] [Updated] (SPARK-40171) Fix flaky tests in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40171: -- Affects Version/s: 3.2.2 3.3.0 (was: 3.3.1) (was: 3.2.3) > Fix flaky tests in ImageFileFormatSuite > --- > > Key: SPARK-40171 > URL: https://issues.apache.org/jira/browse/SPARK-40171 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0, 3.2.2, 3.4.0 >Reporter: Gengliang Wang >Priority: Major > > There are 3 test cases that become flaky in the GitHub action tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > We should fix them.
[jira] [Commented] (SPARK-34631) Caught Hive MetaException when query by partition (partition col start with ‘$’)
[ https://issues.apache.org/jira/browse/SPARK-34631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583785#comment-17583785 ]

Brendan Morin commented on SPARK-34631:
---

For other users who suddenly encounter this issue: make sure you check all partitions, not just the partition you want to query. I encountered this error and can reproduce it in the following scenario:

Trying to query an external Hive table that is partitioned on date, e.g.:
{code:java}
df = spark.sql("select a, b from my_db.my_table where date = '2022-08-10'")
df.show()
>>> java.lang.RuntimeException: Caught Hive MetaException attempting to get
>>> partition...{code}
Confirm the data type of the columns:
{code:java}
spark.sql("select * from my_db.my_table").printSchema()
>>> root
>>> |-- a: string (nullable = true)
>>> |-- b: string (nullable = true)
>>> |-- date: date (nullable = true){code}
Check the partitions:
{code:java}
spark.sql("show partitions my_db.my_table").show(20, False)
>>> +-------------------+
>>> |partition          |
>>> +-------------------+
>>> |date=2022-08-07    |
>>> |date=2022-08-08    |
>>> |date=2022-08-08_tmp| # Note the malformed partition
>>> |date=2022-08-09    |
>>> |date=2022-08-10    |
>>> |date=2022-08-11    |
>>> |date=2022-08-12    |
>>> +-------------------+{code}
This was the problem in my case. There was a date partition (note: the problem partition was not the one I was querying) that was malformed in the HDFS directory where the Hive external table data was located. Its name could not be parsed into the partition column's data type. Removing this partition from HDFS, then dropping and recreating the table with MSCK repair, solved the issue.

For additional context, my_db.my_table was managed as an external table. Table updates were done by writing parquet files as partitions, and then running drop table, create table, and MSCK repair on the table. For some reason, this write/update process did not fail because of the malformed partition, so additional partitions could still be added. The problem only manifested on read.

I think the root cause in my case is actually an overly broad catch by Spark. The error handling logic could be refined to identify this root cause, or to tell users that the issue may be a malformed partition name that does not parse into the expected data type (date in this case).

> Caught Hive MetaException when query by partition (partition col start with ‘$’)
>
> Key: SPARK-34631
> URL: https://issues.apache.org/jira/browse/SPARK-34631
> Project: Spark
> Issue Type: Bug
> Components: DStreams, Java API
> Affects Versions: 2.4.4
> Reporter: zhouyuan
> Priority: Critical
>
> create a table, set location as parquet, do msck repair table to get the data.
> But when query with partition column, got some errors (adding backtick would not help)
> {code:java}
> // code placeholder
> {code}
> select count from some_table where `$partition_date` = '2015-01-01'
>
> {panel:title=error:}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance.
Please report > a bug: https://issues.apache.org/jira/browse/SPARK > at > org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:677) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:677) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1221) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1214) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1214) >
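The partition check described in the comment above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name `find_malformed_date_partitions` is hypothetical, the partition column is assumed to be called `date`, and values are assumed to follow the `YYYY-MM-DD` layout shown in the `show partitions` output. It is not part of Spark or Hive.

```python
from datetime import datetime


def find_malformed_date_partitions(partitions, column="date", fmt="%Y-%m-%d"):
    """Return the partition specs whose value for `column` does not parse with `fmt`.

    Hypothetical helper for illustration; `partitions` is a list of strings
    such as "date=2022-08-07", as printed by `show partitions`.
    """
    prefix = column + "="
    malformed = []
    for spec in partitions:
        if not spec.startswith(prefix):
            continue  # this sketch ignores specs for other partition columns
        value = spec[len(prefix):]
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            # e.g. "2022-08-08_tmp" leaves unconverted data after the date
            malformed.append(spec)
    return malformed


# Partition names copied from the comment's `show partitions` output.
partitions = [
    "date=2022-08-07",
    "date=2022-08-08",
    "date=2022-08-08_tmp",
    "date=2022-08-09",
    "date=2022-08-10",
    "date=2022-08-11",
    "date=2022-08-12",
]

print(find_malformed_date_partitions(partitions))
```

In a live session the input list would presumably come from the `partition` column returned by `spark.sql("show partitions my_db.my_table")`; that wiring is left out here so the sketch runs without a cluster.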
[jira] [Comment Edited] (SPARK-34631) Caught Hive MetaException when query by partition (partition col start with ‘$’)
[ https://issues.apache.org/jira/browse/SPARK-34631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583785#comment-17583785 ]

Brendan Morin edited comment on SPARK-34631 at 8/23/22 6:18 PM:

For other users who suddenly encounter this issue: make sure you check all partitions, not just the partition you want to query. I encountered this error and can reproduce it in the following scenario:

Trying to query an external Hive table that is partitioned on date, e.g.:
{code:java}
df = spark.sql("select a, b from my_db.my_table where date = '2022-08-10'")
df.show()
>>> java.lang.RuntimeException: Caught Hive MetaException attempting to get
>>> partition...
>>> Caused by: MetaException(message:Filtering is supported only on partition
>>> keys of type string){code}
Confirm the data type of the columns:
{code:java}
spark.sql("select * from my_db.my_table").printSchema()
>>> root
>>> |-- a: string (nullable = true)
>>> |-- b: string (nullable = true)
>>> |-- date: date (nullable = true){code}
Check the partitions:
{code:java}
spark.sql("show partitions my_db.my_table").show(20, False)
>>> +-------------------+
>>> |partition          |
>>> +-------------------+
>>> |date=2022-08-07    |
>>> |date=2022-08-08    |
>>> |date=2022-08-08_tmp| # Note the malformed partition
>>> |date=2022-08-09    |
>>> |date=2022-08-10    |
>>> |date=2022-08-11    |
>>> |date=2022-08-12    |
>>> +-------------------+{code}
This was the problem in my case. There was a date partition (note: the problem partition was not the one I was querying) that was malformed in the HDFS directory where the Hive external table data was located. Its name could not be parsed into the partition column's data type. Removing this partition from HDFS, then dropping and recreating the table with MSCK repair, solved the issue.

For additional context, my_db.my_table was managed as an external table. Table updates were done by writing parquet files as partitions, and then running drop table, create table, and MSCK repair on the table.
For some reason, this write/update process did not fail due to the malformed partition, so additional partitions were able to continue to be added. The problem only manifested on read. I think that the root cause in my case is actually an overly broad catch by spark, and I think the error handling logic could be refined to identify this root cause, or clue users in that the issue may be a malformed partition name that does not parse correctly into the expected data type (date in this case). The specific error: {code:java} Caused by: MetaException(message:Filtering is supported only on partition keys of type string){code} is a bit of a red herring, as this is not true, and searching this error will lead you down a rabbit hole of incorrect root cause/unrelated issues.
[jira] [Comment Edited] (SPARK-34631) Caught Hive MetaException when query by partition (partition col start with ‘$’)
[ https://issues.apache.org/jira/browse/SPARK-34631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583785#comment-17583785 ]

Brendan Morin edited comment on SPARK-34631 at 8/23/22 6:52 PM:

For other users who suddenly encounter this issue: make sure you check all partitions, not just the partition you want to query. I encountered this error and can reproduce it in the following scenario:

Trying to query an external Hive table that is partitioned on date, e.g.:
{code:java}
df = spark.sql("select a, b from my_db.my_table where date = '2022-08-10'")
df.show()
>>> java.lang.RuntimeException: Caught Hive MetaException attempting to get
>>> partition...
>>> Caused by: MetaException(message:Filtering is supported only on partition
>>> keys of type string){code}
Confirm the data type of the columns:
{code:java}
spark.sql("select * from my_db.my_table").printSchema()
>>> root
>>> |-- a: string (nullable = true)
>>> |-- b: string (nullable = true)
>>> |-- date: date (nullable = true){code}
Check the partitions:
{code:java}
spark.sql("show partitions my_db.my_table").show(20, False)
>>> +-------------------+
>>> |partition          |
>>> +-------------------+
>>> |date=2022-08-07    |
>>> |date=2022-08-08    |
>>> |date=2022-08-08_tmp| # Note the malformed partition
>>> |date=2022-08-09    |
>>> |date=2022-08-10    |
>>> |date=2022-08-11    |
>>> |date=2022-08-12    |
>>> +-------------------+{code}
This was the problem in my case. There was a date partition (note: the problem partition was not the one I was querying for) that was malformed in the HDFS directory where the Hive external table data was located. Its name could not be parsed into the partition column's data type. Removing this partition from HDFS, then dropping and recreating the table with MSCK repair, solved the issue.

For additional context, my_db.my_table was managed as an external table. Table updates were done by writing parquet files as partitions, and then running drop table, create table, and MSCK repair on the table.
For some reason, this write/update process did not fail due to the malformed partition, so additional partitions were able to continue to be added. The problem only manifested on read. I think that the root cause in my case is actually an overly broad catch by spark, and I think the error handling logic could be refined to identify this root cause, or clue users in that the issue may be a malformed partition name that does not parse correctly into the expected data type (date in this case). The specific error: {code:java} Caused by: MetaException(message:Filtering is supported only on partition keys of type string){code} is a bit of a red herring, as this is not true, and searching this error will lead you down a rabbit hole of incorrect root cause/unrelated issues.
[jira] [Created] (SPARK-40194) SPLIT function on empty regex should truncate trailing empty string.
Vitalii Li created SPARK-40194: -- Summary: SPLIT function on empty regex should truncate trailing empty string. Key: SPARK-40194 URL: https://issues.apache.org/jira/browse/SPARK-40194 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Vitalii Li E.g. `select split('hello', '')` should convert to `['h', 'e', 'l', 'l', 'o']` instead of `['h', 'e', 'l', 'l', 'o', '']`. Requires explicit `limit` parameter to preserve trailing empty string.
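The behaviour requested in this ticket can be pictured with a toy Python model. It is a sketch under stated assumptions, not Spark's implementation: the function name `split_on_empty_pattern` is hypothetical, only the empty-pattern case from the ticket is modelled, any explicit `limit` is treated as "preserve the trailing empty string", and positive limits and empty inputs are not handled.

```python
def split_on_empty_pattern(text, limit=None):
    """Toy model of the SPLIT behaviour described in SPARK-40194.

    Hypothetical helper: it mimics only the single case in the ticket
    (an empty pattern) and does not call or reproduce Spark's split().
    """
    # Result the ticket reports today: every character, then a trailing "".
    parts = list(text) + [""]
    if limit is None:
        # Proposed default: truncate the trailing empty string.
        parts.pop()
    return parts


print(split_on_empty_pattern("hello"))
print(split_on_empty_pattern("hello", limit=-1))
```

The first call models the proposed default, the second the case where an explicit `limit` keeps the trailing empty string.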
[jira] [Assigned] (SPARK-40194) SPLIT function on empty regex should truncate trailing empty string.
[ https://issues.apache.org/jira/browse/SPARK-40194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40194: Assignee: (was: Apache Spark) > SPLIT function on empty regex should truncate trailing empty string. > > > Key: SPARK-40194 > URL: https://issues.apache.org/jira/browse/SPARK-40194 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vitalii Li >Priority: Major > > E.g. `select split('hello', '')` should convert to `['h', 'e', 'l', 'l', > 'o']` instead of `['h', 'e', 'l', 'l', 'o', '']`. Requires explicit `limit` > parameter to preserve trailing empty string.
[jira] [Commented] (SPARK-40194) SPLIT function on empty regex should truncate trailing empty string.
[ https://issues.apache.org/jira/browse/SPARK-40194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583877#comment-17583877 ] Apache Spark commented on SPARK-40194: -- User 'vitaliili-db' has created a pull request for this issue: https://github.com/apache/spark/pull/37631 > SPLIT function on empty regex should truncate trailing empty string. > > > Key: SPARK-40194 > URL: https://issues.apache.org/jira/browse/SPARK-40194 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vitalii Li >Priority: Major > > E.g. `select split('hello', '')` should convert to `['h', 'e', 'l', 'l', > 'o']` instead of `['h', 'e', 'l', 'l', 'o', '']`. Requires explicit `limit` > parameter to preserve trailing empty string.
[jira] [Assigned] (SPARK-40194) SPLIT function on empty regex should truncate trailing empty string.
[ https://issues.apache.org/jira/browse/SPARK-40194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40194: Assignee: Apache Spark > SPLIT function on empty regex should truncate trailing empty string. > > > Key: SPARK-40194 > URL: https://issues.apache.org/jira/browse/SPARK-40194 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Vitalii Li >Assignee: Apache Spark >Priority: Major > > E.g. `select split('hello', '')` should convert to `['h', 'e', 'l', 'l', > 'o']` instead of `['h', 'e', 'l', 'l', 'o', '']`. Requires explicit `limit` > parameter to preserve trailing empty string.
[jira] [Created] (SPARK-40195) Add PrunedScanWithAQESuite
Kazuyuki Tanimura created SPARK-40195: - Summary: Add PrunedScanWithAQESuite Key: SPARK-40195 URL: https://issues.apache.org/jira/browse/SPARK-40195 Project: Spark Issue Type: Test Components: SQL, Tests Affects Versions: 3.4.0 Reporter: Kazuyuki Tanimura Currently `PrunedScanSuite` assumes that AQE is never applied. We should also test with AQE forcibly applied.
[jira] [Created] (SPARK-40196) Consolidate `lit` function with NumPy input in sql and pandas module
Xinrong Meng created SPARK-40196: Summary: Consolidate `lit` function with NumPy input in sql and pandas module Key: SPARK-40196 URL: https://issues.apache.org/jira/browse/SPARK-40196 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng The `lit` function with NumPy input has different implementations in the sql and pandas modules, so sql returns a less precise result than pandas. We should make their results consistent; the more precise, the better.
[jira] [Updated] (SPARK-40196) Consolidate `lit` function with NumPy input in sql and pandas module
[ https://issues.apache.org/jira/browse/SPARK-40196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-40196: Description: Per [https://github.com/apache/spark/pull/37560#discussion_r952882996], the `lit` function with NumPy input has different implementations in the sql and pandas modules, so sql returns a less precise result than pandas. We should make their results consistent; the more precise, the better. was: The `lit` function with NumPy input has different implementations in the sql and pandas modules, so sql returns a less precise result than pandas. We should make their results consistent; the more precise, the better. > Consolidate `lit` function with NumPy input in sql and pandas module > Key: SPARK-40196 > URL: https://issues.apache.org/jira/browse/SPARK-40196 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Xinrong Meng > Priority: Major > > Per [https://github.com/apache/spark/pull/37560#discussion_r952882996], the `lit` function with NumPy input has different implementations in the sql and pandas modules, so sql returns a less precise result than pandas. > We should make their results consistent; the more precise, the better.
[jira] [Resolved] (SPARK-40191) Make pyspark.resource examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40191. Fix Version/s: 3.4.0 Assignee: Hyukjin Kwon (was: Apache Spark) Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/37627 > Make pyspark.resource examples self-contained > Key: SPARK-40191 > URL: https://issues.apache.org/jira/browse/SPARK-40191 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark, Spark Core > Affects Versions: 3.4.0 > Reporter: Hyukjin Kwon > Assignee: Hyukjin Kwon > Priority: Major > Fix For: 3.4.0
[jira] [Created] (SPARK-40197) Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR
Vitalii Li created SPARK-40197: Summary: Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR Key: SPARK-40197 URL: https://issues.apache.org/jira/browse/SPARK-40197 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Vitalii Li Output the subquery context instead of a query plan.
[jira] [Assigned] (SPARK-40197) Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR
[ https://issues.apache.org/jira/browse/SPARK-40197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40197: Assignee: (was: Apache Spark) > Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR > Key: SPARK-40197 > URL: https://issues.apache.org/jira/browse/SPARK-40197 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.4.0 > Reporter: Vitalii Li > Priority: Major > > Output the subquery context instead of a query plan.
[jira] [Commented] (SPARK-40197) Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR
[ https://issues.apache.org/jira/browse/SPARK-40197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583901#comment-17583901 ] Apache Spark commented on SPARK-40197: User 'vitaliili-db' has created a pull request for this issue: https://github.com/apache/spark/pull/37632 > Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR > Key: SPARK-40197 > URL: https://issues.apache.org/jira/browse/SPARK-40197 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.4.0 > Reporter: Vitalii Li > Priority: Major > > Output the subquery context instead of a query plan.
[jira] [Assigned] (SPARK-40197) Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR
[ https://issues.apache.org/jira/browse/SPARK-40197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40197: Assignee: Apache Spark > Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR > Key: SPARK-40197 > URL: https://issues.apache.org/jira/browse/SPARK-40197 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.4.0 > Reporter: Vitalii Li > Assignee: Apache Spark > Priority: Major > > Output the subquery context instead of a query plan.
[jira] [Created] (SPARK-40198) Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default
Dongjoon Hyun created SPARK-40198: Summary: Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default Key: SPARK-40198 URL: https://issues.apache.org/jira/browse/SPARK-40198 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-40198) Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-40198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583903#comment-17583903 ] Apache Spark commented on SPARK-40198: User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/37633 > Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default > Key: SPARK-40198 > URL: https://issues.apache.org/jira/browse/SPARK-40198 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.4.0 > Reporter: Dongjoon Hyun > Priority: Major
[jira] [Assigned] (SPARK-40198) Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-40198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40198: Assignee: Apache Spark > Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default > Key: SPARK-40198 > URL: https://issues.apache.org/jira/browse/SPARK-40198 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.4.0 > Reporter: Dongjoon Hyun > Assignee: Apache Spark > Priority: Major
[jira] [Assigned] (SPARK-40198) Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-40198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40198: Assignee: (was: Apache Spark) > Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default > Key: SPARK-40198 > URL: https://issues.apache.org/jira/browse/SPARK-40198 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 3.4.0 > Reporter: Dongjoon Hyun > Priority: Major
[jira] [Resolved] (SPARK-40078) Make pyspark.sql.column examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40078. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37521 [https://github.com/apache/spark/pull/37521] > Make pyspark.sql.column examples self-contained > Key: SPARK-40078 > URL: https://issues.apache.org/jira/browse/SPARK-40078 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark > Affects Versions: 3.4.0 > Reporter: Qian Sun > Assignee: Qian Sun > Priority: Major > Fix For: 3.4.0
[jira] [Assigned] (SPARK-40078) Make pyspark.sql.column examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40078: Assignee: Qian Sun > Make pyspark.sql.column examples self-contained > Key: SPARK-40078 > URL: https://issues.apache.org/jira/browse/SPARK-40078 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark > Affects Versions: 3.4.0 > Reporter: Qian Sun > Assignee: Qian Sun > Priority: Major
[jira] [Assigned] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+
[ https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39150: Assignee: (was: Yikun Jiang) > Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+ > Key: SPARK-39150 > URL: https://issues.apache.org/jira/browse/SPARK-39150 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Yikun Jiang > Priority: Major > > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333] > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265] > all doctest in https://github.com/apache/spark/pull/36712
[jira] [Updated] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+
[ https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39150: Fix Version/s: (was: 3.4.0) > Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+ > Key: SPARK-39150 > URL: https://issues.apache.org/jira/browse/SPARK-39150 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Yikun Jiang > Assignee: Yikun Jiang > Priority: Major > > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333] > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265] > all doctest in https://github.com/apache/spark/pull/36712
[jira] [Reopened] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+
[ https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-39150: > Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+ > Key: SPARK-39150 > URL: https://issues.apache.org/jira/browse/SPARK-39150 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Yikun Jiang > Assignee: Yikun Jiang > Priority: Major > Fix For: 3.4.0 > > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333] > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265] > all doctest in https://github.com/apache/spark/pull/36712
[jira] [Resolved] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+
[ https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39150. Resolution: Not A Problem Reverted at https://github.com/apache/spark/commit/d32a67f92cfcc7c67f44e682d4c3612d60ba1b3a > Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+ > Key: SPARK-39150 > URL: https://issues.apache.org/jira/browse/SPARK-39150 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Yikun Jiang > Assignee: Yikun Jiang > Priority: Major > Fix For: 3.4.0 > > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333] > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265] > all doctest in https://github.com/apache/spark/pull/36712
[jira] [Updated] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests
[ https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40124: Fix Version/s: 3.3.1 > Update TPCDS v1.4 q32 for Plan Stability tests > Key: SPARK-40124 > URL: https://issues.apache.org/jira/browse/SPARK-40124 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.3.0 > Reporter: Kapil Singh > Assignee: Kapil Singh > Priority: Major > Fix For: 3.4.0, 3.3.1
[jira] [Created] (SPARK-40199) Spark throws NPE without useful message when NULL value appears in non-null schema
Erik Krogen created SPARK-40199:
Summary: Spark throws NPE without useful message when NULL value appears in non-null schema
Key: SPARK-40199
URL: https://issues.apache.org/jira/browse/SPARK-40199
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.2.2
Reporter: Erik Krogen

Currently in some cases, if Spark encounters a NULL value where the schema indicates that the column/field should be non-null, it will throw a {{NullPointerException}} with no message and thus no way to debug further. This can happen, for example, if you have a UDF which is erroneously marked as {{asNonNullable()}}, or if you read input data where the actual values don't match the schema (which could happen e.g. with Avro if the reader provides a schema declaring non-null although the data was written with null values).

As an example of how to reproduce:
{code:scala}
val badUDF = spark.udf.register[String, Int]("bad_udf", in => null).asNonNullable()
Seq(1, 2).toDF("c1").select(badUDF($"c1")).collect()
{code}
This throws an exception like:
{code}
Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1) (xx executor driver): java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
As a user, it is very confusing -- it looks like there is a bug in Spark. We have had many users report such problems, and though we can guide them to a schema-data mismatch, there is no indication of what field might contain the bad values, so a laborious data exploration process is required to find and remedy it.

We should provide a better error message in such cases.
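To illustrate the kind of message SPARK-40199 asks for, here is a minimal plain-Python sketch of a nullability check that names the offending field. It is not how Spark performs the check or how the linked pull request implements it; the schema representation, the `check_nullability` helper, and the message wording are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema representation: (field name, nullable) pairs.
Schema = List[Tuple[str, bool]]


def check_nullability(row: Dict[str, Any], schema: Schema) -> None:
    """Raise an error that names the first non-nullable field holding None."""
    for name, nullable in schema:
        if row.get(name) is None and not nullable:
            raise ValueError(
                f"NULL value found in field '{name}', "
                "which the schema declares non-nullable"
            )


# Mirrors the reproduction above: the UDF output is declared non-nullable.
schema = [("c1", False), ("bad_udf(c1)", False)]
good_row = {"c1": 1, "bad_udf(c1)": "x"}
bad_row = {"c1": 2, "bad_udf(c1)": None}

check_nullability(good_row, schema)  # passes silently

message = ""
try:
    check_nullability(bad_row, schema)
except ValueError as err:
    message = str(err)
print(message)
```

Naming the field in the exception is the point: it removes the data exploration step the report describes.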
[jira] [Closed] (SPARK-39150) Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+
[ https://issues.apache.org/jira/browse/SPARK-39150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-39150. > Remove `# doctest: +SKIP` of SPARK-38947/SPARK-39326 when infra dump pandas to 1.4+ > Key: SPARK-39150 > URL: https://issues.apache.org/jira/browse/SPARK-39150 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Yikun Jiang > Priority: Major > > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2333] > [https://github.com/apache/spark/blob/fe85d7912f86c3e337aa93b23bfa7e7e01c0a32e/python/pyspark/pandas/groupby.py#L2265] > all doctest in https://github.com/apache/spark/pull/36712
[jira] [Assigned] (SPARK-40199) Spark throws NPE without useful message when NULL value appears in non-null schema
[ https://issues.apache.org/jira/browse/SPARK-40199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40199: Assignee: Apache Spark > Spark throws NPE without useful message when NULL value appears in non-null schema > Key: SPARK-40199 > URL: https://issues.apache.org/jira/browse/SPARK-40199 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.2 > Reporter: Erik Krogen > Assignee: Apache Spark > Priority: Major
[jira] [Assigned] (SPARK-40199) Spark throws NPE without useful message when NULL value appears in non-null schema
[ https://issues.apache.org/jira/browse/SPARK-40199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40199: Assignee: (was: Apache Spark) > Spark throws NPE without useful message when NULL value appears in non-null schema > Key: SPARK-40199 > URL: https://issues.apache.org/jira/browse/SPARK-40199 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.2 > Reporter: Erik Krogen > Priority: Major
[jira] [Commented] (SPARK-40199) Spark throws NPE without useful message when NULL value appears in non-null schema
[ https://issues.apache.org/jira/browse/SPARK-40199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583916#comment-17583916 ] Apache Spark commented on SPARK-40199: User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/37634 > Spark throws NPE without useful message when NULL value appears in non-null schema > Key: SPARK-40199 > URL: https://issues.apache.org/jira/browse/SPARK-40199 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.2 > Reporter: Erik Krogen > Priority: Major
[jira] [Commented] (SPARK-40131) Support NumPy ndarray in built-in functions
[ https://issues.apache.org/jira/browse/SPARK-40131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583917#comment-17583917 ] Apache Spark commented on SPARK-40131: User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/37635 > Support NumPy ndarray in built-in functions > Key: SPARK-40131 > URL: https://issues.apache.org/jira/browse/SPARK-40131 > Project: Spark > Issue Type: Sub-task > Components: PySpark > Affects Versions: 3.4.0 > Reporter: Xinrong Meng > Priority: Major > > Per [https://github.com/apache/spark/pull/37560#discussion_r948572473] we want to support NumPy ndarray in built-in functions
[jira] [Assigned] (SPARK-40131) Support NumPy ndarray in built-in functions
[ https://issues.apache.org/jira/browse/SPARK-40131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40131: Assignee: (was: Apache Spark) > Support NumPy ndarray in built-in functions > --- > > Key: SPARK-40131 > URL: https://issues.apache.org/jira/browse/SPARK-40131 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Per [https://github.com/apache/spark/pull/37560#discussion_r948572473] > we want to support NumPy ndarray in built-in functions
[jira] [Created] (SPARK-40200) unpersist cascades with Kryo, MEMORY_AND_DISK_SER and monotonically_increasing_id
Calvin Pietersen created SPARK-40200:

Summary: unpersist cascades with Kryo, MEMORY_AND_DISK_SER and monotonically_increasing_id
Key: SPARK-40200
URL: https://issues.apache.org/jira/browse/SPARK-40200
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.3.0
Environment: spark-3.3.0
Reporter: Calvin Pietersen

Unpersist of a parent dataset which has a column from `monotonically_increasing_id` cascades to a child dataset when
* joined on another dataset
* kryo serialization is enabled
* storage level is MEMORY_AND_DISK_SER
* not all rows join

```
import org.apache.spark.sql.functions.monotonically_increasing_id
import org.apache.spark.storage.StorageLevel

case class a(value: String, id: Long)

val storageLevel = StorageLevel.MEMORY_AND_DISK_SER // cascades
//val storageLevel = StorageLevel.MEMORY_ONLY // doesn't cascade

val acc = sc.longAccumulator("acc")

val parent1DS = spark
  .createDataset(Seq("a", "b", "c"))
  .withColumn("id", monotonically_increasing_id)
  .as[a]
  .persist(storageLevel)

val parent2DS = spark
  .createDataset(Seq(1, 2, 3)) // 0,1,2 doesn't cascade
  .persist(storageLevel)

val childDS = parent1DS
  .joinWith(parent2DS, parent1DS("id") === parent2DS("value"))
  .map(i => {
    acc.add(1)
    i
  }).persist(storageLevel)

childDS.count
parent1DS.unpersist
childDS.count

acc.value should be(2)
```
[jira] [Assigned] (SPARK-40131) Support NumPy ndarray in built-in functions
[ https://issues.apache.org/jira/browse/SPARK-40131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40131: Assignee: Apache Spark > Support NumPy ndarray in built-in functions > --- > > Key: SPARK-40131 > URL: https://issues.apache.org/jira/browse/SPARK-40131 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Per [https://github.com/apache/spark/pull/37560#discussion_r948572473] > we want to support NumPy ndarray in built-in functions
[jira] [Updated] (SPARK-40200) unpersist cascades with Kryo, MEMORY_AND_DISK_SER and monotonically_increasing_id
[ https://issues.apache.org/jira/browse/SPARK-40200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Calvin Pietersen updated SPARK-40200: - Affects Version/s: 3.2.1 > unpersist cascades with Kryo, MEMORY_AND_DISK_SER and > monotonically_increasing_id > - > > Key: SPARK-40200 > URL: https://issues.apache.org/jira/browse/SPARK-40200 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.1, 3.3.0 > Environment: spark-3.3.0 >Reporter: Calvin Pietersen >Priority: Major > > Unpersist of a parent dataset which has a column from > _*monotonically_increasing_id*_ cascades to a child dataset when > * joined on another dataset > * kryo serialization is enabled > * storage level is MEMORY_AND_DISK_SER > * not all rows join > > {code:java} > import org.apache.spark.sql.functions.monotonically_increasing_id > import org.apache.spark.storage.StorageLevel > case class a(value: String, id: Long) > val storageLevel = StorageLevel.MEMORY_AND_DISK_SER // cascades > //val storageLevel = StorageLevel.MEMORY_ONLY // doesn't cascade > val acc = sc.longAccumulator("acc") > val parent1DS = spark.createDataset(Seq("a", "b", "c")) > .withColumn("id", monotonically_increasing_id) > .as[a] > .persist(storageLevel) > val parent2DS = spark.createDataset(Seq(1, 2, 3)) // 0,1,2 doesn't cascade > .persist(storageLevel) > val childDS = parent1DS.joinWith(parent2DS, parent1DS("id") === > parent2DS("value")) >.map(i =>{ > acc.add(1) > i > }).persist(storageLevel) > childDS.count > parent1DS.unpersist > childDS.count > acc.value should be(2) {code}
[jira] [Commented] (SPARK-40152) Codegen compilation error when using split_part
[ https://issues.apache.org/jira/browse/SPARK-40152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583923#comment-17583923 ] Apache Spark commented on SPARK-40152: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/37637 > Codegen compilation error when using split_part > --- > > Key: SPARK-40152 > URL: https://issues.apache.org/jira/browse/SPARK-40152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bruce Robbins >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0, 3.3.1 > > > The following query throws an error: > {noformat} > create or replace temp view v1 as > select * from values > ('11.12.13', '.', 3) > as v1(col1, col2, col3); > cache table v1; > SELECT split_part(col1, col2, col3) > from v1; > {noformat} > The error is: > {noformat} > 22/08/19 14:25:14 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 42, Column 1: Expression "project_isNull_0 = false" is not a type > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 42, Column 1: Expression "project_isNull_0 = false" is not a type > at > org.codehaus.janino.Java$Atom.toTypeOrCompileException(Java.java:3934) > at org.codehaus.janino.Parser.parseBlockStatement(Parser.java:1887) > at org.codehaus.janino.Parser.parseBlockStatements(Parser.java:1811) > at org.codehaus.janino.Parser.parseBlock(Parser.java:1792) > at > {noformat} > In the end, {{split_part}} does successfully execute, although in interpreted > mode.
[jira] [Commented] (SPARK-40156) url_decode() exposes a Java error
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583925#comment-17583925 ] Apache Spark commented on SPARK-40156: -- User 'ming95' has created a pull request for this issue: https://github.com/apache/spark/pull/37636 > url_decode() exposes a Java error > - > > Key: SPARK-40156 > URL: https://issues.apache.org/jira/browse/SPARK-40156 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > Given a badly encoded string, Spark returns a Java error. > It should return an ERROR_CLASS instead. > spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org'); > 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT > url_decode('http%3A%2F%2spark.apache.org')] > java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in > escape (%) pattern - Error at index 1 in: "2s" > at java.base/java.net.URLDecoder.decode(URLDecoder.java:232) > at java.base/java.net.URLDecoder.decode(URLDecoder.java:142) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
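The behaviour requested in SPARK-40156 can be sketched in plain Scala as follows. This is only an illustration under stated assumptions, not the change made in the linked pull request: `UrlDecodeError`, `safeUrlDecode`, and the error-class name `EXAMPLE_CANNOT_DECODE_URL` are hypothetical. The only real API used is `java.net.URLDecoder.decode(String, String)`, which throws `IllegalArgumentException` on a malformed escape, as the stack trace above shows.

```scala
import java.net.URLDecoder
import scala.util.{Failure, Success, Try}

// Hypothetical error type; the error-class name used below is illustrative, not Spark's.
final case class UrlDecodeError(errorClass: String, input: String, reason: String)

// Decodes a URL-encoded string, turning the raw Java exception into a structured error.
def safeUrlDecode(input: String): Either[UrlDecodeError, String] =
  Try(URLDecoder.decode(input, "UTF-8")) match {
    case Success(decoded) => Right(decoded)
    case Failure(e: IllegalArgumentException) =>
      Left(UrlDecodeError("EXAMPLE_CANNOT_DECODE_URL", input, e.getMessage))
    case Failure(other) => throw other // anything else is unexpected; rethrow unchanged
  }
```

With this wrapper, the malformed input from the report (`%2s`) yields a `Left` carrying the error class and the original input, while a well-formed string decodes normally.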
[jira] [Assigned] (SPARK-40156) url_decode() exposes a Java error
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40156: Assignee: Apache Spark > url_decode() exposes a Java error > - > > Key: SPARK-40156 > URL: https://issues.apache.org/jira/browse/SPARK-40156 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Apache Spark >Priority: Major > > Given a badly encoded string, Spark returns a Java error. > It should return an ERROR_CLASS instead. > spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org'); > 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT > url_decode('http%3A%2F%2spark.apache.org')] > java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in > escape (%) pattern - Error at index 1 in: "2s" > at java.base/java.net.URLDecoder.decode(URLDecoder.java:232) > at java.base/java.net.URLDecoder.decode(URLDecoder.java:142) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
[jira] [Assigned] (SPARK-40156) url_decode() exposes a Java error
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40156: Assignee: (was: Apache Spark) > url_decode() exposes a Java error > - > > Key: SPARK-40156 > URL: https://issues.apache.org/jira/browse/SPARK-40156 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > Given a badly encoded string, Spark returns a Java error. > It should return an ERROR_CLASS instead. > spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org'); > 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT > url_decode('http%3A%2F%2spark.apache.org')] > java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in > escape (%) pattern - Error at index 1 in: "2s" > at java.base/java.net.URLDecoder.decode(URLDecoder.java:232) > at java.base/java.net.URLDecoder.decode(URLDecoder.java:142) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
[jira] [Commented] (SPARK-33573) Server side metrics related to push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-33573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583926#comment-17583926 ] Apache Spark commented on SPARK-33573: -- User 'rmcyang' has created a pull request for this issue: https://github.com/apache/spark/pull/37638 > Server side metrics related to push-based shuffle > - > > Key: SPARK-33573 > URL: https://issues.apache.org/jira/browse/SPARK-33573 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Priority: Major > > Shuffle Server side metrics for push based shuffle.
[jira] [Commented] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583933#comment-17583933 ] Apache Spark commented on SPARK-40165: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/37639 > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Trivial > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0)
[jira] [Assigned] (SPARK-38752) Test the error class: UNSUPPORTED_DATATYPE
[ https://issues.apache.org/jira/browse/SPARK-38752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38752: Assignee: Apache Spark > Test the error class: UNSUPPORTED_DATATYPE > -- > > Key: SPARK-38752 > URL: https://issues.apache.org/jira/browse/SPARK-38752 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: starter > > Add a test for the error class *UNSUPPORTED_DATATYPE* to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def dataTypeUnsupportedError(dataType: String, failure: String): Throwable > = { > new SparkIllegalArgumentException(errorClass = "UNSUPPORTED_DATATYPE", > messageParameters = Array(dataType + failure)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class
[jira] [Commented] (SPARK-38752) Test the error class: UNSUPPORTED_DATATYPE
[ https://issues.apache.org/jira/browse/SPARK-38752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583943#comment-17583943 ] Apache Spark commented on SPARK-38752: -- User 'lvshaokang' has created a pull request for this issue: https://github.com/apache/spark/pull/37640 > Test the error class: UNSUPPORTED_DATATYPE > -- > > Key: SPARK-38752 > URL: https://issues.apache.org/jira/browse/SPARK-38752 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Add a test for the error class *UNSUPPORTED_DATATYPE* to > QueryExecutionErrorsSuite. The test should cover the exception thrown in > QueryExecutionErrors: > {code:scala} > def dataTypeUnsupportedError(dataType: String, failure: String): Throwable > = { > new SparkIllegalArgumentException(errorClass = "UNSUPPORTED_DATATYPE", > messageParameters = Array(dataType + failure)) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class
[jira] [Assigned] (SPARK-38752) Test the error class: UNSUPPORTED_DATATYPE
[ https://issues.apache.org/jira/browse/SPARK-38752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38752:

Assignee: (was: Apache Spark)

> Test the error class: UNSUPPORTED_DATATYPE
>
> Key: SPARK-38752
> URL: https://issues.apache.org/jira/browse/SPARK-38752
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
> Add a test for the error class *UNSUPPORTED_DATATYPE* to QueryExecutionErrorsSuite. The test should cover the exception thrown in QueryExecutionErrors:
> {code:scala}
> def dataTypeUnsupportedError(dataType: String, failure: String): Throwable = {
>   new SparkIllegalArgumentException(errorClass = "UNSUPPORTED_DATATYPE",
>     messageParameters = Array(dataType + failure))
> }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*:
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class
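The three checks that SPARK-38752 asks for (error class, sqlState, entire message) can be sketched without a Spark dependency roughly as follows. This is only an illustration under stated assumptions: `SketchIllegalArgumentException` is a hypothetical stand-in for `SparkIllegalArgumentException`, the message template is invented for the sketch (the real one comes from error-classes.json), and the `sqlState` value is a placeholder rather than the value Spark defines.

```scala
// Hypothetical stand-in for Spark's SparkIllegalArgumentException, so that the
// sketch compiles on its own. The message template below is an assumption.
final class SketchIllegalArgumentException(
    val errorClass: String,
    val messageParameters: Array[String],
    val sqlState: Option[String])
  extends IllegalArgumentException(
    s"[$errorClass] Unsupported data type: ${messageParameters.mkString}")

object UnsupportedDataTypeSketch {
  // Same shape as the dataTypeUnsupportedError method quoted in the issue.
  def dataTypeUnsupportedError(dataType: String, failure: String): Throwable =
    new SketchIllegalArgumentException(
      errorClass = "UNSUPPORTED_DATATYPE",
      messageParameters = Array(dataType + failure),
      sqlState = None) // placeholder: the real value, if any, is in error-classes.json

  def main(args: Array[String]): Unit = {
    val e = dataTypeUnsupportedError("foo", " is not supported")
      .asInstanceOf[SketchIllegalArgumentException]

    // 1. the error class
    assert(e.errorClass == "UNSUPPORTED_DATATYPE")
    // 2. sqlState, checked only if one is defined
    e.sqlState.foreach(state => assert(state.nonEmpty))
    // 3. the entire error message, not just a substring
    assert(e.getMessage ==
      "[UNSUPPORTED_DATATYPE] Unsupported data type: foo is not supported")
  }
}
```

In the real suite the exception would be raised by running a query or calling Spark code, and the checks would go through the helpers that the linked UNSUPPORTED_FEATURE test uses.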
[jira] [Created] (SPARK-40201) Improve v1 write test coverage
XiDuo You created SPARK-40201:

Summary: Improve v1 write test coverage
Key: SPARK-40201
URL: https://issues.apache.org/jira/browse/SPARK-40201
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You

Make v1 write test work on all SQL tests
[jira] [Commented] (SPARK-40201) Improve v1 write test coverage
[ https://issues.apache.org/jira/browse/SPARK-40201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17583972#comment-17583972 ]

Apache Spark commented on SPARK-40201:

User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/37641

> Improve v1 write test coverage
>
> Key: SPARK-40201
> URL: https://issues.apache.org/jira/browse/SPARK-40201
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Priority: Major
>
> Make v1 write test work on all SQL tests
[jira] [Assigned] (SPARK-40201) Improve v1 write test coverage
[ https://issues.apache.org/jira/browse/SPARK-40201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40201:

Assignee: Apache Spark

> Improve v1 write test coverage
>
> Key: SPARK-40201
> URL: https://issues.apache.org/jira/browse/SPARK-40201
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Assignee: Apache Spark
> Priority: Major
>
> Make v1 write test work on all SQL tests
[jira] [Assigned] (SPARK-40201) Improve v1 write test coverage
[ https://issues.apache.org/jira/browse/SPARK-40201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-40201:

Assignee: (was: Apache Spark)

> Improve v1 write test coverage
>
> Key: SPARK-40201
> URL: https://issues.apache.org/jira/browse/SPARK-40201
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: XiDuo You
> Priority: Major
>
> Make v1 write test work on all SQL tests
[jira] [Resolved] (SPARK-40198) Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-40198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-40198.

Fix Version/s: 3.4.0
Assignee: Dongjoon Hyun
Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/37633

> Enable spark.storage.decommission.(rdd|shuffle)Blocks.enabled by default
>
> Key: SPARK-40198
> URL: https://issues.apache.org/jira/browse/SPARK-40198
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.4.0
[jira] [Created] (SPARK-40202) Allow a map in SparkSession.config in PySpark
Hyukjin Kwon created SPARK-40202:

Summary: Allow a map in SparkSession.config in PySpark
Key: SPARK-40202
URL: https://issues.apache.org/jira/browse/SPARK-40202
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon

SPARK-40163 added a new signature to SparkSession.config. PySpark should have the same one.
[jira] [Updated] (SPARK-40202) Allow a dictionary in SparkSession.config in PySpark
[ https://issues.apache.org/jira/browse/SPARK-40202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-40202:

Summary: Allow a dictionary in SparkSession.config in PySpark (was: Allow a map in SparkSession.config in PySpark)

> Allow a dictionary in SparkSession.config in PySpark
>
> Key: SPARK-40202
> URL: https://issues.apache.org/jira/browse/SPARK-40202
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> SPARK-40163 added a new signature to SparkSession.config. PySpark should have the same one.
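The idea behind SPARK-40202, a `config` call that takes a whole map (a dictionary in PySpark) next to the per-key form, might look roughly like the sketch below. `SketchBuilder` and its methods are hypothetical and are not Spark's builder class; the sketch only shows two overloads feeding the same option store, and stringifying the values is an assumption made for the illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session builder; not Spark's real Builder.
final class SketchBuilder {
  private val options = mutable.LinkedHashMap.empty[String, String]

  // Per-key style: one option per call.
  def config(key: String, value: String): SketchBuilder = {
    options(key) = value
    this
  }

  // Map style: many options in one call. Values are stored as strings,
  // which is an assumption of this sketch.
  def config(map: Map[String, Any]): SketchBuilder = {
    map.foreach { case (k, v) => options(k) = v.toString }
    this
  }

  // Immutable snapshot of everything set so far.
  def settings: Map[String, String] = options.toMap
}

object SketchBuilderDemo {
  def main(args: Array[String]): Unit = {
    val builder = new SketchBuilder()
      .config("spark.app.name", "demo")
      .config(Map("spark.sql.shuffle.partitions" -> 8, "spark.ui.enabled" -> false))

    assert(builder.settings("spark.app.name") == "demo")
    assert(builder.settings("spark.sql.shuffle.partitions") == "8")
    assert(builder.settings("spark.ui.enabled") == "false")
  }
}
```

Both overloads return the builder, so map-style and per-key calls can be chained in any order, with later calls overriding earlier values for the same key.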