[jira] [Updated] (SPARK-43225) Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution
[ https://issues.apache.org/jira/browse/SPARK-43225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-43225: Summary: Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution (was: Change the scope of jackson-mapper-asl from compile to test) > Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution > -- > > Key: SPARK-43225 > URL: https://issues.apache.org/jira/browse/SPARK-43225 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > > To fix CVE issue: https://github.com/apache/spark/security/dependabot/50 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43102) Upgrade commons-compress to 1.23.0
[ https://issues.apache.org/jira/browse/SPARK-43102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43102. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40751 [https://github.com/apache/spark/pull/40751] > Upgrade commons-compress to 1.23.0 > -- > > Key: SPARK-43102 > URL: https://issues.apache.org/jira/browse/SPARK-43102 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.5.0 > > > https://commons.apache.org/proper/commons-compress/changes-report.html#a1.23.0
[jira] [Assigned] (SPARK-43102) Upgrade commons-compress to 1.23.0
[ https://issues.apache.org/jira/browse/SPARK-43102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43102: - Assignee: Yang Jie > Upgrade commons-compress to 1.23.0 > -- > > Key: SPARK-43102 > URL: https://issues.apache.org/jira/browse/SPARK-43102 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > https://commons.apache.org/proper/commons-compress/changes-report.html#a1.23.0
[jira] [Created] (SPARK-43227) Fix deserialisation issue when UDFs contain a lambda expression
Venkata Sai Akhil Gudesa created SPARK-43227: Summary: Fix deserialisation issue when UDFs contain a lambda expression Key: SPARK-43227 URL: https://issues.apache.org/jira/browse/SPARK-43227 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Venkata Sai Akhil Gudesa The following code: {code:java} class A(x: Int) { def get = x * 20 + 5 } val dummyUdf = (x: Int) => new A(x).get val myUdf = udf(dummyUdf) spark.range(5).select(myUdf(col("id"))).as[Int].collect() {code} hits the following error: {noformat} io.grpc.StatusRuntimeException: INTERNAL: cannot assign instance of java.lang.invoke.SerializedLambda to field ammonite.$sess.cmd26$Helper.dummyUdf of type scala.Function1 in instance of ammonite.$sess.cmd26$Helper io.grpc.Status.asRuntimeException(Status.java:535) io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:62) org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:114) org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:131) org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2687) org.apache.spark.sql.Dataset.withResult(Dataset.scala:3088) org.apache.spark.sql.Dataset.collect(Dataset.scala:2686) ammonite.$sess.cmd28$Helper.(cmd28.sc:1) ammonite.$sess.cmd28$.(cmd28.sc:7) ammonite.$sess.cmd28$.(cmd28.sc){noformat}
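[Editor's note] The UDF logic in the repro above is trivial; the failure is in deserializing the captured Scala lambda on the server (the `SerializedLambda` assignment), not in the computation. A minimal Python sketch of the same arithmetic, purely illustrative and outside Spark, shows what the collect would be expected to return:

```python
# Local sketch of the repro's UDF logic (x * 20 + 5), outside Spark:
# the computation is trivial; the reported failure is in deserializing the
# captured Scala lambda (java.lang.invoke.SerializedLambda), not evaluating it.
class A:
    def __init__(self, x: int):
        self.x = x

    def get(self) -> int:
        return self.x * 20 + 5

dummy_udf = lambda x: A(x).get()
expected = [dummy_udf(i) for i in range(5)]
print(expected)  # [5, 25, 45, 65, 85]
```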
[jira] [Resolved] (SPARK-43213) Add `DataFrame.offset` to PySpark
[ https://issues.apache.org/jira/browse/SPARK-43213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43213. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40873 [https://github.com/apache/spark/pull/40873] > Add `DataFrame.offset` to PySpark > - > > Key: SPARK-43213 > URL: https://issues.apache.org/jira/browse/SPARK-43213 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0
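[Editor's note] `DataFrame.offset(n)` skips the first n rows of the result, complementing `limit`. A local sketch of the paging semantics on a plain list (not the PySpark API itself):

```python
# Sketch of offset/limit paging semantics on a plain list (not PySpark itself):
rows = list(range(10))
offset, limit = 3, 4
page = rows[offset:offset + limit]  # offset skips rows, limit caps the count
print(page)  # [3, 4, 5, 6]
```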
[jira] [Assigned] (SPARK-43213) Add `DataFrame.offset` to PySpark
[ https://issues.apache.org/jira/browse/SPARK-43213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43213: - Assignee: Ruifeng Zheng > Add `DataFrame.offset` to PySpark > - > > Key: SPARK-43213 > URL: https://issues.apache.org/jira/browse/SPARK-43213 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major
[jira] [Commented] (SPARK-43224) Executor should not be removed when decommissioned in standalone
[ https://issues.apache.org/jira/browse/SPARK-43224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714820#comment-17714820 ] Hyukjin Kwon commented on SPARK-43224: -- [~warrenzhu25] it would be great if we had some description on this issue. > Executor should not be removed when decommissioned in standalone > > > Key: SPARK-43224 > URL: https://issues.apache.org/jira/browse/SPARK-43224 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Minor
[jira] [Updated] (SPARK-43226) Define extractors for file-constant metadata columns
[ https://issues.apache.org/jira/browse/SPARK-43226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-43226: - Target Version/s: (was: 3.5.0) > Define extractors for file-constant metadata columns > > > Key: SPARK-43226 > URL: https://issues.apache.org/jira/browse/SPARK-43226 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Ryan Johnson >Priority: Major > > File-source constant metadata columns are often derived indirectly from > file-level metadata values rather than exposing those values directly. For > example, {{_metadata.file_name}} is currently hard-coded in > {{FileFormat.updateMetadataInternalRow}} as: > > {code:java} > UTF8String.fromString(filePath.getName){code} > > We should add support for metadata extractors, functions that map from > {{PartitionedFile}} to {{{}Literal{}}}, so that we can express such columns > in a generic way instead of hard-coding them. > We can't just add them to the metadata map because then they have to be > pre-computed even if it turns out the query does not select that field.
[jira] [Commented] (SPARK-43225) Change the scope of jackson-mapper-asl from compile to test
[ https://issues.apache.org/jira/browse/SPARK-43225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714817#comment-17714817 ] Snoot.io commented on SPARK-43225: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/40893 > Change the scope of jackson-mapper-asl from compile to test > --- > > Key: SPARK-43225 > URL: https://issues.apache.org/jira/browse/SPARK-43225 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > > To fix CVE issue: https://github.com/apache/spark/security/dependabot/50
[jira] [Commented] (SPARK-43222) Remove check of `isHadoop3`
[ https://issues.apache.org/jira/browse/SPARK-43222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714819#comment-17714819 ] Snoot.io commented on SPARK-43222: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40882 > Remove check of `isHadoop3` > --- > > Key: SPARK-43222 > URL: https://issues.apache.org/jira/browse/SPARK-43222 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor
[jira] [Commented] (SPARK-43193) Remove workaround for HADOOP-12074
[ https://issues.apache.org/jira/browse/SPARK-43193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714818#comment-17714818 ] Snoot.io commented on SPARK-43193: -- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/40852 > Remove workaround for HADOOP-12074 > -- > > Key: SPARK-43193 > URL: https://issues.apache.org/jira/browse/SPARK-43193 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0
[jira] [Commented] (SPARK-31733) Make YarnClient.`specify a more specific type for the application` pass in Hadoop-3.2
[ https://issues.apache.org/jira/browse/SPARK-31733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714816#comment-17714816 ] Snoot.io commented on SPARK-31733: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40877 > Make YarnClient.`specify a more specific type for the application` pass in > Hadoop-3.2 > - > > Key: SPARK-31733 > URL: https://issues.apache.org/jira/browse/SPARK-31733 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major
[jira] [Resolved] (SPARK-43193) Remove workaround for HADOOP-12074
[ https://issues.apache.org/jira/browse/SPARK-43193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43193. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40852 [https://github.com/apache/spark/pull/40852] > Remove workaround for HADOOP-12074 > -- > > Key: SPARK-43193 > URL: https://issues.apache.org/jira/browse/SPARK-43193 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0
[jira] [Assigned] (SPARK-43193) Remove workaround for HADOOP-12074
[ https://issues.apache.org/jira/browse/SPARK-43193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43193: Assignee: Cheng Pan > Remove workaround for HADOOP-12074 > -- > > Key: SPARK-43193 > URL: https://issues.apache.org/jira/browse/SPARK-43193 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major
[jira] [Created] (SPARK-43226) Define extractors for file-constant metadata columns
Ryan Johnson created SPARK-43226: Summary: Define extractors for file-constant metadata columns Key: SPARK-43226 URL: https://issues.apache.org/jira/browse/SPARK-43226 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.4.0 Reporter: Ryan Johnson File-source constant metadata columns are often derived indirectly from file-level metadata values rather than exposing those values directly. For example, {{_metadata.file_name}} is currently hard-coded in {{FileFormat.updateMetadataInternalRow}} as: {code:java} UTF8String.fromString(filePath.getName){code} We should add support for metadata extractors, functions that map from {{PartitionedFile}} to {{{}Literal{}}}, so that we can express such columns in a generic way instead of hard-coding them. We can't just add them to the metadata map because then they have to be pre-computed even if it turns out the query does not select that field.
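[Editor's note] The extractor idea can be sketched in a few lines. This is a hypothetical illustration of the proposal, not the actual Spark API; `PartitionedFileStub` and the `extractors` map are invented for the example. Each metadata field maps to a function that is applied only when the query selects that field, which avoids the pre-computation problem described above:

```python
import os
from dataclasses import dataclass

@dataclass
class PartitionedFileStub:
    """Stand-in for Spark's PartitionedFile (illustrative only)."""
    path: str

# One extractor per metadata field: PartitionedFile -> value.
extractors = {
    "file_name": lambda pf: os.path.basename(pf.path),
    "file_path": lambda pf: pf.path,
}

pf = PartitionedFileStub("/data/part-00000.parquet")
selected = ["file_name"]  # only fields the query actually selects
values = {field: extractors[field](pf) for field in selected}
print(values)  # {'file_name': 'part-00000.parquet'}
```

Because unselected fields never reach the comprehension, their extractors are never invoked, which is the laziness the issue asks for.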
[jira] [Commented] (SPARK-43128) Streaming progress struct (especially in Scala)
[ https://issues.apache.org/jira/browse/SPARK-43128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714812#comment-17714812 ] Snoot.io commented on SPARK-43128: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40892 > Streaming progress struct (especially in Scala) > --- > > Key: SPARK-43128 > URL: https://issues.apache.org/jira/browse/SPARK-43128 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Priority: Major > > Streaming spark connect transfers streaming progress as full “json”. > This works ok for Python since it does not have any schema defined. > But in Scala, it is a full fledged class. We need to decide if we want to > match legacy Progress struct in spark-connect.
[jira] [Commented] (SPARK-42945) Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-42945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714808#comment-17714808 ] Snoot.io commented on SPARK-42945: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40575 > Support PYSPARK_JVM_STACKTRACE_ENABLED in Spark Connect > --- > > Key: SPARK-42945 > URL: https://issues.apache.org/jira/browse/SPARK-42945 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Make the PySpark setting PYSPARK_JVM_STACKTRACE_ENABLED work with Spark > Connect.
[jira] [Commented] (SPARK-43136) Scala mapGroup, coGroup
[ https://issues.apache.org/jira/browse/SPARK-43136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714807#comment-17714807 ] Snoot.io commented on SPARK-43136: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40729 > Scala mapGroup, coGroup > --- > > Key: SPARK-43136 > URL: https://issues.apache.org/jira/browse/SPARK-43136 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Zhen Li >Priority: Major > > Adding Basics of Dataset#groupByKey -> KeyValueGroupedDataset support
[jira] [Created] (SPARK-43225) Change the scope of jackson-mapper-asl from compile to test
Yuming Wang created SPARK-43225: --- Summary: Change the scope of jackson-mapper-asl from compile to test Key: SPARK-43225 URL: https://issues.apache.org/jira/browse/SPARK-43225 Project: Spark Issue Type: Bug Components: Build, SQL Affects Versions: 3.5.0 Reporter: Yuming Wang To fix CVE issue: https://github.com/apache/spark/security/dependabot/50
[jira] [Assigned] (SPARK-43119) Support Get SQL Keywords Dynamically
[ https://issues.apache.org/jira/browse/SPARK-43119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-43119: Assignee: Kent Yao > Support Get SQL Keywords Dynamically > > > Key: SPARK-43119 > URL: https://issues.apache.org/jira/browse/SPARK-43119 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > Implements the JDBC standard API and an auxiliary function
[jira] [Resolved] (SPARK-43119) Support Get SQL Keywords Dynamically
[ https://issues.apache.org/jira/browse/SPARK-43119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-43119. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40768 [https://github.com/apache/spark/pull/40768] > Support Get SQL Keywords Dynamically > > > Key: SPARK-43119 > URL: https://issues.apache.org/jira/browse/SPARK-43119 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.5.0 > > > Implements the JDBC standard API and an auxiliary function
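[Editor's note] The JDBC entry point presumably meant here is `DatabaseMetaData.getSQLKeywords`, which per the JDBC specification returns a comma-separated list of the database's SQL keywords that are not also SQL:2003 keywords. A sketch of consuming such a list; the keyword string below is illustrative, not Spark's actual list:

```python
# getSQLKeywords returns a comma-separated string; split it into a set.
raw = "ANTI,MINUS,SEMI"  # illustrative value only, not Spark's real keyword list
keywords = {k.strip() for k in raw.split(",") if k.strip()}
print(sorted(keywords))  # ['ANTI', 'MINUS', 'SEMI']
```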
[jira] [Closed] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression
[ https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan closed SPARK-43220. --- > INSERT INTO REPLACE statement can't support WHERE with bool_expression > -- > > Key: SPARK-43220 > URL: https://issues.apache.org/jira/browse/SPARK-43220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-04-20-23-40-25-212.png > > > {code:java} > sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") > sql("CREATE TABLE persons2 (name string,address String,ssn int) USING > parquet") > sql("INSERT INTO TABLE persons VALUES " + > "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + > "('Eddie Davis','245 Market St, Milpitas',345678901)") > sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, > Cupertino', 432795921)") > sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM > persons2") > sql("SELECT * FROM persons").show(){code} > When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported > at the moment; `WHERE ssn = 123456789` and `WHERE FALSE` are both unsupported. > !image-2023-04-20-23-40-25-212.png|width=795,height=152!
[jira] [Commented] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression
[ https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714794#comment-17714794 ] Jia Fan commented on SPARK-43220: - After I tested it another way, it works. My fault. > INSERT INTO REPLACE statement can't support WHERE with bool_expression > -- > > Key: SPARK-43220 > URL: https://issues.apache.org/jira/browse/SPARK-43220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-04-20-23-40-25-212.png > > > {code:java} > sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") > sql("CREATE TABLE persons2 (name string,address String,ssn int) USING > parquet") > sql("INSERT INTO TABLE persons VALUES " + > "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + > "('Eddie Davis','245 Market St, Milpitas',345678901)") > sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, > Cupertino', 432795921)") > sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM > persons2") > sql("SELECT * FROM persons").show(){code} > When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported > at the moment; `WHERE ssn = 123456789` and `WHERE FALSE` are both unsupported. > !image-2023-04-20-23-40-25-212.png|width=795,height=152!
[jira] [Resolved] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression
[ https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan resolved SPARK-43220. - Resolution: Invalid > INSERT INTO REPLACE statement can't support WHERE with bool_expression > -- > > Key: SPARK-43220 > URL: https://issues.apache.org/jira/browse/SPARK-43220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-04-20-23-40-25-212.png > > > {code:java} > sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") > sql("CREATE TABLE persons2 (name string,address String,ssn int) USING > parquet") > sql("INSERT INTO TABLE persons VALUES " + > "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + > "('Eddie Davis','245 Market St, Milpitas',345678901)") > sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, > Cupertino', 432795921)") > sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM > persons2") > sql("SELECT * FROM persons").show(){code} > When using `INSERT INTO table REPLACE WHERE`, only `WHERE TRUE` is supported > at the moment; `WHERE ssn = 123456789` and `WHERE FALSE` are both unsupported. > !image-2023-04-20-23-40-25-212.png|width=795,height=152!
[jira] [Commented] (SPARK-39203) Fix remote table location based on database location
[ https://issues.apache.org/jira/browse/SPARK-39203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714793#comment-17714793 ] Hyukjin Kwon commented on SPARK-39203: -- But to be clear, this change exists in Spark 3.4.0. It was taken out from 3.4.1 and 3.5.0. > Fix remote table location based on database location > > > Key: SPARK-39203 > URL: https://issues.apache.org/jira/browse/SPARK-39203 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Yuming Wang >Priority: Major > > We have HDFS and Hive on cluster A. We have Spark on cluster B and need to > read data from cluster A. The table location is incorrect: > {noformat} > spark-sql> desc formatted default.test_table; > fas_acct_id decimal(18,0) > fas_acct_cd string > cmpny_cd string > entity_id string > cre_date date > cre_user string > upd_date timestamp > upd_user string > # Detailed Table Information > Database default > Table test_table > Type EXTERNAL > Provider parquet > Statistics 25310025737 bytes > Location /user/hive/warehouse/test_table > Serde Library > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe > InputFormat > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > OutputFormat > org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat > Storage Properties [compression=snappy] > spark-sql> desc database default; > Namespace Name default > Comment > Location viewfs://clusterA/user/hive/warehouse/ > Owner hive_dba > {noformat} > The correct table location should be > viewfs://clusterA/user/hive/warehouse/test_table.
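[Editor's note] A hypothetical sketch of the fix the report asks for (the helper name `qualify` is invented for illustration, not Spark's actual code): a scheme-less table location is qualified against the database location's filesystem, yielding the viewfs path the reporter expects:

```python
from urllib.parse import urlparse

def qualify(db_location: str, table_location: str) -> str:
    """Qualify a scheme-less table path against the database's filesystem.

    Illustrative helper only; the real Spark fix lives in the catalog layer.
    """
    if "://" in table_location:  # already fully qualified, leave untouched
        return table_location
    parsed = urlparse(db_location)  # e.g. scheme=viewfs, netloc=clusterA
    return f"{parsed.scheme}://{parsed.netloc}{table_location}"

loc = qualify("viewfs://clusterA/user/hive/warehouse/", "/user/hive/warehouse/test_table")
print(loc)  # viewfs://clusterA/user/hive/warehouse/test_table
```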
[jira] [Updated] (SPARK-39203) Fix remote table location based on database location
[ https://issues.apache.org/jira/browse/SPARK-39203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39203: - Fix Version/s: (was: 3.4.0) > Fix remote table location based on database location > > > Key: SPARK-39203 > URL: https://issues.apache.org/jira/browse/SPARK-39203 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Yuming Wang >Priority: Major > > We have HDFS and Hive on cluster A. We have Spark on cluster B and need to > read data from cluster A. The table location is incorrect: > {noformat} > spark-sql> desc formatted default.test_table; > fas_acct_id decimal(18,0) > fas_acct_cd string > cmpny_cd string > entity_id string > cre_date date > cre_user string > upd_date timestamp > upd_user string > # Detailed Table Information > Database default > Table test_table > Type EXTERNAL > Provider parquet > Statistics 25310025737 bytes > Location /user/hive/warehouse/test_table > Serde Library > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe > InputFormat > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > OutputFormat > org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat > Storage Properties [compression=snappy] > spark-sql> desc database default; > Namespace Name default > Comment > Location viewfs://clusterA/user/hive/warehouse/ > Owner hive_dba > {noformat} > The correct table location should be > viewfs://clusterA/user/hive/warehouse/test_table.
[jira] [Reopened] (SPARK-39203) Fix remote table location based on database location
[ https://issues.apache.org/jira/browse/SPARK-39203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-39203: -- Assignee: (was: Yuming Wang) Reverted in https://github.com/apache/spark/pull/40871 > Fix remote table location based on database location > > > Key: SPARK-39203 > URL: https://issues.apache.org/jira/browse/SPARK-39203 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Yuming Wang >Priority: Major > Fix For: 3.4.0 > > > We have HDFS and Hive on cluster A. We have Spark on cluster B and need to > read data from cluster A. The table location is incorrect: > {noformat} > spark-sql> desc formatted default.test_table; > fas_acct_id decimal(18,0) > fas_acct_cd string > cmpny_cd string > entity_id string > cre_date date > cre_user string > upd_date timestamp > upd_user string > # Detailed Table Information > Database default > Table test_table > Type EXTERNAL > Provider parquet > Statistics 25310025737 bytes > Location /user/hive/warehouse/test_table > Serde Library > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe > InputFormat > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > OutputFormat > org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat > Storage Properties [compression=snappy] > spark-sql> desc database default; > Namespace Name default > Comment > Location viewfs://clusterA/user/hive/warehouse/ > Owner hive_dba > {noformat} > The correct table location should be > viewfs://clusterA/user/hive/warehouse/test_table.
[jira] [Resolved] (SPARK-43124) Dataset.show should not trigger job execution on CommandResults
[ https://issues.apache.org/jira/browse/SPARK-43124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43124. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40779 [https://github.com/apache/spark/pull/40779] > Dataset.show should not trigger job execution on CommandResults > --- > > Key: SPARK-43124 > URL: https://issues.apache.org/jira/browse/SPARK-43124 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Major > Fix For: 3.5.0
[jira] [Assigned] (SPARK-43124) Dataset.show should not trigger job execution on CommandResults
[ https://issues.apache.org/jira/browse/SPARK-43124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43124: Assignee: Peter Toth > Dataset.show should not trigger job execution on CommandResults > --- > > Key: SPARK-43124 > URL: https://issues.apache.org/jira/browse/SPARK-43124 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Major
[jira] [Resolved] (SPARK-42960) Add remaining Streaming Query commands like await_termination()
[ https://issues.apache.org/jira/browse/SPARK-42960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42960. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40785 [https://github.com/apache/spark/pull/40785] > Add remaining Streaming Query commands like await_termination() > --- > > Key: SPARK-42960 > URL: https://issues.apache.org/jira/browse/SPARK-42960 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.5.0 > > > Add remaining Streaming Query API, including: > * await_termination() : needs to be a streaming RPC. > * exception() -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42960) Add remaining Streaming Query commands like await_termination()
[ https://issues.apache.org/jira/browse/SPARK-42960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42960: Assignee: Raghu Angadi > Add remaining Streaming Query commands like await_termination() > --- > > Key: SPARK-42960 > URL: https://issues.apache.org/jira/browse/SPARK-42960 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > > Add remaining Streaming Query API, including: > * await_termination() : needs to be a streaming RPC. > * exception() -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43224) Executor should not be removed when decommissioned in standalone
Zhongwei Zhu created SPARK-43224: Summary: Executor should not be removed when decommissioned in standalone Key: SPARK-43224 URL: https://issues.apache.org/jira/browse/SPARK-43224 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: Zhongwei Zhu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43211) Remove Hadoop2 support in IsolatedClientLoader
[ https://issues.apache.org/jira/browse/SPARK-43211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-43211. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40870 [https://github.com/apache/spark/pull/40870] > Remove Hadoop2 support in IsolatedClientLoader > -- > > Key: SPARK-43211 > URL: https://issues.apache.org/jira/browse/SPARK-43211 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43211) Remove Hadoop2 support in IsolatedClientLoader
[ https://issues.apache.org/jira/browse/SPARK-43211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-43211: Assignee: Cheng Pan > Remove Hadoop2 support in IsolatedClientLoader > -- > > Key: SPARK-43211 > URL: https://issues.apache.org/jira/browse/SPARK-43211 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43202) Replace reflection w/ direct calling for YARN Resource API
[ https://issues.apache.org/jira/browse/SPARK-43202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-43202. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40860 [https://github.com/apache/spark/pull/40860] > Replace reflection w/ direct calling for YARN Resource API > -- > > Key: SPARK-43202 > URL: https://issues.apache.org/jira/browse/SPARK-43202 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43202) Replace reflection w/ direct calling for YARN Resource API
[ https://issues.apache.org/jira/browse/SPARK-43202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-43202: Assignee: Cheng Pan > Replace reflection w/ direct calling for YARN Resource API > -- > > Key: SPARK-43202 > URL: https://issues.apache.org/jira/browse/SPARK-43202 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43201) Inconsistency between from_avro and from_json function
[ https://issues.apache.org/jira/browse/SPARK-43201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Adetiloye updated SPARK-43201: - Description: Spark from_avro function does not allow schema parameter to use dataframe column but takes only a String schema: {code:java} def from_avro(col: Column, jsonFormatSchema: String): Column {code} This makes it impossible to deserialize rows of Avro records with different schema since only one schema string could be pass externally. Here is what I would expect like from_json function: {code:java} def from_avro(col: Column, jsonFormatSchema: Column): Column {code} code example: {code:java} import org.apache.spark.sql.functions.from_avro val avroSchema1 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" val avroSchema2 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" val df = Seq( (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1), (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2) ).toDF("binaryData", "schema") val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData")) parsed.show() // Output: // ++ // | parsedData| // ++ // |[apple1, 1.0]| // |[apple2, 2.0]| // ++ {code} was: Spark from_avro function does not allow schema parameter to use dataframe column but takes only a String schema: {code:java} def from_avro(col: Column, jsonFormatSchema: String): Column {code} This makes it impossible to deserialize rows of Avro records with different schema since only one schema string could be pass externally. 
Here is what I would expect: {code:java} def from_avro(col: Column, jsonFormatSchema: Column): Column {code} code example: {code:java} import org.apache.spark.sql.functions.from_avro val avroSchema1 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" val avroSchema2 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" val df = Seq( (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1), (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2) ).toDF("binaryData", "schema") val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData")) parsed.show() // Output: // ++ // | parsedData| // ++ // |[apple1, 1.0]| // |[apple2, 2.0]| // ++ {code} > Inconsistency between from_avro and from_json function > -- > > Key: SPARK-43201 > URL: https://issues.apache.org/jira/browse/SPARK-43201 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Philip Adetiloye >Priority: Major > > Spark from_avro function does not allow schema parameter to use dataframe > column but takes only a String schema: > {code:java} > def from_avro(col: Column, jsonFormatSchema: String): Column {code} > This makes it impossible to deserialize rows of Avro records with different > schema since only one schema string could be pass externally. 
> > Here is what I would expect like from_json function: > {code:java} > def from_avro(col: Column, jsonFormatSchema: Column): Column {code} > code example: > {code:java} > import org.apache.spark.sql.functions.from_avro > val avroSchema1 = > """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" > > val avroSchema2 = > """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" > val df = Seq( > (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1), > (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2) > ).toDF("binaryData", "schema") > val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData")) > parsed.show() > // Output: > // ++ > // | parsedData| > // ++ > // |[apple1, 1.0]| > // |[apple2, 2.0]| > // ++ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43201) Inconsistency between from_avro and from_json function
[ https://issues.apache.org/jira/browse/SPARK-43201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Adetiloye updated SPARK-43201: - Description: Spark from_avro function does not allow schema parameter to use dataframe column but takes only a String schema: {code:java} def from_avro(col: Column, jsonFormatSchema: String): Column {code} This makes it impossible to deserialize rows of Avro records with different schema since only one schema string could be pass externally. Here is what I would expect: {code:java} def from_avro(col: Column, jsonFormatSchema: Column): Column {code} code example: {code:java} import org.apache.spark.sql.functions.from_avro val avroSchema1 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" val avroSchema2 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" val df = Seq( (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1), (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2) ).toDF("binaryData", "schema") val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData")) parsed.show() // Output: // ++ // | parsedData| // ++ // |[apple1, 1.0]| // |[apple2, 2.0]| // ++ {code} was: Spark from_avro function does not allow schema to use dataframe column but takes a String schema: {code:java} def from_avro(col: Column, jsonFormatSchema: String): Column {code} This makes it impossible to deserialize rows of Avro records with different schema since only one schema string could be pass externally. 
Here is what I would expect: {code:java} def from_avro(col: Column, jsonFormatSchema: Column): Column {code} code example: {code:java} import org.apache.spark.sql.functions.from_avro val avroSchema1 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" val avroSchema2 = """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" val df = Seq( (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1), (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2) ).toDF("binaryData", "schema") val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData")) parsed.show() // Output: // ++ // | parsedData| // ++ // |[apple1, 1.0]| // |[apple2, 2.0]| // ++ {code} > Inconsistency between from_avro and from_json function > -- > > Key: SPARK-43201 > URL: https://issues.apache.org/jira/browse/SPARK-43201 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Philip Adetiloye >Priority: Major > > Spark from_avro function does not allow schema parameter to use dataframe > column but takes only a String schema: > {code:java} > def from_avro(col: Column, jsonFormatSchema: String): Column {code} > This makes it impossible to deserialize rows of Avro records with different > schema since only one schema string could be pass externally. 
> > Here is what I would expect: > {code:java} > def from_avro(col: Column, jsonFormatSchema: Column): Column {code} > code example: > {code:java} > import org.apache.spark.sql.functions.from_avro > val avroSchema1 = > """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" > > val avroSchema2 = > """{"type":"record","name":"myrecord","fields":[{"name":"str1","type":"string"},{"name":"str2","type":"string"}]}""" > val df = Seq( > (Array[Byte](10, 97, 112, 112, 108, 101, 49, 0), avroSchema1), > (Array[Byte](10, 97, 112, 112, 108, 101, 50, 0), avroSchema2) > ).toDF("binaryData", "schema") > val parsed = df.select(from_avro($"binaryData", $"schema").as("parsedData")) > parsed.show() > // Output: > // ++ > // | parsedData| > // ++ > // |[apple1, 1.0]| > // |[apple2, 2.0]| > // ++ > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
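[Editorial note] The core of the report above can be shown without Spark: when each record carries its own schema, the decoder has to be chosen per row, which a single String-typed schema argument cannot express. A hedged, Spark-free sketch (the `decode` helper is hypothetical and uses delimited text for brevity; the real `from_avro` decodes Avro binary):

```scala
// Spark-free illustration of why the schema must travel with the row.
// The two payloads below use different field orders; decoding both with one
// fixed schema would mislabel the second row, while a per-row schema (what
// the report asks from_avro to accept, mirroring from_json's Column-typed
// schema overload) labels both correctly.
def decode(payload: String, fieldOrder: Seq[String]): Map[String, String] =
  fieldOrder.zip(payload.split(",")).toMap

val rows = Seq(
  ("apple1,1.0", Seq("str1", "str2")), // schema 1: name first
  ("1.0,apple2", Seq("str2", "str1"))  // schema 2: value first
)
val parsed = rows.map { case (payload, schema) => decode(payload, schema) }
```

Both rows come back with `str1` holding the fruit name and `str2` holding the number, precisely because the schema is supplied per row rather than once for the whole column.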
[jira] [Commented] (SPARK-43202) Replace reflection w/ direct calling for YARN Resource API
[ https://issues.apache.org/jira/browse/SPARK-43202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714707#comment-17714707 ] Mike K commented on SPARK-43202: User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/40860 > Replace reflection w/ direct calling for YARN Resource API > -- > > Key: SPARK-43202 > URL: https://issues.apache.org/jira/browse/SPARK-43202 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Description: Spark on Yarn Cluster When multiple executors exist on a node, and the same block exists on both executors, with some in memory and some on disk. Probabilistically, the executor failed to obtain the block,throw Exception: java.lang.ArrayIndexOutofBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay the process of the problem occurring: step 1: The executor requests the driver to obtain block information(locationsAndStatusOption). The input parameters are BlockId and the host of its own node. Please note that it does not carry port information line:1092 !image-2023-04-21-00-24-22-059.png! step 2: On the driver side, the driver obtains all blockManagers holding the block based on the BlockId. For non remote shuffle scenarios, the driver will retrieve the first one with the blockId and blockManager from the locations Assuming that there are two BlockManagers holding the BlockId on this node, BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and stores it in disk Assuming the returned status is of type memory and its disksize is 0 line: 852, 856 !image-2023-04-21-00-30-41-851.png! step 3: This method will return a BlockLocationsAndStatus object. If there are BMs using disk, the disk's path information will be stored in localDirs !image-2023-04-21-00-50-10-918.png! step 4: When the executor obtains locationsAndStatusOption, localDirs is not empty, but status.diskSize is 0 line: 1102 !image-2023-04-21-00-54-11-968.png! step 5: The readDiskBlockFromSameHostExecutor only determines whether the Block file exists, and then directly uses the incoming blocksize to read the byte array. 
If the blocksize is 0, it returns an empty byte array Only checked if the file exists line: 1234, 1240 !image-2023-04-21-00-57-29-140.png! Taking values from an empty array, causing an out of bounds problem was: Spark on Yarn Cluster When multiple executors exist on a node, and the same block exists on both executors, with some in memory and some on disk. Probabilistically, the executor failed to obtain the block,throw Exception: java.lang.ArrayIndexOutofBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay the process of the problem occurring: step 1: The executor requests the driver to obtain block information(locationsAndStatusOption). The input parameters are BlockId and the host of its own node. Please note that it does not carry port information line:1092 !image-2023-04-21-00-24-22-059.png! step 2: On the driver side, the driver obtains all blockManagers holding the block based on the BlockId. For non remote shuffle scenarios, the driver will retrieve the first one with the blockId and blockManager from the locations Assuming that there are two BlockManagers holding the BlockId on this node, BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and stores it in disk Assuming the returned status is of type memory and its disksize is 0 line: 852, 856 !image-2023-04-21-00-30-41-851.png! step 3: This method will return a BlockLocationsAndStatus object. If there are BMs using disk, the disk's path information will be stored in localDirs !image-2023-04-21-00-50-10-918.png! step 4: When the executor obtains locationsAndStatusOption, localDirs is not empty, but status.diskSize is 0 line: 1102 !image-2023-04-21-00-54-11-968.png! step 5: The readDiskBlockFromSameHostExecutor only determines whether the Block file exists, and then directly uses the incoming blocksize to read the byte array. 
If the blocksize is 0, it returns an empty byte array only check line: 1234, 1240 !image-2023-04-21-00-57-29-140.png! > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, > image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png, > image-2023-04-21-00-54-11-968.png, image-2023-04-21-00-57-29-140.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node, and the same block exists on both > executors, with some in memory and some on disk. > Probabilistically, the executor failed to obtain the block,throw
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Description: Spark on Yarn Cluster When multiple executors exist on a node, and the same block exists on both executors, with some in memory and some on disk. Probabilistically, the executor failed to obtain the block,throw Exception: java.lang.ArrayIndexOutofBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay the process of the problem occurring: step 1: The executor requests the driver to obtain block information(locationsAndStatusOption). The input parameters are BlockId and the host of its own node. Please note that it does not carry port information line:1092 !image-2023-04-21-00-24-22-059.png! step 2: On the driver side, the driver obtains all blockManagers holding the block based on the BlockId. For non remote shuffle scenarios, the driver will retrieve the first one with the blockId and blockManager from the locations Assuming that there are two BlockManagers holding the BlockId on this node, BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and stores it in disk Assuming the returned status is of type memory and its disksize is 0 line: 852, 856 !image-2023-04-21-00-30-41-851.png! step 3: This method will return a BlockLocationsAndStatus object. If there are BMs using disk, the disk's path information will be stored in localDirs !image-2023-04-21-00-50-10-918.png! step 4: When the executor obtains locationsAndStatusOption, localDirs is not empty, but status.diskSize is 0 line: 1102 !image-2023-04-21-00-54-11-968.png! step 5: The readDiskBlockFromSameHostExecutor only determines whether the Block file exists, and then directly uses the incoming blocksize to read the byte array. If the blocksize is 0, it returns an empty byte array only check line: 1234, 1240 !image-2023-04-21-00-57-29-140.png! 
was: Spark on Yarn Cluster When multiple executors exist on a node, and the same block exists on both executors, with some in memory and some on disk. Probabilistically, the executor failed to obtain the block,throw Exception: java.lang.ArrayIndexOutofBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay the process of the problem occurring: step 1: The executor requests the driver to obtain block information(locationsAndStatusOption). The input parameters are BlockId and the host of its own node. Please note that it does not carry port information line:1092 !image-2023-04-21-00-24-22-059.png! step 2: On the driver side, the driver obtains all blockManagers holding the block based on the BlockId. For non remote shuffle scenarios, the driver will retrieve the first one with the blockId and blockManager from the locations Assuming that there are two BlockManagers holding the BlockId on this node, BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and stores it in disk line: 852, 856 !image-2023-04-21-00-30-41-851.png! step 3: > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, > image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png, > image-2023-04-21-00-54-11-968.png, image-2023-04-21-00-57-29-140.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node, and the same block exists on both > executors, with some in memory and some on disk. 
> Probabilistically, the executor failed to obtain the block,throw Exception: > java.lang.ArrayIndexOutofBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay the process of the problem occurring: > step 1: > The executor requests the driver to obtain block > information(locationsAndStatusOption). The input parameters are BlockId and > the host of its own node. Please note that it does not carry port information > line:1092 > !image-2023-04-21-00-24-22-059.png! > step 2: > On the driver side, the driver obtains all blockManagers holding the block > based on the BlockId. For non remote shuffle scenarios, the driver will > retrieve the first one with the blockId and blockManager from the locations > Assuming that there are two BlockManagers holding
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Attachment: image-2023-04-21-00-57-29-140.png > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, > image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png, > image-2023-04-21-00-54-11-968.png, image-2023-04-21-00-57-29-140.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node, and the same block exists on both > executors, with some in memory and some on disk. > Probabilistically, the executor failed to obtain the block,throw Exception: > java.lang.ArrayIndexOutofBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay the process of the problem occurring: > step 1: > The executor requests the driver to obtain block > information(locationsAndStatusOption). The input parameters are BlockId and > the host of its own node. Please note that it does not carry port information > line:1092 > !image-2023-04-21-00-24-22-059.png! > step 2: > On the driver side, the driver obtains all blockManagers holding the block > based on the BlockId. For non remote shuffle scenarios, the driver will > retrieve the first one with the blockId and blockManager from the locations > Assuming that there are two BlockManagers holding the BlockId on this node, > BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and > stores it in disk > line: 852, 856 > !image-2023-04-21-00-30-41-851.png! 
> step 3: >
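[Editorial note] The five steps in the SPARK-43221 description reduce to a small, Spark-free sketch (names here are hypothetical stand-ins for `readDiskBlockFromSameHostExecutor`): the reader trusts the `diskSize` that accompanies a memory-type status, reads zero bytes from a block file that actually has data, and the first access into the resulting empty array throws the reported `ArrayIndexOutOfBoundsException`.

```scala
// Minimal reproduction of the failure mode: the reader only checks that the
// block file exists and then reads exactly `reportedDiskSize` bytes, so a
// memory-type status whose diskSize is 0 yields an empty buffer even though
// the on-disk block is non-empty.
def readBlock(fileBytes: Array[Byte], reportedDiskSize: Int): Array[Byte] =
  fileBytes.take(reportedDiskSize)

val blockOnDisk = Array[Byte](10, 97, 112, 112) // the block really has data
val chunks = readBlock(blockOnDisk, 0)          // but the driver reported size 0
val outOfBounds =
  try { chunks(0); false }                      // TorrentBroadcast reads chunk 0
  catch { case _: ArrayIndexOutOfBoundsException => true }
```

Passing the true on-disk size instead of the stale memory-status size reads the full block, which matches the fix direction the reporter outlines.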
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Attachment: image-2023-04-21-00-54-11-968.png > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, > image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png, > image-2023-04-21-00-54-11-968.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node, and the same block exists on both > executors, with some in memory and some on disk. > Probabilistically, the executor failed to obtain the block,throw Exception: > java.lang.ArrayIndexOutofBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay the process of the problem occurring: > step 1: > The executor requests the driver to obtain block > information(locationsAndStatusOption). The input parameters are BlockId and > the host of its own node. Please note that it does not carry port information > line:1092 > !image-2023-04-21-00-24-22-059.png! > step 2: > On the driver side, the driver obtains all blockManagers holding the block > based on the BlockId. For non remote shuffle scenarios, the driver will > retrieve the first one with the blockId and blockManager from the locations > Assuming that there are two BlockManagers holding the BlockId on this node, > BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and > stores it in disk > line: 852, 856 > !image-2023-04-21-00-30-41-851.png! 
> step 3: >
[jira] [Assigned] (SPARK-43208) IsolatedClassLoader should close barrier class InputStream after reading
[ https://issues.apache.org/jira/browse/SPARK-43208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-43208: Assignee: Cheng Pan > IsolatedClassLoader should close barrier class InputStream after reading > > > Key: SPARK-43208 > URL: https://issues.apache.org/jira/browse/SPARK-43208 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Attachment: image-2023-04-21-00-53-20-720.png > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, > image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node, and the same block exists on both > executors, with some in memory and some on disk. > Probabilistically, the executor failed to obtain the block,throw Exception: > java.lang.ArrayIndexOutofBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay the process of the problem occurring: > step 1: > The executor requests the driver to obtain block > information(locationsAndStatusOption). The input parameters are BlockId and > the host of its own node. Please note that it does not carry port information > line:1092 > !image-2023-04-21-00-24-22-059.png! > step 2: > On the driver side, the driver obtains all blockManagers holding the block > based on the BlockId. For non remote shuffle scenarios, the driver will > retrieve the first one with the blockId and blockManager from the locations > Assuming that there are two BlockManagers holding the BlockId on this node, > BM-1 holds the Block and stores it in memory, and BM-2 holds the Block and > stores it in disk > line: 852, 856 > !image-2023-04-21-00-30-41-851.png! 
> step 3: >
[jira] [Resolved] (SPARK-43208) IsolatedClassLoader should close barrier class InputStream after reading
[ https://issues.apache.org/jira/browse/SPARK-43208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-43208. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40867 [https://github.com/apache/spark/pull/40867] > IsolatedClassLoader should close barrier class InputStream after reading > > > Key: SPARK-43208 > URL: https://issues.apache.org/jira/browse/SPARK-43208 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Attachment: image-2023-04-21-00-50-10-918.png > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png, > image-2023-04-21-00-50-10-918.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, the executor > intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information. > line:1092 > !image-2023-04-21-00-24-22-059.png! > step 2: > On the driver side, the driver obtains all BlockManagers holding the block > based on the BlockId. For non-remote-shuffle scenarios, the driver takes the > first matching BlockManager for the blockId from the locations. > Assume there are two BlockManagers holding the BlockId on this node: > BM-1 holds the block in memory, and BM-2 holds the block on disk. > line: 852, 856 > !image-2023-04-21-00-30-41-851.png! 
> step 3: > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Description: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information. line:1092 !image-2023-04-21-00-24-22-059.png! step 2: On the driver side, the driver obtains all BlockManagers holding the block based on the BlockId. For non-remote-shuffle scenarios, the driver takes the first matching BlockManager for the blockId from the locations. Assume there are two BlockManagers holding the BlockId on this node: BM-1 holds the block in memory, and BM-2 holds the block on disk. line: 852, 856 !image-2023-04-21-00-30-41-851.png! step 3: was: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information. code: !image-2023-04-21-00-19-58-021.png! 
step 2: > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, the executor > intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information. > line:1092 > !image-2023-04-21-00-24-22-059.png! > step 2: > On the driver side, the driver obtains all BlockManagers holding the block > based on the BlockId. For non-remote-shuffle scenarios, the driver takes the > first matching BlockManager for the blockId from the locations. > Assume there are two BlockManagers holding the BlockId on this node: > BM-1 holds the block in memory, and BM-2 holds the block on disk. > line: 852, 856 > !image-2023-04-21-00-30-41-851.png! > step 3: > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43222) Remove check of `isHadoop3`
[ https://issues.apache.org/jira/browse/SPARK-43222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-43222: - Component/s: (was: YARN) > Remove check of `isHadoop3` > --- > > Key: SPARK-43222 > URL: https://issues.apache.org/jira/browse/SPARK-43222 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43223) KeyValueGroupedDataset#agg
Zhen Li created SPARK-43223: --- Summary: KeyValueGroupedDataset#agg Key: SPARK-43223 URL: https://issues.apache.org/jira/browse/SPARK-43223 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Zhen Li Add the missing agg functions to the KVGDS (KeyValueGroupedDataset) API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Attachment: image-2023-04-21-00-30-41-851.png > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, the executor > intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information. > code: !image-2023-04-21-00-19-58-021.png! > step 2: > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35959) Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions
[ https://issues.apache.org/jira/browse/SPARK-35959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Chao resolved SPARK-35959. -- Resolution: Won't Fix > Add a new Maven profile "no-shaded-client" for older Hadoop 3.x versions > - > > Key: SPARK-35959 > URL: https://issues.apache.org/jira/browse/SPARK-35959 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Chao Sun >Priority: Major > > Currently Spark uses the Hadoop shaded client by default. However, if Spark users > want to build Spark with an older version of Hadoop, such as 3.1.x, the shaded > client cannot be used (currently it only supports Hadoop 3.2.2+ and 3.3.1+). > Therefore, this issue proposes offering a new Maven profile "no-shaded-client" for > this use case. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Attachment: image-2023-04-21-00-24-22-059.png > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png, > image-2023-04-21-00-24-22-059.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, the executor > intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information. > code: !image-2023-04-21-00-19-58-021.png! > step 2: > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Attachment: image-2023-04-21-00-19-58-021.png > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, the executor > intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information. > code: !image-2023-04-21-00-19-58-021.png! > step 2: > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Description: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information. code: !image-2023-04-21-00-19-58-021.png! step 2: was: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information.
{code:java}
private[spark] def getRemoteBlock[T](
    blockId: BlockId,
    bufferTransformer: ManagedBuffer => T): Option[T] = {
  logDebug(s"Getting remote block $blockId")
  require(blockId != null, "BlockId is null")
  // Because all the remote blocks are registered in driver, it is not necessary to ask
  // all the storage endpoints to get block status.
  val locationsAndStatusOption = master.getLocationsAndStatus(blockId, blockManagerId.host)
{code}
step 2: > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Attachments: image-2023-04-21-00-19-58-021.png > > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, the executor > intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information. > code: !image-2023-04-21-00-19-58-021.png! > step 2: > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
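The host-only lookup in step 1 above is the crux of the report: `master.getLocationsAndStatus(blockId, blockManagerId.host)` sends only the host, so when several BlockManagers run on one node, the driver cannot tell which executor is asking or which copy it will hand back. The ambiguity can be sketched with a toy lookup (illustrative only — this is not Spark's BlockManagerMasterEndpoint code; the class and method names here are hypothetical):

```java
import java.util.List;
import java.util.Optional;

// Toy model of the lookup ambiguity described above (not Spark code):
// matching candidate BlockManagers by host alone is ambiguous when a node
// runs several executors; matching by host AND port pins down exactly one.
public class HostOnlyLookup {
    static final class BlockManagerId {
        final String host;
        final int port;
        BlockManagerId(String host, int port) { this.host = host; this.port = port; }
    }

    // Host-only match, as in the report: returns the FIRST manager on the
    // host, which may belong to a different executor than the caller expects.
    static Optional<BlockManagerId> byHost(List<BlockManagerId> locations, String host) {
        return locations.stream().filter(bm -> bm.host.equals(host)).findFirst();
    }

    // Host+port match: unambiguous.
    static Optional<BlockManagerId> byHostAndPort(List<BlockManagerId> locations,
                                                  String host, int port) {
        return locations.stream()
                .filter(bm -> bm.host.equals(host) && bm.port == port)
                .findFirst();
    }

    public static void main(String[] args) {
        // Two executors (BM-1, BM-2) on the same node, on different ports.
        List<BlockManagerId> locations = List.of(
                new BlockManagerId("node-1", 7001),   // BM-1: block in memory
                new BlockManagerId("node-1", 7002));  // BM-2: block on disk
        // An executor listening on port 7002 asks with host only and is
        // handed BM-1 (port 7001), not its own BlockManager.
        System.out.println(byHost(locations, "node-1").get().port);
        System.out.println(byHostAndPort(locations, "node-1", 7002).get().port);
    }
}
```

Carrying the port in the request (or matching on the full BlockManagerId) would remove the ambiguity, which appears to be what the reporter is driving at.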
[jira] [Updated] (SPARK-43222) Remove check of `isHadoop3`
[ https://issues.apache.org/jira/browse/SPARK-43222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-43222: - Summary: Remove check of `isHadoop3` (was: Remove check of `VersionUtils.isHadoop3`) > Remove check of `isHadoop3` > --- > > Key: SPARK-43222 > URL: https://issues.apache.org/jira/browse/SPARK-43222 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Description: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information.
{code:java}
private[spark] def getRemoteBlock[T](
    blockId: BlockId,
    bufferTransformer: ManagedBuffer => T): Option[T] = {
  logDebug(s"Getting remote block $blockId")
  require(blockId != null, "BlockId is null")
  // Because all the remote blocks are registered in driver, it is not necessary to ask
  // all the storage endpoints to get block status.
  val locationsAndStatusOption = master.getLocationsAndStatus(blockId, blockManagerId.host)
{code}
step 2: was: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information. step 2: > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, the executor > intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information.
> {code:java}
> private[spark] def getRemoteBlock[T](
>     blockId: BlockId,
>     bufferTransformer: ManagedBuffer => T): Option[T] = {
>   logDebug(s"Getting remote block $blockId")
>   require(blockId != null, "BlockId is null")
>   // Because all the remote blocks are registered in driver, it is not necessary to ask
>   // all the storage endpoints to get block status.
>   val locationsAndStatusOption = master.getLocationsAndStatus(blockId, blockManagerId.host)
> {code}
> step 2: > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Description: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information. step 2: was: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information. !image-2023-04-21-00-07-51-400.png! step 2: > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, 
> the executor intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information. > > step 2: > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43221) Executor obtained error information
[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qiang Yang updated SPARK-43221: --- Description: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) Next, I will replay how the problem occurs: step 1: The executor asks the driver for block information (locationsAndStatusOption). The input parameters are the BlockId and the host of its own node. Please note that it does not carry port information. !image-2023-04-21-00-07-51-400.png! step 2: was: Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > Executor obtained error information > > > Key: SPARK-43221 > URL: https://issues.apache.org/jira/browse/SPARK-43221 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.1.1, 3.2.0, 3.3.0 >Reporter: Qiang Yang >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Spark on Yarn Cluster > When multiple executors exist on a node and the same block exists on both > executors, with one copy in memory and the other on disk, 
> the executor intermittently fails to obtain the block and throws an exception: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) > > Next, I will replay how the problem occurs: > step 1: > The executor asks the driver for block > information (locationsAndStatusOption). The input parameters are the BlockId and > the host of its own node. Please note that it does not carry port information. > !image-2023-04-21-00-07-51-400.png! > step 2: > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43222) Remove check of `VersionUtils.isHadoop3`
Yang Jie created SPARK-43222: Summary: Remove check of `VersionUtils.isHadoop3` Key: SPARK-43222 URL: https://issues.apache.org/jira/browse/SPARK-43222 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL, YARN Affects Versions: 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43221) Executor obtained error information
Qiang Yang created SPARK-43221: -- Summary: Executor obtained error information Key: SPARK-43221 URL: https://issues.apache.org/jira/browse/SPARK-43221 Project: Spark Issue Type: Bug Components: Block Manager Affects Versions: 3.3.0, 3.2.0, 3.1.1 Reporter: Qiang Yang Spark on Yarn Cluster When multiple executors exist on a node and the same block exists on both executors, with one copy in memory and the other on disk, the executor intermittently fails to obtain the block and throws an exception: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43197) Clean up the code written for compatibility with Hadoop 2
[ https://issues.apache.org/jira/browse/SPARK-43197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714647#comment-17714647 ] GridGain Integration commented on SPARK-43197: -- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/40860 > Clean up the code written for compatibility with Hadoop 2 > - > > Key: SPARK-43197 > URL: https://issues.apache.org/jira/browse/SPARK-43197 > Project: Spark > Issue Type: Umbrella > Components: Spark Core, SQL, YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > > SPARK-42452 removed support for Hadoop 2, so we can clean up the code written for > compatibility with Hadoop 2 to make it more concise. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43215) Remove `ResourceRequestHelper#isYarnResourceTypesAvailable`
[ https://issues.apache.org/jira/browse/SPARK-43215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714646#comment-17714646 ] GridGain Integration commented on SPARK-43215: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40876 > Remove `ResourceRequestHelper#isYarnResourceTypesAvailable` > --- > > Key: SPARK-43215 > URL: https://issues.apache.org/jira/browse/SPARK-43215 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43215) Remove `ResourceRequestHelper#isYarnResourceTypesAvailable`
[ https://issues.apache.org/jira/browse/SPARK-43215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-43215. -- Resolution: Duplicate > Remove `ResourceRequestHelper#isYarnResourceTypesAvailable` > --- > > Key: SPARK-43215 > URL: https://issues.apache.org/jira/browse/SPARK-43215 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43113) Codegen error when full outer join's bound condition has multiple references to the same stream-side column
[ https://issues.apache.org/jira/browse/SPARK-43113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714645#comment-17714645 ] Hudson commented on SPARK-43113: User 'bersprockets' has created a pull request for this issue: https://github.com/apache/spark/pull/40881 > Codegen error when full outer join's bound condition has multiple references > to the same stream-side column > --- > > Key: SPARK-43113 > URL: https://issues.apache.org/jira/browse/SPARK-43113 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.0, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Fix For: 3.4.1, 3.5.0 > > > Example # 1 (sort merge join): > {noformat} > create or replace temp view v1 as > select * from values > (1, 1), > (2, 2), > (3, 1) > as v1(key, value); > create or replace temp view v2 as > select * from values > (1, 22, 22), > (3, -1, -1), > (7, null, null) > as v2(a, b, c); > select * > from v1 > full outer join v2 > on key = a > and value > b > and value > c; > {noformat} > The join's generated code causes the following compilation error: > {noformat} > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 277, Column 9: Redefinition of local variable "smj_isNull_7" > {noformat} > Example #2 (shuffle hash join): > {noformat} > select /*+ SHUFFLE_HASH(v2) */ * > from v1 > full outer join v2 > on key = a > and value > b > and value > c; > {noformat} > The shuffle hash join's generated code causes the following compilation error: > {noformat} > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 174, Column 5: Redefinition of local variable "shj_value_1" > {noformat} > With default configuration, both queries end up succeeding, since Spark falls > back to running each query with whole-stage codegen disabled. > The issue happens only when the join's bound condition refers to the same > stream-side column more than once. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
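The SPARK-43113 failure mode above can be modeled simply: each reference to the same stream-side column makes the code generator declare the same local variable again, and Janino rejects the duplicate declaration. A toy sketch (not Spark's actual CodegenContext; the variable name `smj_isNull_7` is taken from the error message above, everything else is hypothetical) showing the naive emission and a dedup-style fix:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model: if the join condition `value > b and value > c` references the
// stream-side column `value` twice, a naive emitter declares its null-check
// local twice -> "Redefinition of local variable". Tracking emitted names
// ensures each local is declared at most once.
public class DedupCodegen {
    // Naive: one declaration per column reference.
    static List<String> naive(List<String> refs) {
        List<String> lines = new ArrayList<>();
        for (String var : refs) {
            lines.add("boolean " + var + " = row.isNullAt(1);");
        }
        return lines;
    }

    // Deduplicated: declare each variable name at most once.
    static List<String> dedup(List<String> refs) {
        Set<String> seen = new LinkedHashSet<>();
        List<String> lines = new ArrayList<>();
        for (String var : refs) {
            if (seen.add(var)) {
                lines.add("boolean " + var + " = row.isNullAt(1);");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Two references to the same stream-side column.
        List<String> refs = List.of("smj_isNull_7", "smj_isNull_7");
        System.out.println(naive(refs).size()); // duplicate declaration
        System.out.println(dedup(refs).size()); // single declaration
    }
}
```

This also explains why the queries still succeed under the default configuration: when compilation fails, Spark falls back to interpreted execution with whole-stage codegen disabled, so only the generated-code path is affected.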
[jira] [Commented] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression
[ https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714644#comment-17714644 ] Jia Fan commented on SPARK-43220: - Maybe the test approach has a problem, or `InMemoryTable` does not support this; I will test again to confirm. > INSERT INTO REPLACE statement can't support WHERE with bool_expression > -- > > Key: SPARK-43220 > URL: https://issues.apache.org/jira/browse/SPARK-43220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-04-20-23-40-25-212.png > > > {code:java} > sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") > sql("CREATE TABLE persons2 (name string,address String,ssn int) USING > parquet") > sql("INSERT INTO TABLE persons VALUES " + > "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + > "('Eddie Davis','245 Market St, Milpitas',345678901)") > sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, > Cupertino', 432795921)") > sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM > persons2") > sql("SELECT * FROM persons").show(){code} > When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. > `WHERE ssn = 123456789` or `WHERE FALSE` both not support. > !image-2023-04-20-23-40-25-212.png|width=795,height=152! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43216) Refactor `ResourceRequestHelper ` to no longer use reflection
[ https://issues.apache.org/jira/browse/SPARK-43216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-43216. -- Resolution: Duplicate > Refactor `ResourceRequestHelper ` to no longer use reflection > - > > Key: SPARK-43216 > URL: https://issues.apache.org/jira/browse/SPARK-43216 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression
[ https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan updated SPARK-43220: Description: {code:java} sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet") sql("INSERT INTO TABLE persons VALUES " + "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + "('Eddie Davis','245 Market St, Milpitas',345678901)") sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, Cupertino', 432795921)") sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM persons2") sql("SELECT * FROM persons").show(){code} When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. `WHERE ssn = 123456789` or `WHERE FALSE` both not support. !image-2023-04-20-23-40-25-212.png|width=795,height=152! was: sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet") sql("INSERT INTO TABLE persons VALUES " + "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + "('Eddie Davis','245 Market St, Milpitas',345678901)") sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, Cupertino', 432795921)") sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM persons2") sql("SELECT * FROM persons").show() When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. `WHERE ssn = 123456789` or `WHERE FALSE` both not support. !image-2023-04-20-23-40-25-212.png|width=795,height=152! 
> INSERT INTO REPLACE statement can't support WHERE with bool_expression > -- > > Key: SPARK-43220 > URL: https://issues.apache.org/jira/browse/SPARK-43220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-04-20-23-40-25-212.png > > > {code:java} > sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") > sql("CREATE TABLE persons2 (name string,address String,ssn int) USING > parquet") > sql("INSERT INTO TABLE persons VALUES " + > "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + > "('Eddie Davis','245 Market St, Milpitas',345678901)") > sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, > Cupertino', 432795921)") > sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM > persons2") > sql("SELECT * FROM persons").show(){code} > When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. > `WHERE ssn = 123456789` or `WHERE FALSE` both not support. > !image-2023-04-20-23-40-25-212.png|width=795,height=152! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression
[ https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan updated SPARK-43220: Description: sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet") sql("INSERT INTO TABLE persons VALUES " + "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + "('Eddie Davis','245 Market St, Milpitas',345678901)") sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, Cupertino', 432795921)") sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM persons2") sql("SELECT * FROM persons").show() When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. `WHERE ssn = 123456789` or `WHERE FALSE` both not support. !image-2023-04-20-23-40-25-212.png|width=795,height=152! was: sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet") sql("INSERT INTO TABLE persons VALUES " + "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + "('Eddie Davis','245 Market St, Milpitas',345678901)") sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, Cupertino', 432795921)") sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM persons2") sql("SELECT * FROM persons").show() When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. `WHERE ssn = 123456789` or `WHERE FALSE` both not support. 
> INSERT INTO REPLACE statement can't support WHERE with bool_expression > -- > > Key: SPARK-43220 > URL: https://issues.apache.org/jira/browse/SPARK-43220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-04-20-23-40-25-212.png > > > sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") > sql("CREATE TABLE persons2 (name string,address String,ssn int) USING > parquet") > sql("INSERT INTO TABLE persons VALUES " + > "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + > "('Eddie Davis','245 Market St, Milpitas',345678901)") > sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, > Cupertino', 432795921)") > sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM > persons2") > sql("SELECT * FROM persons").show() > > When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. > `WHERE ssn = 123456789` or `WHERE FALSE` both not support. > !image-2023-04-20-23-40-25-212.png|width=795,height=152! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression
[ https://issues.apache.org/jira/browse/SPARK-43220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan updated SPARK-43220: Attachment: image-2023-04-20-23-40-25-212.png > INSERT INTO REPLACE statement can't support WHERE with bool_expression > -- > > Key: SPARK-43220 > URL: https://issues.apache.org/jira/browse/SPARK-43220 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jia Fan >Priority: Major > Attachments: image-2023-04-20-23-40-25-212.png > > > sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") > sql("CREATE TABLE persons2 (name string,address String,ssn int) USING > parquet") > sql("INSERT INTO TABLE persons VALUES " + > "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + > "('Eddie Davis','245 Market St, Milpitas',345678901)") > sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, > Cupertino', 432795921)") > sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM > persons2") > sql("SELECT * FROM persons").show() > > When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. > `WHERE ssn = 123456789` or `WHERE FALSE` both not support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43220) INSERT INTO REPLACE statement can't support WHERE with bool_expression
Jia Fan created SPARK-43220: --- Summary: INSERT INTO REPLACE statement can't support WHERE with bool_expression Key: SPARK-43220 URL: https://issues.apache.org/jira/browse/SPARK-43220 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Jia Fan sql("CREATE TABLE persons (name string,address String,ssn int) USING parquet") sql("CREATE TABLE persons2 (name string,address String,ssn int) USING parquet") sql("INSERT INTO TABLE persons VALUES " + "('Dora Williams', '134 Forest Ave, Menlo Park', 123456789)," + "('Eddie Davis','245 Market St, Milpitas',345678901)") sql("INSERT INTO TABLE persons2 VALUES ('Ashua Hill', '456 Erica Ct, Cupertino', 432795921)") sql("INSERT INTO persons REPLACE WHERE ssn = 123456789 SELECT * FROM persons2") sql("SELECT * FROM persons").show() When use `INSERT INTO table REPLACE WHERE`, only support `WHERE TRUE` at now. `WHERE ssn = 123456789` or `WHERE FALSE` both not support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
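For context, the intended `INSERT INTO ... REPLACE WHERE <cond>` semantics are: atomically delete the target rows matching the condition, then insert the source rows. A plain-Python sketch of those semantics (illustrative only, not Spark's implementation; the row dicts are made-up sample data from the report):

```python
# Illustrative sketch of REPLACE WHERE semantics -- NOT Spark's code.
# Rows of the target matching the condition are dropped, then all rows
# from the source are appended.

def replace_where(target, source, cond):
    kept = [row for row in target if not cond(row)]
    return kept + list(source)

persons = [
    {"name": "Dora Williams", "ssn": 123456789},
    {"name": "Eddie Davis", "ssn": 345678901},
]
persons2 = [{"name": "Ashua Hill", "ssn": 432795921}]

# REPLACE WHERE ssn = 123456789: Dora Williams is replaced by the source rows.
result = replace_where(persons, persons2, lambda r: r["ssn"] == 123456789)
print([r["name"] for r in result])  # ['Eddie Davis', 'Ashua Hill']
```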
[jira] [Created] (SPARK-43219) Website can't find INSERT INTO REPLACE Statement
Jia Fan created SPARK-43219: --- Summary: Website can't find INSERT INTO REPLACE Statement Key: SPARK-43219 URL: https://issues.apache.org/jira/browse/SPARK-43219 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 3.4.0 Reporter: Jia Fan The `INSERT INTO REPLACE` statement has been supported since [SPARK-40956], but it cannot be found on the website -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43218) Support "ESCAPE BY" in SparkScriptTransformationExec
jiang13021 created SPARK-43218: -- Summary: Support "ESCAPE BY" in SparkScriptTransformationExec Key: SPARK-43218 URL: https://issues.apache.org/jira/browse/SPARK-43218 Project: Spark Issue Type: Wish Components: SQL Affects Versions: 3.4.0, 3.3.0, 3.2.0 Reporter: jiang13021 If I don't `set spark.sql.catalogImplementation=hive`, I can't use "SELECT TRANSFORM" with "ESCAPE BY". Although HiveScriptTransform also doesn't implement ESCAPE BY, I can use RowFormatSerde to achieve this ability. In fact, HiveScriptTransform doesn't need to connect to Hive Metastore. I can use reflection to forcibly call HiveScriptTransformationExec without connecting to Hive Metastore, and it can work properly. Maybe HiveScriptTransform can be more generic. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
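For context on the requested feature: Hive-style `ESCAPED BY` protects the field delimiter (and the escape character itself) when they occur inside values, so rows round-trip through the delimited text format that script transformation pipes to the child process. A purely illustrative Python sketch of that escaping (not Spark's or Hive's actual serde):

```python
# Illustrative sketch of ESCAPED BY handling -- NOT the Hive/Spark serde.
# The escape character is doubled when literal, and prefixed to any
# delimiter occurring inside a value, so fields round-trip cleanly.

def serialize(fields, delim="\t", esc="\\"):
    def escape(value):
        # Escape the escape char first, then the delimiter.
        return value.replace(esc, esc + esc).replace(delim, esc + delim)
    return delim.join(escape(f) for f in fields)

def deserialize(line, delim="\t", esc="\\"):
    fields, cur, i = [], "", 0
    while i < len(line):
        ch = line[i]
        if ch == esc and i + 1 < len(line):
            cur += line[i + 1]   # escaped char is taken literally
            i += 2
        elif ch == delim:
            fields.append(cur)   # unescaped delimiter ends the field
            cur = ""
            i += 1
        else:
            cur += ch
            i += 1
    fields.append(cur)
    return fields

row = ["a\tb", "c"]              # first value contains the delimiter
assert deserialize(serialize(row)) == row
```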
[jira] [Updated] (SPARK-43152) User-defined output metadata path (_spark_metadata)
[ https://issues.apache.org/jira/browse/SPARK-43152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacek Laskowski updated SPARK-43152: Summary: User-defined output metadata path (_spark_metadata) (was: Parametrisable output metadata path (_spark_metadata)) > User-defined output metadata path (_spark_metadata) > --- > > Key: SPARK-43152 > URL: https://issues.apache.org/jira/browse/SPARK-43152 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Wojciech Indyk >Priority: Major > > Currently path of metadata of output checkpoint is hardcoded. The metadata is > saved in output path in _spark_metadata folder. It's a constraint on > structure of paths, that might be easily relaxed by parametrisable path of > output metadata. It would help with issues like [changing output directory of > spark streaming > job|https://kb.databricks.com/en_US/streaming/file-sink-streaming], [two jobs > writing to the same output > path|https://issues.apache.org/jira/browse/SPARK-30542] or [partition > discovery|https://stackoverflow.com/questions/61904732/is-it-possible-to-change-location-of-spark-metadata-folder-in-spark-structured/61905158]. > It would also help with separation of metadata from data in path structure. > The main target of change is getMetadataLogPath method in FileStreamSink. It > has got access to sqlConf, so this method can override the default > _spark_metadata path if defined it config. Introduction of parametrised > metadata path needs reconsidering of meaning of hasMetadata method in > FileStreamSink. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johan Lasperas updated SPARK-43217: --- Description: [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] is unable to reach nested fields below two directly nested maps or arrays. Whenever it reaches a map or an array, it'll throw an `invalidFieldName` exception if the child is not a struct. The following throws '{{{}Field name `a`.`element`.element`.`i` is invalid: `a`.`element`.`element` is not a struct.'{}}}, even though the access path is valid: {code:java} val schema = new StructType() .add("a", ArrayType(ArrayType( new StructType().add("i", "int" findNestedField(Seq("a", "element", "element", "i"), schema) {code} was: [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] is unable to reach nested field below two directly nested maps or arrays. Whenever it reaches a map or an array, it'll throw an `invalidFieldName` exception if the child is not a struct. 
The following throws '{{{}Field name `a`.`element`.element`.`i` is invalid: `a`.`element`.`element` is not a struct.'{}}}, even though the access path is valid: {code:java} val schema = new StructType() .add("a", ArrayType(ArrayType( new StructType().add("i", "int" findNestedField(Seq("a", "element", "element", "i"), schema) {code} > Correctly recurse into maps of maps and arrays of arrays in > StructType.findNestedField > -- > > Key: SPARK-43217 > URL: https://issues.apache.org/jira/browse/SPARK-43217 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Priority: Minor > > [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] > is unable to reach nested fields below two directly nested maps or arrays. > Whenever it reaches a map or an array, it'll throw an `invalidFieldName` > exception if the child is not a struct. > The following throws '{{{}Field name `a`.`element`.element`.`i` is invalid: > `a`.`element`.`element` is not a struct.'{}}}, even though the access path is > valid: > {code:java} > val schema = new StructType() > .add("a", ArrayType(ArrayType( > new StructType().add("i", "int" > findNestedField(Seq("a", "element", "element", "i"), schema) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johan Lasperas updated SPARK-43217: --- Description: [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] is unable to reach nested field below two directly nested maps or arrays. Whenever it reaches a map or an array, it'll throw an `invalidFieldName` exception if the child is not a struct. The following throws '{{{}Field name `a`.`element`.element`.`i` is invalid: `a`.`element`.`element` is not a struct.'{}}}, even though the access path is valid: {code:java} val schema = new StructType() .add("a", ArrayType(ArrayType( new StructType().add("i", "int" findNestedField(Seq("a", "element", "element", "i"), schema) {code} was: [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] is unable to reach nested field below two directly nested maps or arrays. Whenever it reaches a map or an array, it'll throw an `invalidFieldName` exception if the child is not a struct. 
The following throws 'Field name `a`.`element`.element`.`i` is invalid: `a`.`element`.`element` is not a struct.', even though the access path is valid: {code:java} val schema = new StructType() .add("a", ArrayType(ArrayType( new StructType().add("i", "int" findNestedField(Seq("a", "element", "element", "i"), schema) {code} > Correctly recurse into maps of maps and arrays of arrays in > StructType.findNestedField > -- > > Key: SPARK-43217 > URL: https://issues.apache.org/jira/browse/SPARK-43217 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Priority: Minor > > [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] > is unable to reach nested field below two directly nested maps or arrays. > Whenever it reaches a map or an array, it'll throw an `invalidFieldName` > exception if the child is not a struct. > The following throws '{{{}Field name `a`.`element`.element`.`i` is invalid: > `a`.`element`.`element` is not a struct.'{}}}, even though the access path is > valid: > {code:java} > val schema = new StructType() > .add("a", ArrayType(ArrayType( > new StructType().add("i", "int" > findNestedField(Seq("a", "element", "element", "i"), schema) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField
[ https://issues.apache.org/jira/browse/SPARK-43217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johan Lasperas updated SPARK-43217: --- Description: [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] is unable to reach nested field below two directly nested maps or arrays. Whenever it reaches a map or an array, it'll throw an `invalidFieldName` exception if the child is not a struct. The following throws 'Field name `a`.`element`.element`.`i` is invalid: `a`.`element`.`element` is not a struct.', even though the access path is valid: {code:java} val schema = new StructType() .add("a", ArrayType(ArrayType( new StructType().add("i", "int" findNestedField(Seq("a", "element", "element", "i"), schema) {code} was: [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] is unable to reach nested field below two directly nested maps or arrays. Whenever it reaches a map or an array, it'll throw an `invalidFieldName` exception if the child is not a struct. 
The following throws 'Field name `a`.`element`.element`.`i` is invalid: `a`.`element`.`element` is not a struct.', even though the access path is valid: ``` val schema = new StructType() .add("a", ArrayType(ArrayType( new StructType().add("i", "int" findNestedField(Seq("a", "element", "element", "i"), schema) ``` > Correctly recurse into maps of maps and arrays of arrays in > StructType.findNestedField > -- > > Key: SPARK-43217 > URL: https://issues.apache.org/jira/browse/SPARK-43217 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Johan Lasperas >Priority: Minor > > [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] > is unable to reach nested field below two directly nested maps or arrays. > Whenever it reaches a map or an array, it'll throw an `invalidFieldName` > exception if the child is not a struct. > The following throws 'Field name `a`.`element`.element`.`i` is invalid: > `a`.`element`.`element` is not a struct.', even though the access path is > valid: > {code:java} > val schema = new StructType() > .add("a", ArrayType(ArrayType( > new StructType().add("i", "int" > findNestedField(Seq("a", "element", "element", "i"), schema) {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43217) Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField
Johan Lasperas created SPARK-43217: -- Summary: Correctly recurse into maps of maps and arrays of arrays in StructType.findNestedField Key: SPARK-43217 URL: https://issues.apache.org/jira/browse/SPARK-43217 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Johan Lasperas [StructType.findNestedField|https://github.com/apache/spark/blob/db2625c70a8c3aff64e6a9466981c8dd49a4ca51/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L325] is unable to reach nested fields below two directly nested maps or arrays. Whenever it reaches a map or an array, it'll throw an `invalidFieldName` exception if the child is not a struct. The following throws 'Field name `a`.`element`.`element`.`i` is invalid: `a`.`element`.`element` is not a struct.', even though the access path is valid: ``` val schema = new StructType() .add("a", ArrayType(ArrayType( new StructType().add("i", "int")))) findNestedField(Seq("a", "element", "element", "i"), schema) ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
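The requested behavior amounts to treating an array's `element` (and, analogously, a map's `key`/`value`) as pseudo-fields the lookup can recurse through, even when collections are directly nested. A plain-Python sketch of such a lookup over a toy schema encoding (illustrative only; Spark's real implementation works on `StructType`/`ArrayType`/`MapType`):

```python
# Illustrative sketch -- NOT Spark's StructType.findNestedField. Schemas are
# toy dicts; arrays expose an "element" pseudo-field that the lookup can
# recurse through any number of times.

def find_nested(schema, path):
    node = schema
    for part in path:
        if isinstance(node, dict) and node.get("type") == "array":
            if part != "element":
                raise KeyError(part)
            node = node["element"]        # recurse into the array element type
        elif isinstance(node, dict) and node.get("type") == "struct":
            node = node["fields"][part]   # recurse into a struct field
        else:
            raise KeyError(part)          # reached a leaf with path left over
    return node

schema = {"type": "struct", "fields": {
    "a": {"type": "array", "element":
          {"type": "array", "element":
           {"type": "struct", "fields": {"i": "int"}}}}}}

# a.element.element.i reaches the int inside the doubly nested array.
print(find_nested(schema, ["a", "element", "element", "i"]))  # int
```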
[jira] [Commented] (SPARK-36058) Support replicasets/job API
[ https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714556#comment-17714556 ] Hu Ziqian commented on SPARK-36058: --- Hi [~holden], I have a question about the statefulsetPodsAllocator. I understand that with dynamic allocation, the driver will delete an executor that has been idle beyond the timeout. For example, suppose we have executors 0 to 9 and executor 5 is idle: the driver will delete executor 5 and adjust the target pod number from 10 to 9. But with a stateful set, Kubernetes will try to delete the pod with the max index, for example executor 9. So there is a conflict between deletion by the driver and deletion by the Kubernetes controller manager. I want to know whether there is any limitation when using the statefulset pod allocator, and if not, how to avoid the conflict above? > Support replicasets/job API > --- > > Key: SPARK-36058 > URL: https://issues.apache.org/jira/browse/SPARK-36058 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.2.0, 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Fix For: 3.3.0 > > > Volcano & Yunikorn both support scheduling individual pods, but they also > support higher level abstractions similar to the vanilla Kube replicasets > which we can use to improve scheduling performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43216) Refactor `ResourceRequestHelper ` to no longer use reflection
Yang Jie created SPARK-43216: Summary: Refactor `ResourceRequestHelper ` to no longer use reflection Key: SPARK-43216 URL: https://issues.apache.org/jira/browse/SPARK-43216 Project: Spark Issue Type: Sub-task Components: YARN Affects Versions: 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43215) Remove `ResourceRequestHelper#isYarnResourceTypesAvailable`
Yang Jie created SPARK-43215: Summary: Remove `ResourceRequestHelper#isYarnResourceTypesAvailable` Key: SPARK-43215 URL: https://issues.apache.org/jira/browse/SPARK-43215 Project: Spark Issue Type: Sub-task Components: YARN Affects Versions: 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43214) Post driver-side metrics for LocalTableScanExec/CommandResultExec
Fu Chen created SPARK-43214: --- Summary: Post driver-side metrics for LocalTableScanExec/CommandResultExec Key: SPARK-43214 URL: https://issues.apache.org/jira/browse/SPARK-43214 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Fu Chen -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43184) Resume using enumeration to compare `NodeState.DECOMMISSIONING`
[ https://issues.apache.org/jira/browse/SPARK-43184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43184. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40846 [https://github.com/apache/spark/pull/40846] > Resume using enumeration to compare `NodeState.DECOMMISSIONING` > > > Key: SPARK-43184 > URL: https://issues.apache.org/jira/browse/SPARK-43184 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43184) Resume using enumeration to compare `NodeState.DECOMMISSIONING`
[ https://issues.apache.org/jira/browse/SPARK-43184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43184: - Assignee: Yang Jie > Resume using enumeration to compare `NodeState.DECOMMISSIONING` > > > Key: SPARK-43184 > URL: https://issues.apache.org/jira/browse/SPARK-43184 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43213) Add `DataFrame.offset` to PySpark
[ https://issues.apache.org/jira/browse/SPARK-43213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714501#comment-17714501 ] ASF GitHub Bot commented on SPARK-43213: User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40873 > Add `DataFrame.offset` to PySpark > - > > Key: SPARK-43213 > URL: https://issues.apache.org/jira/browse/SPARK-43213 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39203) Fix remote table location based on database location
[ https://issues.apache.org/jira/browse/SPARK-39203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714499#comment-17714499 ] ASF GitHub Bot commented on SPARK-39203: User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/40871 > Fix remote table location based on database location > > > Key: SPARK-39203 > URL: https://issues.apache.org/jira/browse/SPARK-39203 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.1.1, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0 > > > We have HDFS and Hive on cluster A. We have Spark on cluster B and need to > read data from cluster A. The table location is incorrect: > {noformat} > spark-sql> desc formatted default.test_table; > fas_acct_id decimal(18,0) > fas_acct_cd string > cmpny_cd string > entity_id string > cre_date date > cre_user string > upd_date timestamp > upd_user string > # Detailed Table Information > Database default > Table test_table > Type EXTERNAL > Provider parquet > Statistics25310025737 bytes > Location /user/hive/warehouse/test_table > Serde Library > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe > InputFormat > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat > OutputFormat > org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat > Storage Properties[compression=snappy] > spark-sql> desc database default; > Namespace Namedefault > Comment > Location viewfs://clusterA/user/hive/warehouse/ > Owner hive_dba > {noformat} > The correct table location should be > viewfs://clusterA/user/hive/warehouse/test_table. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
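The core idea of the fix above is that a table location lacking a scheme should be qualified against the *database* location's filesystem (scheme plus authority), not the local default filesystem, so `/user/hive/warehouse/test_table` becomes `viewfs://clusterA/user/hive/warehouse/test_table`. A hedged Python sketch of that qualification step (illustrative only, not Spark's code):

```python
# Illustrative sketch -- NOT Spark's implementation. A scheme-less table
# path inherits the scheme and authority of its database's location; an
# already-qualified path is left untouched.

from urllib.parse import urlparse

def qualify(table_path, db_location):
    if urlparse(table_path).scheme:
        return table_path               # already fully qualified
    db = urlparse(db_location)
    return f"{db.scheme}://{db.netloc}{table_path}"

loc = qualify("/user/hive/warehouse/test_table",
              "viewfs://clusterA/user/hive/warehouse/")
print(loc)  # viewfs://clusterA/user/hive/warehouse/test_table
```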
[jira] [Updated] (SPARK-43213) Add `DataFrame.offset` to PySpark
[ https://issues.apache.org/jira/browse/SPARK-43213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-43213: -- Issue Type: New Feature (was: Improvement) > Add `DataFrame.offset` to PySpark > - > > Key: SPARK-43213 > URL: https://issues.apache.org/jira/browse/SPARK-43213 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43169) Update mima's previousSparkVersion to 3.4.0
[ https://issues.apache.org/jira/browse/SPARK-43169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714498#comment-17714498 ] ASF GitHub Bot commented on SPARK-43169: User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40862 > Update mima's previousSparkVersion to 3.4.0 > --- > > Key: SPARK-43169 > URL: https://issues.apache.org/jira/browse/SPARK-43169 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43213) Add `DataFrame.offset` to PySpark
Ruifeng Zheng created SPARK-43213: - Summary: Add `DataFrame.offset` to PySpark Key: SPARK-43213 URL: https://issues.apache.org/jira/browse/SPARK-43213 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.5.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
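For context on the feature tracked above: `DataFrame.offset(n)` skips the first n rows of the result, pairing naturally with `limit` for pagination. The following plain-Python sketch models only the intended semantics on a list of rows; it is an assumption-laden illustration, not the PySpark implementation:

```python
def offset_then_limit(rows, offset_n, limit_n):
    """Model of df.offset(n).limit(m): skip the first n rows,
    then keep at most m of the remainder. Semantics sketch only."""
    return rows[offset_n:offset_n + limit_n]

rows = list(range(10))
print(offset_then_limit(rows, 3, 4))  # → [3, 4, 5, 6]
```

If the offset runs past the end of the data, the result is simply empty, mirroring how `limit` behaves on a short result set.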
[jira] [Resolved] (SPARK-43183) Move update event on idleness in streaming query listener to separate callback method
[ https://issues.apache.org/jira/browse/SPARK-43183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-43183. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40845 [https://github.com/apache/spark/pull/40845] > Move update event on idleness in streaming query listener to separate > callback method > - > > Key: SPARK-43183 > URL: https://issues.apache.org/jira/browse/SPARK-43183 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.5.0 > > > People have had a lot of confusion about the update event on idleness; > it is not only hard to understand but also draws various kinds of > complaints. For example, since we attach the latest batch ID to the > update event on idleness, a listener implementation that blindly performs > an upsert based on batch ID risks losing metrics. > This also complicates the logic, because we have to remember the execution for > the previous batch, which is arguably unnecessary. > Because of this, it is better to move the idle event out of the progress update > event and give it a separate callback method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43183) Move update event on idleness in streaming query listener to separate callback method
[ https://issues.apache.org/jira/browse/SPARK-43183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-43183: Assignee: Jungtaek Lim > Move update event on idleness in streaming query listener to separate > callback method > - > > Key: SPARK-43183 > URL: https://issues.apache.org/jira/browse/SPARK-43183 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.5.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > People have had a lot of confusion about the update event on idleness; > it is not only hard to understand but also draws various kinds of > complaints. For example, since we attach the latest batch ID to the > update event on idleness, a listener implementation that blindly performs > an upsert based on batch ID risks losing metrics. > This also complicates the logic, because we have to remember the execution for > the previous batch, which is arguably unnecessary. > Because of this, it is better to move the idle event out of the progress update > event and give it a separate callback method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
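The change described in the two entries above can be pictured as a listener interface where idle periods get their own callback, so a progress handler never sees a repeated batch ID it might blindly upsert. A hypothetical Python sketch (class and method names are assumptions for illustration, not the actual PySpark `StreamingQueryListener` API):

```python
class ListenerSketch:
    """Hypothetical listener shape: idle periods get a dedicated
    callback instead of reusing the progress callback with a stale
    batch ID."""
    def on_query_progress(self, batch_id, metrics):
        pass
    def on_query_idle(self, last_batch_id):
        pass

class UpsertingListener(ListenerSketch):
    def __init__(self):
        self.metrics_by_batch = {}
        self.idle_signals = 0
    def on_query_progress(self, batch_id, metrics):
        # Safe to upsert: every progress event carries a fresh batch.
        self.metrics_by_batch[batch_id] = metrics
    def on_query_idle(self, last_batch_id):
        # No upsert here, so metrics for last_batch_id are never clobbered.
        self.idle_signals += 1

listener = UpsertingListener()
listener.on_query_progress(7, {"rows": 100})
listener.on_query_idle(7)
listener.on_query_idle(7)
```

With the old design, the two idle notifications would have arrived as progress updates tagged with batch 7 and overwritten the stored metrics; with a separate callback, the batch-7 metrics survive and idleness is counted independently.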
[jira] [Assigned] (SPARK-43207) Add helper functions for extracting values from literal expressions
[ https://issues.apache.org/jira/browse/SPARK-43207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43207: - Assignee: Ruifeng Zheng > Add helper functions for extracting values from literal expressions > -- > > Key: SPARK-43207 > URL: https://issues.apache.org/jira/browse/SPARK-43207 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43207) Add helper functions for extracting values from literal expressions
[ https://issues.apache.org/jira/browse/SPARK-43207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43207. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40863 [https://github.com/apache/spark/pull/40863] > Add helper functions for extracting values from literal expressions > -- > > Key: SPARK-43207 > URL: https://issues.apache.org/jira/browse/SPARK-43207 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43212) Migrate Structured Streaming errors into error class
Haejoon Lee created SPARK-43212: --- Summary: Migrate Structured Streaming errors into error class Key: SPARK-43212 URL: https://issues.apache.org/jira/browse/SPARK-43212 Project: Spark Issue Type: Sub-task Components: PySpark, Structured Streaming Affects Versions: 3.5.0 Reporter: Haejoon Lee Migrate Structured Streaming errors into error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43190) ListQuery.childOutput should be consistent with child output
[ https://issues.apache.org/jira/browse/SPARK-43190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43190. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40851 [https://github.com/apache/spark/pull/40851] > ListQuery.childOutput should be consistent with child output > > > Key: SPARK-43190 > URL: https://issues.apache.org/jira/browse/SPARK-43190 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43190) ListQuery.childOutput should be consistent with child output
[ https://issues.apache.org/jira/browse/SPARK-43190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-43190: --- Assignee: Wenchen Fan > ListQuery.childOutput should be consistent with child output > > > Key: SPARK-43190 > URL: https://issues.apache.org/jira/browse/SPARK-43190 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43211) Remove Hadoop2 support in IsolatedClientLoader
Cheng Pan created SPARK-43211: - Summary: Remove Hadoop2 support in IsolatedClientLoader Key: SPARK-43211 URL: https://issues.apache.org/jira/browse/SPARK-43211 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43210) Introduce PySparkAssertionError
Haejoon Lee created SPARK-43210: --- Summary: Introduce PySparkAssertionError Key: SPARK-43210 URL: https://issues.apache.org/jira/browse/SPARK-43210 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Haejoon Lee Introduce PySparkAssertionError -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-43209) Migrate Expression errors into error class
Haejoon Lee created SPARK-43209: --- Summary: Migrate Expression errors into error class Key: SPARK-43209 URL: https://issues.apache.org/jira/browse/SPARK-43209 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Haejoon Lee Migrate Expression errors into error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org