[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Description: In Spark SQL, comparing a string column to an integer value can lead to unexpected results due to implicit type casting, resulting in an empty result set. {code:java} case class Person(id: String, name: String) val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() personDF.createOrReplaceTempView("person_ddf") val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" val resultDF = spark.sql(sqlQuery) resultDF.show() // Empty result due to type casting issue {code} Below are the logical and physical plans I am getting: {code:java} == Parsed Logical Plan == 'Project [*] +- 'Filter NOT ('id = -1) +- 'UnresolvedRelation [person_ddf], [], false == Analyzed Logical Plan == id: string, name: string Project [id#356, name#357] +- Filter NOT (cast(id#356 as int) = -1) +- SubqueryAlias person_ddf +- View (`person_ddf`, [id#356,name#357]) +- LocalRelation [id#356, name#357]{code} *But when I run the same query on the same table in Redshift, which is based on PostgreSQL, I get the desired result.* {code:java} select * from person where id <> -1; {code} Explain plan obtained in Redshift: {code:java} XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) Filter: ((id)::text <> '-1'::text) {code} In the execution plan for Spark, the ID column is cast as an integer, while in Redshift, the ID column is cast as a varchar. Shouldn't Spark SQL handle this the same way as Redshift, using the datatype of the ID column rather than the datatype of -1? was: In Spark SQL, comparing a string column to an integer value can lead to unexpected results due to implicit type casting. When a string column is compared to an integer, Spark attempts to cast the strings to integers, which fails for non-numeric strings, resulting in an empty result set. {code:java} case class Person(id: String, name: String) val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() personDF.createOrReplaceTempView("person_ddf") val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" val resultDF = spark.sql(sqlQuery) resultDF.show() // Empty result due to type casting issue {code} Below are the logical and physical plans I am getting: {code:java} == Parsed Logical Plan == 'Project [*] +- 'Filter NOT ('id = -1) +- 'UnresolvedRelation [person_ddf], [], false == Analyzed Logical Plan == id: string, name: string Project [id#356, name#357] +- Filter NOT (cast(id#356 as int) = -1) +- SubqueryAlias person_ddf +- View (`person_ddf`, [id#356,name#357]) +- LocalRelation [id#356, name#357] == Optimized Logical Plan == LocalRelation , [id#356, name#357] == Physical Plan == LocalTableScan , [id#356, name#357] == Physical Plan == LocalTableScan (1) {code} > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Question > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Minor > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to implicit type casting, resulting in an empty result set. 
> {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below are the logical and physical plans I am getting: > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I run the same query on the same table in Redshift, which is based on > PostgreSQL, I get the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift: > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1?
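A minimal sketch of workarounds based on the repro above (assuming default, non-ANSI settings; not an authoritative statement of Spark's coercion rules): {code:java} // Compare against a string literal so the filter stays on the string column // and no cast(id as int) is introduced; non-numeric ids survive: val r1 = spark.sql("SELECT * FROM person_ddf WHERE id <> '-1'") r1.show() // returns both rows // Or push the cast onto the literal side explicitly: val r2 = spark.sql("SELECT * FROM person_ddf WHERE id <> CAST(-1 AS STRING)") r2.show() // returns both rows {code} Note that enabling ANSI mode (spark.sql.ansi.enabled) changes how such mixed-type comparisons are resolved and, depending on the Spark version, may surface a runtime error instead of silently returning an empty result.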
[jira] [Updated] (SPARK-48660) The result of explain is incorrect for CreateTableAsSelect
[ https://issues.apache.org/jira/browse/SPARK-48660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-48660: Description: How to reproduce: {code:sql} CREATE TABLE order_history_version_audit_rno ( eventid STRING, id STRING, referenceid STRING, type STRING, referencetype STRING, sellerid BIGINT, buyerid BIGINT, producerid STRING, versionid INT, changedocuments ARRAY<STRUCT<...: BIGINT, changeDetails: STRING>>, dt STRING, hr STRING) USING parquet PARTITIONED BY (dt, hr); explain cost CREATE TABLE order_history_version_audit_rno USING parquet PARTITIONED BY (dt) CLUSTERED BY (id) INTO 1000 buckets AS SELECT * FROM order_history_version_audit_rno WHERE dt >= '2023-11-29'; {code} {noformat} spark-sql (default)> > explain cost > CREATE TABLE order_history_version_audit_rno > USING parquet > PARTITIONED BY (dt) > CLUSTERED BY (id) INTO 1000 buckets > AS SELECT * FROM order_history_version_audit_rno > WHERE dt >= '2023-11-29'; == Optimized Logical Plan == CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, dt#15, hr#16] +- Filter (dt#15 >= 2023-11-29) +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet == Physical Plan == Execute CreateDataSourceTableAsSelectCommand +- CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, dt#15, hr#16] +- Filter (dt#15 >= 2023-11-29) +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet {noformat} If we remove the CREATE TABLE: {noformat} > explain cost > SELECT * FROM order_history_version_audit_rno > WHERE dt >= '2023-11-29'; == Optimized Logical Plan == Filter (isnotnull(dt#15) AND (dt#15 >= 2023-11-29)), Statistics(sizeInBytes=1.0 B) +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet, Statistics(sizeInBytes=0.0 B) == Physical Plan == *(1) ColumnarToRow +- FileScan parquet spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] Batched: true, DataFilters: [], Format: Parquet, Location: 
InMemoryFileIndex(0 paths)[], PartitionFilters: [isnotnull(dt#15), (dt#15 >= 2023-11-29)], PushedFilters: [], ReadSchema: struct<...> {noformat} was: How to reproduce: {code:sql} CREATE TABLE order_history_version_audit_rno ( eventid STRING, id STRING, referenceid STRING, type STRING, referencetype STRING, sellerid BIGINT, buyerid BIGINT, producerid STRING, versionid INT, changedocuments ARRAY<STRUCT<...: BIGINT, changeDetails: STRING>>, dt STRING, hr STRING) USING parquet PARTITIONED BY (dt, hr); explain cost CREATE TABLE order_history_version_audit_rno USING parquet PARTITIONED BY (dt) CLUSTERED BY (id) INTO 1000 buckets AS SELECT * FROM order_history_version_audit_rno WHERE dt >= '2023-11-29'; {code} {noformat} spark-sql (default)> > explain cost > CREATE TABLE order_history_version_audit_rno > USING parquet > PARTITIONED BY (dt) > CLUSTERED BY (id) INTO 1000 buckets > AS SELECT * FROM order_history_version_audit_rno > WHERE dt >= '2023-11-29'; == Optimized Logical Plan == CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid,
[jira] [Commented] (SPARK-48660) The result of explain is incorrect for CreateTableAsSelect
[ https://issues.apache.org/jira/browse/SPARK-48660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856122#comment-17856122 ] Wei Guo commented on SPARK-48660: - I am working on this, and thank you for the recommendation [~yangjie01]. > The result of explain is incorrect for CreateTableAsSelect > -- > > Key: SPARK-48660 > URL: https://issues.apache.org/jira/browse/SPARK-48660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > CREATE TABLE order_history_version_audit_rno ( > eventid STRING, > id STRING, > referenceid STRING, > type STRING, > referencetype STRING, > sellerid BIGINT, > buyerid BIGINT, > producerid STRING, > versionid INT, > changedocuments ARRAY<STRUCT<...: BIGINT, changeDetails: STRING>>, > dt STRING, > hr STRING) > USING parquet > PARTITIONED BY (dt, hr); > explain cost > CREATE TABLE order_history_version_audit_rno > USING parquet > PARTITIONED BY (dt) > CLUSTERED BY (id) INTO 1000 buckets > AS SELECT * FROM order_history_version_audit_rno > WHERE dt >= '2023-11-29'; > {code} > {noformat} > spark-sql (default)> >> explain cost >> CREATE TABLE order_history_version_audit_rno >> USING parquet >> PARTITIONED BY (dt) >> CLUSTERED BY (id) INTO 1000 buckets >> AS SELECT * FROM order_history_version_audit_rno >> WHERE dt >= '2023-11-29'; > == Optimized Logical Plan == > CreateDataSourceTableAsSelectCommand > `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, > [eventid, id, referenceid, type, referencetype, sellerid, buyerid, > producerid, versionid, changedocuments, hr, dt] >+- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, > sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, > hr#16, dt#15] > +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, > sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, > dt#15, hr#16] > +- Filter (dt#15 >= 2023-11-29) > +- SubqueryAlias > spark_catalog.default.order_history_version_audit_rno >+- Relation > spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] > parquet > == Physical Plan == > Execute CreateDataSourceTableAsSelectCommand >+- CreateDataSourceTableAsSelectCommand > `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, > [eventid, id, referenceid, type, referencetype, sellerid, buyerid, > producerid, versionid, changedocuments, hr, dt] > +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, > sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, > hr#16, dt#15] > +- Project [eventid#5, id#6, referenceid#7, type#8, > referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, > changedocuments#14, dt#15, hr#16] >+- Filter (dt#15 >= 2023-11-29) > +- SubqueryAlias > spark_catalog.default.order_history_version_audit_rno > +- Relation > spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] > parquet > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-48660) The result of explain is incorrect for CreateTableAsSelect
[ https://issues.apache.org/jira/browse/SPARK-48660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17856122#comment-17856122 ] Wei Guo edited comment on SPARK-48660 at 6/19/24 4:18 AM: -- I am working on this, and thank you for the recommendation [~LuciferYang] was (Author: wayne guo): I am working on this, and thank you for the recommendation [~yangjie01]. > The result of explain is incorrect for CreateTableAsSelect > -- > > Key: SPARK-48660 > URL: https://issues.apache.org/jira/browse/SPARK-48660 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > CREATE TABLE order_history_version_audit_rno ( > eventid STRING, > id STRING, > referenceid STRING, > type STRING, > referencetype STRING, > sellerid BIGINT, > buyerid BIGINT, > producerid STRING, > versionid INT, > changedocuments ARRAY<STRUCT<...: BIGINT, changeDetails: STRING>>, > dt STRING, > hr STRING) > USING parquet > PARTITIONED BY (dt, hr); > explain cost > CREATE TABLE order_history_version_audit_rno > USING parquet > PARTITIONED BY (dt) > CLUSTERED BY (id) INTO 1000 buckets > AS SELECT * FROM order_history_version_audit_rno > WHERE dt >= '2023-11-29'; > {code} > {noformat} > spark-sql (default)> >> explain cost >> CREATE TABLE order_history_version_audit_rno >> USING parquet >> PARTITIONED BY (dt) >> CLUSTERED BY (id) INTO 1000 buckets >> AS SELECT * FROM order_history_version_audit_rno >> WHERE dt >= '2023-11-29'; > == Optimized Logical Plan == > CreateDataSourceTableAsSelectCommand > `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, > [eventid, id, referenceid, type, referencetype, sellerid, buyerid, > producerid, versionid, changedocuments, hr, dt] >+- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, > sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, > hr#16, dt#15] > +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, > sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, > dt#15, hr#16] > +- Filter (dt#15 >= 2023-11-29) > +- SubqueryAlias > spark_catalog.default.order_history_version_audit_rno >+- Relation > spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] > parquet > == Physical Plan == > Execute CreateDataSourceTableAsSelectCommand >+- CreateDataSourceTableAsSelectCommand > `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, > [eventid, id, referenceid, type, referencetype, sellerid, buyerid, > producerid, versionid, changedocuments, hr, dt] > +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, > sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, > hr#16, dt#15] > +- Project [eventid#5, id#6, referenceid#7, type#8, > referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, > changedocuments#14, dt#15, hr#16] >+- Filter (dt#15 >= 2023-11-29) > +- SubqueryAlias > spark_catalog.default.order_history_version_audit_rno > +- Relation > spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] > parquet > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional 
commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48661) Upgrade RoaringBitmap to 1.1.0
Wei Guo created SPARK-48661: --- Summary: Upgrade RoaringBitmap to 1.1.0 Key: SPARK-48661 URL: https://issues.apache.org/jira/browse/SPARK-48661 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Wei Guo -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48660) The result of explain is incorrect for CreateTableAsSelect
[ https://issues.apache.org/jira/browse/SPARK-48660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-48660: Description: How to reproduce: {code:sql} CREATE TABLE order_history_version_audit_rno ( eventid STRING, id STRING, referenceid STRING, type STRING, referencetype STRING, sellerid BIGINT, buyerid BIGINT, producerid STRING, versionid INT, changedocuments ARRAY<STRUCT<...: BIGINT, changeDetails: STRING>>, dt STRING, hr STRING) USING parquet PARTITIONED BY (dt, hr); explain cost CREATE TABLE order_history_version_audit_rno USING parquet PARTITIONED BY (dt) CLUSTERED BY (id) INTO 1000 buckets AS SELECT * FROM order_history_version_audit_rno WHERE dt >= '2023-11-29'; {code} {noformat} spark-sql (default)> > explain cost > CREATE TABLE order_history_version_audit_rno > USING parquet > PARTITIONED BY (dt) > CLUSTERED BY (id) INTO 1000 buckets > AS SELECT * FROM order_history_version_audit_rno > WHERE dt >= '2023-11-29'; == Optimized Logical Plan == CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, dt#15, hr#16] +- Filter (dt#15 >= 2023-11-29) +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet == Physical Plan == Execute CreateDataSourceTableAsSelectCommand +- CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, dt#15, hr#16] +- Filter (dt#15 >= 2023-11-29) +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet {noformat} was: How to reproduce: {noformat} spark-sql (default)> > explain cost > CREATE TABLE order_history_version_audit_rno > USING parquet > PARTITIONED BY (dt) > CLUSTERED BY (id) INTO 1000 buckets > AS SELECT * FROM order_history_version_audit_rno > WHERE dt >= '2023-11-29'; == Optimized Logical Plan == CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, 
producerid#12, versionid#13, changedocuments#14, dt#15, hr#16] +- Filter (dt#15 >= 2023-11-29) +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet == Physical Plan == Execute CreateDataSourceTableAsSelectCommand +- CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15] +- Project [eventid#5, id#6, referenceid#7,
[jira] [Created] (SPARK-48660) The result of explain is incorrect for CreateTableAsSelect
Yuming Wang created SPARK-48660: --- Summary: The result of explain is incorrect for CreateTableAsSelect Key: SPARK-48660 URL: https://issues.apache.org/jira/browse/SPARK-48660 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1, 3.5.0, 4.0.0 Reporter: Yuming Wang How to reproduce: {noformat} spark-sql (default)> > explain cost > CREATE TABLE order_history_version_audit_rno > USING parquet > PARTITIONED BY (dt) > CLUSTERED BY (id) INTO 1000 buckets > AS SELECT * FROM order_history_version_audit_rno > WHERE dt >= '2023-11-29'; == Optimized Logical Plan == CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, dt#15, hr#16] +- Filter (dt#15 >= 2023-11-29) +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet == Physical Plan == Execute CreateDataSourceTableAsSelectCommand +- CreateDataSourceTableAsSelectCommand `spark_catalog`.`default`.`order_history_version_audit_rno`, ErrorIfExists, [eventid, id, referenceid, type, referencetype, sellerid, buyerid, producerid, versionid, changedocuments, hr, dt] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, hr#16, dt#15] +- Project [eventid#5, id#6, referenceid#7, type#8, referencetype#9, sellerid#10L, buyerid#11L, producerid#12, versionid#13, changedocuments#14, dt#15, hr#16] +- Filter (dt#15 >= 2023-11-29) +- SubqueryAlias spark_catalog.default.order_history_version_audit_rno +- Relation spark_catalog.default.order_history_version_audit_rno[eventid#5,id#6,referenceid#7,type#8,referencetype#9,sellerid#10L,buyerid#11L,producerid#12,versionid#13,changedocuments#14,dt#15,hr#16] parquet {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48659) Unify v1 and v2 ALTER TABLE .. SET TBLPROPERTIES tests
BingKun Pan created SPARK-48659: --- Summary: Unify v1 and v2 ALTER TABLE .. SET TBLPROPERTIES tests Key: SPARK-48659 URL: https://issues.apache.org/jira/browse/SPARK-48659 Project: Spark Issue Type: Improvement Components: SQL, Tests Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48567) Pyspark StreamingQuery lastProgress and friend should return actual StreamingQueryProgress
[ https://issues.apache.org/jira/browse/SPARK-48567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-48567: -- Assignee: (was: Wei Liu) Reverted at https://github.com/apache/spark/commit/d067fc6c1635dfe7730223021e912e78637bb791 > Pyspark StreamingQuery lastProgress and friend should return actual > StreamingQueryProgress > -- > > Key: SPARK-48567 > URL: https://issues.apache.org/jira/browse/SPARK-48567 > Project: Spark > Issue Type: New Feature > Components: PySpark, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48567) Pyspark StreamingQuery lastProgress and friend should return actual StreamingQueryProgress
[ https://issues.apache.org/jira/browse/SPARK-48567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48567: - Fix Version/s: (was: 4.0.0) > Pyspark StreamingQuery lastProgress and friend should return actual > StreamingQueryProgress > -- > > Key: SPARK-48567 > URL: https://issues.apache.org/jira/browse/SPARK-48567 > Project: Spark > Issue Type: New Feature > Components: PySpark, SS >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48651) Document configuring different JDK for Spark on YARN
[ https://issues.apache.org/jira/browse/SPARK-48651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48651. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47010 [https://github.com/apache/spark/pull/47010] > Document configuring different JDK for Spark on YARN > > > Key: SPARK-48651 > URL: https://issues.apache.org/jira/browse/SPARK-48651 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48651) Document configuring different JDK for Spark on YARN
[ https://issues.apache.org/jira/browse/SPARK-48651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48651: --- Assignee: Cheng Pan > Document configuring different JDK for Spark on YARN > > > Key: SPARK-48651 > URL: https://issues.apache.org/jira/browse/SPARK-48651 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48658) Encode/Decode functions report coding error instead of mojibake
Kent Yao created SPARK-48658: Summary: Encode/Decode functions report coding error instead of mojibake Key: SPARK-48658 URL: https://issues.apache.org/jira/browse/SPARK-48658 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48601) Fix Spark internal error when setting null value for jdbc option
[ https://issues.apache.org/jira/browse/SPARK-48601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48601. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46955 [https://github.com/apache/spark/pull/46955] > Fix Spark internal error when setting null value for jdbc option > > > Key: SPARK-48601 > URL: https://issues.apache.org/jira/browse/SPARK-48601 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.3 >Reporter: Stevo Mitric >Assignee: Stevo Mitric >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When setting a null value for any JDBC option, a Spark internal error is > thrown, caused by a java.lang.NullPointerException. > > Make this exception more user-friendly and explain what is causing it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
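A minimal sketch of how the error can be hit (the option name here is just an example; per the description, any JDBC option with a null value takes the same path): {code:java} // Hypothetical repro: passing a null value for a JDBC reader option. // Before the fix this fails with an internal NullPointerException rather // than a clear user-facing error. val df = spark.read .format("jdbc") .option("url", "jdbc:postgresql://localhost:5432/test") // example URL .option("dbtable", "person") // example table .option("sessionInitStatement", null: String) // null option value .load() {code}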
[jira] [Assigned] (SPARK-48649) Add "ignoreInvalidPartitionPaths" and "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring invalid partition paths
[ https://issues.apache.org/jira/browse/SPARK-48649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48649: --- Assignee: Ivan Sadikov > Add "ignoreInvalidPartitionPaths" and > "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring > invalid partition paths > - > > Key: SPARK-48649 > URL: https://issues.apache.org/jira/browse/SPARK-48649 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > > When having a table directory with invalid partitions such as: > {code:java} > table/ > invalid/... > part=1/... > part=2/... > part=3/...{code} > a SQL query reading all of the partitions would fail with > {code:java} > java.lang.AssertionError: assertion failed: Conflicting directory structures > detected. Suspicious paths: > table > table/invalid {code} > > I propose to add a data source option and Spark SQL config to ignore invalid > partition paths. The config will be disabled by default to retain the current > behaviour. > {code:java} > spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", "true"){code} > {code:java} > spark.read.format("parquet").option("ignoreInvalidPartitionPaths", > "true").load(...) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48649) Add "ignoreInvalidPartitionPaths" and "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring invalid partition paths
[ https://issues.apache.org/jira/browse/SPARK-48649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48649. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47006 [https://github.com/apache/spark/pull/47006] > Add "ignoreInvalidPartitionPaths" and > "spark.sql.files.ignoreInvalidPartitionPaths" configs to allow ignoring > invalid partition paths > - > > Key: SPARK-48649 > URL: https://issues.apache.org/jira/browse/SPARK-48649 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ivan Sadikov >Assignee: Ivan Sadikov >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When having a table directory with invalid partitions such as: > {code:java} > table/ > invalid/... > part=1/... > part=2/... > part=3/...{code} > a SQL query reading all of the partitions would fail with > {code:java} > java.lang.AssertionError: assertion failed: Conflicting directory structures > detected. Suspicious paths: > table > table/invalid {code} > > I propose to add a data source option and Spark SQL config to ignore invalid > partition paths. The config will be disabled by default to retain the current > behaviour. > {code:java} > spark.conf.set("spark.sql.files.ignoreInvalidPartitionPaths", "true"){code} > {code:java} > spark.read.format("parquet").option("ignoreInvalidPartitionPaths", > "true").load(...) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48657) The document is out of date and needs to be updated
BrevinFu created SPARK-48657: Summary: The document is out of date and needs to be updated Key: SPARK-48657 URL: https://issues.apache.org/jira/browse/SPARK-48657 Project: Spark Issue Type: IT Help Components: Examples, Java API, SQL Affects Versions: 3.5.1 Environment: Windows 10 Java Reporter: BrevinFu I am looking for a data source implementation for Spark SQL 3.5.1 that can accept MQTT and REST interfaces. Through a Google search, I found that the latest examples are two years old, and there are very few Java implementations. I found that custom data sources have v1 and v2 APIs as well as unbounded tables, and I am confused: which implementation should I use for 3.5.1, and how do I implement it? Can you update the documentation or help me? Thank you. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
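For Spark 3.5.x, new sources are typically written against the DataSource V2 API (org.apache.spark.sql.connector); the older v1 API is kept mainly for compatibility. Below is a minimal, hypothetical batch-read skeleton (all class names and the canned two-row payload are illustrative, not a real MQTT/REST client; a streaming source would additionally implement the micro-batch read interfaces): {code:java} import java.util.{Map => JMap} import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.connector.catalog.{SupportsRead, Table, TableCapability, TableProvider} import org.apache.spark.sql.connector.expressions.Transform import org.apache.spark.sql.connector.read._ import org.apache.spark.sql.types.{StringType, StructField, StructType} import org.apache.spark.sql.util.CaseInsensitiveStringMap import org.apache.spark.unsafe.types.UTF8String // Entry point looked up by spark.read.format(...): a DSv2 TableProvider. class RestSourceProvider extends TableProvider { override def inferSchema(options: CaseInsensitiveStringMap): StructType = StructType(Seq(StructField("payload", StringType))) override def getTable(schema: StructType, partitioning: Array[Transform], properties: JMap[String, String]): Table = new RestTable(schema) } class RestTable(tableSchema: StructType) extends Table with SupportsRead { override def name(): String = "rest_source" override def schema(): StructType = tableSchema override def capabilities(): java.util.Set[TableCapability] = java.util.EnumSet.of(TableCapability.BATCH_READ) override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder = new ScanBuilder { override def build(): Scan = new Scan { override def readSchema(): StructType = tableSchema override def toBatch(): Batch = new RestBatch } } } class RestBatch extends Batch { // One partition; a real source would split the work (e.g. per endpoint/page). override def planInputPartitions(): Array[InputPartition] = Array(new InputPartition {}) override def createReaderFactory(): PartitionReaderFactory = new PartitionReaderFactory { override def createReader(p: InputPartition): PartitionReader[InternalRow] = new PartitionReader[InternalRow] { // Stand-in for rows an HTTP/MQTT call would return. private val rows = Iterator("hello", "world") override def next(): Boolean = rows.hasNext override def get(): InternalRow = InternalRow(UTF8String.fromString(rows.next())) override def close(): Unit = () } } } {code} Such a source can then be loaded with spark.read.format(classOf[RestSourceProvider].getName).load().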
[jira] [Assigned] (SPARK-48634) Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator
[ https://issues.apache.org/jira/browse/SPARK-48634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48634: Assignee: Hyukjin Kwon > Avoid statically initialize threadpool at > ExecutePlanResponseReattachableIterator > - > > Key: SPARK-48634 > URL: https://issues.apache.org/jira/browse/SPARK-48634 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Avoid having ExecutePlanResponseReattachableIterator._release_thread_pool > statically initialize a ThreadPool, which might be dragged into pickling. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48634) Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator
[ https://issues.apache.org/jira/browse/SPARK-48634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48634. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46993 [https://github.com/apache/spark/pull/46993] > Avoid statically initialize threadpool at > ExecutePlanResponseReattachableIterator > - > > Key: SPARK-48634 > URL: https://issues.apache.org/jira/browse/SPARK-48634 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Avoid having ExecutePlanResponseReattachableIterator._release_thread_pool > statically initialize a ThreadPool, which might be dragged into pickling. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48656) ArrayIndexOutOfBoundsException in CartesianRDD getPartitions
Nick Young created SPARK-48656: -- Summary: ArrayIndexOutOfBoundsException in CartesianRDD getPartitions Key: SPARK-48656 URL: https://issues.apache.org/jira/browse/SPARK-48656 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Nick Young {code:java} val rdd1 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536) val rdd2 = spark.sparkContext.parallelize(Seq(1, 2, 3), numSlices = 65536) rdd2.cartesian(rdd1).partitions {code} Throws `ArrayIndexOutOfBoundsException: 0` at CartesianRDD.scala:69 because `s1.index * numPartitionsInRdd2 + s2.index` overflows and wraps to 0. We should provide a better error message which indicates that the number of partitions overflows, so it's easier for the user to debug. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
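The wrap-around is plain 32-bit Int arithmetic; a small sketch of the failing math (the allocation shown mirrors what CartesianRDD.getPartitions does): {code:java} val n1 = 65536 // rdd1 partitions val n2 = 65536 // rdd2 partitions println(n1 * n2) // 0: 65536 * 65536 = 2^32 wraps to 0 in an Int println(n1.toLong * n2) // 4294967296: what an overflow check would need to compare // getPartitions allocates new Array[Partition](n1 * n2) -- here a zero-length // array -- so the first write, array(0), throws ArrayIndexOutOfBoundsException: 0. {code}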
[jira] [Resolved] (SPARK-48646) Refine Python data source API docstring and type hints
[ https://issues.apache.org/jira/browse/SPARK-48646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48646. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47003 [https://github.com/apache/spark/pull/47003] > Refine Python data source API docstring and type hints > -- > > Key: SPARK-48646 > URL: https://issues.apache.org/jira/browse/SPARK-48646 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Improve the type hints and docstrings for datasource.py -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48655) SPJ: Add tests for shuffle skipping for aggregate queries
Szehon Ho created SPARK-48655: - Summary: SPJ: Add tests for shuffle skipping for aggregate queries Key: SPARK-48655 URL: https://issues.apache.org/jira/browse/SPARK-48655 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Szehon Ho -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48654) Kafka source should allow "enable.auto.commit" setting
Raghu Angadi created SPARK-48654: Summary: Kafka source should allow "enable.auto.commit" setting Key: SPARK-48654 URL: https://issues.apache.org/jira/browse/SPARK-48654 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.4.3 Reporter: Raghu Angadi The Kafka source does not allow setting the "enable.auto.commit" configuration. It is not clear why it does not. We should remove this restriction, especially with the new admin-client consumer (which is the current default). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
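For context, consumer configs are passed to the Kafka source with a "kafka." prefix and forwarded to the underlying consumer; this sketch shows what lifting the restriction would allow (today Spark rejects this particular option): {code:java} // Hypothetical once the restriction is removed (broker and topic names // are examples): val stream = spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "broker1:9092") .option("subscribe", "events") .option("kafka.enable.auto.commit", "true") // currently disallowed .load() {code}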
[jira] [Updated] (SPARK-48586) Remove lock acquisition in doMaintenance() by making a deep copy of RocksDBFileManager in load()
[ https://issues.apache.org/jira/browse/SPARK-48586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Riya Verma updated SPARK-48586: --- Summary: Remove lock acquisition in doMaintenance() by making a deep copy of RocksDBFileManager in load() (was: Remove lock contention between maintenance and task threads) > Remove lock acquisition in doMaintenance() by making a deep copy of > RocksDBFileManager in load() > > > Key: SPARK-48586 > URL: https://issues.apache.org/jira/browse/SPARK-48586 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.4.3 >Reporter: Riya Verma >Priority: Major > Labels: pull-request-available > > Currently the lock of the *RocksDB* state store is acquired when uploading > the snapshot inside maintenance tasks when change log checkpointing is > enabled, which causes lock contention between query processing tasks and > the state maintenance thread. To eliminate the lock contention, lock acquisition > inside maintenance tasks should be avoided. To prevent race conditions > between task and maintenance threads, we can ensure that *RocksDBFileManager* > has a linear history by making a deep copy of *RocksDBFileManager* every > time a previous version is loaded. The original file manager is not affected > by future state updates. The new file manager is not affected by background > snapshot uploading tasks that attempt to upload a snapshot. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
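An illustrative sketch of the copy-on-load idea (simplified names, not the actual RocksDBFileManager API): {code:java} // Each load() hands back a deep copy, so the task thread's future updates and // the maintenance thread's in-flight snapshot upload operate on independent // managers with a linear history -- no shared mutable state, no lock. final class FileManager private (private var trackedFiles: Map[String, Long]) { def this() = this(Map.empty) def record(file: String, size: Long): Unit = trackedFiles += (file -> size) def filesToUpload(): Map[String, Long] = trackedFiles // Map is immutable, so handing the current reference to a new instance // is a safe deep copy for this sketch. def deepCopy(): FileManager = new FileManager(trackedFiles) } object StateStoreSketch { // In load(version), the store would hand tasks a private copy: def load(current: FileManager): FileManager = current.deepCopy() } {code}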
[jira] [Updated] (SPARK-48653) Fix Python data source error class references
[ https://issues.apache.org/jira/browse/SPARK-48653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48653: --- Labels: pull-request-available (was: ) > Fix Python data source error class references > - > > Key: SPARK-48653 > URL: https://issues.apache.org/jira/browse/SPARK-48653 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Fix invalid error class references. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48653) Fix Python data source error class references
Allison Wang created SPARK-48653: Summary: Fix Python data source error class references Key: SPARK-48653 URL: https://issues.apache.org/jira/browse/SPARK-48653 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Fix invalid error class references. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
Abhishek Singh created SPARK-48652: -- Summary: Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results Key: SPARK-48652 URL: https://issues.apache.org/jira/browse/SPARK-48652 Project: Spark Issue Type: Question Components: Spark Core, SQL Affects Versions: 3.3.2 Reporter: Abhishek Singh In Spark SQL, comparing a string column to an integer value can lead to unexpected results due to implicit type casting. When a string column is compared to an integer, Spark attempts to cast the strings to integers, which fails for non-numeric strings, resulting in an empty result set. {code:java} case class Person(id: String, name: String) val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() personDF.createOrReplaceTempView("person_ddf") val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" val resultDF = spark.sql(sqlQuery) resultDF.show() // Empty result due to type casting issue {code} Below are the logical and physical plans I am getting: {code:java} == Parsed Logical Plan == 'Project [*] +- 'Filter NOT ('id = -1) +- 'UnresolvedRelation [person_ddf], [], false == Analyzed Logical Plan == id: string, name: string Project [id#356, name#357] +- Filter NOT (cast(id#356 as int) = -1) +- SubqueryAlias person_ddf +- View (`person_ddf`, [id#356,name#357]) +- LocalRelation [id#356, name#357] == Optimized Logical Plan == LocalRelation , [id#356, name#357] == Physical Plan == LocalTableScan , [id#356, name#357] == Physical Plan == LocalTableScan (1) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48573) Upgrade ICU version
[ https://issues.apache.org/jira/browse/SPARK-48573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48573: --- Labels: pull-request-available (was: ) > Upgrade ICU version > --- > > Key: SPARK-48573 > URL: https://issues.apache.org/jira/browse/SPARK-48573 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48573) Upgrade ICU version
[ https://issues.apache.org/jira/browse/SPARK-48573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48573: -- Parent: (was: SPARK-46837) Issue Type: Bug (was: Sub-task) > Upgrade ICU version > --- > > Key: SPARK-48573 > URL: https://issues.apache.org/jira/browse/SPARK-48573 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48573) Upgrade ICU version
[ https://issues.apache.org/jira/browse/SPARK-48573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48573: -- Epic Link: SPARK-46830 > Upgrade ICU version > --- > > Key: SPARK-48573 > URL: https://issues.apache.org/jira/browse/SPARK-48573 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48573) Upgrade ICU version
[ https://issues.apache.org/jira/browse/SPARK-48573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-48573: -- Summary: Upgrade ICU version (was: TBD) > Upgrade ICU version > --- > > Key: SPARK-48573 > URL: https://issues.apache.org/jira/browse/SPARK-48573 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48651) Document configuring different JDK for Spark on YARN
[ https://issues.apache.org/jira/browse/SPARK-48651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48651: --- Labels: pull-request-available (was: ) > Document configuring different JDK for Spark on YARN > > > Key: SPARK-48651 > URL: https://issues.apache.org/jira/browse/SPARK-48651 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48651) Document configuring different JDK for Spark on YARN
Cheng Pan created SPARK-48651: - Summary: Document configuring different JDK for Spark on YARN Key: SPARK-48651 URL: https://issues.apache.org/jira/browse/SPARK-48651 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48280) Improve collation testing surface area using expression walking
[ https://issues.apache.org/jira/browse/SPARK-48280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48280: -- Assignee: (was: Apache Spark) > Improve collation testing surface area using expression walking > --- > > Key: SPARK-48280 > URL: https://issues.apache.org/jira/browse/SPARK-48280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48280) Improve collation testing surface area using expression walking
[ https://issues.apache.org/jira/browse/SPARK-48280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-48280: -- Assignee: Apache Spark > Improve collation testing surface area using expression walking > --- > > Key: SPARK-48280 > URL: https://issues.apache.org/jira/browse/SPARK-48280 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48459. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46789 [https://github.com/apache/spark/pull/46789] > Implement DataFrameQueryContext in Spark Connect > > > Key: SPARK-48459 > URL: https://issues.apache.org/jira/browse/SPARK-48459 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implements the same change as https://github.com/apache/spark/pull/45377 in Spark > Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48459: Assignee: Hyukjin Kwon > Implement DataFrameQueryContext in Spark Connect > > > Key: SPARK-48459 > URL: https://issues.apache.org/jira/browse/SPARK-48459 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Implements the same change as https://github.com/apache/spark/pull/45377 in Spark > Connect -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48650) Display correct call site from IPython Notebook
[ https://issues.apache.org/jira/browse/SPARK-48650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48650: --- Labels: pull-request-available (was: ) > Display correct call site from IPython Notebook > --- > > Key: SPARK-48650 > URL: https://issues.apache.org/jira/browse/SPARK-48650 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Current IPython Notebook does not show proper DataFrameQueryContext -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48650) Display correct call site from IPython Notebook
Haejoon Lee created SPARK-48650: --- Summary: Display correct call site from IPython Notebook Key: SPARK-48650 URL: https://issues.apache.org/jira/browse/SPARK-48650 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Current IPython Notebook does not show proper DataFrameQueryContext -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48342) [M0] Parser support
[ https://issues.apache.org/jira/browse/SPARK-48342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48342. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46665 [https://github.com/apache/spark/pull/46665] > [M0] Parser support > --- > > Key: SPARK-48342 > URL: https://issues.apache.org/jira/browse/SPARK-48342 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Implement the parser for SQL scripting with all supporting changes for the upcoming > interpreter implementation and future extensions of the parser: > * Parser - support only compound statements > * Parser testing > > For more details, the design doc can be found in the parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
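An illustrative example of the compound-statement syntax this milestone parses (execution support is a later milestone, and the exact grammar follows the design doc in the parent Jira, so treat this as a sketch): {code:java} // Submitting a SQL script as a single compound statement. spark.sql( """BEGIN | DECLARE counter INT DEFAULT 0; | SET VAR counter = counter + 1; | SELECT counter; |END""".stripMargin) {code}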
[jira] [Assigned] (SPARK-48342) [M0] Parser support
[ https://issues.apache.org/jira/browse/SPARK-48342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48342: --- Assignee: David Milicevic > [M0] Parser support > --- > > Key: SPARK-48342 > URL: https://issues.apache.org/jira/browse/SPARK-48342 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: David Milicevic >Assignee: David Milicevic >Priority: Major > Labels: pull-request-available > > Implement the parser for SQL scripting with all supporting changes for the upcoming > interpreter implementation and future extensions of the parser: > * Parser - support only compound statements > * Parser testing > > For more details, the design doc can be found in the parent Jira item. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48585) Make `JdbcDialect.classifyException` throw out the original exception
[ https://issues.apache.org/jira/browse/SPARK-48585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-48585. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46937 [https://github.com/apache/spark/pull/46937] > Make `JdbcDialect.classifyException` throw out the original exception > - > > Key: SPARK-48585 > URL: https://issues.apache.org/jira/browse/SPARK-48585 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48585) Make `JdbcDialect.classifyException` throw out the original exception
[ https://issues.apache.org/jira/browse/SPARK-48585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-48585: Assignee: BingKun Pan > Make `JdbcDialect.classifyException` throw out the original exception > - > > Key: SPARK-48585 > URL: https://issues.apache.org/jira/browse/SPARK-48585 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48647) Refine the error message for YearMonthIntervalType in df.collect
[ https://issues.apache.org/jira/browse/SPARK-48647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48647. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 47004 [https://github.com/apache/spark/pull/47004] > Refine the error message for YearMonthIntervalType in df.collect > > > Key: SPARK-48647 > URL: https://issues.apache.org/jira/browse/SPARK-48647 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org