[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48652: --- Labels: newbie pull-request-available (was: newbie) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Blocker > Labels: newbie, pull-request-available > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below is the logical and physical plan which I m getting > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I m using the same query and table in Redshift which is based on > PostGreSQL. I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift. > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Issue Type: Bug (was: Brainstorming) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Blocker > Labels: newbie > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below is the logical and physical plan which I m getting > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I m using the same query and table in Redshift which is based on > PostGreSQL. I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift. > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Labels: newbie (was: ) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Blocker > Labels: newbie > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below is the logical and physical plan which I m getting > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I m using the same query and table in Redshift which is based on > PostGreSQL. I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift. > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Issue Type: Brainstorming (was: Question) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Minor > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below is the logical and physical plan which I m getting > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I m using the same query and table in Redshift which is based on > PostGreSQL. I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift. > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Priority: Blocker (was: Minor) > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Brainstorming > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Blocker > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below is the logical and physical plan which I m getting > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I m using the same query and table in Redshift which is based on > PostGreSQL. I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift. > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rather than the datatype of -1? > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results
[ https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Singh updated SPARK-48652: --- Description: In Spark SQL, comparing a string column to an integer value can lead to unexpected results due to type casting resulting in an empty result set. {code:java} case class Person(id: String, name: String) val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() personDF.createOrReplaceTempView("person_ddf") val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" val resultDF = spark.sql(sqlQuery) resultDF.show() // Empty result due to type casting issue {code} Below is the logical and physical plan which I m getting {code:java} == Parsed Logical Plan == 'Project [*] +- 'Filter NOT ('id = -1) +- 'UnresolvedRelation [person_ddf], [], false == Analyzed Logical Plan == id: string, name: string Project [id#356, name#357] +- Filter NOT (cast(id#356 as int) = -1) +- SubqueryAlias person_ddf +- View (`person_ddf`, [id#356,name#357]) +- LocalRelation [id#356, name#357]{code} *But when I m using the same query and table in Redshift which is based on PostGreSQL. I am getting the desired result.* {code:java} select * from person where id <> -1; {code} Explain plan obtained in Redshift. {code:java} XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) Filter: ((id)::text <> '-1'::text) {code} In the execution plan for Spark, the ID column is cast as an integer, while in Redshift, the ID column is cast as a varchar. Shouldn't Spark SQL handle this the same way as Redshift, using the datatype of the ID column rather than the datatype of -1? was: In Spark SQL, comparing a string column to an integer value can lead to unexpected results due to implicit type casting. When a string column is compared to an integer, Spark attempts to cast the strings to integers, which fails for non-numeric strings, resulting in an empty result set. {code:java} case class Person(id: String, name: String) val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() personDF.createOrReplaceTempView("person_ddf") val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" val resultDF = spark.sql(sqlQuery) resultDF.show() // Empty result due to type casting issue {code} Below is the logical and physical plan which I m getting {code:java} == Parsed Logical Plan == 'Project [*] +- 'Filter NOT ('id = -1) +- 'UnresolvedRelation [person_ddf], [], false == Analyzed Logical Plan == id: string, name: string Project [id#356, name#357] +- Filter NOT (cast(id#356 as int) = -1) +- SubqueryAlias person_ddf +- View (`person_ddf`, [id#356,name#357]) +- LocalRelation [id#356, name#357] == Optimized Logical Plan == LocalRelation , [id#356, name#357] == Physical Plan == LocalTableScan , [id#356, name#357] == Physical Plan == LocalTableScan (1) {code} > Casting Issue in Spark SQL: String Column Compared to Integer Value Yields > Empty Results > > > Key: SPARK-48652 > URL: https://issues.apache.org/jira/browse/SPARK-48652 > Project: Spark > Issue Type: Question > Components: Spark Core, SQL >Affects Versions: 3.3.2 >Reporter: Abhishek Singh >Priority: Minor > > In Spark SQL, comparing a string column to an integer value can lead to > unexpected results due to type casting resulting in an empty result set. > {code:java} > case class Person(id: String, name: String) > val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF() > personDF.createOrReplaceTempView("person_ddf") > val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1" > val resultDF = spark.sql(sqlQuery) > resultDF.show() // Empty result due to type casting issue > {code} > Below is the logical and physical plan which I m getting > {code:java} > == Parsed Logical Plan == > 'Project [*] > +- 'Filter NOT ('id = -1) >+- 'UnresolvedRelation [person_ddf], [], false > == Analyzed Logical Plan == > id: string, name: string > Project [id#356, name#357] > +- Filter NOT (cast(id#356 as int) = -1) >+- SubqueryAlias person_ddf > +- View (`person_ddf`, [id#356,name#357]) > +- LocalRelation [id#356, name#357]{code} > *But when I m using the same query and table in Redshift which is based on > PostGreSQL. I am getting the desired result.* > {code:java} > select * from person where id <> -1; {code} > Explain plan obtained in Redshift. > {code:java} > XN Seq Scan on person (cost=0.00..0.03 rows=1 width=336) > Filter: ((id)::text <> '-1'::text) {code} > > In the execution plan for Spark, the ID column is cast as an integer, while > in Redshift, the ID column is cast as a varchar. > Shouldn't Spark SQL handle this the same way as Redshift, using the datatype > of the ID column rathe