subject:"\[jira\] \[Updated\] \(SPARK\-48652\) Casting Issue in Spark SQL\: String Column Compared to Integer Value Yields Empty Results"

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

2024-07-07 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48652:
---
Labels: newbie pull-request-available  (was: newbie)

> Casting Issue in Spark SQL: String Column Compared to Integer Value Yields 
> Empty Results
> 
>
> Key: SPARK-48652
> URL: https://issues.apache.org/jira/browse/SPARK-48652
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.3.2
>Reporter: Abhishek Singh
>Priority: Blocker
>  Labels: newbie, pull-request-available
>
> In Spark SQL, comparing a string column to an integer value can lead to 
> unexpected results due to type casting resulting in an empty result set.
> {code:java}
> case class Person(id: String, name: String)
> val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF()
> personDF.createOrReplaceTempView("person_ddf")
> val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
> val resultDF = spark.sql(sqlQuery)
> resultDF.show() // Empty result due to type casting issue 
> {code}
> Below is the logical and physical plan which I m getting
> {code:java}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter NOT ('id = -1)
>+- 'UnresolvedRelation [person_ddf], [], false
> == Analyzed Logical Plan ==
> id: string, name: string
> Project [id#356, name#357]
> +- Filter NOT (cast(id#356 as int) = -1)
>+- SubqueryAlias person_ddf
>   +- View (`person_ddf`, [id#356,name#357])
>  +- LocalRelation [id#356, name#357]{code}
> *But when I m using the same query and table in Redshift which is based on 
> PostGreSQL. I am getting the desired result.*
> {code:java}
> select * from person where id <> -1; {code}
> Explain plan obtained in Redshift.
> {code:java}
> XN Seq Scan on person  (cost=0.00..0.03 rows=1 width=336)
>   Filter: ((id)::text <> '-1'::text) {code}
>  
> In the execution plan for Spark, the ID column is cast as an integer, while 
> in Redshift, the ID column is cast as a varchar.
> Shouldn't Spark SQL handle this the same way as Redshift, using the datatype 
> of the ID column rather than the datatype of -1?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

2024-06-22 Thread Abhishek Singh (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh updated SPARK-48652:
---
Issue Type: Bug  (was: Brainstorming)

> Casting Issue in Spark SQL: String Column Compared to Integer Value Yields 
> Empty Results
> 
>
> Key: SPARK-48652
> URL: https://issues.apache.org/jira/browse/SPARK-48652
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.3.2
>Reporter: Abhishek Singh
>Priority: Blocker
>  Labels: newbie
>
> In Spark SQL, comparing a string column to an integer value can lead to 
> unexpected results due to type casting resulting in an empty result set.
> {code:java}
> case class Person(id: String, name: String)
> val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF()
> personDF.createOrReplaceTempView("person_ddf")
> val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
> val resultDF = spark.sql(sqlQuery)
> resultDF.show() // Empty result due to type casting issue 
> {code}
> Below is the logical and physical plan which I m getting
> {code:java}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter NOT ('id = -1)
>+- 'UnresolvedRelation [person_ddf], [], false
> == Analyzed Logical Plan ==
> id: string, name: string
> Project [id#356, name#357]
> +- Filter NOT (cast(id#356 as int) = -1)
>+- SubqueryAlias person_ddf
>   +- View (`person_ddf`, [id#356,name#357])
>  +- LocalRelation [id#356, name#357]{code}
> *But when I m using the same query and table in Redshift which is based on 
> PostGreSQL. I am getting the desired result.*
> {code:java}
> select * from person where id <> -1; {code}
> Explain plan obtained in Redshift.
> {code:java}
> XN Seq Scan on person  (cost=0.00..0.03 rows=1 width=336)
>   Filter: ((id)::text <> '-1'::text) {code}
>  
> In the execution plan for Spark, the ID column is cast as an integer, while 
> in Redshift, the ID column is cast as a varchar.
> Shouldn't Spark SQL handle this the same way as Redshift, using the datatype 
> of the ID column rather than the datatype of -1?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

2024-06-20 Thread Abhishek Singh (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh updated SPARK-48652:
---
Labels: newbie  (was: )

> Casting Issue in Spark SQL: String Column Compared to Integer Value Yields 
> Empty Results
> 
>
> Key: SPARK-48652
> URL: https://issues.apache.org/jira/browse/SPARK-48652
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Spark Core, SQL
>Affects Versions: 3.3.2
>Reporter: Abhishek Singh
>Priority: Blocker
>  Labels: newbie
>
> In Spark SQL, comparing a string column to an integer value can lead to 
> unexpected results due to type casting resulting in an empty result set.
> {code:java}
> case class Person(id: String, name: String)
> val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF()
> personDF.createOrReplaceTempView("person_ddf")
> val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
> val resultDF = spark.sql(sqlQuery)
> resultDF.show() // Empty result due to type casting issue 
> {code}
> Below is the logical and physical plan which I m getting
> {code:java}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter NOT ('id = -1)
>+- 'UnresolvedRelation [person_ddf], [], false
> == Analyzed Logical Plan ==
> id: string, name: string
> Project [id#356, name#357]
> +- Filter NOT (cast(id#356 as int) = -1)
>+- SubqueryAlias person_ddf
>   +- View (`person_ddf`, [id#356,name#357])
>  +- LocalRelation [id#356, name#357]{code}
> *But when I m using the same query and table in Redshift which is based on 
> PostGreSQL. I am getting the desired result.*
> {code:java}
> select * from person where id <> -1; {code}
> Explain plan obtained in Redshift.
> {code:java}
> XN Seq Scan on person  (cost=0.00..0.03 rows=1 width=336)
>   Filter: ((id)::text <> '-1'::text) {code}
>  
> In the execution plan for Spark, the ID column is cast as an integer, while 
> in Redshift, the ID column is cast as a varchar.
> Shouldn't Spark SQL handle this the same way as Redshift, using the datatype 
> of the ID column rather than the datatype of -1?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

2024-06-20 Thread Abhishek Singh (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh updated SPARK-48652:
---
Issue Type: Brainstorming  (was: Question)

> Casting Issue in Spark SQL: String Column Compared to Integer Value Yields 
> Empty Results
> 
>
> Key: SPARK-48652
> URL: https://issues.apache.org/jira/browse/SPARK-48652
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Spark Core, SQL
>Affects Versions: 3.3.2
>Reporter: Abhishek Singh
>Priority: Minor
>
> In Spark SQL, comparing a string column to an integer value can lead to 
> unexpected results due to type casting resulting in an empty result set.
> {code:java}
> case class Person(id: String, name: String)
> val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF()
> personDF.createOrReplaceTempView("person_ddf")
> val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
> val resultDF = spark.sql(sqlQuery)
> resultDF.show() // Empty result due to type casting issue 
> {code}
> Below is the logical and physical plan which I m getting
> {code:java}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter NOT ('id = -1)
>+- 'UnresolvedRelation [person_ddf], [], false
> == Analyzed Logical Plan ==
> id: string, name: string
> Project [id#356, name#357]
> +- Filter NOT (cast(id#356 as int) = -1)
>+- SubqueryAlias person_ddf
>   +- View (`person_ddf`, [id#356,name#357])
>  +- LocalRelation [id#356, name#357]{code}
> *But when I m using the same query and table in Redshift which is based on 
> PostGreSQL. I am getting the desired result.*
> {code:java}
> select * from person where id <> -1; {code}
> Explain plan obtained in Redshift.
> {code:java}
> XN Seq Scan on person  (cost=0.00..0.03 rows=1 width=336)
>   Filter: ((id)::text <> '-1'::text) {code}
>  
> In the execution plan for Spark, the ID column is cast as an integer, while 
> in Redshift, the ID column is cast as a varchar.
> Shouldn't Spark SQL handle this the same way as Redshift, using the datatype 
> of the ID column rather than the datatype of -1?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

2024-06-20 Thread Abhishek Singh (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh updated SPARK-48652:
---
Priority: Blocker  (was: Minor)

> Casting Issue in Spark SQL: String Column Compared to Integer Value Yields 
> Empty Results
> 
>
> Key: SPARK-48652
> URL: https://issues.apache.org/jira/browse/SPARK-48652
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Spark Core, SQL
>Affects Versions: 3.3.2
>Reporter: Abhishek Singh
>Priority: Blocker
>
> In Spark SQL, comparing a string column to an integer value can lead to 
> unexpected results due to type casting resulting in an empty result set.
> {code:java}
> case class Person(id: String, name: String)
> val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF()
> personDF.createOrReplaceTempView("person_ddf")
> val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
> val resultDF = spark.sql(sqlQuery)
> resultDF.show() // Empty result due to type casting issue 
> {code}
> Below is the logical and physical plan which I m getting
> {code:java}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter NOT ('id = -1)
>+- 'UnresolvedRelation [person_ddf], [], false
> == Analyzed Logical Plan ==
> id: string, name: string
> Project [id#356, name#357]
> +- Filter NOT (cast(id#356 as int) = -1)
>+- SubqueryAlias person_ddf
>   +- View (`person_ddf`, [id#356,name#357])
>  +- LocalRelation [id#356, name#357]{code}
> *But when I m using the same query and table in Redshift which is based on 
> PostGreSQL. I am getting the desired result.*
> {code:java}
> select * from person where id <> -1; {code}
> Explain plan obtained in Redshift.
> {code:java}
> XN Seq Scan on person  (cost=0.00..0.03 rows=1 width=336)
>   Filter: ((id)::text <> '-1'::text) {code}
>  
> In the execution plan for Spark, the ID column is cast as an integer, while 
> in Redshift, the ID column is cast as a varchar.
> Shouldn't Spark SQL handle this the same way as Redshift, using the datatype 
> of the ID column rather than the datatype of -1?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

2024-06-18 Thread Abhishek Singh (Jira)



 [ 
https://issues.apache.org/jira/browse/SPARK-48652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Singh updated SPARK-48652:
---
Description: 
In Spark SQL, comparing a string column to an integer value can lead to 
unexpected results due to type casting resulting in an empty result set.
{code:java}
case class Person(id: String, name: String)
val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF()
personDF.createOrReplaceTempView("person_ddf")
val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
val resultDF = spark.sql(sqlQuery)
resultDF.show() // Empty result due to type casting issue 

{code}
Below is the logical and physical plan which I m getting
{code:java}
== Parsed Logical Plan ==
'Project [*]
+- 'Filter NOT ('id = -1)
   +- 'UnresolvedRelation [person_ddf], [], false

== Analyzed Logical Plan ==
id: string, name: string
Project [id#356, name#357]
+- Filter NOT (cast(id#356 as int) = -1)
   +- SubqueryAlias person_ddf
  +- View (`person_ddf`, [id#356,name#357])
 +- LocalRelation [id#356, name#357]{code}

*But when I m using the same query and table in Redshift which is based on 
PostGreSQL. I am getting the desired result.*


{code:java}
select * from person where id <> -1; {code}

Explain plan obtained in Redshift.


{code:java}
XN Seq Scan on person  (cost=0.00..0.03 rows=1 width=336)
  Filter: ((id)::text <> '-1'::text) {code}
 

In the execution plan for Spark, the ID column is cast as an integer, while in 
Redshift, the ID column is cast as a varchar.

Shouldn't Spark SQL handle this the same way as Redshift, using the datatype of 
the ID column rather than the datatype of -1?

 

  was:
In Spark SQL, comparing a string column to an integer value can lead to 
unexpected results due to implicit type casting. When a string column is 
compared to an integer, Spark attempts to cast the strings to integers, which 
fails for non-numeric strings, resulting in an empty result set.


{code:java}
case class Person(id: String, name: String)
val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF()
personDF.createOrReplaceTempView("person_ddf")
val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
val resultDF = spark.sql(sqlQuery)
resultDF.show() // Empty result due to type casting issue 

{code}
Below is the logical and physical plan which I m getting
{code:java}
== Parsed Logical Plan ==
'Project [*]
+- 'Filter NOT ('id = -1)
   +- 'UnresolvedRelation [person_ddf], [], false

== Analyzed Logical Plan ==
id: string, name: string
Project [id#356, name#357]
+- Filter NOT (cast(id#356 as int) = -1)
   +- SubqueryAlias person_ddf
      +- View (`person_ddf`, [id#356,name#357])
         +- LocalRelation [id#356, name#357]

== Optimized Logical Plan ==
LocalRelation , [id#356, name#357]

== Physical Plan ==
LocalTableScan , [id#356, name#357]

== Physical Plan ==
LocalTableScan (1) {code}


> Casting Issue in Spark SQL: String Column Compared to Integer Value Yields 
> Empty Results
> 
>
> Key: SPARK-48652
> URL: https://issues.apache.org/jira/browse/SPARK-48652
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, SQL
>Affects Versions: 3.3.2
>Reporter: Abhishek Singh
>Priority: Minor
>
> In Spark SQL, comparing a string column to an integer value can lead to 
> unexpected results due to type casting resulting in an empty result set.
> {code:java}
> case class Person(id: String, name: String)
> val personDF = Seq(Person("a", "amit"), Person("b", "abhishek")).toDF()
> personDF.createOrReplaceTempView("person_ddf")
> val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
> val resultDF = spark.sql(sqlQuery)
> resultDF.show() // Empty result due to type casting issue 
> {code}
> Below is the logical and physical plan which I m getting
> {code:java}
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter NOT ('id = -1)
>+- 'UnresolvedRelation [person_ddf], [], false
> == Analyzed Logical Plan ==
> id: string, name: string
> Project [id#356, name#357]
> +- Filter NOT (cast(id#356 as int) = -1)
>+- SubqueryAlias person_ddf
>   +- View (`person_ddf`, [id#356,name#357])
>  +- LocalRelation [id#356, name#357]{code}
> *But when I m using the same query and table in Redshift which is based on 
> PostGreSQL. I am getting the desired result.*
> {code:java}
> select * from person where id <> -1; {code}
> Explain plan obtained in Redshift.
> {code:java}
> XN Seq Scan on person  (cost=0.00..0.03 rows=1 width=336)
>   Filter: ((id)::text <> '-1'::text) {code}
>  
> In the execution plan for Spark, the ID column is cast as an integer, while 
> in Redshift, the ID column is cast as a varchar.
> Shouldn't Spark SQL handle this the same way as Redshift, using the datatype 
> of the ID column rathe

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

[jira] [Updated] (SPARK-48652) Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Results

6 matches

Site Navigation

Mail list logo

Footer information