[jira] [Resolved] (SPARK-40629) FLOAT/DOUBLE division by 0 gives Infinity/-Infinity/NaN in DataFrame but NULL in SparkSQL

Hyukjin Kwon (Jira) Wed, 05 Oct 2022 18:52:08 -0700


     [ https://issues.apache.org/jira/browse/SPARK-40629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hyukjin Kwon resolved SPARK-40629.
----------------------------------
    Resolution: Invalid

> FLOAT/DOUBLE division by 0 gives Infinity/-Infinity/NaN in DataFrame but NULL 
> in SparkSQL
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-40629
>                 URL: https://issues.apache.org/jira/browse/SPARK-40629
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
>
> h3. Describe the bug
> Storing a FLOAT/DOUBLE value produced by division by 0 (e.g. {{( 1.0/0 ).floatValue()}}) 
> via {{spark-shell}} outputs {{Infinity}}. However, {{1.0/0}} ({{cast ( 1.0/0 as float)}}) 
> evaluates to {{NULL}} when the value is inserted into a FLOAT/DOUBLE column of a table 
> via {{spark-sql}}.
> h3. To Reproduce
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}:
> {code:java}
> $SPARK_HOME/bin/spark-sql{code}
> Execute the following:
> {code:java}
> spark-sql> create table float_vals(c1 float) stored as ORC;
> spark-sql> insert into float_vals select cast ( 1.0/0  as float);
> spark-sql> select * from float_vals;
> NULL{code}
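In the spark-sql run, the division is evaluated by Spark SQL itself. Here {{1.0}} is a DECIMAL literal, and with the default setting {{spark.sql.ansi.enabled=false}}, Spark SQL returns NULL when the divisor is zero. Casting that NULL to FLOAT keeps it NULL. The plain Scala sketch below models only that null-on-zero rule with {{Option}}. It uses Double for simplicity instead of DECIMAL. {{SqlStyleDivision}}, {{sqlStyleDivide}} and {{castToFloat}} are hypothetical names, not Spark APIs.

```scala
// Hypothetical model (not Spark code) of Spark SQL's default, non-ANSI
// division rule: a zero divisor yields NULL, represented here as None.
object SqlStyleDivision {
  def sqlStyleDivide(dividend: Double, divisor: Double): Option[Double] =
    if (divisor == 0.0) None else Some(dividend / divisor)

  // Mirrors `cast(x as float)`: NULL stays NULL, a value is narrowed to Float.
  def castToFloat(value: Option[Double]): Option[Float] = value.map(_.toFloat)

  def main(args: Array[String]): Unit = {
    val result = castToFloat(sqlStyleDivide(1.0, 0))
    // Render None as NULL, the way the spark-sql CLI displays a NULL value.
    println(result.map(_.toString).getOrElse("NULL")) // prints NULL
  }
}
```

The model returns {{NULL}} for {{1.0/0}} no matter which client submits the query. That matches the spark-sql result above.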
>  
> Using {{spark-shell}}:
> {code:java}
> $SPARK_HOME/bin/spark-shell{code}
> Execute the following:
> {code:java}
> scala> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.{Row, SparkSession}
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
> scala> val rdd = sc.parallelize(Seq(Row(( 1.0/0 ).floatValue())))
> rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[180] at parallelize at <console>:28
> scala> val schema = new StructType().add(StructField("c1", FloatType, true))
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,FloatType,true))
> scala> val df = spark.createDataFrame(rdd, schema)
> df: org.apache.spark.sql.DataFrame = [c1: float]
> scala> df.show(false)
> +---------+
> |c1       |
> +---------+
> |Infinity |
> +---------+
> {code}
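In the spark-shell run, {{( 1.0/0 ).floatValue()}} is ordinary Scala arithmetic. The JVM evaluates it before {{createDataFrame}} is called, so Spark only receives the already-computed Float. JVM floating-point division follows IEEE 754:
* a positive value divided by zero gives {{Infinity}};
* a negative value divided by zero gives {{-Infinity}};
* {{0.0/0}} gives {{NaN}}.

The minimal sketch below needs no Spark. {{JvmFloatDivision}} is an illustrative name. It uses {{.toFloat}}, which is equivalent here to the report's {{.floatValue()}}.

```scala
// Plain JVM (IEEE 754) floating-point division by zero, evaluated by Scala
// itself before any value would reach Spark.
object JvmFloatDivision {
  val positive: Float = (1.0 / 0).toFloat   // Double Infinity narrowed to Float
  val negative: Float = (-1.0 / 0).toFloat  // Double -Infinity narrowed to Float
  val notANumber: Float = (0.0 / 0).toFloat // 0/0 is NaN under IEEE 754

  def main(args: Array[String]): Unit = {
    println(positive)   // prints Infinity
    println(negative)   // prints -Infinity
    println(notANumber) // prints NaN
  }
}
```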
> h3. Expected behavior
> We expect the two Spark interfaces ({{spark-sql}} and {{spark-shell}}) to behave 
> consistently for the same data type, input, and configuration ({{FLOAT/DOUBLE}} 
> and {{1.0/0}}).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

