[ https://issues.apache.org/jira/browse/SPARK-40629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-40629.
----------------------------------
    Resolution: Invalid

> FLOAT/DOUBLE division by 0 gives Infinity/-Infinity/NaN in DataFrame but NULL in SparkSQL
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-40629
>                 URL: https://issues.apache.org/jira/browse/SPARK-40629
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 3.2.1
>            Reporter: xsys
>            Priority: Major
>
> h3. Describe the bug
> Storing a FLOAT/DOUBLE value produced by division by 0 (e.g. {{( 1.0/0 ).floatValue()}}) via {{spark-shell}} outputs {{Infinity}}. However, {{1.0/0}} ({{cast ( 1.0/0 as float)}}) evaluates to {{NULL}} when the value is inserted into a FLOAT/DOUBLE column of a table via {{spark-sql}}.
> h3. To Reproduce
> On Spark 3.2.1 (commit {{4f25b3f712}}), using {{spark-sql}}:
> {code:java}
> $SPARK_HOME/bin/spark-sql{code}
> Execute the following:
> {code:java}
> spark-sql> create table float_vals(c1 float) stored as ORC;
> spark-sql> insert into float_vals select cast ( 1.0/0 as float);
> spark-sql> select * from float_vals;
> NULL{code}
>
> Using {{spark-shell}}:
> {code:java}
> $SPARK_HOME/bin/spark-shell{code}
> Execute the following:
> {code:java}
> scala> import org.apache.spark.sql.{Row, SparkSession}
> import org.apache.spark.sql.{Row, SparkSession}
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
> scala> val rdd = sc.parallelize(Seq(Row(( 1.0/0 ).floatValue())))
> rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = ParallelCollectionRDD[180] at parallelize at <console>:28
> scala> val schema = new StructType().add(StructField("c1", FloatType, true))
> schema: org.apache.spark.sql.types.StructType = StructType(StructField(c1,FloatType,true))
> scala> val df = spark.createDataFrame(rdd, schema)
> df: org.apache.spark.sql.DataFrame = [c1: float]
> scala> df.show(false)
> +---------+
> |c1       |
> +---------+
> |Infinity |
> +---------+
> {code}
> h3. Expected behavior
> We expect the two Spark interfaces ({{spark-sql}} & {{spark-shell}}) to behave consistently for the same data type & input combination & configuration ({{FLOAT/DOUBLE}} and {{1.0/0}}).
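
For context on the {{spark-shell}} half of the report: the expression {{( 1.0/0 ).floatValue()}} is evaluated by Scala on the JVM before the value is handed to Spark, so it follows ordinary IEEE 754 floating-point rules. The following is a minimal sketch in plain Scala with no Spark dependency; {{DivisionByZeroSketch}} and {{divideAsFloat}} are hypothetical names introduced only for illustration, and {{toFloat}} stands in for the {{floatValue()}} call used above.

```scala
// Plain Scala, no Spark required. The division below runs on the JVM
// and follows IEEE 754 rules; nothing here is evaluated by Spark SQL.
object DivisionByZeroSketch {

  // Hypothetical helper mirroring the expression used in the report:
  // divide two doubles, then narrow the result to a Float.
  def divideAsFloat(numerator: Double, denominator: Double): Float =
    (numerator / denominator).toFloat

  def main(args: Array[String]): Unit = {
    val positive = divideAsFloat(1.0, 0.0)   // positive / zero
    val negative = divideAsFloat(-1.0, 0.0)  // negative / zero
    val undefined = divideAsFloat(0.0, 0.0)  // zero / zero

    println(s"1.0/0.0 is positive infinity: ${positive.isPosInfinity}")
    println(s"-1.0/0.0 is negative infinity: ${negative.isNegInfinity}")
    println(s"0.0/0.0 is NaN: ${undefined.isNaN}")
  }
}
```

In the {{spark-sql}} reproduction, by contrast, the division inside {{cast ( 1.0/0 as float)}} is evaluated by Spark's SQL engine rather than by the JVM arithmetic shown here, so the two reproductions do not exercise the same operation.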