[ 
https://issues.apache.org/jira/browse/SPARK-14051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-14051:
----------------------------------
    Description: 
Since SPARK-9079 and SPARK-9145, `NaN = NaN` returns true and works well. The 
only exception is a direct comparison between `Row(Float.NaN)` and 
`Row(Double.NaN)`. In the following example, the last two expressions should 
return *true* and *List([NaN])* for consistency.

{code}
scala> Seq((1d,1f),(Double.NaN,Float.NaN)).toDF("a","b").registerTempTable("tmp")

scala> sql("select a,b,a=b from tmp").collect()
res1: Array[org.apache.spark.sql.Row] = Array([1.0,1.0,true], [NaN,NaN,true])

scala> val row_a = sql("select a from tmp").collect()
row_a: Array[org.apache.spark.sql.Row] = Array([1.0], [NaN])

scala> val row_b = sql("select b from tmp").collect()
row_b: Array[org.apache.spark.sql.Row] = Array([1.0], [NaN])

scala> row_a(0) == row_b(0)
res2: Boolean = true

scala> List(row_a(0),row_b(0)).distinct
res3: List[org.apache.spark.sql.Row] = List([1.0])

scala> row_a(1) == row_b(1)
res4: Boolean = false

scala> List(row_a(1),row_b(1)).distinct
res5: List[org.apache.spark.sql.Row] = List([NaN], [NaN])
{code}

Please note the following background facts as of today.
* Double.NaN != Double.NaN (Scala/Java/IEEE Standard)
* Float.NaN != Float.NaN (Scala/Java/IEEE Standard)
* Double.NaN != Float.NaN (Scala/Java/IEEE Standard)
* Row(Double.NaN) == Row(Double.NaN)
* Row(Float.NaN) == Row(Float.NaN)
* *Row(Double.NaN) != Row(Float.NaN)*  <== The problem of this issue.
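
The list above can be illustrated without Spark. The following is only a minimal sketch in plain Scala, not Spark's actual `Row.equals` implementation: the object `NaNEquality` and its methods `isNaN`, `fieldEquals`, and `rowEquals` are hypothetical names introduced here, and a row is modelled as a plain `Seq[Any]`. It shows one way a field-level comparison could treat a Float NaN and a Double NaN as equal while leaving all other comparisons unchanged.

```scala
// Hypothetical helper for illustration; NOT Spark's Row.equals implementation.
object NaNEquality {
  /** True when the value is a Float NaN or a Double NaN. */
  def isNaN(v: Any): Boolean = v match {
    case d: Double => d.isNaN
    case f: Float  => f.isNaN
    case _         => false
  }

  /** Field-level equality in which any NaN equals any other NaN,
    * regardless of whether it is stored as a Float or a Double.
    * Non-NaN values fall back to Scala's ordinary `==`. */
  def fieldEquals(a: Any, b: Any): Boolean =
    if (isNaN(a) || isNaN(b)) isNaN(a) && isNaN(b)
    else a == b

  /** Compares two rows (modelled as Seq[Any]) field by field. */
  def rowEquals(a: Seq[Any], b: Seq[Any]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => fieldEquals(x, y) }
}
```

Under these assumptions, `NaNEquality.rowEquals(Seq(Double.NaN), Seq(Float.NaN))` evaluates to true, whereas the primitive comparison `Double.NaN == Double.NaN` still evaluates to false as IEEE 754 requires. Note that `distinct` also depends on hashing in some collection implementations, so a complete fix would need a hash code consistent with this equality; the sketch does not cover that.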

> Implement `Double.NaN==Float.NaN` for consistency
> -------------------------------------------------
>
>                 Key: SPARK-14051
>                 URL: https://issues.apache.org/jira/browse/SPARK-14051
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Dongjoon Hyun
>            Priority: Minor


