Re: [PR] [SPARK-45599][CORE] Use object equality in OpenHashSet [spark]

via GitHub Wed, 07 Feb 2024 13:54:03 -0800


nchammas commented on code in PR #45036:
URL: https://github.com/apache/spark/pull/45036#discussion_r1482123330



##########
core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala:
##########
@@ -269,4 +269,35 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers {
       assert(pos1 == pos2)
     }
   }
+
+  test("SPARK-45599: 0.0 and -0.0 are equal but not the same") {

Review Comment:
   Consider another interesting case where `java.util.HashSet` and `OpenHashSet` differ:
   
   ```scala
   scala> val h = new java.util.HashSet[Double]()
   val h: java.util.HashSet[Double] = []
   
   scala> h.add(Double.NaN)
   val res9: Boolean = true
   
   scala> h.add(Double.NaN)
   val res10: Boolean = false
   
   scala> h.contains(Double.NaN)
   val res11: Boolean = true
   
   scala> h.size()
   val res12: Int = 1
   ```
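
   `java.util.HashSet` behaves this way because it compares elements with `equals` and `hashCode` on the boxed `java.lang.Double`, which treat all NaN values as equal, whereas primitive `==` follows IEEE 754 and never considers NaN equal to itself. A minimal sketch of that difference, using only JDK calls (the names `DoubleEquality`, `primitiveEq`, and `boxedEq` are made up for illustration):

   ```scala
   object DoubleEquality {
     // Primitive comparison: IEEE 754 semantics.
     def primitiveEq(a: Double, b: Double): Boolean = a == b

     // Boxed comparison: java.lang.Double.equals compares doubleToLongBits,
     // which is the check java.util.HashSet relies on.
     def boxedEq(a: Double, b: Double): Boolean =
       java.lang.Double.valueOf(a).equals(java.lang.Double.valueOf(b))

     def main(args: Array[String]): Unit = {
       println(primitiveEq(Double.NaN, Double.NaN)) // false
       println(boxedEq(Double.NaN, Double.NaN))     // true
       println(primitiveEq(0.0, -0.0))              // true
       println(boxedEq(0.0, -0.0))                  // false
     }
   }
   ```

   The last two lines are the 0.0 / -0.0 case named in the test title: primitive comparison says the values are equal, boxed equality says they are different.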
   
   On `master`, `OpenHashSet` stores each NaN as a separate element and then fails to find it:
   
   ```scala
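   // OpenHashSet is private[spark], so run this from within Spark's own sources or tests.
   import org.apache.spark.util.collection.OpenHashSet

   val set = new OpenHashSet[Double]()
   set.add(Double.NaN)
   set.add(Double.NaN)
   set.size  // returns 2
   set.contains(Double.NaN)  // returns false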
   ```
   
   This `OpenHashSet` behavior could lead to a bug like the one reported in SPARK-45599 but in reverse, where a new NaN row is added rather than dropped. I will see if I can construct such a scenario as a demonstration. Regardless, I think the behavior is incorrect on its own.
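
   To illustrate how an IEEE 754 style equality check can produce exactly the `size == 2` / `contains == false` result, here is a toy linear-probing set. It is a hypothetical sketch, not Spark's `OpenHashSet` implementation: `ToySet`, `ToySetDemo`, and the `eq` parameter are invented for this example, and the set has a fixed capacity with no resizing.

   ```scala
   // Hypothetical toy set (NOT Spark's OpenHashSet): fixed capacity, linear probing.
   // The key-equality check is a parameter so both behaviors can be compared.
   // It must never be filled completely, or probing would not terminate.
   final class ToySet(eq: (Double, Double) => Boolean, capacity: Int = 16) {
     private val keys = new Array[Double](capacity)
     private val used = new Array[Boolean](capacity)
     private var count = 0

     // Index of the slot holding k, or of the first free slot on its probe path.
     private def slot(k: Double): Int = {
       var i = java.lang.Math.floorMod(java.lang.Double.hashCode(k), capacity)
       while (used(i) && !eq(keys(i), k)) i = (i + 1) % capacity
       i
     }

     def add(k: Double): Unit = {
       val i = slot(k)
       if (!used(i)) { keys(i) = k; used(i) = true; count += 1 }
     }

     def contains(k: Double): Boolean = used(slot(k))
     def size: Int = count
   }

   object ToySetDemo {
     val primitiveEq: (Double, Double) => Boolean = (a, b) => a == b
     val boxedEq: (Double, Double) => Boolean =
       (a, b) => java.lang.Double.valueOf(a).equals(java.lang.Double.valueOf(b))

     // Adds NaN twice and reports (size, contains(NaN)).
     def addNaNTwice(eq: (Double, Double) => Boolean): (Int, Boolean) = {
       val s = new ToySet(eq)
       s.add(Double.NaN)
       s.add(Double.NaN)
       (s.size, s.contains(Double.NaN))
     }

     def main(args: Array[String]): Unit = {
       println(addNaNTwice(primitiveEq)) // (2,false)
       println(addNaNTwice(boxedEq))     // (1,true)
     }
   }
   ```

   With primitive `==`, the probe never recognizes the stored NaN, so each `add` claims a new slot and `contains` walks past both copies to an empty slot. With boxed equality the toy set matches the `java.util.HashSet` result.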



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

