cloud-fan commented on code in PR #53695:
URL: https://github.com/apache/spark/pull/53695#discussion_r3288907028
##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala:
##########
@@ -3111,4 +3111,38 @@ class CollectionExpressionsSuite
a, Literal(5), Literal.create("q", StringType)), Seq("b", "a", "c",
null, "q")
)
}
+
+ test("SPARK-54918: array operations should normalize -0.0 to 0.0") {
Review Comment:
Regardless of which approach we land on, the test needs to be tightened — it
currently doesn't verify what the PR claims.
`ExpressionEvalHelper.checkResult` compares Doubles with `==`
(`ExpressionEvalHelper.scala:146`), and `0.0 == -0.0` is `true` per IEEE 754.
So:
```scala
checkEvaluation(ArrayDistinct(ad), Seq(0.0d, 1.0d))
```
passes whether the output is `[0.0, 1.0]` (the fix worked) or `[-0.0, 1.0]`
(dedup happened, but the *normalized* form wasn't preserved). The test only
verifies the count is right.
The PR description explicitly frames the user-facing behavior as "Returns
`[0.0, 1.0]` instead of `[0.0, -0.0, 1.0]`" — the half that the survivor is
`0.0` (not `-0.0`) is what the IEEE-754 bit-distinction actually matters for
downstream consumers, and it's the half that isn't asserted.
Suggested approach — use `Double.doubleToRawLongBits` (or `1.0 / x` →
`Double.PositiveInfinity`) to check the bits, e.g.
```scala
def isPositiveZero(d: Double): Boolean =
java.lang.Double.doubleToRawLongBits(d) == 0L
```
and assert that against each surviving zero in the result.
Please also add:
- `array_distinct(array(-0.0d, 0.0d))` — verifies that even when `-0.0`
appears first, the result is normalized to `0.0` (this is the property
downstream operators relying on bit equality actually care about).
- `array_except(array(-0.0d), array(0.0d))` and the symmetric
`array_except(array(0.0d), array(-0.0d))` — confirms both sides normalize
before the bit-level set difference.
- `array_distinct(array(Double.NaN, 0.0d, -0.0d, Double.NaN))` — NaN
handling is a separate code path and the interaction with the new normalization
is worth one assertion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]