[
https://issues.apache.org/jira/browse/SPARK-51475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050171#comment-18050171
]
Albert Sugranyes commented on SPARK-51475:
------------------------------------------
I've opened SPARK-54918 with a fix that covers this issue and its equivalents
for other array operations.
PR: https://github.com/apache/spark/pull/53695
> ArrayDistinct Producing Inconsistent Behavior For -0.0 and +0.0
> ---------------------------------------------------------------
>
> Key: SPARK-51475
> URL: https://issues.apache.org/jira/browse/SPARK-51475
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0, 3.4.4, 3.5.5
> Reporter: Warrick He
> Priority: Major
> Labels: correctness
>
> This impacts array_distinct. This was tested on Spark versions 3.5.5, 3.5.0,
> and 3.4.4, but it likely affects all versions.
> Problem: inconsistent behavior for 0.0 and -0.0. See below (run on 3.5.5).
> I'm not sure what the desired behavior is: should Spark follow the IEEE 754
> standard and treat them as equal, returning only -0.0 or 0.0, or should it
> consider them distinct?
> {quote}>>> spark.createDataFrame([([0.0, 6.0, -0.0],)],
> ['values']).createOrReplaceTempView("tab")
> >>> spark.sql("select array_distinct(values) from tab").show()
> +----------------------+
> |array_distinct(values)|
> +----------------------+
> | [0.0, 6.0]|
> +----------------------+
>
> >>> spark.createDataFrame([([0.0, -0.0, 6.0],)],
> ['values']).createOrReplaceTempView("tab")
> >>> spark.sql("select array_distinct(values) from tab").show()
> +----------------------+
> |array_distinct(values)|
> +----------------------+
> | [0.0, -0.0, 6.0]|
> +----------------------+
> {quote}
> This issue could be related to the implementation of OpenHashSet.
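The OpenHashSet suspicion is plausible: IEEE 754 defines 0.0 == -0.0, yet the two zeros have different raw bit patterns, so a hash set keyed on raw bits (as Java's Double.doubleToRawLongBits exposes them) can treat them as distinct values, making the dedup result depend on insertion order. Below is a minimal Python sketch of that hazard and of a normalize-before-hash approach; the helper names are hypothetical illustrations, not Spark's actual OpenHashSet or array_distinct code.

```python
import struct

def bits(x: float) -> int:
    """Raw IEEE 754 bit pattern of a double (analogous to Java's
    Double.doubleToRawLongBits)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# IEEE 754 equality treats the two zeros as equal...
assert 0.0 == -0.0
# ...but their bit patterns differ: -0.0 has the sign bit set. A hash
# set keyed on raw bits can therefore see them as two distinct values.
assert bits(0.0) != bits(-0.0)
assert bits(-0.0) >> 63 == 1  # sign bit set for -0.0

def distinct_normalized(values):
    """Hypothetical dedup that folds -0.0 into +0.0 before hashing, so
    the result no longer depends on element order. Illustration only;
    NaN handling is omitted."""
    seen, out = set(), []
    for v in values:
        key = 0.0 if v == 0.0 else v  # normalize -0.0 -> +0.0
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out

# Both orderings now agree:
print(distinct_normalized([0.0, 6.0, -0.0]))  # [0.0, 6.0]
print(distinct_normalized([0.0, -0.0, 6.0]))  # [0.0, 6.0]
```

Normalizing floating-point zeros before they reach a bit-level hash structure is the same general idea behind Spark's floating-point normalization rules; the sketch above only demonstrates the mismatch, not the production fix.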
--
This message was sent by Atlassian Jira
(v8.20.10#820010)