[jira] [Updated] (SPARK-53060) Aggregate followed by ORDER BY doesn't preserve orders

ASF GitHub Bot (Jira) Thu, 31 Jul 2025 22:39:18 -0700


     [ https://issues.apache.org/jira/browse/SPARK-53060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


ASF GitHub Bot updated SPARK-53060:
-----------------------------------
    Labels: pull-request-available  (was: )

> Aggregate followed by ORDER BY doesn't preserve orders
> ------------------------------------------------------
>
>                 Key: SPARK-53060
>                 URL: https://issues.apache.org/jira/browse/SPARK-53060
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 4.1.0, 3.5.6
>            Reporter: Boxuan Li
>            Priority: Major
>              Labels: pull-request-available
>
> For this query:
>   SELECT FIRST(val) FROM (SELECT val FROM t ORDER BY val)
> you would expect the Spark planner to choose SortAggregate, because the 
> aggregate needs to preserve the order produced by the subquery. Instead, 
> Spark chooses HashAggregate. A demo is available at 
> [https://github.com/apache/spark/pull/51768]
> Although the HashAggregate operator appears to preserve the order in 
> practice in this case, that behavior is not documented. Choosing 
> HashAggregate over SortAggregate looks like a bug to me.
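
To illustrate why the order matters for FIRST, here is a minimal toy model in plain Python. It is not Spark code and makes no claim about how Spark's HashAggregate or SortAggregate actually behave; the helper names (partial_first, final_first) and the three-partition layout are assumptions made up for this sketch. It only shows that a "first value" aggregate computed in two stages returns a different answer when the per-partition results reach the final stage in a different order.

```python
def partial_first(partition):
    """Hypothetical partial aggregate: first value seen in one partition."""
    return partition[0] if partition else None


def final_first(partials):
    """Hypothetical final aggregate: first non-empty partial, in arrival order."""
    for value in partials:
        if value is not None:
            return value
    return None


# Assumed layout: rows already sorted by val and split into ordered partitions,
# roughly what a global ORDER BY would hand to the aggregate.
partitions = [[1, 2], [3, 4], [5, 6]]
partials = [partial_first(p) for p in partitions]

# Partials arrive in partition order: the result is the smallest value.
in_order = final_first(partials)

# Partials arrive in a different order: the result changes.
reordered = final_first(list(reversed(partials)))

print(in_order, reordered)
```

In this toy model the answer matches the ORDER BY only when the final stage sees the partial results in partition order, which is the guarantee the issue says is undocumented for HashAggregate.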



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

