[https://issues.apache.org/jira/browse/SPARK-53060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
ASF GitHub Bot updated SPARK-53060:
-----------------------------------
Labels: pull-request-available (was: )
> Aggregate followed by ORDER BY doesn't preserve orders
> ------------------------------------------------------
>
> Key: SPARK-53060
> URL: https://issues.apache.org/jira/browse/SPARK-53060
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 4.1.0, 3.5.6
> Reporter: Boxuan Li
> Priority: Major
> Labels: pull-request-available
>
> For this query:
> SELECT FIRST(val) FROM (SELECT val FROM t ORDER BY val)
> you would expect the Spark planner to choose SortAggregate, because FIRST
> depends on the input order established by the ORDER BY. Instead, Spark
> chooses HashAggregate. A demo is available in
> [https://github.com/apache/spark/pull/51768]
> In practice, the HashAggregate operator seems to preserve the order in this
> case, but this is not documented behavior. Choosing HashAggregate over
> SortAggregate therefore looks like a bug to me.
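FIRST is order-sensitive: it returns whichever value the aggregate happens to see first, so the query above is only well-defined if rows reach the aggregate in the order the subquery's ORDER BY produced. The Python sketch below is a simplified, hypothetical model of that dependence. The `first_value` helper is illustrative and is not Spark's implementation. Its `ignore_nulls` flag defaults to False, mirroring FIRST's default of not ignoring nulls.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_value(rows: Iterable[Optional[T]], ignore_nulls: bool = False) -> Optional[T]:
    """Hypothetical model of a FIRST aggregate: return the first value seen.

    Not Spark code; it only shows that the result depends on input order.
    """
    for v in rows:
        if v is None and ignore_nulls:
            continue  # skip nulls only when asked to
        return v
    return None  # empty input (or all nulls with ignore_nulls=True)


vals = [3, 1, 2]
print(first_value(sorted(vals)))  # ordered input, like ORDER BY val: prints 1
print(first_value(vals))          # unordered input: prints 3
```

The same rows give different answers depending only on arrival order, which is why an aggregate over an ordered child needs that order to be preserved rather than merely preserved by accident.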
--
This message was sent by Atlassian Jira (v8.20.10#820010)