[ https://issues.apache.org/jira/browse/SPARK-33260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-33260:
----------------------------------
    Labels: correctness  (was: )

> SortExec produces incorrect results if sortOrder is a Stream
> ------------------------------------------------------------
>
>                 Key: SPARK-33260
>                 URL: https://issues.apache.org/jira/browse/SPARK-33260
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.0.1
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>            Priority: Major
>              Labels: correctness
>             Fix For: 3.0.2, 3.1.0
>
>
> The following query produces incorrect results. The query has two essential
> features: (1) it contains a string aggregate, resulting in a {{SortExec}}
> node, and (2) it contains a duplicate grouping key, causing
> {{RemoveRepetitionFromGroupExpressions}} to produce a sort order stored as a
> Stream.
>
>     SELECT bigint_col_1, bigint_col_9, MAX(CAST(bigint_col_1 AS string))
>     FROM table_4
>     GROUP BY bigint_col_1, bigint_col_9, bigint_col_9
>
> When the sort order is stored as a {{Stream}}, the line
> {{ordering.map(_.child.genCode(ctx))}} in
> {{GenerateOrdering#createOrderKeys()}} produces unpredictable side effects to
> {{ctx}}. This is because {{genCode(ctx)}} modifies {{ctx}}. When {{ordering}}
> is a {{Stream}}, the modifications will not happen immediately as intended,
> but will instead occur lazily when the returned {{Stream}} is used later.
>
> Similar bugs have occurred at least three times in the past:
> https://issues.apache.org/jira/browse/SPARK-24500,
> https://issues.apache.org/jira/browse/SPARK-25767,
> https://issues.apache.org/jira/browse/SPARK-26680.
>
> The fix is to check if {{ordering}} is a {{Stream}} and force the
> modifications to happen immediately if so.
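
The lazy-evaluation hazard described above can be reproduced outside Spark. The following is a minimal sketch, not the actual patch: Ctx, genCode, strict and run are hypothetical stand-ins invented for illustration (Ctx plays the role of the codegen context, genCode the role of a call that both returns a value and mutates that context). It assumes the Scala 2.12 standard library, where Stream is not deprecated.

```scala
import scala.collection.mutable.ArrayBuffer

object StreamSideEffects {
  // Hypothetical stand-in for a codegen context: it only records side effects in order.
  final class Ctx {
    val emitted: ArrayBuffer[String] = ArrayBuffer.empty[String]
  }

  // Hypothetical stand-in for genCode(ctx): returns a value and also mutates ctx.
  def genCode(name: String, ctx: Ctx): String = {
    ctx.emitted += s"decl_$name"
    s"code_$name"
  }

  // Materialize a Stream into a strict List; leave other sequences untouched.
  def strict[A](xs: Seq[A]): Seq[A] = xs match {
    case _: Stream[_] => xs.toList
    case _            => xs
  }

  // Map with side effects, run a later step, then consume the result.
  // Returns the order in which ctx was mutated.
  def run(ordering: Seq[String]): List[String] = {
    val ctx = new Ctx
    val codes = ordering.map(genCode(_, ctx))
    ctx.emitted += "later_step" // a step that assumes all genCode effects already happened
    codes.foreach(_ => ())      // consuming the result forces any deferred elements
    ctx.emitted.toList
  }

  def main(args: Array[String]): Unit = {
    val ordering: Seq[String] = Stream("a", "b", "c")
    println(run(ordering))         // later_step is not last: some effects were deferred
    println(run(strict(ordering))) // List(decl_a, decl_b, decl_c, later_step)
  }
}
```

With the Stream input, at least some of the genCode side effects land after later_step, which is the kind of reordering the description attributes to {{ordering.map(_.child.genCode(ctx))}}. Passing the sequence through strict first restores the intended order.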