[ https://issues.apache.org/jira/browse/SPARK-33260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-33260:
----------------------------------
    Labels: correctness  (was: )

> SortExec produces incorrect results if sortOrder is a Stream
> ------------------------------------------------------------
>
>                 Key: SPARK-33260
>                 URL: https://issues.apache.org/jira/browse/SPARK-33260
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.0.1
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>            Priority: Major
>              Labels: correctness
>             Fix For: 3.0.2, 3.1.0
>
>
> The following query produces incorrect results. The query has two essential 
> features: (1) it contains a string aggregate, resulting in a {{SortExec}} 
> node, and (2) it contains a duplicate grouping key, causing 
> {{RemoveRepetitionFromGroupExpressions}} to produce a sort order stored as a 
> Stream.
> SELECT bigint_col_1, bigint_col_9, MAX(CAST(bigint_col_1 AS string))
> FROM table_4
> GROUP BY bigint_col_1, bigint_col_9, bigint_col_9
> When the sort order is stored as a {{Stream}}, the line 
> {{ordering.map(_.child.genCode(ctx))}} in 
> {{GenerateOrdering#createOrderKeys()}} has unpredictable side effects on 
> {{ctx}}. This is because {{genCode(ctx)}} modifies {{ctx}}. When {{ordering}} 
> is a {{Stream}}, these modifications will not all happen immediately as 
> intended; instead, they occur lazily when the returned {{Stream}} is traversed later.
> Similar bugs have occurred at least three times in the past: 
> https://issues.apache.org/jira/browse/SPARK-24500, 
> https://issues.apache.org/jira/browse/SPARK-25767, 
> https://issues.apache.org/jira/browse/SPARK-26680.
> The fix is to check if {{ordering}} is a {{Stream}} and force the 
> modifications to happen immediately if so.
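A minimal Scala sketch of the lazy behavior and of the fix described above, assuming Scala 2.12 (the default for Spark 3.0). `Ctx` and `Expr` are hypothetical stand-ins for Spark's `CodegenContext` and an expression's `genCode(ctx)`, and `lazyOrderKeys`/`eagerOrderKeys` are illustrative names; this is not Spark's actual patch. A Scala `Stream` evaluates its head eagerly and its tail lazily, so mapping over it runs `genCode` for the first element only, while forcing the mapped `Stream` runs the remaining calls immediately.

```scala
import scala.collection.mutable.ArrayBuffer

object StreamGenCodeSketch {
  // Hypothetical stand-in for Spark's CodegenContext: it only records what was generated.
  final class Ctx { val log: ArrayBuffer[String] = ArrayBuffer.empty[String] }

  // Hypothetical stand-in for an expression whose genCode(ctx) mutates ctx.
  final case class Expr(name: String) {
    def genCode(ctx: Ctx): String = { ctx.log += name; s"code($name)" }
  }

  // Mirrors ordering.map(_.child.genCode(ctx)): on a Stream, only the head is mapped here;
  // the other genCode calls are deferred until the result is traversed.
  def lazyOrderKeys(ordering: Seq[Expr], ctx: Ctx): Seq[String] =
    ordering.map(_.genCode(ctx))

  // Sketch of the fix: if the mapped result is a Stream, force it so that every
  // genCode side effect on ctx happens before this method returns.
  def eagerOrderKeys(ordering: Seq[Expr], ctx: Ctx): Seq[String] = {
    val keys = ordering.map(_.genCode(ctx))
    keys match {
      case s: Stream[_] => s.force // evaluates all remaining elements now
      case _            => ()      // strict collections were already fully mapped
    }
    keys
  }

  def main(args: Array[String]): Unit = {
    val ordering = Stream(Expr("a"), Expr("b"), Expr("c"))

    val ctx1 = new Ctx
    lazyOrderKeys(ordering, ctx1)
    println(ctx1.log.size) // 1: the calls for "b" and "c" are still pending

    val ctx2 = new Ctx
    eagerOrderKeys(ordering, ctx2)
    println(ctx2.log.size) // 3
  }
}
```

Converting `ordering` to a strict collection such as a `List` before mapping would have the same effect; forcing only when a `Stream` is detected leaves other `Seq` inputs unchanged.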



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
