[ 
https://issues.apache.org/jira/browse/SPARK-44240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-44240:
---------------------------
    Description: 
 
{code:java}
set spark.sql.execution.topKSortFallbackThreshold=10000;
SELECT min(id) FROM ( SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
{code}
 

If GlobalLimitExec is not the final operator, the shuffle read does not guarantee 
row order, so the rows the limit keeps may be arbitrary rather than the first 
10000 in sorted order.

TakeOrderedAndProjectExec preserves the ordering, so it does not have this problem.
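
The behaviour described above can be illustrated outside Spark with a small Python simulation. This is only a sketch under stated assumptions, not Spark's implementation: partitions are modelled as plain lists, the shuffle read is modelled as blocks arriving in arbitrary order, and the function names (sorted_partitions, limit_after_unordered_shuffle, top_k_with_ordering) are hypothetical.

```python
import heapq
import random


def sorted_partitions(num_rows, num_partitions):
    """Model a global sort: each partition is sorted, and partition i
    holds smaller values than partition i + 1."""
    size = num_rows // num_partitions
    return [list(range(i * size, (i + 1) * size)) for i in range(num_partitions)]


def limit_after_unordered_shuffle(partitions, n, rng):
    """Model a per-partition limit followed by a limit over a shuffle read
    whose blocks arrive in arbitrary order (assumption of this sketch)."""
    local = [p[:n] for p in partitions]   # per-partition limit
    rng.shuffle(local)                    # block arrival order is not guaranteed
    merged = [row for block in local for row in block]
    return merged[:n]                     # limit over the unordered stream


def top_k_with_ordering(partitions, n):
    """Model an ordering-aware top-k: keep the n smallest rows per partition,
    then the n smallest rows overall."""
    local = [heapq.nsmallest(n, p) for p in partitions]
    return heapq.nsmallest(n, (row for block in local for row in block))
```

With sorted_partitions(1000, 10) and n = 100, top_k_with_ordering always returns the rows 0 to 99, while limit_after_unordered_shuffle returns whichever block happens to arrive first, so the minimum of its result depends on arrival order.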

 

!topKSortFallbackThreshold.png!

 

 

  was:
 
{code:java}
set spark.sql.execution.topKSortFallbackThreshold=10000;
SELECT min(id) FROM ( SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
{code}
 

If GlobalLimitExec is not the final operator, the shuffle read does not guarantee 
row order, so the rows the limit keeps may be arbitrary rather than the first 
10000 in sorted order.

TakeOrderedAndProjectExec preserves the ordering, so it does not have this problem.

 

 

 


> Setting the topKSortFallbackThreshold value may lead to inaccurate results
> --------------------------------------------------------------------------
>
>                 Key: SPARK-44240
>                 URL: https://issues.apache.org/jira/browse/SPARK-44240
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0
>            Reporter: dzcxzl
>            Priority: Minor
>         Attachments: topKSortFallbackThreshold.png
>
>
>  
> {code:java}
> set spark.sql.execution.topKSortFallbackThreshold=10000;
> SELECT min(id) FROM ( SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
> {code}
>  
> If GlobalLimitExec is not the final operator, the shuffle read does not guarantee 
> row order, so the rows the limit keeps may be arbitrary rather than the first 
> 10000 in sorted order.
> TakeOrderedAndProjectExec preserves the ordering, so it does not have this problem.
>  
> !topKSortFallbackThreshold.png!
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
