[ https://issues.apache.org/jira/browse/SPARK-44240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dzcxzl updated SPARK-44240:
---------------------------
    Description:
{code:java}
set spark.sql.execution.topKSortFallbackThreshold=10000;
SELECT min(id) FROM ( SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
{code}

If GlobalLimitExec is not the final operator and is preceded by a sort operator, the shuffle read does not guarantee row order, so the rows the limit reads may be arbitrary. TakeOrderedAndProjectExec preserves the ordering, so it does not have this problem.

!topKSortFallbackThreshold.png!

{code:java}
set spark.sql.execution.topKSortFallbackThreshold=10000;
select min(id) from (select id from range(999999999) order by id desc limit 10000) a;
{code}

!topKSortFallbackThresholdDesc.png!

  was:
{code:java}
set spark.sql.execution.topKSortFallbackThreshold=10000;
SELECT min(id) FROM ( SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
{code}

If GlobalLimitExec is not the final operator and is preceded by a sort operator, the shuffle read does not guarantee row order, so the rows the limit reads may be arbitrary. TakeOrderedAndProjectExec preserves the ordering, so it does not have this problem.

!topKSortFallbackThreshold.png!

> Setting the topKSortFallbackThreshold value may lead to inaccurate results
> --------------------------------------------------------------------------
>
>                 Key: SPARK-44240
>                 URL: https://issues.apache.org/jira/browse/SPARK-44240
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0
>            Reporter: dzcxzl
>            Priority: Minor
>         Attachments: topKSortFallbackThreshold.png, topKSortFallbackThresholdDesc.png
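The ordering problem described in this issue can be illustrated with a small toy model in plain Python. This is only a sketch under stated assumptions, not Spark code: the helper names (range_partition, limit_after_unordered_read, top_k) are hypothetical, the "shuffle read" is modeled as concatenating per-partition blocks in a random order, and the data sizes are arbitrary.

```python
import heapq
import random


def range_partition(values, num_partitions):
    """Toy stand-in for a global sort: contiguous, individually sorted ranges."""
    ordered = sorted(values)
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def limit_after_unordered_read(partitions, n, rng):
    """Local limit per partition, then one reader that receives blocks in arbitrary order."""
    blocks = [part[:n] for part in partitions]  # local limit on each partition
    rng.shuffle(blocks)                         # assumption: block arrival order is not guaranteed
    merged = [row for block in blocks for row in block]
    return merged[:n]                           # global limit keeps whatever arrived first


def top_k(partitions, n):
    """Ordering-aware alternative: keep the n smallest rows across all partitions."""
    return heapq.nsmallest(n, (row for part in partitions for row in part))


partitions = range_partition(range(1000), num_partitions=4)
rng = random.Random(0)

# min() over the limited rows depends on which block happened to arrive first.
results = {min(limit_after_unordered_read(partitions, 10, rng)) for _ in range(50)}
print(sorted(results))

# The ordering-aware top-K always yields the true minimum.
print(min(top_k(partitions, 10)))
```

In this model the limit over an unordered read can return the head of any partition, while the top-K path always returns the smallest rows, which mirrors the difference the description draws between GlobalLimitExec and TakeOrderedAndProjectExec.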