[ https://issues.apache.org/jira/browse/SPARK-44240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896337#comment-17896337 ]
leesf commented on SPARK-44240:
-------------------------------

[~yumwang] Hi, do you have any ideas on this bug? I saw that you raised some related questions in https://issues.apache.org/jira/browse/SPARK-39709.

> Setting the topKSortFallbackThreshold value may lead to inaccurate results
> --------------------------------------------------------------------------
>
>                 Key: SPARK-44240
>                 URL: https://issues.apache.org/jira/browse/SPARK-44240
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 3.4.0
>            Reporter: dzcxzl
>            Priority: Minor
>         Attachments: topKSortFallbackThreshold.png, topKSortFallbackThresholdDesc.png
>
>
> {code:java}
> set spark.sql.execution.topKSortFallbackThreshold=10000;
> SELECT min(id) FROM ( SELECT id FROM range(999999999) ORDER BY id LIMIT 10000) a;
> {code}
>
> If GlobalLimitExec is not the final operator and the plan contains a sort operator, the shuffle read does not guarantee ordering, so the rows that the limit reads may be arbitrary.
> TakeOrderedAndProjectExec preserves the ordering, so it does not have this problem.
>
> !topKSortFallbackThreshold.png!
>
> {code:java}
> set spark.sql.execution.topKSortFallbackThreshold=10000;
> select min(id) from (select id from range(999999999) order by id desc limit 10000) a;
> {code}
>
> !topKSortFallbackThresholdDesc.png!
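
The ordering problem described in the issue can be illustrated with a toy model outside Spark. The sketch below is plain Python, not Spark code: the function names, the `arrival_order` parameter, and the sample data are all hypothetical and exist only for illustration. It assumes each partition is already sorted and holds a contiguous range of the globally sorted data, and it contrasts a limit applied to blocks concatenated in arrival order with a limit applied to an order-preserving merge.

```python
import heapq
from itertools import islice


def global_limit_unordered(partitions, limit, arrival_order):
    """Toy model: concatenate per-partition blocks in arrival order, then take `limit` rows."""
    # Local limit: each partition is already sorted, so keep its first `limit` rows.
    blocks = [list(p)[:limit] for p in partitions]
    # Read without an ordering guarantee: blocks show up in `arrival_order`.
    stream = (row for i in arrival_order for row in blocks[i])
    return list(islice(stream, limit))


def global_limit_ordered(partitions, limit):
    """Toy model: merge the sorted blocks so the global order survives the limit."""
    blocks = [list(p)[:limit] for p in partitions]
    return list(islice(heapq.merge(*blocks), limit))


# Three range partitions of a globally sorted dataset (hypothetical data).
parts = [range(0, 10), range(10, 20), range(20, 30)]
unordered = global_limit_unordered(parts, 5, arrival_order=[2, 0, 1])
ordered = global_limit_ordered(parts, 5)
print(min(unordered), min(ordered))  # prints "20 0"
```

In this model, an arrival order of `[0, 1, 2]` makes both functions return the same rows, which is consistent with the result looking correct on some runs and wrong on others.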