[ https://issues.apache.org/jira/browse/SPARK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fei Wang updated SPARK-7289:
----------------------------
    Description:
Optimize the following SQL:
select key from (select * from testData order by key) t limit 5

from

== Parsed Logical Plan ==
'Limit 5
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
    'Project [*]
     'UnresolvedRelation [testData], None
== Analyzed Logical Plan ==
Limit 5
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
    Project [key#0,value#1]
     Subquery testData
      LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Optimized Logical Plan ==
Limit 5
 Project [key#0]
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Physical Plan ==
Limit 5
 Project [key#0]
  Sort [key#0 ASC], true
   Exchange (RangePartitioning [key#0 ASC], 5), []
    PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]

to

== Parsed Logical Plan ==
'Limit 5
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
    'Project [*]
     'UnresolvedRelation [testData], None
== Analyzed Logical Plan ==
Limit 5
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
    Project [key#0,value#1]
     Subquery testData
      LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Optimized Logical Plan ==
Project [key#0]
 Limit 5
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Physical Plan ==
Project [key#0]
 TakeOrdered 5, [key#0 ASC]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]

  was:
Optimize the following SQL:
`select key from (select * from testData limit 5) t order by key limit 5`

Optimize it from
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
    'Limit 5
     'Project [*]
      'UnresolvedRelation [testData], None
== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
    Limit 5
     Project [key#0,value#1]
      Subquery testData
       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Limit 5
    LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  Limit 5
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
to
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
    'Limit 5
     'Project [*]
      'UnresolvedRelation [testData], None
== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
    Limit 5
     Project [key#0,value#1]
      Subquery testData
       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```

    Summary: Combine Limit and Sort to avoid total ordering  (was: push down sort when it's child is Limit)

> Combine Limit and Sort to avoid total ordering
> ----------------------------------------------
>
>                 Key: SPARK-7289
>                 URL: https://issues.apache.org/jira/browse/SPARK-7289
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.1
>            Reporter: Fei Wang
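
The rewrite proposed in the new description can be sketched outside Spark. The following is a minimal Python illustration, not Catalyst code: the node classes, `push_limit_through_project`, and `execute` are hypothetical names invented for this sketch. It assumes the projection neither adds nor drops rows, so moving Limit beneath Project leaves the result unchanged while placing Limit directly on top of Sort. Once the two are adjacent, the toy executor answers them with a top-n selection (the role TakeOrdered plays in the physical plan) instead of fully sorting the input.

```python
import heapq
from dataclasses import dataclass
from typing import Any, Tuple


# Toy plan nodes. The names mirror the operators in the plans above,
# but these classes are illustrative only and are not Spark classes.
@dataclass(frozen=True)
class Relation:
    rows: Tuple[Tuple[int, str], ...]  # (key, value) pairs


@dataclass(frozen=True)
class Sort:
    child: Any  # orders rows by key, ascending


@dataclass(frozen=True)
class Project:
    child: Any  # keeps only the key column


@dataclass(frozen=True)
class Limit:
    n: int
    child: Any


def push_limit_through_project(plan):
    """Rewrite Limit(Project(Sort(x))) into Project(Limit(Sort(x)))."""
    if (isinstance(plan, Limit)
            and isinstance(plan.child, Project)
            and isinstance(plan.child.child, Sort)):
        return Project(Limit(plan.n, plan.child.child))
    return plan


def execute(plan):
    if isinstance(plan, Relation):
        return list(plan.rows)
    if isinstance(plan, Limit) and isinstance(plan.child, Sort):
        # Limit directly over Sort: select the n smallest keys without
        # sorting every row (the TakeOrdered-style path).
        rows = execute(plan.child.child)
        return heapq.nsmallest(plan.n, rows, key=lambda r: r[0])
    if isinstance(plan, Sort):
        return sorted(execute(plan.child), key=lambda r: r[0])  # total ordering
    if isinstance(plan, Project):
        return [(r[0],) for r in execute(plan.child)]
    if isinstance(plan, Limit):
        return execute(plan.child)[:plan.n]
    raise TypeError(f"unknown plan node: {plan!r}")


test_data = Relation(tuple((k, str(k)) for k in [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]))
before = Limit(5, Project(Sort(test_data)))   # shape of the current optimized plan
after = push_limit_through_project(before)    # shape of the proposed optimized plan
```

Both `execute(before)` and `execute(after)` return the five smallest keys; only the second reaches the top-n branch, because in the first plan the Project sits between Limit and Sort.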