[ https://issues.apache.org/jira/browse/SPARK-23368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366374#comment-16366374 ]
Maryann Xue commented on SPARK-23368:
-------------------------------------

[~cloud_fan], [~smilegator], could you please help review this PR? Thanks in advance!

> Avoid unnecessary Exchange or Sort after projection
> ---------------------------------------------------
>
>                 Key: SPARK-23368
>                 URL: https://issues.apache.org/jira/browse/SPARK-23368
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Maryann Xue
>            Priority: Minor
>
> After a column-rename projection, the ProjectExec's outputOrdering and
> outputPartitioning should reflect the projected columns as well. For example,
> {code:java}
> SELECT b1
> FROM (
>     SELECT a a1, b b1
>     FROM testData2
>     ORDER BY a
> )
> ORDER BY a1{code}
> The inner query is ordered on a, which the projection renames to a1, so its
> output is already ordered on a1. If we had a rule to eliminate a Sort on an
> already-sorted result, then together with this fix the ORDER BY in the outer
> query could be optimized out.
>
> Similarly, the query below
> {code:java}
> SELECT *
> FROM (
>     SELECT t1.a a1, t2.a a2, t1.b b1, t2.b b2
>     FROM testData2 t1
>     LEFT JOIN testData2 t2
>     ON t1.a = t2.a
> )
> JOIN testData2 t3
> ON a1 = t3.a{code}
> is equivalent to
> {code:java}
> SELECT *
> FROM testData2 t1
> LEFT JOIN testData2 t2
> ON t1.a = t2.a
> JOIN testData2 t3
> ON t1.a = t3.a{code}
> so the unnecessary sorting and hash-partitioning that are optimized out for
> the second query should be eliminated in the first query as well.
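
The idea in the description can be illustrated with a small toy model. This is only a sketch and not Spark's ProjectExec code: the names `project_ordering` and `sort_is_redundant` are hypothetical, and a projection is modeled simply as a list of (child column, output name) pairs. It shows how a child's sort columns could be rewritten in terms of the projected names, so that a later sort on the renamed column can be recognized as redundant.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model, NOT Spark's implementation.
# A projection is a list of (child_column, output_name) pairs.
Projection = List[Tuple[str, str]]


def project_ordering(child_ordering: List[str], projection: Projection) -> List[str]:
    """Rewrite the child's sort columns using the projection's output names.

    Only a prefix of the ordering survives: once a sort column is not
    projected, the remaining sort columns no longer describe the output order.
    """
    alias_of: Dict[str, str] = {}
    for child_col, out_name in projection:
        # Keep the first alias if a column is projected more than once.
        alias_of.setdefault(child_col, out_name)

    result: List[str] = []
    for col in child_ordering:
        if col not in alias_of:
            break
        result.append(alias_of[col])
    return result


def sort_is_redundant(required: List[str], provided: List[str]) -> bool:
    """A sort is redundant when the required columns are a prefix of the provided ordering."""
    return provided[: len(required)] == required


# First query from the description: inner query sorted on a, projected as a1.
inner_ordering = project_ordering(["a"], [("a", "a1"), ("b", "b1")])
outer_sort_needed = not sort_is_redundant(["a1"], inner_ordering)
```

With the renamed ordering propagated, `inner_ordering` is expressed in terms of `a1`, so the outer `ORDER BY a1` is detected as unnecessary in this model. Partitioning could be treated analogously by renaming the partitioning columns through the same alias map.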