[ https://issues.apache.org/jira/browse/SPARK-25401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714109#comment-16714109 ]
Apache Spark commented on SPARK-25401:
--------------------------------------

User 'davidvrba' has created a pull request for this issue:
https://github.com/apache/spark/pull/23267

> Reorder the required ordering to match the table's output ordering for bucket join
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-25401
>                 URL: https://issues.apache.org/jira/browse/SPARK-25401
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wang, Gang
>            Priority: Major
>
> Currently, the method orderingSatisfies decides whether a SortExec is needed
> between an operator and its child operator. It requires the sort expressions
> in the required ordering to appear in exactly the same order as in the
> child's output ordering.
> Consider the following case:
> * Table a is bucketed by (a1, a2), sorted by (a2, a1), and has 200 buckets.
> * Table b is bucketed by (b1, b2), sorted by (b2, b1), and has 200 buckets.
> * Table a is joined with table b on (a1=b1, a2=b2).
> In this case, if the join is a sort merge join, the query planner won't add
> an exchange on either side, but it will add a sort on both sides. The sort is
> also unnecessary, because within matching buckets, such as bucket 1 of table a
> and bucket 1 of table b, (a1=b1, a2=b2) is equivalent to (a2=b2, a1=b1).
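
The reordering idea in the description can be illustrated with a small standalone model. This is only a sketch in plain Python, not Spark's planner code and not the change in the linked pull request: the function names ordering_satisfies and reorder_join_keys are hypothetical, columns are represented as plain strings, and sort direction is ignored (everything is assumed ascending). It shows that a prefix-style ordering check rejects (a1, a2) against a table sorted by (a2, a1), and that permuting the join key pairs to follow the tables' output ordering makes the check pass on both sides.

```python
from typing import List, Optional, Sequence, Tuple


def ordering_satisfies(output: Sequence[str], required: Sequence[str]) -> bool:
    """Simplified model of the check: `required` must be a prefix of `output`."""
    if len(required) > len(output):
        return False
    return all(r == o for r, o in zip(required, output))


def reorder_join_keys(
    left_keys: Sequence[str],
    right_keys: Sequence[str],
    left_ordering: Sequence[str],
    right_ordering: Sequence[str],
) -> Optional[Tuple[List[str], List[str]]]:
    """Try to permute the join key pairs so that both sides follow their
    table's output ordering. Returns None when no such permutation exists.
    (Hypothetical helper, written only for this illustration.)"""
    if len(left_keys) != len(right_keys):
        raise ValueError("join keys must come in pairs")
    n = len(left_keys)
    # Keep the sketch simple: duplicate keys are not handled.
    if len(set(left_keys)) != n:
        return None
    if len(left_ordering) < n:
        return None
    # Candidate order: the first n columns of the left table's sort order.
    candidate_left = list(left_ordering[:n])
    if set(candidate_left) != set(left_keys):
        return None
    # Keep each left key paired with its original right key.
    pair = dict(zip(left_keys, right_keys))
    candidate_right = [pair[k] for k in candidate_left]
    # The same permutation must also match the right table's sort order.
    if not ordering_satisfies(right_ordering, candidate_right):
        return None
    return candidate_left, candidate_right


# The case from the issue: join on (a1=b1, a2=b2), tables sorted by
# (a2, a1) and (b2, b1) respectively.
left_keys, right_keys = ["a1", "a2"], ["b1", "b2"]
left_sorted, right_sorted = ["a2", "a1"], ["b2", "b1"]

needs_sort_before = not ordering_satisfies(left_sorted, left_keys)
reordered = reorder_join_keys(left_keys, right_keys, left_sorted, right_sorted)
```

With the inputs above, needs_sort_before is True, while the reordered keys are (a2, a1) and (b2, b1), which satisfy the ordering check on both sides, so in this model no extra sort would be required.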