[ 
https://issues.apache.org/jira/browse/SPARK-20010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenhua Wang updated SPARK-20010:
---------------------------------
    Description: 
After sort merge join for inner join, now we only keep left key ordering. 
However, after inner join, right key has the same value and order as left key. 
So if we need another smj on right key, we will unnecessarily add a sort which 
causes additional cost.

As a more complicated example, A join B on A.key = B.key join C on B.key = 
C.key join D on A.key = D.key. We will unnecessarily add a sort on B.key when 
join {A, B} and C, and add a sort on A.key when join {A, B, C} and D.

To fix this, we need to propagate all sorted information (equivalent 
expressions) from bottom up.

  was:
After sort merge join for inner join, now we only keep left key ordering. 
However, after inner join, right key has the same value and order as left key. 
So if we need another smj on right key, we will unnecessarily add a sort which 
causes additional cost.

As a more complicated example, A join B on A.key = B.key join C on B.key = 
C.key join D on A.key = D.key. We will unnecessarily add a sort on B.key when 
join {AB} and C, and add a sort on A.key when join {ABC} and D.

To fix this, we need to propagate all sorted information (equivalent 
expressions) from bottom up.


> Sort information is lost after sort merge join
> ----------------------------------------------
>
>                 Key: SPARK-20010
>                 URL: https://issues.apache.org/jira/browse/SPARK-20010
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Zhenhua Wang
>
> After sort merge join for inner join, now we only keep left key ordering. 
> However, after inner join, right key has the same value and order as left 
> key. So if we need another smj on right key, we will unnecessarily add a sort 
> which causes additional cost.
> As a more complicated example, A join B on A.key = B.key join C on B.key = 
> C.key join D on A.key = D.key. We will unnecessarily add a sort on B.key when 
> join {A, B} and C, and add a sort on A.key when join {A, B, C} and D.
> To fix this, we need to propagate all sorted information (equivalent 
> expressions) from bottom up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to