[ 
https://issues.apache.org/jira/browse/SPARK-21998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue updated SPARK-21998:
--------------------------------
    Summary: SortMergeJoinExec did not calculate its outputOrdering correctly 
during physical planning  (was: SortMergeJoinExec should calculate its 
outputOrdering independent of its children's outputOrdering)

> SortMergeJoinExec did not calculate its outputOrdering correctly during 
> physical planning
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-21998
>                 URL: https://issues.apache.org/jira/browse/SPARK-21998
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Maryann Xue
>            Priority: Minor
>
> Right now SortMergeJoinExec calculates its outputOrdering based on its 
> children's outputOrdering, thus oftentimes the SortMergeJoinExec's 
> outputOrdering is NOT correct until after EnsureRequirements, which happens 
> at a rather late stage. As a result, potential optimizations that rely on the 
> required/output orderings, like SPARK-18591, will not work for 
> SortMergeJoinExec.
> Unlike operators like Project or Filter, which simply preserve the ordering 
> of their inputs, the SortMergeJoinExec has a behavior that generates a new 
> ordering in its output regardless of the orderings of its children. I think 
> the code below together with its comment is buggy.
> {code}
>   /**
>    * For SMJ, child's output must have been sorted on key or expressions with 
> the same order as
>    * key, so we can get ordering for key from child's output ordering.
>    */
>   private def getKeyOrdering(keys: Seq[Expression], childOutputOrdering: 
> Seq[SortOrder])
>     : Seq[SortOrder] = {
>     keys.zip(childOutputOrdering).map { case (key, childOrder) =>
>       SortOrder(key, Ascending, childOrder.sameOrderExpressions + 
> childOrder.child - key)
>     }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to