[jira] [Commented] (SPARK-11111) Fast null-safe join
[ https://issues.apache.org/jira/browse/SPARK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125659#comment-15125659 ]

Ruslan Dautkhanov commented on SPARK-1:
---------------------------------------

Does this affect all OUTER JOINS? I have poor performance in Spark 1.5 with left and full outer joins.
Thank you for fixing this.

> Fast null-safe join
> -------------------
>
>                 Key: SPARK-1
>                 URL: https://issues.apache.org/jira/browse/SPARK-1
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>             Fix For: 1.6.0
>
>
> Today, null safe joins are executed with a Cartesian product.
> {code}
> scala> sqlContext.sql("select * from t a join t b on (a.i <=> b.i)").explain
> == Physical Plan ==
> TungstenProject [i#2,j#3,i#7,j#8]
>  Filter (i#2 <=> i#7)
>   CartesianProduct
>    LocalTableScan [i#2,j#3], [[1,1]]
>    LocalTableScan [i#7,j#8], [[1,1]]
> {code}
> One option is to add this rewrite to the optimizer:
> {code}
> select *
> from t a
> join t b
>   on coalesce(a.i, ) = coalesce(b.i, ) AND (a.i <=> b.i)
> {code}
> Acceptance criteria: joins with only null safe equality should not result in
> a Cartesian product.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
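The optimizer rewrite proposed in the issue description can be modeled outside Spark. The plain-Scala sketch below has no Spark dependency and is not Spark's implementation: `NullSafeJoinSketch`, `Row`, `Sentinel`, `cartesianJoin` and `hashJoin` are hypothetical names, and `Option[Int]` stands in for a nullable column, with `None` playing SQL NULL. It compares the Cartesian-product-plus-filter plan with a hash join keyed on `coalesce(i, Sentinel)`. The sentinel value is an assumption, since the issue text does not say which default `coalesce` should use. Keeping the `<=>` check after the hash lookup is what makes any sentinel safe, because a real value equal to the sentinel would otherwise match a NULL.

```scala
// Plain-Scala model (no Spark dependency) of the proposed null-safe join rewrite.
// Option[Int] stands in for a nullable integer column; None plays SQL NULL.
object NullSafeJoinSketch {
  type Row = (Option[Int], Int) // (i, j), mirroring table t(i, j)

  // SQL `a <=> b`: true when both are NULL, or both are non-NULL and equal.
  // Option equality gives exactly this: None == None, Some(x) == Some(x).
  def nullSafeEq(a: Option[Int], b: Option[Int]): Boolean = a == b

  // Hypothetical sentinel; the issue leaves the coalesce default unspecified.
  val Sentinel: Int = Int.MinValue

  // Baseline: what the quoted plan does, a Cartesian product plus a filter.
  def cartesianJoin(l: Seq[Row], r: Seq[Row]): Seq[(Row, Row)] =
    for { a <- l; b <- r if nullSafeEq(a._1, b._1) } yield (a, b)

  // Rewrite: hash join on coalesce(i, Sentinel), then re-check `<=>`
  // so that a real value equal to Sentinel does not match a NULL.
  def hashJoin(l: Seq[Row], r: Seq[Row]): Seq[(Row, Row)] = {
    // Build side: group the right input by its coalesced key.
    val buildSide = r.groupBy(_._1.getOrElse(Sentinel))
    for {
      a <- l
      b <- buildSide.getOrElse(a._1.getOrElse(Sentinel), Seq.empty[Row])
      if nullSafeEq(a._1, b._1) // residual predicate from the rewrite
    } yield (a, b)
  }

  def main(args: Array[String]): Unit = {
    // One ordinary key, one NULL key, and one key that collides with the sentinel.
    val t: Seq[Row] = Seq((Some(1), 1), (None, 2), (Some(Sentinel), 3))
    val expected = cartesianJoin(t, t)
    val actual = hashJoin(t, t)
    actual.foreach(println)
    // Same matches as the Cartesian plan, including the NULL-to-NULL pair.
    assert(actual.toSet == expected.toSet)
  }
}
```

Running `main` prints the three matching pairs and passes the assertion. Dropping the `nullSafeEq` check from `hashJoin` would add two spurious matches between the NULL row and the sentinel-valued row, which is why the proposed SQL keeps `AND (a.i <=> b.i)` next to the `coalesce` equality.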
[jira] [Commented] (SPARK-11111) Fast null-safe join
[ https://issues.apache.org/jira/browse/SPARK-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957669#comment-14957669 ]

Apache Spark commented on SPARK-1:
----------------------------------

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/9120