[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600820#comment-15600820 ]
Tejas Patil commented on SPARK-18067: ------------------------------------- The predicate `value1 === value2` is pushed down to the Join operator which makes sense. Although, later the optimizer could have avoided a shuffle. Here is whats happening: https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L185 {noformat} val useExistingPartitioning = children.zip(requiredChildDistributions).forall { case (child, distribution) => child.outputPartitioning.guarantees( createPartitioning(distribution, maxChildrenNumPartitions)) } {noformat} Child's `outputPartitioning` is `HashPartitioning(expressions = \[key\], numPartitions = 200)`. Partitioning returned by `createPartitioning()` is `HashPartitioning(expressions = \[value, key\], numPartitions = 200)`. Since they don't match, an extra shuffle is added. > Adding filter after SortMergeJoin creates unnecessary shuffle > ------------------------------------------------------------- > > Key: SPARK-18067 > URL: https://issues.apache.org/jira/browse/SPARK-18067 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.6.1 > Reporter: Paul Jones > Priority: Minor > > Basic setup > {code} > scala> case class Data1(key: String, value1: Int) > scala> case class Data2(key: String, value2: Int) > scala> val partition1 = sc.parallelize(1 to 100000).map(x => Data1(s"$x", x)) > .toDF.repartition(col("key")).sortWithinPartitions(col("key")).cache > scala> val partition2 = sc.parallelize(1 to 100000).map(x => Data2(s"$x", x)) > .toDF.repartition(col("key")).sortWithinPartitions(col("key")).cache > {code} > Join on key > {code} > scala> partition1.join(partition2, "key").explain > == Physical Plan == > Project [key#0,value1#1,value2#13] > +- SortMergeJoin [key#0], [key#12] > :- InMemoryColumnarTableScan [key#0,value1#1], InMemoryRelation > [key#0,value1#1], true, 10000, StorageLevel(true, true, false, true, 1), Sort > [key#0 ASC], false, 0, None > +- InMemoryColumnarTableScan [key#12,value2#13], InMemoryRelation > [key#12,value2#13], true, 10000, StorageLevel(true, true, false, true, 1), > Sort [key#12 ASC], false, 0, None > {code} > And we get a super efficient join with no shuffle. > But if we add a filter our join gets less efficient and we end up with a > shuffle. > {code} > scala> partition1.join(partition2, "key").filter($"value1" === > $"value2").explain > == Physical Plan == > Project [key#0,value1#1,value2#13] > +- SortMergeJoin [value1#1,key#0], [value2#13,key#12] > :- Sort [value1#1 ASC,key#0 ASC], false, 0 > : +- TungstenExchange hashpartitioning(value1#1,key#0,200), None > : +- InMemoryColumnarTableScan [key#0,value1#1], InMemoryRelation > [key#0,value1#1], true, 10000, StorageLevel(true, true, false, true, 1), Sort > [key#0 ASC], false, 0, None > +- Sort [value2#13 ASC,key#12 ASC], false, 0 > +- TungstenExchange hashpartitioning(value2#13,key#12,200), None > +- InMemoryColumnarTableScan [key#12,value2#13], InMemoryRelation > [key#12,value2#13], true, 10000, StorageLevel(true, true, false, true, 1), > Sort [key#12 ASC], false, 0, None > {code} > And we can avoid the shuffle if use a filter statement that can't be pushed > in the join. > {code} > scala> partition1.join(partition2, "key").filter($"value1" >= > $"value2").explain > == Physical Plan == > Project [key#0,value1#1,value2#13] > +- Filter (value1#1 >= value2#13) > +- SortMergeJoin [key#0], [key#12] > :- InMemoryColumnarTableScan [key#0,value1#1], InMemoryRelation > [key#0,value1#1], true, 10000, StorageLevel(true, true, false, true, 1), Sort > [key#0 ASC], false, 0, None > +- InMemoryColumnarTableScan [key#12,value2#13], InMemoryRelation > [key#12,value2#13], true, 10000, StorageLevel(true, true, false, true, 1), > Sort [key#12 ASC], false, 0, None > {code} > What's the best way to avoid the filter pushdown here?? -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org