[GitHub] [spark] agrawaldevesh commented on a change in pull request #29342: [SPARK-32399][SQL] Full outer shuffled hash join

GitBox Thu, 06 Aug 2020 16:26:27 -0700


agrawaldevesh commented on a change in pull request #29342:
URL: https://github.com/apache/spark/pull/29342#discussion_r466738015




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
##########
@@ -71,8 +89,134 @@ case class ShuffledHashJoinExec(
     val numOutputRows = longMetric("numOutputRows")
     streamedPlan.execute().zipPartitions(buildPlan.execute()) { (streamIter, buildIter) =>
       val hashed = buildHashedRelation(buildIter)
-      join(streamIter, hashed, numOutputRows)
+      joinType match {
+        case FullOuter => fullOuterJoin(streamIter, hashed, numOutputRows)

Review comment:
       I am not sure that's a good idea: 
spark.sql.autoBroadcastJoinThreshold is a very widely used config, and I think 
we should have a separate config to disable just this full outer join 
optimization, without having to turn off BHJ (broadcast hash join) itself.
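
To illustrate the suggestion, here is a minimal, self-contained sketch of gating the full outer shuffled hash join on its own flag. It is plain Scala, not Spark code: the key `example.join.fullOuterShuffledHash.enabled`, the `Strategy` type, and `fullOuterStrategy` are all hypothetical names made up for this example, and the boolean `buildSideFitsInMemory` stands in for whatever size check the planner actually performs.

```scala
object FullOuterJoinConfigSketch {
  // Real config key named in the comment above; this sketch never reads or changes it.
  val BroadcastThresholdKey = "spark.sql.autoBroadcastJoinThreshold"

  // HYPOTHETICAL key, invented for this sketch only. It is not an existing Spark config.
  val FullOuterShjEnabledKey = "example.join.fullOuterShuffledHash.enabled"

  // Simplified stand-ins for physical join strategies (assumption, not Spark classes).
  sealed trait Strategy
  case object ShuffledHash extends Strategy
  case object SortMerge extends Strategy

  // Pick a strategy for a full outer join. The dedicated flag defaults to "true",
  // and turning it off falls back to sort merge join without touching the
  // broadcast threshold.
  def fullOuterStrategy(conf: Map[String, String], buildSideFitsInMemory: Boolean): Strategy = {
    val enabled = conf.getOrElse(FullOuterShjEnabledKey, "true").toBoolean
    if (enabled && buildSideFitsInMemory) ShuffledHash else SortMerge
  }
}
```

With a dedicated flag like this, a user who hits a problem with the new code path could set only that flag to `false` and leave `spark.sql.autoBroadcastJoinThreshold` at its current value.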




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



