Re: [PR] [SPARK-48000][SQL] Enable hash join support for non-binary collations [spark]

via GitHub Tue, 07 May 2024 07:16:19 -0700


dbatomic commented on code in PR #46166:
URL: https://github.com/apache/spark/pull/46166#discussion_r1592561920



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala:
##########
@@ -247,44 +247,36 @@ abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
           _, left, right, hint) =>
         val hashJoinSupport = hashJoinSupported(leftKeys, rightKeys)

Review Comment:
   I think that you still need to keep hashJoinSupport flag check. E.g. if 
physical plan is manually constructed you don't have any guarantee that 
CollationKey QO rule kicked in.
   
   I would prefer collation key returning binary normalized value that does 
support hash join and keep hashJoinSupport check for StringType.
   
   Also, in future, we may want to add rules in QO that will chose to avoid 
collation key path, given that that path is also not cheap and not clearly 
better than sort merge join (we need to measure this).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Re: [PR] [SPARK-48000][SQL] Enable hash join support for non-binary collations [spark]

Reply via email to