comphead commented on code in PR #1424:
URL: https://github.com/apache/datafusion-comet/pull/1424#discussion_r1964288717
##########
spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala:
##########
@@ -67,4 +83,21 @@ object RewriteJoin extends JoinSelectionHelper {
}
case _ => plan
}
+
+ def getOptimalBuildSide(join: Join): BuildSide = {
+ val leftSize = join.left.stats.sizeInBytes
+ val rightSize = join.right.stats.sizeInBytes
+ val leftRowCount = join.left.stats.rowCount
+ val rightRowCount = join.right.stats.rowCount
+ if (leftSize == rightSize && rightRowCount.isDefined &&
leftRowCount.isDefined) {
Review Comment:
Maybe I'm missing something, but the `leftSize == rightSize` condition looks very
unlikely to hold, so with this logic the row counts would practically never be
considered and we would always fall back to sizes.
We could perhaps use something like
https://docs.pingcap.com/tidb/stable/join-reorder#example-the-greedy-algorithm-of-join-reorder
and fall back to sizes only if row counts are not available.
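
To illustrate the ordering I have in mind, here is a minimal, self-contained sketch. It is not the greedy join-reorder algorithm from the link, only the "row counts first, sizes as fallback" comparison for a single join. `PlanStats`, `BuildSideSketch`, and `chooseBuildSide` are hypothetical names, and the `BuildSide` objects are local stand-ins rather than Spark's own; tie-breaking towards the right side is also an assumption.

```scala
// Hypothetical stand-ins for illustration only; these are not the Spark or Comet types.
sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

// Mirrors the two statistics used in the PR: size is always known, row count is optional.
final case class PlanStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

object BuildSideSketch {
  def chooseBuildSide(left: PlanStats, right: PlanStats): BuildSide =
    (left.rowCount, right.rowCount) match {
      // Both row counts are known and differ: build on the side with fewer rows.
      case (Some(l), Some(r)) if l != r =>
        if (l < r) BuildLeft else BuildRight
      // Row counts are missing or equal: fall back to comparing sizes.
      // Assumption: ties go to the right side.
      case _ =>
        if (left.sizeInBytes < right.sizeInBytes) BuildLeft else BuildRight
    }
}
```

With this shape, a left side that is larger in bytes but has fewer rows is still picked as the build side, and the size comparison only decides when a row count is missing or the counts are equal.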
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]