Github user jinfengni commented on the issue:
https://github.com/apache/drill/pull/905
@vvysotskyi, what I'm thinking of is not just comparing the ratio of column counts to the ratio of row counts.
Let's take a step back. This SwapHashJoinVisitor is trying to correct the join decision made by either VolcanoPlanner or HepPlanner, in the sense that the original join order might put the bigger dataset on the build side. The swap is needed because we want to avoid a high memory requirement on the hash join operator. The question is how we define a "big dataset". For now, SwapHashJoinVisitor simply uses rowCount, which is not sufficient.
Instead, we probably should add a method `getSelfMemCost` to every `Prel` node. For a non-blocking operator, it simply returns either 0 or some constant (enough to hold a single batch). For a blocking operator such as HashJoin, it returns a value proportional to rowCount X columnCount (more precisely, to rowCount times the total number of bytes per row, taking the different column data types into account).
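To make this concrete, here is a rough Java sketch of what `getSelfMemCost` might look like; the interface name, the batch-size constant, and the `estimateRowCount`/`estimateBytesPerRow` helpers are all hypothetical, not existing Drill APIs:

```java
import org.apache.calcite.rel.RelNode;

// Hypothetical sketch only -- this is not Drill's actual Prel interface.
interface MemCostedPrel extends RelNode {

  /** Estimated memory in bytes that this operator itself holds at peak. */
  double getSelfMemCost();
}

// Non-blocking operator: buffers roughly one record batch at a time.
class ProjectPrelSketch {
  static final double ONE_BATCH_BYTES = 1 << 20;  // placeholder constant

  public double getSelfMemCost() {
    return ONE_BATCH_BYTES;
  }
}

// Blocking operator such as HashJoin: materializes the entire build side,
// so memory grows with rowCount times the estimated bytes per row.
class HashJoinPrelSketch {
  RelNode buildSide;  // by convention, the right input

  public double getSelfMemCost() {
    double rows = estimateRowCount(buildSide);           // hypothetical helper
    double bytesPerRow = estimateBytesPerRow(buildSide); // sums per-column widths by data type
    return rows * bytesPerRow;
  }

  private double estimateRowCount(RelNode rel) { return 0; }     // stub
  private double estimateBytesPerRow(RelNode rel) { return 0; }  // stub
}
```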
Like the existing `computeSelfCost` method, we also need a `getCumulativeMemCost`, which returns the cumulative cost of the subtree rooted at a `Prel` node. With `getSelfMemCost` and `getCumulativeMemCost` defined for HashJoin, and a HashJoin with input1 and input2 as inputs, we could estimate the cumulative memory cost for HashJoin(input1, input2) and for HashJoin(input2, input1), and use that as the criterion to decide whether we have to swap them.
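Continuing the sketch above, `getCumulativeMemCost` could default to the self cost plus the children's cumulative costs, and the swap check then just compares the two orderings (again, every name here is hypothetical):

```java
import org.apache.calcite.rel.RelNode;

// Hypothetical continuation of the sketch above (uses Java 8 default/static methods).
interface MemCostedPrel extends RelNode {

  double getSelfMemCost();

  /** Cumulative memory cost of the subtree rooted at this node. */
  default double getCumulativeMemCost() {
    double total = getSelfMemCost();
    for (RelNode input : getInputs()) {
      total += ((MemCostedPrel) input).getCumulativeMemCost();
    }
    return total;
  }

  // Given join = HashJoin(input1, input2) and swapped = HashJoin(input2, input1),
  // keep whichever ordering has the lower estimated cumulative memory cost.
  static MemCostedPrel chooseJoinOrder(MemCostedPrel join, MemCostedPrel swapped) {
    return swapped.getCumulativeMemCost() < join.getCumulativeMemCost()
        ? swapped   // swapping reduces the estimated memory requirement
        : join;     // keep the original join order
  }
}
```

Since both orderings share the same inputs, the comparison effectively reduces to the join's own build-side cost; expressing it via the cumulative cost just keeps the criterion uniform across the whole plan tree.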
This idea is not trying to adjust the row count estimation. Instead, it tries to change the criterion for deciding when a swap is necessary, based on the observation that we want to swap only when it reduces the memory requirement of a query.
Will the above idea work? If it cannot address this issue, it's probably fine to go with what you proposed. Before we take that option, please give some thought to the above idea.
---