Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/905

This OOM problem exposes two issues. The first is at planning time, where we choose a sub-optimal plan due to inaccurate row-count estimation, which in turn is caused by the lack of appropriate statistics. The second is at execution time, where we need to understand whether Drill uses too much memory and whether spilling to disk is an option. I think the two are complementary: even once hash join can spill to disk, if the planner chooses a sub-optimal plan the query could still take a very long time to complete. It looks like this PR addresses the first issue.

I agree that the root cause is row-count estimation, which is more appropriately deferred to the statistics-support enhancement. For the swapJoin logic, the proposed getMaxRowCount() is in the same vein of adjusting the row-count estimate. I like better the idea of combining row count with column count, which is essentially what swapInputs() in LoptOptimizeJoinRule does.

Regarding the hash-table build-side cost: the hash table itself only has to hold the join keys. However, since hash join is a blocking operator, it must hold all the records on the build side, meaning the total memory requirement (hash table plus non-join-key columns) depends on both row count and column count. The hash-join cost model should reflect that. Can we use a similar idea in SwapHashJoinVisitor?

One further improvement would be to modify HashJoinPrel.computeSelfCost(). Today it only considers the join-key width; it makes sense to adjust that logic to also consider the total column count on the build side. Such logic could be extracted into a common place, so that SwapHashJoinVisitor can call the same shared logic to decide whether swapping the input sides is cost-wise optimal.

Thoughts?
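To illustrate the idea of a shared build-side cost that combines row count and column count, here is a minimal Java sketch. All names (`buildSideMemoryCost`, `shouldSwap`, `avgFieldWidth`) are hypothetical, not Drill's actual API; this is only a model of the reasoning above, where build-side memory scales with rows × columns because a blocking hash join retains every build-side column until probing finishes.

```java
// Hypothetical sketch: a shared build-side cost estimate that both
// HashJoinPrel.computeSelfCost() and SwapHashJoinVisitor could call.
// Names and structure are illustrative, not Drill's real code.
public class HashJoinCostSketch {

    // Memory to hold one side of a blocking hash join: the hash table
    // stores only join keys, but every column of every build-side row
    // must be retained, so cost grows with rows * columns.
    static double buildSideMemoryCost(double rowCount, int columnCount,
                                      double avgFieldWidth) {
        return rowCount * columnCount * avgFieldWidth;
    }

    // Swap the inputs when building on the current probe side is cheaper.
    static boolean shouldSwap(double probeRows, int probeCols,
                              double buildRows, int buildCols,
                              double avgFieldWidth) {
        double buildAsIs = buildSideMemoryCost(buildRows, buildCols, avgFieldWidth);
        double buildSwapped = buildSideMemoryCost(probeRows, probeCols, avgFieldWidth);
        return buildSwapped < buildAsIs;
    }

    public static void main(String[] args) {
        // 1M narrow rows on the build side vs. 100K wide rows on the probe side:
        // row count alone would favor swapping, but the combined
        // rows * columns metric keeps the narrow side as the build side.
        System.out.println(shouldSwap(100_000, 50, 1_000_000, 2, 8.0));
    }
}
```

The point of the example is that a row-count-only heuristic and a rows-times-columns heuristic can disagree: 100K rows of 50 columns need more memory than 1M rows of 2 columns, so the swap decision flips once column count is included.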
---