I would like to propose a New merge join algorithm for optimizing non '='
operators. ('<', '<=', '>', '>=')
- Currently Merge join is only supported for '=' operator. For '<' or
'>' operator it chooses Nest Loop Join, or Nest loop with materialization.
- I think when tuple from lower node is sorted or sorting cost is very
less, then we can use Merge Join for Non Equal operator also and which will
give better performance than NLJ (for selecting this new cost calculation can
be implemented in planner).
Example for using merge Join for < operator.
T1 T2
3 1
4 2
5 4
Outer tuple (3) need to be compared with inner tuple one by one, so it will
satisfy condition at third inner tuple (as 3<4). So here we can save this point
of inner tuple so that next outer tuple can directly start comparison from this
tuple.
1. In this algorithm we can put one more optimization: Once outer tuple
satisfies the Merge QUAL it can skip the Merge QUAL test with remaining inner
tuple and directly apply Other QUALs, as merge QUAL will always satisfy for
remaining tuples.
Implementation Detail:
1. Need to add new cost calculation mechanism for this. I still have to
work on this part.
2. Implementing in Executor
a. This algorithm is almost same as normal merge Join with some changes.
b. Both Inner and Outer Data Sources should be sorted, same as normal
merge Join.
ALGORITHM:
Merge Qual (R.A < Q.A)
r = first tuple from R (Outer Relation)
q = first tuple in Q( Inner Relation)
save_pos = q; /* Position to start scanning in relation Q*/
While (fetch tuple r from R till relation end)
{
for each tuple q in Q starting from save_pos
{
Merge Qual Satisfy
{
save_pos = q;
Consume all subsequent tuples
and project(just need to match Other Quals if any.)
}
Else
Fetch Next tuple from Q;
}
}
- Performance Comparison:
Suppose tuples of inner and outer is already sorted or Index scan on inner and
outer.
* Then cost of NLJ is always O (r*q).
* The cost of this MJ will be b/w: O (n) to O (r*q).
Where r is number of tuple in R (outer relation) and q is number of tuple in Q
(inner Relation).
Please provide your feedback/suggestions.
Thanks & Regards,
Dilip Kumar