I would like to propose a New merge join algorithm for optimizing non '=' 
operators. ('<', '<=', '>', '>=')


-          Currently  Merge join is only supported for '=' operator. For '<' or 
'>' operator it chooses Nest Loop Join, or Nest loop with materialization.

-          I think when tuple from lower node is sorted or sorting cost is very 
less, then we can use Merge Join for Non Equal operator also and which will 
give better performance than NLJ (for selecting this new cost calculation can 
be implemented in planner).



Example for using merge Join for < operator.

                        T1                    T2

                        3                      1

                        4                      2

                        5                      4

Outer tuple (3) need to be compared with inner tuple one by one, so it will 
satisfy condition at third inner tuple (as 3<4). So here we can save this point 
of inner tuple so that next outer tuple can directly start comparison from this 
tuple.

1.       In this algorithm we can put one more optimization: Once outer tuple 
satisfies the Merge QUAL it can skip the Merge QUAL test with remaining inner 
tuple and directly apply Other QUALs, as merge QUAL will always satisfy for 
remaining tuples.



Implementation Detail:

1.       Need to add new cost calculation mechanism for this. I still have to 
work on this part.

2.       Implementing in Executor

a.       This algorithm is almost same as normal merge Join with some changes.

b.       Both Inner and Outer Data Sources should be sorted, same as normal 
merge Join.
ALGORITHM:
Merge Qual (R.A < Q.A)
r = first tuple from R (Outer Relation)
q = first tuple in Q( Inner Relation)
save_pos = q;  /* Position to start scanning in relation Q*/
While (fetch tuple r from R till relation end)
{
            for each tuple q in Q starting from save_pos
            {
                             Merge Qual Satisfy
                             {
                                                save_pos = q;
                                                Consume all subsequent tuples 
and project(just need to match Other Quals if any.)
                             }
Else
                                    Fetch Next tuple from Q;
            }
}



-          Performance Comparison:
Suppose tuples of inner and outer is already sorted or Index scan on inner and 
outer.


*         Then cost of NLJ is always O (r*q).

*         The cost of this MJ will be b/w: O (n) to O (r*q).


Where r is number of tuple in R (outer relation) and q is number of tuple in Q 
(inner Relation).

Please provide your feedback/suggestions.

Thanks & Regards,
Dilip Kumar

Reply via email to