I agree with Stamatis.

Let’s suppose we have a merge join and hash join in enumerable convention. The 
merge join requires its input to be sorted, the hash join does not. They can 
both perform equi-join. The merge join can also perform joins like

  FROM orders
  JOIN shipments
  ON shipments.shipDate BETWEEN orders.orderDate
    AND orders.orderDate + INTERVAL ‘7’ DAY

Should we use the same sub-class of RelNode, EnumerableJoin, for both, or 
should we create sub-classes EnumerableHashJoin and EnumerableMergeJoin?

I think we should do the latter:
* They are clearly different algorithms, and we generate different code for them
* They have different cost models
* They have different input and output traits (the merge join requires sorted 
inputs and produces a sorted output)
* If we have an equi-join, we might want to generate both options and see how 
they compare (the merge join might win if one or both of the inputs are 
sorted), and both algorithms used the EnumerableJoin class we would not be able 
to distinguish the RelNode instances.

Julian


> On Apr 23, 2019, at 11:06 AM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> 
> In the literature, there are many algorithms to perform a join and the vast
> majority of them cannot handle theta conditions (either for performance
> reasons, or because the algorithm was simply not meant for it).
> 
> For the existing EnumerableXXXJoin variants, it may be possible to extend
> the algorithm to handle theta joins without structural changes to the
> algorithm (as I think it is the case in [3]).
> For these cases, I agree with you it does not make sense to have many
> variants (e.g., EnumerableThetaHashJoin seems redundant).
> 
> In the future we may decide to use other hash-based implementations with
> certain applicability restrictions (such as apply only on equijoins) but
> let's not worry too much about that now.
> 
> Regarding the rules, I think it is useful to have different variants.
> Consider for example a system that does not have join processing algorithms
> that can treat theta-joins; pushing non-equi conditions in a join seems
> undesirable in this case.
> 
> For more information regarding join processing, there are a few interesting
> (and old) surveys [4, 5] which outline some of the most typical algorithms
> that are used in relational databases.
> 
> Best,
> Stamatis
> 
> [4] Query evaluation techniques for Large databases (
> https://web.stanford.edu/class/cs346/2014/graefe.pdf)
> [5] Join processing in relational databases (
> https://www.csd.uoc.gr/~hy460/pdf/p63-mishra.pdf)
> 
> 
> On Tue, Apr 23, 2019 at 8:05 AM Yuzhao Chen <yuzhao....@gmail.com> wrote:
> 
>> Thx, Julian
>> 
>> Why not just support non-equi join condition for every physical algorithm,
>> it does not make much sense if we have both HashJoin and a HashTheraJoin,
>> cause a HashThataJoin with empty non-equi join condition is same as  a
>> HashJoin.
>> 
>> And we can remove the limitations in the rule like FilterJoinRule.
>> 
>> Best,
>> 
>> Danny Chan
>> 在 2019年4月23日 +0800 AM3:21,dev@calcite.apache.org,写道:
>>> 
>>> If there are limitations, over time we would like to remove those
>> limitations, but we will probably do it by adding new algorithms, and
>> therefore new EnumerableXxx classes.
>> 

Reply via email to