Julian,

I also agree with you that we should distinguish between EnumerableHashJoin and 
EnumerableMergeJoin, what i want to address about is if we should extend the 
EnumerableHashJoin and EnumerableMergeJoin to let them support Thera join 
conditions except the EquiJoin conditions.

It seems that Stamatis also agree that we can extend the existing 
EnumerableXXXs, correct me if i have wrong understanding.


Best,
Danny Chan
在 2019年4月24日 +0800 AM2:18,Julian Hyde <jh...@apache.org>,写道:
> I agree with Stamatis.
>
> Let’s suppose we have a merge join and hash join in enumerable convention. 
> The merge join requires its input to be sorted, the hash join does not. They 
> can both perform equi-join. The merge join can also perform joins like
>
> FROM orders
> JOIN shipments
> ON shipments.shipDate BETWEEN orders.orderDate
> AND orders.orderDate + INTERVAL ‘7’ DAY
>
> Should we use the same sub-class of RelNode, EnumerableJoin, for both, or 
> should we create sub-classes EnumerableHashJoin and EnumerableMergeJoin?
>
> I think we should do the latter:
> * They are clearly different algorithms, and we generate different code for 
> them
> * They have different cost models
> * They have different input and output traits (the merge join requires sorted 
> inputs and produces a sorted output)
> * If we have an equi-join, we might want to generate both options and see how 
> they compare (the merge join might win if one or both of the inputs are 
> sorted), and both algorithms used the EnumerableJoin class we would not be 
> able to distinguish the RelNode instances.
>
> Julian
>
>
> > On Apr 23, 2019, at 11:06 AM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> >
> > In the literature, there are many algorithms to perform a join and the vast
> > majority of them cannot handle theta conditions (either for performance
> > reasons, or because the algorithm was simply not meant for it).
> >
> > For the existing EnumerableXXXJoin variants, it may be possible to extend
> > the algorithm to handle theta joins without structural changes to the
> > algorithm (as I think it is the case in [3]).
> > For these cases, I agree with you it does not make sense to have many
> > variants (e.g., EnumerableThetaHashJoin seems redundant).
> >
> > In the future we may decide to use other hash-based implementations with
> > certain applicability restrictions (such as apply only on equijoins) but
> > let's not worry too much about that now.
> >
> > Regarding the rules, I think it is useful to have different variants.
> > Consider for example a system that does not have join processing algorithms
> > that can treat theta-joins; pushing non-equi conditions in a join seems
> > undesirable in this case.
> >
> > For more information regarding join processing, there are a few interesting
> > (and old) surveys [4, 5] which outline some of the most typical algorithms
> > that are used in relational databases.
> >
> > Best,
> > Stamatis
> >
> > [4] Query evaluation techniques for Large databases (
> > https://web.stanford.edu/class/cs346/2014/graefe.pdf)
> > [5] Join processing in relational databases (
> > https://www.csd.uoc.gr/~hy460/pdf/p63-mishra.pdf)
> >
> >
> > On Tue, Apr 23, 2019 at 8:05 AM Yuzhao Chen <yuzhao....@gmail.com> wrote:
> >
> > > Thx, Julian
> > >
> > > Why not just support non-equi join condition for every physical algorithm,
> > > it does not make much sense if we have both HashJoin and a HashTheraJoin,
> > > cause a HashThataJoin with empty non-equi join condition is same as a
> > > HashJoin.
> > >
> > > And we can remove the limitations in the rule like FilterJoinRule.
> > >
> > > Best,
> > >
> > > Danny Chan
> > > 在 2019年4月23日 +0800 AM3:21,dev@calcite.apache.org,写道:
> > > >
> > > > If there are limitations, over time we would like to remove those
> > > limitations, but we will probably do it by adding new algorithms, and
> > > therefore new EnumerableXxx classes.
> > >
>

Reply via email to