Re: Support non-equi join condition for EnumerableJoin, EnumerableMergeJoin and EnumerableSemiJoin ? Just a discussing..

Yuzhao Chen Fri, 26 Apr 2019 15:32:03 -0700

Thx, Stamatis

Finally I think that [3] will provide better compatibility, just like you said 
that some engines may not support non-equi HashJoin, so pushing into non-equi 
join conditions seems not that necessary.


The better way is to give more join algorithm :), I will try my best to 
implement more algorithms if I have time.

Best,
Danny Chan
在 2019年4月26日 +0800 PM9:29，Stamatis Zampetakis <zabe...@gmail.com>，写道：
> Hi Danny,
>
> Yes, I agree that we don't need two variations of hash joins
> (EnumerableHashJoin and EnumerableThetaHashJoin) as it is the code right
> now.
>
> Having that said, I am not sure if we can really handle the general case of
> theta joins with hash-based algorithms.
>
> The approach in [3] does not really handle general theta joins but allows
> to split a theta condition into equi and non-equi parts where the non-equi
> part is executed as a post-join filter.
> Since the post-join filter is part of the join operator we can say that we
> have a theta hash join operator but this is partially true.
>
> For instance, the following query:
>
> SELECT e1.name
> FROM emp e1
> INNER JOIN emp e2
> ON e1.salary < e2.salary
>
> cannot use at all hash join algorithm [3] because there is no equality
> condition.
>
> I wouldn't mind keeping EnumerableHashJoin only for equijoins and execute
> the post-join filter as such but this would make outer joins
> difficult/impossible to handle so I believe [3] is a good step forward.
>
> Best,
> Stamatis
>
>
>
>
>
> On Wed, Apr 24, 2019 at 6:28 AM Yuzhao Chen <yuzhao....@gmail.com> wrote:
>
> > Julian,
> >
> > I also agree with you that we should distinguish between
> > EnumerableHashJoin and EnumerableMergeJoin, what i want to address about is
> > if we should extend the EnumerableHashJoin and EnumerableMergeJoin to let
> > them support Thera join conditions except the EquiJoin conditions.
> >
> > It seems that Stamatis also agree that we can extend the existing
> > EnumerableXXXs, correct me if i have wrong understanding.
> >
> >
> > Best,
> > Danny Chan
> > 在 2019年4月24日 +0800 AM2:18，Julian Hyde <jh...@apache.org>，写道：
> > > I agree with Stamatis.
> > >
> > > Let’s suppose we have a merge join and hash join in enumerable
> > convention. The merge join requires its input to be sorted, the hash join
> > does not. They can both perform equi-join. The merge join can also perform
> > joins like
> > >
> > > FROM orders
> > > JOIN shipments
> > > ON shipments.shipDate BETWEEN orders.orderDate
> > > AND orders.orderDate + INTERVAL ‘7’ DAY
> > >
> > > Should we use the same sub-class of RelNode, EnumerableJoin, for both,
> > or should we create sub-classes EnumerableHashJoin and EnumerableMergeJoin?
> > >
> > > I think we should do the latter:
> > > * They are clearly different algorithms, and we generate different code
> > for them
> > > * They have different cost models
> > > * They have different input and output traits (the merge join requires
> > sorted inputs and produces a sorted output)
> > > * If we have an equi-join, we might want to generate both options and
> > see how they compare (the merge join might win if one or both of the inputs
> > are sorted), and both algorithms used the EnumerableJoin class we would not
> > be able to distinguish the RelNode instances.
> > >
> > > Julian
> > >
> > >
> > > > On Apr 23, 2019, at 11:06 AM, Stamatis Zampetakis <zabe...@gmail.com>
> > wrote:
> > > >
> > > > In the literature, there are many algorithms to perform a join and the
> > vast
> > > > majority of them cannot handle theta conditions (either for performance
> > > > reasons, or because the algorithm was simply not meant for it).
> > > >
> > > > For the existing EnumerableXXXJoin variants, it may be possible to
> > extend
> > > > the algorithm to handle theta joins without structural changes to the
> > > > algorithm (as I think it is the case in [3]).
> > > > For these cases, I agree with you it does not make sense to have many
> > > > variants (e.g., EnumerableThetaHashJoin seems redundant).
> > > >
> > > > In the future we may decide to use other hash-based implementations
> > with
> > > > certain applicability restrictions (such as apply only on equijoins)
> > but
> > > > let's not worry too much about that now.
> > > >
> > > > Regarding the rules, I think it is useful to have different variants.
> > > > Consider for example a system that does not have join processing
> > algorithms
> > > > that can treat theta-joins; pushing non-equi conditions in a join seems
> > > > undesirable in this case.
> > > >
> > > > For more information regarding join processing, there are a few
> > interesting
> > > > (and old) surveys [4, 5] which outline some of the most typical
> > algorithms
> > > > that are used in relational databases.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [4] Query evaluation techniques for Large databases (
> > > > https://web.stanford.edu/class/cs346/2014/graefe.pdf)
> > > > [5] Join processing in relational databases (
> > > > https://www.csd.uoc.gr/~hy460/pdf/p63-mishra.pdf)
> > > >
> > > >
> > > > On Tue, Apr 23, 2019 at 8:05 AM Yuzhao Chen <yuzhao....@gmail.com>
> > wrote:
> > > >
> > > > > Thx, Julian
> > > > >
> > > > > Why not just support non-equi join condition for every physical
> > algorithm,
> > > > > it does not make much sense if we have both HashJoin and a
> > HashTheraJoin,
> > > > > cause a HashThataJoin with empty non-equi join condition is same as a
> > > > > HashJoin.
> > > > >
> > > > > And we can remove the limitations in the rule like FilterJoinRule.
> > > > >
> > > > > Best,
> > > > >
> > > > > Danny Chan
> > > > > 在 2019年4月23日 +0800 AM3:21，dev@calcite.apache.org，写道：
> > > > > >
> > > > > > If there are limitations, over time we would like to remove those
> > > > > limitations, but we will probably do it by adding new algorithms, and
> > > > > therefore new EnumerableXxx classes.
> > > > >
> > >
> >

Re: Support non-equi join condition for EnumerableJoin, EnumerableMergeJoin and EnumerableSemiJoin ? Just a discussing..

Reply via email to