Thx, Stamatis Finally I think that [3] will provide better compatibility, just like you said that some engines may not support non-equi HashJoin, so pushing into non-equi join conditions seems not that necessary.
The better way is to give more join algorithm :), I will try my best to implement more algorithms if I have time. Best, Danny Chan 在 2019年4月26日 +0800 PM9:29,Stamatis Zampetakis <zabe...@gmail.com>,写道: > Hi Danny, > > Yes, I agree that we don't need two variations of hash joins > (EnumerableHashJoin and EnumerableThetaHashJoin) as it is the code right > now. > > Having that said, I am not sure if we can really handle the general case of > theta joins with hash-based algorithms. > > The approach in [3] does not really handle general theta joins but allows > to split a theta condition into equi and non-equi parts where the non-equi > part is executed as a post-join filter. > Since the post-join filter is part of the join operator we can say that we > have a theta hash join operator but this is partially true. > > For instance, the following query: > > SELECT e1.name > FROM emp e1 > INNER JOIN emp e2 > ON e1.salary < e2.salary > > cannot use at all hash join algorithm [3] because there is no equality > condition. > > I wouldn't mind keeping EnumerableHashJoin only for equijoins and execute > the post-join filter as such but this would make outer joins > difficult/impossible to handle so I believe [3] is a good step forward. > > Best, > Stamatis > > > > > > On Wed, Apr 24, 2019 at 6:28 AM Yuzhao Chen <yuzhao....@gmail.com> wrote: > > > Julian, > > > > I also agree with you that we should distinguish between > > EnumerableHashJoin and EnumerableMergeJoin, what i want to address about is > > if we should extend the EnumerableHashJoin and EnumerableMergeJoin to let > > them support Thera join conditions except the EquiJoin conditions. > > > > It seems that Stamatis also agree that we can extend the existing > > EnumerableXXXs, correct me if i have wrong understanding. > > > > > > Best, > > Danny Chan > > 在 2019年4月24日 +0800 AM2:18,Julian Hyde <jh...@apache.org>,写道: > > > I agree with Stamatis. > > > > > > Let’s suppose we have a merge join and hash join in enumerable > > convention. The merge join requires its input to be sorted, the hash join > > does not. They can both perform equi-join. The merge join can also perform > > joins like > > > > > > FROM orders > > > JOIN shipments > > > ON shipments.shipDate BETWEEN orders.orderDate > > > AND orders.orderDate + INTERVAL ‘7’ DAY > > > > > > Should we use the same sub-class of RelNode, EnumerableJoin, for both, > > or should we create sub-classes EnumerableHashJoin and EnumerableMergeJoin? > > > > > > I think we should do the latter: > > > * They are clearly different algorithms, and we generate different code > > for them > > > * They have different cost models > > > * They have different input and output traits (the merge join requires > > sorted inputs and produces a sorted output) > > > * If we have an equi-join, we might want to generate both options and > > see how they compare (the merge join might win if one or both of the inputs > > are sorted), and both algorithms used the EnumerableJoin class we would not > > be able to distinguish the RelNode instances. > > > > > > Julian > > > > > > > > > > On Apr 23, 2019, at 11:06 AM, Stamatis Zampetakis <zabe...@gmail.com> > > wrote: > > > > > > > > In the literature, there are many algorithms to perform a join and the > > vast > > > > majority of them cannot handle theta conditions (either for performance > > > > reasons, or because the algorithm was simply not meant for it). > > > > > > > > For the existing EnumerableXXXJoin variants, it may be possible to > > extend > > > > the algorithm to handle theta joins without structural changes to the > > > > algorithm (as I think it is the case in [3]). > > > > For these cases, I agree with you it does not make sense to have many > > > > variants (e.g., EnumerableThetaHashJoin seems redundant). > > > > > > > > In the future we may decide to use other hash-based implementations > > with > > > > certain applicability restrictions (such as apply only on equijoins) > > but > > > > let's not worry too much about that now. > > > > > > > > Regarding the rules, I think it is useful to have different variants. > > > > Consider for example a system that does not have join processing > > algorithms > > > > that can treat theta-joins; pushing non-equi conditions in a join seems > > > > undesirable in this case. > > > > > > > > For more information regarding join processing, there are a few > > interesting > > > > (and old) surveys [4, 5] which outline some of the most typical > > algorithms > > > > that are used in relational databases. > > > > > > > > Best, > > > > Stamatis > > > > > > > > [4] Query evaluation techniques for Large databases ( > > > > https://web.stanford.edu/class/cs346/2014/graefe.pdf) > > > > [5] Join processing in relational databases ( > > > > https://www.csd.uoc.gr/~hy460/pdf/p63-mishra.pdf) > > > > > > > > > > > > On Tue, Apr 23, 2019 at 8:05 AM Yuzhao Chen <yuzhao....@gmail.com> > > wrote: > > > > > > > > > Thx, Julian > > > > > > > > > > Why not just support non-equi join condition for every physical > > algorithm, > > > > > it does not make much sense if we have both HashJoin and a > > HashTheraJoin, > > > > > cause a HashThataJoin with empty non-equi join condition is same as a > > > > > HashJoin. > > > > > > > > > > And we can remove the limitations in the rule like FilterJoinRule. > > > > > > > > > > Best, > > > > > > > > > > Danny Chan > > > > > 在 2019年4月23日 +0800 AM3:21,dev@calcite.apache.org,写道: > > > > > > > > > > > > If there are limitations, over time we would like to remove those > > > > > limitations, but we will probably do it by adding new algorithms, and > > > > > therefore new EnumerableXxx classes. > > > > > > > > > >