Without having fully thought through the implications, I personally like option 4. I think it's nice to be able to keep the distinction between Join, SemiJoin and Correlate. That said, the consensus seems to lean towards option 3, which also sounds acceptable.

--
Michael Mior
mm...@apache.org
On Thu, Mar 21, 2019 at 14:55, Julian Hyde <jh...@apache.org> wrote:
>
> I have a few ideas for refactorings. (I’m not convinced by any of them, but
> let me know which you like.)
>
> 1. Get rid of SemiJoinType. It is misnamed (it is not used by SemiJoin; it
> is used by Correlate, but in a field called joinType).
>
> 2. In Correlate, use org.apache.calcite.linq4j.CorrelateJoinType. It has the
> same set of values as SemiJoinType, but it has a better name.
>
> 3. Get rid of both SemiJoinType and CorrelateJoinType, and use JoinRelType
> for everything. We would have to add SEMI and ANTI values, and also some
> methods to find out whether the resulting row type contains fields from the
> left and right inputs or just the left input.
>
> 4. Add “interface JoinLike extends BiRel” and make Join, SemiJoin and
> Correlate implement it. It would have methods that say whether the LHS and
> RHS generate nulls, and whether the output row type contains columns from the
> right input. This seems attractive because it lets Join, SemiJoin and
> Correlate continue to be structurally different.
>
> Julian
>
> > On Mar 20, 2019, at 6:55 PM, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
> >
> > SubPlan (in Postgres’ terms) is a Postgres physical relational node that
> > evaluates a correlated subquery. What I mean is that a correlated subquery
> > that can’t be decorrelated can’t be implemented by a hash join or a merge
> > join. But that is off topic.
> >
> > Thanks ~
> > Haisheng Yuan
> > ------------------------------------------------------------------
> > From: Walaa Eldin Moustafa <wa.moust...@gmail.com>
> > Date: 2019-03-21 09:31:41
> > To: <dev@calcite.apache.org>
> > Cc: Stamatis Zampetakis <zabe...@gmail.com>
> > Subject: Re: Re: Join, SemiJoin, Correlate
> >
> > Agreed with Stamatis. Currently: 1) Correlate is tied to IN, EXISTS,
> > NOT IN, NOT EXISTS, and 2) it is used as an equivalent to nested loops
> > join.
> > The issues here are: 1) IN, EXISTS, NOT IN, NOT EXISTS can be
> > rewritten as semi/anti joins, and 2) nested loops join is more of a
> > physical operator.
> >
> > It seems that the minimal set of logical join types is INNER, LEFT,
> > RIGHT, OUTER, SEMI, ANTI.
> >
> > So I think Calcite could have one LogicalJoin operator with an
> > attribute to specify the join type (from the above), and a number of
> > physical join operators (hash, merge, nested loops) whose
> > implementation details depend on the join type.
> >
> > What we lose with this model is the structure of the query (whether
> > there was a sub-plan or not), but I would say that this is actually
> > what is desired from a logical representation -- to abstract away from
> > how the query is written and how it is structured, as long as there
> > is a canonical representation. There could also be a world where both
> > models coexist (Correlate first, then Decorrelate, but in the light of a
> > single logical join operator?).
> >
> > @Haisheng, generally, a sub-plan can also be implemented using a
> > variant of hash or merge joins as long as we evaluate the sub-plan
> > independently (without the join predicate), but that is up to the
> > optimizer.
> >
> > Thanks,
> > Walaa.
> >
> > On Wed, Mar 20, 2019 at 5:23 PM Haisheng Yuan <h.y...@alibaba-inc.com>
> > wrote:
> >>
> >> SemiJoinType and its relationship with JoinRelType do confuse me a little
> >> bit.
> >>
> >> But I don’t think we should get rid of LogicalCorrelate. It is useful to
> >> represent the lateral or correlated subquery (aka SubPlan in Postgres
> >> jargon). The LogicalCorrelate can be implemented as NestLoopJoin in
> >> Calcite, or SubPlan in Postgres’s terminology, but it can’t be implemented
> >> as HashJoin or MergeJoin.
> >>
> >> Thanks ~
> >> Haisheng Yuan
> >> ------------------------------------------------------------------
> >> From: Stamatis Zampetakis <zabe...@gmail.com>
> >> Date: 2019-03-21 07:13:15
> >> To: <dev@calcite.apache.org>
> >> Subject: Re: Join, SemiJoin, Correlate
> >>
> >> I have bumped into this quite a few times and I think we should really try
> >> to improve the design of the join hierarchy.
> >>
> >> From a logical point of view I think it makes sense to have the following
> >> operators:
> >> InnerJoin, LeftOuterJoin, FullOuterJoin, SemiJoin, AntiJoin, (GroupJoin)
> >>
> >> Yet I have not thought thoroughly about what should become a class, and
> >> what a property of the class (e.g., JoinRelType, SemiJoinType).
> >>
> >> Moreover, Correlate, as it is right now, is basically a nested loop join
> >> (as its Javadoc also indicates).
> >> Nested loop join is most often encountered as a physical operator, so I am
> >> not sure if it should remain as is (in particular the LogicalCorrelate).
> >> As we do not have HashJoin, MergeJoin, etc., operators at the logical
> >> level, I think we should not have a NestedLoopJoin (aka
> >> LogicalCorrelate).
> >> There are valid reasons why Correlate was introduced in the first place,
> >> but I think we should rethink the design and the needs a bit.
> >>
> >> @Julian: I do not know to what extent you would like to rethink the
> >> hierarchy, but I have the impression that even small changes can easily
> >> break backward compatibility.
> >>
> >>
> >> On Wed, Mar 20, 2019 at 8:07 PM, Julian Hyde <jh...@apache.org>
> >> wrote:
> >>
> >>> I just discovered that Correlate, which is neither a Join nor a SemiJoin,
> >>> uses SemiJoinType, but SemiJoin does not use SemiJoinType.
> >>>
> >>> Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought.
> >>>
> >>> Julian
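
As a rough illustration of option 3 from Julian's list, a single join-type enum with SEMI and ANTI values, plus the helper methods he mentions, might look something like the sketch below. This is not Calcite's actual API: the names JoinKind, generatesNullsOnLeft, generatesNullsOnRight, projectsRight and JoinKindDemo are hypothetical and chosen only for this example. FULL here stands for the value Walaa calls OUTER in his list of logical join types.

```java
// Hypothetical sketch of option 3: one enum covering every join flavor.
// JoinKind and its method names are illustrative assumptions, not Calcite's API.
enum JoinKind {
  INNER, LEFT, RIGHT, FULL, SEMI, ANTI;

  /** True if rows from the left input may be padded with nulls. */
  boolean generatesNullsOnLeft() {
    return this == RIGHT || this == FULL;
  }

  /** True if rows from the right input may be padded with nulls. */
  boolean generatesNullsOnRight() {
    return this == LEFT || this == FULL;
  }

  /**
   * True if the output row type contains columns from the right input.
   * SEMI and ANTI only filter rows of the left input, so they return false.
   */
  boolean projectsRight() {
    return this != SEMI && this != ANTI;
  }
}

// Small demo that prints the properties of each join kind.
class JoinKindDemo {
  public static void main(String[] args) {
    for (JoinKind kind : JoinKind.values()) {
      System.out.printf("%-5s nullsOnLeft=%b nullsOnRight=%b projectsRight=%b%n",
          kind, kind.generatesNullsOnLeft(), kind.generatesNullsOnRight(),
          kind.projectsRight());
    }
  }
}
```

With something along these lines, a rule that needs to know the output row type could ask the join type directly instead of checking whether the node is a Join, a SemiJoin or a Correlate. Option 4 would expose much the same questions, but through an interface implemented by each of the three classes rather than through a shared enum.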