I also have this question when i was doing this patch, cause Enumerable nodes are physical operators(implementation) and should be specific to Calcite. So in the beginning i didn’t classify them as public APIs and only keep the constructors and methods for logical nodes.
Well this searching is somewhat convincing, there are many interfaces/methods are marked as deprecated and to be removed before 2.0, it would be better if the principles for backward compatibility are more clear. Best, Danny Chan 在 2019年4月14日 +0800 PM7:41,Hongze Zhang <notify...@126.com>,写道: > I didn't take look on the PR in detail so far, but it seems that a topic > about backward compatibility would be worth to discuss anyway. > > Regarding the Enumerable's API, I don't think there are many use cases of > them from Calcite users. Although users may create instances in some custom > rules, or extend the Enumerable rels' classes to implement some specific > behaviors, I am still not sure if such cases are that usual. > > For example, I've run a Google search for term "EnumerableJoin.create" on > github.com, only 2 results returned[1], and both are from apache/calcite > project. Similar result on "EnumerableCorrelate.create". I am pretty sure > that Google could not give a precise result about code usage (I don't find a > way to search these terms using GitHub code search), but at least it shows > some sort of trend. As a comparison there are 33 results[2] for > "LogicalJoin.create", some are from external projects. > > So my question would be: how much backward compatibility should we respect > when we make API changes to Calcite? To me it is not much clear. I know > compatibility is very, very important for an Apache project (see "The Apache > Project Maturity Model/QU40"[3]), but I am not sure if we should add > "@Depracated" to any changed public staffs, the code will be messy and hard > to understand. > > Anyway my example about EnumerableJoin/Correlate just shows my confusion on a > broader topic. So I will be +1 to the consensus that already be achieved so > far. But I'll be happy to hear more principles on how to manage the backward > compatibility for Calcite, such as: what's the definition about Calcite's > public API, or what changes would be considered backward-incompatible, etc. I > think that will also benefit our developers a lot. > > > Best, > Hongze > > > [1] > https://www.google.com/search?q=%22EnumerableJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com > [2] > https://www.google.com/search?q=%22LogicalJoin.create%22+site%3A%3Ahttps%3A%2F%2Fgithub.com > [3] > https://community.apache.org/apache-way/apache-project-maturity-model.html#quality > > > On Apr 14, 2019, at 14:53, Walaa Eldin Moustafa <wa.moust...@gmail.com> > > wrote: > > > > Agreed, but not sure what would the best way to do it be without > > making the code very confusing. > > > > On Sat, Apr 13, 2019 at 2:46 PM Haisheng Yuan <h.y...@alibaba-inc.com> > > wrote: > > > > > > I share the same concern with you. > > > > > > > > > > > > > > > > > > Thanks~ > > > Haisheng > > > Yuan------------------------------------------------------------------ > > > 发件人:Stamatis Zampetakis<zabe...@gmail.com> > > > 日 期:2019年04月14日 05:37:29 > > > 收件人:<dev@calcite.apache.org> > > > 主 题:Re: Join, SemiJoin, Correlate > > > > > > Hi Danny, > > > > > > Thanks a lot for taking this on, it is a great start! > > > > > > I didn't look thoroughly through the PR but I noticed that there are many > > > renaming/refactoring of public APIs. I am not sure if we should introduce > > > so many breaking changes without prior notice. A most conservative > > > approach > > > would be to keep existing classes/methods, mark them as deprecated, and > > > then remove them in one of the coming releases. I am not sure if that is > > > the right way to go so let's see what the others have to say. > > > > > > Best, > > > Stamatis > > > > > > On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen <yuzhao....@gmail.com> wrote: > > > > > > > Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis, > > > > @Walaa Eldin Moustafa > > > > > > > > I have did the work for this discussion, and look forward to your > > > > suggestions. > > > > > > > > > > > > ### Diff > > > > - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType, > > > > EnumerableSemiJoinRule, EnumerableThetaJoin > > > > - Make EnumerableMergeJoin extends Join instead of EquiJoin > > > > - Add SEMI and ANTI join type to JoinRelType, add method > > > > returnsJustFirstInput() to decide if the join only outputs left side > > > > - Correlate use JoinRelType instead of SemiJoinType > > > > - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make it > > > > exptends Join instead of Correlate > > > > - Rename EnumerableJoin to EnumerableHashJoin > > > > - EnumerableJoinRule will convert semi-join to EnumerableNestedLoopJoin > > > > (EnumerableSemiJoin's function is merged into this rule) > > > > - Add method isNonCorrelateSemiJoin() in Join.java to make sure if this > > > > join is a semi-join (Comes from SemiJoinRule) or comes from > > > > decorrelation(SubqueryRemoveRule or RelDecorrelator), the returns value > > > > true means the join is a semi-join equivalent to SemiJoin before this > > > > patch. > > > > - Cache the JoinInfo in Join and use it to get leftKeys and rightKeys, > > > > merge the SemiJoin#computeSelfCost to Join#computeSelfCost > > > > - RelBuilder removes SemiJoinFactory, method #semiJoin now return a > > > > LogicalJoin with JoinRelType#SEMI > > > > > > > > ### Rules tweak > > > > - JoinAddRedundantSemiJoinRule now create LogicalJoin with > > > > JoinRelType#SEMI instead of SemiJoin > > > > - JoinToCorrelateRule remove SEMI instance and change the matchs > > > > condition > > > > to !join.getJoinType().generatesNullsOnLeft() which also allowed ANTI > > > > compared before this patch. > > > > - SemiJoinRule match SEMI join specificlly > > > > > > > > ### Metadata tweak > > > > - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to > > > > getAllPredicates(Join) cause semi-join only outputs one side > > > > - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount, > > > > RelMdSize, RelMdUniqueKeys: merge semi-join logic to join > > > > > > > > > > > > ### Test cases change > > > > - MaterializationTest#testJoinMaterialization11 now can materialize > > > > successfully, cause i allow logical SemiJoin node to match, the original > > > > matchs SemiJoin as SemiJoin.class.isAssignableFrom(), which i think is > > > > wrong cause this will only matches subClasses of SemiJoin which is only > > > > EnumerableSemiJoin before this patch. > > > > - SortRemoveRuleTest#removeSortOverEnumerableCorrelate, because > > > > CALCITE-2018, the final EnumerableSort's cost was cache by the previous > > > > EnumerableSort with logical childs, so i remove the EnumerableSortRule > > > > and > > > > the best plan is correct > > > > - sub-query.iq has better plan for null correlate > > > > > > > > > > > > > > > > Best, > > > > Danny Chan > > > > 在 2019年3月21日 +0800 AM3:07,Julian Hyde <jh...@apache.org>,写道: > > > > > I just discovered that Correlate, which is neither a Join nor a > > > > SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType. > > > > > > > > > > Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought. > > > > > > > > > > Julian > > > > > > > > > > > > > > > > > >