Hi Danny, Thanks a lot for taking this on, it is a great start!
I didn't look thoroughly through the PR but I noticed that there are many renaming/refactoring of public APIs. I am not sure if we should introduce so many breaking changes without prior notice. A most conservative approach would be to keep existing classes/methods, mark them as deprecated, and then remove them in one of the coming releases. I am not sure if that is the right way to go so let's see what the others have to say. Best, Stamatis On Fri, Apr 12, 2019 at 9:18 AM Yuzhao Chen <yuzhao....@gmail.com> wrote: > Hi, @Haisheng Yuan, @Julian Hyde, @Stamatis Zampetakis, > @Walaa Eldin Moustafa > > I have did the work for this discussion, and look forward to your > suggestions. > > > ### Diff > - Deprecate SemiJoin, EquiJoin, EnumerableSemiJoin, SemiJoinType, > EnumerableSemiJoinRule, EnumerableThetaJoin > - Make EnumerableMergeJoin extends Join instead of EquiJoin > - Add SEMI and ANTI join type to JoinRelType, add method > returnsJustFirstInput() to decide if the join only outputs left side > - Correlate use JoinRelType instead of SemiJoinType > - Rename EnumerableCorrelate to EnumerableNestedLoopJoin and make it > exptends Join instead of Correlate > - Rename EnumerableJoin to EnumerableHashJoin > - EnumerableJoinRule will convert semi-join to EnumerableNestedLoopJoin > (EnumerableSemiJoin's function is merged into this rule) > - Add method isNonCorrelateSemiJoin() in Join.java to make sure if this > join is a semi-join (Comes from SemiJoinRule) or comes from > decorrelation(SubqueryRemoveRule or RelDecorrelator), the returns value > true means the join is a semi-join equivalent to SemiJoin before this patch. > - Cache the JoinInfo in Join and use it to get leftKeys and rightKeys, > merge the SemiJoin#computeSelfCost to Join#computeSelfCost > - RelBuilder removes SemiJoinFactory, method #semiJoin now return a > LogicalJoin with JoinRelType#SEMI > > ### Rules tweak > - JoinAddRedundantSemiJoinRule now create LogicalJoin with > JoinRelType#SEMI instead of SemiJoin > - JoinToCorrelateRule remove SEMI instance and change the matchs condition > to !join.getJoinType().generatesNullsOnLeft() which also allowed ANTI > compared before this patch. > - SemiJoinRule match SEMI join specificlly > > ### Metadata tweak > - RelMdAllPredicates, RelMdExpressionLineage: Add full rowType to > getAllPredicates(Join) cause semi-join only outputs one side > - RelMdColumnUniqueness, RelMdSelectivity, RelMdDistinctRowCount, > RelMdSize, RelMdUniqueKeys: merge semi-join logic to join > > > ### Test cases change > - MaterializationTest#testJoinMaterialization11 now can materialize > successfully, cause i allow logical SemiJoin node to match, the original > matchs SemiJoin as SemiJoin.class.isAssignableFrom(), which i think is > wrong cause this will only matches subClasses of SemiJoin which is only > EnumerableSemiJoin before this patch. > - SortRemoveRuleTest#removeSortOverEnumerableCorrelate, because > CALCITE-2018, the final EnumerableSort's cost was cache by the previous > EnumerableSort with logical childs, so i remove the EnumerableSortRule and > the best plan is correct > - sub-query.iq has better plan for null correlate > > > > Best, > Danny Chan > 在 2019年3月21日 +0800 AM3:07,Julian Hyde <jh...@apache.org>,写道: > > I just discovered that Correlate, which is neither a Join nor a > SemiJoin, uses SemiJoinType, but SemiJoin does not use SemiJoinType. > > > > Yuck. The Join/SemiJoin/Correlate type hierarchy needs some thought. > > > > Julian > > > > >