Based on the discussion so far, it seems we would want to go with option #3. Let me know if there are potential problems with that approach.
Aman

On Mon, May 11, 2015 at 8:43 PM, Aman Sinha <[email protected]> wrote:

> Apart from the JoinType, Correlate would also need to have the
> 'condition' to represent a join condition because the FilterJoinRule relies
> on placing the join condition on the join node during filter push down.
>
> Summarizing the alternatives:
> 1. Have a completely separate implementation of Correlate specific
>    rules. This has the obvious disadvantage of redundant code. Also, it is
>    unlikely that methods such as classifyFilters() would work seamlessly
>    with the Correlate specific rules.
> 2. The redundant code in #1 can be mitigated by creating base classes for
>    some of the rules and have the Join specific and Correlator specific
>    rules share the code.
> 3. Modify Correlate to have JoinType, SemiJoinType as well as
>    'condition'. In this sense, it is getting closer to a Join without
>    actually being a derived class of Join. The FilterJoinRule and similar
>    rules would be modified to use 'BiRel' instead of 'Join' since BiRel is
>    the base class for both Join and Correlate.
>
> To Julian's question about the list of rules affected, it seems most of
> the *Join*Rules would probably need examination otherwise we could miss
> certain optimizations. However, we would get most bang for the buck by
> focusing on FilterJoinRule, so I would like to get that taken care of
> first.
>
> Aman
>
>
> On Mon, May 11, 2015 at 7:06 PM, Julian Hyde <[email protected]> wrote:
>
>> Seems a bit of a stretch, since Join has other ways to represent SEMI and
>> ANTI. Maybe a Correlate could have both a JoinType and a SemiJoinType?
>>
>> Can you & Vladimir find a compromise for how to restore the missing
>> functionality with no more copy-paste than necessary. It would help if we
>> had a full list of rules which ought to work for Correlate.
>>
>> Julian
>>
>> On May 11, 2015, at 5:27 PM, Jinfeng Ni <[email protected]> wrote:
>>
>> > Can we extend Join.JoinType, so that it includes the SemiJoinType (SEMI,
>> > ANTI) represented by Correlate? That way, we could leverage the rules for
>> > Join and apply them to Correlate as well, just like the way it used to
>> > work. Otherwise, we have to come up with a new set of rules for
>> > Correlate, to make things work again.
>> >
>> >
>> >
>> > On Mon, May 11, 2015 at 5:02 PM, Julian Hyde <[email protected]> wrote:
>> >
>> >> This comment in Correlate seems to express Vladimir’s motivation:
>> >>
>> >>> Correlate is not a join since: typical rules should not match Correlate.
>> >>
>> >> I agree with him. For instance, Correlate.joinType is enum SemiJoinType {
>> >> INNER, LEFT, SEMI, ANTI } and therefore different semantics to
>> >> Join.joinType.
>> >>
>> >> It’s unfortunate that FilterJoinRule got broken. We should fix it. Any
>> >> other rules that would be needed? Probably ProjectJoinTransposeRule,
>> >> AggregateJoinTransposeRule.
>> >>
>> >> Julian
>> >>
>> >>
>> >> On May 11, 2015, at 4:17 PM, Aman Sinha <[email protected]> wrote:
>> >>
>> >>> As part of CALCITE-483, the class hierarchy of CorrelateRel was changed
>> >>> such that the new LogicalCorrelate is not a derived class of Join
>> >>> anymore. Thus, any rule such as FilterJoinRule that used to push the
>> >>> filter down into the Join (or a derived class of Join) does not apply
>> >>> anymore for the LogicalCorrelate.
>> >>>
>> >>> I am continuing down the path of my proposal to have a version of the
>> >>> push filter rule that allows pushing into/past a LogicalCorrelate. But
>> >>> perhaps Vladimir can shed some light on the motivation for changing the
>> >>> class hierarchy.
>> >>>
>> >>> thanks,
>> >>> Aman
>> >>>
>> >>>
>> >>> On Mon, May 11, 2015 at 10:21 AM, Aman Sinha <[email protected]> wrote:
>> >>>
>> >>>> Note that I have made some changes to the decorrelation logic to call
>> >>>> findBestExp() *after* the decorrelation is done and supply it the set
>> >>>> of rules including FilterJoinRule. This does push the join condition
>> >>>> into one part of the tree but it does not push it into all other parts
>> >>>> where that join may have been copied during decorrelation. The main
>> >>>> point is: we need to do the filter pushdown early rather than late.
>> >>>>
>> >>>> Aman
>> >>>>
>> >>>> On Mon, May 11, 2015 at 10:16 AM, Aman Sinha <[email protected]> wrote:
>> >>>>
>> >>>>> I want to be able to push the join condition (=($7, $9)) highlighted
>> >>>>> into the LogicalJoin that is right below the LogicalCorrelate. What's
>> >>>>> the right way to do it?
>> >>>>>
>> >>>>> The current method of first decorrelating and then pushing the filter
>> >>>>> (via the FilterJoinRule) is not quite right because once decorrelation
>> >>>>> is done, it may be too late to push the filter into the join. During
>> >>>>> decorrelation we take that LogicalJoin (with its TRUE condition) and
>> >>>>> push it into other places - for instance we call createDistinct() to
>> >>>>> build a distinct row set on the result of this join but since the join
>> >>>>> has a true condition, the distinct is created on a cartesian join.
>> >>>>>
>> >>>>> What I really need is something like a FilterJoinRule that allows
>> >>>>> pushing it past a LogicalCorrelate.
>> >>>>>
>> >>>>> LogicalProject(EXPR$0=[1]): rowcount = 1.0, cumulative cost = 10.25, id = 53
>> >>>>>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], DEPTNO0=[$9], NAME=[$10], EXPR$0=[$11]): rowcount = 1.0, cumulative cost = 9.25, id = 71
>> >>>>>     *LogicalFilter(condition=[AND(=($7, $9), >($5, $11))]): rowcount = 1.0, cumulative cost = 8.25, id = 68*
>> >>>>>       LogicalCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{0}]): rowcount = 1.0, cumulative cost = 7.25, id = 61
>> >>>>>         LogicalJoin(condition=[true], joinType=[inner]): rowcount = 1.0, cumulative cost = 1.0, id = 42
>> >>>>>           LogicalTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 1.0, cumulative cost = 0.0, id = 11
>> >>>>>           LogicalTableScan(table=[[CATALOG, SALES, DEPT]]): rowcount = 1.0, cumulative cost = 0.0, id = 12
>> >>>>>         LogicalAggregate(group=[{}], EXPR$0=[AVG($5)]): rowcount = 1.0, cumulative cost = 2.125, id = 47
>> >>>>>           LogicalFilter(condition=[=($cor0.EMPNO, $0)]): rowcount = 1.0, cumulative cost = 1.0, id = 45
>> >>>>>             LogicalTableScan(table=[[CATALOG, SALES, EMP]]): rowcount = 1.0, cumulative cost = 0.0, id = 14
>> >>>>>
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Aman
>> >>>>>
>> >>>>
>> >>>>
>> >>
>> >>
>> >>
>
