Thanks for raising this, Haisheng. Do you mean that we should have better subquery unnesting?
Best, Chunwei On Tue, Jun 4, 2019 at 8:27 AM Haisheng Yuan <h.y...@alibaba-inc.com> wrote: > Hi, > > Since we have commited CALCITE-2969, and in the next release of 1.21, we > will deprecate EquiJoin and make enumerable hash join support non-equi join > condition, the optimization door is open to us. > > Currently Calcite often generates complicated and inefficient plan for > boolean context correlated subquery (putting value context subquery aside, > which is more complex during unnesting). e.g. > https://issues.apache.org/jira/browse/CALCITE-2857 > https://issues.apache.org/jira/browse/CALCITE-2874 > > There are many more cases if you want to try and find out. Some underlying > reasons are: > 1. SemJoin doesn't support non-equi condition > 2. Correlate comes with Aggregate in many cases > > Sometimes, SubqueryRemoveRule generates Correlate with Aggregate, > sometimes generates Join with Aggregate. Then we use SemiJoinRule to match > Join(Rel, Agg) pattern to see whether we can transform it to semijoin or > not. This is counter intuitive, because we already know it will be a > SemiJoin or AntiSemi when we generate Correlate for the subquery. In > current way, we may miss the chance to get back SemiJoin for it, and we > almost lose the opportunity to do IndexScan if there is index available. > > In addition, I see 2 rules JoinAddRedundantSemiJoinRule and > SemiJoinRemoveRule seem to provide the chance to do index scan. But through > the test cases related with the 2 rules, I don't see any chance for a > better plan with them and they don't seem to be able to generate plan with > index scan. I doubt whether they are used at all. Can someone shed some > light on this and give us some examples, if I get it wrong? > > The transformation flow for boolean context sub-query in my head is: > sub-query -> Correate(Semi/AntiSemi) without Aggregate (SubqueryRemoveRule) > Correlate(Semi/AntiSemi) -> NestedLoopJoin (correlate implementation rule) > Correlate(Semi/AntiSemi) -> IndexedNestedLoopJoin (index apply rule) > Correlate(Semi/AntiSemi) -> SemiJoin/AntiSemiJoin (Semi Correlate to Semi > Join rule) > SemiJoin -> InnerJoin(rel, Agg) (SemiJoin to Join rule) > InnerJoin(rel, Agg) -> InnerJoin(Agg, rel) (Join reorder rule) > SemiJoin/InnerJoin -> HashJoin/MergeJoin/NLJ (Join implementation rule) > > This is a long shot, and will involve tons of changes. > Any thoughts? > > - Haisheng Yuan