Thank you, Stamatis. This is helpful. I have linked to this email thread in 
CALCITE-5631.

> On Apr 16, 2023, at 2:44 AM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> 
> Few quick thoughts about this.
> 
> For to the problem of query minimization/redundant joins the simpler
> scenarios that I can think of are the following:
> 
> # Scenario A
> select e1.id from emp e1 inner join emp e2 on e1.name = e2.name;
> 
> If you know the name column is UNIQUE then you can drop the join on e2.
> 
> # Scenario B
> select e.name from emp e inner join dept d on e.dept = d.id;
> 
> If you know that e.dept and d.id is a foreign key relationship then
> you can drop the join on dept.
> 
> There are probably other cases to define and handle but we should move
> incrementally.
> 
> As Julian pointed out, the issue logged  in CALITE-5631 could also be
> addressed by employing common table expression related optimizations.
> CTE optimizations and query minimization are both interesting and
> powerful techniques to reduce the cost of the query (whatever that is
> speed, money, resources, etc).
> 
> I would suggest focusing on query minimization first since it is
> pretty well defined and we could come up with solutions much faster
> than CTEs. CTEs usually come up with decisions about materializing or
> not the common expressions which are closer to lower level ("physical
> plan") optimizations.
> 
> Most minimization techniques focus on select project join (SPJ)
> queries so I guess we would have to do some preprocessing to bring the
> plan in this format (only Scan, Project, Filter, and Join operators)
> before applying the rule. It would be a separate planning phase
> combining a bunch of existing rules followed by some new which is
> inline with what Julian was saying about bottom-up unification.
> 
> The new rule could be something similar to LoptOptimizeJoinRule that
> operates on a MultiJoin. I haven't checked if the MultiJoin operator
> is sufficient to express an SPJ query but I think the general idea of
> grouping joins together seems to be a promising direction for writing
> new rules.
> 
> Best,
> Stamatis
> 
> On Sun, Apr 16, 2023 at 2:27 AM Julian Hyde <jh...@apache.org> wrote:
>> 
>> Ian Bertolacci recently logged
>> https://issues.apache.org/jira/browse/CALCITE-5631, to convert
>> 
>>  select
>>     (select numarrayagg(C5633_203) from T893 where C5633_586 = T895.id),
>>     (select numarrayagg(C5633_170) from T893 where C5633_586 = T895.id)
>>  from T895
>> 
>> into
>> 
>>  select agg.agg1,
>>      agg.agg2
>>  from T895
>>  left join (
>>    select C5633_586,
>>        numarrayagg(C5633_203) as agg1,
>>        numarrayagg(C5633_170) as agg2
>>    from T893
>>    where C5633_586 is not null
>>    group by C5633_586) as agg
>>  on agg.C5633_586 = T895.id
>> 
>> This seems to me an interesting and important problem. But it's also a
>> hard problem, and it's not clear to me which approach is the best.
>> Does anyone have any ideas for how to approach it?
>> 
>> Also, we could use more example queries that illustrate the general
>> pattern.  (Preferably in terms of simple databases such as EMP and
>> DEPT.)
>> 
>> In Calcite rewrite rules (RelRule) are usually the preferred approach.
>> Because the common relational expressions scans can be an arbitrary
>> distance apart in the RelNode tree, RelRule doesn't seem suitable.
>> 
>> There seem to be some similarities to algorithms to use materialized
>> views, which use bottom-up unification.
>> 
>> Ian's original query actually has correlated scalar sub-queries rather
>> than explicit joins. Would it be better to target common sub-queries
>> rather than joins?
>> 
>> Lastly, there are similarities with the WinMagic algorithm, which
>> converts correlated sub-queries into window aggregates. Is that a
>> useful direction? (My implementation of measures in CALCITE-4496
>> naturally creates correlated scalar sub-queries that can be inlined in
>> the enclosing query if simple, or converted to window aggregates if
>> more complex.)
>> 
>> Julian

Reply via email to