Rewrite rule to convert self-joins into scans

Julian Hyde Sat, 15 Apr 2023 17:27:57 -0700

Ian Bertolacci recently logged
https://issues.apache.org/jira/browse/CALCITE-5631, to convert


  select
     (select numarrayagg(C5633_203) from T893 where C5633_586 = T895.id),
     (select numarrayagg(C5633_170) from T893 where C5633_586 = T895.id)
  from T895

into

  select agg.agg1,
      agg.agg2
  from T895
  left join (
    select C5633_586,
        numarrayagg(C5633_203) as agg1,
        numarrayagg(C5633_170) as agg2
    from T893
    where C5633_586 is not null
    group by C5633_586) as agg
  on agg.C5633_586 = T895.id

This seems to me an interesting and important problem. But it's also a
hard problem, and it's not clear to me which approach is the best.
Does anyone have any ideas for how to approach it?

Also, we could use more example queries that illustrate the general
pattern.  (Preferably in terms of simple databases such as EMP and
DEPT.)

In Calcite rewrite rules (RelRule) are usually the preferred approach.
Because the common relational expressions scans can be an arbitrary
distance apart in the RelNode tree, RelRule doesn't seem suitable.

There seem to be some similarities to algorithms to use materialized
views, which use bottom-up unification.

Ian's original query actually has correlated scalar sub-queries rather
than explicit joins. Would it be better to target common sub-queries
rather than joins?

Lastly, there are similarities with the WinMagic algorithm, which
converts correlated sub-queries into window aggregates. Is that a
useful direction? (My implementation of measures in CALCITE-4496
naturally creates correlated scalar sub-queries that can be inlined in
the enclosing query if simple, or converted to window aggregates if
more complex.)

Julian

Rewrite rule to convert self-joins into scans

Reply via email to