Mihai Budiu created CALCITE-7031:
------------------------------------

             Summary: Implement the general decorrelation algorithm (Neumann & 
Kemper)
                 Key: CALCITE-7031
                 URL: https://issues.apache.org/jira/browse/CALCITE-7031
             Project: Calcite
          Issue Type: Wish
          Components: core
    Affects Versions: 1.39.0
            Reporter: Mihai Budiu


Today Calcite uses a heuristic decorrelator which looks for some specific 
patterns and replaces them with equivalent ones. This can handle a restricted 
set of queries.

There is a general algorithm for doing this, at least for the standard 
operators: 
https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf

The algorithm is a series of rewrites rules, each of which pushes the 
decorrelation operators down in the plan towards leaves, until they can be 
eliminated. 

It would be great if Calcite had an implementation of this algorithm. This 
could be contributed in pieces, where each rewrite rule is a separate PR. I 
think these rules could be easily implemented in the existing planner rewrite 
framework.

There are three design problems that I can foresee:

1) The plans generated by this algorithm are in general DAGs and not trees. I 
think trees would work, but they have the potential to be much larger than the 
corresponding DAGs (perhaps exponentially larger in the size of the original 
plan). I understand that there is a Calcite operator for representing DAGs; I 
don't know if that's the goal of the Spool operator - it seems to imply some 
kind of materialization of the result. The most important decision is how to 
represent DAG plans such that the rewriting framework continues to operate 
correctly.

2) A second issue is that the rewrite rules in the paper use a restricted form 
of relation D which has some nice properties (e.g., it is a set). I am not sure 
how such information can be represented in a Calcite plan, but I suspect this 
can be done.

3) Third, while the algorithm in the paper handles many SQL-like operators, the 
Calcite IR is even richer, supporting operators like Window, Cubes, Unnest, 
recursive queries, etc. I don't know how the algorithm would extend to plans 
containing such operators. But even if it doesn't handle all such operators, a 
general-purpose decorrelator would be a significant improvement over the 
existing one.

This project would also give us the chance to close many issues related to the 
current decorrelator, some of which have been unsolved over many years.

Please comment here if you are interested in this project. I think the most 
important problem to address is no 1 above, the handling of DAGs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to