Multi-query optimization is one of the big challenges of our field.
Examples of multi-queries:
 * an INSERT statement that writes into a table but also updates an index;
 * a DAG that represents an ETL/ELT job;
 * a query that produces several data sets (say a list of invoices for
orders and a list of products that need to be restocked);
 * a query that uses intermediate results more than once.

Common features of multi-queries are multiple output data sets and
re-used intermediate results. In other words, the dataflow graph is a
DAG rather than just a tree.
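
To make the tree-versus-DAG distinction concrete, here is a minimal
sketch in Python. It is not Calcite code; the operator names and the
helper functions are hypothetical, and the plan is just a dictionary
from each operator to the operators it reads from. It models the
invoices/restock example: one join feeds two outputs.

```python
from collections import Counter

# Hypothetical dataflow plan: operator name -> operators it reads from.
# All names are illustrative only.
plan = {
    "scan_orders": [],
    "scan_products": [],
    "join": ["scan_orders", "scan_products"],
    "invoices": ["join"],       # first output data set
    "restock_list": ["join"],   # second output data set
}


def consumers(plan):
    """Count how many times each operator is read by another operator."""
    counts = Counter()
    for inputs in plan.values():
        counts.update(inputs)
    return counts


def outputs(plan):
    """Operators that nobody reads from: the output data sets."""
    counts = consumers(plan)
    return sorted(op for op in plan if counts[op] == 0)


def shared(plan):
    """Operators read more than once: re-used intermediate results."""
    return sorted(op for op, n in consumers(plan).items() if n > 1)


def is_tree(plan):
    """A plan is a tree if it has one output and no shared operators."""
    return len(outputs(plan)) == 1 and not shared(plan)


print(outputs(plan))
print(shared(plan))
print(is_tree(plan))
```

In this sketch both features show up at once: the plan has two
outputs, and "join" is consumed twice, so it is not a tree. A
conventional single-query plan would have exactly one output and
every operator consumed at most once.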

I think it would be useful to frame the problem by extending SQL so
that such multi-queries can be represented as a single unit. From that
would follow extensions to relational algebra, and improvements to
planner algorithms and cost models.

I would like to hear people's thoughts before I log a Jira case with a
sketch of the problem.

Related work:
 * https://issues.apache.org/jira/browse/CALCITE-481 Spool operator
 * https://issues.apache.org/jira/browse/CALCITE-1440 Multiple RelNodes
 * https://issues.apache.org/jira/browse/CALCITE-4568 Incremental
query optimization
 * https://issues.apache.org/jira/browse/CALCITE-129 Recursive queries

Julian
